OpenAI Agents SDK Overview and Core Primitives
The OpenAI Agents SDK, released in early 2025, is OpenAI's official framework for building agentic applications. It evolved from the earlier Swarm experiment and provides a production-ready set of primitives that are intentionally minimal. The SDK is built around four core concepts: Agents, Handoffs, Guardrails, and Tracing.
An Agent in the SDK is a configuration object that wraps an LLM with instructions, tools, and output schema. Unlike frameworks like CrewAI or LangGraph that add substantial abstraction layers, the OpenAI SDK stays close to the metal. An agent is essentially a system prompt plus a set of tools, with the agent loop handling the iteration logic.
The agent loop is the execution engine. When you call Runner.run(agent, input), the loop sends the input to the model, checks if the response includes tool calls, executes those tools, feeds results back to the model, and repeats until the model produces a final text response or a structured output matching the agent's output schema. This is the standard ReAct pattern, but implemented with careful attention to edge cases like tool execution failures and context window management.
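The loop described above can be sketched in plain Python. This is a simplified illustration of the pattern, not the SDK's actual implementation; `call_model` and the tool registry are stand-ins for the real model client and registered tools:

```python
# Simplified sketch of the agent loop (illustration only, not the SDK's code).
def run_agent_loop(call_model, tools, messages, max_turns=10):
    """Repeatedly call the model, executing tool calls, until a final answer."""
    for _ in range(max_turns):
        response = call_model(messages)           # one LLM turn
        if "tool_call" not in response:           # plain text => final output
            return response["content"]
        name = response["tool_call"]["name"]
        args = response["tool_call"]["args"]
        try:
            result = tools[name](**args)          # execute the requested tool
        except Exception as exc:                  # surface failures to the model
            result = f"tool error: {exc}"
        messages.append({"role": "tool", "name": name, "content": str(result)})
    raise RuntimeError("max turns exceeded")
```

The `max_turns` cap and the `try`/`except` around tool execution correspond to the edge cases mentioned above: a runaway loop is cut off, and a failing tool reports its error back to the model instead of crashing the run.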
What distinguishes the OpenAI SDK is its focus on multi-agent orchestration through handoffs. Instead of building one complex agent, you create multiple specialized agents and let them hand off conversations to each other. A triage agent might analyze the user's request and hand off to a billing agent, a technical support agent, or a sales agent. Each agent has its own instructions and tools, keeping complexity manageable.
The SDK is model-agnostic despite the OpenAI branding. It works with any model provider that supports the OpenAI chat completions API format, including local models served by Ollama or vLLM. This flexibility makes it a viable option even if you are not exclusively using OpenAI models.
Defining Agents with Tools and Structured Outputs
Creating an agent starts with the Agent class. You provide a name, instructions (the system prompt), a model identifier, and optional tools and output schema. The instructions should clearly define the agent's role, capabilities, and behavioral guidelines. Tools are Python functions decorated with @function_tool that the agent can call during execution.
Tool definitions in the SDK are straightforward. Use Python type hints and docstrings to describe parameters, and the SDK automatically generates the JSON schema that the model needs. For complex input types, use Pydantic models as parameters. The SDK validates inputs before calling your function and serializes outputs for the model.
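The mapping the decorator performs can be sketched with the standard library. This is a rough illustration of the idea, not the SDK's actual schema generator, and `get_weather` is a hypothetical tool:

```python
import inspect
from typing import get_type_hints

# Rough sketch of turning a type-hinted function into a tool schema.
PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}

def tool_schema(fn):
    """Derive a minimal JSON-schema-like tool definition from hints."""
    hints = get_type_hints(fn)
    hints.pop("return", None)
    sig = inspect.signature(fn)
    props = {name: {"type": PY_TO_JSON.get(tp, "object")}
             for name, tp in hints.items()}
    required = [n for n in props
                if sig.parameters[n].default is inspect.Parameter.empty]
    return {"name": fn.__name__,
            "description": inspect.getdoc(fn) or "",
            "parameters": {"type": "object", "properties": props,
                           "required": required}}

def get_weather(city: str, units: str = "metric") -> str:
    """Return the current weather for a city."""
    return f"{city}: 18C ({units})"

schema = tool_schema(get_weather)
```

Note how the docstring becomes the tool description and parameters with defaults drop out of `required` — the same information the model uses to decide when and how to call the tool.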
Structured outputs let you define a Pydantic model as the agent's output type. When you set output_type on an Agent, the agent loop continues until the model produces a response that matches the schema. This is powerful for extraction tasks, classification, and any workflow where you need structured data rather than free-form text.
A practical pattern is chaining agents with structured outputs: the first agent extracts structured data from user input, the second agent uses that data to query a database, and the third agent formats the results for the user. Each agent has a clear input/output contract, making the system testable and maintainable.
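The chaining pattern can be sketched with plain dataclasses standing in for Pydantic output types; the stage names and fields here are illustrative, and each function stands in for an agent run with a structured `output_type`:

```python
from dataclasses import dataclass

# Illustrative contracts for a three-stage pipeline (hypothetical names).
@dataclass
class ExtractedQuery:        # stage 1 output: structured data from user input
    customer_id: str
    topic: str

@dataclass
class DbResult:              # stage 2 output: rows fetched for the query
    rows: list

def extract(user_input: str) -> ExtractedQuery:
    """Stand-in for an extraction agent with a structured output type."""
    cid, topic = user_input.split(":", 1)
    return ExtractedQuery(customer_id=cid.strip(), topic=topic.strip())

def query_db(q: ExtractedQuery) -> DbResult:
    """Stand-in for a lookup agent; returns canned rows for the query."""
    return DbResult(rows=[f"{q.customer_id}/{q.topic}/record-1"])

def format_reply(r: DbResult) -> str:
    """Stand-in for a formatting agent producing the user-facing text."""
    return "Found: " + ", ".join(r.rows)

reply = format_reply(query_db(extract("C42: billing")))
```

Because each stage's input and output types are explicit, every stage can be unit-tested in isolation with constructed inputs.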
Context management is handled through the context parameter, a Python object that you pass through the runner. Every tool function can access this context to read shared state like database connections, user sessions, or configuration. This keeps your tools stateless while allowing them to access the resources they need. The context pattern is cleaner than global variables and makes dependency injection straightforward.
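The shape of this pattern can be sketched without the SDK: a single context object carries shared resources, and tools stay stateless by reading everything from it. The names here are illustrative, and the dict stands in for a real database connection:

```python
from dataclasses import dataclass, field

# Sketch of the context pattern: shared resources passed to stateless tools.
@dataclass
class AppContext:
    user_id: str
    db: dict = field(default_factory=dict)   # stand-in for a real connection

def lookup_balance(ctx: AppContext) -> str:
    """Stateless tool: all state arrives through the context object."""
    return f"Balance for {ctx.user_id}: {ctx.db.get(ctx.user_id, 0)}"

ctx = AppContext(user_id="u1", db={"u1": 250})
```

Swapping the `db` field for a test double is all it takes to test the tool, which is exactly the dependency-injection benefit described above.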
Handoffs: Orchestrating Multi-Agent Systems
Handoffs are the SDK's mechanism for transferring control between agents. When an agent determines that a different specialist should handle the conversation, it performs a handoff. The new agent takes over with full access to the conversation history and continues the interaction.
To enable handoffs, list the target agents in the handoffs parameter of the source agent. The SDK automatically creates a tool for each handoff target that the model can call. The tool's description includes the target agent's name and instructions, helping the model decide when to hand off.
The handoff mechanism supports several patterns. In fan-out routing, a triage agent hands off to one of several specialists based on the user's intent. In sequential processing, each agent hands off to the next agent in a pipeline after completing its task. In escalation, a basic agent hands off to a more capable agent when the problem is too complex.
A critical production consideration is handoff context. By default, the new agent receives the full conversation history. For long conversations, this can exceed context window limits. The SDK supports context filtering through custom handoff functions that trim or summarize the history before passing it to the new agent.
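The trimming logic such a custom handoff function might apply can be sketched in plain Python (a simple last-N policy, assuming a chat-style message list; a production filter might summarize instead of dropping turns):

```python
# Sketch of a handoff input filter: keep the system message and the most
# recent turns so the target agent stays within its context window.
def trim_history(messages, keep_last=6):
    """Return the system message (if any) plus the last keep_last turns."""
    system = [m for m in messages if m["role"] == "system"][:1]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-keep_last:]
```

The system message is preserved unconditionally because it carries the instructions the conversation depends on; only the middle of the transcript is expendable.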
Guardrails can be applied at the handoff boundary. Before completing a handoff, you can run validation logic that checks whether the handoff is appropriate. For example, a guardrail might prevent a billing agent from handing off to a refund agent without first verifying the customer's identity. This ensures that your multi-agent system maintains security and compliance requirements even as control transfers between agents.
For complex orchestration beyond simple handoffs, you can use the SDK's Runner class to manually control agent execution. Run one agent, inspect the result, decide the next agent programmatically, and run again. This imperative style gives you the control of LangGraph while using the SDK's clean agent abstraction.
Guardrails: Safety and Quality Control
Guardrails are validation checks that run alongside agent execution to ensure safety and quality. The SDK supports two types: input guardrails that validate user messages before the agent processes them, and output guardrails that validate the agent's response before it reaches the user.
Input guardrails are defined as functions that receive the agent and the input and return a result whose tripwire flag indicates whether to proceed or block. A common input guardrail checks for prompt injection attempts, where a user tries to override the agent's instructions. Another validates that the input is within the agent's domain, preventing a billing agent from being asked medical questions.
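The rule-based core of such an injection check can be sketched as follows; the patterns are illustrative and deliberately narrow, and the real SDK wraps this kind of logic in a guardrail function returning a structured result:

```python
import re

# Rule-based sketch of an input guardrail for prompt-injection attempts.
# Patterns are illustrative; a production list would be broader and tuned.
INJECTION_PATTERNS = [
    r"ignore (all|your|previous).*instructions",
    r"you are now",
    r"system prompt",
]

def check_input(text: str) -> tuple[bool, str]:
    """Return (allowed, reason); block likely prompt-injection attempts."""
    lowered = text.lower()
    for pattern in INJECTION_PATTERNS:
        if re.search(pattern, lowered):
            return False, f"matched injection pattern: {pattern}"
    return True, "ok"
```

Returning the matched pattern as the reason makes guardrail decisions auditable in traces rather than silent rejections.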
Output guardrails validate the agent's response. They can check for personally identifiable information (PII) that should not be included, verify that the response aligns with company policies, or ensure that numeric claims in the response are supported by the retrieved data. Output guardrails run before the response is returned to the user, giving you a last line of defense against problematic outputs.
Implementing guardrails effectively requires balancing safety with usability. Overly aggressive guardrails block legitimate requests and frustrate users. A good approach is to start with loose guardrails, monitor the types of inputs and outputs that occur in production, and tighten specific guardrails based on observed issues.
The SDK also supports guardrails as LLM calls. You can define a guardrail agent whose sole purpose is to evaluate inputs or outputs. This is useful for nuanced checks that cannot be implemented with simple rules, like detecting subtle policy violations or evaluating response quality. The guardrail agent runs in parallel with the main agent to minimize latency impact. This dual-LLM pattern adds cost but provides robust protection for high-stakes applications.
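The parallel-execution shape of this dual-LLM pattern can be sketched with `asyncio.gather`; the stub coroutines stand in for the real main-agent and guardrail-agent LLM calls:

```python
import asyncio

# Sketch of running a guardrail check concurrently with the main agent
# call, so the check adds little extra latency.
async def main_agent(text: str) -> str:
    await asyncio.sleep(0.01)          # stand-in for the main LLM call
    return f"answer to: {text}"

async def guardrail(text: str) -> bool:
    await asyncio.sleep(0.01)          # stand-in for the guardrail LLM call
    return "forbidden" not in text

async def guarded_run(text: str) -> str:
    answer, allowed = await asyncio.gather(main_agent(text), guardrail(text))
    return answer if allowed else "[blocked by guardrail]"

result = asyncio.run(guarded_run("hello"))
```

Because both calls run concurrently, the added latency is roughly the slower of the two rather than their sum; the cost of the extra LLM call is the trade-off noted above.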
Tracing and Production Deployment
The OpenAI Agents SDK includes built-in tracing that captures every step of agent execution: LLM calls, tool invocations, handoffs, and guardrail checks. Traces are structured hierarchically, with spans representing individual operations nested within a parent trace for the entire agent run.
By default, traces are sent to OpenAI's tracing dashboard, but you can configure custom trace processors to send data to your own observability stack. Common integrations include sending traces to Datadog, Grafana, or a custom logging service. Each span includes timing data, input/output payloads, and metadata like model name and token counts.
For production deployment, wrap your agent in a FastAPI endpoint. The Runner.run() method is async-native, making it efficient to serve multiple concurrent requests. Add request validation, authentication, and rate limiting at the API layer. Use structured logging to correlate trace IDs with request IDs for end-to-end debugging.
Scaling considerations include managing concurrent agent executions (each consumes memory for conversation state), handling long-running agents that make multiple tool calls (set timeout limits), and managing costs (track token usage per request and set budget alerts). The SDK's streaming mode (Runner.run_streamed) provides real-time output to users while the agent is still working, which is essential for good user experience on multi-step tasks.
Error handling in production requires attention to several failure modes: model API timeouts, tool execution failures, guardrail rejections, and context window overflow. Implement retry logic with exponential backoff for transient API errors, circuit breakers for persistent tool failures, and graceful degradation when the agent cannot complete a task. The SDK's exception hierarchy helps you distinguish between recoverable errors and situations that require human intervention.
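The retry-with-backoff component of this can be sketched generically; the helper name and delay values are illustrative, and in practice the retriable exception types would be the model client's transient error classes:

```python
import time

# Sketch of retry with exponential backoff for transient API errors.
def with_retries(fn, attempts=4, base_delay=0.05, retriable=(TimeoutError,)):
    """Call fn, retrying retriable errors with doubling delays."""
    for attempt in range(attempts):
        try:
            return fn()
        except retriable:
            if attempt == attempts - 1:
                raise                              # exhausted: re-raise
            time.sleep(base_delay * (2 ** attempt))  # 0.05s, 0.1s, 0.2s...
```

Only the listed exception types are retried; anything else propagates immediately, which keeps genuinely unrecoverable errors (bad requests, guardrail rejections) from being retried pointlessly.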
Code Example
from agents import Agent, Runner, function_tool


@function_tool
def lookup_order(order_id: str) -> str:
    """Look up order details by order ID."""
    # In production, query your database
    return f"Order {order_id}: Shipped on March 1, arriving March 5"


@function_tool
def process_refund(order_id: str, reason: str) -> str:
    """Process a refund for an order."""
    return f"Refund initiated for order {order_id}. Reason: {reason}"


support_agent = Agent(
    name="Support Agent",
    instructions="You help customers with order inquiries and refunds.",
    tools=[lookup_order, process_refund],
    model="gpt-4o",
)

result = Runner.run_sync(support_agent, "Where is my order #12345?")
print(result.final_output)