What Is an AI Agent and How Does It Work
An AI agent is a program that uses a large language model as its reasoning engine to autonomously decide what actions to take. Unlike a chatbot that simply generates responses, an agent can observe its environment, reason about what to do, take actions using tools, and iterate until it achieves a goal.
The mental model is straightforward. You give the agent a task: 'Find the cheapest flight from Delhi to Mumbai next Friday and book it.' A chatbot would generate a paragraph about how to find flights. An agent would search a flight API, compare prices, ask you to confirm the best option, and proceed with the booking. The agent thinks, acts, observes, and repeats.
This behavior is powered by the ReAct pattern: Reasoning plus Acting. At each step, the model generates a 'thought' explaining its reasoning and an 'action' specifying which tool to call with what arguments, then processes the 'observation' returned by the tool. The loop continues until the model decides it has enough information to provide a final answer.
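The loop can be sketched in plain Python. This is a minimal illustration, not a framework API: `model` and `tools` are hypothetical stand-ins for a real LLM client and tool registry, and the dict-shaped "step" is an assumed format.

```python
def react_loop(model, tools, task, max_steps=5):
    """Minimal ReAct sketch: think, act, observe, repeat until a final answer."""
    history = [f"Task: {task}"]
    for _ in range(max_steps):
        step = model("\n".join(history))  # returns a thought plus a proposed action
        if step["action"] == "final_answer":
            return step["input"]
        observation = tools[step["action"]](step["input"])  # execute the chosen tool
        history.append(f"Thought: {step['thought']}")
        history.append(f"Action: {step['action']}({step['input']})")
        history.append(f"Observation: {observation}")
    return "Stopped: step limit reached."
```

Real frameworks add prompt formatting, argument parsing, and error recovery around this same skeleton.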
In Python, agents are built on three pillars: an LLM (GPT-4o, Claude, Llama) that handles reasoning, tools (Python functions) that perform actions, and an orchestration framework (LangChain, LangGraph, or the OpenAI SDK) that manages the reasoning-action loop. The framework handles the prompt formatting, tool execution, and iteration logic so you can focus on defining tools and writing clear agent instructions.
The agent's effectiveness depends primarily on two things: the quality of its instructions (how clearly you define its role and decision-making guidelines) and the quality of its tools (how well your tool functions work and how clearly their descriptions communicate their purpose to the model).
Setting Up Your Python Environment for Agent Development
A clean Python environment prevents dependency conflicts and ensures reproducibility. Start by creating a virtual environment with Python 3.11 or later. Install the core packages: langchain, langchain-openai, langgraph, and python-dotenv for configuration management.
Organize your project with a clear structure. Keep your agent definitions in a dedicated module, tool functions in their own module, and configuration (API keys, model parameters) in environment variables loaded via python-dotenv. Never hardcode API keys in your source files.
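A fail-fast settings helper keeps key lookups in one place. This is a stdlib-only sketch; the function name is illustrative.

```python
import os

def require_env(name: str) -> str:
    """Fetch a required setting from the environment, failing fast if absent."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# With python-dotenv installed, call load_dotenv() first so a local .env file
# populates os.environ; the lookup itself stays the same.
```

Failing at startup with a named variable is far easier to debug than an authentication error deep inside an agent run.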
Configure your LLM connection. For OpenAI, set the OPENAI_API_KEY environment variable and create a ChatOpenAI instance. For Anthropic, use ANTHROPIC_API_KEY with ChatAnthropic. For local models, use ChatOllama pointing to your Ollama server. The beauty of LangChain's abstraction is that your agent code works identically regardless of the model provider.
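A configuration sketch of the provider swap (assuming the `langchain-openai`, `langchain-anthropic`, and `langchain-ollama` packages are installed; model names are examples, check current availability):

```python
from langchain_openai import ChatOpenAI
# from langchain_anthropic import ChatAnthropic
# from langchain_ollama import ChatOllama

# One construction site for the model; downstream agent code is unchanged.
llm = ChatOpenAI(model="gpt-4o", temperature=0)          # reads OPENAI_API_KEY
# llm = ChatAnthropic(model="claude-3-5-sonnet-20241022")  # reads ANTHROPIC_API_KEY
# llm = ChatOllama(model="llama3")                         # local Ollama server
```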
Set up logging from the start. Python's logging module with a structured formatter gives you visibility into agent execution. For deeper tracing, configure LangSmith by setting LANGCHAIN_TRACING_V2=true and your LANGCHAIN_API_KEY. LangSmith captures every LLM call, tool execution, and chain step in a hierarchical trace that you can inspect in the web dashboard.
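A minimal stdlib logging setup along these lines works before any tracing service is involved; the logger name and format are illustrative choices.

```python
import logging

def configure_logging(level: int = logging.INFO) -> logging.Logger:
    """Attach a timestamped formatter to a dedicated agent logger."""
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter(
        "%(asctime)s %(levelname)s %(name)s %(message)s"))
    logger = logging.getLogger("agent")
    logger.setLevel(level)
    logger.addHandler(handler)
    return logger

logger = configure_logging()
# Log structured key=value pairs so traces are grep-able later.
logger.info("tool_call name=%s args=%s", "get_weather", {"city": "Delhi"})
```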
Install development tools: pytest for testing, black for formatting, and mypy for type checking. Agent code benefits enormously from type hints because the frameworks use Pydantic and TypedDict extensively. Good type annotations catch errors at development time that would otherwise surface only during runtime when the agent makes unexpected decisions.
Finally, create a simple test script that verifies your setup: instantiate the model, call it with a simple prompt, and confirm you get a response. Having a verified baseline eliminates environment issues when you start building more complex agents.
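One way to structure that baseline check is to accept the model as a parameter, so the same function works with any provider (and can be exercised with a fake in tests). The function name is illustrative; a real run needs a live API key.

```python
def smoke_test(model) -> bool:
    """Return True if the model answers a trivial prompt.
    Pass any LangChain chat model (ChatOpenAI, ChatAnthropic, ...);
    anything exposing .invoke(prompt) works."""
    response = model.invoke("Reply with the single word: ready")
    text = getattr(response, "content", str(response))
    return len(text.strip()) > 0
```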
Building Your First ReAct Agent with Tools
Let us build a practical agent step by step. Our agent will be a research assistant that can search the web, do calculations, and answer questions by combining information from multiple sources.
First, define the tools. A tool is a Python function decorated with @tool from LangChain. The decorator uses the function's name, docstring, and type hints to create the schema the LLM needs. Write clear, specific docstrings because the model reads them to decide when and how to use each tool. A vague docstring like 'does a search' will confuse the model; 'Search the web for current information about a specific topic. Returns relevant search results.' gives the model clear guidance.
Next, create the agent. With LangChain's create_react_agent, you provide the model, tools, and a prompt. The prompt includes the ReAct format instructions that tell the model to think step-by-step and use tools. The AgentExecutor wraps this in an execution loop with error handling and iteration limits.
Test the agent with progressively complex queries. Start with a simple question that requires one tool call: 'What is the current temperature in Delhi?' Verify the agent calls the search tool and returns a coherent answer. Then try a multi-step query: 'What is the population of India and what is 15% of that number?' This should trigger both a search and a calculation. Finally, test an edge case: ask something the tools cannot answer and verify the agent handles it gracefully.
Common issues at this stage include the agent not calling tools when it should (improve tool descriptions or add explicit instruction in the prompt to use tools), calling the wrong tool (make tool descriptions more specific and non-overlapping), and looping without making progress (set max_iterations on the AgentExecutor and add a time limit). Address these through iterative prompt and tool description refinement.
Adding Memory and Conversation History
Stateless agents treat every user message as independent. This means asking 'What is the capital of France?' followed by 'What is its population?' fails because the agent does not know what 'its' refers to. Memory solves this by maintaining conversation history across interactions.
The simplest memory approach passes the full conversation history to the model on each turn. In LangChain, use a list to accumulate messages and pass them as part of the prompt. Each user message and agent response is appended to the list, giving the model full context for understanding references and follow-up questions.
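The full-history approach is a few lines; `chat_model` below is a placeholder for a real LLM call, and the message dicts follow the common role/content convention.

```python
def chat_turn(chat_model, history, user_message):
    """Append the user message, let the model see the whole conversation,
    then append the reply so the next turn has full context."""
    history.append({"role": "user", "content": user_message})
    reply = chat_model(history)  # model receives every prior message
    history.append({"role": "assistant", "content": reply})
    return reply
```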
For longer conversations, full history exceeds the context window. Implement a sliding window that keeps the last N message pairs and summarizes older messages. The summarization can use a cheaper model since it does not need sophisticated reasoning. Store the summary as a system message at the beginning of the conversation, followed by the recent messages in full.
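A sliding-window trim along those lines might look like this; `summarize` is any callable, typically a call to a cheaper model, and the function name is illustrative.

```python
def trim_history(history, summarize, keep_pairs=3):
    """Keep the last keep_pairs user/assistant pairs in full; fold everything
    older into a single system message produced by `summarize`."""
    keep = keep_pairs * 2  # each pair is one user and one assistant message
    if len(history) <= keep:
        return history
    older, recent = history[:-keep], history[-keep:]
    summary = summarize(older)
    return [{"role": "system",
             "content": f"Summary of earlier conversation: {summary}"}] + recent
```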
LangGraph provides a more robust memory solution through checkpointing. The graph state persists across invocations, identified by a thread_id. When the user returns to a conversation, the checkpointer loads the previous state and the agent continues seamlessly. For production applications, use SqliteSaver for single-server deployments or PostgresSaver for distributed deployments.
Semantic memory adds another dimension: instead of remembering everything sequentially, the agent stores and retrieves specific facts. When a user says 'My name is Priya and I work at Infosys', store these facts in a key-value store. When the user later asks 'Can you recommend training for my company?', retrieve the stored fact about Infosys and personalize the recommendation. This requires a separate retrieval mechanism, often a vector store or a simple database, that the agent queries as a tool.
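As a toy illustration of the fact-store idea (exact-key lookup only; a production system would typically sit this behind a vector store and expose it to the agent as a pair of tools):

```python
class FactStore:
    """Minimal semantic memory: store and retrieve facts by key."""
    def __init__(self):
        self._facts = {}

    def remember(self, key: str, value: str) -> str:
        self._facts[key.lower()] = value  # normalize keys for lookup
        return f"Stored fact: {key}."

    def recall(self, key: str) -> str:
        return self._facts.get(key.lower(), "No stored fact for that key.")

memory = FactStore()
memory.remember("employer", "Infosys")
```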
Design your memory system based on your use case. Customer support agents need full conversation history for context. Research assistants need semantic memory for accumulating findings. Task-oriented agents might need only the current task state. Over-engineering memory adds latency and cost without improving user experience.
Error Handling, Testing, and Going to Production
Production agents need robust error handling because failures are inevitable. Tool functions can throw exceptions, APIs can timeout, and the model can generate invalid tool calls. Your agent needs to recover gracefully from all of these.
Wrap tool functions in try-except blocks and return informative error messages rather than raising exceptions. If a web search fails, return 'Search temporarily unavailable. Try rephrasing your query.' This gives the model information to work with. It might try a different search query or use a different tool instead of crashing.
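The wrapping pattern looks like this; `backend_search` is a hypothetical function standing in for the real API call (here it simply raises, to simulate an outage).

```python
def backend_search(query: str) -> str:
    # Stand-in for a real web-search API call; raises to simulate a failure.
    raise TimeoutError("upstream timeout")

def safe_search(query: str) -> str:
    """Return text the model can reason about instead of raising an exception."""
    try:
        return backend_search(query)
    except Exception:
        return "Search temporarily unavailable. Try rephrasing your query."
```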
Handle model-level errors with retry logic. Rate limit errors (429) should trigger exponential backoff. Timeout errors should retry once, then fall back to a faster model. Content filter rejections should return a user-friendly message explaining that the request could not be processed.
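The backoff logic for the rate-limit case can be sketched like this; it assumes, for illustration, that the underlying client raises exceptions whose message contains '429' when rate limited (real SDKs expose typed exceptions you should match instead).

```python
import time

def call_with_retries(fn, retries=3, base_delay=1.0):
    """Retry fn() with exponential backoff on rate-limit errors;
    re-raise anything else (or the final rate-limit failure) immediately."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception as exc:
            if "429" not in str(exc) or attempt == retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```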
Testing agents requires a different approach than testing regular software. Write deterministic tests by mocking the LLM: use unittest.mock to replace the model with a function that returns predetermined responses, then verify that your agent executes the correct sequence of tool calls. This tests your orchestration logic without the non-determinism of real LLM calls.
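A sketch of the idea with `unittest.mock`. The tiny `run_agent` here is a deliberately simplified stand-in for real orchestration logic: the mocked "model" emits a fixed plan, so the assertions check only which tools run, with what arguments, in what order.

```python
from unittest.mock import MagicMock

def run_agent(model, tools, question):
    """Toy orchestrator: ask the model for a plan, run each tool in order."""
    plan = model(question)  # e.g. [("search", "population of India"), ...]
    observations = [tools[name](arg) for name, arg in plan]
    return " | ".join(observations)

def test_agent_calls_tools_in_order():
    model = MagicMock(return_value=[("search", "population of India"),
                                    ("calculate", "0.15 * 1.4e9")])
    search = MagicMock(return_value="about 1.4 billion")
    calculate = MagicMock(return_value="210000000.0")
    result = run_agent(model, {"search": search, "calculate": calculate}, "q")
    search.assert_called_once_with("population of India")
    calculate.assert_called_once_with("0.15 * 1.4e9")
    assert result == "about 1.4 billion | 210000000.0"

test_agent_calls_tools_in_order()
```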
For quality testing, maintain a test suite of 30-50 representative queries with expected behaviors (not exact outputs). Run these against the real model periodically and score results on correctness, tool usage, and response quality. Automated evaluation using an LLM-as-judge can scale this to hundreds of test cases.
Deployment starts with wrapping your agent in a FastAPI endpoint with async support. Add request validation, authentication, rate limiting, and timeout handling at the API layer. Use Docker for containerization and deploy to your cloud provider of choice. Monitor token usage, response latency, and error rates from day one. Set up alerts for anomalies and review agent traces weekly to identify improvement opportunities.
Code Example
from langchain import hub
from langchain.agents import AgentExecutor, create_react_agent, tool
from langchain_openai import ChatOpenAI

@tool
def calculate(expression: str) -> str:
    """Evaluate a mathematical expression. Example: '15 * 24 + 7'"""
    try:
        # Empty __builtins__ restricts what the expression can call.
        result = eval(expression, {"__builtins__": {}})
        return str(result)
    except Exception as e:
        return f"Error: {e}. Please provide a valid math expression."

@tool
def get_weather(city: str) -> str:
    """Get current weather for a city."""
    # Stubbed response; replace with a real weather API call.
    return f"Weather in {city}: 28°C, partly cloudy, humidity 65%"

llm = ChatOpenAI(model="gpt-4o", temperature=0)
prompt = hub.pull("hwchase17/react")  # standard ReAct prompt from LangChain Hub

agent = create_react_agent(llm, [calculate, get_weather], prompt)
executor = AgentExecutor(
    agent=agent,
    tools=[calculate, get_weather],
    max_iterations=5,  # hard cap on the reasoning-action loop
    verbose=True,
)

result = executor.invoke({"input": "What is 20% of the temperature in Delhi?"})