Complete guide to developing AI agents that can perceive, reason, and act autonomously. Learn agent architectures, tool integration, memory systems, and production deployment patterns.
AI agents represent the evolution from passive question-answering systems to autonomous entities that can perceive their environment, make decisions, and take actions to accomplish goals. Unlike traditional chatbots that simply respond to queries, agents can break down complex tasks, use tools, maintain context across multiple interactions, and adapt their approach based on results.
Building effective AI agents requires understanding several key concepts: agent architectures (ReAct, Plan-Execute, Reflexion), tool integration and function calling, memory systems for context persistence, error handling and recovery, and orchestration patterns for coordinating multiple agents. This guide provides a comprehensive walkthrough of these concepts with practical implementation examples.
Whether you're building a research assistant that can search the web and analyze documents, a customer service agent that can query databases and update tickets, or a complex multi-agent system for workflow automation, you'll learn the patterns and best practices for production-ready agent systems.
Different agent architectures suit different use cases. Let's explore the major patterns.
The ReAct pattern alternates between reasoning (thinking) and acting (using tools). This is the most widely used agent architecture.
from openai import OpenAI
import json
class ReActAgent:
def __init__(self, tools):
self.client = OpenAI()
self.tools = tools # Dictionary of available tools
self.max_iterations = 10
def run(self, task):
"""Execute a task using ReAct pattern."""
conversation = [
{"role": "system", "content": self._build_system_prompt()},
{"role": "user", "content": f"Task: {task}"}
]
for iteration in range(self.max_iterations):
# Get next action from LLM
response = self.client.chat.completions.create(
model="gpt-4o",
messages=conversation
)
content = response.choices[0].message.content
# Parse thought, action, and action input
thought, action, action_input = self._parse_response(content)
print(f"Thought: {thought}")
print(f"Action: {action}")
# Check if task is complete
if action == "Final Answer":
return action_input
# Execute action
if action in self.tools:
observation = self.tools[action](action_input)
conversation.append({"role": "assistant", "content": content})
conversation.append({"role": "user", "content": f"Observation: {observation}"})
else:
conversation.append({"role": "user", "content": f"Error: Unknown action '{action}'"})
return "Task incomplete after maximum iterations"
def _build_system_prompt(self):
return f"""You are an AI agent that can use tools to accomplish tasks.
Available tools:
{json.dumps([{"name": name, "description": tool.__doc__} for name, tool in self.tools.items()], indent=2)}
Use this format:
Thought: [your reasoning about what to do next]
Action: [tool name]
Action Input: [input for the tool]
After receiving an Observation, continue with another Thought/Action or provide:
Thought: [final reasoning]
Action: Final Answer
Action Input: [your final response to the user]"""
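    # The original leaves _parse_response undefined; a minimal regex-based
    # sketch of a Thought/Action/Action Input parser (one reasonable approach,
    # not the only one):
    def _parse_response(self, content):
        import re
        def extract(label):
            match = re.search(
                rf"{label}:\s*(.*?)(?=\n(?:Thought|Action Input|Action):|\Z)",
                content,
                re.DOTALL,
            )
            return match.group(1).strip() if match else ""
        return extract("Thought"), extract("Action"), extract("Action Input")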
# Define tools
def search_database(query):
"""Search the customer database for information."""
# Implement actual database search
return f"Found 3 results for: {query}"
def send_email(recipient, subject, body):
"""Send an email to a recipient."""
# Implement email sending
return f"Email sent to {recipient}"
# Use agent
agent = ReActAgent({
"search_database": search_database,
"send_email": send_email
})
result = agent.run("Find customers who haven't logged in for 30 days and send them a re-engagement email")First create a complete plan, then execute steps. Better for complex multi-step tasks.
class PlanExecuteAgent:
    def __init__(self):
        # The original omits the constructor; the methods below assume an
        # OpenAI client on self.client, as in the ReAct example.
        self.client = OpenAI()
def run(self, task):
# Step 1: Create plan
plan = self._create_plan(task)
print(f"Plan created with {len(plan)} steps")
# Step 2: Execute each step
results = []
for i, step in enumerate(plan, 1):
print(f"Executing step {i}: {step}")
result = self._execute_step(step, results)
results.append(result)
# Step 3: Synthesize final answer
return self._synthesize_answer(task, results)
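    # The original omits these helpers; minimal sketches under the assumption
    # that each step is executed by prompting the model with prior results.
    def _execute_step(self, step, prior_results):
        context = "\n".join(f"Step {i} result: {r}" for i, r in enumerate(prior_results, 1))
        response = self.client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": f"{context}\n\nExecute this step and report the outcome: {step}"}]
        )
        return response.choices[0].message.content
    def _synthesize_answer(self, task, results):
        response = self.client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": f"Task: {task}\nStep results: {results}\n\nWrite the final answer."}]
        )
        return response.choices[0].message.content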
def _create_plan(self, task):
"""Create a step-by-step plan."""
response = self.client.chat.completions.create(
model="gpt-4o",
messages=[{
"role": "user",
"content": f"""Create a step-by-step plan for this task: {task}
Return as JSON array of steps:
["step 1", "step 2", "step 3"]"""
}]
)
        return json.loads(response.choices[0].message.content)

Modern LLMs have native function-calling capabilities, more reliable than parsing text outputs:
# Define tools as JSON schema
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get current weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": {"type": "string", "description": "City name"},
"unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
},
"required": ["location"]
}
}
},
{
"type": "function",
"function": {
"name": "search_web",
"description": "Search the web for information",
"parameters": {
"type": "object",
"properties": {
"query": {"type": "string"}
},
"required": ["query"]
}
}
}
]
# Agent with function calling
messages = [{"role": "user", "content": "What's the weather in Sydney and any news about AI?"}]
response = client.chat.completions.create(
model="gpt-4o",
messages=messages,
tools=tools,
tool_choice="auto" # Let model decide when to use tools
)
# Check if model wants to call functions
if response.choices[0].message.tool_calls:
    # The assistant message carrying tool_calls must be appended before any
    # role="tool" results, or the API rejects the conversation
    messages.append(response.choices[0].message)
    for tool_call in response.choices[0].message.tool_calls:
function_name = tool_call.function.name
arguments = json.loads(tool_call.function.arguments)
# Execute the function
if function_name == "get_weather":
result = get_weather(**arguments)
elif function_name == "search_web":
result = search_web(**arguments)
# Add function result to conversation
messages.append({
"role": "tool",
"tool_call_id": tool_call.id,
"content": json.dumps(result)
})
# Get final response with function results
final_response = client.chat.completions.create(
model="gpt-4o",
messages=messages,
tools=tools
)

| Pattern | Best For | Pros | Cons |
|---|---|---|---|
| ReAct | Dynamic tasks, exploration | Flexible, adapts to observations | Can be inefficient, may loop |
| Plan-Execute | Complex multi-step workflows | Clear structure, predictable | Less adaptive to changes |
| Function Calling | Production systems | Reliable, structured, fast | Requires schema definition |
Tools extend agent capabilities beyond language processing. Let's build a robust tool system.
Good tools have a clear name, a description the model can act on, and an explicit parameter schema. The Tool dataclass below captures exactly that:
from typing import Callable, Dict, Any
from dataclasses import dataclass
from datetime import datetime  # needed by the get_current_time tool below
@dataclass
class Tool:
name: str
description: str
function: Callable
parameters: Dict[str, Any]
class ToolRegistry:
def __init__(self):
self.tools: Dict[str, Tool] = {}
def register(self, tool: Tool):
"""Register a tool."""
self.tools[tool.name] = tool
def get_tool_schemas(self):
"""Get OpenAI function calling schemas."""
return [{
"type": "function",
"function": {
"name": tool.name,
"description": tool.description,
"parameters": tool.parameters
}
} for tool in self.tools.values()]
def execute(self, tool_name: str, **kwargs):
"""Execute a tool with error handling."""
if tool_name not in self.tools:
return {"error": f"Unknown tool: {tool_name}"}
try:
result = self.tools[tool_name].function(**kwargs)
return {"success": True, "result": result}
except Exception as e:
return {"success": False, "error": str(e)}
# Initialize registry
registry = ToolRegistry()
# Register tools
registry.register(Tool(
name="calculate",
description="Perform mathematical calculations",
function=lambda expression: eval(expression), # In production, use safe eval!
parameters={
"type": "object",
"properties": {
"expression": {"type": "string", "description": "Math expression to evaluate"}
},
"required": ["expression"]
}
))
registry.register(Tool(
name="get_current_time",
description="Get the current date and time",
function=lambda: datetime.now().isoformat(),
parameters={"type": "object", "properties": {}}
))

1. Data Retrieval Tools
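The retrieval examples below assume a get_embedding helper and a vector_db client. A minimal sketch of the former using OpenAI's embeddings API (the model name is an assumption):

def get_embedding(text: str):
    """Assumed helper: embed text with OpenAI's embeddings API."""
    response = client.embeddings.create(model="text-embedding-3-small", input=text)
    return response.data[0].embedding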
def search_documents(query: str, limit: int = 5):
"""Search vector database for relevant documents."""
embedding = get_embedding(query)
results = vector_db.search(embedding, limit=limit)
return [{"text": r.text, "score": r.score} for r in results]
def query_database(sql: str):
"""Execute a SQL query (read-only)."""
# Add safety checks: read-only, query timeout, result limits
if not sql.lower().startswith("select"):
return {"error": "Only SELECT queries allowed"}
with db.connect() as conn:
results = conn.execute(text(sql)).fetchall()
return [dict(row) for row in results[:100]] # Limit results

2. Action Tools
def send_slack_message(channel: str, message: str):
"""Send a message to a Slack channel."""
slack_client.chat_postMessage(channel=channel, text=message)
return f"Message sent to {channel}"
def create_jira_ticket(project: str, summary: str, description: str):
"""Create a JIRA ticket."""
issue = jira.create_issue(
project=project,
summary=summary,
description=description,
issuetype={"name": "Task"}
)
return f"Created ticket: {issue.key}"3. Analysis Tools
def analyze_sentiment(text: str):
"""Analyze sentiment of text."""
# Use sentiment analysis model
result = sentiment_analyzer(text)
return {
"sentiment": result["label"],
"confidence": result["score"]
}
def extract_entities(text: str):
"""Extract named entities from text."""
doc = nlp(text)
return [{
"text": ent.text,
"label": ent.label_
} for ent in doc.ents]

Implement safeguards for destructive actions:
class SafetyWrapper:
def __init__(self, tool, requires_confirmation=False, allowed_users=None):
self.tool = tool
self.requires_confirmation = requires_confirmation
self.allowed_users = allowed_users or []
def execute(self, user_id, **kwargs):
# Check permissions
if self.allowed_users and user_id not in self.allowed_users:
return {"error": "Unauthorized"}
# Require human confirmation for destructive actions
if self.requires_confirmation:
confirmation = self._request_confirmation(user_id, kwargs)
if not confirmation:
return {"error": "Action not confirmed by user"}
return self.tool(**kwargs)
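    # Sketch of the confirmation hook, which the original leaves undefined.
    # In production this might post to Slack or a review queue and block on a
    # reply; a console prompt stands in here for illustration.
    def _request_confirmation(self, user_id, kwargs):
        answer = input(f"Allow {user_id} to run {self.tool.__name__} with {kwargs}? [y/N] ")
        return answer.strip().lower() == "y"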
# Wrap dangerous tools
safe_delete = SafetyWrapper(
tool=delete_database_record,
requires_confirmation=True,
allowed_users=["admin_id"]
)

Agents need memory to maintain context across interactions and learn from experience.
1. Short-term (Conversation) Memory
class ConversationMemory:
def __init__(self, max_messages=20):
self.messages = []
self.max_messages = max_messages
def add_message(self, role, content):
"""Add a message to memory."""
self.messages.append({"role": role, "content": content})
# Keep only recent messages
if len(self.messages) > self.max_messages:
# Keep system message + recent messages
self.messages = [self.messages[0]] + self.messages[-self.max_messages+1:]
def get_messages(self):
"""Get conversation history."""
return self.messages
def clear(self):
"""Clear conversation history."""
self.messages = []

2. Long-term (Vector) Memory
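VectorMemory below assumes a get_vector_db() backend. For illustration only, a brute-force in-memory stand-in with cosine similarity (every name here is an assumption, not a real library):

import numpy as np

class InMemoryVectorDB:
    """Toy vector store: linear scan with cosine similarity."""
    def __init__(self):
        self.rows = []
    def upsert(self, collection, data):
        self.rows.append(data)
    def search(self, collection, embedding, limit=5):
        q = np.array(embedding)
        def score(row):
            v = np.array(row["embedding"])
            return float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v) + 1e-9))
        return sorted(self.rows, key=score, reverse=True)[:limit]

def get_vector_db():
    return InMemoryVectorDB()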
class VectorMemory:
def __init__(self, collection_name="agent_memory"):
self.vector_db = get_vector_db()
self.collection = collection_name
def store(self, content, metadata=None):
"""Store information in long-term memory."""
embedding = get_embedding(content)
self.vector_db.upsert(
collection=self.collection,
data={
"text": content,
"embedding": embedding,
"metadata": metadata or {},
"timestamp": datetime.now().isoformat()
}
)
def recall(self, query, limit=5):
"""Retrieve relevant memories."""
query_embedding = get_embedding(query)
results = self.vector_db.search(
collection=self.collection,
embedding=query_embedding,
limit=limit
)
return [r["text"] for r in results]
# Use in agent
class AgentWithMemory:
def __init__(self):
self.short_term = ConversationMemory()
self.long_term = VectorMemory()
def process(self, user_input):
# Recall relevant long-term memories
relevant_memories = self.long_term.recall(user_input)
# Build context with memories
context = "Relevant information from past interactions:\n"
context += "\n".join(relevant_memories)
# Add to conversation
self.short_term.add_message("system", context)
self.short_term.add_message("user", user_input)
# Get response
response = self.get_llm_response(self.short_term.get_messages())
# Store important info in long-term memory
if self._is_important(user_input, response):
self.long_term.store(f"User: {user_input}\nAssistant: {response}")
return response

3. Entity Memory (Structured)
class EntityMemory:
"""Track entities (users, companies, etc.) and their attributes."""
def __init__(self):
self.entities = {} # In production: use database
def update_entity(self, entity_type, entity_id, attributes):
"""Update entity attributes."""
key = f"{entity_type}:{entity_id}"
if key not in self.entities:
self.entities[key] = {"type": entity_type, "id": entity_id}
self.entities[key].update(attributes)
def get_entity(self, entity_type, entity_id):
"""Retrieve entity information."""
key = f"{entity_type}:{entity_id}"
return self.entities.get(key, {})
def get_context(self, entity_type, entity_id):
"""Get formatted context about entity."""
entity = self.get_entity(entity_type, entity_id)
if not entity:
return ""
context = f"{entity_type} {entity_id}:\n"
for key, value in entity.items():
if key not in ["type", "id"]:
context += f"- {key}: {value}\n"
return context
# Usage in agent
entity_memory = EntityMemory()
# Update from conversation
entity_memory.update_entity("user", "john@example.com", {
"name": "John Doe",
"company": "Acme Corp",
"subscription": "Premium",
"last_contact": "2025-01-20"
})
# Use in context
context = entity_memory.get_context("user", "john@example.com")

Complex tasks often benefit from multiple specialized agents working together.
1. Sequential (Pipeline)
Agents process information in sequence, each adding value:
class AgentPipeline:
def __init__(self, agents):
self.agents = agents
def run(self, input_data):
"""Run input through agent pipeline."""
result = input_data
for agent in self.agents:
print(f"Running {agent.name}...")
result = agent.process(result)
return result
# Example: Content creation pipeline
pipeline = AgentPipeline([
ResearchAgent(), # Research topic
OutlineAgent(), # Create outline
WriterAgent(), # Write content
EditorAgent(), # Edit and refine
SEOAgent() # Add SEO optimization
])
article = pipeline.run({"topic": "AI in Healthcare"})

2. Parallel (Concurrent)
Multiple agents work on subtasks simultaneously:
import asyncio
class ParallelAgents:
def __init__(self, agents):
self.agents = agents
async def run(self, task):
"""Run agents in parallel."""
# Create tasks
tasks = [agent.process_async(task) for agent in self.agents]
# Wait for all to complete
results = await asyncio.gather(*tasks)
# Synthesize results
return self._synthesize(results)
def _synthesize(self, results):
"""Combine results from parallel agents."""
combined = "Results from parallel analysis:\n\n"
for agent, result in zip(self.agents, results):
combined += f"{agent.name}: {result}\n\n"
return combined
# Example: Multi-perspective analysis
agents = ParallelAgents([
TechnicalAnalysisAgent(),
FinancialAnalysisAgent(),
RiskAnalysisAgent(),
CompetitiveAnalysisAgent()
])
analysis = await agents.run("Evaluate acquisition of Startup X")  # run inside an async function (e.g., via asyncio.run)

3. Hierarchical (Manager-Worker)
A manager agent delegates to specialist agents:
class ManagerAgent:
def __init__(self, specialist_agents):
self.specialists = specialist_agents
def run(self, task):
"""Delegate task to appropriate specialists."""
# Decompose task
subtasks = self._decompose_task(task)
results = []
for subtask in subtasks:
# Select best agent for subtask
agent = self._select_agent(subtask)
# Delegate
result = agent.process(subtask)
results.append(result)
# Synthesize final answer
return self._synthesize_results(task, results)
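    # Sketches of the helpers the original omits: decomposition asks the model
    # for a JSON array of subtasks (mirroring _create_plan above), and
    # synthesis asks it to merge the results.
    def _decompose_task(self, task):
        response = get_llm_response(
            f"Break this task into independent subtasks as a JSON array of strings: {task}"
        )
        return json.loads(response)
    def _synthesize_results(self, task, results):
        return get_llm_response(
            f"Task: {task}\nSubtask results: {results}\n\nWrite the final answer."
        )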
def _select_agent(self, subtask):
"""Choose the best specialist for a subtask."""
# Use LLM to determine which agent to use
prompt = f"""Which specialist should handle this subtask?
Subtask: {subtask}
Available specialists:
{self._format_specialists()}
Return just the specialist name."""
response = get_llm_response(prompt)
return self.specialists[response.strip()]

class AgentCommunication:
"""Enable agents to communicate with each other."""
def __init__(self):
self.message_queue = []
def send_message(self, from_agent, to_agent, content):
"""Send message between agents."""
self.message_queue.append({
"from": from_agent,
"to": to_agent,
"content": content,
"timestamp": datetime.now()
})
def get_messages(self, agent_name):
"""Get messages for an agent."""
messages = [m for m in self.message_queue if m["to"] == agent_name]
# Remove retrieved messages
self.message_queue = [m for m in self.message_queue if m["to"] != agent_name]
return messages
# Agents can communicate
comm = AgentCommunication()
# Research agent asks question to specialist
comm.send_message(
from_agent="research_agent",
to_agent="financial_analyst",
content="What was the company's revenue in Q4?"
)
# Financial analyst checks messages and responds
messages = comm.get_messages("financial_analyst")
for msg in messages:
response = financial_analyst.process(msg["content"])
comm.send_message("financial_analyst", msg["from"], response)Deploying agents to production requires robust error handling, monitoring, and safety measures.
class RobustAgent:
def __init__(self, max_retries=3):
self.max_retries = max_retries
def run(self, task):
"""Run task with error handling and retries."""
for attempt in range(self.max_retries):
try:
return self._execute_task(task)
except ToolError as e:
print(f"Tool error on attempt {attempt + 1}: {e}")
if attempt < self.max_retries - 1:
# Try alternative tool or approach
continue
else:
return self._graceful_failure(task, e)
except LLMError as e:
print(f"LLM error: {e}")
# Exponential backoff
time.sleep(2 ** attempt)
continue
            except Exception as e:
                print(f"Unexpected error: {e}")
                self._log_error(task, e)
                return "I encountered an error processing your request."
        # Reached only if every retry raised LLMError
        return self._graceful_failure(task, "maximum retries exceeded")
def _graceful_failure(self, task, error):
"""Handle failure gracefully."""
return f"I tried to complete your task but encountered an issue: {error}. Please try rephrasing or contact support."from dataclasses import dataclass
from datetime import datetime
@dataclass
class AgentTrace:
"""Track agent execution."""
task: str
agent_name: str
start_time: datetime
end_time: datetime = None
steps: list = None
tools_used: list = None
tokens_used: int = 0
success: bool = True
error: str = None
class MonitoredAgent:
def run(self, task):
"""Run task with full tracing."""
trace = AgentTrace(
task=task,
agent_name=self.name,
start_time=datetime.now(),
steps=[],
tools_used=[]
)
try:
result = self._execute_with_tracing(task, trace)
trace.success = True
return result
except Exception as e:
trace.success = False
trace.error = str(e)
raise
finally:
trace.end_time = datetime.now()
self._log_trace(trace)
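    # Sketch of the traced core loop (the original leaves it undefined):
    # delegate to the real agent and record each step on the trace.
    def _execute_with_tracing(self, task, trace):
        result = self._execute_task(task)  # assumed underlying agent loop
        trace.steps.append({"task": task, "result": result})
        return result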
def _log_trace(self, trace):
"""Log execution trace for analysis."""
duration = (trace.end_time - trace.start_time).total_seconds()
log_data = {
"agent": trace.agent_name,
"task": trace.task,
"duration_seconds": duration,
"steps_count": len(trace.steps),
"tools_used": trace.tools_used,
"tokens": trace.tokens_used,
"success": trace.success,
"error": trace.error
}
# Send to monitoring system (e.g., DataDog, CloudWatch)
logger.info("agent_execution", extra=log_data)
# Alert if slow or failed
if duration > 30:
alert("Slow agent execution", log_data)
if not trace.success:
alert("Agent execution failed", log_data)class CostTracker:
"""Track API costs and usage."""
def __init__(self):
self.costs = []
def track_llm_call(self, model, input_tokens, output_tokens):
"""Track LLM API cost."""
# Pricing as of early 2025
pricing = {
"gpt-4o": {"input": 0.005, "output": 0.015}, # per 1K tokens
"gpt-4o-mini": {"input": 0.00015, "output": 0.0006},
"claude-3-5-sonnet": {"input": 0.003, "output": 0.015}
}
rates = pricing.get(model, pricing["gpt-4o-mini"])
cost = (input_tokens / 1000 * rates["input"]) + (output_tokens / 1000 * rates["output"])
self.costs.append({
"timestamp": datetime.now(),
"model": model,
"input_tokens": input_tokens,
"output_tokens": output_tokens,
"cost": cost
})
return cost
def get_total_cost(self, since=None):
"""Get total cost since a timestamp."""
if since:
relevant = [c for c in self.costs if c["timestamp"] >= since]
else:
relevant = self.costs
return sum(c["cost"] for c in relevant)
# Use in agent
from datetime import timedelta

DAILY_BUDGET = 10.00  # assumed daily budget in dollars; tune per deployment
tracker = CostTracker()
def run_agent_with_cost_tracking(task):
response = client.chat.completions.create(...)
tracker.track_llm_call(
model="gpt-4o",
input_tokens=response.usage.prompt_tokens,
output_tokens=response.usage.completion_tokens
)
# Check if over budget
daily_cost = tracker.get_total_cost(since=datetime.now() - timedelta(days=1))
if daily_cost > DAILY_BUDGET:
alert("Agent budget exceeded", {"daily_cost": daily_cost})from collections import deque
import time
class RateLimiter:
"""Rate limit agent actions."""
def __init__(self, max_requests, time_window):
self.max_requests = max_requests
self.time_window = time_window # seconds
self.requests = deque()
def allow_request(self):
"""Check if request is allowed."""
now = time.time()
# Remove old requests outside time window
while self.requests and self.requests[0] < now - self.time_window:
self.requests.popleft()
# Check if under limit
if len(self.requests) < self.max_requests:
self.requests.append(now)
return True
return False
def wait_if_needed(self):
"""Wait until request is allowed."""
while not self.allow_request():
time.sleep(0.1)
# Apply to agent
rate_limiter = RateLimiter(max_requests=100, time_window=60) # 100 req/min
def rate_limited_agent_run(task):
rate_limiter.wait_if_needed()
return agent.run(task)

Building production-ready AI agents requires mastering multiple disciplines: agent architectures (ReAct, Plan-Execute), tool integration with proper safety measures, memory systems for context persistence, multi-agent orchestration patterns, and comprehensive monitoring and error handling.
The journey from a simple function-calling bot to a sophisticated autonomous agent is incremental. Start with basic tool use and gradually add capabilities: memory for context, multiple tools for flexibility, multi-agent patterns for complex workflows, and robust error handling for reliability.
Remember that agents are not fully autonomous—they're autonomous within constraints. Always implement safety measures like human-in-the-loop for destructive actions, rate limiting, cost tracking, and comprehensive monitoring. The goal is agents that reliably solve problems while staying within safe, predictable boundaries.
As AI capabilities continue to advance, the agent patterns in this guide will remain foundational. Whether building customer service agents, research assistants, or complex multi-agent systems, these architectures and best practices provide a solid foundation for production deployment.