AI Agent Development Guide: Building Autonomous Systems That Take Action
Complete guide to developing AI agents that can perceive, reason, and act autonomously. Learn agent architectures, tool integration, memory systems, and production deployment patterns.
AI agents represent the evolution from passive question-answering systems to autonomous entities that can perceive their environment, make decisions, and take actions to accomplish goals. Unlike traditional chatbots that simply respond to queries, agents can break down complex tasks, use tools, maintain context across multiple interactions, and adapt their approach based on results.
Building effective AI agents requires understanding several key concepts: agent architectures (ReAct, Plan-Execute, Reflexion), tool integration and function calling, memory systems for context persistence, error handling and recovery, and orchestration patterns for coordinating multiple agents. This guide provides a comprehensive walkthrough of these concepts with practical implementation examples.
Whether you're building a research assistant that can search the web and analyze documents, a customer service agent that can query databases and update tickets, or a complex multi-agent system for workflow automation, you'll learn the patterns and best practices for production-ready agent systems.
Key Takeaways
- ReAct (Reasoning + Acting) is the most versatile agent pattern - alternates between thinking and tool use, adapting based on observations
- Function calling with JSON schemas is more reliable than parsing text outputs - modern LLMs (GPT-4o, Claude 3.5) have native support
- Agents need three types of memory: short-term (conversation), long-term (vector database for recall), and entity (structured facts about users/objects)
- Multi-agent patterns include sequential (pipeline), parallel (concurrent), and hierarchical (manager-worker) - choose based on task structure
- Production deployment requires robust error handling with retries, comprehensive monitoring with traces, cost tracking, and rate limiting
- Safety measures are critical: human-in-the-loop for destructive actions, permissions systems, confirmation for sensitive operations
- Monitor agent performance continuously: track success rate, tool usage, latency, token consumption, and costs - alert on anomalies
Agent Architectures and Patterns
Different agent architectures suit different use cases. Let's explore the major patterns.
1. ReAct (Reasoning + Acting)
The ReAct pattern alternates between reasoning (thinking) and acting (using tools). It is one of the most widely used agent architectures.
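To make the control flow concrete before bringing in an API, here is a minimal offline sketch. It is not the implementation that follows. The model is replaced by a hypothetical `scripted_llm` that replays canned replies, and `lookup` is a toy tool. On each turn the loop parses `Action` and `Action Input`, runs the tool, and appends the result as an `Observation`. It stops when the model answers with `Final Answer` or the step budget runs out.

```python
import re
from typing import Callable, Dict, List


def react_loop(llm: Callable[[List[str]], str],
               tools: Dict[str, Callable[[str], str]],
               task: str, max_steps: int = 5) -> str:
    """Minimal ReAct control loop: reason, act, observe, repeat."""
    transcript = [f"Task: {task}"]
    for _ in range(max_steps):
        reply = llm(transcript)  # The model sees the whole transcript so far
        transcript.append(reply)
        action = re.search(r"^Action:\s*(.+)$", reply, re.M)
        action_input = re.search(r"^Action Input:\s*(.+)$", reply, re.M)
        if not action:
            transcript.append("Observation: Error: no Action found")
            continue
        name = action.group(1).strip()
        arg = action_input.group(1).strip() if action_input else ""
        if name == "Final Answer":
            return arg
        tool = tools.get(name)
        observation = tool(arg) if tool else f"Error: unknown tool '{name}'"
        transcript.append(f"Observation: {observation}")
    return "Task incomplete after maximum steps"


# Hypothetical stand-ins for a real model and a real tool
def scripted_llm(replies):
    it = iter(replies)
    return lambda transcript: next(it)


inventory = {"widgets": "42"}
tools = {"lookup": lambda item: inventory.get(item, "not found")}

llm = scripted_llm([
    "Thought: I need the stock level.\nAction: lookup\nAction Input: widgets",
    "Thought: I have the answer.\nAction: Final Answer\nAction Input: 42 widgets in stock",
])
print(react_loop(llm, tools, "How many widgets are in stock?"))  # 42 widgets in stock
```

The model is just a function of the transcript. That makes replaying scripted replies like this a cheap way to unit-test the loop's parsing and stopping logic.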
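```python
import json
import re

from openai import OpenAI


class ReActAgent:
    def __init__(self, tools, model="gpt-4o", max_iterations=10):
        self.client = OpenAI()
        self.tools = tools  # Dictionary mapping tool name -> Python callable
        self.model = model
        self.max_iterations = max_iterations

    def run(self, task):
        """Execute a task using the ReAct pattern."""
        conversation = [
            {"role": "system", "content": self._build_system_prompt()},
            {"role": "user", "content": f"Task: {task}"},
        ]
        for iteration in range(self.max_iterations):
            # Get the next Thought/Action; stop before the model invents its own Observation
            response = self.client.chat.completions.create(
                model=self.model,
                messages=conversation,
                stop=["Observation:"],
            )
            content = response.choices[0].message.content
            conversation.append({"role": "assistant", "content": content})

            # Parse thought, action, and action input
            thought, action, action_input = self._parse_response(content)
            print(f"Thought: {thought}")
            print(f"Action: {action}")

            # Check if the task is complete
            if action == "Final Answer":
                return action_input

            # Execute the action and feed the result back as an observation
            if action in self.tools:
                observation = self._call_tool(action, action_input)
            else:
                observation = f"Error: Unknown action '{action}'"
            conversation.append({"role": "user", "content": f"Observation: {observation}"})

        return "Task incomplete after maximum iterations"

    def _call_tool(self, name, action_input):
        """Call a tool; a JSON-object input is passed as keyword arguments."""
        try:
            args = json.loads(action_input)
        except json.JSONDecodeError:
            args = action_input
        try:
            if isinstance(args, dict):
                return self.tools[name](**args)
            return self.tools[name](action_input)
        except Exception as e:  # Report tool failures to the model instead of crashing
            return f"Error: {e}"

    def _parse_response(self, content):
        """Extract the Thought, Action, and Action Input fields from the model's reply."""
        thought = re.search(r"Thought:\s*(.*?)(?=\nAction:|\Z)", content, re.S)
        action = re.search(r"^Action:\s*(.*)$", content, re.M)
        action_input = re.search(r"Action Input:\s*(.*)", content, re.S)
        return (
            thought.group(1).strip() if thought else "",
            action.group(1).strip() if action else "",
            action_input.group(1).strip() if action_input else "",
        )

    def _build_system_prompt(self):
        tool_list = json.dumps(
            [{"name": name, "description": tool.__doc__} for name, tool in self.tools.items()],
            indent=2,
        )
        return f"""You are an AI agent that can use tools to accomplish tasks.
Available tools:
{tool_list}
Use this format:
Thought: [your reasoning about what to do next]
Action: [tool name]
Action Input: [input for the tool; a JSON object for tools that take several arguments]
After receiving an Observation, continue with another Thought/Action or provide:
Thought: [final reasoning]
Action: Final Answer
Action Input: [your final response to the user]"""


# Define tools
def search_database(query):
    """Search the customer database for information."""
    # Implement actual database search
    return f"Found 3 results for: {query}"


def send_email(recipient, subject, body):
    """Send an email. Input: JSON object with recipient, subject, and body."""
    # Implement email sending
    return f"Email sent to {recipient}"


# Use agent
agent = ReActAgent({
    "search_database": search_database,
    "send_email": send_email,
})
result = agent.run("Find customers who haven't logged in for 30 days and send them a re-engagement email")
```
2. Plan-Execute Pattern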
First create a complete plan, then execute steps. Better for complex multi-step tasks.
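Whatever prompt produces the plan, the reply has to be parsed reliably. When a model is asked for a bare JSON array, it sometimes wraps the array in a Markdown code fence or adds a sentence of prose, and a plain `json.loads` then fails. The hypothetical `parse_plan` helper sketched below keeps only the outermost `[...]` span and validates the result:

```python
import json
from typing import List


def parse_plan(text: str) -> List[str]:
    """Extract a JSON array of step strings from raw model output."""
    # Slicing from the first "[" to the last "]" drops code fences and surrounding prose
    start, end = text.find("["), text.rfind("]")
    if start == -1 or end <= start:
        raise ValueError("No JSON array found in plan output")
    steps = json.loads(text[start:end + 1])  # JSONDecodeError is a ValueError subclass
    if not isinstance(steps, list) or not all(isinstance(s, str) and s.strip() for s in steps):
        raise ValueError("Plan must be a JSON array of non-empty strings")
    return [s.strip() for s in steps]


FENCE = "`" * 3  # Markdown fence marker, built here so the example stays fence-safe
raw = f'Here is the plan:\n{FENCE}json\n["Gather requirements", "Draft outline", "Write summary"]\n{FENCE}'
print(parse_plan(raw))  # ['Gather requirements', 'Draft outline', 'Write summary']
```

Every failure surfaces as a `ValueError`. A caller can catch that one exception type and re-prompt the model when the plan is malformed.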
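```python
import json
from openai import OpenAI


class PlanExecuteAgent:
    def __init__(self, model="gpt-4o"):
        self.client = OpenAI()
        self.model = model

    def run(self, task):
        # Step 1: Create plan
        plan = self._create_plan(task)
        print(f"Plan created with {len(plan)} steps")
        # Step 2: Execute each step, passing along the results of earlier steps
        results = []
        for i, step in enumerate(plan, 1):
            print(f"Executing step {i}: {step}")
            result = self._execute_step(step, results)
            results.append(result)
        # Step 3: Synthesize final answer
        return self._synthesize_answer(task, results)

    def _ask(self, prompt, **kwargs):
        """Send a single-turn prompt and return the reply text."""
        response = self.client.chat.completions.create(
            model=self.model, messages=[{"role": "user", "content": prompt}], **kwargs
        )
        return response.choices[0].message.content

    def _create_plan(self, task):
        """Create a step-by-step plan."""
        # JSON mode returns a JSON object (not a bare array), so the steps go under a key
        content = self._ask(
            f"""Create a step-by-step plan for this task: {task}
Return a JSON object of the form:
{{"steps": ["step 1", "step 2", "step 3"]}}""",
            response_format={"type": "json_object"},
        )
        return json.loads(content)["steps"]

    def _execute_step(self, step, previous_results):
        """Execute one step (a fuller version might run a ReAct agent with tools here)."""
        context = "\n".join(f"- {r}" for r in previous_results) or "(none yet)"
        return self._ask(f"Results so far:\n{context}\n\nComplete this step: {step}")

    def _synthesize_answer(self, task, results):
        """Combine the step results into a final answer."""
        joined = "\n".join(f"Step {i}: {r}" for i, r in enumerate(results, 1))
        return self._ask(f"Task: {task}\n\nStep results:\n{joined}\n\nWrite the final answer.")
```
3. Function Calling (Native Tool Use)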
Modern LLMs have native function calling capabilities - more reliable than parsing text outputs:
```python
import json

from openai import OpenAI

client = OpenAI()


# Hypothetical tool implementations; replace the bodies with real API calls
def get_weather(location, unit="celsius"):
    return {"location": location, "temperature": 22, "unit": unit}


def search_web(query):
    return {"query": query, "results": []}


available_functions = {"get_weather": get_weather, "search_web": search_web}

# Define tools as JSON schema
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "search_web",
            "description": "Search the web for information",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"},
                },
                "required": ["query"],
            },
        },
    },
]

# Agent with function calling
messages = [{"role": "user", "content": "What's the weather in Sydney and any news about AI?"}]
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools,
    tool_choice="auto",  # Let model decide when to use tools
)
assistant_message = response.choices[0].message

# Check if model wants to call functions
if assistant_message.tool_calls:
    # The assistant turn that requested the calls must precede the tool results
    messages.append(assistant_message)
    for tool_call in assistant_message.tool_calls:
        function_name = tool_call.function.name
        arguments = json.loads(tool_call.function.arguments)
        # Execute the function; unknown names are reported back instead of crashing
        function = available_functions.get(function_name)
        if function is None:
            result = {"error": f"Unknown function: {function_name}"}
        else:
            result = function(**arguments)
        # Add function result to conversation
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": json.dumps(result),
        })
    # Get final response with function results
    final_response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools,
    )
    print(final_response.choices[0].message.content)
else:
    print(assistant_message.content)
```
Choosing an Architecture
| Pattern | Best For | Pros | Cons |
|---|---|---|---|
| ReAct | Dynamic tasks, exploration | Flexible, adapts to observations | Can be inefficient, may loop |
| Plan-Execute | Complex multi-step workflows | Clear structure, predictable | Less adaptive to changes |
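| Function Calling | Production systems | Reliable, structured, fast | Requires schema definition |

Function calling is a mechanism for invoking tools rather than a separate control loop. A ReAct or Plan-Execute agent can issue its actions through native function calls.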
Tool Integration and Capabilities
Tools extend agent capabilities beyond language processing. Let's build a robust tool system.
Designing Effective Tools
Good tools are:
- Single-purpose: Each tool does one thing well
- Well-documented: Clear descriptions help the LLM choose correctly
- Error-handling: Return useful errors, not crashes
- Fast: < 5 seconds execution time for best UX
- Idempotent when possible: Safe to retry
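Two of these properties can be enforced in one place with a wrapper: returning errors instead of crashing, and bounding execution time. The sketch below is a hypothetical `guarded_tool` decorator built on `concurrent.futures`, and it assumes tools are ordinary synchronous functions. One caveat applies: a call that times out keeps running in its worker thread, so genuinely runaway work needs process-level isolation.

```python
import concurrent.futures
import functools
import time

# Shared pool so a timed-out call does not block the caller (its thread keeps running)
_EXECUTOR = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def guarded_tool(timeout_s: float = 5.0):
    """Wrap a tool so it returns a structured result instead of raising or hanging."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            future = _EXECUTOR.submit(fn, *args, **kwargs)
            try:
                return {"success": True, "result": future.result(timeout=timeout_s)}
            except concurrent.futures.TimeoutError:
                return {"success": False, "error": f"{fn.__name__} timed out after {timeout_s}s"}
            except Exception as e:  # Exceptions raised inside fn are re-raised by result()
                return {"success": False, "error": f"{type(e).__name__}: {e}"}
        return wrapper
    return decorator


@guarded_tool(timeout_s=0.5)
def lookup_order(order_id: str):
    """Look up an order by ID (hypothetical example tool)."""
    if not order_id.isdigit():
        raise ValueError("order_id must be numeric")
    return {"order_id": order_id, "status": "shipped"}


@guarded_tool(timeout_s=0.1)
def slow_tool():
    """Simulates a tool that hangs."""
    time.sleep(1)
    return "done"


print(lookup_order("123"))
print(lookup_order("abc"))
print(slow_tool())
```

`functools.wraps` preserves the wrapped function's name and docstring. This matters when tool descriptions are generated from `__doc__`.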
Building a Tool Registry
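Each `Tool` in the registry that follows carries a hand-written `parameters` schema, which can drift out of sync with the function it describes. One option is to derive a basic schema from the function's signature with `inspect`. The hypothetical `schema_from_function` helper below is deliberately simple. It maps only `str`, `int`, `float`, and `bool` annotations (anything else becomes `string`) and marks parameters without defaults as required.

```python
import inspect
from typing import Any, Callable, Dict

# Python annotation -> JSON Schema type (scalars only in this sketch)
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def schema_from_function(fn: Callable) -> Dict[str, Any]:
    """Build an OpenAI-style function schema from a signature and docstring."""
    properties, required = {}, []
    for name, param in inspect.signature(fn).parameters.items():
        # Unannotated or unsupported types fall back to "string"
        properties[name] = {"type": _JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "type": "function",
        "function": {
            "name": fn.__name__,
            "description": inspect.getdoc(fn) or "",
            "parameters": {"type": "object", "properties": properties, "required": required},
        },
    }


def find_passages(query: str, limit: int = 5):
    """Search the document index for passages relevant to a query."""
    return []


print(schema_from_function(find_passages))
```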
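```python
import ast
import operator
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Callable, Dict


@dataclass
class Tool:
    name: str
    description: str
    function: Callable
    parameters: Dict[str, Any]


class ToolRegistry:
    def __init__(self):
        self.tools: Dict[str, Tool] = {}

    def register(self, tool: Tool):
        """Register a tool."""
        self.tools[tool.name] = tool

    def get_tool_schemas(self):
        """Get OpenAI function calling schemas."""
        return [{
            "type": "function",
            "function": {
                "name": tool.name,
                "description": tool.description,
                "parameters": tool.parameters,
            },
        } for tool in self.tools.values()]

    def execute(self, tool_name: str, **kwargs):
        """Execute a tool with error handling."""
        if tool_name not in self.tools:
            return {"success": False, "error": f"Unknown tool: {tool_name}"}
        try:
            result = self.tools[tool_name].function(**kwargs)
            return {"success": True, "result": result}
        except Exception as e:
            return {"success": False, "error": str(e)}


# Arithmetic-only evaluator: unlike eval(), it cannot call functions or access names
_OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}


def safe_calculate(expression: str):
    """Evaluate +, -, *, / over numeric literals and reject everything else."""
    def evaluate(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPERATORS:
            return _OPERATORS[type(node.op)](evaluate(node.left), evaluate(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPERATORS:
            return _OPERATORS[type(node.op)](evaluate(node.operand))
        raise ValueError(f"Unsupported expression: {expression!r}")
    return evaluate(ast.parse(expression, mode="eval").body)


# Initialize registry
registry = ToolRegistry()

# Register tools
registry.register(Tool(
    name="calculate",
    description="Perform arithmetic calculations (+, -, *, /)",
    function=safe_calculate,
    parameters={
        "type": "object",
        "properties": {
            "expression": {"type": "string", "description": "Math expression to evaluate"},
        },
        "required": ["expression"],
    },
))
registry.register(Tool(
    name="get_current_time",
    description="Get the current date and time",
    function=lambda: datetime.now().isoformat(),
    parameters={"type": "object", "properties": {}},
))

print(registry.execute("calculate", expression="2 + 3 * 4"))  # {'success': True, 'result': 14}
```
Common Tool Categories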
1. Data Retrieval Tools
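The `query_database` tool below rejects anything that does not start with `SELECT`. Prefix checks like this are easy to get wrong (leading whitespace, comments, `WITH` clauses), so enforcing read-only access in the database itself is sturdier. As one illustration, SQLite's `PRAGMA query_only` makes a connection refuse writes. Other databases offer read-only users or read-only transactions for the same purpose. The `run_readonly_query` helper and the `customers` table are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customers (name) VALUES ('Ada'), ('Grace')")
conn.commit()

# From here on, SQLite rejects any statement that would modify the database
conn.execute("PRAGMA query_only = ON")


def run_readonly_query(sql: str, max_rows: int = 100):
    """Run agent-supplied SQL on a connection that cannot write."""
    try:
        return conn.execute(sql).fetchmany(max_rows)
    except sqlite3.Error as e:
        return {"error": str(e)}


print(run_readonly_query("SELECT name FROM customers ORDER BY id"))
print(run_readonly_query("DELETE FROM customers"))
```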
```python
from sqlalchemy import text

# get_embedding, vector_db, and db (a SQLAlchemy Engine) are assumed to be set up elsewhere


def search_documents(query: str, limit: int = 5):
    """Search vector database for relevant documents."""
    embedding = get_embedding(query)
    results = vector_db.search(embedding, limit=limit)
    return [{"text": r.text, "score": r.score} for r in results]


def query_database(sql: str, max_rows: int = 100):
    """Execute a SQL query (read-only)."""
    # The prefix check is only a first line of defense: also connect as a
    # read-only database user and set a statement timeout
    if not sql.strip().lower().startswith("select"):
        return {"error": "Only SELECT queries allowed"}
    with db.connect() as conn:
        rows = conn.execute(text(sql)).fetchmany(max_rows)  # Limit results
    # SQLAlchemy 1.4+/2.0 rows expose a dict-like view through ._mapping
    return [dict(row._mapping) for row in rows]
```
2. Action Tools
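Action tools change the outside world, so a retried call can post the same message or open the same ticket twice. To follow the "idempotent when possible" guideline, one approach is to key each call on its tool name and arguments, then replay the stored result when the same call repeats. The sketch below is a hypothetical `idempotent` decorator with an in-memory store. A production version would keep keys in a shared store with an expiry.

```python
import functools
import hashlib
import json


def idempotent(fn):
    """Return the stored result when the same tool call is repeated."""
    completed = {}  # key -> result; use Redis or a database table in production

    @functools.wraps(fn)
    def wrapper(**kwargs):
        payload = json.dumps({"tool": fn.__name__, "args": kwargs}, sort_keys=True)
        key = hashlib.sha256(payload.encode()).hexdigest()
        if key not in completed:
            completed[key] = fn(**kwargs)
        return completed[key]
    return wrapper


sent = []


@idempotent
def post_message(channel: str, message: str):
    """Hypothetical action tool that records what it 'sends'."""
    sent.append((channel, message))
    return f"Message sent to {channel}"


post_message(channel="#ops", message="Deploy finished")
post_message(channel="#ops", message="Deploy finished")  # Retry: not sent again
print(len(sent))  # 1
```

The trade-off is that two intentionally identical calls are also collapsed. Real systems therefore often derive the key from a request or task ID rather than from the arguments alone.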
```python
# slack_client (a slack_sdk.WebClient) and jira (a jira.JIRA client) are assumed
# to be initialized elsewhere with credentials


def send_slack_message(channel: str, message: str):
    """Send a message to a Slack channel."""
    slack_client.chat_postMessage(channel=channel, text=message)
    return f"Message sent to {channel}"


def create_jira_ticket(project: str, summary: str, description: str):
    """Create a JIRA ticket."""
    issue = jira.create_issue(
        project=project,
        summary=summary,
        description=description,
        issuetype={"name": "Task"},
    )
    return f"Created ticket: {issue.key}"
```
3. Analysis Tools
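```python
import spacy
from transformers import pipeline

# Example setup (an assumption): a Hugging Face sentiment pipeline and a spaCy NER model
sentiment_analyzer = pipeline("sentiment-analysis")
nlp = spacy.load("en_core_web_sm")


def analyze_sentiment(text: str):
    """Analyze sentiment of text."""
    # The pipeline returns a list with one {"label", "score"} dict per input text
    result = sentiment_analyzer(text)[0]
    return {
        "sentiment": result["label"],
        "confidence": result["score"],
    }


def extract_entities(text: str):
    """Extract named entities from text."""
    doc = nlp(text)
    return [{
        "text": ent.text,
        "label": ent.label_,
    } for ent in doc.ents]
```
Safety and Permissions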
Implement safeguards for destructive actions:
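```python
class SafetyWrapper:
    def __init__(self, tool, requires_confirmation=False, allowed_users=None, confirm=None):
        self.tool = tool
        self.requires_confirmation = requires_confirmation
        # Note: an empty allow-list lets every user through; for destructive
        # tools, consider denying by default instead
        self.allowed_users = allowed_users or []
        # confirm(user_id, kwargs) -> bool asks a human to approve the call
        self.confirm = confirm or self._console_confirmation

    def execute(self, user_id, **kwargs):
        # Check permissions
        if self.allowed_users and user_id not in self.allowed_users:
            return {"error": "Unauthorized"}
        # Require human confirmation for destructive actions
        if self.requires_confirmation and not self.confirm(user_id, kwargs):
            return {"error": "Action not confirmed by user"}
        return self.tool(**kwargs)

    @staticmethod
    def _console_confirmation(user_id, kwargs):
        """Default confirmation: ask on the console (swap for a UI or chat approval flow)."""
        answer = input(f"User {user_id} wants to run this action with {kwargs}. Approve? [y/N] ")
        return answer.strip().lower() == "y"


# Wrap dangerous tools (delete_database_record is assumed to be defined elsewhere)
safe_delete = SafetyWrapper(
    tool=delete_database_record,
    requires_confirmation=True,
    allowed_users=["admin_id"],
)
```
Memory and State Management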
Agents need memory to maintain context across interactions and learn from experience.
Types of Memory
1. Short-term (Conversation) Memory
```python
class ConversationMemory:
    def __init__(self, max_messages=20):
        self.messages = []
        self.max_messages = max_messages

    def add_message(self, role, content):
        """Add a message to memory."""
        self.messages.append({"role": role, "content": content})
        # Keep only recent messages
        if len(self.messages) > self.max_messages:
            if self.messages[0]["role"] == "system":
                # Keep the system message plus the most recent messages
                keep = self.max_messages - 1
                self.messages = [self.messages[0]] + (self.messages[-keep:] if keep > 0 else [])
            else:
                self.messages = self.messages[-self.max_messages:]

    def get_messages(self):
        """Get conversation history."""
        return self.messages

    def clear(self):
        """Clear conversation history."""
        self.messages = []
```
2. Long-term (Vector) Memory
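The `VectorMemory` class that follows delegates storage and search to an external vector database and embedding model. To show what `store` and `recall` do underneath, here is a dependency-free sketch. Its `embed` function is a toy bag-of-words stand-in (an assumption for illustration; real systems use a learned embedding model), and recall is a brute-force cosine-similarity ranking:

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemoryVectorMemory:
    def __init__(self):
        self.items: List[Tuple[str, Counter]] = []

    def store(self, content: str) -> None:
        self.items.append((content, embed(content)))

    def recall(self, query: str, limit: int = 2) -> List[str]:
        """Return up to `limit` stored texts, most similar first, skipping zero matches."""
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, vec in ranked[:limit] if cosine(q, vec) > 0]


memory = InMemoryVectorMemory()
memory.store("User prefers email over phone calls")
memory.store("User's company renewed the Premium plan in January")
memory.store("User reported a billing bug last week")
print(memory.recall("how does the user want to be contacted by email", limit=1))
```

A vector database replaces the linear scan with an approximate nearest-neighbor index, but the ranking idea is the same.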
```python
from datetime import datetime

# get_vector_db() and get_embedding() stand in for your vector store client and
# embedding model; the upsert/search signatures below are illustrative


class VectorMemory:
    def __init__(self, collection_name="agent_memory"):
        self.vector_db = get_vector_db()
        self.collection = collection_name

    def store(self, content, metadata=None):
        """Store information in long-term memory."""
        embedding = get_embedding(content)
        self.vector_db.upsert(
            collection=self.collection,
            data={
                "text": content,
                "embedding": embedding,
                "metadata": metadata or {},
                "timestamp": datetime.now().isoformat(),
            },
        )

    def recall(self, query, limit=5):
        """Retrieve relevant memories."""
        query_embedding = get_embedding(query)
        results = self.vector_db.search(
            collection=self.collection,
            embedding=query_embedding,
            limit=limit,
        )
        return [r["text"] for r in results]


# Use in agent
class AgentWithMemory:
    def __init__(self):
        self.short_term = ConversationMemory()
        self.long_term = VectorMemory()

    def process(self, user_input):
        # Recall relevant long-term memories
        relevant_memories = self.long_term.recall(user_input)
        # Build context with memories
        context = "Relevant information from past interactions:\n"
        context += "\n".join(relevant_memories)
        # Record the user turn; the recalled context is injected per call rather than
        # stored, so system messages do not pile up in the history
        self.short_term.add_message("user", user_input)
        messages = [{"role": "system", "content": context}] + self.short_term.get_messages()
        # Get response (get_llm_response wraps your chat-completion call)
        response = self.get_llm_response(messages)
        self.short_term.add_message("assistant", response)
        # Store important info in long-term memory (_is_important is an app-specific filter)
        if self._is_important(user_input, response):
            self.long_term.store(f"User: {user_input}\nAssistant: {response}")
        return response
```
3. Entity Memory (Structured)
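The `EntityMemory` class that follows keeps entities in a dictionary and notes that production code should use a database. A minimal persistent variant could use Python's built-in `sqlite3`, as sketched here. Storing attributes as one JSON blob per entity is a simplification chosen for brevity, and `SQLiteEntityMemory` is a hypothetical name:

```python
import json
import sqlite3


class SQLiteEntityMemory:
    """Persist entity attributes across sessions in a SQLite table."""

    def __init__(self, path: str = "entities.db"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            """CREATE TABLE IF NOT EXISTS entities (
                   entity_type TEXT NOT NULL,
                   entity_id TEXT NOT NULL,
                   attributes TEXT NOT NULL,
                   PRIMARY KEY (entity_type, entity_id))"""
        )

    def update_entity(self, entity_type: str, entity_id: str, attributes: dict) -> None:
        """Merge new attributes into any that are already stored."""
        merged = {**self.get_entity(entity_type, entity_id), **attributes}
        with self.conn:  # Commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO entities VALUES (?, ?, ?)",
                (entity_type, entity_id, json.dumps(merged)),
            )

    def get_entity(self, entity_type: str, entity_id: str) -> dict:
        row = self.conn.execute(
            "SELECT attributes FROM entities WHERE entity_type = ? AND entity_id = ?",
            (entity_type, entity_id),
        ).fetchone()
        return json.loads(row[0]) if row else {}


memory = SQLiteEntityMemory(":memory:")  # Use a file path to persist between runs
memory.update_entity("user", "john@example.com", {"name": "John Doe", "subscription": "Premium"})
memory.update_entity("user", "john@example.com", {"subscription": "Enterprise"})
print(memory.get_entity("user", "john@example.com"))  # {'name': 'John Doe', 'subscription': 'Enterprise'}
```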
```python
class EntityMemory:
    """Track entities (users, companies, etc.) and their attributes."""

    def __init__(self):
        self.entities = {}  # In production: use database

    def update_entity(self, entity_type, entity_id, attributes):
        """Update entity attributes."""
        key = f"{entity_type}:{entity_id}"
        if key not in self.entities:
            self.entities[key] = {"type": entity_type, "id": entity_id}
        self.entities[key].update(attributes)

    def get_entity(self, entity_type, entity_id):
        """Retrieve entity information."""
        key = f"{entity_type}:{entity_id}"
        return self.entities.get(key, {})

    def get_context(self, entity_type, entity_id):
        """Get formatted context about entity."""
        entity = self.get_entity(entity_type, entity_id)
        if not entity:
            return ""
        context = f"{entity_type} {entity_id}:\n"
        for key, value in entity.items():
            if key not in ["type", "id"]:
                context += f"- {key}: {value}\n"
        return context


# Usage in agent
entity_memory = EntityMemory()
# Update from conversation
entity_memory.update_entity("user", "john@example.com", {
    "name": "John Doe",
    "company": "Acme Corp",
    "subscription": "Premium",
    "last_contact": "2025-01-20",
})
# Use in context
context = entity_memory.get_context("user", "john@example.com")
```
Multi-Agent Systems and Orchestration
Complex tasks often benefit from multiple specialized agents working together.
Agent Orchestration Patterns
1. Sequential (Pipeline)
Agents process information in sequence, each adding value:
```python
class AgentPipeline:
    def __init__(self, agents):
        self.agents = agents

    def run(self, input_data):
        """Run input through agent pipeline."""
        result = input_data
        for agent in self.agents:
            print(f"Running {agent.name}...")
            result = agent.process(result)
        return result


# Example: Content creation pipeline
pipeline = AgentPipeline([
    ResearchAgent(),  # Research topic
    OutlineAgent(),   # Create outline
    WriterAgent(),    # Write content
    EditorAgent(),    # Edit and refine
    SEOAgent(),       # Add SEO optimization
])
article = pipeline.run({"topic": "AI in Healthcare"})
```
2. Parallel (Concurrent)
Multiple agents work on subtasks simultaneously:
```python
import asyncio


class ParallelAgents:
    def __init__(self, agents):
        self.agents = agents

    async def run(self, task):
        """Run agents in parallel."""
        # Create one coroutine per agent
        tasks = [agent.process_async(task) for agent in self.agents]
        # Wait for all to complete; return_exceptions=True keeps one failing
        # agent from discarding the other agents' results
        results = await asyncio.gather(*tasks, return_exceptions=True)
        # Synthesize results
        return self._synthesize(results)

    def _synthesize(self, results):
        """Combine results from parallel agents."""
        combined = "Results from parallel analysis:\n\n"
        for agent, result in zip(self.agents, results):
            if isinstance(result, Exception):
                result = f"failed ({result})"
            combined += f"{agent.name}: {result}\n\n"
        return combined


# Example: Multi-perspective analysis (each agent exposes a name and an async process_async)
agents = ParallelAgents([
    TechnicalAnalysisAgent(),
    FinancialAnalysisAgent(),
    RiskAnalysisAgent(),
    CompetitiveAnalysisAgent(),
])
# asyncio.run starts an event loop; inside code that is already async, use await instead
analysis = asyncio.run(agents.run("Evaluate acquisition of Startup X"))
```
3. Hierarchical (Manager-Worker)
A manager agent delegates to specialist agents:
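```python
class ManagerAgent:
    def __init__(self, specialist_agents, default_agent=None):
        # specialist_agents maps a name (e.g. "financial_analyst") to an agent object
        self.specialists = specialist_agents
        self.default_agent = default_agent

    def run(self, task):
        """Delegate task to appropriate specialists."""
        # Decompose task (_decompose_task and _synthesize_results are app-specific LLM calls)
        subtasks = self._decompose_task(task)
        results = []
        for subtask in subtasks:
            # Select best agent for subtask
            agent = self._select_agent(subtask)
            # Delegate
            result = agent.process(subtask)
            results.append(result)
        # Synthesize final answer
        return self._synthesize_results(task, results)

    def _select_agent(self, subtask):
        """Choose the best specialist for a subtask."""
        # Use LLM to determine which agent to use
        prompt = f"""Which specialist should handle this subtask?
Subtask: {subtask}
Available specialists:
{self._format_specialists()}
Return just the specialist name."""
        choice = get_llm_response(prompt).strip().strip("\"'")
        # Models sometimes answer with a name that is not registered; avoid a bare KeyError
        if choice in self.specialists:
            return self.specialists[choice]
        if self.default_agent is not None:
            return self.default_agent
        raise ValueError(f"Model chose an unknown specialist: {choice!r}")

    def _format_specialists(self):
        """List specialist names, one per line, for the routing prompt."""
        return "\n".join(f"- {name}" for name in self.specialists)
```
Agent Communication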
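The `AgentCommunication` class that follows keeps all messages in one list and rebuilds that list on every read. That works for a single-threaded demo, but not for agents running concurrently. When agents run as asyncio tasks, per-agent mailboxes built on `asyncio.Queue` are one alternative. The sketch below uses hypothetical agent names and a placeholder answer in place of an LLM call:

```python
import asyncio
from collections import defaultdict


class Mailboxes:
    """One asyncio.Queue per agent; receivers wait for mail instead of polling."""

    def __init__(self):
        self.queues = defaultdict(asyncio.Queue)

    async def send(self, from_agent: str, to_agent: str, content: str) -> None:
        await self.queues[to_agent].put({"from": from_agent, "content": content})

    async def receive(self, agent_name: str) -> dict:
        return await self.queues[agent_name].get()


async def main():
    boxes = Mailboxes()

    async def financial_analyst():
        msg = await boxes.receive("financial_analyst")
        # Placeholder answer; a real agent would call an LLM or a tool here
        await boxes.send("financial_analyst", msg["from"], f"Answer to: {msg['content']}")

    analyst = asyncio.create_task(financial_analyst())
    await boxes.send("research_agent", "financial_analyst", "What was Q4 revenue?")
    reply = await boxes.receive("research_agent")
    await analyst
    return reply


print(asyncio.run(main()))
```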
```python
from datetime import datetime


class AgentCommunication:
    """Enable agents to communicate with each other."""

    def __init__(self):
        self.message_queue = []

    def send_message(self, from_agent, to_agent, content):
        """Send message between agents."""
        self.message_queue.append({
            "from": from_agent,
            "to": to_agent,
            "content": content,
            "timestamp": datetime.now(),
        })

    def get_messages(self, agent_name):
        """Get messages for an agent."""
        messages = [m for m in self.message_queue if m["to"] == agent_name]
        # Remove retrieved messages
        self.message_queue = [m for m in self.message_queue if m["to"] != agent_name]
        return messages


# Agents can communicate
comm = AgentCommunication()
# Research agent asks question to specialist
comm.send_message(
    from_agent="research_agent",
    to_agent="financial_analyst",
    content="What was the company's revenue in Q4?",
)
# Financial analyst checks messages and responds
# (financial_analyst is an agent object defined elsewhere)
messages = comm.get_messages("financial_analyst")
for msg in messages:
    response = financial_analyst.process(msg["content"])
    comm.send_message("financial_analyst", msg["from"], response)
```
Production Deployment and Monitoring
Deploying agents to production requires robust error handling, monitoring, and safety measures.
Error Handling and Recovery
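Exponential backoff is the core of the retry logic in the class that follows. Factored into a reusable helper, with random jitter so that many clients do not retry in lockstep, it might look like the sketch below. `retry_with_backoff` and its parameters are hypothetical names, and `sleep` is injectable so the logic can be tested without waiting:

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(fn: Callable[[], T],
                       retry_on: Tuple[Type[BaseException], ...] = (Exception,),
                       max_attempts: int = 4,
                       base_delay: float = 1.0,
                       max_delay: float = 30.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on the given exceptions with exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error to the caller
            # The delay ceiling doubles each attempt (1s, 2s, 4s, ...) up to max_delay
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))


calls = {"n": 0}


def flaky():
    """Hypothetical call that fails twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


delays = []  # Record the requested delays instead of actually sleeping
print(retry_with_backoff(flaky, retry_on=(ConnectionError,), sleep=delays.append))  # ok
print(calls["n"], len(delays))  # 3 2
```

Exceptions outside `retry_on` propagate immediately. This keeps non-transient failures, such as a malformed request, from being retried pointlessly.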
```python
import time


class ToolError(Exception):
    """Raised when a tool call fails."""


class LLMError(Exception):
    """Raised when the LLM API call fails (e.g. rate limit or timeout)."""


class RobustAgent:
    def __init__(self, max_retries=3):
        self.max_retries = max_retries

    def run(self, task):
        """Run task with error handling and retries (_execute_task and _log_error are app hooks)."""
        for attempt in range(self.max_retries):
            last_attempt = attempt == self.max_retries - 1
            try:
                return self._execute_task(task)
            except ToolError as e:
                print(f"Tool error on attempt {attempt + 1}: {e}")
                if last_attempt:
                    return self._graceful_failure(task, e)
                # Otherwise retry (a fuller version could switch to an alternative tool)
            except LLMError as e:
                print(f"LLM error on attempt {attempt + 1}: {e}")
                if last_attempt:
                    return self._graceful_failure(task, e)
                # Exponential backoff: 1s, 2s, 4s, ...
                time.sleep(2 ** attempt)
            except Exception as e:
                print(f"Unexpected error: {e}")
                self._log_error(task, e)
                return "I encountered an error processing your request."

    def _graceful_failure(self, task, error):
        """Handle failure gracefully."""
        return f"I tried to complete your task but encountered an issue: {error}. Please try rephrasing or contact support."
```
Monitoring and Observability
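Tracing mostly comes down to timing each step and recording whether it succeeded. A small context manager shows the mechanics. `traced_step` is a hypothetical helper that appends one record per step to a plain list, similar in spirit to the `steps` field of the trace class that follows:

```python
import time
from contextlib import contextmanager


@contextmanager
def traced_step(steps: list, name: str):
    """Record the duration and outcome of one agent step, then re-raise any error."""
    record = {"step": name, "success": True, "error": None}
    start = time.perf_counter()
    try:
        yield record
    except Exception as e:
        record.update(success=False, error=str(e))
        raise
    finally:
        record["duration_s"] = time.perf_counter() - start
        steps.append(record)


steps = []
with traced_step(steps, "search_database"):
    time.sleep(0.01)  # Stand-in for a tool call
try:
    with traced_step(steps, "send_email"):
        raise RuntimeError("SMTP unavailable")
except RuntimeError:
    pass
for s in steps:
    print(s["step"], s["success"], round(s["duration_s"], 3))
```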
```python
import logging
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional

logger = logging.getLogger("agents")


def alert(title, data):
    """Placeholder notification hook; route to PagerDuty, Slack, etc."""
    logger.warning("%s: %s", title, data)


@dataclass
class AgentTrace:
    """Track agent execution."""
    task: str
    agent_name: str
    start_time: datetime
    end_time: Optional[datetime] = None
    # default_factory gives each trace its own lists
    steps: List[dict] = field(default_factory=list)
    tools_used: List[str] = field(default_factory=list)
    tokens_used: int = 0
    success: bool = True
    error: Optional[str] = None


class MonitoredAgent:
    name = "monitored_agent"

    def run(self, task):
        """Run task with full tracing."""
        trace = AgentTrace(task=task, agent_name=self.name, start_time=datetime.now())
        try:
            # _execute_with_tracing runs the agent loop and fills in steps, tools, and tokens
            result = self._execute_with_tracing(task, trace)
            trace.success = True
            return result
        except Exception as e:
            trace.success = False
            trace.error = str(e)
            raise
        finally:
            trace.end_time = datetime.now()
            self._log_trace(trace)

    def _log_trace(self, trace):
        """Log execution trace for analysis."""
        duration = (trace.end_time - trace.start_time).total_seconds()
        log_data = {
            "agent": trace.agent_name,
            "task": trace.task,
            "duration_seconds": duration,
            "steps_count": len(trace.steps),
            "tools_used": trace.tools_used,
            "tokens": trace.tokens_used,
            "success": trace.success,
            "error": trace.error,
        }
        # Send to monitoring system (e.g., Datadog, CloudWatch) via a configured log handler
        logger.info("agent_execution", extra=log_data)
        # Alert if slow or failed
        if duration > 30:
            alert("Slow agent execution", log_data)
        if not trace.success:
            alert("Agent execution failed", log_data)
```
Cost and Usage Tracking
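The cost of a single call is input tokens times the input rate plus output tokens times the output rate. A ReAct-style loop resends the whole conversation on every iteration, so the input side grows with each step and total input tokens rise roughly quadratically with the number of steps. The estimate below assumes a fixed prompt size and fixed reply and observation sizes per step. It uses the gpt-4o-mini rates from the tracker that follows, expressed per million tokens ($0.15 input, $0.60 output):

```python
def react_run_cost(prompt_tokens: int, output_per_step: int, observation_per_step: int,
                   steps: int, input_per_m: float, output_per_m: float):
    """Estimate tokens and cost for a loop that resends the full history each step."""
    total_input = total_output = 0
    for step in range(steps):
        # History grows by one model reply plus one observation per completed step
        total_input += prompt_tokens + step * (output_per_step + observation_per_step)
        total_output += output_per_step
    cost = (total_input * input_per_m + total_output * output_per_m) / 1_000_000
    return total_input, total_output, cost


# 1,000-token prompt, 200-token replies, 300-token observations, 5 steps
inp, out, cost = react_run_cost(1000, 200, 300, 5, input_per_m=0.15, output_per_m=0.60)
print(inp, out, f"${cost:.4f}")  # 10000 1000 $0.0021
```

With a constant 1,000-token context, five calls would use 5,000 input tokens. Resending the growing history doubles that to 10,000. Trimming history (as `ConversationMemory` does) or summarizing old observations keeps this growth in check.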
```python
from datetime import datetime, timedelta


class CostTracker:
    """Track API costs and usage."""

    # Pricing as of early 2025, in USD per 1K tokens; prices change, so check the
    # providers' pricing pages before relying on these numbers
    PRICING = {
        "gpt-4o": {"input": 0.0025, "output": 0.01},
        "gpt-4o-mini": {"input": 0.00015, "output": 0.0006},
        "claude-3-5-sonnet": {"input": 0.003, "output": 0.015},
    }

    def __init__(self):
        self.costs = []

    def track_llm_call(self, model, input_tokens, output_tokens):
        """Track LLM API cost."""
        # Fail loudly for unknown models instead of silently applying a cheaper rate
        if model not in self.PRICING:
            raise ValueError(f"No pricing configured for model {model!r}")
        rates = self.PRICING[model]
        cost = (input_tokens / 1000 * rates["input"]) + (output_tokens / 1000 * rates["output"])
        self.costs.append({
            "timestamp": datetime.now(),
            "model": model,
            "input_tokens": input_tokens,
            "output_tokens": output_tokens,
            "cost": cost,
        })
        return cost

    def get_total_cost(self, since=None):
        """Get total cost since a timestamp."""
        if since:
            relevant = [c for c in self.costs if c["timestamp"] >= since]
        else:
            relevant = self.costs
        return sum(c["cost"] for c in relevant)


# Use in agent
DAILY_BUDGET = 50.0  # Example daily limit in USD
tracker = CostTracker()


def run_agent_with_cost_tracking(task):
    response = client.chat.completions.create(...)  # Your usual model/messages arguments
    tracker.track_llm_call(
        model="gpt-4o",
        input_tokens=response.usage.prompt_tokens,
        output_tokens=response.usage.completion_tokens,
    )
    # Check if over budget (alert is your notification hook)
    daily_cost = tracker.get_total_cost(since=datetime.now() - timedelta(days=1))
    if daily_cost > DAILY_BUDGET:
        alert("Agent budget exceeded", {"daily_cost": daily_cost})
    return response
```
Rate Limiting and Throttling
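The `RateLimiter` that follows is a sliding-window log: it stores a timestamp for every request in the window. A token bucket is a common alternative. It tracks only a token count and a timestamp, allows short bursts, and enforces an average rate. A minimal sketch, with a hypothetical `TokenBucket` class and an injectable clock so its behavior can be checked deterministically:

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float, clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def allow_request(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill for the time elapsed since the last check, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Simulated clock: 2 requests/second on average, bursts of up to 5
fake_now = [0.0]
bucket = TokenBucket(capacity=5, rate=2.0, clock=lambda: fake_now[0])
burst = [bucket.allow_request() for _ in range(6)]
print(burst)  # [True, True, True, True, True, False]
fake_now[0] += 0.5  # Half a second at 2 tokens/s refills one token
print(bucket.allow_request())  # True
```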
```python
from collections import deque
import time


class RateLimiter:
    """Rate limit agent actions."""

    def __init__(self, max_requests, time_window):
        self.max_requests = max_requests
        self.time_window = time_window  # seconds
        self.requests = deque()

    def allow_request(self):
        """Check if request is allowed."""
        now = time.time()
        # Remove old requests outside time window
        while self.requests and self.requests[0] < now - self.time_window:
            self.requests.popleft()
        # Check if under limit
        if len(self.requests) < self.max_requests:
            self.requests.append(now)
            return True
        return False

    def wait_if_needed(self):
        """Wait until request is allowed."""
        while not self.allow_request():
            time.sleep(0.1)


# Apply to agent
rate_limiter = RateLimiter(max_requests=100, time_window=60)  # 100 req/min


def rate_limited_agent_run(task):
    rate_limiter.wait_if_needed()
    return agent.run(task)
```
Conclusion
Building production-ready AI agents requires mastering multiple disciplines: agent architectures (ReAct, Plan-Execute), tool integration with proper safety measures, memory systems for context persistence, multi-agent orchestration patterns, and comprehensive monitoring and error handling.
The journey from a simple function-calling bot to a sophisticated autonomous agent is incremental. Start with basic tool use and gradually add capabilities: memory for context, multiple tools for flexibility, multi-agent patterns for complex workflows, and robust error handling for reliability.
Remember that agents are not fully autonomous - they're autonomous within constraints. Always implement safety measures like human-in-the-loop for destructive actions, rate limiting, cost tracking, and comprehensive monitoring. The goal is agents that reliably solve problems while staying within safe, predictable boundaries.
As AI capabilities continue to advance, the agent patterns in this guide are likely to remain foundational. Whether you are building customer service agents, research assistants, or complex multi-agent systems, these architectures and best practices give you a solid starting point for production deployment.
