Master the OpenAI API for production applications. From GPT-4 to embeddings, learn how Australian businesses build custom AI solutions with practical code examples and cost optimisation strategies.
The OpenAI API powers everything from simple chatbots to sophisticated AI agents processing millions of requests. For Australian businesses building custom AI applications, understanding the API deeply—beyond basic chat completions—unlocks capabilities that pre-built tools simply can't match.
This guide goes beyond "Hello World" to cover production-grade API usage: function calling for tool use, embeddings for semantic search, streaming for real-time UX, cost optimisation strategies, and error handling patterns. Whether you're building internal tools or customer-facing products, you'll learn techniques used by Australian businesses running AI at scale.
Before diving into advanced features, ensure you understand the API's core concepts and available models.
At a glance:
- 128K: GPT-4o context window
- $2.50 USD: GPT-4o cost per 1M input tokens
- ~100: tokens per second (streaming)
| Model | Best For | Input Cost/1M | Output Cost/1M |
|---|---|---|---|
| gpt-4o | Best all-rounder, multimodal | $2.50 USD | $10.00 USD |
| gpt-4o-mini | Cost-effective, most tasks | $0.15 USD | $0.60 USD |
| gpt-4-turbo | Complex reasoning, JSON mode | $10.00 USD | $30.00 USD |
| gpt-3.5-turbo | Simple tasks, high volume | $0.50 USD | $1.50 USD |
| text-embedding-3-small | Embeddings, search | $0.02 USD | N/A |
Cost Tip for Australian Businesses: GPT-4o-mini is often the best value. It handles 90% of tasks at 1/16th the cost of GPT-4o. Start with mini, upgrade to full GPT-4o only when quality requirements demand it.
Getting Started (Python)
from openai import OpenAI
# Initialize client with API key
client = OpenAI(api_key="sk-...") # Or set OPENAI_API_KEY env var
# Basic completion
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello!"}
]
)
print(response.choices[0].message.content)
Getting Started (JavaScript/TypeScript)
import OpenAI from 'openai';
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const response = await openai.chat.completions.create({
model: "gpt-4o-mini",
messages: [
{ role: "system", content: "You are a helpful assistant." },
{ role: "user", content: "Hello!" }
]
});
console.log(response.choices[0].message.content);
Function calling transforms AI from answering questions to executing tasks. Define functions the AI can call, and it will determine when and how to use them based on user requests.
Function Calling Example (Python)
import json
from openai import OpenAI
client = OpenAI()
# Define available functions
tools = [
{
"type": "function",
"function": {
"name": "get_customer_info",
"description": "Retrieve customer information from CRM",
"parameters": {
"type": "object",
"properties": {
"customer_id": {
"type": "string",
"description": "The customer's unique identifier"
},
"fields": {
"type": "array",
"items": {"type": "string"},
"description": "Fields to retrieve: name, email, orders, balance"
}
},
"required": ["customer_id"]
}
}
},
{
"type": "function",
"function": {
"name": "create_support_ticket",
"description": "Create a support ticket in the helpdesk system",
"parameters": {
"type": "object",
"properties": {
"customer_id": {"type": "string"},
"subject": {"type": "string"},
"priority": {"type": "string", "enum": ["low", "medium", "high"]},
"description": {"type": "string"}
},
"required": ["customer_id", "subject", "description"]
}
}
}
]
# Make API call with function definitions
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[
{"role": "system", "content": "You are a customer service assistant. Use the available tools to help customers."},
{"role": "user", "content": "Can you check the order history for customer ABC123 and create a ticket about their delayed shipment?"}
],
tools=tools,
tool_choice="auto" # Let model decide when to use tools
)
# Handle function calls
message = response.choices[0].message
if message.tool_calls:
for tool_call in message.tool_calls:
function_name = tool_call.function.name
arguments = json.loads(tool_call.function.arguments)
# Execute the actual function
if function_name == "get_customer_info":
result = get_customer_info(**arguments) # Your implementation
elif function_name == "create_support_ticket":
result = create_support_ticket(**arguments) # Your implementation
# Continue conversation with function result
# ... (send result back to model)
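To close the loop, you echo the assistant's tool-call message back along with one "tool" message per result, then ask the model for its final reply. A minimal sketch continuing the example above; execute_tool is a hypothetical dispatcher standing in for your own get_customer_info / create_support_ticket implementations, and results are assumed to be JSON-serialisable:
# Sketch of the second round trip: echo the assistant's tool-call message,
# append one "tool" message per result, then ask the model for its final reply.
followup = [
    {"role": "system", "content": "You are a customer service assistant. Use the available tools to help customers."},
    {"role": "user", "content": "Can you check the order history for customer ABC123 and create a ticket about their delayed shipment?"},
    message,  # the assistant message that contains the tool calls
]
for tool_call in message.tool_calls:
    arguments = json.loads(tool_call.function.arguments)
    result = execute_tool(tool_call.function.name, arguments)  # hypothetical dispatcher
    followup.append({
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": json.dumps(result),  # tool results must be serialised to a string
    })
final = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=followup,
    tools=tools,
)
print(final.choices[0].message.content)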
- Natural-language database queries: let AI generate and execute SQL from plain English. "Show me all overdue invoices from Queensland" becomes a structured query.
- API orchestration: the AI decides which APIs to call (CRM lookups, calendar scheduling, email sending) based on user intent.
- Multi-step workflows: complex tasks broken into function calls: validate input → check inventory → create order → send confirmation.
- Structured data extraction: force structured output by defining extraction functions, so the AI extracts and validates in a single call (see the sketch after this list).
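A short sketch of the extraction pattern: define a single extraction function and force the model to call it with tool_choice, so every response is structured. The record_invoice schema below is hypothetical:
import json
from openai import OpenAI

client = OpenAI()

# Hypothetical example: extract structured invoice fields from free text.
extraction_tools = [{
    "type": "function",
    "function": {
        "name": "record_invoice",
        "description": "Record the invoice details found in the text",
        "parameters": {
            "type": "object",
            "properties": {
                "supplier": {"type": "string"},
                "amount_aud": {"type": "number"},
                "due_date": {"type": "string", "description": "ISO 8601 date"},
            },
            "required": ["supplier", "amount_aud"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Invoice from Acme Pty Ltd for $1,250 due 30 June"}],
    tools=extraction_tools,
    # Forcing this specific tool guarantees structured output on every call
    tool_choice={"type": "function", "function": {"name": "record_invoice"}},
)
invoice = json.loads(response.choices[0].message.tool_calls[0].function.arguments)
print(invoice)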
A B2B software company, for instance, built its AI customer-service assistant using this function calling pattern.
Embeddings convert text into numerical vectors that capture semantic meaning. Two similar concepts have vectors that are "close" in vector space, enabling powerful search and retrieval applications.
Creating Embeddings (Python)
from openai import OpenAI
client = OpenAI()
# Single text embedding
response = client.embeddings.create(
model="text-embedding-3-small",
input="How do I automate invoice processing with AI?"
)
embedding = response.data[0].embedding # 1536-dimensional vector
print(f"Embedding dimension: {len(embedding)}")
# Batch embeddings for efficiency
texts = [
"Invoice processing automation",
"Customer support chatbot",
"Sales pipeline analysis"
]
response = client.embeddings.create(
model="text-embedding-3-small",
input=texts
)
embeddings = [item.embedding for item in response.data]
RAG Implementation Pattern
import numpy as np
from openai import OpenAI
client = OpenAI()
# 1. INDEXING: Embed and store your documents
def embed_documents(documents):
"""Embed a list of documents and return vectors."""
response = client.embeddings.create(
model="text-embedding-3-small",
input=documents
)
return [item.embedding for item in response.data]
# 2. RETRIEVAL: Find relevant documents for a query
def find_relevant_docs(query, doc_embeddings, documents, top_k=3):
"""Find most relevant documents using cosine similarity."""
# Embed the query
query_response = client.embeddings.create(
model="text-embedding-3-small",
input=query
)
query_embedding = query_response.data[0].embedding
# Calculate similarities
similarities = []
for i, doc_emb in enumerate(doc_embeddings):
similarity = np.dot(query_embedding, doc_emb) / (
np.linalg.norm(query_embedding) * np.linalg.norm(doc_emb)
)
similarities.append((similarity, documents[i]))
# Return top-k most similar
similarities.sort(reverse=True)
return [doc for _, doc in similarities[:top_k]]
# 3. GENERATION: Answer using retrieved context
def answer_with_context(question, context_docs):
"""Generate answer using retrieved documents as context."""
context = "\n\n".join(context_docs)
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[
{
"role": "system",
"content": f"""Answer questions based on the following context.
If the context doesn't contain relevant information, say so.
Context:
{context}"""
},
{"role": "user", "content": question}
]
)
return response.choices[0].message.content
# Usage
documents = ["...your documents..."]
doc_embeddings = embed_documents(documents)
question = "How do I set up automated invoicing?"
relevant = find_relevant_docs(question, doc_embeddings, documents)
answer = answer_with_context(question, relevant)
print(answer)
| Model | Dimensions | Cost/1M tokens | Use Case |
|---|---|---|---|
| text-embedding-3-small | 1536 | $0.02 USD | Most applications (recommended) |
| text-embedding-3-large | 3072 | $0.13 USD | Maximum accuracy needs |
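If vector storage or similarity-search cost is a concern, the v3 embedding models also accept a dimensions parameter that returns shortened vectors, trading a little accuracy for smaller storage. A small sketch:
from openai import OpenAI

client = OpenAI()

# The v3 embedding models can return shortened vectors via `dimensions`,
# which reduces storage and speeds up similarity search.
response = client.embeddings.create(
    model="text-embedding-3-large",
    input="How do I automate invoice processing with AI?",
    dimensions=256,  # instead of the default 3072
)
print(len(response.data[0].embedding))  # 256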
Production Tip: For production RAG systems, use a vector database (Pinecone, Qdrant, Supabase) instead of in-memory storage. They handle similarity search efficiently at scale and provide features like filtering and metadata storage.
Streaming returns tokens as they're generated rather than waiting for the complete response. Essential for chat interfaces and any UX where perceived speed matters.
Streaming in Python
from openai import OpenAI
client = OpenAI()
# Stream response
stream = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Explain AI in 200 words"}],
stream=True
)
# Print tokens as they arrive
for chunk in stream:
if chunk.choices[0].delta.content is not None:
print(chunk.choices[0].delta.content, end="", flush=True)
print() # Newline at end
Streaming in TypeScript (Next.js)
// app/api/chat/route.ts
import OpenAI from 'openai';
const openai = new OpenAI();
export async function POST(req: Request) {
const { messages } = await req.json();
const stream = await openai.chat.completions.create({
model: 'gpt-4o-mini',
messages,
stream: true,
});
// Return streaming response
const encoder = new TextEncoder();
const readable = new ReadableStream({
async start(controller) {
for await (const chunk of stream) {
const text = chunk.choices[0]?.delta?.content || '';
controller.enqueue(encoder.encode(text));
}
controller.close();
},
});
return new Response(readable, {
headers: { 'Content-Type': 'text/plain; charset=utf-8' },
});
}
When using function calling with streaming, you need to accumulate the function call arguments:
stream = client.chat.completions.create(
model="gpt-4o-mini",
messages=[...],
tools=tools,
stream=True
)
tool_calls = []
current_tool_call = None
for chunk in stream:
delta = chunk.choices[0].delta
# Handle tool calls in stream
if delta.tool_calls:
for tc_chunk in delta.tool_calls:
if tc_chunk.index is not None:
if tc_chunk.id: # New tool call
current_tool_call = {
"id": tc_chunk.id,
"type": "function",
"function": {"name": "", "arguments": ""}
}
tool_calls.append(current_tool_call)
if tc_chunk.function:
if tc_chunk.function.name:
current_tool_call["function"]["name"] = tc_chunk.function.name
if tc_chunk.function.arguments:
current_tool_call["function"]["arguments"] += tc_chunk.function.arguments
# Handle regular content
elif delta.content:
print(delta.content, end="", flush=True)
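Once the stream finishes, each accumulated entry in tool_calls holds a complete JSON argument string. A minimal sketch of dispatching them, continuing the loop above (the handler functions are the same hypothetical implementations as earlier):
import json

# After the stream completes, parse and dispatch the accumulated tool calls.
for call in tool_calls:
    name = call["function"]["name"]
    arguments = json.loads(call["function"]["arguments"])
    if name == "get_customer_info":
        result = get_customer_info(**arguments)  # your implementation
    elif name == "create_support_ticket":
        result = create_support_ticket(**arguments)  # your implementation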
API costs can escalate quickly at scale. These strategies help Australian businesses manage costs while maintaining quality.
- Choose the right model: use GPT-4o-mini (roughly 1/16th the cost) for most tasks and reserve GPT-4o for complex reasoning. Run tests to confirm quality meets your needs at the lower tier.
- Trim your prompts: shorter prompts mean lower cost. Remove redundant instructions and keep system prompts concise; every token costs money.
- Cache responses: cache answers for identical or similar queries, and use embeddings to find cached answers for semantically similar questions (see the sketch after this list).
- Limit output: set max_tokens to prevent runaway responses and request concise answers in prompts. Output tokens cost more than input tokens.
- Batch requests: process multiple items in a single API call where possible. This reduces overhead and can improve throughput.
- Pre-filter with cheaper methods: filter or classify with regex or simple rules first, and only send to GPT-4 when necessary.
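A sketch of the embedding-based caching idea. The similarity threshold and the in-memory list are assumptions for illustration, not tuned values; production systems would use a persistent store:
import numpy as np
from openai import OpenAI

client = OpenAI()
cache = []  # list of (embedding, answer) pairs; use a real store in production

def _embed(text):
    response = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(response.data[0].embedding)

def cached_answer(question, threshold=0.92):
    """Return a cached answer for a semantically similar question, else None."""
    query = _embed(question)
    for emb, answer in cache:
        similarity = float(np.dot(query, emb) / (np.linalg.norm(query) * np.linalg.norm(emb)))
        if similarity >= threshold:
            return answer
    return None

def remember(question, answer):
    """Store a question/answer pair for future lookups."""
    cache.append((_embed(question), answer))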
Track Usage Per Request
# Track tokens and estimate costs
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[...],
)
usage = response.usage
print(f"Input tokens: {usage.prompt_tokens}")
print(f"Output tokens: {usage.completion_tokens}")
print(f"Total tokens: {usage.total_tokens}")
# Estimate cost (GPT-4o-mini pricing)
input_cost = usage.prompt_tokens * 0.00000015 # $0.15/1M
output_cost = usage.completion_tokens * 0.0000006 # $0.60/1M
total_cost = input_cost + output_cost
print(f"Estimated cost: ${total_cost:.6f} USD")
Worked example: processing 5,000 support queries per month with GPT-4o (before optimisation) versus GPT-4o-mini (after optimisation). The sketch below estimates the monthly difference under assumed per-query token counts.
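The exact figures depend on your average tokens per query; as an illustration only, assuming roughly 1,000 input and 300 output tokens per query:
# Illustrative only: assumes ~1,000 input and ~300 output tokens per query.
queries_per_month = 5_000
input_tokens = queries_per_month * 1_000
output_tokens = queries_per_month * 300

# GPT-4o: $2.50 / 1M input, $10.00 / 1M output
gpt4o = input_tokens / 1e6 * 2.50 + output_tokens / 1e6 * 10.00
# GPT-4o-mini: $0.15 / 1M input, $0.60 / 1M output
gpt4o_mini = input_tokens / 1e6 * 0.15 + output_tokens / 1e6 * 0.60

print(f"GPT-4o:      ~${gpt4o:.2f} USD/month")       # ~$27.50
print(f"GPT-4o-mini: ~${gpt4o_mini:.2f} USD/month")  # ~$1.65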
Budget Alerts: Set up usage limits and alerts in the OpenAI dashboard. This prevents surprise bills from bugs or unexpected usage spikes. Especially important during development when testing can burn through credits quickly.
Production applications need robust error handling. The OpenAI API can fail for various reasons—your code must handle them gracefully.
- 429 Rate limit: too many requests. Implement exponential backoff and retry.
- 500 Server error: OpenAI server error. Retry with backoff; these are usually temporary.
- 400 Bad request: context too long or invalid parameters. Fix the request, don't retry.
- 401 Authentication error: invalid API key. Check configuration, don't retry.
- Timeout: the request took too long. Retry, or reduce request complexity.
Python with Tenacity Retry
from openai import OpenAI, RateLimitError, APIError
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type
client = OpenAI()
@retry(
retry=retry_if_exception_type((RateLimitError, APIError)),
wait=wait_exponential(multiplier=1, min=4, max=60),
stop=stop_after_attempt(5)
)
def call_openai_with_retry(messages, model="gpt-4o-mini"):
"""Make OpenAI API call with automatic retry on transient errors."""
try:
response = client.chat.completions.create(
model=model,
messages=messages,
timeout=30 # Set reasonable timeout
)
return response.choices[0].message.content
except RateLimitError:
# Log rate limit hit, will retry automatically
print("Rate limited, retrying...")
raise
    except APIError as e:
        # Not every APIError carries a status code (e.g. connection errors)
        status = getattr(e, "status_code", None)
        if status is None or status >= 500:
            # Server or connection error, retry
            print(f"Server error ({status}), retrying...")
            raise
        else:
            # Client error (4xx), don't retry
            raise ValueError(f"API error: {e}")
except Exception as e:
# Unexpected error
print(f"Unexpected error: {e}")
raise
# Usage
try:
result = call_openai_with_retry([
{"role": "user", "content": "Hello!"}
])
print(result)
except Exception as e:
print(f"Failed after retries: {e}")
# Fallback logic here
Truncate to Fit Context Window
import tiktoken
def count_tokens(text, model="gpt-4o-mini"):
    """Count tokens in text for a given model."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        # Older tiktoken releases may not recognise newer model names
        encoding = tiktoken.get_encoding("o200k_base")  # encoding used by the GPT-4o family
    return len(encoding.encode(text))

def truncate_to_fit(messages, max_tokens=100000, model="gpt-4o-mini"):
    """Truncate conversation to fit within token limit."""
# Keep system message, truncate from oldest user/assistant
total_tokens = 0
result = []
# Always include system message first
for msg in messages:
if msg["role"] == "system":
result.append(msg)
total_tokens += count_tokens(msg["content"], model)
break
# Add messages from most recent, stop when limit reached
for msg in reversed(messages):
if msg["role"] == "system":
continue
msg_tokens = count_tokens(msg["content"], model)
if total_tokens + msg_tokens > max_tokens:
break
result.insert(1, msg) # Insert after system message
total_tokens += msg_tokens
return result
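A short usage sketch, assuming the client from earlier and a hypothetical conversation_history list of chat messages:
# Usage sketch: trim a long conversation before sending it to the API.
trimmed = truncate_to_fit(conversation_history, max_tokens=100000)
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=trimmed,
)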
Monitoring Tip: Log all API calls with timestamps, token counts, and latency. This helps identify issues quickly and provides data for cost optimisation. Consider tools like Helicone or LangSmith for production monitoring.
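A minimal sketch of that kind of logging, wrapping the call yourself rather than using a dedicated monitoring tool; the logged field names are just an example:
import json
import logging
import time
from openai import OpenAI

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("openai_calls")
client = OpenAI()

def logged_completion(messages, model="gpt-4o-mini", **kwargs):
    """Call the API and log token counts and latency for later analysis."""
    start = time.monotonic()
    response = client.chat.completions.create(model=model, messages=messages, **kwargs)
    latency_ms = (time.monotonic() - start) * 1000
    logger.info(json.dumps({
        "model": model,
        "prompt_tokens": response.usage.prompt_tokens,
        "completion_tokens": response.usage.completion_tokens,
        "latency_ms": round(latency_ms, 1),
    }))
    return response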
Australian businesses must consider their data protection obligations, including the Privacy Act and the Australian Privacy Principles, when sending customer data to OpenAI's API.
For organisations requiring Australian data residency, Azure OpenAI Service offers the same model family hosted in Australian Azure regions (such as Australia East), so prompts and data at rest can be processed and stored onshore, alongside Azure's enterprise security and compliance controls.
Trade-off: Azure OpenAI requires an Azure subscription and an approval process, and models may lag slightly behind OpenAI direct. Choose based on whether Australian data residency is a hard requirement.
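If you do go the Azure route, the same Python SDK can target an Azure OpenAI deployment. A sketch; the endpoint, API version, and deployment name are placeholders for your own resource:
import os
from openai import AzureOpenAI

# Endpoint, API version, and deployment name below are placeholders.
client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",
    azure_endpoint="https://your-resource.openai.azure.com",
)

response = client.chat.completions.create(
    model="your-gpt-4o-mini-deployment",  # the deployment name, not the model name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)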
The OpenAI API is remarkably capable, but getting value from it requires more than basic chat completions. Function calling transforms AI from answering questions to taking actions. Embeddings enable semantic search and RAG applications that make AI knowledgeable about your specific data. Streaming creates responsive user experiences. And thoughtful cost management keeps projects economically viable.
For Australian businesses building custom AI applications, the API offers flexibility that pre-built tools can't match. But that flexibility comes with responsibility—for error handling, cost management, and compliance considerations that packaged solutions handle for you.
Start with GPT-4o-mini for most use cases—it handles the majority of tasks at a fraction of the cost. Implement proper error handling from day one. Monitor your costs closely, especially during development. And when you're ready for production, consider the full picture: rate limits, fallbacks, monitoring, and the data considerations specific to your Australian context.