OpenAI API Deep Dive: Building AI Applications in Australia
Master the OpenAI API for production applications. From GPT-4 to embeddings, learn how Australian businesses build custom AI solutions with practical code examples and cost optimisation strategies.
The OpenAI API powers everything from simple chatbots to sophisticated AI agents processing millions of requests. For Australian businesses building custom AI applications, understanding the API deeply - beyond basic chat completions - unlocks capabilities that pre-built tools simply can't match.
This guide goes beyond "Hello World" to cover production-grade API usage: function calling for tool use, embeddings for semantic search, streaming for real-time UX, cost optimisation strategies, and error handling patterns. Whether you're building internal tools or customer-facing products, you'll learn techniques used by Australian businesses running AI at scale.
Key Takeaways
- GPT-4o-mini offers 90%+ of GPT-4o capability at 1/16th the cost - use it as your default model
- Function calling enables AI to execute real actions: database queries, API calls, multi-step workflows
- Embeddings power semantic search and RAG - essential for AI that knows your specific data
- Streaming improves perceived performance dramatically for chat interfaces and long responses
- Implement retry logic with exponential backoff for production reliability
- Monitor token usage and costs - they can escalate quickly at scale without proper tracking
- OpenAI API doesn't train on your data by default, but document data flows for Australian compliance
OpenAI API Fundamentals
Before diving into advanced features, ensure you understand the API's core concepts and available models.
- GPT-4o context window: 128K tokens
- GPT-4o input price: $2.50 USD per 1M tokens
- Streaming throughput: ~100 tokens per second
Model Selection Guide
| Model | Best For | Input Cost/1M | Output Cost/1M |
|---|---|---|---|
| gpt-4o | Best all-rounder, multimodal | $2.50 USD | $10.00 USD |
| gpt-4o-mini | Cost-effective, most tasks | $0.15 USD | $0.60 USD |
| gpt-4-turbo | Complex reasoning, JSON mode | $10.00 USD | $30.00 USD |
| gpt-3.5-turbo | Simple tasks, high volume | $0.50 USD | $1.50 USD |
| text-embedding-3-small | Embeddings, search | $0.02 USD | N/A |
Cost Tip for Australian Businesses: GPT-4o-mini is often the best value. It handles 90% of tasks at 1/16th the cost of GPT-4o. Start with mini, upgrade to full GPT-4o only when quality requirements demand it.
API Authentication
Getting Started (Python)
from openai import OpenAI
# Initialize client with API key
client = OpenAI(api_key="sk-...") # Or set OPENAI_API_KEY env var
# Basic completion
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello!"}
]
)
print(response.choices[0].message.content)
Getting Started (JavaScript/TypeScript)
import OpenAI from 'openai';
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const response = await openai.chat.completions.create({
model: "gpt-4o-mini",
messages: [
{ role: "system", content: "You are a helpful assistant." },
{ role: "user", content: "Hello!" }
]
});
console.log(response.choices[0].message.content);
Function Calling: AI That Takes Action
Function calling transforms AI from answering questions to executing tasks. Define functions the AI can call, and it will determine when and how to use them based on user requests.
How Function Calling Works
1. Define Functions: Tell the API what functions are available and their parameters
2. User Request: User asks something that might need a function
3. AI Decides: Model determines if/which function to call with what arguments
4. You Execute: Your code actually runs the function
5. Return Results: Send function output back to the model for final response
Function Calling Example (Python)
import json
from openai import OpenAI
client = OpenAI()
# Define available functions
tools = [
{
"type": "function",
"function": {
"name": "get_customer_info",
"description": "Retrieve customer information from CRM",
"parameters": {
"type": "object",
"properties": {
"customer_id": {
"type": "string",
"description": "The customer's unique identifier"
},
"fields": {
"type": "array",
"items": {"type": "string"},
"description": "Fields to retrieve: name, email, orders, balance"
}
},
"required": ["customer_id"]
}
}
},
{
"type": "function",
"function": {
"name": "create_support_ticket",
"description": "Create a support ticket in the helpdesk system",
"parameters": {
"type": "object",
"properties": {
"customer_id": {"type": "string"},
"subject": {"type": "string"},
"priority": {"type": "string", "enum": ["low", "medium", "high"]},
"description": {"type": "string"}
},
"required": ["customer_id", "subject", "description"]
}
}
}
]
# Make API call with function definitions
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[
{"role": "system", "content": "You are a customer service assistant. Use the available tools to help customers."},
{"role": "user", "content": "Can you check the order history for customer ABC123 and create a ticket about their delayed shipment?"}
],
tools=tools,
tool_choice="auto" # Let model decide when to use tools
)
# Handle function calls
message = response.choices[0].message
if message.tool_calls:
    for tool_call in message.tool_calls:
        function_name = tool_call.function.name
        arguments = json.loads(tool_call.function.arguments)
        # Execute the actual function
        if function_name == "get_customer_info":
            result = get_customer_info(**arguments)  # Your implementation
        elif function_name == "create_support_ticket":
            result = create_support_ticket(**arguments)  # Your implementation
        else:
            # Guard against names the model invents, so result is always defined
            result = {"error": f"Unknown function: {function_name}"}
        # Continue conversation with function result
        # ... (send result back to model)
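The final step, returning results to the model, can be sketched as follows. This is a minimal sketch rather than a definitive implementation: `TOOL_REGISTRY`, `run_tool_calls`, and the stubbed `get_customer_info` are hypothetical names, and the fake tool call built with `SimpleNamespace` only mimics the shape of the SDK's tool call objects so the example runs without an API key.

```python
import json
from types import SimpleNamespace


def get_customer_info(customer_id, fields=None):
    """Hypothetical stand-in for a real CRM lookup."""
    return {"customer_id": customer_id, "orders": ["#1001 (delayed)"]}


# Hypothetical registry mapping tool names to Python callables
TOOL_REGISTRY = {"get_customer_info": get_customer_info}


def run_tool_calls(tool_calls, registry=TOOL_REGISTRY):
    """Execute each requested tool and build the `tool` messages to send back."""
    tool_messages = []
    for call in tool_calls:
        handler = registry.get(call.function.name)
        if handler is None:
            result = {"error": f"Unknown function: {call.function.name}"}
        else:
            try:
                result = handler(**json.loads(call.function.arguments))
            except (json.JSONDecodeError, TypeError) as exc:
                # Malformed JSON or arguments that don't match the signature
                result = {"error": f"Bad arguments: {exc}"}
        tool_messages.append({
            "role": "tool",
            "tool_call_id": call.id,  # Must match the id of the model's tool call
            "content": json.dumps(result),
        })
    return tool_messages


# Offline demonstration with an object shaped like an SDK tool call
fake_call = SimpleNamespace(
    id="call_1",
    function=SimpleNamespace(
        name="get_customer_info",
        arguments='{"customer_id": "ABC123"}',
    ),
)
tool_messages = run_tool_calls([fake_call])
print(tool_messages)
```

In a real exchange you would append the assistant `message` (the one containing `tool_calls`) to the conversation, then these tool messages, and call `client.chat.completions.create` again to get the final answer.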
Practical Function Calling Use Cases
Database Queries
Let AI generate and execute SQL based on natural language. "Show me all overdue invoices from Queensland" → structured query.
API Integrations
AI decides which APIs to call: CRM lookups, calendar scheduling, email sending based on user intent.
Multi-Step Workflows
Complex tasks broken into function calls: validate input → check inventory → create order → send confirmation.
Data Extraction
Force structured output by defining extraction functions. The AI extracts and validates in a single call.
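For the database-query use case, one cautious variant is to expose a narrow, parameterised query as a tool rather than executing SQL written by the model. Note that this is a swapped-in, more restrictive technique than free-form SQL generation. The `invoices` schema, the sample rows, and `list_overdue_invoices` are all hypothetical, and an in-memory SQLite database stands in for a real one.

```python
import sqlite3


def list_overdue_invoices(conn, state):
    """Tool implementation: the model supplies `state`, never raw SQL."""
    rows = conn.execute(
        "SELECT id, customer, amount FROM invoices "
        "WHERE state = ? AND status = 'overdue' ORDER BY id",
        (state,),  # Bound parameter, so the value cannot alter the query
    ).fetchall()
    return [{"id": r[0], "customer": r[1], "amount": r[2]} for r in rows]


# Hypothetical sample data in an in-memory database
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices "
    "(id INTEGER, customer TEXT, amount REAL, state TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Acme Pty Ltd", 1200.0, "QLD", "overdue"),
        (2, "Banksia Co", 450.0, "NSW", "overdue"),
        (3, "Coral Ltd", 300.0, "QLD", "paid"),
    ],
)

overdue_qld = list_overdue_invoices(conn, "QLD")
print(overdue_qld)
```

The model would call this through a tool definition with a single `state` parameter, so "Show me all overdue invoices from Queensland" becomes `{"state": "QLD"}` rather than a SQL string.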
Case Study: Melbourne SaaS Platform
A B2B software company built an AI assistant using function calling.
- Functions defined: 12 (user management, billing, analytics, etc.)
- User queries: Natural language requests from support chat
- Result: AI handles 60% of support queries end-to-end
- Key insight: Function descriptions are critical. More detailed descriptions lead to better tool selection.
Embeddings: Semantic Search & RAG
Embeddings convert text into numerical vectors that capture semantic meaning. Two similar concepts have vectors that are "close" in vector space, enabling powerful search and retrieval applications.
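To make "close in vector space" concrete, here is a small illustration using cosine similarity. The three-dimensional vectors are made up for readability; real embeddings from the API have hundreds or thousands of dimensions, and the variable names are purely illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 means similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up toy vectors, not real embeddings
invoice_automation = [0.9, 0.1, 0.0]
billing_workflow = [0.8, 0.2, 0.1]
surf_report = [0.0, 0.1, 0.9]

sim_related = cosine_similarity(invoice_automation, billing_workflow)
sim_unrelated = cosine_similarity(invoice_automation, surf_report)
print(f"related: {sim_related:.3f}, unrelated: {sim_unrelated:.3f}")
```

The related pair scores close to 1 and the unrelated pair close to 0, which is the property semantic search relies on.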
When to Use Embeddings
- Semantic Search: Find relevant documents even when exact keywords don't match
- RAG Applications: Retrieve context for AI to answer questions about your data
- Recommendation Systems: Find similar products, articles, or customers
- Classification: Categorise content by comparing to category examples
- Duplicate Detection: Find near-duplicate content or entities
Creating Embeddings (Python)
from openai import OpenAI
client = OpenAI()
# Single text embedding
response = client.embeddings.create(
model="text-embedding-3-small",
input="How do I automate invoice processing with AI?"
)
embedding = response.data[0].embedding # 1536-dimensional vector
print(f"Embedding dimension: {len(embedding)}")
# Batch embeddings for efficiency
texts = [
"Invoice processing automation",
"Customer support chatbot",
"Sales pipeline analysis"
]
response = client.embeddings.create(
model="text-embedding-3-small",
input=texts
)
embeddings = [item.embedding for item in response.data]
Building a Simple RAG System
RAG Implementation Pattern
import numpy as np
from openai import OpenAI
client = OpenAI()
# 1. INDEXING: Embed and store your documents
def embed_documents(documents):
    """Embed a list of documents and return vectors."""
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=documents
    )
    return [item.embedding for item in response.data]

# 2. RETRIEVAL: Find relevant documents for a query
def find_relevant_docs(query, doc_embeddings, documents, top_k=3):
    """Find most relevant documents using cosine similarity."""
    # Embed the query
    query_response = client.embeddings.create(
        model="text-embedding-3-small",
        input=query
    )
    query_embedding = query_response.data[0].embedding
    # Calculate similarities
    similarities = []
    for i, doc_emb in enumerate(doc_embeddings):
        similarity = np.dot(query_embedding, doc_emb) / (
            np.linalg.norm(query_embedding) * np.linalg.norm(doc_emb)
        )
        similarities.append((similarity, documents[i]))
    # Return top-k most similar
    similarities.sort(reverse=True)
    return [doc for _, doc in similarities[:top_k]]

# 3. GENERATION: Answer using retrieved context
def answer_with_context(question, context_docs):
    """Generate answer using retrieved documents as context."""
    context = "\n\n".join(context_docs)
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": f"""Answer questions based on the following context.
If the context doesn't contain relevant information, say so.
Context:
{context}"""
            },
            {"role": "user", "content": question}
        ]
    )
    return response.choices[0].message.content
# Usage
documents = ["...your documents..."]
doc_embeddings = embed_documents(documents)
question = "How do I set up automated invoicing?"
relevant = find_relevant_docs(question, doc_embeddings, documents)
answer = answer_with_context(question, relevant)
print(answer)
Embedding Model Selection
| Model | Dimensions | Cost/1M tokens | Use Case |
|---|---|---|---|
| text-embedding-3-small | 1536 | $0.02 USD | Most applications (recommended) |
| text-embedding-3-large | 3072 | $0.13 USD | Maximum accuracy needs |
Production Tip: For production RAG systems, use a vector database (Pinecone, Qdrant, Supabase) instead of in-memory storage. They handle similarity search efficiently at scale and provide features like filtering and metadata storage.
Streaming: Real-Time User Experience
Streaming returns tokens as they're generated rather than waiting for the complete response. Essential for chat interfaces and any UX where perceived speed matters.
Why Streaming Matters
- Perceived Speed: Users see response starting immediately vs waiting 5-10 seconds
- Long Responses: Critical for detailed answers that take time to generate
- Chat UX: Mimics natural conversation flow
- Progress Indication: Users know the system is working
Streaming in Python
from openai import OpenAI
client = OpenAI()
# Stream response
stream = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Explain AI in 200 words"}],
stream=True
)
# Print tokens as they arrive
for chunk in stream:
    # Skip chunks that carry no choices or no text
    if chunk.choices and chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()  # Newline at end
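In practice you usually want the complete reply as well as the live tokens, for example to store it in conversation history. The sketch below shows one way to do both. `collect_stream` and `fake_chunk` are hypothetical helpers, and the fake chunks only imitate the `choices[0].delta.content` shape used above so the example runs offline.

```python
from types import SimpleNamespace


def collect_stream(stream, on_token=print):
    """Forward each text fragment to on_token and return the full reply."""
    parts = []
    for chunk in stream:
        if not chunk.choices:  # Defensive: a chunk may arrive without choices
            continue
        text = chunk.choices[0].delta.content
        if text:  # Skips None and empty fragments
            parts.append(text)
            on_token(text)
    return "".join(parts)


def fake_chunk(text):
    """Build an object shaped like a streamed chat completion chunk."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


fake_stream = [
    fake_chunk("Hello"),
    fake_chunk(None),
    SimpleNamespace(choices=[]),
    fake_chunk(", world"),
]

seen = []
full_reply = collect_stream(fake_stream, on_token=seen.append)
print(full_reply)
```

With the real client you would pass the `stream` object returned by `client.chat.completions.create(..., stream=True)` in place of `fake_stream`.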
Streaming in TypeScript (Next.js)
// app/api/chat/route.ts
import OpenAI from 'openai';
const openai = new OpenAI();
export async function POST(req: Request) {
const { messages } = await req.json();
const stream = await openai.chat.completions.create({
model: 'gpt-4o-mini',
messages,
stream: true,
});
// Return streaming response
const encoder = new TextEncoder();
const readable = new ReadableStream({
async start(controller) {
for await (const chunk of stream) {
const text = chunk.choices[0]?.delta?.content || '';
controller.enqueue(encoder.encode(text));
}
controller.close();
},
});
return new Response(readable, {
headers: { 'Content-Type': 'text/plain; charset=utf-8' },
});
}
Streaming with Function Calls
When using function calling with streaming, you need to accumulate the function call arguments:
stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[...],
    tools=tools,
    stream=True
)
tool_calls = {}  # Keyed by tool call index, since fragments arrive per index
for chunk in stream:
    if not chunk.choices:  # Skip chunks that carry no choices
        continue
    delta = chunk.choices[0].delta
    # Handle tool calls in stream
    if delta.tool_calls:
        for tc_chunk in delta.tool_calls:
            call = tool_calls.setdefault(tc_chunk.index, {
                "id": "",
                "type": "function",
                "function": {"name": "", "arguments": ""}
            })
            if tc_chunk.id:  # Sent with the first fragment of each call
                call["id"] = tc_chunk.id
            if tc_chunk.function:
                if tc_chunk.function.name:
                    call["function"]["name"] = tc_chunk.function.name
                if tc_chunk.function.arguments:
                    call["function"]["arguments"] += tc_chunk.function.arguments
    # Handle regular content
    elif delta.content:
        print(delta.content, end="", flush=True)
# Arguments are complete JSON only once the stream has finished
completed_calls = [tool_calls[i] for i in sorted(tool_calls)]
Cost Optimisation Strategies
API costs can escalate quickly at scale. These strategies help Australian businesses manage costs while maintaining quality.
Cost Reduction Techniques
1. Model Selection
Use GPT-4o-mini (about 1/16th the cost) for most tasks. Reserve GPT-4o for complex reasoning. Run tests to confirm that quality meets your needs at the lower tier.
2. Prompt Optimisation
Shorter prompts = lower cost. Remove redundant instructions. Use concise system prompts. Every token costs money.
3. Caching
Cache responses for identical/similar queries. Use embeddings to find cached answers for semantically similar questions.
4. Output Limiting
Set max_tokens to prevent runaway responses. Request concise answers in prompts. Output tokens cost more than input.
5. Batching
Process multiple items in a single API call where possible. This reduces overhead and can improve throughput.
6. Pre-Processing
Filter/classify with cheaper methods first. Only send to GPT-4 when necessary. Use regex or rules for obvious cases.
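As an illustration of the caching technique, here is a minimal exact-match cache keyed on a hash of the model and messages. It is a sketch under stated assumptions: `ResponseCache` and `fake_completion` are hypothetical, the store is an in-process dict, and semantic (embedding-based) matching is not covered.

```python
import hashlib
import json


class ResponseCache:
    """In-memory cache for identical requests (same model and messages)."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(model, messages):
        # Stable serialisation so identical requests hash identically
        payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_call(self, model, messages, call):
        """Return a cached answer, or invoke call(model, messages) and store it."""
        key = self._key(model, messages)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        self._store[key] = call(model, messages)
        return self._store[key]


calls = []


def fake_completion(model, messages):
    """Hypothetical stand-in for a real API call; records each invocation."""
    calls.append(messages)
    return "You can reset it from Settings."


cache = ResponseCache()
question = [{"role": "user", "content": "How do I reset my password?"}]
first = cache.get_or_call("gpt-4o-mini", question, fake_completion)
second = cache.get_or_call("gpt-4o-mini", question, fake_completion)
print(f"hits={cache.hits}, misses={cache.misses}, api_calls={len(calls)}")
```

A production version would likely add expiry and a shared store such as Redis, but the keying idea stays the same.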
Cost Monitoring Setup
Track Usage Per Request
# Track tokens and estimate costs
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[...],
)
usage = response.usage
print(f"Input tokens: {usage.prompt_tokens}")
print(f"Output tokens: {usage.completion_tokens}")
print(f"Total tokens: {usage.total_tokens}")
# Estimate cost (GPT-4o-mini pricing)
input_cost = usage.prompt_tokens * 0.00000015 # $0.15/1M
output_cost = usage.completion_tokens * 0.0000006 # $0.60/1M
total_cost = input_cost + output_cost
print(f"Estimated cost: ${total_cost:.6f} USD")
Cost Example: Australian Business Use Case
Customer Support Bot - Monthly Cost Analysis
Processing 5,000 support queries/month
GPT-4o (before optimisation):
- Avg 1,500 tokens/query (input + output)
- Monthly tokens: 7.5M
- Cost: ~$50-75 USD/month
GPT-4o-mini (after optimisation):
- Same 1,500 tokens/query
- Monthly tokens: 7.5M
- Cost: ~$3-5 USD/month
- Savings: 90%+
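The arithmetic behind an estimate like this can be reproduced with a few lines. Prices are copied from the model table earlier in this guide. The split of 500 input and 1,000 output tokens per query is an assumption chosen for illustration; because output tokens cost more, the monthly figure moves considerably with that split. `monthly_cost` is a hypothetical helper.

```python
# USD per 1M tokens, copied from the model table in this guide
PRICES = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}


def monthly_cost(model, queries, input_tokens, output_tokens):
    """Estimate monthly USD cost from per-query token counts."""
    price = PRICES[model]
    input_cost = queries * input_tokens * price["input"] / 1_000_000
    output_cost = queries * output_tokens * price["output"] / 1_000_000
    return input_cost + output_cost


# Assumed split: 500 input + 1,000 output tokens per query (1,500 total)
gpt4o = monthly_cost("gpt-4o", 5000, 500, 1000)
mini = monthly_cost("gpt-4o-mini", 5000, 500, 1000)
savings = 1 - mini / gpt4o
print(f"gpt-4o: ${gpt4o:.2f}  gpt-4o-mini: ${mini:.2f}  savings: {savings:.0%}")
```

Re-running the function with your own measured input and output token counts gives a more reliable budget than a blended per-token average.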
Budget Alerts: Set up usage limits and alerts in the OpenAI dashboard. This prevents surprise bills from bugs or unexpected usage spikes. Especially important during development when testing can burn through credits quickly.
Error Handling & Reliability
Production applications need robust error handling. The OpenAI API can fail for various reasons - your code must handle them gracefully.
Common Error Types
RateLimitError (429)
Too many requests. Implement exponential backoff and retry.
InternalServerError (5xx)
OpenAI server error. Retry with backoff - usually temporary.
BadRequestError (400)
Bad request (context too long, invalid params). Fix the request, don't retry. This was named InvalidRequestError before v1 of the Python SDK.
AuthenticationError (401)
Invalid API key. Check configuration, don't retry.
APITimeoutError
Request took too long. Retry or reduce complexity.
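The retry-or-not decision and the backoff schedule can be sketched independently of any SDK. The set of retryable status codes below is an assumption, the jitter is an addition not described above (it spreads retries out so clients do not all retry at the same instant), and `should_retry` and `backoff_delay` are hypothetical helper names.

```python
import random

# Assumed set of transient HTTP statuses; 400 and 401 are deliberately excluded
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def should_retry(status_code):
    """Retry only errors that are likely to succeed on a later attempt."""
    return status_code in RETRYABLE_STATUS


def backoff_delay(attempt, base=1.0, cap=60.0, rng=random.random):
    """Exponential backoff with jitter: the ceiling doubles per attempt, up to cap."""
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling  # Random point in [0, ceiling)


# Upper bound on the wait, in seconds, for attempts 0 through 7
ceilings = [min(60.0, 1.0 * 2 ** n) for n in range(8)]
print(ceilings)
```

The ceiling grows 1, 2, 4, 8, 16, 32 seconds and then holds at the 60-second cap, so a persistent outage never produces unbounded waits.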
Robust Error Handling Pattern
Python with Tenacity Retry
from openai import APIConnectionError, APIStatusError, OpenAI, RateLimitError
from tenacity import retry, retry_if_exception, stop_after_attempt, wait_exponential

client = OpenAI()

def is_transient(exc):
    """Return True for errors worth retrying: rate limits, network failures, 5xx responses."""
    # APITimeoutError subclasses APIConnectionError, so timeouts are covered too
    if isinstance(exc, (RateLimitError, APIConnectionError)):
        return True
    # Only APIStatusError carries status_code; the APIError base class does not
    return isinstance(exc, APIStatusError) and exc.status_code >= 500

@retry(
    retry=retry_if_exception(is_transient),
    wait=wait_exponential(multiplier=1, min=4, max=60),
    stop=stop_after_attempt(5),
    reraise=True  # Surface the original error instead of tenacity's RetryError
)
def call_openai_with_retry(messages, model="gpt-4o-mini"):
    """Make OpenAI API call with automatic retry on transient errors."""
    try:
        response = client.chat.completions.create(
            model=model,
            messages=messages,
            timeout=30  # Set reasonable timeout
        )
        return response.choices[0].message.content
    except Exception as e:
        # Log every failure; tenacity retries only the transient ones
        action = "retrying" if is_transient(e) else "not retrying"
        print(f"{type(e).__name__}: {e} ({action})")
        raise

# Usage
try:
    result = call_openai_with_retry([
        {"role": "user", "content": "Hello!"}
    ])
    print(result)
except Exception as e:
    print(f"Request failed: {e}")
    # Fallback logic here
Context Length Management
Truncate to Fit Context Window
import tiktoken

def count_tokens(text, model="gpt-4o-mini"):
    """Count tokens in text for a given model."""
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

def truncate_to_fit(messages, max_tokens=100000, model="gpt-4o-mini"):
    """Truncate conversation to fit within token limit.

    Counts message content only. The API adds a few tokens of overhead
    per message, so leave some headroom in max_tokens.
    """
    # Keep system message, truncate from oldest user/assistant
    total_tokens = 0
    result = []
    # Always include system message first
    for msg in messages:
        if msg["role"] == "system":
            result.append(msg)
            total_tokens += count_tokens(msg["content"], model)
            break
    insert_at = len(result)  # 1 after a system message, 0 if there is none
    # Add messages from most recent, stop when limit reached
    for msg in reversed(messages):
        if msg["role"] == "system":
            continue
        msg_tokens = count_tokens(msg["content"], model)
        if total_tokens + msg_tokens > max_tokens:
            break
        result.insert(insert_at, msg)  # Keeps chronological order
        total_tokens += msg_tokens
    return result
Monitoring Tip: Log all API calls with timestamps, token counts, and latency. This helps identify issues quickly and provides data for cost optimisation. Consider tools like Helicone or LangSmith for production monitoring.
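A lightweight version of this logging can be built with the standard library before adopting a dedicated tool. In this sketch `logged_call` and `fake_create` are hypothetical. The wrapper assumes the response exposes `usage.prompt_tokens` and `usage.completion_tokens` as in the cost-tracking example, and timestamps come from the logging formatter.

```python
import logging
import time
from types import SimpleNamespace

logging.basicConfig(format="%(asctime)s %(name)s %(message)s", level=logging.INFO)
logger = logging.getLogger("openai_calls")


def logged_call(call, **kwargs):
    """Invoke call(**kwargs), then log model, token counts, and latency."""
    start = time.perf_counter()
    response = call(**kwargs)
    latency_ms = (time.perf_counter() - start) * 1000
    record = {
        "model": kwargs.get("model"),
        "prompt_tokens": response.usage.prompt_tokens,
        "completion_tokens": response.usage.completion_tokens,
        "latency_ms": round(latency_ms, 1),
    }
    logger.info("openai_call %s", record)
    return response, record


def fake_create(**kwargs):
    """Hypothetical stand-in for client.chat.completions.create."""
    usage = SimpleNamespace(prompt_tokens=12, completion_tokens=30)
    return SimpleNamespace(usage=usage)


response, record = logged_call(fake_create, model="gpt-4o-mini", messages=[])
```

With the real client the call would look like `logged_call(client.chat.completions.create, model="gpt-4o-mini", messages=messages)`.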
Australian Compliance & Data Considerations
Australian businesses must consider data protection when using OpenAI's API.
Privacy Act Considerations
OpenAI API Data Handling Checklist
- ✓ No Training on API Data: OpenAI doesn't train on API inputs by default (unlike ChatGPT free tier)
- ✓ Data Retention: API inputs retained for up to 30 days for abuse monitoring, then deleted
- ✓ Zero Retention Available: Enterprise customers can request zero data retention
- ✓ Cross-Border Transfer: Data processed in US - document for APP 8 compliance
- ✓ PII Handling: Establish clear policies on what personal information can be processed
Best Practices for Australian Businesses
- Data Minimisation: Only send data necessary for the task. Strip identifying information when possible.
- Sensitive Data Policy: Define what data categories are prohibited (health records, financial details, etc.)
- Audit Logging: Log what data is sent to API for compliance reviews
- Privacy Policy Update: Disclose AI processing to customers if relevant
- Vendor Assessment: Document OpenAI's security practices for due diligence requirements
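Data minimisation can start with a simple redaction pass before text leaves your systems. The sketch below masks email addresses and Australian mobile numbers. The patterns are illustrative assumptions and far from exhaustive (they do not cover names, addresses, landlines, or identifiers such as Medicare numbers), and `redact` is a hypothetical helper.

```python
import re

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Australian mobile numbers such as 0412 345 678 or +61 412 345 678
AU_MOBILE = re.compile(r"(?:\+61\s?4|\b04)\d{2}[\s-]?\d{3}[\s-]?\d{3}\b")


def redact(text):
    """Replace emails and Australian mobile numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return AU_MOBILE.sub("[PHONE]", text)


sample = "Customer jane.citizen@example.com called from 0412 345 678 about invoice 1042."
clean = redact(sample)
print(clean)
```

Regex redaction is a first line of defence only. For regulated data, pair it with the sensitive data policy and audit logging described above.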
Azure OpenAI Alternative
For organisations requiring Australian data residency, Azure OpenAI Service offers:
- Australian Data Centres: Deploy in Azure Australia East/Southeast
- Enterprise Security: Private networking, managed identity, VNET integration
- Same Models: GPT-4, embeddings, DALL-E available
- Compliance: Inherits Azure's compliance certifications
Trade-off: Azure OpenAI requires an Azure subscription and an approval process, and new models may become available slightly later than through OpenAI directly. Choose based on whether Australian data residency is a hard requirement.
Conclusion
The OpenAI API is remarkably capable, but getting value from it requires more than basic chat completions. Function calling transforms AI from answering questions to taking actions. Embeddings enable semantic search and RAG applications that make AI knowledgeable about your specific data. Streaming creates responsive user experiences. And thoughtful cost management keeps projects economically viable.
For Australian businesses building custom AI applications, the API offers flexibility that pre-built tools can't match. But that flexibility comes with responsibility - for error handling, cost management, and compliance considerations that packaged solutions handle for you.
Start with GPT-4o-mini for most use cases - it handles the majority of tasks at a fraction of the cost. Implement proper error handling from day one. Monitor your costs closely, especially during development. And when you're ready for production, consider the full picture: rate limits, fallbacks, monitoring, and the data considerations specific to your Australian context.
