Master the OpenAI API for production applications. From GPT-4 to embeddings, learn how Australian businesses build custom AI solutions with practical code examples and cost optimisation strategies.
The OpenAI API powers everything from simple chatbots to sophisticated AI agents processing millions of requests. For Australian businesses building custom AI applications, understanding the API deeply—beyond basic chat completions—unlocks capabilities that pre-built tools simply can't match.
This guide goes beyond "Hello World" to cover production-grade API usage: function calling for tool use, embeddings for semantic search, streaming for real-time UX, cost optimisation strategies, and error handling patterns. Whether you're building internal tools or customer-facing products, you'll learn techniques used by Australian businesses running AI at scale.
Before diving into advanced features, ensure you understand the API's core concepts and available models.
At a glance:
- 128K: GPT-4o context window
- $2.50 USD: GPT-4o cost per 1M input tokens
- ~100: tokens per second (streaming)
| Model | Best For | Input Cost/1M | Output Cost/1M |
|---|---|---|---|
| gpt-4o | Best all-rounder, multimodal | $2.50 USD | $10.00 USD |
| gpt-4o-mini | Cost-effective, most tasks | $0.15 USD | $0.60 USD |
| gpt-4-turbo | Complex reasoning, JSON mode | $10.00 USD | $30.00 USD |
| gpt-3.5-turbo | Simple tasks, high volume | $0.50 USD | $1.50 USD |
| text-embedding-3-small | Embeddings, search | $0.02 USD | N/A |
Cost Tip for Australian Businesses: GPT-4o-mini is often the best value. It handles 90% of tasks at 1/16th the cost of GPT-4o. Start with mini, upgrade to full GPT-4o only when quality requirements demand it.
Getting Started (Python)
from openai import OpenAI
# Initialize client with API key
client = OpenAI(api_key="sk-...") # Or set OPENAI_API_KEY env var
# Basic completion
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello!"}
]
)
print(response.choices[0].message.content)
Getting Started (JavaScript/TypeScript)
import OpenAI from 'openai';
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const response = await openai.chat.completions.create({
model: "gpt-4o-mini",
messages: [
{ role: "system", content: "You are a helpful assistant." },
{ role: "user", content: "Hello!" }
]
});
console.log(response.choices[0].message.content);
Function calling transforms AI from answering questions to executing tasks. Define functions the AI can call, and it will determine when and how to use them based on user requests.
Function Calling Example (Python)
import json
from openai import OpenAI
client = OpenAI()
# Define available functions
tools = [
{
"type": "function",
"function": {
"name": "get_customer_info",
"description": "Retrieve customer information from CRM",
"parameters": {
"type": "object",
"properties": {
"customer_id": {
"type": "string",
"description": "The customer's unique identifier"
},
"fields": {
"type": "array",
"items": {"type": "string"},
"description": "Fields to retrieve: name, email, orders, balance"
}
},
"required": ["customer_id"]
}
}
},
{
"type": "function",
"function": {
"name": "create_support_ticket",
"description": "Create a support ticket in the helpdesk system",
"parameters": {
"type": "object",
"properties": {
"customer_id": {"type": "string"},
"subject": {"type": "string"},
"priority": {"type": "string", "enum": ["low", "medium", "high"]},
"description": {"type": "string"}
},
"required": ["customer_id", "subject", "description"]
}
}
}
]
# Make API call with function definitions
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[
{"role": "system", "content": "You are a customer service assistant. Use the available tools to help customers."},
{"role": "user", "content": "Can you check the order history for customer ABC123 and create a ticket about their delayed shipment?"}
],
tools=tools,
tool_choice="auto" # Let model decide when to use tools
)
# Handle function calls
message = response.choices[0].message
if message.tool_calls:
for tool_call in message.tool_calls:
function_name = tool_call.function.name
arguments = json.loads(tool_call.function.arguments)
# Execute the actual function
if function_name == "get_customer_info":
result = get_customer_info(**arguments) # Your implementation
elif function_name == "create_support_ticket":
result = create_support_ticket(**arguments) # Your implementation
# Continue conversation with function result
# ... (send result back to model)
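To close the loop, you echo the assistant's tool-call message back along with one "tool" message per result, then ask the model for its final reply. A minimal sketch continuing the example above; execute_tool is a hypothetical dispatcher standing in for your own get_customer_info / create_support_ticket implementations, and results are assumed to be JSON-serialisable:
# Sketch of the second round trip: echo the assistant's tool-call message,
# append one "tool" message per result, then ask the model for its final reply.
followup = [
    {"role": "system", "content": "You are a customer service assistant. Use the available tools to help customers."},
    {"role": "user", "content": "Can you check the order history for customer ABC123 and create a ticket about their delayed shipment?"},
    message,  # the assistant message that contains the tool calls
]
for tool_call in message.tool_calls:
    arguments = json.loads(tool_call.function.arguments)
    result = execute_tool(tool_call.function.name, arguments)  # hypothetical dispatcher
    followup.append({
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": json.dumps(result),  # tool results must be serialised to a string
    })
final = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=followup,
    tools=tools,
)
print(final.choices[0].message.content)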
- Natural-language database queries: let AI generate and execute SQL from plain English. "Show me all overdue invoices from Queensland" becomes a structured query.
- API orchestration: the AI decides which APIs to call (CRM lookups, calendar scheduling, email sending) based on user intent.
- Multi-step workflows: complex tasks broken into function calls: validate input → check inventory → create order → send confirmation.
- Structured data extraction: force structured output by defining extraction functions, so the AI extracts and validates in a single call (see the sketch after this list).
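A short sketch of the extraction pattern: define a single extraction function and force the model to call it with tool_choice, so every response is structured. The record_invoice schema below is hypothetical:
import json
from openai import OpenAI

client = OpenAI()

# Hypothetical example: extract structured invoice fields from free text.
extraction_tools = [{
    "type": "function",
    "function": {
        "name": "record_invoice",
        "description": "Record the invoice details found in the text",
        "parameters": {
            "type": "object",
            "properties": {
                "supplier": {"type": "string"},
                "amount_aud": {"type": "number"},
                "due_date": {"type": "string", "description": "ISO 8601 date"},
            },
            "required": ["supplier", "amount_aud"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Invoice from Acme Pty Ltd for $1,250 due 30 June"}],
    tools=extraction_tools,
    # Forcing this specific tool guarantees structured output on every call
    tool_choice={"type": "function", "function": {"name": "record_invoice"}},
)
invoice = json.loads(response.choices[0].message.tool_calls[0].function.arguments)
print(invoice)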
A B2B software company, for instance, built its AI customer-service assistant using this function calling pattern.
Embeddings convert text into numerical vectors that capture semantic meaning. Two similar concepts have vectors that are "close" in vector space, enabling powerful search and retrieval applications.
Creating Embeddings (Python)
from openai import OpenAI
client = OpenAI()
# Single text embedding
response = client.embeddings.create(
model="text-embedding-3-small",
input="How do I automate invoice processing with AI?"
)
embedding = response.data[0].embedding # 1536-dimensional vector
print(f"Embedding dimension: {len(embedding)}")
# Batch embeddings for efficiency
texts = [
"Invoice processing automation",
"Customer support chatbot",
"Sales pipeline analysis"
]
response = client.embeddings.create(
model="text-embedding-3-small",
input=texts
)
embeddings = [item.embedding for item in response.data]
RAG Implementation Pattern
import numpy as np
from openai import OpenAI
client = OpenAI()
# 1. INDEXING: Embed and store your documents
def embed_documents(documents):
"""Embed a list of documents and return vectors."""
response = client.embeddings.create(
model="text-embedding-3-small",
input=documents
)
return [item.embedding for item in response.data]
# 2. RETRIEVAL: Find relevant documents for a query
def find_relevant_docs(query, doc_embeddings, documents, top_k=3):
"""Find most relevant documents using cosine similarity."""
# Embed the query
query_response = client.embeddings.create(
model="text-embedding-3-small",
input=query
)
query_embedding = query_response.data[0].embedding
# Calculate similarities
similarities = []
for i, doc_emb in enumerate(doc_embeddings):
similarity = np.dot(query_embedding, doc_emb) / (
np.linalg.norm(query_embedding) * np.linalg.norm(doc_emb)
)
similarities.append((similarity, documents[i]))
# Return top-k most similar
similarities.sort(reverse=True)
return [doc for _, doc in similarities[:top_k]]
# 3. GENERATION: Answer using retrieved context
def answer_with_context(question, context_docs):
"""Generate answer using retrieved documents as context."""
context = "\n\n".join(context_docs)
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[
{
"role": "system",
"content": f"""Answer questions based on the following context.
If the context doesn't contain relevant information, say so.
Context:
{context}"""
},
{"role": "user", "content": question}
]
)
return response.choices[0].message.content
# Usage
documents = ["...your documents..."]
doc_embeddings = embed_documents(documents)
question = "How do I set up automated invoicing?"
relevant = find_relevant_docs(question, doc_embeddings, documents)
answer = answer_with_context(question, relevant)
print(answer)
| Model | Dimensions | Cost/1M tokens | Use Case |
|---|---|---|---|
| text-embedding-3-small | 1536 | $0.02 USD | Most applications (recommended) |
| text-embedding-3-large | 3072 | $0.13 USD | Maximum accuracy needs |
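If vector storage or similarity-search cost is a concern, the v3 embedding models also accept a dimensions parameter that returns shortened vectors, trading a little accuracy for smaller storage. A small sketch:
from openai import OpenAI

client = OpenAI()

# The v3 embedding models can return shortened vectors via `dimensions`,
# which reduces storage and speeds up similarity search.
response = client.embeddings.create(
    model="text-embedding-3-large",
    input="How do I automate invoice processing with AI?",
    dimensions=256,  # instead of the default 3072
)
print(len(response.data[0].embedding))  # 256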
Production Tip: For production RAG systems, use a vector database (Pinecone, Qdrant, Supabase) instead of in-memory storage. They handle similarity search efficiently at scale and provide features like filtering and metadata storage.
Streaming returns tokens as they're generated rather than waiting for the complete response. Essential for chat interfaces and any UX where perceived speed matters.
Streaming in Python
from openai import OpenAI
client = OpenAI()
# Stream response
stream = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Explain AI in 200 words"}],
stream=True
)
# Print tokens as they arrive
for chunk in stream:
if chunk.choices[0].delta.content is not None:
print(chunk.choices[0].delta.content, end="", flush=True)
print() # Newline at end
Streaming in TypeScript (Next.js)
// app/api/chat/route.ts
import OpenAI from 'openai';
const openai = new OpenAI();
export async function POST(req: Request) {
const { messages } = await req.json();
const stream = await openai.chat.completions.create({
model: 'gpt-4o-mini',
messages,
stream: true,
});
// Return streaming response
const encoder = new TextEncoder();
const readable = new ReadableStream({
async start(controller) {
for await (const chunk of stream) {
const text = chunk.choices[0]?.delta?.content || '';
controller.enqueue(encoder.encode(text));
}
controller.close();
},
});
return new Response(readable, {
headers: { 'Content-Type': 'text/plain; charset=utf-8' },
});
}
When using function calling with streaming, you need to accumulate the function call arguments:
stream = client.chat.completions.create(
model="gpt-4o-mini",
messages=[...],
tools=tools,
stream=True
)
tool_calls = []
current_tool_call = None
for chunk in stream:
delta = chunk.choices[0].delta
# Handle tool calls in stream
if delta.tool_calls:
for tc_chunk in delta.tool_calls:
if tc_chunk.index is not None:
if tc_chunk.id: # New tool call
current_tool_call = {
"id": tc_chunk.id,
"type": "function",
"function": {"name": "", "arguments": ""}
}
tool_calls.append(current_tool_call)
if tc_chunk.function:
if tc_chunk.function.name:
current_tool_call["function"]["name"] = tc_chunk.function.name
if tc_chunk.function.arguments:
current_tool_call["function"]["arguments"] += tc_chunk.function.arguments
# Handle regular content
elif delta.content:
print(delta.content, end="", flush=True)
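Once the stream finishes, each accumulated entry in tool_calls holds a complete JSON argument string. A minimal sketch of dispatching them, continuing the loop above (the handler functions are the same hypothetical implementations as earlier):
import json

# After the stream completes, parse and dispatch the accumulated tool calls.
for call in tool_calls:
    name = call["function"]["name"]
    arguments = json.loads(call["function"]["arguments"])
    if name == "get_customer_info":
        result = get_customer_info(**arguments)  # your implementation
    elif name == "create_support_ticket":
        result = create_support_ticket(**arguments)  # your implementation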
API costs can escalate quickly at scale. These strategies help Australian businesses manage costs while maintaining quality.
- Choose the right model: use GPT-4o-mini (roughly 1/16th the cost) for most tasks and reserve GPT-4o for complex reasoning. Run tests to confirm quality meets your needs at the lower tier.
- Trim your prompts: shorter prompts mean lower cost. Remove redundant instructions and keep system prompts concise; every token costs money.
- Cache responses: cache answers for identical or similar queries, and use embeddings to find cached answers for semantically similar questions (see the sketch after this list).
- Limit output: set max_tokens to prevent runaway responses and request concise answers in prompts. Output tokens cost more than input tokens.
- Batch requests: process multiple items in a single API call where possible. This reduces overhead and can improve throughput.
- Pre-filter with cheaper methods: filter or classify with regex or simple rules first, and only send to GPT-4 when necessary.
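A sketch of the embedding-based caching idea. The similarity threshold and the in-memory list are assumptions for illustration, not tuned values; production systems would use a persistent store:
import numpy as np
from openai import OpenAI

client = OpenAI()
cache = []  # list of (embedding, answer) pairs; use a real store in production

def _embed(text):
    response = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(response.data[0].embedding)

def cached_answer(question, threshold=0.92):
    """Return a cached answer for a semantically similar question, else None."""
    query = _embed(question)
    for emb, answer in cache:
        similarity = float(np.dot(query, emb) / (np.linalg.norm(query) * np.linalg.norm(emb)))
        if similarity >= threshold:
            return answer
    return None

def remember(question, answer):
    """Store a question/answer pair for future lookups."""
    cache.append((_embed(question), answer))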
Track Usage Per Request
# Track tokens and estimate costs
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[...],
)
usage = response.usage
print(f"Input tokens: {usage.prompt_tokens}")
print(f"Output tokens: {usage.completion_tokens}")
print(f"Total tokens: {usage.total_tokens}")
# Estimate cost (GPT-4o-mini pricing)
input_cost = usage.prompt_tokens * 0.00000015 # $0.15/1M
output_cost = usage.completion_tokens * 0.0000006 # $0.60/1M
total_cost = input_cost + output_cost
print(f"Estimated cost: ${total_cost:.6f} USD")
Worked example: processing 5,000 support queries per month with GPT-4o (before optimisation) versus GPT-4o-mini (after optimisation). The sketch below estimates the monthly difference under assumed per-query token counts.
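The exact figures depend on your average tokens per query; as an illustration only, assuming roughly 1,000 input and 300 output tokens per query:
# Illustrative only: assumes ~1,000 input and ~300 output tokens per query.
queries_per_month = 5_000
input_tokens = queries_per_month * 1_000
output_tokens = queries_per_month * 300

# GPT-4o: $2.50 / 1M input, $10.00 / 1M output
gpt4o = input_tokens / 1e6 * 2.50 + output_tokens / 1e6 * 10.00
# GPT-4o-mini: $0.15 / 1M input, $0.60 / 1M output
gpt4o_mini = input_tokens / 1e6 * 0.15 + output_tokens / 1e6 * 0.60

print(f"GPT-4o:      ~${gpt4o:.2f} USD/month")       # ~$27.50
print(f"GPT-4o-mini: ~${gpt4o_mini:.2f} USD/month")  # ~$1.65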
Budget Alerts: Set up usage limits and alerts in the OpenAI dashboard. This prevents surprise bills from bugs or unexpected usage spikes. Especially important during development when testing can burn through credits quickly.
Production applications need robust error handling. The OpenAI API can fail for various reasons—your code must handle them gracefully.
- 429 Rate limit: too many requests. Implement exponential backoff and retry.
- 500 Server error: OpenAI server error. Retry with backoff; these are usually temporary.
- 400 Bad request: context too long or invalid parameters. Fix the request, don't retry.
- 401 Authentication error: invalid API key. Check configuration, don't retry.
- Timeout: the request took too long. Retry, or reduce request complexity.
Python with Tenacity Retry
from openai import OpenAI, RateLimitError, APIError
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type
client = OpenAI()
@retry(
retry=retry_if_exception_type((RateLimitError, APIError)),
wait=wait_exponential(multiplier=1, min=4, max=60),
stop=stop_after_attempt(5)
)
def call_openai_with_retry(messages, model="gpt-4o-mini"):
"""Make OpenAI API call with automatic retry on transient errors."""
try:
response = client.chat.completions.create(
model=model,
messages=messages,
timeout=30 # Set reasonable timeout
)
return response.choices[0].message.content
except RateLimitError:
# Log rate limit hit, will retry automatically
print("Rate limited, retrying...")
raise
    except APIError as e:
        # Not every APIError carries a status code (e.g. connection errors)
        status = getattr(e, "status_code", None)
        if status is None or status >= 500:
            # Server or connection error, retry
            print(f"Server error ({status}), retrying...")
            raise
        else:
            # Client error (4xx), don't retry
            raise ValueError(f"API error: {e}")
except Exception as e:
# Unexpected error
print(f"Unexpected error: {e}")
raise
# Usage
try:
result = call_openai_with_retry([
{"role": "user", "content": "Hello!"}
])
print(result)
except Exception as e:
print(f"Failed after retries: {e}")
# Fallback logic here
Truncate to Fit Context Window
import tiktoken
def count_tokens(text, model="gpt-4o-mini"):
    """Count tokens in text for a given model."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        # Older tiktoken releases may not recognise newer model names
        encoding = tiktoken.get_encoding("o200k_base")  # encoding used by the GPT-4o family
    return len(encoding.encode(text))

def truncate_to_fit(messages, max_tokens=100000, model="gpt-4o-mini"):
    """Truncate conversation to fit within token limit."""
# Keep system message, truncate from oldest user/assistant
total_tokens = 0
result = []
# Always include system message first
for msg in messages:
if msg["role"] == "system":
result.append(msg)
total_tokens += count_tokens(msg["content"], model)
break
# Add messages from most recent, stop when limit reached
for msg in reversed(messages):
if msg["role"] == "system":
continue
msg_tokens = count_tokens(msg["content"], model)
if total_tokens + msg_tokens > max_tokens:
break
result.insert(1, msg) # Insert after system message
total_tokens += msg_tokens
return result
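A short usage sketch, assuming the client from earlier and a hypothetical conversation_history list of chat messages:
# Usage sketch: trim a long conversation before sending it to the API.
trimmed = truncate_to_fit(conversation_history, max_tokens=100000)
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=trimmed,
)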
Monitoring Tip: Log all API calls with timestamps, token counts, and latency. This helps identify issues quickly and provides data for cost optimisation. Consider tools like Helicone or LangSmith for production monitoring.
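A minimal sketch of that kind of logging, wrapping the call yourself rather than using a dedicated monitoring tool; the logged field names are just an example:
import json
import logging
import time
from openai import OpenAI

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("openai_calls")
client = OpenAI()

def logged_completion(messages, model="gpt-4o-mini", **kwargs):
    """Call the API and log token counts and latency for later analysis."""
    start = time.monotonic()
    response = client.chat.completions.create(model=model, messages=messages, **kwargs)
    latency_ms = (time.monotonic() - start) * 1000
    logger.info(json.dumps({
        "model": model,
        "prompt_tokens": response.usage.prompt_tokens,
        "completion_tokens": response.usage.completion_tokens,
        "latency_ms": round(latency_ms, 1),
    }))
    return response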
Australian businesses must consider their data protection obligations, including the Privacy Act and the Australian Privacy Principles, when sending customer data to OpenAI's API.
For organisations requiring Australian data residency, Azure OpenAI Service offers the same model family hosted in Australian Azure regions (such as Australia East), so prompts and data at rest can be processed and stored onshore, alongside Azure's enterprise security and compliance controls.
Trade-off: Azure OpenAI requires an Azure subscription and an approval process, and models may lag slightly behind OpenAI direct. Choose based on whether Australian data residency is a hard requirement.
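If you do go the Azure route, the same Python SDK can target an Azure OpenAI deployment. A sketch; the endpoint, API version, and deployment name are placeholders for your own resource:
import os
from openai import AzureOpenAI

# Endpoint, API version, and deployment name below are placeholders.
client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",
    azure_endpoint="https://your-resource.openai.azure.com",
)

response = client.chat.completions.create(
    model="your-gpt-4o-mini-deployment",  # the deployment name, not the model name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)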
The OpenAI API is remarkably capable, but getting value from it requires more than basic chat completions. Function calling transforms AI from answering questions to taking actions. Embeddings enable semantic search and RAG applications that make AI knowledgeable about your specific data. Streaming creates responsive user experiences. And thoughtful cost management keeps projects economically viable.
For Australian businesses building custom AI applications, the API offers flexibility that pre-built tools can't match. But that flexibility comes with responsibility—for error handling, cost management, and compliance considerations that packaged solutions handle for you.
Start with GPT-4o-mini for most use cases—it handles the majority of tasks at a fraction of the cost. Implement proper error handling from day one. Monitor your costs closely, especially during development. And when you're ready for production, consider the full picture: rate limits, fallbacks, monitoring, and the data considerations specific to your Australian context.