Master the Google Gemini API for production AI applications. Multi-modal capabilities, long context windows, and Google Cloud integration. Complete guide for Australian developers building with Google's flagship AI models.
Google's Gemini represents a significant leap in AI capabilities, offering natively multi-modal understanding, massive context windows, and deep integration with Google's cloud infrastructure. For Australian businesses already using Google Workspace or Google Cloud, Gemini provides a natural path to adding sophisticated AI capabilities.
This guide covers Gemini API implementation from basic prompting to advanced multi-modal applications, with practical examples and Australian business context. Whether you're building document analysis tools, conversational AI, or complex reasoning systems, understanding Gemini's unique capabilities will help you choose and implement the right AI solution.
Gemini is Google's most capable AI model family, designed from the ground up for multi-modal understanding. Unlike models that bolt on image capabilities, Gemini natively processes text, images, audio, and video in a unified architecture.
| Model | Context Window | Best For | Pricing (approx USD) |
|---|---|---|---|
| Gemini 1.5 Flash | 1M tokens | High-volume, cost-sensitive tasks | $0.075/1M input, $0.30/1M output |
| Gemini 1.5 Pro | 2M tokens | Complex reasoning, long documents | $1.25/1M input, $5.00/1M output |
| Gemini 1.0 Pro | 32K tokens | General purpose, legacy support | $0.50/1M input, $1.50/1M output |
| Gemini 1.0 Ultra | 32K tokens | Most complex tasks | Contact Google |
Pricing shown is approximate USD. When using Vertex AI in the Sydney region, you pay in AUD with standard Google Cloud billing. Monitor costs carefully during development as multi-modal processing can consume tokens quickly.
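As a rough sanity check, the table's approximate rates can be turned into a per-request cost estimate. This is a sketch only: the model names and rates below are taken from the table above, and actual billing depends on Google's current pricing and your AUD exchange rate.

```python
# Approximate per-1M-token USD rates from the pricing table above
PRICING = {
    "gemini-1.5-flash": {"input": 0.075, "output": 0.30},
    "gemini-1.5-pro": {"input": 1.25, "output": 5.00},
}

def estimate_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Rough USD cost for one request at the approximate rates above."""
    rates = PRICING[model]
    cost = (input_tokens / 1_000_000) * rates["input"] \
         + (output_tokens / 1_000_000) * rates["output"]
    return round(cost, 6)

# e.g. summarising a 100K-token document into 2K tokens on Flash
print(estimate_cost_usd("gemini-1.5-flash", 100_000, 2_000))  # → 0.0081
```

Even a crude estimator like this makes it obvious why Flash is the default for high-volume pipelines: the same request on 1.5 Pro costs roughly twenty times more.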
Google offers two primary ways to access Gemini: Google AI Studio for development/prototyping and Vertex AI for production enterprise workloads.
# Install the SDK
# pip install google-generativeai
import google.generativeai as genai
# Configure with your API key
genai.configure(api_key='YOUR_API_KEY')
# Create a model instance
model = genai.GenerativeModel('gemini-1.5-pro')
# Generate content
response = model.generate_content(
    "Explain the GST implications for Australian e-commerce businesses selling internationally."
)
print(response.text)
# Install Vertex AI SDK
# pip install google-cloud-aiplatform
import vertexai
from vertexai.generative_models import GenerativeModel
# Initialize Vertex AI with Sydney region
vertexai.init(
    project="your-project-id",
    location="australia-southeast1"  # Sydney region
)
# Create model instance
model = GenerativeModel("gemini-1.5-pro")
# Generate content
response = model.generate_content(
    "Analyse the compliance requirements for Australian financial services automation.",
    generation_config={
        "temperature": 0.2,
        "max_output_tokens": 2048,
    }
)
print(response.text)
Gemini's native multi-modal architecture enables sophisticated understanding of text, images, audio, and video within a single prompt. This opens powerful use cases for document processing, content analysis, and more.
import google.generativeai as genai
from PIL import Image
# Configure API
genai.configure(api_key='YOUR_API_KEY')
model = genai.GenerativeModel('gemini-1.5-pro')
# Load an image (invoice, receipt, document)
image = Image.open('australian_invoice.png')
# Analyse with context
response = model.generate_content([
    "Extract the following from this Australian tax invoice: " +
    "1. Supplier name and ABN " +
    "2. Invoice number and date " +
    "3. Line items with GST breakdown " +
    "4. Total amount including GST. " +
    "Format as JSON.",
    image
])
print(response.text)
# Upload video file
video_file = genai.upload_file(path="meeting_recording.mp4")
# Wait for processing
import time
while video_file.state.name == "PROCESSING":
    time.sleep(10)
    video_file = genai.get_file(video_file.name)
# Analyse video content
model = genai.GenerativeModel('gemini-1.5-pro')
response = model.generate_content([
    "Analyse this meeting recording and provide: " +
    "1. Meeting summary (200 words) " +
    "2. Key decisions made " +
    "3. Action items with owners " +
    "4. Topics requiring follow-up",
    video_file
])
print(response.text)
Gemini's 1M-2M token context windows enable applications previously impossible with smaller context models. This is transformative for document-heavy Australian industries like legal, accounting, and government.
Context window capabilities:
- GPT-4 (128K tokens): roughly 300 pages of text; good for single long documents
- Claude (200K tokens): roughly 500 pages of text; extended document analysis
- Gemini 1.5 Pro (2M tokens): roughly 5,000 pages of text; entire codebases, multi-document analysis, and long video transcripts
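Using the rough rule of thumb implied by those figures (about 400 tokens per page), you can sanity-check whether a document set plausibly fits a given window before sending it. The numbers below are illustrative only; real token counts vary with content, and the SDK's token-counting endpoint gives exact figures.

```python
# Approximate context windows from the comparison above
CONTEXT_WINDOWS = {
    "gpt-4": 128_000,
    "claude": 200_000,
    "gemini-1.5-pro": 2_000_000,
}

TOKENS_PER_PAGE = 400  # rough rule of thumb; real counts vary with content

def fits_in_context(model: str, pages: int, reserve_for_output: int = 4_000) -> bool:
    """Check whether `pages` of text plausibly fits the model's window,
    leaving headroom for the response."""
    estimated = pages * TOKENS_PER_PAGE
    return estimated + reserve_for_output <= CONTEXT_WINDOWS[model]

print(fits_in_context("gpt-4", 500))             # → False (≈200K tokens)
print(fits_in_context("gemini-1.5-pro", 4_500))  # → True (well within 2M)
```

A pre-flight check like this is cheap insurance against silently truncated prompts when batching multi-document jobs.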
import google.generativeai as genai
genai.configure(api_key='YOUR_API_KEY')
model = genai.GenerativeModel('gemini-1.5-pro')
# Load multiple documents
documents = []
for doc_path in ['contract_v1.pdf', 'contract_v2.pdf', 'amendments.pdf']:
    with open(doc_path, 'rb') as f:
        # Supply a MIME type so the SDK interprets each PDF correctly
        documents.append({"mime_type": "application/pdf", "data": f.read()})
# Comprehensive analysis
response = model.generate_content([
    """Analyse these three contract documents:
1. Summarise the key terms of each document
2. Identify all changes between v1 and v2
3. List amendments and their impact
4. Flag any conflicting terms
5. Identify any clauses that may not comply with Australian Consumer Law
6. Provide risk assessment for each major clause
Format your response with clear headings.""",
    *documents
])
print(response.text)
# Upload an entire codebase for analysis
import os
def gather_codebase(directory, extensions=['.py', '.ts', '.js']):
    code_content = []
    for root, dirs, files in os.walk(directory):
        for file in files:
            if any(file.endswith(ext) for ext in extensions):
                filepath = os.path.join(root, file)
                with open(filepath, 'r') as f:
                    code_content.append(f"// File: {filepath}\n{f.read()}")
    return "\n\n".join(code_content)
codebase = gather_codebase('./src')
response = model.generate_content([
    f"""Analyse this codebase and provide:
1. Architecture overview
2. Key design patterns used
3. Potential security vulnerabilities
4. Performance optimisation opportunities
5. Test coverage gaps

Codebase:
{codebase}"""
])
print(response.text)
Gemini supports function calling (tool use), enabling the model to interact with external systems, databases, and APIs in a structured way.
import google.generativeai as genai
# Define tools the model can use
tools = [
    {
        "function_declarations": [
            {
                "name": "search_australian_business",
                "description": "Search for Australian business information by ABN or name",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "query": {
                            "type": "string",
                            "description": "Business name or ABN to search"
                        },
                        "state": {
                            "type": "string",
                            "enum": ["NSW", "VIC", "QLD", "WA", "SA", "TAS", "NT", "ACT"],
                            "description": "Australian state to filter by"
                        }
                    },
                    "required": ["query"]
                }
            },
            {
                "name": "calculate_gst",
                "description": "Calculate GST for Australian transactions",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "amount": {
                            "type": "number",
                            "description": "The dollar amount"
                        },
                        "inclusive": {
                            "type": "boolean",
                            "description": "Whether amount includes GST"
                        }
                    },
                    "required": ["amount"]
                }
            }
        ]
    }
]
model = genai.GenerativeModel(
    'gemini-1.5-pro',
    tools=tools
)
def handle_function_call(function_call):
    """Handle function calls from Gemini"""
    name = function_call.name
    args = function_call.args
    if name == "search_australian_business":
        # Call the ABR (Australian Business Register) lookup, implemented elsewhere
        return search_abr(args.get("query"), args.get("state"))
    elif name == "calculate_gst":
        amount = args["amount"]
        inclusive = args.get("inclusive", True)
        if inclusive:
            gst = amount / 11  # GST component of a GST-inclusive amount
            net = amount - gst
        else:
            gst = amount * 0.1  # 10% GST on top of a GST-exclusive amount
            net = amount
        return {"gst": gst, "net": net, "total": net + gst}
# Chat with function calling
chat = model.start_chat()
response = chat.send_message(
    "Look up the business details for Clever Ops in Victoria and calculate GST on a $1,100 invoice"
)
# Check for function calls
for part in response.parts:
    if hasattr(part, 'function_call') and part.function_call:
        result = handle_function_call(part.function_call)
        # Send the function result back so the model can finish its answer
        response = chat.send_message(
            genai.protos.Content(
                parts=[genai.protos.Part(
                    function_response=genai.protos.FunctionResponse(
                        name=part.function_call.name,
                        response={"result": result}
                    )
                )]
            )
        )
For production Australian workloads, Vertex AI provides enterprise features, Sydney region deployment, and robust MLOps capabilities.
import vertexai
from vertexai.generative_models import GenerativeModel, GenerationConfig
# Initialize with Sydney region
vertexai.init(
    project="your-gcp-project",
    location="australia-southeast1"
)
# Configure generation parameters
generation_config = GenerationConfig(
    temperature=0.2,
    top_p=0.8,
    top_k=40,
    max_output_tokens=2048,
    candidate_count=1,
)
# Safety settings for enterprise use
safety_settings = {
    "HARM_CATEGORY_HARASSMENT": "BLOCK_MEDIUM_AND_ABOVE",
    "HARM_CATEGORY_HATE_SPEECH": "BLOCK_MEDIUM_AND_ABOVE",
    "HARM_CATEGORY_SEXUALLY_EXPLICIT": "BLOCK_MEDIUM_AND_ABOVE",
    "HARM_CATEGORY_DANGEROUS_CONTENT": "BLOCK_MEDIUM_AND_ABOVE",
}
model = GenerativeModel(
    "gemini-1.5-pro",
    generation_config=generation_config,
    safety_settings=safety_settings
)
from google.api_core import retry
from google.api_core.exceptions import ResourceExhausted, ServiceUnavailable
import logging
logger = logging.getLogger(__name__)
# Configure retry policy
retry_policy = retry.Retry(
    initial=1.0,
    maximum=60.0,
    multiplier=2.0,
    predicate=retry.if_exception_type(
        ResourceExhausted,
        ServiceUnavailable,
    ),
    deadline=300.0
)
@retry_policy
def generate_with_retry(model, prompt):
    """Generate content; the retry policy re-invokes this on transient errors"""
    try:
        response = model.generate_content(prompt)
        return response.text
    except ResourceExhausted as e:
        logger.warning(f"Rate limited, backing off before retry: {e}")
        raise
    except Exception as e:
        logger.error(f"Generation failed: {e}")
        raise
from google.cloud.billing import budgets_v1

def track_gemini_costs(project_id, billing_account_id):
    """Create a budget alert covering Gemini API usage"""
    budget_client = budgets_v1.BudgetServiceClient()
    budget = {
        "display_name": "Gemini API Budget",
        "amount": {
            "specified_amount": {
                "currency_code": "AUD",
                "units": 1000  # $1,000 AUD budget
            }
        },
        "threshold_rules": [
            {"threshold_percent": 0.5, "spend_basis": "CURRENT_SPEND"},
            {"threshold_percent": 0.8, "spend_basis": "CURRENT_SPEND"},
            {"threshold_percent": 1.0, "spend_basis": "CURRENT_SPEND"},
        ],
        "all_updates_rule": {
            "pubsub_topic": f"projects/{project_id}/topics/budget-alerts",
            "schema_version": "1.0"
        }
    }
    # Covers the whole billing account unless you add a budget_filter
    # scoped to the Vertex AI service
    return budget_client.create_budget(
        parent=f"billingAccounts/{billing_account_id}",
        budget=budget
    )
Understanding Gemini's strengths relative to alternatives helps Australian businesses make informed API choices.
| Capability | Gemini | GPT-4 | Claude |
|---|---|---|---|
| Context Window | 2M tokens | 128K tokens | 200K tokens |
| Multi-Modal | Native (text, image, audio, video) | Text, images, audio | Text, images |
| Australian Region | Sydney via Vertex AI | Via Azure OpenAI (Australia East) | Via AWS Bedrock (Sydney) |
| Google Integration | Native | Via APIs | Via APIs |
| Reasoning Quality | Strong | Excellent | Excellent |
| Cost Efficiency | Very competitive (Flash) | Moderate | Moderate |
Google Gemini brings unique capabilities to the AI API landscape, particularly its massive context windows, native multi-modal architecture, and Sydney region availability via Vertex AI. For Australian businesses already in the Google ecosystem, it is a compelling choice with strong data residency support.
The choice between Gemini, GPT-4, and Claude depends on your specific requirements. Gemini excels for long document analysis, multi-modal processing, and Google Cloud integration. Its Flash variant offers exceptional value for high-volume applications where cost matters.
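One way to encode that Flash-versus-Pro decision is a small routing helper. The thresholds below are illustrative assumptions, not Google guidance; tune them to your own workload and budget.

```python
def choose_gemini_model(input_tokens: int, complex_reasoning: bool = False) -> str:
    """Pick a Gemini 1.5 variant from rough workload characteristics.
    Thresholds are illustrative, not Google guidance."""
    if input_tokens > 1_000_000:
        return "gemini-1.5-pro"  # only Pro offers the 2M-token window
    if complex_reasoning:
        return "gemini-1.5-pro"  # stronger reasoning at a higher rate
    return "gemini-1.5-flash"    # best value for high-volume tasks

print(choose_gemini_model(5_000))      # → gemini-1.5-flash
print(choose_gemini_model(1_500_000))  # → gemini-1.5-pro
```

Centralising the choice in one function makes it easy to re-tune model routing as pricing and model capabilities change.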
Start with Google AI Studio for development, then move to Vertex AI for production workloads requiring enterprise security and Australian data residency. With proper implementation patterns, Gemini enables sophisticated AI applications that meet both capability and compliance requirements.