Complete guide to setting up Pinecone for vector search and AI applications. Learn indexing strategies, query optimisation, and production deployment for Australian enterprises.
Pinecone has emerged as the leading managed vector database for AI applications, providing the infrastructure needed to build semantic search, recommendation systems, and retrieval-augmented generation (RAG) at scale. For Australian businesses implementing AI, understanding Pinecone is essential for building applications that understand context and meaning.
This comprehensive guide walks through Pinecone setup from initial configuration to production deployment, with practical examples tailored for Australian business requirements. Whether you're building your first RAG system or scaling existing AI applications, mastering Pinecone enables the semantic search capabilities that modern AI demands.
Vector databases fundamentally change how we search and retrieve information. Unlike traditional databases that match keywords, vector databases find semantically similar content—understanding that "automobile" and "car" are related even without explicit keywords.
The basic pipeline: text, images, or other data → convert to vectors → store in Pinecone → query to find similar vectors.
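To make this concrete, here's a minimal sketch (assuming an OpenAI API key and the text-embedding-3-small model used later in this guide) that compares cosine similarity between a few embeddings — related terms score noticeably closer than unrelated ones:

```python
from openai import OpenAI

client = OpenAI(api_key="your-openai-key")

def embed(text: str) -> list[float]:
    """Embed a short piece of text."""
    response = client.embeddings.create(input=text, model="text-embedding-3-small")
    return response.data[0].embedding

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return dot / (norm_a * norm_b)

car, automobile, banana = embed("car"), embed("automobile"), embed("banana")
print(cosine_similarity(car, automobile))  # noticeably higher...
print(cosine_similarity(car, banana))      # ...than this
```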
| Feature | Pinecone | Self-Hosted Alternatives |
|---|---|---|
| Setup Complexity | Minutes (managed) | Days to weeks |
| Scaling | Automatic | Manual configuration |
| Maintenance | Zero (fully managed) | Ongoing ops burden |
| Performance | Optimised clusters | Depends on setup |
| Reliability | 99.9% SLA | Self-managed |
Setting up Pinecone involves creating an account, configuring your first index, and understanding the key concepts that govern performance and cost.
# Install Pinecone client
pip install pinecone-client
# For latest features
pip install "pinecone-client[grpc]"
from pinecone import Pinecone
# Initialize client
pc = Pinecone(api_key="your-api-key")
# List existing indexes
print(pc.list_indexes())
- **Index** — a collection of vectors with the same dimensionality, similar to a database table; each project can have multiple indexes.
- **Namespace** — a logical partition within an index; use namespaces for multi-tenancy or data separation without additional indexes.
- **Vector** — a numerical representation (embedding) of your data, consisting of an ID, a values array, and optional metadata.
- **Metadata** — key-value pairs attached to vectors for filtering; essential for hybrid search that combines semantic and attribute filtering.
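Putting the concepts together, a single record pairs an ID with its embedding values and optional metadata, and is upserted into a namespace. A hypothetical sketch (assuming `index` is a handle to the 1536-dimension index created in the next section):

```python
# Hypothetical record: ID + values + metadata (values are placeholders here;
# in practice they come from your embedding model)
record = {
    "id": "doc-001",
    "values": [0.01] * 1536,
    "metadata": {"title": "GST basics", "category": "tax"},
}

# Namespaces keep one tenant's or project's data separate from another's
index.upsert(vectors=[record], namespace="tenant_acme_corp")
```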
from pinecone import Pinecone, ServerlessSpec
pc = Pinecone(api_key="your-api-key")
# Create a serverless index (recommended for most use cases)
pc.create_index(
name="australian-business-docs",
dimension=1536, # OpenAI text-embedding-3-small dimension
metric="cosine",
spec=ServerlessSpec(
cloud="aws",
region="us-east-1" # Choose closest available region
)
)
# Wait for index to be ready
import time
while not pc.describe_index("australian-business-docs").status['ready']:
time.sleep(1)
# Connect to the index
index = pc.Index("australian-business-docs")
Pinecone's serverless offering currently operates from US and EU regions. For Australian businesses, us-west-2 typically provides the lowest latency. If data sovereignty is critical, consider Pinecone's dedicated deployment options or consult with their team about upcoming APAC regions.
Before storing data in Pinecone, you need to convert it to vector embeddings. The quality of your embeddings directly impacts search relevance.
| Model | Dimensions | Performance | Cost |
|---|---|---|---|
| text-embedding-3-small | 1536 | Good | $0.02/1M tokens |
| text-embedding-3-large | 3072 | Excellent | $0.13/1M tokens |
| Cohere embed-english-v3 | 1024 | Very Good | $0.10/1M tokens |
| Voyage AI voyage-2 | 1024 | Very Good | $0.10/1M tokens |
from openai import OpenAI
client = OpenAI(api_key="your-openai-key")
def get_embedding(text: str, model: str = "text-embedding-3-small") -> list[float]:
"""Generate embedding for text."""
response = client.embeddings.create(
input=text,
model=model
)
return response.data[0].embedding
# Example: Embed a business document
document = """
Clever Ops provides AI automation solutions for Australian businesses.
We specialise in workflow automation, custom AI development, and
enterprise integrations across Sydney, Melbourne, and Brisbane.
"""
embedding = get_embedding(document)
print(f"Embedding dimension: {len(embedding)}") # 1536
from pinecone import Pinecone
from openai import OpenAI
import uuid
pc = Pinecone(api_key="pinecone-api-key")
openai_client = OpenAI(api_key="openai-api-key")
index = pc.Index("australian-business-docs")
def prepare_documents(documents: list[dict]) -> list[dict]:
"""Prepare documents for Pinecone ingestion."""
vectors = []
# Batch embeddings for efficiency
texts = [doc["content"] for doc in documents]
# OpenAI supports up to 2048 inputs per batch
batch_size = 100
all_embeddings = []
for i in range(0, len(texts), batch_size):
batch = texts[i:i + batch_size]
response = openai_client.embeddings.create(
input=batch,
model="text-embedding-3-small"
)
all_embeddings.extend([e.embedding for e in response.data])
# Prepare vectors with metadata
for doc, embedding in zip(documents, all_embeddings):
vectors.append({
"id": doc.get("id", str(uuid.uuid4())),
"values": embedding,
"metadata": {
"title": doc.get("title", ""),
"category": doc.get("category", ""),
"source": doc.get("source", ""),
"date": doc.get("date", ""),
"text": doc["content"][:1000] # Store truncated text
}
})
return vectors
def upsert_documents(documents: list[dict], namespace: str = ""):
"""Upload documents to Pinecone."""
vectors = prepare_documents(documents)
# Upsert in batches of 100
batch_size = 100
for i in range(0, len(vectors), batch_size):
batch = vectors[i:i + batch_size]
index.upsert(vectors=batch, namespace=namespace)
print(f"Upserted {len(vectors)} vectors to namespace '{namespace}'")
Long documents need to be split into chunks before embedding:
from langchain.text_splitter import RecursiveCharacterTextSplitter
def chunk_document(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
"""Split document into overlapping chunks."""
splitter = RecursiveCharacterTextSplitter(
chunk_size=chunk_size,
chunk_overlap=overlap,
separators=["
", "
", ". ", " ", ""]
)
return splitter.split_text(text)
# Optimal chunk sizes
# - FAQ/Support: 200-500 characters (precise answers)
# - Documentation: 500-1000 characters (balanced)
# - Long-form content: 1000-2000 characters (more context)

Effective querying is crucial for building responsive AI applications. Pinecone offers multiple query strategies to optimise for different use cases.
# Query with a text embedding
query_text = "How do I implement GST calculations for my business?"
query_embedding = get_embedding(query_text)
results = index.query(
vector=query_embedding,
top_k=5,
include_metadata=True
)
for match in results.matches:
print(f"Score: {match.score:.4f}")
print(f"Title: {match.metadata.get('title')}")
print(f"Text: {match.metadata.get('text')[:200]}...")
print("---")
Combine semantic search with metadata filtering:
# Filter by category and date
results = index.query(
vector=query_embedding,
top_k=10,
include_metadata=True,
filter={
"category": {"$eq": "compliance"},
"date": {"$gte": "2024-01-01"}
}
)
# Complex filters with AND/OR
results = index.query(
vector=query_embedding,
top_k=10,
filter={
"$and": [
{"category": {"$in": ["compliance", "legal"]}},
{"$or": [
{"region": {"$eq": "NSW"}},
{"region": {"$eq": "VIC"}}
]}
]
}
)
# Query specific namespace (e.g., per-tenant data)
results = index.query(
vector=query_embedding,
top_k=5,
namespace="tenant_acme_corp"
)
# A single query cannot span multiple namespaces;
# run one query per namespace and merge the results (see the sketch below)
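One workable pattern, sketched here, is to query each relevant namespace separately and merge the matches by score (scores are directly comparable when the index uses the cosine metric):

```python
def query_namespaces(index, embedding, namespaces: list[str], top_k: int = 5):
    """Query several namespaces and merge results by similarity score."""
    all_matches = []
    for ns in namespaces:
        results = index.query(
            vector=embedding,
            top_k=top_k,
            include_metadata=True,
            namespace=ns,
        )
        all_matches.extend(results.matches)
    # Highest-scoring matches first, regardless of which namespace they came from
    all_matches.sort(key=lambda m: m.score, reverse=True)
    return all_matches[:top_k]

top_matches = query_namespaces(index, query_embedding, ["general", "compliance"])
```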
- Only request as many results as you need; top_k=5 is faster than top_k=100.
- Narrow the search space with metadata filters before semantic matching.
- Partition data logically into namespaces to reduce search scope.
- Use async queries for multiple simultaneous searches.
import asyncio
from pinecone import Pinecone
async def batch_query(queries: list[str], index) -> list[dict]:
"""Execute multiple queries concurrently."""
async def single_query(query_text: str):
# The SDK calls are blocking, so run them in worker threads
# to let the queries actually overlap
embedding = await asyncio.to_thread(get_embedding, query_text)
return await asyncio.to_thread(
index.query,
vector=embedding,
top_k=5,
include_metadata=True
)
tasks = [single_query(q) for q in queries]
results = await asyncio.gather(*tasks)
return results
# Run batch queries
queries = [
"Australian tax compliance",
"Privacy Act requirements",
"APRA regulations"
]
results = asyncio.run(batch_query(queries, index))

Let's build a production-ready RAG system using Pinecone that can answer questions about Australian business regulations.
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│     User     │────▶│    Query     │────▶│   Pinecone   │
│    Query     │     │  Embedding   │     │    Search    │
└──────────────┘     └──────────────┘     └──────────────┘
                                                  │
                                                  ▼
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│   Response   │◀────│    GPT-4     │◀────│   Context    │
│   to User    │     │  Generation  │     │   Assembly   │
└──────────────┘     └──────────────┘     └──────────────┘
from pinecone import Pinecone
from openai import OpenAI
from typing import Optional
class AustralianBusinessRAG:
"""RAG system for Australian business knowledge."""
def __init__(
self,
pinecone_api_key: str,
openai_api_key: str,
index_name: str
):
self.pc = Pinecone(api_key=pinecone_api_key)
self.openai = OpenAI(api_key=openai_api_key)
self.index = self.pc.Index(index_name)
def get_embedding(self, text: str) -> list[float]:
"""Generate embedding for query."""
response = self.openai.embeddings.create(
input=text,
model="text-embedding-3-small"
)
return response.data[0].embedding
def retrieve(
self,
query: str,
top_k: int = 5,
filter_dict: Optional[dict] = None
) -> list[dict]:
"""Retrieve relevant documents."""
query_embedding = self.get_embedding(query)
results = self.index.query(
vector=query_embedding,
top_k=top_k,
include_metadata=True,
filter=filter_dict
)
return [
{
"text": match.metadata.get("text", ""),
"title": match.metadata.get("title", ""),
"source": match.metadata.get("source", ""),
"score": match.score
}
for match in results.matches
]
def generate_response(
self,
query: str,
context_docs: list[dict]
) -> str:
"""Generate response using retrieved context."""
# Format context
context = "
".join([
f"Source: {doc['title']}
{doc['text']}"
for doc in context_docs
])
# Create prompt
system_prompt = """You are an expert on Australian business regulations and compliance.
Answer questions based on the provided context. Use Australian English spelling.
If the context doesn't contain relevant information, say so clearly.
Always cite your sources."""
user_prompt = f"""Context:
{context}
Question: {query}
Answer:"""
response = self.openai.chat.completions.create(
model="gpt-4",
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_prompt}
],
temperature=0.3
)
return response.choices[0].message.content
def ask(
self,
query: str,
top_k: int = 5,
filter_dict: Optional[dict] = None
) -> dict:
"""Complete RAG pipeline."""
# Retrieve relevant documents
docs = self.retrieve(query, top_k, filter_dict)
# Generate response
response = self.generate_response(query, docs)
return {
"query": query,
"response": response,
"sources": [
{"title": d["title"], "source": d["source"]}
for d in docs
]
}
# Usage
rag = AustralianBusinessRAG(
pinecone_api_key="your-key",
openai_api_key="your-key",
index_name="australian-business-docs"
)
result = rag.ask(
"What are the record-keeping requirements under the Privacy Act?",
filter_dict={"category": "privacy"}
)
print(result["response"])
print("
Sources:", result["sources"])Moving to production requires attention to reliability, monitoring, and cost management.
from pinecone.exceptions import PineconeException
import time
def robust_query(index, embedding, retries=3):
"""Query with retry logic."""
for attempt in range(retries):
try:
return index.query(
vector=embedding,
top_k=5,
include_metadata=True
)
except PineconeException as e:
if attempt < retries - 1:
time.sleep(2 ** attempt) # Exponential backoff
continue
raise e
def robust_upsert(index, vectors, batch_size=100, retries=3):
"""Upsert with batching and retry logic."""
for i in range(0, len(vectors), batch_size):
batch = vectors[i:i + batch_size]
for attempt in range(retries):
try:
index.upsert(vectors=batch)
break
except PineconeException as e:
if attempt < retries - 1:
time.sleep(2 ** attempt)
continue
raise e
import logging
import time
from functools import wraps
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
def monitor_query(func):
"""Decorator to monitor query performance."""
@wraps(func)
def wrapper(*args, **kwargs):
start = time.time()
try:
result = func(*args, **kwargs)
duration = time.time() - start
logger.info(f"Query completed in {duration:.3f}s")
return result
except Exception as e:
logger.error(f"Query failed: {e}")
raise
return wrapper
# Check index stats
def get_index_stats(index):
"""Get detailed index statistics."""
stats = index.describe_index_stats()
logger.info(f"Total vectors: {stats.total_vector_count}")
logger.info(f"Namespaces: {list(stats.namespaces.keys())}")
return stats
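A brief usage sketch for the decorator — wrap whichever query helper you use (the function name here is illustrative):

```python
@monitor_query
def search(index, embedding):
    """Monitored query wrapper; duration or failure is logged by the decorator."""
    return index.query(vector=embedding, top_k=5, include_metadata=True)

results = search(index, query_embedding)  # logs e.g. "Query completed in 0.142s"
```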
| Strategy | Impact | Implementation |
|---|---|---|
| Use Serverless | Pay per use vs reserved | Best for variable workloads |
| Optimise Dimensions | Lower storage costs | Use smaller embedding models when appropriate |
| Implement Caching | Reduce query volume | Cache common queries with Redis |
| Batch Operations | Fewer API calls | Batch upserts and queries |
| Clean Old Data | Reduce storage | Delete outdated vectors regularly |
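The caching row above can be implemented with a few lines of Redis — a minimal sketch, assuming a local Redis instance, the `redis` Python package, and the `get_embedding` helper defined earlier:

```python
import hashlib
import json

import redis

cache = redis.Redis(host="localhost", port=6379, db=0)
CACHE_TTL_SECONDS = 3600  # expire cached results after an hour

def cached_query(index, query_text: str, top_k: int = 5):
    """Serve repeat queries from Redis instead of re-hitting Pinecone."""
    key = "pinecone:" + hashlib.sha256(f"{query_text}:{top_k}".encode()).hexdigest()
    cached = cache.get(key)
    if cached:
        return json.loads(cached)

    embedding = get_embedding(query_text)
    results = index.query(vector=embedding, top_k=top_k, include_metadata=True)
    payload = [
        {"id": m.id, "score": m.score, "metadata": dict(m.metadata or {})}
        for m in results.matches
    ]
    cache.setex(key, CACHE_TTL_SECONDS, json.dumps(payload))
    return payload
```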
# Delete vectors by ID
index.delete(ids=["doc_1", "doc_2", "doc_3"])
# Delete by metadata filter (pod-based indexes only; serverless indexes require deleting by ID)
index.delete(
filter={"date": {"$lt": "2023-01-01"}}
)
# Delete entire namespace
index.delete(delete_all=True, namespace="old_tenant")
# Update metadata without re-embedding
index.update(
id="doc_1",
set_metadata={"status": "archived", "reviewed": True}
)

A Brisbane-based professional services firm implemented Pinecone to power their internal knowledge management system, demonstrating practical vector database deployment for Australian enterprises.
The firm had accumulated thousands of documents including client proposals, project reports, compliance guidelines, and best practices. Staff spent hours searching for relevant information, often recreating existing work.
# Namespace structure
namespaces = {
"general": "Company-wide knowledge",
"client_acme": "ACME Corp project docs",
"client_bigbank": "BigBank engagement",
"compliance": "Regulatory documents",
"templates": "Proposal and report templates"
}
# Access control via namespace selection
def get_accessible_namespaces(user_role: str, user_clients: list) -> list:
namespaces = ["general", "compliance", "templates"]
if user_role in ["partner", "senior_manager"]:
namespaces.extend([f"client_{c}" for c in user_clients])
return namespaces
Pinecone provides the vector database infrastructure essential for building modern AI applications that understand context and meaning. For Australian businesses, it enables everything from intelligent document search to sophisticated RAG systems that can answer questions grounded in your organisation's knowledge.
Success with Pinecone requires attention to embedding quality, thoughtful index design, and query optimisation. Start with a clear use case, begin with a subset of your data, and iterate based on real-world performance. The managed nature of Pinecone lets you focus on building great AI applications rather than managing infrastructure.
As your AI applications grow, Pinecone scales with you—from prototype to production handling millions of vectors. Combined with quality embeddings and well-designed retrieval strategies, it forms the foundation for AI systems that truly understand your business context.