What is the best chunk size?

There is no universal answer: the best size depends on your content and use case. Chunks of 256-512 tokens work well for many applications. Use shorter chunks for precise answers and longer chunks for context-dependent content, then test with your actual queries.

Why use chunk overlap?

Overlap ensures information at chunk boundaries isn't lost. If a key answer spans two chunks, overlap helps both chunks contain relevant context. 10-20% overlap is common.
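
As a rough illustration of the idea, the sketch below slides a fixed-size window over a list of tokens so that consecutive chunks share a few tokens. The function name chunk_with_overlap and the use of whitespace-separated words as "tokens" are assumptions made for this example; a real pipeline would normally count tokens with the tokenizer of its embedding model.

```python
def chunk_with_overlap(tokens, chunk_size, overlap):
    """Split a token list into windows of chunk_size that share `overlap` tokens.

    Hypothetical helper for illustration; tokens can be any list
    (here: whitespace-separated words, which is an assumption).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the text
    return chunks


words = "the refund window is thirty days from the delivery date".split()
for chunk in chunk_with_overlap(words, chunk_size=6, overlap=2):
    print(" ".join(chunk))
```

With these inputs the words "thirty days" land in both chunks, so a query about the refund period can match either one.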

Should I use semantic chunking?

Semantic chunking (AI-determined breaks) can improve quality but adds cost and complexity. Start with simpler strategies and move to semantic chunking only if retrieval quality is insufficient.

How does chunking affect costs?

More chunks mean more embeddings to generate and store. Smaller chunks may also mean retrieving more of them per query to get sufficient context. Balance granularity against embedding and storage costs.
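
To make the trade-off concrete, here is a small back-of-the-envelope sketch that estimates how many chunks (and therefore embeddings) a corpus produces for a given chunk size and overlap. The function names, the 1536-dimension vector size, and the 4-byte float32 storage figure are all assumptions for illustration; check the actual dimensions and pricing of your embedding model and vector store.

```python
import math


def estimate_chunk_count(total_tokens, chunk_size, overlap=0):
    """Estimate how many sliding-window chunks a text of total_tokens yields."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")
    if total_tokens <= 0:
        return 0
    if total_tokens <= chunk_size:
        return 1
    step = chunk_size - overlap
    # One full first chunk, then one more chunk per step needed to cover the rest.
    return 1 + math.ceil((total_tokens - chunk_size) / step)


def estimate_vector_storage_bytes(n_chunks, dims=1536, bytes_per_value=4):
    """Raw vector storage only; dims and float32 width are assumed values."""
    return n_chunks * dims * bytes_per_value


# Compare two configurations for a hypothetical 100,000-token corpus.
for size, overlap in [(512, 0), (256, 0), (256, 51)]:
    n = estimate_chunk_count(100_000, size, overlap)
    print(size, overlap, n, estimate_vector_storage_bytes(n))
```

Halving the chunk size roughly doubles the number of embeddings, and adding overlap increases it further, which is the cost side of finer granularity.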

Chunking Definition | AI Glossary Australia

In-Depth Explanation

Chunking is the process of dividing large documents into smaller segments for processing by AI systems. It's a critical step in RAG pipelines where chunk quality directly impacts retrieval and answer quality.

Why chunking matters:

Models have context window limits
Smaller chunks enable precise retrieval
Embedding quality degrades for very long texts
Retrieval can return only the most relevant portions

Chunking strategies:

Fixed size: Split every N characters/tokens
Sentence-based: Split at sentence boundaries
Paragraph-based: Maintain paragraph structure
Semantic: Use AI to find natural breaks
Recursive: Hierarchical splitting with overlap
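
As one possible sketch of the sentence-based strategy listed above, the code below packs whole sentences into chunks until a word budget is reached, so no chunk breaks mid-sentence. The function names, the word-based budget, and the naive regex sentence splitter are assumptions for this example; production systems typically use a proper sentence tokenizer and count model tokens rather than words.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def chunk_by_sentences(text, max_words):
    """Group whole sentences into chunks of at most max_words words.

    A single sentence longer than max_words is kept intact as its own chunk.
    """
    chunks, current, count = [], [], 0
    for sentence in split_sentences(text):
        n = len(sentence.split())
        if current and count + n > max_words:
            chunks.append(" ".join(current))  # budget exceeded: close the chunk
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


sample = "One two three. Four five. Six seven eight nine. Ten."
print(chunk_by_sentences(sample, max_words=5))
```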

Key parameters:

Chunk size: How large each piece is (typically 200-1000 tokens)
Chunk overlap: How much consecutive chunks share (typically 10-20%)
Separators: What constitutes a break point
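
The role of separators can be sketched with a simple recursive splitter: it tries the coarsest separator first (paragraph breaks) and only falls back to finer ones (lines, sentences, words) for pieces that are still too long. This is a minimal illustration under stated assumptions, not the implementation of any particular library: the function name, the default separator list, and the character-based size limit are all hypothetical, and overlap is left out to keep the example short.

```python
def recursive_split(text, max_chars, separators=("\n\n", "\n", ". ", " ")):
    """Split text into pieces of at most max_chars, coarse separators first."""
    if len(text) <= max_chars:
        return [text] if text.strip() else []
    if not separators:
        # No separators left: fall back to a hard cut every max_chars.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

    sep, finer = separators[0], separators[1:]
    parts = text.split(sep)
    if len(parts) == 1:
        return recursive_split(text, max_chars, finer)

    chunks, current = [], ""
    for part in parts:
        candidate = current + sep + part if current else part
        if len(candidate) <= max_chars:
            current = candidate  # keep merging small parts together
            continue
        if current.strip():
            chunks.append(current)
        if len(part) > max_chars:
            # This part alone is too long: retry with finer separators.
            chunks.extend(recursive_split(part, max_chars, finer))
            current = ""
        else:
            current = part
    if current.strip():
        chunks.append(current)
    return chunks


doc = "Alpha beta.\n\nGamma delta epsilon zeta eta theta."
print(recursive_split(doc, max_chars=20))
```

Ordering the separators from coarse to fine means paragraph structure is preserved whenever it fits, and mid-sentence or mid-word cuts happen only as a last resort.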

Common mistakes:

Chunks too small: lose context
Chunks too large: dilute relevance
No overlap: miss information at boundaries
Ignoring structure: break mid-sentence/thought

Business Context

A proper chunking strategy can make or break RAG performance. Chunks that are too small lose context; chunks that are too large waste tokens and reduce relevance.

How Clever Ops Uses This

We extensively tune chunking strategies for Australian business RAG systems. The right approach depends on content type, query patterns, and retrieval requirements.

Example Use Case

"Splitting a 100-page manual into 500-word chunks with 50-word overlaps for better retrieval in a support chatbot."

Learn More

What is RAG (Retrieval Augmented Generation)?

Learn how RAG combines the power of large language models with your business data to provide accurate, contextual AI responses. Complete guide to understanding and implementing RAG systems.
