The maximum amount of text (measured in tokens) that an LLM can process in a single request. This includes both the input prompt and the generated output.
The context window is one of a language model's most important specifications: it bounds how much information the model can consider at once when generating a response. Think of it as the model's working memory.
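Because input and output share the same budget, a request fits only if the prompt tokens plus the space reserved for the response stay under the limit. Here is a minimal sketch of that check using the tiktoken tokenizer; the window size and output reserve are illustrative assumptions, not any particular model's real limits:

```python
import tiktoken

# Illustrative numbers; check your provider's documentation for real limits.
CONTEXT_WINDOW = 128_000   # total tokens the model can consider per request
OUTPUT_RESERVE = 4_096     # tokens set aside for the generated response

def fits_in_context(prompt: str) -> bool:
    """Check whether a prompt plus the reserved output fits in the window."""
    enc = tiktoken.get_encoding("cl100k_base")  # tokenizer used by GPT-4-era models
    return len(enc.encode(prompt)) + OUTPUT_RESERVE <= CONTEXT_WINDOW

print(fits_in_context("Summarize the attached contract."))  # True
```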
Context window sizes have grown dramatically:

- GPT-2 (2019): 1,024 tokens
- GPT-3 (2020): 2,048 tokens
- GPT-3.5 and early GPT-4 (2023): 4K to 32K tokens
- GPT-4 Turbo (2023): 128K tokens
- Claude 3 (2024): 200K tokens
- Gemini 1.5 Pro (2024): up to 1M tokens
Within the context window, everything competes for space (a sketch of one trimming policy follows this list):

- the system prompt and instructions
- conversation history from earlier turns
- retrieved documents or other reference material
- the user's current message
- room reserved for the generated output
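When the budget fills up, something has to give, and a common policy is to drop the oldest conversation turns first. A minimal sketch of that policy follows; the four-characters-per-token estimate is a rough heuristic for English text, and production code would use the model's real tokenizer:

```python
def trim_history(history: list[str], budget_tokens: int) -> list[str]:
    """Keep the most recent conversation turns that fit within budget_tokens.

    Token counts are approximated as len(text) // 4, a rough rule of thumb
    for English text; a real implementation would use the model's tokenizer.
    """
    kept: list[str] = []
    used = 0
    for turn in reversed(history):   # walk from newest to oldest
        cost = len(turn) // 4
        if used + cost > budget_tokens:
            break                    # everything older is dropped
        kept.append(turn)
        used += cost
    return list(reversed(kept))      # restore chronological order
```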
Managing context effectively is crucial because:

- cost: most providers bill per token processed
- latency: longer inputs take longer to process
- quality: models can lose track of details buried in very long contexts
- overflow: anything beyond the limit must be truncated or dropped
Larger context windows (128K to 1M tokens) make it possible to process entire documents in a single request, but cost rises with every token processed. Choose a model based on your typical document sizes and budget.
We help clients choose appropriate context window sizes for their use cases. Often, effective chunking and retrieval strategies outperform simply using larger contexts.
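To make the chunking half of that strategy concrete, here is a minimal word-based chunker of the kind used in retrieval pipelines. The sizes are illustrative defaults rather than recommendations, and real systems often split by tokens or by semantic boundaries instead:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping word-based chunks for a retrieval index.

    Sizes are in words and purely illustrative; real pipelines often chunk
    by tokens or by semantic boundaries such as paragraphs.
    """
    words = text.split()
    step = chunk_size - overlap  # each advance leaves `overlap` words shared
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]
```

Each chunk can then be embedded and retrieved on demand, so only the few most relevant passages enter the context window instead of the whole document.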
"GPT-4 Turbo has a 128K token context window, roughly equivalent to 300 pages of text, enabling analysis of entire contracts or reports."