The basic units of text that LLMs process. In English, roughly 1 token ≈ 4 characters or 0.75 words. Both input and output are measured in tokens.
Tokens are the fundamental units that LLMs process. Rather than working directly with characters or whole words, models break text into tokens: subword units that balance vocabulary size against expressiveness.
How tokenisation works:
- Most modern LLMs use a subword algorithm such as byte-pair encoding (BPE), which builds a vocabulary by repeatedly merging the most frequent character sequences in training text.
- Common words ("the", "and") usually map to a single token, while rarer or longer words are split into several pieces.
- Each model family ships its own tokeniser, so the same text can yield different token counts on different models.
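To see subword splitting in practice, here is a minimal sketch using OpenAI's open-source tiktoken library (`pip install tiktoken`). The choice of the cl100k_base encoding (used by GPT-4-era models) is an assumption for illustration; other model families ship their own tokenisers.

```python
import tiktoken  # OpenAI's open-source tokeniser library: pip install tiktoken

# Load one specific BPE encoding; other models use other encodings,
# so the counts and pieces below will differ between models.
enc = tiktoken.get_encoding("cl100k_base")

text = "Tokenisation splits text into subword units."
token_ids = enc.encode(text)
print(f"{len(text)} characters -> {len(token_ids)} tokens")

# Decode each id on its own to reveal the subword pieces.
pieces = [enc.decode_single_token_bytes(t).decode("utf-8", errors="replace")
          for t in token_ids]
print(pieces)
```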
Token rules of thumb (English):
- 1 token ≈ 4 characters
- 1 token ≈ 0.75 words
- 100 tokens ≈ roughly 75 words
- A 1000-word document ≈ about 1300 tokens
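Those ratios translate into a quick estimator, sketched below. `estimate_tokens` is a hypothetical helper, not a library function, and the result is a heuristic only; for billing-grade numbers, count with the model's actual tokeniser.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate for English text from the rules of thumb above."""
    by_chars = len(text) / 4             # ~4 characters per token
    by_words = len(text.split()) / 0.75  # ~0.75 words per token
    # Average the two heuristics to smooth out short- vs long-word extremes.
    return round((by_chars + by_words) / 2)

# A 1000-word document should land near the ~1300-token rule of thumb.
print(estimate_tokens("word " * 1000))
```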
Why tokens matter:
- Cost: API usage is billed per token, for both input (your prompt) and output (the model's response), often at different rates.
- Context limits: how much a model can read at once (its context window) is measured in tokens.
- Output limits: the maximum length of a response is also capped in tokens.
Tokenisation quirks:
- Numbers, code, and unusual punctuation often tokenise inefficiently, costing more tokens per character than ordinary prose.
- Non-English text generally needs more tokens per word than English.
- Leading spaces and capitalisation change the split: "hello" and " Hello" can be different tokens.
- The same text produces different token counts on different models, because each uses its own tokeniser.
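A short sketch of these quirks, again using tiktoken's cl100k_base encoding as an assumed example; the exact counts you see depend on the encoding you load.

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

samples = {
    "plain English": "The quick brown fox jumps over the lazy dog.",
    "digits":        "3.14159265358979323846",
    "code":          "for (let i = 0; i < n; i++) { total += arr[i]; }",
    "non-English":   "Die Donaudampfschifffahrtsgesellschaft",
}

# Prose compresses well; digits, code, and unfamiliar languages usually
# cost more tokens per character.
for label, text in samples.items():
    n = len(enc.encode(text))
    print(f"{label:13}: {len(text):3} chars -> {n:3} tokens "
          f"({len(text) / n:.1f} chars/token)")
```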
Token usage directly determines API costs. A typical 1000-word document is about 1300 tokens. Monitor token usage to control expenses.
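As a sketch of the cost arithmetic: `estimate_cost` and the per-million-token prices below are illustrative placeholders, since real prices vary by provider and model, and input and output tokens are typically billed at different rates.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_1m: float, output_price_per_1m: float) -> float:
    """Cost in dollars for one call, given per-million-token prices."""
    return (input_tokens * input_price_per_1m
            + output_tokens * output_price_per_1m) / 1_000_000

# A ~1000-word document (~1300 tokens) in, a ~300-token summary out,
# at hypothetical prices of $3 (input) and $15 (output) per million tokens.
print(f"${estimate_cost(1300, 300, 3.00, 15.00):.4f}")  # -> $0.0084
```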
We help Australian businesses understand and optimise token usage. Efficient prompting and smart caching can reduce AI costs by 50% or more while maintaining quality.
"The word "hamburger" is 3 tokens: "ham", "bur", "ger". The word "the" is 1 token. Understanding this helps predict costs."