
Tokens

The basic units of text that large language models (LLMs) process. In English, 1 token is roughly 4 characters or 0.75 words. Both input and output are measured in tokens.

In-Depth Explanation

Tokens are the fundamental units that LLMs process. Rather than working with characters or whole words directly, models break text into tokens: subword units that balance vocabulary size against expressiveness.

How tokenisation works:

  • Text is split into tokens using a learned vocabulary
  • Common words often become single tokens
  • Rare words split into multiple tokens
  • Punctuation and whitespace are tokenised too (a space is often attached to the start of the following word's token)
  • Different models use different tokenisers
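To make "learned vocabulary" concrete, here is a minimal sketch of byte-pair encoding (BPE), one common way tokenisers learn their vocabulary: repeatedly merge the most frequent adjacent pair of symbols in a training corpus. The toy corpus, the function names and the alphabetical tie-break are illustrative assumptions; production tokenisers work on bytes, treat spaces specially and learn tens of thousands of merges.

```python
from collections import Counter

def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one joined symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)

def learn_merges(words, num_merges):
    """Learn up to `num_merges` BPE merge rules from a list of words."""
    vocab = Counter(tuple(w) for w in words)  # word split into characters -> frequency
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        # Most frequent pair; ties broken alphabetically so the result is reproducible
        best = min(pairs, key=lambda p: (-pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges

def tokenise(word, merges):
    """Apply the learned merges, in order, to split a word into subword tokens."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)

corpus = ["the"] * 10 + ["then"] * 3 + ["cat"] * 2   # hypothetical training text
merges = learn_merges(corpus, num_merges=2)
print(merges)                     # [('h', 'e'), ('t', 'he')]
print(tokenise("the", merges))    # ['the']
print(tokenise("there", merges))  # ['the', 'r', 'e']
print(tokenise("cat", merges))    # ['c', 'a', 't']
```

The frequent word "the" ends up as a single token, while the rare word "cat" stays split into characters, which is the pattern described in the list above.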

Token rules of thumb (English):

  • 1 token ≈ 4 characters
  • 1 token ≈ 0.75 words
  • 100 tokens ≈ 75 words
  • 1000 words ≈ 1300 tokens
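The rules of thumb above can be turned into quick estimators. This is a sketch only: the constants are the English approximations from this page and the function names are hypothetical. For billing-grade counts, use the actual tokeniser of the model you call.

```python
import math

# Rules of thumb for English text (approximations, not a real tokeniser)
CHARS_PER_TOKEN = 4
WORDS_PER_TOKEN = 0.75

def tokens_from_chars(text: str) -> int:
    """Estimate a token count from character length (about 4 characters per token)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)

def tokens_from_words(word_count: int) -> int:
    """Estimate a token count from a word count (about 0.75 words per token)."""
    return math.ceil(word_count / WORDS_PER_TOKEN)

print(tokens_from_words(75))                                    # 100
print(tokens_from_words(1000))                                  # 1334, i.e. "about 1300"
print(tokens_from_chars("Tokens are the basic units of text."))  # 9 (35 characters)
```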

Why tokens matter:

  • Pricing: API costs are per-token
  • Limits: Context windows are measured in tokens
  • Computation: Processing time grows with the number of tokens
  • Output control: max_tokens limits generation length
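Limits and output control interact: a minimal sketch of budgeting max_tokens, assuming (as with many chat APIs) that the prompt and the generated output must share one context window. The 8,192-token window, the prompt size and the function names are hypothetical.

```python
def max_output_tokens(prompt_tokens: int, context_window: int, safety_margin: int = 0) -> int:
    """Largest max_tokens value that still fits, assuming prompt and output share the window."""
    return max(0, context_window - prompt_tokens - safety_margin)

def fits(prompt_tokens: int, max_tokens: int, context_window: int) -> bool:
    """True if the prompt plus the requested output length fits in the window."""
    return prompt_tokens + max_tokens <= context_window

window = 8_192   # hypothetical context window
prompt = 1_300   # roughly a 1,000-word document
print(max_output_tokens(prompt, window))  # 6892
print(fits(prompt, 8_000, window))        # False: 9,300 tokens exceed the window
```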

Tokenisation quirks:

  • "Hello" = 1 token, "Hello!" = 2 tokens (in a typical tokeniser)
  • Numbers can split unexpectedly (384 might be 2 tokens)
  • Non-English text often needs more tokens for the same content
  • Code tokenises differently from prose
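One reason non-English text can cost more: many tokenisers (GPT-2's byte-level BPE, for example) start from UTF-8 bytes rather than characters, and characters outside ASCII take 2 to 4 bytes each. Unless the vocabulary has learned merges for a script, its text breaks into more tokens. The sketch below only counts raw bytes; it is an illustration of the starting point, not a token counter, and the helper name is hypothetical.

```python
def utf8_bytes_per_char(text: str) -> float:
    """Average UTF-8 bytes per character: the raw units a byte-level tokeniser starts from."""
    return len(text.encode("utf-8")) / len(text)

samples = ["hello", "Grüße", "こんにちは"]  # English, German, Japanese
for s in samples:
    print(s, len(s.encode("utf-8")), "bytes,", utf8_bytes_per_char(s), "bytes per character")
# hello 5 bytes, 1.0 bytes per character
# Grüße 7 bytes, 1.4 bytes per character
# こんにちは 15 bytes, 3.0 bytes per character
```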

Business Context

Token usage directly determines API costs. A typical 1000-word document is about 1300 tokens. Monitor token usage to control expenses.
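A rough cost estimate follows directly from token counts. This sketch assumes, as many providers do, separate per-million-token prices for input and output; the $3 and $15 prices and the 500-token reply are hypothetical, so substitute your provider's published rates.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_million: float, output_price_per_million: float) -> float:
    """Dollar cost of one request, given per-million-token prices for input and output."""
    return (input_tokens * input_price_per_million
            + output_tokens * output_price_per_million) / 1_000_000

# Hypothetical prices: $3 per million input tokens, $15 per million output tokens
per_doc = estimate_cost(1_300, 500, 3.0, 15.0)  # a 1,000-word document plus a 500-token reply
print(f"${per_doc:.4f} per document")                   # $0.0114 per document
print(f"${per_doc * 10_000:.2f} for 10,000 documents")  # $114.00 for 10,000 documents
```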

How Clever Ops Uses This

We help Australian businesses understand and optimise token usage. Efficient prompting and smart caching can reduce AI costs by 50% or more while maintaining quality.
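As one simple illustration of caching (not a description of Clever Ops' own approach), a response cache sends each distinct prompt to the model only once and answers exact repeats from memory, saving both the prompt and response tokens. `call_model` and `count_tokens` are hypothetical stand-ins for a real API call and a real token counter; the cache only helps with identical prompts.

```python
import hashlib
from typing import Callable, Dict

class ResponseCache:
    """Send each distinct prompt to the model once; repeats are answered from memory."""

    def __init__(self, call_model: Callable[[str], str], count_tokens: Callable[[str], int]):
        self.call_model = call_model      # hypothetical function that calls an LLM API
        self.count_tokens = count_tokens  # hypothetical token counter
        self.store: Dict[str, str] = {}
        self.tokens_saved = 0

    def ask(self, prompt: str) -> str:
        # Hashing keeps keys a fixed size, handy if the store later moves to a database
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key in self.store:
            answer = self.store[key]
            # A hit avoids paying for the prompt and the response tokens again
            self.tokens_saved += self.count_tokens(prompt) + self.count_tokens(answer)
            return answer
        answer = self.call_model(prompt)
        self.store[key] = answer
        return answer

# Usage with stand-ins: a fake "model" and a crude word-count token counter
calls = []
def fake_model(prompt: str) -> str:
    calls.append(prompt)
    return "Tokens are subword units."

cache = ResponseCache(fake_model, lambda s: len(s.split()))
cache.ask("What is a token?")
cache.ask("What is a token?")
print(len(calls), cache.tokens_saved)  # 1 8
```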

Example Use Case

The word "hamburger" is 3 tokens: "ham", "bur", "ger". The word "the" is 1 token. Understanding this helps predict costs.

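The split in the example can be reproduced with a toy greedy longest-match tokeniser. The vocabulary here is hypothetical and hand-picked to give the same pieces; real tokenisers learn their splits from data, so the exact pieces (and costs) depend on the model. Counting the pieces is what turns text into a cost estimate.

```python
def greedy_tokenise(text: str, vocab: set) -> list:
    """Split text by repeatedly taking the longest vocabulary entry that matches."""
    tokens = []
    i = 0
    while i < len(text):
        for end in range(len(text), i, -1):  # try the longest candidate first
            if text[i:end] in vocab:
                tokens.append(text[i:end])
                i = end
                break
        else:
            tokens.append(text[i])  # nothing matched: fall back to one character
            i += 1
    return tokens

VOCAB = {"the", "ham", "bur", "ger"}  # hypothetical toy vocabulary

for word in ["hamburger", "the", "hamster"]:
    pieces = greedy_tokenise(word, VOCAB)
    print(word, pieces, len(pieces), "tokens")
# hamburger ['ham', 'bur', 'ger'] 3 tokens
# the ['the'] 1 tokens
# hamster ['ham', 's', 't', 'e', 'r'] 5 tokens
```

A word the toy vocabulary barely covers, such as "hamster", falls back to single characters and costs more tokens, mirroring how rare words split in real tokenisers.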
