Models & inference

Tokens

The units of text an LLM processes. English averages roughly four characters per token, and usage is billed per million tokens.

also known as: LLM tokens

In depth

Tokens are the smallest unit of text an LLM sees. A common word like 'apple' is typically one token, while a long, rare word like 'sesquicentennial' might be split into three. English averages around four characters per token; code typically uses more tokens for the same number of characters. You pay per million tokens, with separate rates for input (what the model reads) and output (what it generates). Cost optimisation in agent work is largely about keeping useless tokens out of context: summarise long tool output, truncate big files, and route exploration to cheap models.
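The arithmetic above can be sketched in a few lines of Python. This is a rough illustration, not a real tokenizer: it assumes the four-characters-per-token average for English, and the function names and the prices in the usage example are hypothetical placeholders, not any provider's actual API or rates.

```python
import math

# Rough English average from the text above. A real tokenizer will
# give different counts, especially for code or non-English text.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Estimate the token count as characters / 4, rounded up."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Cost of one call, with input and output billed at separate rates."""
    return (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    ) / 1_000_000


def truncate_to_budget(text: str, max_tokens: int) -> str:
    """Cut text down to an approximate token budget before it enters context.

    The "[truncated]" marker is appended on top of the budget, so the
    result can be a few tokens over max_tokens.
    """
    max_chars = max_tokens * CHARS_PER_TOKEN
    if len(text) <= max_chars:
        return text
    return text[:max_chars] + "\n[truncated]"


# Usage with made-up numbers: a 40,000-character tool output and
# hypothetical prices of 3.00 (input) and 15.00 (output) per million tokens.
tool_output = "x" * 40_000
full_tokens = estimate_tokens(tool_output)
trimmed = truncate_to_budget(tool_output, max_tokens=500)
call_cost = estimate_cost(full_tokens, 500, 3.00, 15.00)
```

For anything billing-sensitive, the estimate should be replaced by the counts the model provider actually reports, since the four-character rule is only an average.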

Related concepts

Context window: The maximum number of tokens an LLM can process in a single call. Modern frontier models offer 200K to 2M.

Model routing: Sending different turn types to different models (cheap for exploration, premium for writing).

