Models & inference

Context window

The maximum number of tokens an LLM can process in a single call, counting input and output together. Modern frontier models offer roughly 128K to 2M tokens.

also known as: context length

In depth

The context window caps how much input plus output fits in one call. Claude Sonnet ships with a 200K-token window, GPT-4o with 128K, and Gemini Pro pushes to 2M. Long sessions eventually overflow the window, which is why compaction exists. Bigger windows are not always better: cost scales with input tokens, and models often degrade in quality near the far edge of their window.
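The budgeting described above can be sketched in a few lines of Python. This is a minimal illustration, not any provider's API: the function names, the 80% compaction threshold, and the reading of "128K" as 128,000 tokens are all assumptions, and the four-characters-per-token estimate is only a rough heuristic for English. A real system would count tokens with the model's own tokenizer.

```python
CHARS_PER_TOKEN = 4  # rough heuristic for English text; real tokenizers vary


def estimate_tokens(text: str) -> int:
    """Approximate a token count from character length (rounded up)."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def fits_in_window(prompt_tokens: int, max_output_tokens: int, window: int) -> bool:
    """Input and reserved output share one window, so check their sum."""
    return prompt_tokens + max_output_tokens <= window


def turns_to_compact(
    turn_tokens: list[int],
    max_output_tokens: int,
    window: int,
    threshold_pct: int = 80,  # hypothetical "nearly full" threshold
) -> int:
    """Return how many of the oldest turns should be summarised.

    Oldest turns are dropped from the count until the remaining turns plus
    the reserved output fit under threshold_pct of the window. The size of
    the summary itself is ignored here to keep the sketch short.
    """
    budget = window * threshold_pct // 100 - max_output_tokens
    total = sum(turn_tokens)
    dropped = 0
    while dropped < len(turn_tokens) and total > budget:
        total -= turn_tokens[dropped]
        dropped += 1
    return dropped


if __name__ == "__main__":
    window = 128_000  # assumed size for a "128K" model
    history = [50_000, 40_000, 30_000]  # token counts per turn, oldest first
    print(fits_in_window(sum(history), 8_000, window))
    print(turns_to_compact(history, 8_000, window))
```

Reserving the output budget up front matters because a prompt that fills the window exactly leaves the model no room to answer.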

Related concepts

Tokens: The units LLMs process. Roughly four characters of English per token, billed per million.

Compaction: A runtime process that summarises old turns once the context window is nearly full, freeing space for new turns.

Frontier model: The highest-quality, most expensive tier from a provider. Claude Sonnet, GPT-4o, Gemini Pro.

