Digitorn

Inference

The act of running a trained model to produce output. The thing you pay for per token.

also known as: LLM inference
In depth

Training creates the model; inference uses it. When you call a provider's API or run Ollama on your laptop, that's inference. Latency (how long you wait for tokens), throughput (how many tokens per second), and cost (what you pay per token) are all determined at the inference layer. Custom silicon such as Groq's LPU and Cerebras's wafer-scale chips is optimised specifically for inference, which is why those providers tend to post some of the lowest latency numbers.
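To make latency, throughput, and cost concrete, here is a minimal sketch of how you might measure all three around a streamed response. It does not call any real provider: `fake_token_stream`, `measure_inference`, and the per-million-token prices are hypothetical stand-ins, and a real client would replace the fake stream with whatever iterator your provider's SDK returns.

```python
import time
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class InferenceMetrics:
    time_to_first_token_s: float  # latency until the first output token arrives
    tokens_per_second: float      # output throughput over the whole response
    cost_usd: float               # input + output tokens at the given prices


def fake_token_stream(tokens: Iterable[str], delay_s: float = 0.01) -> Iterator[str]:
    """Hypothetical stand-in for a provider's streaming response."""
    for token in tokens:
        time.sleep(delay_s)  # simulate per-token generation time
        yield token


def measure_inference(
    stream: Iterable[str],
    input_tokens: int,
    usd_per_million_input: float,
    usd_per_million_output: float,
) -> InferenceMetrics:
    """Consume a token stream and report latency, throughput, and cost."""
    start = time.perf_counter()
    first_token_at = None
    output_tokens = 0

    for _ in stream:
        if first_token_at is None:
            first_token_at = time.perf_counter()
        output_tokens += 1

    end = time.perf_counter()
    elapsed = end - start

    # If no token ever arrived, report the full wait as the latency.
    ttft = (first_token_at - start) if first_token_at is not None else elapsed
    throughput = output_tokens / elapsed if elapsed > 0 else 0.0
    cost = (
        input_tokens * usd_per_million_input
        + output_tokens * usd_per_million_output
    ) / 1_000_000

    return InferenceMetrics(ttft, throughput, cost)


if __name__ == "__main__":
    # Prices below are made up for illustration, not any provider's real rates.
    metrics = measure_inference(
        fake_token_stream(["Inference", " is", " running", " a", " model"]),
        input_tokens=1_000,
        usd_per_million_input=1.0,
        usd_per_million_output=2.0,
    )
    print(metrics)
```

The point of measuring time to first token separately from tokens per second is that they can diverge: a provider may start responding quickly but generate slowly, or the reverse.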


More in Models & inference

Context window (/glossary/context-window)
Frontier model (/glossary/frontier-model)
LLM (/glossary/llm)
Open-weight model (/glossary/open-weight-model)
Streaming (/glossary/streaming)
Temperature (/glossary/temperature)