In depth
A retrieval system cannot usefully index a 100-page PDF as one unit: a single embedding of the whole document would lose the granular signal. Chunking splits documents into smaller pieces, each embedded separately, so retrieval can find the specific paragraph that answers the question. Chunk size is a tuning knob: small chunks are precise but lose surrounding context, while large chunks keep context but blur the signal. 200-500 tokens with a 50-token overlap is a common starting point.
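As a rough illustration of fixed-size chunking with overlap, here is a minimal Python sketch. The function names `chunk_tokens` and `chunk_text` are hypothetical, and whitespace splitting stands in for a real tokenizer, so the counts only approximate model tokens. The defaults (300 and 50) are simply one point inside the range mentioned above, not a recommendation.

```python
def chunk_tokens(tokens, chunk_size=300, overlap=50):
    """Split a token list into windows of chunk_size that share `overlap` tokens.

    Hypothetical helper for illustration; not taken from any library.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window reaches the end, to avoid a trailing
        # chunk that only repeats the overlap.
        if start + chunk_size >= len(tokens):
            break
    return chunks


def chunk_text(text, chunk_size=300, overlap=50):
    """Chunk raw text, using whitespace splitting as a stand-in tokenizer (assumption)."""
    tokens = text.split()
    return [" ".join(chunk) for chunk in chunk_tokens(tokens, chunk_size, overlap)]


if __name__ == "__main__":
    sample = "one two three four five six seven eight nine ten"
    for piece in chunk_text(sample, chunk_size=4, overlap=1):
        print(piece)
```

Each chunk would then be embedded and stored separately. The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of some duplicated storage.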
Related concepts
RAG: A pattern where relevant documents are fetched from a knowledge base and pasted into context before the LLM answers.
Embedding: A fixed-size numeric vector that represents the semantic meaning of a piece of text.
Vector database: A database optimised for storing and querying high-dimensional vectors, typically for similarity search.