A single Sonnet-class agent answers everything. Half the queries are 'what is X' lookups that a Haiku-class model handles for a tenth of the cost. The expensive model spends most of its budget on questions it never needed to see.
This pattern fits apps with a wide query distribution, where many requests are simple and a few are complex: customer support, knowledge base assistants, internal helpdesks.
It is a poor fit for apps where every query touches the same tool surface and the cost of misrouting (a small model failing on a hard query) is high.
Drop this into an app.yaml. Adjust the credential refs and module names to fit your existing setup.
```yaml
modules:
  web: {}
  rag: { config: { backend: { type: qdrant, path: ./kb } } }

agents:
  - id: router
    modules: [{agent_spawn: [Agent]}]
    brain:
      provider: anthropic
      model: claude-haiku-4-5
      credential: anthropic_main
    system_prompt: |
      You route. Classify the user message in one word:
      SIMPLE -> single fact or definition, dispatch fast_helper
      RESEARCH -> needs the web, dispatch researcher
      EXPERT -> multi-step or ambiguous, dispatch expert
      Then call Agent(specialist=<picked>, prompt=<original>, wait=true)
      and return the result verbatim.

  - id: fast_helper
    role: specialist
    modules: [{rag: [search]}]
    brain: { model: claude-haiku-4-5, credential: anthropic_main }
    system_prompt: "Answer in one paragraph using rag.search results."

  - id: researcher
    role: specialist
    modules: [{web: [search, fetch]}, {rag: [search]}]
    brain: { model: claude-sonnet-4-6, credential: anthropic_main }
    system_prompt: "Research, cite sources, write a concise answer."

  - id: expert
    role: specialist
    modules: [{web: [search, fetch]}, {rag: [search]}]
    brain: { model: claude-opus-4-7, credential: anthropic_main }
    system_prompt: "Reason carefully, ask clarifying questions if needed."
```

Walking through the YAML one block at a time so the design is clear, not memorised.
The router runs on a Haiku-class model. Its only job is to classify each incoming message in one word, which costs almost nothing per call.
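To make the router's contract concrete, here is a minimal Python sketch of the dispatch step: normalise the one-word label, then map it to a specialist id. The helper name `pick_specialist` and the fallback to `expert` on an unrecognised label are assumptions for illustration, not documented Digitorn behaviour. In the config, the router model performs this step itself through the `Agent` call.

```python
# Hypothetical helper: maps the router's one-word label to a specialist id.
# The ids match the agents declared in the YAML; everything else is a sketch.
ROUTES = {
    "SIMPLE": "fast_helper",
    "RESEARCH": "researcher",
    "EXPERT": "expert",
}


def pick_specialist(label: str, default: str = "expert") -> str:
    """Return the specialist id for a router label.

    Assumption: an unrecognised or empty label falls back to the most
    capable specialist, so a bad classification costs money, not quality.
    """
    words = label.strip().split()
    if not words:
        return default
    # Keep only the first word; ignore case and surrounding punctuation.
    key = words[0].strip(".,:;!\"'").upper()
    return ROUTES.get(key, default)


if __name__ == "__main__":
    for raw in ["SIMPLE", " research\n", "Expert.", "unsure"]:
        print(repr(raw), "->", pick_specialist(raw))
```

Falling back to the expert is a deliberate trade: a misrouted query then costs more, but it does not land on a model that is too small for it.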
Three specialists: Haiku for trivial lookups, Sonnet for research, Opus for hard reasoning. Each only sees the queries that need it.
Returning the specialist's output verbatim means the user sees exactly what the specialist wrote, as if that model had answered directly. The router never re-summarises it, so there is no double summarisation.
Many workloads split roughly 70% simple, 25% research, 5% expert. With a 1:10:100 per-query cost ratio across Haiku, Sonnet, and Opus, the average query cost drops 5-15x compared with sending every query to the expert model.
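The arithmetic behind that estimate can be checked in a few lines of Python. This is a sketch using only the 70/25/5 split and the 1:10:100 ratio quoted above. The function names are illustrative, and charging the router call at one Haiku-class unit is an assumption.

```python
# Relative per-query costs from the text (Haiku : Sonnet : Opus = 1 : 10 : 100).
COST = {"haiku": 1.0, "sonnet": 10.0, "opus": 100.0}

# Traffic split from the text, keyed by the model that serves each class.
MIX = {"haiku": 0.70, "sonnet": 0.25, "opus": 0.05}


def blended_cost(mix, cost, router_cost=0.0):
    """Average cost per query: the router call plus the weighted specialist cost."""
    return router_cost + sum(share * cost[model] for model, share in mix.items())


def savings_factor(baseline_cost, routed_cost):
    """How many times cheaper the routed setup is than a single-model baseline."""
    return baseline_cost / routed_cost


if __name__ == "__main__":
    # Assumption: the router call costs as much as one Haiku-class answer.
    routed = blended_cost(MIX, COST, router_cost=COST["haiku"])
    print(f"routed cost per query: {routed:.1f} units")
    print(f"vs all-Opus:   {savings_factor(COST['opus'], routed):.1f}x cheaper")
    print(f"vs all-Sonnet: {savings_factor(COST['sonnet'], routed):.2f}x cheaper")
```

With these inputs the routed cost comes to 9.2 units per query, roughly 11x cheaper than sending everything to Opus. Against an all-Sonnet baseline the same mix is only about 1.09x cheaper, so the saving depends heavily on which single model the router replaces.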
The pattern above is not the only answer. Here is when something else is the right call.
Send everything to a single expert-class model. Simpler config, predictable quality. You pay the expert price on every query.
Replace the LLM router with a local embedding classifier trained on past queries. Near-zero cost, lower flexibility.
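As a rough illustration of the classifier alternative, the sketch below routes with a nearest-centroid classifier built from labelled past queries. To stay self-contained it uses lowercase bag-of-words counts as a toy stand-in for a real embedding model; the sample queries, function names, `min_score` threshold, and fallback to `EXPERT` are all assumptions, not part of any Digitorn API.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labelled history; a real setup would use logged queries.
PAST_QUERIES = [
    ("what is a vector database", "SIMPLE"),
    ("what is the definition of rag", "SIMPLE"),
    ("find the latest news on qdrant releases", "RESEARCH"),
    ("search the web for recent pricing changes", "RESEARCH"),
    ("design a migration plan and compare the tradeoffs", "EXPERT"),
    ("debug why my pipeline fails and propose a plan", "EXPERT"),
]


def embed(text):
    """Toy stand-in for an embedding model: lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def train(examples):
    """Sum each label's query vectors into one centroid per label.

    Cosine similarity ignores vector length, so the sum points in the
    same direction as the mean and no division is needed.
    """
    centroids = defaultdict(Counter)
    for text, label in examples:
        centroids[label].update(embed(text))
    return dict(centroids)


def classify(text, centroids, default="EXPERT", min_score=0.1):
    """Return the label of the closest centroid, or `default` if none is close."""
    vec = embed(text)
    label, score = max(
        ((lbl, cosine(vec, c)) for lbl, c in centroids.items()),
        key=lambda pair: pair[1],
    )
    return label if score >= min_score else default


if __name__ == "__main__":
    centroids = train(PAST_QUERIES)
    for query in ["what is an embedding", "search the web for the latest qdrant news"]:
        print(query, "->", classify(query, centroids))
```

Swapping `embed` for a real sentence-embedding model would keep the rest of the sketch unchanged. The flexibility cost shows up here as retraining: a new route means collecting new labelled queries rather than editing a prompt.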
Engineering notes from the Digitorn team. No marketing, no launch announcements, no "10 prompts that will change your life". Just the things we write that we'd want to read.