Long conversations or dense tool outputs blow through the context window. The model forgets the goal, repeats itself, and costs spike because every turn rereads everything.
- Token usage near the model's window cap
- Quality degrades after turn 15-20
- Tool outputs are large (file contents, search results) and rarely relevant later
Long sessions, agents that read lots of files, research tasks that keep exploring instead of converging.
Short stateless calls. The summarisation step is overhead with no benefit when the context is already small.
The YAML
Drop this into an app.yaml. Adjust the credential refs and module names to fit your existing setup.
1execution:2 mode: conversation3 hooks:4 - id: compact_when_pressured5 "on": turn_end6 condition: { type: context_pressure, threshold: 0.7 }7 action:8 type: compact_context9 keep_recent_turns: 410 summarize_with: { model: claude-haiku-4-5, credential: anthropic_main }11 preserve:12 - goal13 - todos14 - last_user_message1516agents:17 - id: helper18 modules: [{filesystem: [read, edit, write]}, {web: [search]}]19 brain: { model: claude-sonnet-4-6, credential: anthropic_main }How it works
Walking through the YAML one block at a time so the design is clear, not memorised.
Context pressure trigger, not turn count
The hook fires at 70% of the model's context window, not at a fixed turn number. Long turns trigger it sooner, short turns later.
Recent turns kept verbatim
The last four turns stay raw. Recent context is most useful and summarising it would lose detail the model just used.
Older turns get compressed by a cheap model
A Haiku call summarises everything before the recent window into a few hundred tokens. Cost is one cheap call per compaction, not one per turn.
Pinned items survive compaction
Goal, todos, and the last user message are preserved verbatim. The model never loses sight of what it's doing or what the user asked for.
Other ways to solve it
The pattern above is not the only answer. Here is when something else is the right call.
Sliding window without summary
Drop oldest turns silently. Cheap, lossy, can lose critical context the model needed.
External memory store
Push facts to a vector store and recall on demand. Robust for very long-running agents, more setup, slower per-turn.
Get the next post in your inbox.
Engineering notes from the Digitorn team. No marketing, no launch announcements, no "10 prompts that will change your life". Just the things we write that we'd want to read.