performance pattern

Summarise and feed back

Compress past turns when the context window starts to bite.

The problem

Long conversations or dense tool outputs blow through the context window. The model forgets the goal and repeats itself, and costs spike because every turn re-sends the entire history.
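To make the cost claim concrete, here is a rough back-of-the-envelope sketch in Python. It assumes every turn adds the same number of tokens and that the full history is re-sent each turn; the function name and the figures are illustrative, not measurements.

```python
def cumulative_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens sent when every turn re-sends the full history.

    Turn k carries k * tokens_per_turn tokens, so the total is
    tokens_per_turn * (1 + 2 + ... + turns).
    """
    return tokens_per_turn * turns * (turns + 1) // 2


# Hypothetical session: 20 turns of 1,000 tokens each.
new_tokens = 20 * 1_000
sent_tokens = cumulative_input_tokens(20, 1_000)
print(new_tokens, sent_tokens)
```

Under these assumptions only 20,000 tokens of new material are produced, but 210,000 input tokens are sent in total. The growth is quadratic in the number of turns, which is why cost climbs faster than the conversation does.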

Symptoms

Token usage near the model's window cap
Quality degrades after turn 15-20
Tool outputs are large (file contents, search results) and rarely relevant later

Use when

Long sessions, agents that read lots of files, research tasks that keep exploring instead of converging.

Skip when

Short stateless calls. The summarisation step is overhead with no benefit when the context is already small.

The YAML

Drop this into an app.yaml. Adjust the credential refs and module names to fit your existing setup.

app.yaml

execution:
  mode: conversation
  hooks:
    - id: compact_when_pressured
      "on": turn_end
      condition: { type: context_pressure, threshold: 0.7 }
      action:
        type: compact_context
        keep_recent_turns: 4
        summarize_with: { model: claude-haiku-4-5, credential: anthropic_main }
        preserve:
          - goal
          - todos
          - last_user_message

agents:
  - id: helper
    modules: [{filesystem: [read, edit, write]}, {web: [search]}]
    brain: { model: claude-sonnet-4-6, credential: anthropic_main }

How it works

This section walks through the YAML one block at a time, so the design is understood rather than memorised.

01

Context pressure trigger, not turn count

The hook fires at 70% of the model's context window, not at a fixed turn number. Long turns trigger it sooner, short turns later.
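The trigger logic can be sketched in a few lines of Python. This is an illustration of the idea, not the runtime's implementation: the function names, the 200,000-token window, and the choice of "at or above the threshold" are all assumptions.

```python
def context_pressure(used_tokens: int, window_tokens: int) -> float:
    """Fraction of the context window currently occupied."""
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    return used_tokens / window_tokens


def should_compact(used_tokens: int, window_tokens: int, threshold: float = 0.7) -> bool:
    # Assumption: the hook fires when pressure reaches or exceeds the threshold.
    return context_pressure(used_tokens, window_tokens) >= threshold


def turns_until_compaction(tokens_per_turn: int, window_tokens: int, threshold: float = 0.7) -> int:
    """Count fixed-size turns until the threshold is crossed."""
    if tokens_per_turn <= 0:
        raise ValueError("tokens_per_turn must be positive")
    used, turn = 0, 0
    while not should_compact(used, window_tokens, threshold):
        used += tokens_per_turn
        turn += 1
    return turn


WINDOW = 200_000  # hypothetical window size, for illustration only
print(turns_until_compaction(30_000, WINDOW))  # long turns
print(turns_until_compaction(6_000, WINDOW))   # short turns
```

With these made-up numbers, 30,000-token turns cross the threshold on turn 5, while 6,000-token turns do not cross it until turn 24. A fixed turn count would compact too late in the first case and too early in the second.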

02

Recent turns kept verbatim

The last four turns stay raw. Recent context is most useful and summarising it would lose detail the model just used.
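As a minimal sketch of the split, assuming the history is a plain list with one entry per turn (the function name is hypothetical):

```python
def split_history(turns: list[str], keep_recent: int = 4) -> tuple[list[str], list[str]]:
    """Return (older, recent): `recent` stays raw, `older` is the compaction candidate."""
    if keep_recent <= 0:
        # Guard: turns[:-0] would be empty, which is the opposite of what we want.
        return list(turns), []
    return turns[:-keep_recent], turns[-keep_recent:]


older, recent = split_history(["t1", "t2", "t3", "t4", "t5", "t6"], keep_recent=4)
print(older)   # candidates for summarisation
print(recent)  # kept verbatim
```

When the history is shorter than the window, `older` comes back empty and there is nothing to compact.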

03

Older turns get compressed by a cheap model

A Haiku call summarises everything before the recent window into a few hundred tokens. Cost is one cheap call per compaction, not one per turn.
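The compaction step can be sketched as follows. The summariser is passed in as a plain callable so the example runs without any API access; `truncate_summary` is a stand-in stub, not a real model call, and the summary marker format is an assumption.

```python
from typing import Callable


def compact(turns: list[str], summarize: Callable[[str], str], keep_recent: int = 4) -> list[str]:
    """Replace everything before the recent window with a single summary entry."""
    if keep_recent < 1:
        raise ValueError("keep_recent must be at least 1")
    older, recent = turns[:-keep_recent], turns[-keep_recent:]
    if not older:
        return list(turns)  # nothing old enough to compress
    # One summariser call per compaction, regardless of how many turns are folded in.
    summary = summarize("\n".join(older))
    return [f"[summary of {len(older)} earlier turns] {summary}"] + recent


def truncate_summary(text: str, limit: int = 60) -> str:
    # Stub standing in for the cheap-model call; a real setup would call the model here.
    return text[:limit]


history = [f"turn {i}: some long tool output" for i in range(1, 9)]
for entry in compact(history, truncate_summary):
    print(entry)
```

Injecting the summariser keeps the compaction logic testable and makes it easy to swap the cheap model for another one.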

04

Pinned items survive compaction

Goal, todos, and the last user message are preserved verbatim. The model never loses sight of what it's doing or what the user asked for.
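One way to picture pinning is that the pinned items are carried around the summariser rather than through it. The sketch below assumes a simple list-of-strings context and invented labels (`GOAL:`, `TODO:` and so on); the actual layout used by the runtime is not described in this post.

```python
def rebuild_context(
    goal: str,
    todos: list[str],
    last_user_message: str,
    summary: str,
    recent_turns: list[str],
) -> list[str]:
    """Assemble the post-compaction context with pinned items copied verbatim."""
    pinned = [f"GOAL: {goal}"]
    pinned += [f"TODO: {item}" for item in todos]
    pinned.append(f"LAST USER MESSAGE: {last_user_message}")
    # Only the summary is lossy; pinned items and recent turns are untouched.
    return pinned + [f"SUMMARY: {summary}"] + list(recent_turns)


context = rebuild_context(
    goal="Migrate the billing service to the new queue",
    todos=["update consumer", "run load test"],
    last_user_message="Did the load test pass?",
    summary="Explored three queue clients and picked one.",
    recent_turns=["turn 11", "turn 12", "turn 13", "turn 14"],
)
print("\n".join(context))
```

Because the pinned strings never pass through the summariser, they cannot be paraphrased away, however aggressive the compression is.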

Other ways to solve it

The pattern above is not the only answer. Here is when something else is the right call.

Alternative

Sliding window without summary

Drop the oldest turns silently. Cheap but lossy: it can discard critical context the model still needs.

Prefer when: Stateless chat where each turn is largely independent and history is only for tone continuity.
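For comparison, a sliding window is only a few lines of Python using the standard library's `collections.deque`. The function name and sample turns are illustrative.

```python
from collections import deque


def sliding_window(turns: list[str], max_turns: int) -> list[str]:
    """Keep only the newest `max_turns` entries; older ones are dropped with no summary."""
    if max_turns < 1:
        raise ValueError("max_turns must be at least 1")
    return list(deque(turns, maxlen=max_turns))


history = ["user: the goal is X", "assistant: ok", "user: next step?", "assistant: step 2"]
print(sliding_window(history, max_turns=2))
```

In this example the turn that stated the goal is the first to go, which is exactly the failure mode described above.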

Alternative

External memory store

Push facts to a vector store and recall them on demand. Robust for very long-running agents, but it needs more setup and adds latency to each turn.

Prefer when: Multi-day or multi-session agents where the same facts come back repeatedly.
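The store-and-recall loop can be illustrated with a toy in-memory version. This is a deliberately simplified stand-in: it uses bag-of-words counts and cosine similarity instead of a real embedding model and vector database, and the class and method names are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy "embedding": word counts. A real setup would call an embedding model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class MemoryStore:
    def __init__(self) -> None:
        self._facts: list[tuple[str, Counter]] = []

    def remember(self, fact: str) -> None:
        self._facts.append((fact, embed(fact)))

    def recall(self, query: str, k: int = 2) -> list[str]:
        """Return the k stored facts most similar to the query."""
        q = embed(query)
        ranked = sorted(self._facts, key=lambda item: cosine(q, item[1]), reverse=True)
        return [fact for fact, _ in ranked[:k]]


store = MemoryStore()
store.remember("the deploy target is staging-eu")
store.remember("user prefers tabs over spaces")
store.remember("api key rotates every 30 days")
print(store.recall("which deploy target", k=1))
```

Only the recalled facts enter the prompt, so context size stays roughly constant regardless of session length. The price is an extra lookup on each turn.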


Related patterns

Semantic router (performance): Pick the cheapest specialist that can answer, instead of one big model.
