An upstream service is degraded. Retrying every call keeps it pinned and prevents recovery. Worse, the agent burns tokens on retries that were always going to fail.
Use this pattern when a tool depends on a single external service that has well-understood failure modes and a viable fallback (a cache, a secondary provider, or a graceful "service unavailable" message).
Skip it when there is no fallback and the service is critical to the agent's output. In that case it is better to surface the failure than to lie with cached data.
Drop this into an app.yaml. Adjust the credential refs and module names to fit your existing setup.
```yaml
# Trip the circuit after 3 consecutive failures, route to fallback
modules:
  web: {}
  cache: {}

execution:
  mode: conversation
  hooks:
    - id: trip_circuit
      "on": tool_end
      condition:
        type: all_of
        conditions:
          - { type: tool_name, match: "web.fetch" }
          - { type: tool_failed }
          - { type: expression, expr: "session.consecutive_failures.web_fetch >= 3" }
      action:
        type: chain
        actions:
          - { type: module_action, module: cache, action: set, params: { key: "circuit:web_fetch", value: "open", ttl: 60 } }
          - { type: log, level: warn, message: "web.fetch circuit opened" }

    - id: serve_from_cache_when_open
      "on": tool_start
      condition:
        type: all_of
        conditions:
          - { type: tool_name, match: "web.fetch" }
          - { type: expression, expr: "cache.get('circuit:web_fetch') == 'open'" }
      action:
        type: gate
        result:
          status: "service_unavailable"
          fallback: "cache"
          retry_after: 60

agents:
  - id: helper
    modules: [{web: [fetch]}, {cache: [get, set]}]
    brain: { model: claude-haiku-4-5, credential: anthropic_main }
    system_prompt: |
      If a tool returns status: service_unavailable, do not retry, tell
      the user the live data is unavailable and offer the cached version.
```

Walking through the YAML one block at a time so the design is clear, not memorised.
The runtime exposes session.consecutive_failures.{tool_name}. The hook fires when the count hits three.
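To make the counter concrete, here is a minimal Python sketch of what a "consecutive failures" count could look like. It is not the runtime's implementation: the class name `ConsecutiveFailures`, the `should_trip` helper, and the rule that any success resets the count to zero are assumptions made for illustration.

```python
from collections import defaultdict


class ConsecutiveFailures:
    """Hypothetical per-tool counter of failures in a row.

    Assumption: a successful call resets the count to zero.
    """

    def __init__(self) -> None:
        self._counts = defaultdict(int)

    def record(self, tool_name: str, failed: bool) -> int:
        """Update the count for one finished tool call and return the new value."""
        if failed:
            self._counts[tool_name] += 1
        else:
            self._counts[tool_name] = 0
        return self._counts[tool_name]


def should_trip(count: int, threshold: int = 3) -> bool:
    """Same comparison as the hook's expression: count >= threshold."""
    return count >= threshold
```

Under these assumptions, two failures followed by a success leave the count at zero, so only an unbroken run of three failures trips the circuit.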
A cache entry with a 60s TTL is the open-circuit signal. Long enough to give the upstream room to recover, short enough that the next call after the cooldown probes the service again.
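The idea of using an expiring cache entry as the circuit state can be sketched in a few lines of Python. `TTLCache` below is a hypothetical in-memory stand-in, not the cache module from the config; the injectable `clock` parameter is also an assumption, added so the expiry can be exercised without waiting 60 seconds.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """Hypothetical in-memory cache whose entries expire after a TTL in seconds."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries = {}  # key -> (value, expiry timestamp)

    def set(self, key: str, value: str, ttl: float) -> None:
        self._entries[key] = (value, self._clock() + ttl)

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # The entry has expired, so the circuit counts as closed again.
            del self._entries[key]
            return None
        return value


# Usage with a fake clock: the flag is present during the cooldown, gone after it.
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
cache.set("circuit:web_fetch", "open", ttl=60)
now[0] = 59.0
during_cooldown = cache.get("circuit:web_fetch")
now[0] = 60.0
after_cooldown = cache.get("circuit:web_fetch")
```

Nothing has to close the circuit explicitly in this design. The flag disappearing is the close event, which is why the TTL doubles as the cooldown.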
The gate action inside a tool_start hook intercepts the call before it reaches the network. The agent sees a structured result, not an error.
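The same short-circuit can be shown as a plain Python wrapper. This is a sketch of the behaviour described above, not the runtime's gate: `gated_fetch`, `circuit_state`, and `fetch` are hypothetical names, and the returned dictionary simply copies the `result` fields from the config.

```python
from typing import Callable, Optional


def gated_fetch(
    url: str,
    circuit_state: Callable[[str], Optional[str]],
    fetch: Callable[[str], dict],
) -> dict:
    """Call `fetch` unless the circuit flag is open (hypothetical helper)."""
    if circuit_state("circuit:web_fetch") == "open":
        # Return before any network I/O; the caller gets data, not an exception.
        return {
            "status": "service_unavailable",
            "fallback": "cache",
            "retry_after": 60,
        }
    return fetch(url)
```

Returning a structured result rather than raising is what lets the system prompt give the model a simple rule: on `service_unavailable`, do not retry and offer the cached version.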
After 60s the circuit cache entry expires. The next call goes through. If it fails three times again, the circuit reopens with the same logic.
The pattern above is not the only answer. Here is when something else is the right call.
This is the classic circuit breaker pattern with a half-open state: after the cooldown, allow one request through. If it succeeds, close the circuit; if it fails, reopen for another cycle. Slightly more code, much smoother under intermittent failures.
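For comparison, a minimal sketch of that probing variant in Python. It is written outside the hook system, so `CircuitBreaker`, `CircuitOpenError`, and the `clock` parameter are all hypothetical names. It is also single-threaded: a concurrent version would need a lock so that only one probe runs at a time.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Hypothetical single-threaded breaker: closed, open, half_open."""

    def __init__(self, threshold: int = 3, cooldown: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.cooldown = cooldown
        self._clock = clock
        self.state = "closed"
        self._failures = 0
        self._opened_at = 0.0

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            if self._clock() - self._opened_at < self.cooldown:
                raise CircuitOpenError("circuit open, retry later")
            # Cooldown elapsed: let exactly this call through as a probe.
            self.state = "half_open"
        try:
            result = fn()
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_success(self) -> None:
        # Any success, including a successful probe, closes the circuit.
        self._failures = 0
        self.state = "closed"

    def _on_failure(self) -> None:
        self._failures += 1
        # A failed probe reopens at once; otherwise wait for the threshold.
        if self.state == "half_open" or self._failures >= self.threshold:
            self.state = "open"
            self._opened_at = self._clock()
```

The difference from the TTL version is in what happens after the cooldown. Here a single failed probe reopens the circuit immediately, without waiting for another run of failures to reach the threshold.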
Simpler. Costs more in degraded scenarios because every call still tries.
Engineering notes from the Digitorn team. No marketing, no launch announcements, no "10 prompts that will change your life". Just the things we write that we'd want to read.