Why we chose YAML over Python for our agent runtime

The first version of Digitorn was a Python framework. Inheriting from Agent, registering tools as decorators, the whole pattern. We had a working coding agent in three weeks. We then spent six months unwinding that decision.

This is not a Python hate piece. Python is fine. The problem is more specific: writing an agent in Python means writing the shape of the agent and its behaviour in the same language, in the same files, at the same level of abstraction. That sounds clever until you have to ship the thing, audit it, hand it to a non-engineer, or hot-reload a prompt at three in the morning. Then it stops being clever.

So we rewrote the runtime. The agents are now declared in YAML, which the runtime parses, validates, and executes. The Python is still there, underneath, doing the actual work. But the surface a user touches is roughly fifty lines of config.

Lines of code: Python 62, YAML 14
Files to edit: Python 5, YAML 1
Reload after edit: Python 8s, YAML 200ms
Reviewable in PR: Python barely, YAML trivially

agent.py · imperative

from agents import Coordinator, Worker
from agents.tools import Filesystem, Shell, Web
from agents.brain import Anthropic, OpenAI
fs = Filesystem(max_kb=2048, read_before_edit=True)
sh = Shell(allowed_roots=["~/work"])
explorer = Worker(
    name="explorer",
    tools=[fs.read, fs.grep, fs.glob, sh.bash],
    brain=Anthropic("claude-haiku-4-5")
)
coord = Coordinator(
    tools=[fs.read, fs.write, fs.edit, sh.bash],
    workers=[explorer],
    brain=Anthropic("claude-sonnet-4-6")
)
# 40 more lines of: register tools, load env,
# wire abort handler, set up logging, hot-reload...

app.yaml · declarative

modules:
  filesystem: {max_kb: 2048, read_before_edit: true}
  shell: {allowed_roots: ["~/work"]}
agents:
  - id: explorer
    modules: [{filesystem: [read, grep, glob]}, shell]
    brain: {model: claude-haiku-4-5}
  - id: coordinator
    modules: [filesystem, shell]
    workers: [explorer]
    brain: {model: claude-sonnet-4-6}

Same agent. Same loop, same tools, same brains. The Python version (agent.py) ships in production today on a popular framework. The YAML version (app.yaml) is what runs on Digitorn after we stopped fighting the language.

What "config as code" really means here

The phrase gets thrown around a lot. Here's what we mean by it specifically.

An agent has a shape: which tools it has, which model it uses, what its system prompt is, when it's allowed to spawn workers. The shape is mostly static across runs. It doesn't loop, branch, or compute. It just describes structure.

An agent also has behaviour: the actual decisions the LLM makes, the tool calls it issues, the error paths it takes. The behaviour is dynamic. It can't be encoded statically because the whole point of an LLM is that the runtime decisions live in the weights, not in your code.

In a Python-based framework, both live in the same file. You read a 200-line Agent class and try to figure out which lines describe the shape (so they're stable) and which lines hook into runtime callbacks (so they're load-bearing). Six months in, nobody on the team can tell you with confidence.

The YAML approach forces the split. The shape is declarative and lives in app.yaml. The behaviour is decided by the LLM at runtime. The plumbing in between is in the runtime, not in your config. You read the YAML and you know exactly what the agent can and cannot do, because there is no other surface to check.
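To make the split concrete, here is a minimal Python sketch of what treating the shape as plain data buys you. It assumes app.yaml has already been parsed into a dictionary laid out like the example earlier in this post. The names validate_shape and module_name, and the two checks themselves, are our illustrative assumptions, not Digitorn's actual validator.

```python
from typing import Any

# Hypothetical parsed form of the app.yaml example. A real runtime would get
# this from a YAML parser rather than from a literal.
CONFIG: dict[str, Any] = {
    "modules": {
        "filesystem": {"max_kb": 2048, "read_before_edit": True},
        "shell": {"allowed_roots": ["~/work"]},
    },
    "agents": [
        {
            "id": "explorer",
            "modules": [{"filesystem": ["read", "grep", "glob"]}, "shell"],
            "brain": {"model": "claude-haiku-4-5"},
        },
        {
            "id": "coordinator",
            "modules": ["filesystem", "shell"],
            "workers": ["explorer"],
            "brain": {"model": "claude-sonnet-4-6"},
        },
    ],
}


def module_name(entry: Any) -> str:
    """A module entry is either a bare name or a one-key {name: [actions]} map."""
    if isinstance(entry, str):
        return entry
    return next(iter(entry))


def validate_shape(config: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means the shape is valid."""
    problems: list[str] = []
    declared_modules = set(config.get("modules", {}))
    agent_ids = {agent["id"] for agent in config.get("agents", [])}

    for agent in config.get("agents", []):
        # Every module an agent uses must be declared at the top level.
        for entry in agent.get("modules", []):
            name = module_name(entry)
            if name not in declared_modules:
                problems.append(f"{agent['id']}: unknown module '{name}'")
        # Every worker must be the id of a declared agent.
        for worker in agent.get("workers", []):
            if worker not in agent_ids:
                problems.append(f"{agent['id']}: unknown worker '{worker}'")
    return problems


if __name__ == "__main__":
    print(validate_shape(CONFIG))
```

Because the shape never loops or branches, a check like this can run before any model is called, which is much harder when the same information is spread across constructors and decorators.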

The hot-reload problem, which is the whole pitch

If you've worked on agent prompts for any length of time, you know the loop: edit a system prompt, restart the agent, send the same test message, see what changed. Repeat thirty times an afternoon. The thing you actually care about is fast iteration on the prompt. Everything else is overhead.

In a Python framework, every edit means stopping the process, re-importing modules (if you trust your import graph, which you usually shouldn't), reloading credentials, re-establishing the session, then re-sending the test. On our internal benchmark we measured around eight seconds end to end on a warm machine. Cold start was closer to twenty.

The YAML side of that picture is what we have now. A file watcher catches the save, the runtime diffs the config, swaps the affected modules in place, and resumes the next turn with the new prompt. About 200 milliseconds in normal cases. The agent's conversation history is preserved, which means you can tweak the prompt mid-task and watch the next decision land with the new instructions, without losing anything.
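The "diff the config" step is the part that keeps the reload cheap, so here is a rough Python sketch of the idea. It assumes the old and new configs are already parsed into dictionaries. The function names and the system_prompt key are hypothetical, chosen for illustration. The file watcher and the in-place module swap are left out.

```python
import copy
from typing import Any

# Sentinel so that "key absent" never compares equal to a real value.
_MISSING = object()


def changed_keys(old: dict[str, Any], new: dict[str, Any]) -> set[str]:
    """Keys that were added, removed, or whose values differ."""
    return {
        key
        for key in old.keys() | new.keys()
        if old.get(key, _MISSING) != new.get(key, _MISSING)
    }


def diff_config(old: dict[str, Any], new: dict[str, Any]) -> dict[str, set[str]]:
    """Report which modules and which agents a reload would need to touch."""
    old_agents = {agent["id"]: agent for agent in old.get("agents", [])}
    new_agents = {agent["id"]: agent for agent in new.get("agents", [])}
    return {
        "modules": changed_keys(old.get("modules", {}), new.get("modules", {})),
        "agents": changed_keys(old_agents, new_agents),
    }


if __name__ == "__main__":
    before = {
        "modules": {"shell": {"allowed_roots": ["~/work"]}},
        "agents": [
            {"id": "explorer", "system_prompt": "Be brief."},
            {"id": "coordinator", "system_prompt": "Plan first."},
        ],
    }
    after = copy.deepcopy(before)
    after["agents"][0]["system_prompt"] = "Be brief. Cite file paths."
    print(diff_config(before, after))
```

In this sketch a prompt edit on one agent reports only that agent as changed, so everything else, including conversation history, can be left alone.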

A 40× speed-up on the inner loop sounds like a vanity metric until you sit through a debugging session with both. The Python loop punishes iteration. You start batching changes ("let me just try ten things at once and see what happens") which is the worst possible debugging strategy for a stochastic system. The YAML loop rewards iteration. You change one line, you see the effect immediately, you keep your mental model intact.

What you give up

This is the part the marketing pages skip.

Warning

YAML is genuinely worse than Python for the small subset of things Python is genuinely good at. If you need to compute the system prompt from a database query, branch tool registration on a feature flag, or wire up a custom LLM provider that nobody has written a backend for, YAML will fight you. Be honest about whether your use case actually needs that.

The biggest thing you give up is reachability into the Python ecosystem. If you want to plug a custom retriever from llama-index, run a Pydantic schema validator on a tool input, or use a niche embedding model that ships as a Python class, you can't just import it from your YAML.

The second thing you give up is fluent debugging. A Python stack trace through a think() function you wrote yourself is more informative than a stack trace through a runtime that interpreted YAML and dispatched into modules. The trade is roughly: you debug less often, but when you do, it's harder.

The third thing is the feel of being in control. Writing an agent in Python feels like programming. Writing an agent in YAML feels like filling out a form. For some teams, especially research-heavy ones, the loss-of-feel is a deal-breaker. We respect that. Not every problem wants the same answer.

What you gain

The wins, in roughly the order we noticed them.

Pull request reviews stop being theatre. A YAML diff is small, scannable, and has no flow control. Reviewers can actually tell what changed. Compare to a Python diff that adds a decorator and reorders an import: nobody can tell at a glance whether that PR is a no-op or a behaviour change.

Non-engineers can ship. Our product manager edits the system prompts directly in the YAML. She catches issues we miss because she's actually reading the agent's outputs against her acceptance criteria. In the Python version, she filed tickets. In the YAML version, she opens PRs.

Audit and compliance become trivial. "What does this agent have access to?" is answerable by grep modules app.yaml. It used to be a meeting.
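When grep is not enough, the same question can be answered programmatically. The sketch below assumes the agents section is already parsed into Python data shaped like the app.yaml example above. It also assumes a bare module name means "every action in that module", which is our reading of the example rather than documented behaviour. The name access_report is hypothetical.

```python
from typing import Any

# Hypothetical parsed form of the "agents" section from the example above.
AGENTS: list[dict[str, Any]] = [
    {
        "id": "explorer",
        "modules": [{"filesystem": ["read", "grep", "glob"]}, "shell"],
    },
    {
        "id": "coordinator",
        "modules": ["filesystem", "shell"],
        "workers": ["explorer"],
    },
]


def access_report(agents: list[dict[str, Any]]) -> dict[str, dict[str, list[str]]]:
    """Map each agent id to the modules it can use and the allowed actions."""
    report: dict[str, dict[str, list[str]]] = {}
    for agent in agents:
        access: dict[str, list[str]] = {}
        for entry in agent.get("modules", []):
            if isinstance(entry, str):
                # Assumption: a bare name grants every action in the module.
                access[entry] = ["*"]
            else:
                for name, actions in entry.items():
                    access[name] = list(actions)
        report[agent["id"]] = access
    return report


if __name__ == "__main__":
    for agent_id, access in access_report(AGENTS).items():
        print(agent_id, access)
```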

Deploys are deterministic. The YAML hashes to a stable bundle. Two installs of the same hash get the same agent, byte for byte. The Python version had install variability we couldn't fully eliminate (transitive deps, env-dependent behaviour) until we shipped containers, at which point we'd already lost half the iteration speed.
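One common way to get a stable hash from a config is to serialise it canonically before hashing. The Python sketch below shows that technique under stated assumptions: the config is already parsed into a dictionary, and sorted-key JSON plus SHA-256 is our illustrative choice. This post does not describe how Digitorn actually computes its bundle hash.

```python
import hashlib
import json
from typing import Any


def config_digest(config: dict[str, Any]) -> str:
    """Hash a parsed config so that key order and whitespace do not matter."""
    canonical = json.dumps(
        config,
        sort_keys=True,         # same bytes regardless of key order
        separators=(",", ":"),  # no incidental whitespace
        ensure_ascii=True,      # same bytes regardless of platform encoding
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    a = {"modules": {"shell": {"allowed_roots": ["~/work"]}}, "agents": []}
    b = {"agents": [], "modules": {"shell": {"allowed_roots": ["~/work"]}}}
    # Same content in a different key order gives the same digest.
    print(config_digest(a) == config_digest(b))
```

Hashing the parsed structure rather than the raw file means a reordered key or a reformatted line does not produce a "different" agent.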

Prompts can travel. An agent on Digitorn travels as YAML plus a few small assets. You can publish it to the Hub, a teammate runs digitorn install, and they have your agent. No virtualenv. No "works on my machine".

Hot reload, again. This is the one we underestimated. It's not a productivity nice-to-have. It changes what kinds of debugging you do. You debug differently when the cycle is 200ms.

The 1% case where you actually need code

The honest answer is that some agents are not config-shaped. Take a Slack bot that needs to reach into your CRM, look up the customer's tier, conditionally enable two tools, call a third tool's API directly with a derived auth token, and post the result with a custom formatter. That's not really a YAML problem. It's a Python problem wearing a YAML hat.

We hit this wall ourselves. The fix is the layered escape hatch.

Three layers, one runtime. You start in YAML. If a hook isn't enough, you drop into Python without rewiring the rest of the agent.

Layer 1 is the YAML config. The vast majority of agents stop here. We ran the numbers on the Hub: 95% of published agents touch nothing but YAML, and the other 5% mostly want a one-line shell snippet, which the runtime already supports through hooks.

Layer 2 is the hook system. You declare a hook in YAML (on_tool_end, on_turn_start, on_error, etc.) with an inline shell, JS, or Python snippet. The snippet runs in a sandbox, has access to the tool result, and can transform or branch on it. This is enough to handle conditional logic, cross-tool state, custom validation, retry policies, and most of what people think they need Python for.
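For readers who have not used a hook system before, here is a small Python sketch of the general pattern: named events, each with a chain of functions that receive a payload and return a possibly transformed one. Only the event name on_tool_end comes from this post. The HookRegistry class, the payload keys, and the two example hooks are hypothetical, and the sketch does no sandboxing at all.

```python
from typing import Any, Callable

Payload = dict[str, Any]
Hook = Callable[[Payload], Payload]


class HookRegistry:
    """Keeps an ordered list of hooks per event name."""

    def __init__(self) -> None:
        self._hooks: dict[str, list[Hook]] = {}

    def register(self, event: str, hook: Hook) -> None:
        self._hooks.setdefault(event, []).append(hook)

    def fire(self, event: str, payload: Payload) -> Payload:
        # Each hook sees the output of the previous one.
        for hook in self._hooks.get(event, []):
            payload = hook(payload)
        return payload


def truncate_result(payload: Payload) -> Payload:
    """Custom validation or trimming: keep tool output under 20 characters."""
    return {**payload, "result": str(payload.get("result", ""))[:20]}


def flag_for_retry(payload: Payload) -> Payload:
    """Retry policy: mark the call for retry when the tool exited non-zero."""
    return {**payload, "retry": payload.get("exit_code", 0) != 0}


if __name__ == "__main__":
    registry = HookRegistry()
    registry.register("on_tool_end", truncate_result)
    registry.register("on_tool_end", flag_for_retry)
    print(registry.fire("on_tool_end", {"result": "x" * 50, "exit_code": 1}))
```

The point of the pattern is that the branching lives in a few small functions attached to named events, while the agent's shape stays declarative.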

Layer 3 is a custom Python module. You drop a .py file in the app directory, declare it in YAML, and the runtime loads it like any other module. From the agent's point of view it's just another tool. From your point of view you have full Python freedom for that one piece, while everything else stays declarative.
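Loading a single .py file by path is something the Python standard library supports directly through importlib. The sketch below shows that mechanism with a throwaway module written to a temporary directory. The file name crm_tools.py, the customer_tier function, and the tier data are hypothetical, and the sketch says nothing about the interface Digitorn expects a custom module to expose.

```python
import importlib.util
import tempfile
from pathlib import Path
from types import ModuleType

# Hypothetical contents of a custom module dropped next to app.yaml.
CUSTOM_MODULE_SOURCE = '''
TIERS = {"acme": "gold"}


def customer_tier(customer_id):
    """Look up a customer's tier; a real module would call the CRM here."""
    return TIERS.get(customer_id, "free")
'''


def load_module_from_file(path: Path) -> ModuleType:
    """Import a Python file by path without touching sys.path."""
    spec = importlib.util.spec_from_file_location(path.stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load module from {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def demo() -> str:
    """Write the example module to a temp app directory, load it, call it."""
    with tempfile.TemporaryDirectory() as app_dir:
        module_path = Path(app_dir) / "crm_tools.py"
        module_path.write_text(CUSTOM_MODULE_SOURCE, encoding="utf-8")
        module = load_module_from_file(module_path)
        return module.customer_tier("acme")


if __name__ == "__main__":
    print(demo())
```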

This three-layer structure is the answer to "does YAML eventually paint you into a corner?". It does, briefly, on the way to a more useful place. The corner is the hooks layer, which is enough for almost everything. The 1% that breaks through stays in YAML for the shape and gains a small Python file for the behaviour. The whole agent is still reviewable as one diff.

A pragmatic recommendation

If you're at the start of an agent project and you're choosing between writing the orchestration in Python or picking a runtime with declarative configs, ask yourself two questions.

Is the shape of your agent going to change a lot, or is the behaviour going to change a lot? If it's the shape (new tools, new sub-agents, new model providers), declarative wins by a huge margin because every change is a one-line edit. If it's the behaviour (new prompt strategies, dynamic dispatch, model-specific tweaks per turn), then you're going to spend most of your time in the inner loop, and a fast hot-reload matters more than language flexibility. Declarative still wins, just for a different reason.

How many people on your team are going to read this code? If the answer is "two engineers", Python is fine. If the answer is "two engineers, a PM, a designer reviewing prompts, and an oncall in three months who's never seen this codebase", declarative wins on review surface alone.

We're not religious about it. We use Python every day, in Digitorn's runtime, in our internal tooling, in the digitorn-builder when it generates YAML for users. What we don't use it for is the agent's own definition. That's a job for a config file the runtime can read in 200 milliseconds and a human can review in two.

Try it for yourself

If this argument resonates, the fastest way to see it in practice is to install Digitorn and edit a system prompt while the agent is running.

Bash

curl -sSL https://digitorn.ai/install | sh
digitorn install hub://digitorn/digitorn-code
digitorn dev chat digitorn-code

Then in another terminal, open the agent's app.yaml and change one line of the system prompt. Save. The next message you send will use the new prompt, no restart. The first time you do this, it feels like cheating.

The full architecture lives in the docs. The companion piece on multi-agent dispatch is "How to build a Claude Code clone in YAML", which works through a real coordinator-plus-explorer setup end to end.

If you push back on this take, we want to hear it. The article source is on GitHub and so is the runtime. Open an issue, send a PR, or just write a thread about why we got it wrong. We've changed our minds about smaller things.

The Digitorn team

We build the open-source AI agent runtime that runs on your own machine. YAML over Python, multi-agent by default, marketplace for sharing.
