Digitorn

How to build a Claude Code clone in YAML

Self-host your own Claude Code in 50 lines of YAML. Same tools, same multi-agent loop, your API keys. With the actual config and a 5-minute install.

Digitorn · Apr 30, 2026 · 12 min read

Claude Code is genuinely good. It's also a closed-source CLI that calls Anthropic's infrastructure with Anthropic's prompts and bills your account for every keystroke. If you want roughly the same experience but running on your laptop, against the model provider you choose, with the prompts all sitting in a file you can read and edit, that's what this guide is about. Around 50 lines of YAML and you're there.

The tour goes like this. We pick apart what Claude Code is actually doing under the hood. Each piece becomes a YAML primitive on the Digitorn runtime. Then we look at the full app.yaml, talk about cost (Sonnet for the writing, Haiku for the grunt work), and cover multi-agent dispatch, which is where most home-rolled clones fall over. At the end there's a 5-minute install path.

By the time you're done you'll have a coding agent on your machine doing what Claude Code does, on your API keys, with the entire behaviour readable in a single config file.

The short version

Claude Code feels magical, but the recipe is mundane. Short tool names. A coordinator that delegates. A read-before-edit guard. A plan written before any code is touched. Every one of those is reproducible declaratively if your runtime exposes the right primitives. Digitorn does, so the rest is just YAML.

Fifty lines, same loop, your keys.

What's actually going on inside Claude Code

Strip the polish off and there are five specific things tuned in a specific way. Reproducing those gets you most of the way there.

Short, ergonomic tool names

Tools are called Write, Read, Edit, Bash, Grep, Glob, Agent. Not filesystem.write or shell.bash_execute. The reason is partly economics (every token the LLM emits costs money) and partly cognitive: a short, unambiguous name like Write(path, content) lets the model commit. Something like tools.filesystem.WriteFileWithOptions(...) makes it hesitate, then hallucinate options.

Spawning sub-agents on the fly

Ask Claude Code to "find every place this function is called and refactor them" and what really happens is two passes: a search worker runs in parallel to map call sites, then a refactor worker rewrites each. You don't see the orchestration. You just see the result.

That trick is what makes a coding agent feel competent. A single 200K-context model trying to grep thirty files, read them, and rewrite them all in one prompt is a hallucination factory. Splitting the job into a coordinator plus focused workers is what production setups do.

Refusing to edit what hasn't been read

Claude Code won't edit a file unless it has been read first. Sounds dull. It's the single difference between "the agent helped me" and "the agent silently corrupted half my codebase". Ask any LLM to edit a file it has never seen and it will happily invent the contents, guided by the filename. Every time.

Writing a plan before doing anything

For anything non-trivial, Claude Code first emits a numbered plan, then executes it step by step. Without that, the agent wanders. With it, you get focused work, and you can read the plan to know what's about to happen.

Clean interrupt and hot reload

Hit Ctrl+C and the agent stops cleanly: in-flight tool calls are cancelled, conversation state is preserved, the next prompt picks up without re-establishing anything. Change a config file and it reloads in place. Both are the kind of thing you don't notice until you use a tool that doesn't have them, then can't go back.

Five pieces, laid out side by side:

Short tool names: Write, Edit, Bash (ergonomics)
Sub-agent dispatch: coordinator + workers (scale)
Read-before-edit: no silent corruption (safety)
Plan-first: numbered, deliberate (focus)
Clean abort: Ctrl+C just works (ux)
The five things Claude Code does well. Each one becomes a YAML primitive on Digitorn.

Each one becomes a primitive in the YAML below. None of them require code on your side.

The architecture in YAML

Here's the full app.yaml for a Claude Code-equivalent agent, running on Digitorn:

YAML
app:
  app_id: my-coder
  name: "My Coder"
  version: "1.0.0"
  description: "A self-hosted coding agent. Multi-agent, plan-first, read-before-edit."
  category: "developer-tools"

runtime:
  mode: conversation
  entry_agent: coordinator
  workdir: "{{env.PWD}}"

agents:
  - id: coordinator
    role: coordinator
    plan_first: true
    brain:
      provider: anthropic
      model: claude-sonnet-4-6     # the writer brain
      backend: anthropic
      config:
        api_key: "{{env.ANTHROPIC_API_KEY}}"
      temperature: 0.2
      max_tokens: 8192
      context:
        max_tokens: 200000
        strategy: summarize
      fallback:                     # 402 => switch to Haiku
        provider: anthropic
        model: claude-haiku-4-5
        config:
          api_key: "{{env.ANTHROPIC_API_KEY}}"
    system_prompt: |
      You are a senior engineer. For non-trivial tasks:
      1. Read relevant files BEFORE editing them.
      2. Write a numbered plan, share it, then execute.
      3. Spawn search/triage specialists in PARALLEL when scanning
         a codebase. Don't grep + read 30 files yourself.
      4. After every Edit/Write, run any relevant tests with Bash.

  - id: explorer
    role: specialist
    specialty: "Find files, grep symbols, sample contents"
    modules:
      - {filesystem: [read, grep, glob]}
      - {shell: [bash]}
    brain:
      provider: anthropic
      model: claude-haiku-4-5      # cheap fast triage
      config:
        api_key: "{{env.ANTHROPIC_API_KEY}}"
      temperature: 0.0
    system_prompt: |
      You explore codebases. Return only the key findings: paths,
      symbols, code excerpts. No prose. Be FAST.

tools:
  modules:
    filesystem: {}              # read-before-edit guard is built-in
    shell:
      config:
        timeout: 300            # default per-command timeout (seconds)
    memory:
      config:
        working_memory: true
        todo_list: true
    web:
      config:
        search_backend: duckduckgo
  capabilities:
    default_policy: auto

That's the entire file. Here's what the runtime actually wires up at boot:

[Figure: runtime architecture. You (terminal prompt) talk to the COORDINATOR (Claude Sonnet; plan · read · edit · test; ctx 200K · out 8K), which uses Agent(spawn) to start the EXPLORER (Claude Haiku; grep · glob · read; 5x cheaper · 3x faster). Both share one instance of each module declared in YAML: filesystem (read_before_edit), shell (allowed_roots), memory (set_goal · todos), behavior (loop guards), lsp (post-edit lint), plus a shared workspace.]
One coordinator on Sonnet, one explorer on Haiku, five shared modules. The whole thing fits in a 50-line YAML file.

Now let's walk through what each block buys you.

Modules are the toolbox

Each module is a set of tools the agents can call. The thing worth knowing is that modules are shared between the coordinator and the explorer. Same _read_files set, same workspace, same working memory. The runtime handles all of that. You just declare what's available.

The read-before-edit guard is built into the filesystem module. An Edit on a file the session hasn't read yet returns "Cannot edit /src/foo.py, read it first" as a tool error. The model reads the error, reads the file, retries. Same behaviour as Claude Code, same safety. There is no flag to set; it is on by default.
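To make the mechanism concrete, here is a minimal Python sketch of a read-before-edit guard. It is not Digitorn's implementation: the class name, method signatures, and result shape are assumptions made for illustration. Only the idea (a per-session set of read files, and an error returned as a tool result) comes from the description above.

```python
from pathlib import Path


class ReadBeforeEditGuard:
    """Hypothetical guard: tracks files a session has read, refuses edits to the rest."""

    def __init__(self) -> None:
        self._read_files: set[Path] = set()

    def read(self, path: str) -> str:
        resolved = Path(path).resolve()
        content = resolved.read_text(encoding="utf-8")
        self._read_files.add(resolved)  # remember only after a successful read
        return content

    def edit(self, path: str, old: str, new: str) -> dict:
        resolved = Path(path).resolve()
        if resolved not in self._read_files:
            # Returned as a tool result rather than raised, so the model can recover.
            return {"ok": False, "error": f"Cannot edit {path}, read it first"}
        content = resolved.read_text(encoding="utf-8")
        if old not in content:
            return {"ok": False, "error": f"Text to replace not found in {path}"}
        resolved.write_text(content.replace(old, new, 1), encoding="utf-8")
        return {"ok": True, "error": None}
```

Returning the error instead of raising it is the design choice that matters: the message lands in the model's next turn, where it can read the file and retry.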

Agent spawn is the dispatcher

This is what lets the coordinator hand work off to an explorer. The whole multi-agent surface in Digitorn is a single tool, Agent, with eight modes selected by parameters. Seven of them:

Python
Agent(prompt="find all callers of foo()", specialist="explorer")  # spawn + wait
Agent(prompt="...", specialist="explorer", wait=False)             # spawn background
Agent(agent_id="abc-123")                                          # check status
Agent(agent_ids=["a", "b", "c"])                                   # wait for many
Agent(agent_id="abc-123", cancel=True)                             # kill stuck agent
Agent(agent_id="abc-123", reassign="try with different terms")     # retry
Agent(list=True)                                                   # list all

A coordinator can spawn five explorers in parallel, wait for the whole batch, and then act on the combined result. The Python you'd write yourself to do that, retries and cancellation included, is closer to 500 lines than 50.
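For a sense of what the runtime is doing for you, here is a rough Python sketch of the spawn-and-wait-for-batch pattern using the standard library. The function run_explorer is a hypothetical stand-in for a real explorer run, and spawn_batch is an illustrative name, not a Digitorn API. Retries and cancellation are left out.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_explorer(prompt: str) -> str:
    """Hypothetical stand-in for one explorer run; a real one would call an LLM."""
    return f"findings for: {prompt}"


def spawn_batch(prompts: list[str], max_workers: int = 5) -> dict[str, str]:
    """Run one explorer per prompt in parallel and wait for the whole batch."""
    results: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(run_explorer, p): p for p in prompts}
        for future in as_completed(futures):
            prompt = futures[future]
            try:
                results[prompt] = future.result()
            except Exception as exc:  # one failed worker must not sink the batch
                results[prompt] = f"error: {exc}"
    return results
```

Even this bare version already needs per-worker error handling. Status checks, cancellation, and reassignment are where the line count climbs.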

Per-agent brain, where the cost story lives

The coordinator runs on Claude Sonnet, 8192-token outputs, summarize compaction, full 200K context. That's where the actual code gets written, so quality has to be good. The explorer runs on Haiku. Roughly 5x cheaper, 3x faster, perfectly capable of grepping a directory and returning a list of hits.

That single split is what turns a self-hosted coding agent from "expensive curiosity" into something you can run all day. A coordinator-only setup with Sonnet on every turn costs roughly the same as a Claude Code subscription. Offloading exploration to Haiku trims around 60% off that on realistic workloads, because exploration is most of the LLM time.

[Figure: cost per coding session, normalised; 1.0 unit = the Sonnet-only baseline.]
Sonnet on every turn (naïve setup): 1.0×
Sonnet + Haiku split (digitorn-code default): 0.38× (-62%)
Measured on 50 real refactor + debug sessions. Exploration was the dominant turn type, which is why moving it to Haiku pays off.

Numbers above are normalised against the Sonnet-only baseline, measured across 50 real refactor and debug sessions on the digitorn-code builtin. Your mix will vary, but the shape doesn't.
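A back-of-envelope model shows why the split pays off. It assumes that cost scales linearly with the share of spend that goes to exploration, and that Haiku is exactly 5x cheaper per unit of work. Both are simplifications of the "roughly 5x" figure above, and this is not how the 0.38× number was measured. The function names are illustrative.

```python
def blended_cost(exploration_share: float, haiku_discount: float) -> float:
    """Session cost relative to Sonnet-only, where 1.0 is the baseline.

    exploration_share: fraction of baseline spend that is exploration (0..1).
    haiku_discount: how many times cheaper the explorer model is (assumed 5).
    """
    sonnet_part = 1.0 - exploration_share
    haiku_part = exploration_share / haiku_discount
    return sonnet_part + haiku_part


def implied_exploration_share(target_cost: float, haiku_discount: float) -> float:
    """Invert blended_cost: which exploration share yields target_cost?"""
    return (1.0 - target_cost) / (1.0 - 1.0 / haiku_discount)
```

Under these assumptions, a normalised cost of 0.38 corresponds to an exploration share of (1 - 0.38) / (1 - 1/5) = 0.775. So about three quarters of the baseline spend would be exploration, which fits the claim that exploration is most of the LLM time.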

Plan-first as a runtime rule, not a prompt instruction

Setting plan_first: true on the coordinator flips on a built-in behaviour rule. Before the first non-trivial tool call, the agent has to produce a plan. If it skips, the runtime injects a "did you plan first?" reminder in the next turn's context. None of that lives in your system prompt, which is why the focused, numbered execution that makes Claude Code look so deliberate keeps working even on long tasks.
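As a sketch of how such a rule could sit outside the prompt, the Python below queues a reminder whenever a non-trivial tool call arrives before any plan exists. The class, the reminder text, and the list of "trivial" tools are all assumptions for illustration, not Digitorn's internals.

```python
from dataclasses import dataclass, field

# Assumption: which tools count as "trivial" is illustrative, not Digitorn's list.
TRIVIAL_TOOLS = {"Read", "Grep", "Glob"}
REMINDER = "Reminder: write a numbered plan before making changes."


@dataclass
class PlanFirstGate:
    has_plan: bool = False
    pending_reminders: list[str] = field(default_factory=list)

    def record_plan(self, plan: str) -> None:
        if plan.strip():
            self.has_plan = True

    def on_tool_call(self, tool_name: str) -> None:
        """Queue a reminder when a non-trivial tool runs before any plan exists."""
        if tool_name not in TRIVIAL_TOOLS and not self.has_plan:
            self.pending_reminders.append(REMINDER)

    def build_context(self, messages: list[str]) -> list[str]:
        """Prepend queued reminders to the next turn's context, then clear them."""
        context = self.pending_reminders + messages
        self.pending_reminders = []
        return context
```

Because the gate lives in the runtime loop, it keeps firing however long the conversation gets, whereas a line in the system prompt can be diluted or compacted away.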

Brain fallback so a quota error doesn't ruin your Saturday

The fallback block is a small detail with outsized payoff. When the primary provider answers with a 402 ("insufficient credit") or a rate-limit error, the runtime swaps in the fallback provider mid-conversation. The agent doesn't notice. The user doesn't notice. You don't have to scramble to top up an Anthropic balance at midnight.
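The control flow behind a fallback block can be sketched in a few lines of Python. ProviderError and complete_with_fallback are hypothetical names; a real client library will raise its own exception types. The sketch also assumes the rate-limit case surfaces as HTTP 429, the standard "Too Many Requests" status.

```python
from typing import Callable

# HTTP 402 is "Payment Required"; 429 is "Too Many Requests".
FALLBACK_STATUS_CODES = {402, 429}


class ProviderError(Exception):
    """Hypothetical error type carrying the provider's HTTP status code."""

    def __init__(self, status_code: int, message: str = "") -> None:
        super().__init__(message or f"provider returned {status_code}")
        self.status_code = status_code


def complete_with_fallback(
    prompt: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
) -> str:
    """Call the primary brain; on a quota or rate-limit error, use the fallback."""
    try:
        return primary(prompt)
    except ProviderError as err:
        if err.status_code not in FALLBACK_STATUS_CODES:
            raise  # other failures (bad request, auth) should surface as-is
        return fallback(prompt)
```

Restricting the swap to specific status codes is deliberate: a malformed request would fail on the fallback too, so hiding it behind a retry only delays the error.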

What it looks like in practice

Easier to make this concrete with an example. Say you ask the coordinator: "Find every place we call auth.verify_token() and add logging before each call." Here's how the ten turns play out across the agents.

[Figure: ten-turn trace across four lanes: USER, COORDINATOR (Sonnet), EXPLORER (Haiku), WORKSPACE (filesystem · shell).]
t1 User: find every call to auth.verify_token() and add logging
t2 Coordinator: writes a 4-step plan (memory.set_goal · plan-first)
t3 Coordinator: Agent(spawn, prompt: 'find callers'), wait=false
t4 Explorer: Grep('auth.verify_token')
t5 Workspace: 4 hits across 3 files
t6 Explorer: agent_id ready · returns hits
t7 Coordinator: Read('src/api/login.py') (read-before-edit)
t8 Coordinator: Edit(...add logging)
t9 Coordinator: Bash('pytest -q'), 4 passed
t10 Coordinator: report: 4 files patched, tests green
Ten turns to find four call sites and patch them. Sonnet thinks and edits. Haiku does the grepping. The runtime takes care of spawning, waiting, and merging the results.

A few things are worth pointing at in this trace. The coordinator never greps the codebase itself; it hands that off to a Haiku worker that comes back in roughly a second. The read-before-edit guard catches the first attempt to edit login.py, so the coordinator reads it before touching it. Tests run after every edit, not at the end, which means a regression caught at turn 9 doesn't snowball through the next three files. Sonnet tokens go only to the steps that actually need Sonnet.

What goes wrong, and how the runtime catches it

A handful of failure modes show up over and over when people roll their own coding agent. These are the ones worth knowing about up front.

The first is the agent editing a file it has never read. Without a guard, the LLM imagines the file contents from the filename and writes a "fix" that bulldozes the real code. It's the leading cause of "the AI deleted my work" stories. The Digitorn filesystem module keeps a per-session set of files the agent has read; an Edit call on anything outside that set returns an error in the tool result, which the model usually responds to by reading first and trying again. The guard is always on, no flag required.

Bash running outside the workspace is the next big one. The agent runs an rm -rf somewhere it shouldn't, and that's the end of your afternoon. shell.allowed_roots resolves the cwd of every shell command against an allow-list, defaulting to the user home, the workspace, and the temp dir. Anything outside is rejected. / is never on the list.
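The allow-list check itself is small. The Python sketch below resolves a working directory (following symlinks and collapsing "..") and tests it against a set of roots. The function names are illustrative and the default roots simply mirror the ones listed above. Like the behaviour described, it checks the cwd only, not the arguments of the command.

```python
import tempfile
from pathlib import Path


def default_allowed_roots(workspace: str) -> list[Path]:
    """User home, the workspace, and the temp dir, mirroring the text above."""
    return [
        Path.home().resolve(),
        Path(workspace).resolve(),
        Path(tempfile.gettempdir()).resolve(),
    ]


def is_cwd_allowed(cwd: str, allowed_roots: list[Path]) -> bool:
    """True when cwd, after resolving symlinks and '..', sits under a root."""
    resolved = Path(cwd).resolve()
    return any(
        resolved == root or root in resolved.parents
        for root in allowed_roots
    )
```

Resolving before comparing is the important step: a path like workspace/../.. looks as if it starts inside the workspace until it is normalised.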

Then there's raw tool output eating the context window. One Read("100KB-log.txt") and you've burned tens of thousands of tokens in a single call. filesystem.config.max_file_kb clamps reads and tells the model the result was truncated, which usually triggers a follow-up Read(path, offset=2000, limit=500). Paginated reads, the way you'd page through a long log by hand.
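A clamped, paginated read might look like the following Python sketch. The function name, the 64 KB default, and the wording of the truncation note are assumptions. The offset and limit parameters follow the Read(path, offset=..., limit=...) shape quoted above and are counted in lines here.

```python
from typing import Optional


def read_lines(
    path: str,
    offset: int = 0,
    limit: Optional[int] = None,
    max_file_kb: int = 64,  # assumed default, for illustration only
) -> dict:
    """Return lines [offset, offset+limit) of a file, capped at max_file_kb."""
    budget = max_file_kb * 1024
    out: list[str] = []
    used = 0
    truncated = False
    with open(path, encoding="utf-8", errors="replace") as fh:
        for index, line in enumerate(fh):
            if index < offset:
                continue
            if limit is not None and len(out) >= limit:
                truncated = True
                break
            size = len(line.encode("utf-8"))
            if used + size > budget:
                truncated = True
                break
            out.append(line)
            used += size
    result = {"content": "".join(out), "truncated": truncated}
    if truncated:
        next_offset = offset + len(out)
        # Tell the model how to continue instead of silently dropping data.
        result["note"] = f"Output truncated. Continue with offset={next_offset}."
    return result
```

The note in the result gives the model the next offset to ask for, so the follow-up read is mechanical rather than guessed.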

The infinite re-edit loop is more annoying than dangerous: the agent edits, the file doesn't compile, it edits again, doesn't compile, repeats. The coding behaviour profile rate-limits same-file edits ("more than three of these in five turns and we stop and ask the user"). The lsp_diagnose hook runs the linter after every write so lint errors land in the next turn's tool result, which lets the agent self-correct, but only up to a point.
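The "more than three in five turns" rule is a sliding-window counter per file. Here is a minimal Python sketch with the thresholds taken from the text. The class and method names are hypothetical.

```python
from collections import defaultdict, deque


class EditLoopGuard:
    """Flags a file edited more than max_edits times within the last `window` turns."""

    def __init__(self, max_edits: int = 3, window: int = 5) -> None:
        self.max_edits = max_edits
        self.window = window
        self._edit_turns: dict[str, deque] = defaultdict(deque)

    def record_edit(self, path: str, turn: int) -> bool:
        """Record an edit at `turn`; return True when the agent should stop and ask."""
        turns = self._edit_turns[path]
        turns.append(turn)
        # Drop edits that fell out of the sliding window of recent turns.
        while turns and turns[0] <= turn - self.window:
            turns.popleft()
        return len(turns) > self.max_edits
```

Four edits to the same file on turns 1 to 4 trip the guard on the fourth. Edits spread over turns 1, 3, 5, and 7 never do, because the oldest one has left the window by the time the fourth arrives.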

The last one is goal drift. After thirty-plus turns, context compaction starts shaving off the early messages and the original task fades. Calling memory.set_goal on the first user message keeps the goal pinned at the top of every subsequent turn, even after compaction. At turn 50 the model still sees the same one-line directive it saw at turn 1.
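Goal pinning comes down to keeping the goal outside the part of the history that compaction is allowed to touch. In the Python sketch below, compaction is simulated by keeping only the most recent messages, whereas the YAML above uses a summarize strategy. The class and its methods are illustrative, not the memory module's real interface.

```python
from typing import Optional


class PinnedGoalContext:
    """Keeps a one-line goal outside the compactable history."""

    def __init__(self, keep_last: int = 20) -> None:
        self.keep_last = keep_last
        self.goal: Optional[str] = None
        self.history: list[str] = []

    def set_goal(self, goal: str) -> None:
        self.goal = goal

    def add(self, message: str) -> None:
        self.history.append(message)

    def build_context(self) -> list[str]:
        """Compact the history (here: truncate), then put the goal back on top."""
        recent = self.history[-self.keep_last:] if self.keep_last > 0 else []
        pinned = [f"GOAL: {self.goal}"] if self.goal else []
        return pinned + recent
```

After fifty turns with keep_last=3, the context holds the goal line followed by the last three messages, so the directive survives however aggressively the rest is trimmed.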

Getting it running in five minutes

If you want to try this tonight, the path is roughly:

Bash
# Install the runtime (Mac, Linux, or Windows with Git Bash)
curl -sSL https://digitorn.ai/install | sh

# Drop your Anthropic key in
echo 'ANTHROPIC_API_KEY=sk-ant-...' >> ~/.digitorn/.env

# Save the YAML above as my-coder.yaml somewhere convenient
nano my-coder.yaml   # paste the YAML

# Deploy and open
digitorn app deploy my-coder.yaml
digitorn dev chat my-coder

You land in a terminal session with a coding agent that knows about the directory you started it in, can read and write files there, run tests, and spawn explorer workers in parallel. Same loop as Claude Code. Your keys. Your YAML.

If you'd rather not start from a blank file, the Digitorn Hub already ships a polished version called digitorn-code under developer tools. One install command and you're done.

A few questions worth answering

Is this exactly Claude Code? No. Anthropic's internal prompts aren't public, and probably never will be. What this clone reproduces is the architecture: the tool surface, the read-before-edit guard, the multi-agent dispatch, the plan-first behaviour, the cost routing. The prompts are yours to write and iterate on, which is more than you get with the closed product.

Can I run it on something other than Anthropic? Yes, against any OpenAI-compatible endpoint. Swap anthropic for openai, deepseek, azure_openai, mistral, groq, together, or whatever else. The same YAML works with DeepSeek V3 (the cheapest serious option right now), GPT-4o, or a Llama 70B served by vLLM on a local box. Mixing per agent is fine and usually a good idea: coordinator on Sonnet, explorer on DeepSeek, writer on GPT-4o.

How is this different from LangChain or CrewAI? Different philosophy. LangChain is Python-as-config: you build agents by writing Python. Digitorn is YAML-as-config: you declare the agent and the runtime executes it. LangChain is better when you need deeply custom Python-native pipelines. Digitorn is better when you want a coding agent and you don't want to maintain framework code on top of it. The matrix is at Digitorn vs LangChain.

What about Cursor, Aider, Continue? Different shapes. Cursor is a full IDE, so a different category. Aider is the closest in spirit (self-hosted, CLI), but its configuration leans Python and is harder to extend. Continue is an editor extension. The thing specific to Digitorn is the declarative YAML plus built-in multi-agent.

Does it work fully offline? Yes, as long as your model has an OpenAI-compatible endpoint. Run Ollama or vLLM locally, point the YAML at http://localhost:11434/v1, and you have a coding agent that never phones home. Quality is the trade-off: local 70B-class models still trail Sonnet for production coding work, but the gap shrinks every month.

Can I share an agent with my team? Push it to the Digitorn Hub. Your teammate runs digitorn install hub://your-publisher/my-coder and they have it. Agents travel as YAML plus small assets, so prompts get reviewed in PRs like any other code.

Built a variant worth sharing? Push it to the Hub. That's how the ecosystem grows.

The Digitorn team

We build the open-source AI agent runtime that runs on your own machine. YAML over Python, multi-agent by default, marketplace for sharing.
