"AI agent" is one of those terms that means everything and nothing in 2026. Half the marketing pages call any chatbot with a "search the web" button an agent. The other half want you to think you need a PhD in distributed systems before you can ship one.
The truth is in between, and pretty boring. An agent is a small program that runs an LLM in a loop until a goal is hit. That's it. Everything else (memory, planning, multi-agent, RAG) is layered on top of that core idea.
This piece is for engineers who want to actually build one. We'll look at what's inside, what separates it from a chatbot, what the YAML for a real agent looks like, and where things tend to break in production.
The short version
An AI agent is an LLM running in a loop with tools and a goal. Drop the loop, you have an LLM. Drop the tools, you have a chatbot. Drop the goal, you have a search bar.
The loop is the part most people skip when they explain it, and it's the one that matters most.
What's actually inside
Strip away the marketing and there are four parts. They show up in every framework, no matter what the README calls them.
A goal
Whatever you want the agent to accomplish. The trigger can be a user typing "write me a research report on vector databases", a webhook firing when a GitHub issue is opened, or a cron job that runs every Monday at 8am. The goal is the seed of the loop.
An LLM doing the thinking
GPT-4o, Claude Sonnet 4, DeepSeek V3, a local Llama 3 on your laptop. Anything that can take a prompt and emit text plus structured tool calls. The model doesn't act. It picks what to do next, and then waits.
Tools that do the acting
Filesystem access, web search, shell commands, HTTP requests, database queries. A tool is a function the LLM is allowed to invoke. The result of each call gets pasted back into context, and the model picks the next move.
A loop that ties them together
Pseudocode is shorter than English here:
1while not goal_reached:2 decision = llm.think(context, available_tools)3 if decision.is_final_answer:4 return decision.text5 result = run_tool(decision.tool, decision.args)6 context.append(result)That's the whole secret. Three turns, ten, fifty, the agent keeps deciding and acting until it chooses to stop. A chat completion runs once. An agent runs until it's done.
So what's the difference with a chatbot, then?
A useful way to think about it: each step is a strict superset of the one before.
| Capability | LLM | Chatbot | AI agent |
|---|---|---|---|
| Multi-step reasoning | × | × (single-turn most of the time) | ✓ |
| Tools (filesystem, web, APIs) | × | sometimes | ✓ |
| State across turns | × | session memory | persistent + working memory |
| Goal-driven | × | × | ✓ |
| Spawns sub-tasks | × | × | ✓ (advanced) |
An LLM is a function. Text in, text out. A chatbot puts the LLM behind a conversation history and a UI. An agent gives it tools, a goal, and a loop. You can have an LLM without a chatbot. You can't really have an agent without an LLM.
What a real agent config looks like
Easier to look at one than to keep talking around it. The YAML below is a working research agent on the Digitorn runtime, the same shape you'd write yourself.
1# app.yaml - a research agent that searches the web and synthesizes findings2app:3 app_id: solo-researcher4 name: "Solo Researcher"5 version: "1.0.0"6 description: "Web research agent that searches, synthesizes, and cites sources."7 category: "research"89agents:10 - id: main11 role: coordinator12 brain:13 provider: deepseek14 model: deepseek-chat15 backend: openai_compat16 config:17 api_key: "{{env.DEEPSEEK_API_KEY}}"18 temperature: 0.219 max_tokens: 409620 system_prompt: |21 You are a research agent. Given a question, you:22 1. Use web.search to find 5-10 authoritative sources23 2. Use web.fetch to read each one24 3. Cross-reference, deduplicate, and write a 600-word synthesis25 with [domain.com](url) inline citations.26 4. Save the report with filesystem.write.2728 Rules:29 - Always cite sources (no claims without a link)30 - Diverse perspectives (don't quote the same domain 3 times)31 - Stop after 25 turns max3233tools:34 modules:35 web:36 config:37 search_backend: duckduckgo38 memory:39 config:40 working_memory: true41 todo_list: true42 filesystem: {}Forty lines, and that's a working research agent. Three things worth noticing.
modules is everything the agent can touch. web for search and fetch, memory so it can track what it has already learned without re-asking, filesystem for writing the final report.
brain is the LLM. Provider, model, temperature, context window. Nothing implicit, no global default that bites you later.
system_prompt is the contract. This is what you actually iterate on. Most of the engineering work in agent design is here, not in Python glue.
The config above runs as-is on Digitorn. The runtime boots in around 200ms, parses the YAML, wires the modules, and waits for a query. No code, no pipeline, no glue layer.
When one agent isn't enough
Single-agent goes a long way. It breaks down once you need:
- different personalities in the same task (research, writing, editing)
- real parallelism (three web searches at once, not in sequence)
- different models for different jobs (cheap and fast for filtering, slow and good for the final draft)
That's where multi-agent comes in. The pattern that holds up best in practice is one coordinator and a handful of specialists.
The coordinator doesn't do the research itself. It splits the question, spawns workers in parallel, waits, and then chains a writer and an editor at the end. Each worker has its own brain, its own prompt, its own subset of tools.
A stripped-down version of the Digitorn DeepResearch agent that ships by default looks like this:
1agents:2 - id: coordinator3 role: coordinator4 system_prompt: |5 Decompose the question into 2-4 angles. Spawn researchers and a6 fact-checker in PARALLEL. Wait for all. Then spawn the writer.7 Then the editor. Return the final report.8 brain:9 provider: deepseek10 model: deepseek-chat11 credential: deepseek_main12 temperature: 0.21314 - id: web_researcher15 role: specialist16 modules:17 - {web: [search, fetch]}18 - {memory: [remember]}19 brain:20 provider: deepseek21 model: deepseek-chat22 credential: deepseek_main23 temperature: 0.22425 - id: fact_checker26 role: specialist27 modules:28 - {web: [search, fetch]}29 brain:30 provider: deepseek31 model: deepseek-chat32 credential: deepseek_main33 temperature: 0.0 # zero temp for verification3435 - id: writer36 role: specialist37 brain:38 provider: anthropic39 model: claude-sonnet-4-640 credential: anthropic_main41 temperature: 0.44243 - id: editor44 role: specialist45 brain:46 provider: anthropic47 model: claude-sonnet-4-648 credential: anthropic_main49 temperature: 0.0The interesting part is the per-agent brain. Researchers run on cheap DeepSeek because they're filtering text, not crafting prose. The writer and the editor get Claude Sonnet because that's where quality actually shows up. Routing each sub-task to the model with the best price-to-quality fit is what makes multi-agent setups both affordable and good.
Spawning, waiting, cancelling, reassigning, that's the runtime's job, not yours. On Digitorn it's exposed as one Agent tool with eight modes selected by parameters (spawn one, spawn many, wait for all, check status, cancel, reassign, list). Coordinators just call it like any other tool.
Where it usually breaks
A few things go wrong over and over again when teams build the agent layer themselves.
Writing the loop yourself
The first instinct is always "we'll just code the loop in Python". Fine for a weekend prototype. Months later you've reimplemented half a framework: tool registration, hot reload, logs, retries, parallel sub-agents, abort handling. You spend more time on plumbing than on the actual agent.
A runtime that already does that for you is the boring-but-correct answer. Digitorn runs everything in YAML, LangChain is heavier and Python-first, CrewAI leans into role-play. Pick one.
The agent that loops forever
Without a turn cap, an agent can keep deciding and acting indefinitely, and bill the whole time. People discover this when their LLM bill spikes overnight.
Two cheap defenses solve most of it. A hard max_turns (Digitorn defaults to 25) and a max_tokens_per_run budget. On top of that, a behaviour rule that yells "stop" when the agent calls the same tool more than N times in a row catches the rest of the runaway cases.
The system prompt gets buried
After ten or fifteen turns the original instructions are smothered under tool results, and the model quietly slides from "research agent" to "generic helpful assistant". It starts editorialising instead of citing.
The fix is structural, not prompt-engineering. You want a memory primitive that re-injects the goal at the top of every turn, before tool results. On Digitorn that's memory.set_goal. The model can't forget what it can't unsee.
Raw tool output eating the context
One web search can be 50 KB of HTML. Three of those and you've blown a 200K context. The LLM starts ignoring the most recent results, which is the opposite of what you want.
Tool implementations have to summarise before returning. Digitorn's web.fetch strips HTML, extracts main content, and truncates with a sane heuristic. The agent never sees the raw page.
Getting started
If this got you curious enough to actually try one, the fastest path is roughly:
Install Digitorn first.
1curl -sSL https://digitorn.ai/install | shFrom there you have three options. If someone has already built the kind of agent you want, browse the Hub. There are agents for research, developer tooling, productivity, and creative work. One click, installed.
If not, save the 40-line YAML from earlier as app.yaml somewhere convenient, drop a DEEPSEEK_API_KEY in your env, and run digitorn app deploy app.yaml. You're live.
If you don't want to write YAML at all, describe what you want in plain English to the Builder and it'll generate the file for you.
A few questions that come up often
Are AI agents and LLMs the same thing? No. An LLM is a function, text in, text out. An agent uses an LLM as its brain but adds tools, state, and a loop on top.
Can the agent access the internet on its own? Only if you give it a tool that can. The default for any new Digitorn app is no network. You add web to its modules, optionally restricted to specific actions like {web: [search]} or {web: [fetch]}.
Is this safe to run? Depends entirely on what you let it touch. An agent with shell access can run anything. An agent with filesystem access to your home directory can wipe it. The runtime is the safety layer, not the model. Digitorn asks for explicit consent at install time, scoped to specific folders, credentials, network destinations, and you can revoke any of it later.
Self-host or cloud? Self-host if your data can't leave your infrastructure, if you want to avoid per-task fees, or if you want to swap models freely. A cloud product makes sense when the volume is low, when you genuinely don't want to operate anything, or when you're just exploring. Digitorn is self-hosted by default. You bring the keys, agents run locally.
Which framework is the best in 2026? Whoever gives you a confident answer to that is selling you something. The realistic answer depends on your stack: Python with custom logic leans towards LangChain, role-play multi-agent leans towards CrewAI, YAML-first with a marketplace leans towards Digitorn, visual workflows lean towards n8n. The comparisons under /vs/ go into the actual trade-offs.
Where to go next
A few useful next stops if this was helpful:
- 📦 The Digitorn Hub for ready-made agents you can install in one click
- 📚 The YAML reference, every block documented
- 🛠️ The Builder if you'd rather describe an agent in English than write YAML
- 🔍 How to build a Claude Code clone in YAML, the deep multi-agent companion piece
Found a mistake or want to push back on something? The article source lives on GitHub. Open an issue or send a PR, both work.
One post a fortnight, in your inbox.
Engineering notes from the Digitorn team. No marketing, no launch announcements, no "10 prompts that will change your life". Just the things we write that we'd want to read.
We build the open-source AI agent runtime that runs on your own machine. YAML over Python, multi-agent by default, marketplace for sharing.
Keep reading
Ship your first AI agent in 5 minutes.
Open-source. Self-hosted. YAML-first. Bring your own LLM keys, agents run on your machine.