vLLM (self-hosted server)

Build AI agents on vLLM in YAML

Production-grade open-weight serving. The right choice once Ollama runs out of throughput.

Why vLLM

vLLM is the high-throughput, GPU-accelerated server for open-weight models that most teams reach for once a single laptop stops being enough. It brings continuous batching, prefix caching, multi-GPU sharding, and an OpenAI-compatible HTTP API. Run it on your own boxes or through providers like Together. Pair it with Digitorn exactly as you would Ollama; the YAML does not care.

Models worth knowing about

meta-llama/Llama-3.3-70B-Instruct (premium): general-purpose flagship
Qwen/Qwen2.5-Coder-32B-Instruct (specialty): coding workloads
deepseek-ai/DeepSeek-V3 (premium): if you have the GPUs for it

Strengths

Production-grade throughput, continuous batching
Multi-GPU and multi-node sharding
OpenAI-compatible API, drop-in for any agent runtime
Prefix caching cuts repeat-context costs
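The strengths above correspond to flags on vLLM's own server command. Here is a minimal sketch, not an official recipe. It assumes the `vllm` CLI is installed on the GPU host and that four GPUs are visible, and it makes the server itself require the token you store in `VLLM_API_KEY`. The `start_vllm` function and the `NUM_GPUS` variable are hypothetical names chosen for this example.

```shell
# Hypothetical helper: launch vLLM's OpenAI-compatible server on this GPU host.
# Assumes the `vllm` CLI is installed and NUM_GPUS GPUs are visible (default 4).
start_vllm() {
  model="${1:-meta-llama/Llama-3.3-70B-Instruct}"
  num_gpus="${NUM_GPUS:-4}"

  # Shard the model across GPUs (tensor parallelism) and cache shared prompt prefixes.
  # Recent vLLM releases may already enable prefix caching by default.
  set -- serve "$model" \
    --tensor-parallel-size "$num_gpus" \
    --enable-prefix-caching \
    --host 0.0.0.0 \
    --port 8000

  # With --api-key set, vLLM rejects requests that lack "Authorization: Bearer <key>".
  if [ -n "${VLLM_API_KEY:-}" ]; then
    set -- "$@" --api-key "$VLLM_API_KEY"
  fi

  vllm "$@"
}
```

For example, `NUM_GPUS=2 start_vllm Qwen/Qwen2.5-Coder-32B-Instruct` serves the coding model instead. Size `NUM_GPUS` to your actual hardware. With these defaults, the `base_url` in `app.yaml` is `http://<host>:8000/v1`.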

Worth knowing

No catalog entry needed; if you front the server with auth, pass the token as a free-form credential
More operational work than Ollama: you maintain the server yourself
GPU costs are real even at moderate volume
Cold-start times vary by model size
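Cold starts vary with model size, so any script that starts the server and then launches an agent should wait until the model has loaded. Here is a minimal sketch. It relies on the `/health` endpoint that vLLM's OpenAI-compatible server exposes at the server root, outside `/v1`. It takes the same `base_url` you put in `app.yaml`. The `wait_for_vllm` name, the 5-second poll interval, and the 600-second default timeout are all assumptions.

```shell
# Hypothetical helper: block until vLLM answers on /health, or give up.
# $1: base_url as written in app.yaml (ending in /v1); $2: timeout in seconds (default 600).
wait_for_vllm() {
  root="${1:-http://your-vllm-host:8000/v1}"
  root="${root%/}"     # tolerate a trailing slash
  root="${root%/v1}"   # /health lives at the server root, not under /v1
  timeout_s="${2:-600}"
  waited=0

  # curl -f returns non-zero on HTTP errors; a refused connection also fails.
  until curl -sf -o /dev/null "$root/health"; do
    if [ "$waited" -ge "$timeout_s" ]; then
      echo "vLLM not healthy at $root after ${timeout_s}s" >&2
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
  echo "vLLM is ready at $root"
}
```

A typical sequence is `start_vllm &`, followed by `wait_for_vllm http://localhost:8000/v1 && digitorn chat digitorn-code`. The generous default timeout leaves room for large models such as the 70B checkpoint.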

Drop into your `app.yaml`

agent brain block

brain:
  provider: openai_compat
  model: meta-llama/Llama-3.3-70B-Instruct
  config:
    base_url: "http://your-vllm-host:8000/v1"
    api_key: "{{env.VLLM_API_KEY}}"
  temperature: 0.2
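Before you wire the block into an agent, check that `base_url`, the optional key, and the `model` string all line up. vLLM's OpenAI-compatible server lists the model IDs it serves at `GET /v1/models`, and `model:` should match one of them. Here is a small sketch. The `vllm_list_models` name is hypothetical. It also assumes that any auth layer in front of the server accepts a bearer token.

```shell
# Hypothetical helper: list the model IDs a vLLM server exposes.
# $1: base_url as written in app.yaml (default: the placeholder host).
vllm_list_models() {
  base_url="${1:-http://your-vllm-host:8000/v1}"
  base_url="${base_url%/}"   # tolerate a trailing slash

  if [ -n "${VLLM_API_KEY:-}" ]; then
    # Send the token only when one is configured (server started with --api-key,
    # or a bearer-token auth layer in front of it).
    curl -sS --fail -H "Authorization: Bearer ${VLLM_API_KEY}" "$base_url/models"
  else
    curl -sS --fail "$base_url/models"
  fi
}
```

The `id` fields in the returned `data` array are the names vLLM accepts for `model:`. By default that is the model ID passed to `vllm serve`, unless the server was started with `--served-model-name`.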

⚡ Inline config (catalog support pending)

vLLM doesn't have a first-class catalog entry yet. Configure it inline with env templates, as the blog examples do. Native catalog support is on the roadmap.

VLLM_API_KEY: optional auth token if you front the server with an auth layer

# add to ~/.digitorn/.env
VLLM_API_KEY=...
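Appending with `echo ... >>` adds another `VLLM_API_KEY` line every time it runs. If you rotate the token, a small helper that replaces the existing entry keeps the file to a single line, so there is no question of which value is in effect. Here is a sketch. It assumes the file holds plain `KEY=value` lines as shown here. `set_vllm_key` is a hypothetical name, not part of the Digitorn CLI.

```shell
# Hypothetical helper: write VLLM_API_KEY into ~/.digitorn/.env exactly once.
# Assumes the file holds plain KEY=value lines.
set_vllm_key() {
  if [ -z "${1:-}" ]; then
    echo "usage: set_vllm_key <token>" >&2
    return 1
  fi
  env_file="${HOME}/.digitorn/.env"

  mkdir -p "$(dirname "$env_file")" || return 1
  touch "$env_file" || return 1

  # Copy every line except an existing VLLM_API_KEY entry, then append the new one.
  # grep exits 1 when no lines remain (e.g. an empty file); that case is fine.
  tmp="$(mktemp)" || return 1
  grep -v '^VLLM_API_KEY=' "$env_file" > "$tmp" || true
  printf 'VLLM_API_KEY=%s\n' "$1" >> "$tmp"
  mv "$tmp" "$env_file"
}
```

Call it as `set_vllm_key 'your-token'`. Running it again with a new token replaces the old line instead of adding a second one, and other entries in the file are left untouched.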

Run one in 5 minutes

Install Digitorn and chat with your vLLM agent

# 1. install the runtime
curl -sSL https://digitorn.ai/install | sh

# 2. drop your vLLM key in the env file
echo 'VLLM_API_KEY=...' >> ~/.digitorn/.env

# 3. install a starter agent and chat
digitorn install hub://digitorn/digitorn-code
digitorn chat digitorn-code


Other providers Digitorn supports

Anthropic Claude (frontier): /integrations/anthropic
OpenAI (frontier): /integrations/openai
DeepSeek (fast): /integrations/deepseek
Mistral (frontier): /integrations/mistral
Ollama (open): /integrations/ollama
Groq (fast): /integrations/groq
Azure OpenAI (enterprise): /integrations/azure-openai
Google Gemini (frontier): /integrations/google-gemini
Together AI (open): /integrations/together-ai
