vLLM is the high-throughput, GPU-accelerated open-weight server most teams reach for once a single laptop stops being enough. Continuous batching, prefix caching, multi-GPU sharding, OpenAI-compatible HTTP. Run it on your own boxes or through providers like Together. Pair with Digitorn the same way you pair with Ollama, the YAML does not care.
app.yaml1brain:2 provider: openai_compat3 model: meta-llama/Llama-3.3-70B-Instruct4 config:5 base_url: "http://your-vllm-host:8000/v1"6 api_key: "{{env.VLLM_API_KEY}}"7 temperature: 0.2vLLMdoesn't have a first-class catalog entry yet. Configure it inline using env templates, the same way the blog examples show. Native catalog support is on the roadmap.
VLLM_API_KEYOptional auth token if you front the server with an auth layer# add to ~/.digitorn/.env
VLLM_API_KEY=...# 1. install the runtime
curl -sSL https://digitorn.ai/install | sh
# 2. drop your vLLM key in the env file
echo 'VLLM_API_KEY=...' >> ~/.digitorn/.env
# 3. install a starter agent and chat
digitorn install hub://digitorn/digitorn-code
digitorn chat digitorn-codeEngineering notes from the Digitorn team. No marketing, no launch announcements, no "10 prompts that will change your life". Just the things we write that we'd want to read.