Open-weight models (Llama, Qwen, DeepSeek, Mistral) ship the actual weights so you can run inference on your own GPU. Quality lags frontier models for the hardest tasks but closes monthly. The right choice when data residency matters, when you want zero per-token cost, or when you need fully offline operation. Pair with Ollama for laptop-scale work, vLLM for production throughput.
Engineering notes from the Digitorn team. No marketing, no launch announcements, no "10 prompts that will change your life". Just the things we write that we'd want to read.