# Choosing a Backend
Whittl supports five AI backends. Picking one isn't a permanent choice — you can switch anytime, and most users run two or three depending on the task. This page is the decision-making reference.
## Quick decision
Answer one question:
What matters most to you right now?
- Maximum code quality. Use Claude Sonnet or Gemini 2.5 Pro. Highest per-token cost, best output.
- Lowest cost. Use OpenRouter free tier (Qwen3-Coder, DeepSeek R1 free) or Gemini 2.5 Flash (generous free tier). Cent-level per operation.
- Complete privacy. Use Ollama. Everything local, nothing leaves your machine.
- Flexibility across many models. Use OpenRouter. One key, 200+ models, switch anytime.
- Best value overall. Use DeepSeek or Claude Haiku. Tier-S quality at a fraction of flagship-tier cost.
## Comparison table
| Backend | Cost range | Vision | Tools | Long context | Best for |
|---|---|---|---|---|---|
| Claude (Opus / Sonnet / Haiku) | $0.01 – $2 per op | Yes (native) | Yes (native) | Yes | Highest-quality code, agentic workflows |
| Gemini (2.5 Pro / Flash / Flash-Lite) | Free tier + $0.01–$0.50 | Yes (native) | Yes | Yes (1M+) | Free-tier experimentation, very large contexts |
| DeepSeek (V3 / V3.2) | $0.005 – $0.05 per op | No (direct; use OpenRouter for VL2) | Yes | Yes | Cheapest tier-S option, Python-heavy work |
| OpenRouter (200+ models) | Free tier + $0.001–$2 | Depends on model | Depends on model | Depends on model | Model exploration, one key / one bill, free models |
| Ollama (local) | Free (hardware cost) | Partial (v2.4+) | Depends on model | Small (4k–32k typical) | Full privacy, offline work, budget-zero hobbyist use |
## When to use each
### Claude (Anthropic)
Strengths:
- Consistently the best quality on complex Python code, especially PySide6 / tkinter UIs with unusual requirements
- Native tool-use API is the most reliable in practice
- Vision works on all three tiers (Opus, Sonnet, Haiku)
- Prompt caching cuts multi-round session costs by ~87%
Weaknesses:
- Most expensive on a per-token basis
- No free tier
Pick Claude when: you're building something serious, the cost per operation matters less than the iteration count, and you want one backend to rule them all.
Model picker inside Claude:
- Opus — for hard architectural tasks, long-context refactoring, Agent Mode
- Sonnet — the default "just use Claude" pick; 95% of Opus quality at ~20% of the cost
- Haiku — for iteration and small edits on established projects; surprisingly capable
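Whittl enables prompt caching for you, but if you're curious what it looks like at the wire level, here is a minimal sketch of an Anthropic Messages API request body with a `cache_control` breakpoint on the system prompt. The model id and prompt text are illustrative; this is raw-API usage, not Whittl internals:

```python
import json

def build_cached_request(system_prompt: str, user_message: str) -> dict:
    """Build an Anthropic Messages API body whose system prompt is cached.

    The cache_control marker tells the API to cache everything up to and
    including this block, so later requests in the same session pay the
    much cheaper cache-read rate for the system prompt instead of the
    full input-token rate.
    """
    return {
        "model": "claude-sonnet-4-5",  # illustrative model id
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                "cache_control": {"type": "ephemeral"},  # cache breakpoint
            }
        ],
        "messages": [{"role": "user", "content": user_message}],
    }

body = build_cached_request("You are a PySide6 expert.", "Refactor my dialog class.")
print(json.dumps(body["system"][0]["cache_control"]))
```

The savings compound because the cached system prompt (and, in a long session, earlier conversation turns) dominate the token count of each follow-up request.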
### Gemini (Google)
Strengths:
- Generous free tier on 2.5 Flash and 3 Flash suitable for hobbyist use
- Extremely long context windows (1M+ tokens on 2.5 Pro)
- Native vision on every modern model
- Automatic prompt caching on 2.5 Flash
Weaknesses:
- Quality on complex Python tasks occasionally trails Claude Sonnet on the same prompt
- Safety filters sometimes refuse generations that other backends run fine
- Pro tier has stricter rate limits than Claude
Pick Gemini when: you want to experiment without cost, your project has a very large codebase (1M context), or you're already deep in the Google ecosystem.
### DeepSeek
Strengths:
- Cheapest tier-S option by a wide margin — typical Whittl operation costs $0.005–$0.02
- Very strong on Python code specifically
- Automatic prefix caching
Weaknesses:
- Vision requires routing to DeepSeek-VL2 through OpenRouter; it is not available directly through the DeepSeek API backend
- Occasional latency spikes during peak hours
- Smaller context than Claude / Gemini
Pick DeepSeek when: you want tier-S Python code quality at budget-backend prices, and you don't need vision.
### OpenRouter
Strengths:
- Single API key gives you access to 200+ models (Claude, GPT-4o, Gemini, Llama, Mistral, Qwen, DeepSeek, Gemma, etc.)
- Free models available with rate limits (Qwen3-Coder, DeepSeek R1 free, Llama 3.3 free)
- openrouter/auto meta-model picks the best available model automatically
- Capability chips ([Tools], [Thinks], [Long], [Vision]) tell you what each model supports before you pick
- Consolidated billing
Weaknesses:
- Small margin (~5–10%) on top of direct provider pricing
- Quality varies widely across the catalog — picking a bad model gives you bad output
Pick OpenRouter when: you want flexibility to try different models mid-project, you want access to niche models (Pixtral, Qwen-VL, GLM-4.5-Air), or you want one key + one bill across providers.
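The "one key, many models" idea is easy to see at the API level: OpenRouter exposes an OpenAI-compatible chat endpoint, so switching providers is just a different model string in the same request. A sketch of the request body (the prompt is illustrative; endpoint and headers are omitted):

```python
def build_openrouter_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-compatible chat body for OpenRouter.

    POSTing this to /api/v1/chat/completions with your single OpenRouter
    key works the same whether the model is Claude, Qwen, or the
    openrouter/auto meta-model.
    """
    return {
        "model": model,  # e.g. "openrouter/auto" lets the router pick
        "messages": [{"role": "user", "content": prompt}],
    }

for model in ("openrouter/auto", "anthropic/claude-haiku-4-5", "qwen/qwen3-coder"):
    body = build_openrouter_request(model, "Write a Python hello world.")
    # Only the model field changes; the key, headers, and endpoint stay identical.
```

This is why mid-project model rotation is cheap: nothing about your setup changes except one string.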
### Ollama (local)
Strengths:
- 100% local — nothing ever leaves your machine
- Free to run indefinitely (no per-request cost)
- Works offline (airplane, weak wifi, privacy-mandated environments)
Weaknesses:
- Quality ceiling is set by the best local model your hardware can run — typically behind cloud flagships
- RAM-hungry (8 GB for a 7B model, 16 GB+ for a 14B)
- Slower per-token than cloud backends
- Vision-capable models exist (llava, qwen-vl, llama3.2-vision) but Whittl's image-input wiring to Ollama is limited
Pick Ollama when: privacy is non-negotiable, you're working offline, or you've already got the hardware and want zero per-generation cost.
For specific model recommendations with RAM requirements and quality ratings, see the Ollama backend page.
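"Nothing leaves your machine" is literal: Ollama serves a plain HTTP API on localhost. As a sketch, this is the body you would POST to the local `/api/generate` endpoint (the model tag is an example; use any model you've pulled):

```python
import json

def build_ollama_request(model: str, prompt: str) -> dict:
    """Request body for Ollama's local /api/generate endpoint.

    The server listens on http://localhost:11434 by default. No API key
    is involved, and the request never crosses the network boundary.
    """
    return {
        "model": model,    # e.g. "qwen2.5-coder:7b", any locally pulled model
        "prompt": prompt,
        "stream": False,   # return one JSON object instead of a token stream
    }

body = build_ollama_request("qwen2.5-coder:7b", "Explain this traceback.")
print(json.dumps(body))
```

Because the transport is ordinary local HTTP, Ollama also works with the machine fully offline once the model weights are downloaded.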
## Running multiple backends
You can configure all five and switch between them per-session or even mid-session. The dropdown in the chat panel switches backends. Conversation history carries forward.
Practical multi-backend setups:
Solo indie pattern
- Haiku for quick iteration
- Sonnet for the hard problem you hit once a day
- Gemini Flash free for throwaway experiments
Privacy-first pattern
- Ollama for default generation (everything stays local)
- Claude Sonnet or Gemini Pro as a fallback when a problem exceeds what the local model can handle
Exploration pattern
- OpenRouter as the primary backend
- Star favorite models in the Models dialog (openrouter/auto, anthropic/claude-haiku-4-5, google/gemini-2.5-flash-lite, qwen/qwen3-coder)
- Rotate based on task
## Typical real costs
Based on field data from Whittl sessions over a typical month:
| Task | Cheap (OpenRouter free / Qwen) | Mid (DeepSeek / Haiku) | Premium (Sonnet) |
|---|---|---|---|
| Small modification (one edit) | $0.00 – $0.002 | $0.005 – $0.02 | $0.01 – $0.05 |
| Full single-file generation (~300 lines) | $0.002 – $0.02 | $0.05 – $0.15 | $0.15 – $0.50 |
| Screenshot to multi-file app | $0.01 – $0.05 | $0.10 – $0.30 | $0.50 – $2.00 |
| Agent Mode autonomous task | Not recommended | $0.05 – $0.25 | $0.30 – $2.00 |
| Auto-fix cycle (5 rounds max) | $0.001 – $0.01 | $0.02 – $0.10 | $0.10 – $0.40 |
Whittl's auto-fix rules, skills system, and prompt caching all compound to pull these numbers down over time.
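The ranges above follow from simple token arithmetic. As a sanity check, here is a sketch with illustrative per-million-token prices and rough token counts (both are assumptions; check your provider's current pricing page):

```python
def op_cost(in_tokens: int, out_tokens: int,
            in_price_per_m: float, out_price_per_m: float) -> float:
    """Cost in USD of one operation, given per-million-token prices."""
    return in_tokens / 1e6 * in_price_per_m + out_tokens / 1e6 * out_price_per_m

# Illustrative premium-tier prices: $3/M input, $15/M output.
SONNET_IN, SONNET_OUT = 3.00, 15.00

# A ~300-line single-file generation: assume ~5k prompt tokens in, ~10k out.
cost = op_cost(5_000, 10_000, SONNET_IN, SONNET_OUT)
print(f"${cost:.3f}")  # $0.165, inside the $0.15-$0.50 premium range above
```

Prompt caching shifts most of the input tokens to a much cheaper cache-read rate, which is the main lever pulling these numbers down across a session.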
## What's next
- Setting up API keys — the practical how-to-add-each-one guide
- OpenRouter — deep dive if you pick OpenRouter as your primary
- Agent Mode — which models unlock which tier of Agent Mode