Choosing a Backend

Whittl supports five AI backends. Picking one isn't a permanent choice — you can switch anytime, and most users run two or three depending on the task. This page is the decision-making reference.

Quick decision

Answer one question:

What matters most to you right now?

  • Maximum code quality. Use Claude Sonnet or Gemini 2.5 Pro. Highest per-token cost, best output.
  • Lowest cost. Use the OpenRouter free tier (Qwen3-Coder, DeepSeek R1 free) or Gemini 2.5 Flash (generous free tier). Typical operations cost a cent or less.
  • Complete privacy. Use Ollama. Everything local, nothing leaves your machine.
  • Flexibility across many models. Use OpenRouter. One key, 200+ models, switch anytime.
  • Best value overall. Use DeepSeek or Claude Haiku. Tier-S quality at a fraction of the flagship tiers' cost.

Comparison table

| Backend | Cost range | Vision | Tools | Long context | Best for |
|---|---|---|---|---|---|
| Claude (Opus / Sonnet / Haiku) | $0.01 – $2 per op | Yes (native) | Yes (native) | Yes | Highest-quality code, agentic workflows |
| Gemini (2.5 Pro / Flash / Flash-Lite) | Free tier + $0.01 – $0.50 | Yes (native) | Yes | Yes (1M+) | Free-tier experimentation, very large contexts |
| DeepSeek (V3 / V3.2) | $0.005 – $0.05 per op | No (direct; use OpenRouter for VL2) | Yes | Yes | Cheapest tier-S option, Python-heavy work |
| OpenRouter (200+ models) | Free tier + $0.001 – $2 | Depends on model | Depends on model | Depends on model | Model exploration, one key / one bill, free models |
| Ollama (local) | Free (hardware cost) | Partial (v2.4+) | Depends on model | Small (4k – 32k typical) | Full privacy, offline work, budget-zero hobbyist use |

When to use each

Claude (Anthropic)

Strengths:

  • Consistently the best quality on complex Python code, especially PySide6 / tkinter UIs with unusual requirements
  • Native tool-use API is the most reliable in practice
  • Vision works on all three tiers (Opus, Sonnet, Haiku)
  • Prompt caching cuts multi-round session costs by ~87% (see the sketch below)
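
For context, here's what prompt caching looks like at the API level. This is a minimal sketch using Anthropic's Python SDK, not Whittl's internal wiring; the model ID, context variable, and prompt are placeholders:

```python
# Minimal sketch of Anthropic prompt caching (placeholders throughout).
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

PROJECT_CONTEXT = open("main.py").read()  # the large, stable part of the prompt

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder model ID
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": PROJECT_CONTEXT,
            # Mark the stable prefix as cacheable; later rounds that reuse it
            # are billed at the much lower cache-read rate, which is where
            # the multi-round savings come from.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Rename the Widget class to Panel."}],
)
print(response.content[0].text)
```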

Weaknesses:

  • Most expensive on a per-token basis
  • No free tier

Pick Claude when: you're building something serious, the cost per operation matters less than the iteration count, and you want one backend to rule them all.

Model picker inside Claude:

  • Opus — for hard architectural tasks, long-context refactoring, Agent Mode
  • Sonnet — the default "just use Claude" pick; 95% of Opus quality at ~20% of the cost
  • Haiku — for iteration and small edits on established projects; surprisingly capable

Gemini (Google)

Strengths:

  • Generous free tiers on 2.5 Flash and 3 Flash, suitable for hobbyist use (example call below)
  • Extremely long context windows (1M+ tokens on 2.5 Pro)
  • Native vision on every modern model
  • Automatic prompt caching on 2.5 Flash
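
To get a feel for the free tier outside Whittl, here's a minimal call with Google's google-genai Python SDK; the model ID and prompt are placeholders:

```python
# Minimal sketch of a Gemini call with the google-genai SDK.
from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment
response = client.models.generate_content(
    model="gemini-2.5-flash",  # placeholder; any free-tier model ID works
    contents="Write a tkinter window with a single 'Hello' button.",
)
print(response.text)
```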

Weaknesses:

  • Quality on complex Python tasks is occasionally behind Claude Sonnet on the same prompt
  • Safety filters sometimes refuse generations that other backends complete without issue
  • Pro tier has stricter rate limits than Claude

Pick Gemini when: you want to experiment without cost, your project has a very large codebase (1M context), or you're already deep in the Google ecosystem.

DeepSeek

Strengths:

  • Cheapest tier-S option by a wide margin; a typical Whittl operation costs $0.005 – $0.02 (example below)
  • Very strong on Python code specifically
  • Automatic prefix caching
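
DeepSeek's API is OpenAI-compatible, so if you want to poke at it directly, the stock openai SDK works with a swapped base URL. A minimal sketch (the prompt is a placeholder; check DeepSeek's docs for current model IDs):

```python
import os
from openai import OpenAI

# DeepSeek exposes an OpenAI-compatible endpoint, so the standard SDK works.
client = OpenAI(
    base_url="https://api.deepseek.com",
    api_key=os.environ["DEEPSEEK_API_KEY"],
)
response = client.chat.completions.create(
    model="deepseek-chat",  # the V3-series chat model
    messages=[{"role": "user", "content": "Merge two sorted lists in Python."}],
)
# Prefix caching happens automatically server-side: repeated prompt
# prefixes are billed at the cheaper cache-hit rate with no extra flags.
print(response.choices[0].message.content)
```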

Weaknesses:

  • Vision is not available through the direct DeepSeek API backend; it requires routing to DeepSeek-VL2 through OpenRouter
  • Occasional latency spikes during peak hours
  • Smaller context than Claude / Gemini

Pick DeepSeek when: you want tier-S Python code quality at budget-backend prices, and you don't need vision.

OpenRouter

Strengths:

  • Single API key gives you access to 200+ models (Claude, GPT-4o, Gemini, Llama, Mistral, Qwen, DeepSeek, Gemma, etc.)
  • Free models available with rate limits (Qwen3-Coder, DeepSeek R1 free, Llama 3.3 free)
  • openrouter/auto meta-model picks the best available model automatically (see the sketch below)
  • Capability chips ([Tools], [Thinks], [Long], [Vision]) tell you what each model supports before you pick
  • Consolidated billing
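
OpenRouter speaks the same OpenAI-compatible protocol, so the same SDK pattern applies; only the base URL and model ID change. A minimal sketch with the auto-router (the prompt is a placeholder):

```python
import os
from openai import OpenAI

# OpenRouter is OpenAI-compatible: one base URL, one key, 200+ model IDs.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)
response = client.chat.completions.create(
    model="openrouter/auto",  # let the router pick; or e.g. "qwen/qwen3-coder"
    messages=[{"role": "user", "content": "Explain Python's GIL in two sentences."}],
)
print(response.choices[0].message.content)
```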

Weaknesses:

  • Small margin (~5–10%) on top of direct provider pricing
  • Quality varies widely across the catalog — picking a bad model gives you bad output

Pick OpenRouter when: you want flexibility to try different models mid-project, you want access to niche models (Pixtral, Qwen-VL, GLM-4.5-Air), or you want one key + one bill across providers.

Ollama (local)

Strengths:

  • 100% local — nothing ever leaves your machine
  • Free to run indefinitely (no per-request cost; example call below)
  • Works offline (airplane, weak wifi, privacy-mandated environments)
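
For reference, a minimal local call with the official ollama Python client; this assumes the model has already been pulled (e.g. `ollama pull llama3.1`) and the model name is a placeholder:

```python
import ollama  # pip install ollama; talks to the local server on port 11434

response = ollama.chat(
    model="llama3.1",  # placeholder; use any model you've pulled locally
    messages=[{"role": "user", "content": "Reverse a string in Python."}],
)
print(response["message"]["content"])
```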

Weaknesses:

  • The quality ceiling is set by the best local model your hardware can run, which typically trails the cloud flagships
  • RAM-hungry (8 GB for a 7B model, 16 GB+ for a 14B)
  • Slower per-token generation than cloud backends
  • Vision-capable models exist (llava, qwen-vl, llama3.2-vision), but Whittl's image-input wiring to Ollama is limited

Pick Ollama when: privacy is non-negotiable, you're working offline, or you've already got the hardware and want zero per-generation cost.

For specific model recommendations with RAM requirements and quality ratings, see the Ollama backend page.

Running multiple backends

You can configure all five and switch between them per-session or even mid-session. The dropdown in the chat panel switches backends. Conversation history carries forward.

Practical multi-backend setups:

Solo indie pattern

  • Haiku for quick iteration
  • Sonnet for the hard problem you hit once a day
  • Gemini Flash free for throwaway experiments

Privacy-first pattern

  • Ollama for default generation (everything stays local)
  • Claude Sonnet or Gemini Pro as a fallback when a problem exceeds what the local model can handle

Exploration pattern

  • OpenRouter as the primary backend
  • Star favorite models in the Models dialog (openrouter/auto, anthropic/claude-haiku-4-5, google/gemini-2.5-flash-lite, qwen/qwen3-coder)
  • Rotate based on task

Typical real costs

Based on field data from Whittl sessions over a typical month:

| Task | Cheap (OpenRouter free / Qwen) | Mid (DeepSeek / Haiku) | Premium (Sonnet) |
|---|---|---|---|
| Small modification (one edit) | $0.00 – $0.002 | $0.005 – $0.02 | $0.01 – $0.05 |
| Full single-file generation (~300 lines) | $0.002 – $0.02 | $0.05 – $0.15 | $0.15 – $0.50 |
| Screenshot to multi-file app | $0.01 – $0.05 | $0.10 – $0.30 | $0.50 – $2.00 |
| Agent Mode autonomous task | Not recommended | $0.05 – $0.25 | $0.30 – $2.00 |
| Auto-fix cycle (5 rounds max) | $0.001 – $0.01 | $0.02 – $0.10 | $0.10 – $0.40 |

Whittl's auto-fix rules, skills system, and prompt caching all compound to pull these numbers down over time.
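
If you want to sanity-check these ranges against your own usage, per-operation cost is just token counts multiplied by the provider's per-million-token rates. Here's a back-of-envelope helper; the rates below are illustrative assumptions, not current price sheets:

```python
# Rough per-operation cost estimator. The rates are illustrative assumptions;
# always check your provider's current pricing page.
PRICES_PER_MTOK = {                  # (input, output) in USD per million tokens
    "deepseek-chat": (0.27, 1.10),   # assumed rates
    "claude-haiku": (0.80, 4.00),    # assumed rates
    "claude-sonnet": (3.00, 15.00),  # assumed rates
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Approximate USD cost of a single operation."""
    in_rate, out_rate = PRICES_PER_MTOK[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# A ~300-line single-file generation with project context:
# roughly 10k tokens in, 8k tokens out.
print(f"${estimate_cost('claude-sonnet', 10_000, 8_000):.3f}")  # ~ $0.150
```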

What's next