Choosing a Backend

Whittl supports five AI backends. Picking one isn't a permanent choice — you can switch anytime, and most users run two or three depending on the task. This page is the decision-making reference.

Quick decision

Answer one question:

What matters most to you right now?

  • Maximum code quality. Use Claude Sonnet or Gemini 2.5 Pro. Highest per-token cost, best output.
  • Lowest cost. Use the OpenRouter free tier (Qwen3-Coder, DeepSeek R1 free) or Gemini 2.5 Flash (generous free tier). Typical operations cost a cent or less.
  • Complete privacy. Use Ollama. Everything local, nothing leaves your machine.
  • Flexibility across many models. Use OpenRouter. One key, 200+ models, switch anytime.
  • Best value overall. Use DeepSeek or Claude Haiku. Tier-S quality at a fraction of the flagship tiers' cost.

Comparison table

| Backend | Cost range | Vision | Tools | Long context | Best for |
|---|---|---|---|---|---|
| Claude (Opus / Sonnet / Haiku) | $0.01 – $2 per op | Yes (native) | Yes (native) | Yes | Highest-quality code, agentic workflows |
| Gemini (2.5 Pro / Flash / Flash-Lite) | Free tier + $0.01 – $0.50 | Yes (native) | Yes | Yes (1M+) | Free-tier experimentation, very large contexts |
| DeepSeek (V3 / V3.2) | $0.005 – $0.05 per op | No (direct; use OpenRouter for VL2) | Yes | Yes | Cheapest tier-S option, Python-heavy work |
| OpenRouter (200+ models) | Free tier + $0.001 – $2 | Depends on model | Depends on model | Depends on model | Model exploration, one key / one bill, free models |
| Ollama (local) | Free (hardware cost) | Partial (v2.4+) | Depends on model | Small (4k – 32k typical) | Full privacy, offline work, budget-zero hobbyist use |

When to use each

Claude (Anthropic)

Strengths:

  • Consistently the best quality on complex Python code, especially PySide6 / tkinter UIs with unusual requirements
  • Native tool-use API is the most reliable in practice
  • Vision works on all three tiers (Opus, Sonnet, Haiku)
  • Prompt caching cuts multi-round session costs by ~87% (see the sketch below)
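
For context, here's what prompt caching looks like at the API level. This is a minimal sketch using Anthropic's Python SDK, not Whittl's internal wiring; the model ID, context variable, and prompt are placeholders:

```python
# Minimal sketch of Anthropic prompt caching (placeholders throughout).
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

PROJECT_CONTEXT = open("main.py").read()  # the large, stable part of the prompt

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder model ID
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": PROJECT_CONTEXT,
            # Mark the stable prefix as cacheable; later rounds that reuse it
            # are billed at the much lower cache-read rate, which is where
            # the multi-round savings come from.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Rename the Widget class to Panel."}],
)
print(response.content[0].text)
```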

Weaknesses:

  • Most expensive on a per-token basis
  • No free tier

Pick Claude when: you're building something serious, the cost per operation matters less than the iteration count, and you want one backend to rule them all.

Model picker inside Claude:

  • Opus — for hard architectural tasks, long-context refactoring, Agent Mode
  • Sonnet — the default "just use Claude" pick; 95% of Opus quality at ~20% of the cost
  • Haiku — for iteration and small edits on established projects; surprisingly capable

Gemini (Google)

Strengths:

  • Generous free tiers on 2.5 Flash and 3 Flash, suitable for hobbyist use (example call below)
  • Extremely long context windows (1M+ tokens on 2.5 Pro)
  • Native vision on every modern model
  • Automatic prompt caching on 2.5 Flash
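
To get a feel for the free tier outside Whittl, here's a minimal call with Google's google-genai Python SDK; the model ID and prompt are placeholders:

```python
# Minimal sketch of a Gemini call with the google-genai SDK.
from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment
response = client.models.generate_content(
    model="gemini-2.5-flash",  # placeholder; any free-tier model ID works
    contents="Write a tkinter window with a single 'Hello' button.",
)
print(response.text)
```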

Weaknesses:

  • Quality on complex Python tasks is occasionally behind Claude Sonnet on the same prompt
  • Safety filters sometimes refuse generations that other backends complete without issue
  • Pro tier has stricter rate limits than Claude

Pick Gemini when: you want to experiment without cost, your project has a very large codebase (1M context), or you're already deep in the Google ecosystem.

DeepSeek

Strengths:

  • Cheapest tier-S option by a wide margin; a typical Whittl operation costs $0.005 – $0.02 (example below)
  • Very strong on Python code specifically
  • Automatic prefix caching
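
DeepSeek's API is OpenAI-compatible, so if you want to poke at it directly, the stock openai SDK works with a swapped base URL. A minimal sketch (the prompt is a placeholder; check DeepSeek's docs for current model IDs):

```python
import os
from openai import OpenAI

# DeepSeek exposes an OpenAI-compatible endpoint, so the standard SDK works.
client = OpenAI(
    base_url="https://api.deepseek.com",
    api_key=os.environ["DEEPSEEK_API_KEY"],
)
response = client.chat.completions.create(
    model="deepseek-chat",  # the V3-series chat model
    messages=[{"role": "user", "content": "Merge two sorted lists in Python."}],
)
# Prefix caching happens automatically server-side: repeated prompt
# prefixes are billed at the cheaper cache-hit rate with no extra flags.
print(response.choices[0].message.content)
```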

Weaknesses:

  • Vision is not available through the direct DeepSeek API backend; it requires routing to DeepSeek-VL2 through OpenRouter
  • Occasional latency spikes during peak hours
  • Smaller context than Claude / Gemini

Pick DeepSeek when: you want tier-S Python code quality at budget-backend prices, and you don't need vision.

OpenRouter

Strengths:

  • Single API key gives you access to 200+ models (Claude, GPT-4o, Gemini, Llama, Mistral, Qwen, DeepSeek, Gemma, etc.)
  • Free models available with rate limits (Qwen3-Coder, DeepSeek R1 free, Llama 3.3 free)
  • openrouter/auto meta-model picks the best available model automatically (see the sketch below)
  • Capability chips ([Tools], [Thinks], [Long], [Vision]) tell you what each model supports before you pick
  • Consolidated billing
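
OpenRouter speaks the same OpenAI-compatible protocol, so the same SDK pattern applies; only the base URL and model ID change. A minimal sketch with the auto-router (the prompt is a placeholder):

```python
import os
from openai import OpenAI

# OpenRouter is OpenAI-compatible: one base URL, one key, 200+ model IDs.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)
response = client.chat.completions.create(
    model="openrouter/auto",  # let the router pick; or e.g. "qwen/qwen3-coder"
    messages=[{"role": "user", "content": "Explain Python's GIL in two sentences."}],
)
print(response.choices[0].message.content)
```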

Weaknesses:

  • Small margin (~5–10%) on top of direct provider pricing
  • Quality varies widely across the catalog — picking a bad model gives you bad output

Pick OpenRouter when: you want flexibility to try different models mid-project, you want access to niche models (Pixtral, Qwen-VL, GLM-4.5-Air), or you want one key + one bill across providers.

Ollama (local)

Strengths:

  • 100% local — nothing ever leaves your machine
  • Free to run indefinitely (no per-request cost; example call below)
  • Works offline (airplane, weak wifi, privacy-mandated environments)
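
For reference, a minimal local call with the official ollama Python client; this assumes the model has already been pulled (e.g. `ollama pull llama3.1`) and the model name is a placeholder:

```python
import ollama  # pip install ollama; talks to the local server on port 11434

response = ollama.chat(
    model="llama3.1",  # placeholder; use any model you've pulled locally
    messages=[{"role": "user", "content": "Reverse a string in Python."}],
)
print(response["message"]["content"])
```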

Weaknesses:

  • The quality ceiling is set by the best local model your hardware can run, which typically trails the cloud flagships
  • RAM-hungry (8 GB for a 7B model, 16 GB+ for a 14B)
  • Slower per-token generation than cloud backends
  • Vision-capable models exist (llava, qwen-vl, llama3.2-vision), but Whittl's image-input wiring to Ollama is limited

Pick Ollama when: privacy is non-negotiable, you're working offline, or you've already got the hardware and want zero per-generation cost.

For specific model recommendations with RAM requirements and quality ratings, see the Ollama backend page.

Running multiple backends

You can configure all five and switch between them per-session or even mid-session. The dropdown in the chat panel switches backends. Conversation history carries forward.

Practical multi-backend setups:

Solo indie pattern

  • Haiku for quick iteration
  • Sonnet for the hard problem you hit once a day
  • Gemini Flash free for throwaway experiments

Privacy-first pattern

  • Ollama for default generation (everything stays local)
  • Claude Sonnet or Gemini Pro as a fallback when a problem exceeds what the local model can handle

Exploration pattern

  • OpenRouter as the primary backend
  • Star favorite models in the Models dialog (openrouter/auto, anthropic/claude-haiku-4-5, google/gemini-2.5-flash-lite, qwen/qwen3-coder)
  • Rotate based on task

Typical real costs

Based on field data from Whittl sessions over a typical month:

| Task | Cheap (OpenRouter free / Qwen) | Mid (DeepSeek / Haiku) | Premium (Sonnet) |
|---|---|---|---|
| Small modification (one edit) | $0.00 – $0.002 | $0.005 – $0.02 | $0.01 – $0.05 |
| Full single-file generation (~300 lines) | $0.002 – $0.02 | $0.05 – $0.15 | $0.15 – $0.50 |
| Screenshot to multi-file app | $0.01 – $0.05 | $0.10 – $0.30 | $0.50 – $2.00 |
| Agent Mode autonomous task | Not recommended | $0.05 – $0.25 | $0.30 – $2.00 |
| Auto-fix cycle (5 rounds max) | $0.001 – $0.01 | $0.02 – $0.10 | $0.10 – $0.40 |

Whittl's auto-fix rules, skills system, and prompt caching all compound to pull these numbers down over time.
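
If you want to sanity-check these ranges against your own usage, per-operation cost is just token counts multiplied by the provider's per-million-token rates. Here's a back-of-envelope helper; the rates below are illustrative assumptions, not current price sheets:

```python
# Rough per-operation cost estimator. The rates are illustrative assumptions;
# always check your provider's current pricing page.
PRICES_PER_MTOK = {                  # (input, output) in USD per million tokens
    "deepseek-chat": (0.27, 1.10),   # assumed rates
    "claude-haiku": (0.80, 4.00),    # assumed rates
    "claude-sonnet": (3.00, 15.00),  # assumed rates
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Approximate USD cost of a single operation."""
    in_rate, out_rate = PRICES_PER_MTOK[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# A ~300-line single-file generation with project context:
# roughly 10k tokens in, 8k tokens out.
print(f"${estimate_cost('claude-sonnet', 10_000, 8_000):.3f}")  # ~ $0.150
```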

What's next