Cost Optimization

Whittl's costs are entirely in AI API calls (the app itself is a one-time purchase, no subscription, no usage fees). This page is the strategy guide for keeping those AI costs minimal across a project's life.

The two levers

  1. Pick the right backend for each task — cheap backends are often good enough
  2. Let Whittl's Layer do work that would otherwise cost a round-trip — autofix, smart routing, prompt caching, and skills compound

Most users overspend on lever 1 (defaulting to the premium backend for everything) and underuse lever 2 (not letting the Layer do its free work).

The "cheap first, escalate only when stuck" pattern

A project's life usually looks like:

  1. Initial scaffolding (first 3-5 generations) — lots of tokens, straightforward work. Use a cheap backend: the Gemini free tier or Qwen3-Coder on OpenRouter's free tier costs you $0.
  2. Feature iteration (next 20-50 generations) — small changes, specific edits. Auto-fix and smart routing keep tokens tiny. DeepSeek at ~$0.005-0.02 per edit. A full development session is under $0.50.
  3. The hard bug (once a day) — something subtle the cheap model can't figure out. Switch to Claude Sonnet for ONE generation, get it fixed, switch back. ~$0.05-0.15 for the one escalation.

Total for a day of heavy development: usually under $1. Often under $0.25 if you're disciplined about when to escalate.
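The daily total above is simple arithmetic over the three phases. A back-of-envelope sketch, using the midpoints of the per-generation figures quoted here (the generation counts and defaults are illustrative, not measured):

```python
def daily_cost(scaffold_gens=4, edit_gens=30, escalations=1,
               scaffold_cost=0.0,      # free tier (Gemini / OpenRouter free)
               edit_cost=0.01,         # DeepSeek, midpoint of $0.005-0.02
               escalation_cost=0.10):  # Claude Sonnet, midpoint of $0.05-0.15
    """Estimated spend in USD for one day of development."""
    return (scaffold_gens * scaffold_cost
            + edit_gens * edit_cost
            + escalations * escalation_cost)

print(f"${daily_cost():.2f}")  # -> $0.40: a heavy day stays well under $1
```

Notice that the single Sonnet escalation is only a quarter of the total; the discipline that matters is keeping the 30 routine edits on the cheap backend.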

The "free tier forever" setup

If you want zero marginal cost:

  • Primary: Gemini 2.5 Flash (free tier) — 1.5M free tokens/day on most accounts
  • Backup: OpenRouter free tier — Qwen3-Coder, DeepSeek R1 free, Llama 3.3 free
  • Fallback: Ollama — local, no limits, slower
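The three tiers form a simple fallback chain: try the next backend only when the current one's quota is exhausted. A minimal sketch of that ordering — the backend names mirror the list above, but `pick_backend` is a hypothetical helper, not Whittl's actual routing API:

```python
FREE_STACK = [
    "gemini-2.5-flash",   # primary: generous free tier
    "qwen3-coder:free",   # backup: OpenRouter free models
    "ollama/local",       # fallback: local, no limits, slower
]

def pick_backend(exhausted):
    """Return the first backend whose daily quota is not exhausted."""
    for name in FREE_STACK:
        if name not in exhausted:
            return name
    return FREE_STACK[-1]  # local Ollama never runs out

print(pick_backend({"gemini-2.5-flash"}))  # -> qwen3-coder:free
```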

You can build real apps on this stack indefinitely. Quality ceiling is below Claude Sonnet but well above "useless" for most tasks.

Using Whittl's Layer to reduce spend

The Layer catches mistakes before they cost AI round-trips:

  • Autofix rules: 1-2 rounds per generation (no AI call needed for known fixes)
  • Smart Routing: 60-90% of input tokens on multi-file projects
  • Skills: prevents re-making the same mistake (no debugging round)
  • Prompt caching (Claude / Gemini): ~87% discount on repeat context
  • Oscillation guard: caps wasted rounds at 5 instead of unbounded
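To see why the caching discount compounds, consider re-sending the same context across a session. A rough illustration of the ~87% figure — the price per million tokens is a placeholder, not a real rate card:

```python
def input_cost(context_tokens, generations, price_per_mtok=3.0,
               cache_discount=0.87):
    """Cost of sending the same context `generations` times, with every
    send after the first hitting the prompt cache."""
    full = context_tokens / 1e6 * price_per_mtok
    cached = full * (1 - cache_discount)
    return full + cached * (generations - 1)

# 50k tokens of context re-sent over 10 generations:
with_cache = input_cost(50_000, 10)
without = input_cost(50_000, 10, cache_discount=0.0)
print(f"${with_cache:.3f} cached vs ${without:.2f} uncached")
```

The longer the session, the closer the effective input price gets to the discounted rate, which is why caching matters most during feature iteration.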

Every generation you run is cheaper than the same generation would have been on a bare API.

Session-level tips

  1. Leave Expand off for small edits. Expand adds ~$0.001-0.003 per generation. Tiny but unnecessary for "add a clear button."
  2. Leave Think off by default. Think mode reasoning tokens are 3-5x the normal output cost. Toggle on per-request for hard problems.
  3. Don't re-send the whole project. Smart Routing does this automatically — but if you manually attach files via the Assets panel that aren't needed, those bloat the prompt.
  4. Batch related edits in one prompt. "Make all buttons green" is cheaper than four individual "make this button green" requests.
  5. Prefer iteration over regeneration. "Modify the existing code" is cheaper than "start over from scratch."
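Tip 4 is pure token arithmetic: every request re-sends the shared project context, so four separate edits pay that context cost four times. A sketch with illustrative (not measured) token counts:

```python
CONTEXT_TOKENS = 8_000   # project context attached to every request
EDIT_TOKENS = 200        # the actual instruction per edit

def request_tokens(edits_per_request, requests):
    """Total input tokens for a batch of edit requests."""
    return requests * (CONTEXT_TOKENS + edits_per_request * EDIT_TOKENS)

separate = request_tokens(1, 4)   # four "make this button green" requests
batched = request_tokens(4, 1)    # one "make all buttons green" request
print(separate, batched)  # 32800 vs 8800 input tokens
```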

Per-task backend cheat-sheet

  • New project scaffolding: Gemini Flash (free); typical cost $0
  • Adding a feature: DeepSeek V3.2; $0.005 - $0.02
  • Tricky debugging: Claude Sonnet; $0.05 - $0.15
  • Screenshot to App: Gemini Flash (vision) or DeepSeek (no vision); $0 - $0.05
  • Agent Mode autonomous task: Claude Sonnet; $0.30 - $2.00
  • Large refactor: Claude Sonnet one-shot, or DeepSeek with iteration; $0.20 - $1.00
  • Code review / explain: Haiku or Gemini Flash-Lite; $0.001 - $0.01
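The cheat-sheet reduces to a lookup with a cheap default. The task labels and fallback below are illustrative assumptions, not Whittl's internal routing:

```python
CHEAT_SHEET = {
    "scaffolding": "gemini-flash-free",
    "feature": "deepseek-v3.2",
    "debugging": "claude-sonnet",
    "agent": "claude-sonnet",
    "review": "gemini-flash-lite",
}

def backend_for(task):
    # Unknown task types fall back to the cheap backend, matching the
    # "cheap first, escalate only when stuck" pattern.
    return CHEAT_SHEET.get(task, "deepseek-v3.2")
```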

Monitoring spend

  • Whittl tracks per-session token counts in the status bar
  • Each backend has its own usage dashboard (Anthropic console, Google AI Studio, DeepSeek platform, OpenRouter activity page)
  • Check weekly if you're cost-sensitive
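Between dashboard checks, the status-bar token counts are enough for a sanity estimate. A hypothetical aggregator over logged sessions — the per-Mtok rates are placeholders, and Whittl does not expose a log like this:

```python
def session_cost(sessions, price_in=0.28, price_out=0.42):
    """sessions: list of (input_tokens, output_tokens) pairs.
    Prices are USD per million tokens (placeholder rates)."""
    return sum(i / 1e6 * price_in + o / 1e6 * price_out
               for i, o in sessions)

week = [(120_000, 15_000), (80_000, 9_000), (200_000, 30_000)]
print(f"${session_cost(week):.3f}")  # well under a dollar for the week
```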

What's next