Cost Optimization

Whittl's costs are entirely in AI API calls (the app itself is a one-time purchase, no subscription, no usage fees). This page is the strategy guide for keeping those AI costs minimal across a project's life.

The two levers

  1. Pick the right backend for each task — cheap backends are often good enough
  2. Let Whittl's Layer do work that would otherwise cost a round-trip — autofix, smart routing, prompt caching, and skills compound

Most users overspend on lever 1 (defaulting to the premium backend for everything) and underuse lever 2 (not letting the Layer do its free work).

The "cheap first, escalate only when stuck" pattern

A project's life usually looks like:

  1. Initial scaffolding (first 3-5 generations) — lots of tokens, straightforward work. Use a cheap backend: the Gemini free tier or Qwen3-Coder on OpenRouter's free tier costs you $0.
  2. Feature iteration (next 20-50 generations) — small changes, specific edits. Auto-fix and smart routing keep tokens tiny. DeepSeek at ~$0.005-0.02 per edit. A full development session is under $0.50.
  3. The hard bug (once a day) — something subtle the cheap model can't figure out. Switch to Claude Sonnet for ONE generation, get it fixed, switch back. ~$0.05-0.15 for the one escalation.

Total for a day of heavy development: usually under $1. Often under $0.25 if you're disciplined about when to escalate.
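The daily total above is simple arithmetic over the three phases. A back-of-envelope sketch, using the midpoints of the per-generation figures quoted here (the generation counts and defaults are illustrative, not measured):

```python
def daily_cost(scaffold_gens=4, edit_gens=30, escalations=1,
               scaffold_cost=0.0,      # free tier (Gemini / OpenRouter free)
               edit_cost=0.01,         # DeepSeek, midpoint of $0.005-0.02
               escalation_cost=0.10):  # Claude Sonnet, midpoint of $0.05-0.15
    """Estimated spend in USD for one day of development."""
    return (scaffold_gens * scaffold_cost
            + edit_gens * edit_cost
            + escalations * escalation_cost)

print(f"${daily_cost():.2f}")  # -> $0.40: a heavy day stays well under $1
```

Notice that the single Sonnet escalation is only a quarter of the total; the discipline that matters is keeping the 30 routine edits on the cheap backend.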

The "free tier forever" setup

If you want zero marginal cost:

  • Primary: Gemini 2.5 Flash (free tier) — 1.5M free tokens/day on most accounts
  • Backup: OpenRouter free tier — Qwen3-Coder, DeepSeek R1 free, Llama 3.3 free
  • Fallback: Ollama — local, no limits, slower
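The three tiers form a simple fallback chain: try the next backend only when the current one's quota is exhausted. A minimal sketch of that ordering — the backend names mirror the list above, but `pick_backend` is a hypothetical helper, not Whittl's actual routing API:

```python
FREE_STACK = [
    "gemini-2.5-flash",   # primary: generous free tier
    "qwen3-coder:free",   # backup: OpenRouter free models
    "ollama/local",       # fallback: local, no limits, slower
]

def pick_backend(exhausted):
    """Return the first backend whose daily quota is not exhausted."""
    for name in FREE_STACK:
        if name not in exhausted:
            return name
    return FREE_STACK[-1]  # local Ollama never runs out

print(pick_backend({"gemini-2.5-flash"}))  # -> qwen3-coder:free
```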

You can build real apps on this stack indefinitely. Quality ceiling is below Claude Sonnet but well above "useless" for most tasks.

Using Whittl's Layer to reduce spend

The Layer catches mistakes before they cost AI round-trips:

  • Autofix rules: 1-2 rounds per generation (no AI call needed for known fixes)
  • Smart Routing: 60-90% of input tokens on multi-file projects
  • Skills: prevents re-making the same mistake (no debugging round)
  • Prompt caching (Claude / Gemini): ~87% discount on repeat context
  • Oscillation guard: caps wasted rounds at 5 instead of unbounded
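To see why the caching discount compounds, consider re-sending the same context across a session. A rough illustration of the ~87% figure — the price per million tokens is a placeholder, not a real rate card:

```python
def input_cost(context_tokens, generations, price_per_mtok=3.0,
               cache_discount=0.87):
    """Cost of sending the same context `generations` times, with every
    send after the first hitting the prompt cache."""
    full = context_tokens / 1e6 * price_per_mtok
    cached = full * (1 - cache_discount)
    return full + cached * (generations - 1)

# 50k tokens of context re-sent over 10 generations:
with_cache = input_cost(50_000, 10)
without = input_cost(50_000, 10, cache_discount=0.0)
print(f"${with_cache:.3f} cached vs ${without:.2f} uncached")
```

The longer the session, the closer the effective input price gets to the discounted rate, which is why caching matters most during feature iteration.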

Every generation you run is cheaper than the same generation would have been on a bare API.

Session-level tips

  1. Leave Expand off for small edits. Expand adds ~$0.001-0.003 per generation. Tiny but unnecessary for "add a clear button."
  2. Leave Think off by default. Think mode reasoning tokens are 3-5x the normal output cost. Toggle on per-request for hard problems.
  3. Don't re-send the whole project. Smart Routing does this automatically — but if you manually attach files via the Assets panel that aren't needed, those bloat the prompt.
  4. Batch related edits in one prompt. "Make all buttons green" is cheaper than four individual "make this button green" requests.
  5. Prefer iteration over regeneration. "Modify the existing code" is cheaper than "start over from scratch."
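Tip 4 is pure token arithmetic: every request re-sends the shared project context, so four separate edits pay that context cost four times. A sketch with illustrative (not measured) token counts:

```python
CONTEXT_TOKENS = 8_000   # project context attached to every request
EDIT_TOKENS = 200        # the actual instruction per edit

def request_tokens(edits_per_request, requests):
    """Total input tokens for a batch of edit requests."""
    return requests * (CONTEXT_TOKENS + edits_per_request * EDIT_TOKENS)

separate = request_tokens(1, 4)   # four "make this button green" requests
batched = request_tokens(4, 1)    # one "make all buttons green" request
print(separate, batched)  # 32800 vs 8800 input tokens
```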

Per-task backend cheat-sheet

  • New project scaffolding: Gemini Flash (free); typical cost $0
  • Adding a feature: DeepSeek V3.2; $0.005 - $0.02
  • Tricky debugging: Claude Sonnet; $0.05 - $0.15
  • Screenshot to App: Gemini Flash (vision) or DeepSeek (no vision); $0 - $0.05
  • Agent Mode autonomous task: Claude Sonnet; $0.30 - $2.00
  • Large refactor: Claude Sonnet one-shot, or DeepSeek with iteration; $0.20 - $1.00
  • Code review / explain: Haiku or Gemini Flash-Lite; $0.001 - $0.01
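The cheat-sheet reduces to a lookup with a cheap default. The task labels and fallback below are illustrative assumptions, not Whittl's internal routing:

```python
CHEAT_SHEET = {
    "scaffolding": "gemini-flash-free",
    "feature": "deepseek-v3.2",
    "debugging": "claude-sonnet",
    "agent": "claude-sonnet",
    "review": "gemini-flash-lite",
}

def backend_for(task):
    # Unknown task types fall back to the cheap backend, matching the
    # "cheap first, escalate only when stuck" pattern.
    return CHEAT_SHEET.get(task, "deepseek-v3.2")
```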

Monitoring spend

  • Whittl tracks per-session token counts in the status bar
  • Each backend has its own usage dashboard (Anthropic console, Google AI Studio, DeepSeek platform, OpenRouter activity page)
  • Check weekly if you're cost-sensitive
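Between dashboard checks, the status-bar token counts are enough for a sanity estimate. A hypothetical aggregator over logged sessions — the per-Mtok rates are placeholders, and Whittl does not expose a log like this:

```python
def session_cost(sessions, price_in=0.28, price_out=0.42):
    """sessions: list of (input_tokens, output_tokens) pairs.
    Prices are USD per million tokens (placeholder rates)."""
    return sum(i / 1e6 * price_in + o / 1e6 * price_out
               for i, o in sessions)

week = [(120_000, 15_000), (80_000, 9_000), (200_000, 30_000)]
print(f"${session_cost(week):.3f}")  # well under a dollar for the week
```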

What's next