Last updated: April 2026
v2.2.0 is the biggest architectural change since v1.0. The headline: Whittl now treats the AI as an agent with real tools, not a text generator. Edits go through schema-enforced tool calls instead of text parsing. Output costs dropped 40-75% per modification. Autofix iterates until the code works.
edit_code calls instead of file regeneration: 40-75% fewer output tokens per change, up to 97% on large files. Theme color changes that took 83s now finish in 10-15s.

v2.3 is code-complete and in final testing. It's the polish-and-prove release. The generation pipeline from v2.2 is solid, so this one focuses on two things: making the app look and feel like a single designed product, and making Agent Mode a real option for the models that can handle it. Every new feature is paired with safeguards so cheap models don't burn tokens flailing at problems they can't solve.
[Tools] [Thinks] [Long] [Vision] chips per model instead of opaque S/A/B letters. Concrete capabilities you can actually verify, not made-up tier grades.

Here's the idea that made everything else click.
Every AI coding tool on the market is a chat window over someone else's model. Cursor, v0.dev, Claude Code, and bolt.new are all wrappers, so their quality ceiling is the model's ceiling.
Whittl is quietly different. It already has 75+ auto-fix rules, a skills library, oscillation and round-cap guards, a tool executor, and custom validators. That's a real knowledge layer sitting between the user and the model. Weak models get more uplift from the layer because they make more fixable mistakes. v2.4 makes that layer a first-class product concept.
Unify autofix rules, skills, validators, and anti-patterns behind a single engine. Version the rule set independently of the Whittl binary so rules can update weekly without a full release. Positioning shifts: Whittl is not “an AI chat window that makes apps.” It's the knowledge layer that makes any model better at building Python desktop apps.
A curated, signed bundle of community-sourced rules that every Whittl installation pulls on startup. Like a virus-definitions update, but for code generation. Starts as download-only. I seed the library, users benefit from the updates. Opt-in contribution comes later.
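To make the "signed bundle" idea concrete, here is a minimal sketch of what verify-before-load could look like. This is not Whittl's actual implementation: a real release would use an asymmetric signature (e.g. ed25519) rather than a bare digest, and `load_bundle` and the bundle fields are hypothetical names.

```python
import hashlib
import json

def load_bundle(raw: bytes, expected_digest: str) -> dict:
    """Parse a Commons rules bundle only if its SHA-256 digest matches.

    In a real signed-bundle scheme the digest would itself be covered by a
    signature verified against a public key pinned in the Whittl binary.
    """
    actual = hashlib.sha256(raw).hexdigest()
    if actual != expected_digest:
        raise ValueError("bundle digest mismatch; refusing to load")
    return json.loads(raw)

# Simulate a tiny bundle and its published digest.
bundle_bytes = json.dumps({"version": "2026.04.1", "rules": []}).encode()
digest = hashlib.sha256(bundle_bytes).hexdigest()
bundle = load_bundle(bundle_bytes, digest)
```

The important property is fail-closed: a tampered or truncated download is rejected before any rule in it can influence generation.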
Small “Streaming…” indicator on the tab of the file currently being generated, so you know which file is being written during multi-file generation.
Add flet_version metadata to projects (defaults to 0.28, ignored by current code). Infrastructure ready for when Flet 1.0 stabilizes. See “Longer term” for why the full Flet upgrade is deferred.
Hit an auto-fix that worked? Optional “Submit to Commons” button shows you the exact JSON payload before anything leaves your machine. Your code, prompts, and project details never leave, only the structured fix rule, with your explicit consent. Submissions go through a review queue; approved rules ship in the next Commons bundle. Every contribution makes every Whittl installation smarter.
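As an illustration of "only the structured fix rule" leaving your machine, a submission payload might look something like this. The field names here are made up for the example, not Whittl's actual schema:

```json
{
  "rule_id": "qt-enum-namespace",
  "category": "framework",
  "match": "Qt\\.AlignCenter",
  "replacement": "Qt.AlignmentFlag.AlignCenter",
  "description": "PyQt6 moved enums into scoped namespaces",
  "whittl_version": "2.4.0"
}
```

Note what is absent: no source files, no prompts, no project metadata, matching the privacy claim above.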
Every generation, regardless of model, runs through a fast local pass: security issues (path traversal, eval/exec, SQL injection), framework gotchas (Qt enum syntax, Flet device mismatches, CSS vs QSS), performance anti-patterns (blocking I/O on UI thread). Runs in parallel with code-apply, costs no tokens, works on every backend.
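The eval/exec check in that local pass can be done with nothing but Python's own `ast` module, which is why it costs zero tokens. A hedged sketch of just that one rule (not Whittl's actual rule engine, and `find_eval_exec` is a hypothetical name):

```python
import ast

def find_eval_exec(source: str) -> list[int]:
    """Return the line numbers of bare eval()/exec() calls in the source.

    Walks the parsed AST, so it ignores comments and strings that merely
    mention the words, unlike a regex-based scan.
    """
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in {"eval", "exec"}):
            flagged.append(node.lineno)
    return flagged

sample = "x = eval(user_input)\nprint(x)\n"
hits = find_eval_exec(sample)
```

Because it runs on the parsed tree, this pass can execute in parallel with code-apply and works identically no matter which model produced the code.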
Cheap-then-escalate. Haiku generates, Whittl Layer fixes common mistakes, test runs. Only if it still fails does Whittl escalate to Sonnet. Internal testing: Haiku with the Layer matches bare-Sonnet on ~70% of typical tasks at ~1/15th the cost.
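The cheap-then-escalate loop above can be sketched in a few lines. Everything here is a stand-in: `generate`, `apply_layer_fixes`, and `run_tests` are hypothetical hooks, not Whittl's real internals; the shape of the control flow is the point.

```python
def build(prompt, generate, apply_layer_fixes, run_tests,
          models=("haiku", "sonnet")):
    """Try the cheapest model first; escalate only if tests still fail.

    The Layer fixes run locally between generation and testing, so a
    cheap model's common mistakes are repaired before we pay for a
    more expensive attempt.
    """
    for model in models:
        code = generate(model, prompt)
        code = apply_layer_fixes(code)   # local rules, zero tokens
        if run_tests(code):
            return model, code
    raise RuntimeError("all models in the escalation chain failed")
```

The claimed economics follow directly: if the Haiku-plus-Layer pass succeeds ~70% of the time, Sonnet is only billed for the remaining 30%.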
Named bundles of (model + tools + skills + prompt) pickable from a chat-header dropdown. Ships with starters: Game Builder (pygame-focused, longer round cap, Sonnet), Quick Edit (Haiku, planner skipped, fast iteration), Debug Only (read-only tools, explains instead of fixes), Code Review (Opus, critiques architecture). Users can create and share their own as portable .md files.
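A portable profile file for the "Quick Edit" starter might look roughly like this. The front-matter keys are illustrative guesses, not a documented Whittl format:

```markdown
---
name: Quick Edit
model: haiku
planner: skipped
tools: [edit_code, read_file]
skills: [fast-iteration]
---
You are editing an existing Flet app. Prefer minimal diffs over rewrites.
```

Keeping the prompt body as plain markdown below the front matter is what makes the file shareable and human-reviewable.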
Every generation that passes tests becomes a validated example. Future similar prompts retrieve these as in-context examples to whichever model you're using. All local, never uploads. Your own successful work makes future Whittl runs on similar tasks more reliable.
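The retrieval step can be illustrated with a deliberately naive sketch: rank stored validated examples by keyword overlap with the new prompt. A real implementation would likely use embeddings; `retrieve` and the store layout are assumptions for the example.

```python
def retrieve(prompt: str, store: list[dict], k: int = 2) -> list[dict]:
    """Return the k stored examples whose prompts best overlap the new one."""
    words = set(prompt.lower().split())
    return sorted(
        store,
        key=lambda ex: len(words & set(ex["prompt"].lower().split())),
        reverse=True,
    )[:k]

# Local store of past generations that passed their tests.
store = [
    {"prompt": "todo app with dark theme", "code": "..."},
    {"prompt": "pygame snake game", "code": "..."},
    {"prompt": "todo list with sqlite", "code": "..."},
]
hits = retrieve("build a todo app", store, k=2)
```

The retrieved examples are then prepended as in-context demonstrations, so the benefit applies to whichever backend model is selected.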
Every time Whittl's autofix corrects something, that's a signal about what your chosen model gets wrong. Whittl remembers per-model patterns and prepends reminders to future prompts for that model: “Note: you historically emit X in these cases, try Y.” Weak models get custom augmentations based on their own history. The Layer compounds with use.
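A minimal sketch of that per-model memory, assuming a simple counter of fix-rule hits per model (class and method names are hypothetical, not Whittl's API):

```python
from collections import Counter

class ModelMemory:
    """Track which autofix rules fire for each model and turn the
    most frequent ones into prompt reminders."""

    def __init__(self):
        self.fixes: dict[str, Counter] = {}

    def record(self, model: str, rule_id: str) -> None:
        self.fixes.setdefault(model, Counter())[rule_id] += 1

    def augment(self, model: str, prompt: str, top: int = 2) -> str:
        counts = self.fixes.get(model)
        if not counts:
            return prompt  # no history yet: leave the prompt untouched
        notes = "; ".join(rule for rule, _ in counts.most_common(top))
        return f"Note: past generations needed fixes for: {notes}.\n{prompt}"
```

Each autofix hit feeds `record`, and `augment` runs before every new prompt, which is the compounding loop the paragraph describes.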
Expanded from the original “screenshot to app” idea. Drop a Figma PNG, paste HTML/CSS from a web app, or point Whittl at a URL. Get a native Python desktop app. Every other AI tool outputs more web. Whittl uniquely converts design artifacts into runnable software you own. Take your bolt.new prototype and make it useful.
Current Flet pin (0.28.3) is intentional. Flet 0.29 and 1.0 alpha introduced breaking API changes that invalidate the auto-fix rules, system prompts, templates, and APK tooling Whittl depends on. Migration is ~3-4 weeks of focused work.
The plan: wait for Flet 1.0 stable GA, add dual-version support so existing projects stay on 0.28 while new ones opt into 1.0, build a Migration Agent that upgrades old projects one-click. Commons accelerates 1.0 rule calibration via community contributions. Same release probably includes Flet web export since 1.0's web story is much better than 0.28's.
The strategic bet: most AI coding tools treat the model as the whole product. Their quality ceiling moves with the model. When a new Claude ships, they get better. When a competitor ships a slightly better wrapper, they have nothing to fight back with.
Whittl's bet is different. The Layer plus accumulated community contributions compound over time, independent of any specific model. The moats are narrower and more honest than “our model is better.”
The goal isn't to be a smarter chat window. The goal is to be the tool that makes every model good at building Python desktop apps, so users can pick the model that fits their budget and still get professional results. Haiku with the Whittl Layer matches bare-Sonnet on typical multi-file Python app generation in internal testing. That's a value proposition no other tool can honestly make.