Moolah Vibe Coding
Adrian Sutton
I’ve used custom software called Moolah to track my personal finances for many years now. Originally it was written in JavaScript, with moolah-server providing the backend and moolah the frontend. These holidays I’ve been vibe coding a replacement written entirely in Swift. It seemed like a useful experiment for learning more about the trade-offs of extreme AI usage, so I had Claude dig into the available stats from GitHub and its session logs to compare the two projects and see what we can learn. It’s not a controlled experiment, so there’s lots of room for interpretation, but it’s an interesting data point nonetheless.
The initial version it came out with was absurdly positive about AI, along the lines of “I wrote 10x the lines of code in 0.2% of the time – more code is better!” With a bit of prompting and some additional background, it produced the report below. I’ll write up some more human thoughts on the experience later, but the AI-crunched numbers are worth sharing by themselves, both to set the scene and because the key learnings are genuinely useful.
The Cast
| Project | Tech | Purpose | Active dev days | Commits | LOC (prod) |
|---|---|---|---|---|---|
| moolah (web) | Vue.js | Web frontend | 149 days over 8.7 yrs | 763 | ~8,400 |
| moolah-server | Node.js/Hapi | REST API backend | 87 days over 8.8 yrs | 405 | ~2,800 |
| moolah-server-go | Go | Learning exercise, abandoned | 5 days | 21 | ~500 |
| moolah-native | SwiftUI | Native iOS/macOS app | 8 days | 369 | ~20,600 |
Active dev days = days with at least one non-dependency-maintenance commit. The web and server are one project across two repos — 70 days overlap — so combined unique effort is 180 days.
How Each Project Was Built
moolah & moolah-server were started June 24, 2017 by Adrian Sutton, with Brett Henderson contributing the initial web import. Every line across the 1,168 combined commits was written by hand with zero AI involvement. Adrian had deep JavaScript experience and was inventing the domain model, API, database schema, and UX simultaneously — greenfield design work. No significant documentation exists because none is needed: the code is self-documenting, making the projects easy to pick up after months of absence with no risk of stale docs.
moolah-native was entirely AI-generated starting April 5, 2026. Adrian directed the work but has not read the code and has no Swift, SwiftUI, iOS, or macOS experience. Multiple AI agents were used, switching between them as rate limits were hit. 327 of 369 commits carry a Claude co-author tag; the remaining 42 “solo” commits are manual commits of AI-written code. Effectively 100% AI-authored by someone with no ability to review the output.
moolah-server-go was a 5-day learning exercise started during a holiday between jobs — a way to learn Go before a new role that required it. That goal was achieved even though the project was abandoned.
The Effort Question
Raw calendar span is misleading for side projects with multi-month dormancy periods. Active development days varied enormously in intensity:
| Session type | Web+Server (combined) | Native |
|---|---|---|
| Full day (6+ hrs) | 22 days | 8 days |
| Half day (3-5 hrs) | 55 days | 0 |
| Quick (1-2 hrs) | 140 days | 0 |
| Estimated total hours | ~600 hrs | ~84 commit-hours |
But these numbers aren’t comparable. The web+server hours are a developer actively writing and reasoning about code. The native app’s commit-hours are largely the AI working autonomously.
What the Session Logs Reveal
Claude Code keeps local session logs, giving a clearer picture:
| Metric | Value |
|---|---|
| Sessions | 105 |
| Human prompts | 1,496 |
| AI responses | 15,019 |
| AI responses per human prompt | 10:1 |
| % of hours with 2+ concurrent sessions | 79% |
| Peak concurrent sessions | 12 |
For every human prompt, the AI averaged 10 responses — reading files, writing code, running tests, fixing issues, committing. Claude Code’s remote-control functionality allowed multiple agents to work in parallel while the human directed new sessions and reviewed completed ones.
Estimated human effort: 37-75 hours (at 1.5–3 minutes per prompt for reading output, thinking, and typing). That’s 6-12% of the web+server’s ~600 hours for 1.8x the code output — though the 600 hours produced code the developer understood and could maintain, while the 37-75 hours produced code no human has read.
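That estimate is simple arithmetic over the session-log prompt count; a quick sketch of it (the 1.5–3 minutes per prompt is the assumption doing all the work):

```javascript
// Rough human-effort estimate derived from the Claude Code session logs.
const prompts = 1496;            // human prompts across 105 sessions
const minutesPerPrompt = [1.5, 3]; // assumed time to read output, think, type

const hours = minutesPerPrompt.map(m => (prompts * m) / 60);
console.log(hours.map(h => h.toFixed(1)));  // [ '37.4', '74.8' ]

// As a share of the original projects' ~600 hours:
const share = hours.map(h => (h / 600 * 100).toFixed(1) + '%');
console.log(share);  // [ '6.2%', '12.5%' ]
```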
Development Patterns
The Original Build: Power-Law Decay
2017 ████████████████████████████ Explosive start (363 web / 134 server)
2018 ████████████ Feature completion (154 web / 73 server)
2019 █████ Category reports sprint, then silence
2020 ▌ Near-dormant (6 web / 14 server)
2021 ██ Sporadic revivals
2022 █ Sporadic
2023 ██ Investment features
2024 ████ Vue 3 migration / server modernization
2025 █ Maintenance mode
2026 ▌ (web) / █████ (native) The native app takes over
~50% of web commits in the first 6 months, ~70% in the first 18 months. Dormancy periods align across both repos — both go quiet and revive together, driven by holidays and life.
moolah-native: An Accelerating Curve
Day 1 (Apr 5) ██ 20 commits — scaffolding, CI, auth
Day 2 (Apr 6) █ 10 commits — accounts, transactions
Day 3 (Apr 7) █ 13 commits — currency, categories
Day 4 (Apr 8) ███ 28 commits — planning, CRUD, iCloud
Day 5 (Apr 9) ██████ 57 commits — profiles, investments, UI
Day 6 (Apr 10) ███████ 68 commits — contract tests, backend alignment
Day 7 (Apr 11) ███████ 70 commits — stock prices, performance
Day 8 (Apr 12) ██████████ 103 commits — crypto, multi-instrument, analysis
Each day produced more than the last. Day 8 alone exceeds most entire months of the original projects.
Are AI Commits Just More Granular?
No — they’re actually larger. The median native commit is 93 lines vs. 24 (web) and 36 (server). Squashing all commits within 1-hour windows gives 66 logical sessions, compared to 90 for the web app’s first 2 months. The high commit count reflects real throughput, not artificial granularity.
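The squashing heuristic can be sketched simply. The exact window rule used in the analysis isn’t recorded, so this is one plausible reading: a new logical session starts whenever the gap since the previous commit exceeds an hour.

```javascript
// Group commit timestamps into "logical sessions": a new session starts
// whenever the gap since the previous commit exceeds the window.
// (One plausible reading of the 1-hour squashing rule, not the exact one used.)
function logicalSessions(timestamps, windowMs = 60 * 60 * 1000) {
  const sorted = [...timestamps].sort((a, b) => a - b);
  let sessions = 0;
  let prev = -Infinity;
  for (const t of sorted) {
    if (t - prev > windowMs) sessions++;
    prev = t;
  }
  return sessions;
}

// Three commits 10 minutes apart, then one 2 hours later → 2 sessions.
const MIN = 60 * 1000;
console.log(logicalSessions([0, 10 * MIN, 20 * MIN, 140 * MIN]));  // 2
```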
28 fix commits changed fewer than 10 lines — micro-patches a human would fold into the parent commit. These represent ~8% of commits. But even excluding them, the fix rate remains high, and the question of how many bugs were introduced and fixed within a session (never appearing in commit history) remains unanswered.
Is It Bloated? Language vs. Real Bloat
The native app is 1.8x the size of web+server combined. How much is language overhead vs. genuine bloat?
API calls show the starkest language difference:
| Operation | Swift (repo + DTO) | JS (client.js) |
|---|---|---|
| Fetch all accounts | ~67 lines | 5 lines |
| Create account | ~32 lines | 6 lines |
Swift requires DTO structs, Codable conformance, explicit mapping functions, typed error handling, and an explicit decode step. JS just calls fetch().
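The JS side of that comparison is easy to picture. A sketch in the style of a fetch-based client (the function and endpoint names here are illustrative, not the real client.js ones):

```javascript
// Sketch of a fetch-based API client in the style of moolah's client.js.
// Names are invented for illustration; the real client differs in detail.
async function fetchAccounts(baseUrl) {
  const response = await fetch(`${baseUrl}/accounts`);
  if (!response.ok) throw new Error(`HTTP ${response.status}`);
  return response.json();  // untyped: whatever shape the server sent back
}
```

The Swift equivalent needs the same request plus DTO structs, Codable conformance, and mapping into domain models before anything type-safe comes out the other end — hence ~67 lines for the same fetch.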
Models are closer than expected — Swift models are only ~30% larger than server DAOs. Web stores are sometimes larger because they mix model shape with mutation logic.
Breaking Down the 20,600 Lines
| Category | Lines | % | Notes |
|---|---|---|---|
| Language/platform overhead | ~5,400 | 26% | Types, inits, DTOs, CodingKeys, #Preview, platform conditionals |
| CloudKit offline backend | ~2,990 | 14% | Offline-first local computation; no web equivalent |
| Native-only features | ~1,750 | 9% | Crypto prices, data export, multi-platform layout |
| Equivalent application logic | ~10,460 | 51% | Would be ~6,500-7,500 lines in JS |
Rewriting only the web-equivalent functionality in JS would yield ~8,000-9,000 lines — close to the actual 11,200. The 1.8x multiplier is mostly language overhead and the offline backend, not AI-generated bloat.
The Remote backend is properly thin: 1,490 lines, of which only ~40 are business logic (2.7%). It constructs requests, decodes responses, maps to domain models. The CloudKit backend (2,990 lines) necessarily contains real logic — it must replicate server-side computation for offline use.
Defect Rates
| Project | Fix Commits | Fix Rate | Organic Fix Rate |
|---|---|---|---|
| moolah-native | 126 | 31% | 31% |
| moolah (web) | 80 | 10.5% | 6.2% (excl. migration breakage) |
| moolah-server | 15 | 3.5% | 2.6% (logic bugs only) |
What Drives Each Project’s Bugs
Native — the generate-and-patch cycle. CategoryPicker: 7 fix commits + 2 complete rewrites in one day. Budget API: two consecutive fixes (wrong endpoint, then wrong UUID format). Empty budget: fixed to “top” alignment, immediately re-fixed to “center”. The pattern: AI generates → breaks → fixes → fix is wrong → fixes the fix. This can consume 5-10 commits for one feature.
Web — dependency breakage. 22 of 80 fixes from the 2024 Vue 3 migration. 7 of 10 reverts were failed dependency upgrades. Organic fix rate excluding migrations: 6.2%.
Server — remarkably stable. 11 logic fixes in 8.8 years, 5 in the same file (dailyBalances.js). Simple CRUD has essentially zero bugs.
The Unvalidated Iceberg
The 31% fix rate only counts bugs found during development. Much functionality remains unvalidated with no production usage and no human code review. The original projects have been in actual use for years — their bugs are known quantities.
Can We Trust AI-Written Tests?
The native app’s test suite is large (13,653 lines, 0.66:1 ratio) but size doesn’t equal value. When AI writes both implementation and tests, both can encode the same wrong assumption.
Five Cases Where Tests Validated Bugs
- Expense sign convention — Tests asserted expenses as positive; server uses negative. Both implementation and test had to change.
- Investment daily balances — Tests computed from value snapshots; correct behavior is cumulative from transactions. Entire test rewritten. The AI built a wrong mental model and tests faithfully encoded it.
- Scheduled transaction filtering — Tests expected scheduled transactions in regular lists. They should be excluded.
- Category deletion — Tests expected child reparenting; server orphans them. AI guessed “reasonable” behavior instead of checking.
- Return type mismatch — Tests asserted Int; the API returns MonetaryAmount.
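The expense sign case shows the failure mode well. A hypothetical illustration (in JS for brevity — the real code is Swift, and these names are invented): when implementation and test grow from the same wrong mental model, the test passes and the bug survives.

```javascript
// Hypothetical illustration of a test validating a bug: both the
// implementation and the test assume expenses are positive amounts,
// but the server's convention is that expenses are negative.
function expenseTotal(transactions) {
  // Wrong model: treats expense amounts as positive values to be summed.
  return transactions
    .filter(t => t.type === 'expense')
    .reduce((sum, t) => sum + t.amount, 0);
}

// The AI-written "test" encodes the same wrong assumption, so it passes.
const txns = [
  { type: 'expense', amount: 4250 },  // server would actually send -4250
  { type: 'income', amount: 100000 },
];
console.assert(expenseTotal(txns) === 4250);  // green, but wrong convention
```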
33 fix commits (27% of fixes) required changing test expectations alongside the production fix — 33 times the test suite said “this is correct” when it wasn’t.
The TDD That Wasn’t
TDD was instructed from day 1. The AI ignored this for 5 days. Actual test-first behavior only appeared on day 6, when structured “superpowers” skills were installed — enforcement mechanisms stricter than plain-text instructions. Even then, TDD doesn’t help when the AI’s understanding of correct behavior is wrong: it just writes a wrong test first instead of second.
Where Confidence Actually Comes From
| Source | Confidence | Why |
|---|---|---|
| The server | High | 8.8 years of human-written tests and real-world use. When the native app talks to the server, correctness comes from the server. |
| Test architecture | Medium | Real backends (CloudKitBackend + in-memory SwiftData), not mocks. Structurally sound, but can still assert wrong expected values. |
| Manual testing | Medium | 60% of fixes were production-only (no test changes), meaning bugs were found through use, not tests. |
| Test expectations | Low-Medium | Strong regression protection, weak correctness verification. At least 33 demonstrated cases of tests encoding wrong behavior. |
| CloudKit backend | Low | Reimplements server logic with no human review. All five of the test-encoded bugs above were in this layer. |
The Dependency Divide
The native app has zero third-party packages. Everything comes from Apple’s SDK: SwiftUI, SwiftData, CloudKit, URLSession, Charts, XCTest, etc.
The JS projects have ~570 installed packages across ~25 direct dependencies, and 259 commits (22%) touch package.json. Libraries get abandoned (Vuex → Pinia, webpack → Vite, moment → date-fns), major versions break APIs (Vuetify 1→2→3→4 required 8+ commits with reverts), and transitive vulnerabilities create perpetual maintenance.
This directly killed momentum. The 287-day dormancy starting Dec 2018 follows a reverted dependency upgrade. The 303-day gap after Oct 2019 follows a failed migration. A weekend producing only a partially-working upgrade with no new features makes it hard to come back.
The native app avoids this entirely — for now. Apple’s SDK evolves on a predictable annual cycle, not the constant churn of the JS ecosystem.
The Rhythm of a Side Project
| Time Pattern | Web+Server | Native |
|---|---|---|
| Weekend commits | 37-51% | 52% |
| Longest gap | 303-331 days | 13 hours (sleep) |
The native app hasn’t hit its first dormancy yet.
The question isn’t whether it will slow down, but what happens when it does. The original projects are self-documenting — you pick them up after 10 months and the code tells you how it works. The native app is AI-generated and unread. AI might make re-entry easier (it can explain the codebase), but the owner has no independent ability to verify those explanations.
Key Insights
1. AI Changed Who Can Build, Not What Gets Built
The native app was built by someone with zero platform experience. AI made platform expertise optional for initial construction — but the resulting codebase is opaque to its owner in a way the original projects never were.
2. Speed and Quality Traded Off at 12:1
31% fix rate (native) vs. 2.6% (server). The generate-and-patch cycle reflects genuine instability, not just frequent commits.
3. AI Ignores Instructions Without Enforcement
TDD was instructed from day 1, ignored for 5 days. Only structured skill enforcement changed actual behavior. Plain-text instructions are suggestions, not constraints.
4. AI-Written Tests Can Validate Bugs
33 fix commits required changing test expectations — the tests were asserting buggy behavior was correct. When AI writes both sides from the same wrong model, tests provide false confidence. Good test architecture (real backends, no mocks) helps but doesn’t solve the problem.
5. The 1.8x Size Ratio Is Mostly Language, Not Bloat
~26% is Swift type system overhead, ~14% is the offline CloudKit backend (which the web app doesn’t have), ~9% is native-only features. The Remote backend is properly thin. Feature-level code is comparable to the web equivalents.
6. Plans Are a Supervision Mechanism, Not Documentation
The original projects need no documentation — the code is self-documenting. The native app has 46,700 lines of plans because AI-directed development needs an external record of intent. The AI frequently fails to fully execute plans, so keeping them lets you audit completeness. Plans aren’t documentation — they’re a quality control mechanism for an unreliable implementer.
7. The JS Dependency Treadmill Is a Real Cost
22% of all web+server commits are dependency maintenance. Failed upgrades killed momentum and contributed to dormancy. The native app’s zero-dependency approach avoids this entirely, though Apple’s evolution will eventually impose its own (more predictable) tax.
8. The Risk Is Opacity, Not Size
20,600 lines is a manageable codebase. The risk is that zero of those lines have been read by a human. If AI tools remain capable, this may work. If they don’t — or the codebase outgrows what AI can reason about — the project is stranded. The original projects carry no such risk: self-documenting code that anyone with JS experience can pick up.
9. Side Projects Have a Heartbeat Regardless of Tooling
Dormancy cycles are driven by life, not technology. AI may change the revival cost, but it doesn’t change the fundamental constraint that side projects compete with the rest of life for time and energy.