Moolah Vibe Coding
Adrian Sutton
I’ve used custom software called Moolah to track my personal finances for many years now. Originally it was written in JavaScript, with moolah-server providing the backend and moolah the frontend. These holidays I’ve been vibe coding a replacement written entirely in Swift. It seemed like a useful experiment for learning more about the trade-offs of extreme AI usage, so I had Claude dig into the available stats from GitHub and its session logs to compare the two projects and see what we can learn. It’s not a controlled experiment, so there’s lots of room for interpretation, but it’s an interesting data point nonetheless.
The initial version it came out with was absurdly positive about AI, along the lines of “I wrote 10x the lines of code in 0.2% of the time – more code is better!” With a bit of prompting and some additional background, it produced the report below. I’ll write up some more human thoughts on the experience later, but the AI-crunched numbers are worth sharing by themselves, both to set the scene and because the key learnings are genuinely useful.
The Cast
| Project | Tech | Purpose | Active dev days | Commits | LOC (prod) |
|---|---|---|---|---|---|
| moolah (web) | Vue.js | Web frontend | 149 days over 8.7 yrs | 763 | ~8,400 |
| moolah-server | Node.js/Hapi | REST API backend | 87 days over 8.8 yrs | 405 | ~2,800 |
| moolah-server-go | Go | Learning exercise, abandoned | 5 days | 21 | ~500 |
| moolah-native | SwiftUI | Native iOS/macOS app | 8 days | 369 | ~20,600 |
Active dev days = days with at least one non-dependency-maintenance commit. The web and server are one project across two repos — 70 days overlap — so combined unique effort is 180 days.
How Each Project Was Built
moolah & moolah-server were started June 24, 2017 by Adrian Sutton, with Brett Henderson contributing the initial web import. Every line across the 1,168 combined commits was written by hand with zero AI involvement. Adrian had deep JavaScript experience and was inventing the domain model, API, database schema, and UX simultaneously — greenfield design work. No significant documentation exists because none is needed: the code is self-documenting, making the projects easy to pick up after months of absence with no risk of stale docs.
moolah-native was entirely AI-generated starting April 5, 2026. Adrian directed the work but has not read the code and has no Swift, SwiftUI, iOS, or macOS experience. Multiple AI agents were used, switching between them as rate limits were hit. 327 of 369 commits carry a Claude co-author tag; the remaining 42 “solo” commits are manual commits of AI-written code. Effectively 100% AI-authored by someone with no ability to review the output.
moolah-server-go was a 5-day learning exercise started during a holiday between jobs — a way to learn Go before a new role that required it. That goal was achieved even though the project was abandoned.
The Effort Question
Raw calendar span is misleading for side projects with multi-month dormancy periods. Active development days varied enormously in intensity:
| Session type | Web+Server (combined) | Native |
|---|---|---|
| Full day (6+ hrs) | 22 days | 8 days |
| Half day (3-5 hrs) | 55 days | 0 |
| Quick (1-2 hrs) | 140 days | 0 |
| Estimated total hours | ~600 hrs | ~84 commit-hours |
But these numbers aren’t comparable. The web+server hours are a developer actively writing and reasoning about code. The native app’s commit-hours are largely the AI working autonomously.
What the Session Logs Reveal
Claude Code keeps local session logs, giving a clearer picture:
| Metric | Value |
|---|---|
| Sessions | 105 |
| Human prompts | 1,496 |
| AI responses | 15,019 |
| AI responses per human prompt | 10:1 |
| % of hours with 2+ concurrent sessions | 79% |
| Peak concurrent sessions | 12 |
For every human prompt, the AI averaged 10 responses — reading files, writing code, running tests, fixing issues, committing. Claude Code’s remote-control functionality allowed multiple agents to work in parallel while the human directed new sessions and reviewed completed ones.
Estimated human effort: 37-75 hours (at 1.5–3 minutes per prompt for reading output, thinking, and typing). That’s 6-12% of the web+server’s ~600 hours for 1.8x the code output — though the 600 hours produced code the developer understood and could maintain, while the 37-75 hours produced code no human has read.
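That estimate is simple arithmetic over the session-log prompt count; a quick sketch of it (the 1.5–3 minutes per prompt is the assumption doing all the work):

```javascript
// Rough human-effort estimate derived from the Claude Code session logs.
const prompts = 1496;            // human prompts across 105 sessions
const minutesPerPrompt = [1.5, 3]; // assumed time to read output, think, type

const hours = minutesPerPrompt.map(m => (prompts * m) / 60);
console.log(hours.map(h => h.toFixed(1)));  // [ '37.4', '74.8' ]

// As a share of the original projects' ~600 hours:
const share = hours.map(h => (h / 600 * 100).toFixed(1) + '%');
console.log(share);  // [ '6.2%', '12.5%' ]
```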
Development Patterns
The Original Build: Power-Law Decay
2017 ████████████████████████████ Explosive start (363 web / 134 server)
2018 ████████████ Feature completion (154 web / 73 server)
2019 █████ Category reports sprint, then silence
2020 ▌ Near-dormant (6 web / 14 server)
2021 ██ Sporadic revivals
2022 █ Sporadic
2023 ██ Investment features
2024 ████ Vue 3 migration / server modernization
2025 █ Maintenance mode
2026 ▌ (web) / █████ (native) The native app takes over
~50% of web commits in the first 6 months, ~70% in the first 18 months. Dormancy periods align across both repos — both go quiet and revive together, driven by holidays and life.
moolah-native: An Accelerating Curve
Day 1 (Apr 5) ██ 20 commits — scaffolding, CI, auth
Day 2 (Apr 6) █ 10 commits — accounts, transactions
Day 3 (Apr 7) █ 13 commits — currency, categories
Day 4 (Apr 8) ███ 28 commits — planning, CRUD, iCloud
Day 5 (Apr 9) ██████ 57 commits — profiles, investments, UI
Day 6 (Apr 10) ███████ 68 commits — contract tests, backend alignment
Day 7 (Apr 11) ███████ 70 commits — stock prices, performance
Day 8 (Apr 12) ██████████ 103 commits — crypto, multi-instrument, analysis
Each day produced more than the last. Day 8 alone exceeds most entire months of the original projects.
Are AI Commits Just More Granular?
No — they’re actually larger. The median native commit is 93 lines vs. 24 (web) and 36 (server). Squashing all commits within 1-hour windows gives 66 logical sessions, compared to 90 for the web app’s first 2 months. The high commit count reflects real throughput, not artificial granularity.
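The squashing heuristic can be sketched simply. The exact window rule used in the analysis isn’t recorded, so this is one plausible reading: a new logical session starts whenever the gap since the previous commit exceeds an hour.

```javascript
// Group commit timestamps into "logical sessions": a new session starts
// whenever the gap since the previous commit exceeds the window.
// (One plausible reading of the 1-hour squashing rule, not the exact one used.)
function logicalSessions(timestamps, windowMs = 60 * 60 * 1000) {
  const sorted = [...timestamps].sort((a, b) => a - b);
  let sessions = 0;
  let prev = -Infinity;
  for (const t of sorted) {
    if (t - prev > windowMs) sessions++;
    prev = t;
  }
  return sessions;
}

// Three commits 10 minutes apart, then one 2 hours later → 2 sessions.
const MIN = 60 * 1000;
console.log(logicalSessions([0, 10 * MIN, 20 * MIN, 140 * MIN]));  // 2
```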
28 fix commits changed fewer than 10 lines — micro-patches a human would fold into the parent commit. These represent ~8% of commits. But even excluding them, the fix rate remains high, and the question of how many bugs were introduced and fixed within a session (never appearing in commit history) remains unanswered.
Is It Bloated? Language vs. Real Bloat
The native app is 1.8x the size of web+server combined. How much is language overhead vs. genuine bloat?
API calls show the starkest language difference:
| Operation | Swift (repo + DTO) | JS (client.js) |
|---|---|---|
| Fetch all accounts | ~67 lines | 5 lines |
| Create account | ~32 lines | 6 lines |
Swift requires DTO structs, Codable conformance, explicit mapping functions, typed error handling, and an explicit decode step. JS just calls fetch().
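The JS side of that comparison is easy to picture. A sketch in the style of a fetch-based client (the function and endpoint names here are illustrative, not the real client.js ones):

```javascript
// Sketch of a fetch-based API client in the style of moolah's client.js.
// Names are invented for illustration; the real client differs in detail.
async function fetchAccounts(baseUrl) {
  const response = await fetch(`${baseUrl}/accounts`);
  if (!response.ok) throw new Error(`HTTP ${response.status}`);
  return response.json();  // untyped: whatever shape the server sent back
}
```

The Swift equivalent needs the same request plus DTO structs, Codable conformance, and mapping into domain models before anything type-safe comes out the other end — hence ~67 lines for the same fetch.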
Models are closer than expected — Swift models are only ~30% larger than server DAOs. Web stores are sometimes larger because they mix model shape with mutation logic.
Breaking Down the 20,600 Lines
| Category | Lines | % | Notes |
|---|---|---|---|
| Language/platform overhead | ~5,400 | 26% | Types, inits, DTOs, CodingKeys, #Preview, platform conditionals |
| CloudKit offline backend | ~2,990 | 14% | Offline-first local computation; no web equivalent |
| Native-only features | ~1,750 | 9% | Crypto prices, data export, multi-platform layout |
| Equivalent application logic | ~10,460 | 51% | Would be ~6,500-7,500 lines in JS |
Rewriting only the web-equivalent functionality in JS would yield ~8,000-9,000 lines — close to the actual 11,200. The 1.8x multiplier is mostly language overhead and the offline backend, not AI-generated bloat.
The Remote backend is properly thin: 1,490 lines, of which only ~40 are business logic (2.7%). It constructs requests, decodes responses, maps to domain models. The CloudKit backend (2,990 lines) necessarily contains real logic — it must replicate server-side computation for offline use.
Defect Rates
| Project | Fix Commits | Fix Rate | Organic Fix Rate |
|---|---|---|---|
| moolah-native | 126 | 31% | 31% |
| moolah (web) | 80 | 10.5% | 6.2% (excl. migration breakage) |
| moolah-server | 15 | 3.5% | 2.6% (logic bugs only) |
What Drives Each Project’s Bugs
Native — the generate-and-patch cycle. CategoryPicker: 7 fix commits + 2 complete rewrites in one day. Budget API: two consecutive fixes (wrong endpoint, then wrong UUID format). Empty budget: fixed to “top” alignment, immediately re-fixed to “center”. The pattern: AI generates → breaks → fixes → fix is wrong → fixes the fix. This can consume 5-10 commits for one feature.
Web — dependency breakage. 22 of 80 fixes from the 2024 Vue 3 migration. 7 of 10 reverts were failed dependency upgrades. Organic fix rate excluding migrations: 6.2%.
Server — remarkably stable. 11 logic fixes in 8.8 years, 5 in the same file (dailyBalances.js). Simple CRUD has essentially zero bugs.
The Unvalidated Iceberg
The 31% fix rate only counts bugs found during development. Much functionality remains unvalidated with no production usage and no human code review. The original projects have been in actual use for years — their bugs are known quantities.
Can We Trust AI-Written Tests?
The native app’s test suite is large (13,653 lines, 0.66:1 ratio) but size doesn’t equal value. When AI writes both implementation and tests, both can encode the same wrong assumption.
Five Cases Where Tests Validated Bugs
- Expense sign convention — Tests asserted expenses as positive; server uses negative. Both implementation and test had to change.
- Investment daily balances — Tests computed from value snapshots; correct behavior is cumulative from transactions. Entire test rewritten. The AI built a wrong mental model and tests faithfully encoded it.
- Scheduled transaction filtering — Tests expected scheduled transactions in regular lists. They should be excluded.
- Category deletion — Tests expected child reparenting; server orphans them. AI guessed “reasonable” behavior instead of checking.
- Return type mismatch — Tests asserted Int; the API returns MonetaryAmount.
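The expense sign case shows the failure mode well. A hypothetical illustration (in JS for brevity — the real code is Swift, and these names are invented): when implementation and test grow from the same wrong mental model, the test passes and the bug survives.

```javascript
// Hypothetical illustration of a test validating a bug: both the
// implementation and the test assume expenses are positive amounts,
// but the server's convention is that expenses are negative.
function expenseTotal(transactions) {
  // Wrong model: treats expense amounts as positive values to be summed.
  return transactions
    .filter(t => t.type === 'expense')
    .reduce((sum, t) => sum + t.amount, 0);
}

// The AI-written "test" encodes the same wrong assumption, so it passes.
const txns = [
  { type: 'expense', amount: 4250 },  // server would actually send -4250
  { type: 'income', amount: 100000 },
];
console.assert(expenseTotal(txns) === 4250);  // green, but wrong convention
```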
33 fix commits (27% of fixes) required changing test expectations alongside the production fix — 33 times the test suite said “this is correct” when it wasn’t.
The TDD That Wasn’t
TDD was instructed from day 1. The AI ignored this for 5 days. Actual test-first behavior only appeared on day 6, when structured “superpowers” skills were installed — enforcement mechanisms stricter than plain-text instructions. Even then, TDD doesn’t help when the AI’s understanding of correct behavior is wrong: it just writes a wrong test first instead of second.
Where Confidence Actually Comes From
| Source | Confidence | Why |
|---|---|---|
| The server | High | 8.8 years of human-written tests and real-world use. When the native app talks to the server, correctness comes from the server. |
| Test architecture | Medium | Real backends (CloudKitBackend + in-memory SwiftData), not mocks. Structurally sound, but can still assert wrong expected values. |
| Manual testing | Medium | 60% of fixes were production-only (no test changes), meaning bugs were found through use, not tests. |
| Test expectations | Low-Medium | Strong regression protection, weak correctness verification. At least 33 demonstrated cases of tests encoding wrong behavior. |
| CloudKit backend | Low | Reimplements server logic with no human review. All five of the test-encoded bugs above were in this layer. |
The Dependency Divide
The native app has zero third-party packages. Everything comes from Apple’s SDK: SwiftUI, SwiftData, CloudKit, URLSession, Charts, XCTest, etc.
The JS projects have ~570 installed packages across ~25 direct dependencies, and 259 commits (22%) touch package.json. Libraries get abandoned (Vuex → Pinia, webpack → Vite, moment → date-fns), major versions break APIs (Vuetify 1→2→3→4 required 8+ commits with reverts), and transitive vulnerabilities create perpetual maintenance.
This directly killed momentum. The 287-day dormancy starting Dec 2018 follows a reverted dependency upgrade. The 303-day gap after Oct 2019 follows a failed migration. A weekend producing only a partially-working upgrade with no new features makes it hard to come back.
The native app avoids this entirely — for now. Apple’s SDK evolves on a predictable annual cycle, not the constant churn of the JS ecosystem.
The Rhythm of a Side Project
| Time Pattern | Web+Server | Native |
|---|---|---|
| Weekend commits | 37-51% | 52% |
| Longest gap | 303-331 days | 13 hours (sleep) |
The native app hasn’t hit its first dormancy yet.
The question isn’t whether it will slow down, but what happens when it does. The original projects are self-documenting — you pick them up after 10 months and the code tells you how it works. The native app is AI-generated and unread. AI might make re-entry easier (it can explain the codebase), but the owner has no independent ability to verify those explanations.
Key Insights
1. AI Changed Who Can Build, Not What Gets Built
The native app was built by someone with zero platform experience. AI made platform expertise optional for initial construction — but the resulting codebase is opaque to its owner in a way the original projects never were.
2. Speed and Quality Traded Off at 12:1
31% fix rate (native) vs. 2.6% (server). The generate-and-patch cycle reflects genuine instability, not just frequent commits.
3. AI Ignores Instructions Without Enforcement
TDD was instructed from day 1, ignored for 5 days. Only structured skill enforcement changed actual behavior. Plain-text instructions are suggestions, not constraints.
4. AI-Written Tests Can Validate Bugs
33 fix commits required changing test expectations — the tests were asserting buggy behavior was correct. When AI writes both sides from the same wrong model, tests provide false confidence. Good test architecture (real backends, no mocks) helps but doesn’t solve the problem.
5. The 1.8x Size Ratio Is Mostly Language, Not Bloat
~26% is Swift type system overhead, ~14% is the offline CloudKit backend (which the web app doesn’t have), ~9% is native-only features. The Remote backend is properly thin. Feature-level code is comparable to the web equivalents.
6. Plans Are a Supervision Mechanism, Not Documentation
The original projects need no documentation — the code is self-documenting. The native app has 46,700 lines of plans because AI-directed development needs an external record of intent. The AI frequently fails to fully execute plans, so keeping them lets you audit completeness. Plans aren’t documentation — they’re a quality control mechanism for an unreliable implementer.
7. The JS Dependency Treadmill Is a Real Cost
22% of all web+server commits are dependency maintenance. Failed upgrades killed momentum and contributed to dormancy. The native app’s zero-dependency approach avoids this entirely, though Apple’s evolution will eventually impose its own (more predictable) tax.
8. The Risk Is Opacity, Not Size
20,600 lines is a manageable codebase. The risk is that zero of those lines have been read by a human. If AI tools remain capable, this may work. If they don’t — or the codebase outgrows what AI can reason about — the project is stranded. The original projects carry no such risk: self-documenting code that anyone with JS experience can pick up.
9. Side Projects Have a Heartbeat Regardless of Tooling
Dormancy cycles are driven by life, not technology. AI may change the revival cost, but it doesn’t change the fundamental constraint that side projects compete with the rest of life for time and energy.