🪐 The Brief · Stress-Test an AI Coding Partner on Real Production Software

CosmicKeys
A consumer typing platform, built end-to-end with an AI coding partner, to measure what the new paradigm actually ships in production.

Multi-region GCP. 7 languages. 14 voices. 30 keyboard layouts. Stripe payments + GDPR. Blue-green deploys. 155,000 lines of production code across 1,237 commits in 6 months. The test bed was chosen to stress every dimension of production engineering at consumer scale, and to answer one question: what can an AI coding partner actually ship, end-to-end, under one experienced engineer's direction? Live at cosmickeys.app.

1,237 commits
155K lines of production code
20 hats, 1 engineer
6 mo, first commit → launch
3 regions: GCP US / EU / Asia
7 langs · 14 TTS voices · 30 layouts

📋 The Brief: Why This Product, This Way, At This Moment

After wrapping a long corporate engineering engagement, the question worth answering was: what can an AI coding partner actually ship in production, end-to-end, under realistic engineering discipline? Not in a demo. Not in a tutorial. In a real consumer product with real users, real payments, real regional infrastructure, real failure modes.

CosmicKeys is the experiment's test bed. The page below is the result.

What this page IS (and what it isn't)

CosmicKeys is an AI-coding-only project, and it's the first product in a sequence. The story below is about how one experienced engineer used Claude Code as a coding partner (plus a dev-time 3-AI code review process: Claude + Gemini + Codex) to ship a production-grade consumer product solo in 6 months.

✅ What this page IS about
  • Coding with Claude Code as primary AI partner
  • Dev-time multi-AI code review (Claude + Gemini + Codex)
  • Production architecture that one engineer can ship solo
  • 1,237 commits / 155K LOC / 6 months of coding-productivity work
❌ What this page is NOT about
  • integration as a product feature
  • at runtime
  • AI-driven personalization or AI-in-the-product
  • The deployed app makes zero per-user inference calls: one narrow exception ( for support-ticket translation, covered later) and that's it

Why the explicit boundary: and runtime integration as product features came after CosmicKeys, with my next project WatchAlgo (which auto-generates solutions for user-submitted problems via a multi-agent pipeline). That's a different story for a different page. CosmicKeys is the AI-coding success story that came first and made the rest possible.

🎧Audio Edition
20 min listen

One Developer Shipping 155,000 Lines with AI

Prefer to listen? A two-host deep-dive walking through the entire CosmicKeys story: the experiment's premise, the Tango Development Pattern, the LLM Council code-review pattern, the production architecture, and the deliberate boundaries that made it shippable solo in 6 months. Same story as this page, as a conversation.

Product selection criteria

For the experiment to mean anything, the product had to satisfy four constraints simultaneously:

Simple enough to be universal

Anyone with a computer should be able to use it without domain expertise: no field-specific learning curve, no regional cultural barrier. Theoretical user base in the billions.

Complex enough to exercise every dimension of production engineering

Multi-language (with real cultural adaptation, not translated strings). Real-time analytics. Payments and subscription lifecycle. Multi-region deployment with data residency. Accessibility. Authentication. Observability. Support tooling. If the AI partner couldn't ship cleanly in any one of those dimensions, the brief would have failed there.

Scale architecture that's real, not theoretical

The design has to handle 1 user OR 2 billion users without rewriting the foundation. Otherwise the experiment is testing the AI on a toy.

Consumer-facing: real users force real polish

The AI partner has to ship something real humans actually use, not internal tooling that papers over rough edges. Real users force real polish.

Why typing

Touch typing fits all four criteria precisely. It's universal: every computer user types every day, in every language, in every profession. It's multi-language not as an afterthought but as the core product (typing in German is a different motion from typing in English; typing on a Swedish keyboard exercises different muscles than on an American one). It needs payments to be sustainable without ads. It needs to scale from 1 to billions because typing won't go away. And it's consumer-facing in the most demanding way: typists feel latency, animation jitter, and audio drift in milliseconds because their own hands are providing the input ground truth.

Most consumer products would test the AI partner on one or two dimensions. Typing, done seriously, tests it on all of them. The rest of this page is the answer to whether the AI partner could actually ship under that brief.

The methodology: the Tango Development Pattern (TDP)

Let's give the dance a name: the Tango Development Pattern, or TDP if it catches on (and if it doesn't, you still got a useful mental model out of the bargain; coined here, free to take). Here's the bet behind the term: most engineers building with AI today are already doing some version of this dance, knowingly or unknowingly. The ones who get fluent in the choreography are the ones whose AI partner ships production-grade work; the ones who don't get fluent end up with impressive-but-fragile demos that fall apart at the edges.

The trick isn't extracting more out of AI. The trick is knowing when to lead (product vision, judgment calls, rejecting a “good enough” answer that isn't really good, drawing the line on what NOT to build) and when to let AI lead (implementation depth across 20 specialties, breadth-of-knowledge moves you couldn't do alone in six months even if you tried). Learn the steps and the lead-swap, and you can ship pretty much any consumer product under a brief like this one. CosmicKeys is the worked example; TDP is the technique.

🔑 The choreography that produced 155,000 lines of production code
Human leads → AI executes → Human reviews → AI partner-reviews → Human triages → Ship. Repeat 1,237 times. That's TDP.

For 25 years pre-AI, a consumer product of this scope would have required a team: product manager, designer, frontend, backend, DevOps, DBA, QA, security, localization, content, growth. A team of 5 to 8 people, working for 12 to 18 months, optimistically.

Under the brief, one experienced engineer plus one AI coding partner shipped it in 6 months, wearing 20 professional hats simultaneously: some carrying 20+ years of deep personal expertise (🎯), others where the human knew abstractly enough to lead the dance while AI brought the implementation depth (🤝).

The real superpower anyone building with AI as a partner needs isn't expertise in 20 domains. It's the ability to formulate the right question, evaluate AI's answer against the product vision, and make the final judgment call. That's the meta-skill that makes the tango work, and it's the skill that matters most in any role where AI is a force multiplier.

Neither the human alone nor the AI alone could have shipped CosmicKeys. It took both. Dancing.

🎯Why CosmicKeys Looks Different from Other Typing Apps

The touch-typing category is a graveyard of clumsy interfaces glued together with banner ads, gated at $9.99/month for features that should be free, and almost never localized beyond a translated UI string. The market accepts mediocrity because typing apps are seen as commodity software: pick one, suffer the ads, move on.

CosmicKeys is the counter-example the brief produced. When the engineering discipline says “treat architecture, UX, and rich functionality as non-negotiable from day one, with no trade-offs against each other,” the result looks structurally different from what the category typically ships.

🚫

No Ads. Ever.

Not in the lessons, not on the dashboard, not at session end. Typing practice depends on focus; ads break it.

🌍

Localized End-to-End

7 languages × 9 regions. Native-speaker voice narration per locale. Region-specific keyboard layouts. Not a translated UI string, but a fully localized product.

🎨

Polish Equal to Architecture

Every screen is hand-designed. Animations match your actual keyboard. Subscription management is one click, not a dark pattern.

🎯

Adaptive Lesson Selection (Deterministic)

Each completed lesson stores rich analytics (errors_by_char, slowest_keys, rhythm, per-finger latency) to the lesson_attempts table. The adaptive lesson recommender deterministically scores curriculum candidates against these analytics and surfaces the next lesson that targets your current weak keys. No runtime LLM call: the intelligence lives in the analytics design and the curriculum, not in an inference call per session.
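As a rough illustration of what "deterministically scores curriculum candidates" can look like, here is a minimal sketch. It is not the CosmicKeys recommender: the field names errors_by_char and slowest_keys come from the text above, while the LessonCandidate shape, the bonus of 2 for slow keys, and the tie-break by lesson id are assumptions made for this example.

```typescript
// Hypothetical shapes: only errors_by_char and slowest_keys are named on this page.
interface AttemptAnalytics {
  errors_by_char: Record<string, number>; // error count per character
  slowest_keys: string[];                 // keys with the highest latency
}

interface LessonCandidate {
  id: string;
  text: string; // practice text of the lesson
}

// Weight per weak key: its error count, plus an assumed bonus of 2 if it is slow.
function weakKeyWeights(a: AttemptAnalytics): Record<string, number> {
  const weights: Record<string, number> = {};
  Object.keys(a.errors_by_char).forEach((ch) => {
    weights[ch] = a.errors_by_char[ch] || 0;
  });
  a.slowest_keys.forEach((ch) => {
    weights[ch] = (weights[ch] || 0) + 2;
  });
  return weights;
}

// Score = average weak-key weight per character of the lesson text.
function scoreLesson(lesson: LessonCandidate, weights: Record<string, number>): number {
  if (lesson.text.length === 0) return 0;
  let total = 0;
  for (let i = 0; i < lesson.text.length; i++) {
    total += weights[lesson.text.charAt(i)] || 0;
  }
  return total / lesson.text.length;
}

// Deterministic pick: highest score wins, ties are broken by lesson id.
function recommend(lessons: LessonCandidate[], a: AttemptAnalytics): LessonCandidate | undefined {
  const weights = weakKeyWeights(a);
  return lessons
    .map((lesson) => ({ lesson, score: scoreLesson(lesson, weights) }))
    .sort((x, y) => y.score - x.score || (x.lesson.id < y.lesson.id ? -1 : 1))
    .map((x) => x.lesson)[0];
}
```

Because the score depends only on stored analytics and lesson text, the same inputs always produce the same recommendation, which is the property the card describes.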

⚡

Built Solo, Production Grade

Multi-region backend, client-side analytics pipeline, deterministic adaptive lesson selection, blue-green deploys to 3 Cloud Run regions: all shipped by one engineer in months, not years.

💚

Free, Sustainable Through Usage

Core features free forever. ElevenLabs TTS voices are pre-generated at content-release time (not a per-user cost) and cached in the CDN. Subscription tiers exist for higher usage limits and advanced features, not for gating core typing practice.

🔑 The thesis the platform proves
One experienced engineer plus one AI coding partner can ship a multi-region, multi-language consumer product with the polish, architecture, and rich functionality usually attributed to a team of 5-7. Two foundational decisions made that possible, the Cosmic Identity and the Technology Stack, both spec'd before component code. Each is its own section below; together they're the rails the implementation runs on.

🎨 The Cosmic Identity: The Style Guide That Runs on Every Surface

The first foundational decision after the brief was the visual identity, deliberately spec'd before any component code was written. The first surface a visitor sees isn't a screen; it's a URL. The product had to live at a domain that hinted at the theme before the user landed: cosmickeys.app, picked deliberately over generic options like typing.io or typingpractice.com because the cosmic frame was the product's differentiator from the category's sea of bland white-and-blue UIs.

The visual identity reduces to three primitives that show up on every surface across the 196 production components (counted as .tsx files across frontend/components, contexts, hooks, and the App Router routes; verifiable by anyone with the repo):

Theme: cosmic

Deep slate background, cyan-and-purple nebula gradients, dark surfaces with subtle light (the swirls and starlight you see inside this card are the cosmic palette running live). Picked because typing apps are used for long focused sessions; dark reduces eye strain and the cosmic palette signals depth where the category typically signals utilitarianism.

Logo: a typewriter keycap emerging from the cosmos (shown left, the actual mark)

Its edge glows as if catching starlight (the soft purple aurora pulsing behind it on this card is exactly the effect the logo reads as in context); the cosmic radial gradient lives inside the keycap itself. Reads at every size from the 16px favicon to the 1200px share-card watermark: a single mark the eye recognizes whether it's a browser tab or a social preview.

Recurring design primitive: the prismatic card

A translucent surface with a soft glowing light bleeding from its edges (like this card itself; the green halo around its rim is the prismatic-card primitive rendered at full strength), used everywhere from lesson tiles to analytics widgets to social-share previews. When you see the prismatic glow in a screenshot, you're looking at a CosmicKeys artifact.

The style guide spec'd these decisions explicitly up front, so the AI coding partner had a constrained visual palette to execute against rather than improvising component-by-component. Every screen across 196 production components reads as part of the same family because the AI partner had rails to run on. Without the up-front spec, the same AI partner would have shipped a dozen subtly-inconsistent variations of the same card pattern. Coherence is what disciplined direction produces, not what AI volunteers.

💡 A note on the human role for this hat (🤝 AI-augmented)
I'm not a designer by training. But across a 25-year career I've worked alongside UI/UX leads and creative directors closely enough to internalize what good design judgment sounds like: the questions to ask, the consistency to enforce, the rejections to make when something feels “generic dark mode” instead of cosmic. That's the 🤝 hat at work: abstract human judgment supplied by collaboration with experts over decades, paired with AI's implementation depth to ship a coherent visual identity across 196 components without a dedicated designer on the team. If you're an engineer who has worked closely with designers, you can lead an AI partner through this kind of work. The lesson here isn't “be a designer,” it's “ship a style guide before component code so the AI has rails to run on.”

⚙️ The Technology Stack: Engineering Choices, Not Just Tool Picks

With the brief defined and the visual identity locked, the second foundational decision is the technology stack. Each choice was made to optimize for one of two things: ship velocity (so the AI partner could move fast without re-deriving common patterns) or response latency (because typing apps live or die on input-to-paint time). The choices below aren't name-brand defaults; each was the answer to a specific question this product asked.

Frontend: Next.js 15 + React 19 + TypeScript + Tailwind v4

  • Server Components reduce client-side JS payload, which is meaningful for laptop users on cellular hotspots and for mobile visitors evaluating the product (typing practice itself requires a physical keyboard, but the marketing page often gets first-checked from a phone before the prospect comes back on their computer)
  • App Router's nested layouts simplify multi-language routing (/en, /de, /es, …)
  • TypeScript strict mode catches integration bugs the AI partner would otherwise find at runtime
  • Tailwind v4's CSS-first config matches the style-guide-before-code discipline: design tokens live in CSS, not buried in component files

Backend: Python FastAPI

  • Async-first means a single FastAPI instance handles hundreds of concurrent typing sessions without thread-pool starvation
  • Pydantic validates request/response shapes at the API boundary, so the AI partner's generated endpoints can't ship malformed data
  • Python is the obvious language for the analytics pipeline (numpy/pandas-adjacent compute on JSONB analytics documents)
  • Auto-generated OpenAPI docs mean the frontend's API client can be regenerated rather than hand-maintained

Database: PostgreSQL

  • JSONB columns let typing analytics live as flexible documents under a strict primary schema, so the per-user analytics document evolves without migrations
  • GIN indexes on JSONB keep containment and key-existence queries against deeply-nested analytics nearly as fast as queries on normalized tables
  • Regional read replicas (one per Cloud SQL region) deliver sub-50ms read latency to users in any region
  • Mature, predictable, well-understood operational characteristics: no surprises at 3am when the SLA depends on the DB

Infrastructure: Google Cloud Platform

  • Cloud Run scales the FastAPI service from zero to thousands of instances per region: cost-efficient at small scale, automatic when traffic spikes
  • Cloud SQL provides managed Postgres with regional replicas, automated backups, and point-in-time recovery, operational work that would otherwise need a dedicated DBA
  • Workload Identity Federation eliminates long-lived service-account keys (security posture)
  • Multi-region deployment (US/EU/Asia) for data residency compliance + sub-100ms response time globally

The latency principle: local-first, sync-second

Typing is the most latency-sensitive interaction in any application. A user pressing J expects the J to appear instantly; anything over 50ms feels broken. So CosmicKeys uses a local-first architecture: every typing-session interaction (keystroke registration, WPM calculation, accuracy tracking) happens client-side first, with the result rendered immediately. Data is then batched and synced to the backend in the background, encrypted in localStorage during the gap.
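To make the client-side calculation concrete, here is a minimal sketch of computing WPM and accuracy from keystrokes already held in memory, with no network call involved. The Keystroke shape and the function name are hypothetical, and the "one word = 5 correct characters" rule is the common typing-test convention, assumed here rather than confirmed as the CosmicKeys formula.

```typescript
// Hypothetical keystroke record kept in client memory during a session.
interface Keystroke {
  expected: string; // character the lesson asked for
  typed: string;    // character the user pressed
  atMs: number;     // timestamp in milliseconds
}

interface LiveStats {
  wpm: number;      // words per minute, one "word" = 5 correct characters (assumed convention)
  accuracy: number; // fraction of keystrokes that were correct, between 0 and 1
}

// Recomputed on every keystroke, entirely from local data.
function liveStats(keys: Keystroke[], nowMs: number): LiveStats {
  const first = keys[0];
  if (first === undefined) return { wpm: 0, accuracy: 1 };
  const correct = keys.filter((k) => k.typed === k.expected).length;
  const elapsedMin = (nowMs - first.atMs) / 60000;
  const wpm = elapsedMin > 0 ? correct / 5 / elapsedMin : 0;
  return { wpm, accuracy: correct / keys.length };
}
```

A UI layer would call liveStats after each key event and paint the result immediately; syncing to the server is a separate, later step.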

  • Client-side WPM + accuracy computation on every keystroke, with no server round-trip during typing. The number you see is the truth as of this exact millisecond, not as of the last ack.
  • Per-session analytics buffered in XOR-encrypted localStorage: keystroke events accumulate locally during practice.
  • Batched DB sync at natural break points (lesson complete, page navigation, idle threshold): the server gets aggregated facts, never the raw keystroke stream.
  • Privacy bonus: the server never sees what users actually typed (custom-lesson content lives only in localStorage; the database stores derived metrics, not source text).
  • Offline tolerance: a user can practice through a 5-minute internet outage and have their progress sync when the connection returns.
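The "aggregated facts, never the raw keystroke stream" idea in the list above can be sketched as a reduction step that runs on the client before any sync. This is an illustration under assumptions: the KeyEvent and SessionSummary shapes and the summarize function are hypothetical, and only errors_by_char is a field name taken from this page.

```typescript
// Hypothetical per-keystroke event buffered locally during practice.
interface KeyEvent {
  expected: string;  // character the lesson asked for
  typed: string;     // character the user pressed
  latencyMs: number; // time since the previous keystroke
}

// Derived metrics only: in this sketch no typed text is included in the payload.
interface SessionSummary {
  keystrokes: number;
  errors_by_char: Record<string, number>; // misses counted per expected character
  mean_latency_ms: number;
}

// Collapse the raw event buffer into the summary that would be sent at a break point.
function summarize(events: KeyEvent[]): SessionSummary {
  const errors: Record<string, number> = {};
  let latencySum = 0;
  events.forEach((e) => {
    latencySum += e.latencyMs;
    if (e.typed !== e.expected) {
      errors[e.expected] = (errors[e.expected] || 0) + 1;
    }
  });
  return {
    keystrokes: events.length,
    errors_by_char: errors,
    mean_latency_ms: events.length > 0 ? latencySum / events.length : 0,
  };
}
```

The design choice this illustrates is that the payload size stays small and constant per session, and what the user typed stays in the browser.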

The principle isn't “make the network faster”; it's “remove the network from the critical path.” That's what makes the app feel native rather than web-app-y. The 9 sections of the Platform Tour below each apply this principle wherever input latency matters; the design-system and latency-principle callouts inside each section flag the specific instances.

🖼️ Platform Tour: See It Without Signing Up

Whether you're curious how the product looks, evaluating the work without creating an account, or studying the build for your own AI-native project, the whole platform is captured below. Click any section header to expand it, then click any screenshot inside to open it at full size in a modal viewer with horizontal and vertical scroll for the larger images.

37 screenshots across 9 feature areas. Skim the section headings, expand the ones that matter to you, decide whether the architecture deep-dive below is worth your time.

💡 Before you click in: what I already know is imperfect

This is shipping at v1, and there are things an attentive reader (or a real user) might notice that I already know about and have made deliberate triage calls on:

  • Some surfaces are slightly over-engineered. Social sharing in particular has per-section share icons and emoji affordances in places where one page-level share would have been enough, a leftover from an earlier design pass that I'd consolidate when bandwidth allows.
  • Some UI polish is still in the queue. A few spacing inconsistencies and icon-alignment edge cases remain. None block the user's path through the product.
  • A handful of minor bugs I'm aware of but haven't prioritized. They surface only in specific edge-case sequences, and I'm deliberately waiting for a real user to hit them in the wild before deciding which to fix first. That signal is more honest than my own guess at priority.

None of these are killer bugs. Nothing crashes the typing experience. Payments work. Within-region failover (Cloud Run auto-recover + Cloud SQL Regional HA + GLB health checks) works as designed. The analytics pipeline is honest. But if you spot something that feels off, I almost certainly already know about it and have made a deliberate call. That's part of what shipping a v1 looks like under the brief: knowing where the rough edges are, leaving the ones real users don't actually notice, and waiting for evidence before re-prioritizing.

The landing page is the first surface where the Cosmic Identity and the Tech Stack established above meet a real user. The visual frame is the prismatic-card pattern; the input latency feel comes from local-first state. Both are intentional. What the user actually does on this page is two things: pick a language, then choose how to sign in.

The landing page auto-detects the user's preferred language from their computer settings and uses it as the default. About 99% of visitors never touch the dropdown because the system-detected language is correct. The dropdown is there for the other 1%: users who want to switch to one of the other 7 selector entries (the dropdown shows 8 in total because English splits into US and UK, each with its own default keyboard layout). The chosen language is then tied to the account going forward. Typing-history analytics are scoped per-language, so switching languages mid-account is intentionally awkward (use a separate account if you actively practice multiple).
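The detection chain named elsewhere on this page (navigator.language, then BROWSER_LANGUAGE_MAP, then SUPPORTED_LANGUAGES, with a fallback through the navigator.languages array) could look roughly like the sketch below. The two constant names come from the page; their contents, the matchTag helper, and the "en-US" default are assumptions for illustration. The function takes the browser values as parameters so it stays a pure function.

```typescript
// Illustrative contents only; the real maps are not shown on this page.
const SUPPORTED_LANGUAGES: Record<string, string> = {
  "en-US": "English (US)",
  "en-GB": "English (UK)",
  de: "German",
  es: "Spanish",
};

// Maps lower-cased browser locale tags to supported codes (assumed entries).
const BROWSER_LANGUAGE_MAP: Record<string, string> = {
  en: "en-US",
  "en-us": "en-US",
  "en-gb": "en-GB",
  de: "de",
  "de-at": "de",
  es: "es",
};

// Try the full tag first ("de-at"), then its base language ("de").
function matchTag(tag: string): string | undefined {
  const lower = tag.toLowerCase();
  const base = lower.split("-")[0];
  const code = BROWSER_LANGUAGE_MAP[lower] || BROWSER_LANGUAGE_MAP[base];
  return code !== undefined && SUPPORTED_LANGUAGES[code] !== undefined ? code : undefined;
}

// primary mirrors navigator.language; preferences mirrors navigator.languages.
function detectLanguage(primary: string, preferences: string[], fallback: string = "en-US"): string {
  const candidates = [primary].concat(preferences);
  for (let i = 0; i < candidates.length; i++) {
    const code = matchTag(candidates[i]);
    if (code !== undefined) return code;
  }
  return fallback;
}
```

In a browser this would be called as detectLanguage(navigator.language, Array.from(navigator.languages)), and the result used to highlight the best-match entry before the user clicks anything.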

Sign-in offers 5 paths so the user picks the path of least friction for them: 3 OAuth providers (Google, GitHub, Microsoft; no userid/password to remember, nothing for CosmicKeys to store or leak), Magic Link via Resend (one-tap sign-in via email link, no password ever set), and a fully-functional Guest mode where the user practices with zero account at all. Their entire history lives in XOR-encrypted localStorage and migrates atomically to a real account on sign-up if they decide to stay.
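For readers unfamiliar with the technique, here is a minimal sketch of storing guest progress XOR-encoded under a repeating key. The function names, the hex encoding, and the KeyValueStore interface are assumptions made for this example, not the CosmicKeys implementation. XOR with a repeating key keeps stored values from being readable at a glance; it is obfuscation rather than strong cryptography.

```typescript
// Minimal storage shape; the browser's localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// XOR each UTF-16 code unit with the repeating secret, then hex-encode
// (4 hex digits per unit) so the result is a plain storable string.
function xorEncode(plain: string, secret: string): string {
  if (secret.length === 0) throw new Error("secret must not be empty");
  let out = "";
  for (let i = 0; i < plain.length; i++) {
    const unit = plain.charCodeAt(i) ^ secret.charCodeAt(i % secret.length);
    out += ("0000" + unit.toString(16)).slice(-4);
  }
  return out;
}

// Reverse of xorEncode: read 4 hex digits at a time and XOR with the same secret.
function xorDecode(hex: string, secret: string): string {
  if (secret.length === 0) throw new Error("secret must not be empty");
  let out = "";
  for (let i = 0; i < hex.length; i += 4) {
    const unit = parseInt(hex.slice(i, i + 4), 16);
    out += String.fromCharCode(unit ^ secret.charCodeAt((i / 4) % secret.length));
  }
  return out;
}

function saveProgress(store: KeyValueStore, storageKey: string, progress: unknown, secret: string): void {
  store.setItem(storageKey, xorEncode(JSON.stringify(progress), secret));
}

function loadProgress(store: KeyValueStore, storageKey: string, secret: string): unknown {
  const raw = store.getItem(storageKey);
  return raw === null ? null : JSON.parse(xorDecode(raw, secret));
}
```

In a browser, window.localStorage can be passed as the store; on sign-up, the decoded object would be what gets migrated to the real account.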

🧑 Human
Product Manager + UX Designer + Security Reviewer
Defined the friction-free entry: 7 first-class language tiles as the landing hero (auto-detected via navigator.language, override-able), then 5 sign-in paths: 3 OAuth providers + Magic Link via Resend + a fully-functional Guest mode where the user's entire practice history lives in XOR-encrypted localStorage. Decided that the language choice should pre-configure not just UI strings but the voice family AND the keyboard layout (matched to OS + region detected by the pre-signin endpoint), so the first lesson loads in the correct visual + audio context with zero configuration wizard. Account-bound language means a German user who signs up in German sees German voice, German keyboard, German lesson content: consistency per learner, not per device.
🤖 AI Partner
Implementation Engineer
Generated the landing-page layout, the 3-OAuth-provider plumbing with their different scope and redirect patterns, the Resend magic-link send / verify / callback flow with localized email templates per language, and the guest phantom-user system whose XOR-encrypted localStorage progress migrates atomically to a real account on sign-up. Built the BROWSER_LANGUAGE_MAP detection layer with fallback through navigator.languages array, and the pre-signin endpoint that detects OS, keyboard type (ANSI/ISO), and language in parallel.
💃 The Tango
The human wrote the “5 methods, zero passwords, guest progress migrates” spec in 4 bullets. AI generated the implementation in one pass. Gemini partner-review flagged that the migration path lacked an anti-abuse guard for delete-then-restore cycles. Codex partner-review confirmed the session cookie was SameSite=Strict. The human triaged findings, iterated 3 rounds, and locked the friction at actually zero, not just “low.” The whole sign-in surface (5 paths + guest mode + 7-language pre-config + 3-AI security audit) shipped in 9 commits over 4 days; that throughput is the human-AI tango's signature.
Why this screen exists

The first 5 seconds on the landing page decide whether a visitor stays or bounces. CosmicKeys leads with 7 first-class languages as a top-of-page commitment rather than a buried settings toggle, which signals localization is baked in, not bolted on. The cosmic dark-space-with-cyan-and-purple-nebula palette signals visual confidence on a category dominated by sterile white-with-blue-accent typing apps, and the typing brand promise is implicit because nothing else is fighting for attention.

▶ Design nuances most visitors miss (6)
  • Each of the 8 selector entries maps not just to a UI translation but to a specific TTS voice family (14 voices total: male + female per content language; English content is shared between the US and UK entries because the only difference is the default keyboard layout) and a curated set of regional keyboard layouts. Picking a language pre-configures the whole product, not just the chrome.
  • Pre-signin language detection runs before the user clicks anything: navigator.language → BROWSER_LANGUAGE_MAP → SUPPORTED_LANGUAGES[code] with a fallback through navigator.languages array; the user's best-match language is visually highlighted so they know the app already heard them.
  • In parallel with language detection, the pre-signin endpoint also detects OS (Mac/Windows) and keyboard type (ANSI/ISO), so by the time the user picks a language, the keyboard layout for the first lesson is already correct: no configuration wizard, no “first lesson in wrong layout” embarrassment.
  • The palette (deep slate background, cyan/purple nebula gradients) was chosen before a single React component was written. Sam locked the brand before the code so AI's generic dark-mode defaults got rejected on principle until they actually felt cosmic.
  • No cookie modal, no email-capture popover, no “rate our beta” toast on first visit. Those exist but are deferred; the language hero is a deliberate inversion of the modern web's default of accosting the user before they've seen the product.
  • The page is server-rendered with the detected locale already applied, so the first paint shows the user's language, not a flash of English that swaps to German half a second later.
▶ What each hat contributed (6)
  • 🎯 #1 Product Manager · Decided that language is a first-class onboarding decision, not a settings-screen afterthought. The alternative (default English, change later) would have lost the multilingual user before they typed a single key.
  • 🎯 #2 Creative Director · Locked the cosmic theme + nebula palette pre-code as a brand decision; rejected AI's generic “dark mode” defaults until the page actually felt like a typing platform for stargazers rather than a Bootstrap landing template.
  • 🤝 #3 UX / UI Designer · Designed the language-tiles-as-hero pattern with each tile sized so a desktop visitor sees all 7 at once, and a mobile visitor (typically a prospect evaluating the product before coming back on their computer for actual typing practice) sees at least 4 above the fold. Same hierarchy, scaled to each device's real role.
  • 🤝 #11 Localization Lead · Owned the entire 7-language matrix: UI strings, voice mappings, keyboard layouts, custom-lesson templates, locale-aware SEO routing, currency mapping. Every layer of the product is localized, not just the words on this page.
  • 🎯 #13 Security Engineer · Reviewed the pre-signin detection endpoint to confirm no PII leaks during the OS/keyboard/language detection: the IP and User-Agent never reach the database, only their derived facts (OS, ANSI/ISO).
  • 🤝 #16 Privacy & Compliance Officer · GDPR consent banner is deferred to after the first interaction so the hero remains uncluttered, but it's mandatory before any localStorage writes happen, so consent is genuinely informed, not coerced.
Why this screen exists

The biggest adoption blocker in the typing-app category is the sign-up wall: most users want to try the product before committing an account. CosmicKeys offers 5 sign-in paths so each user picks the path of least friction for them, and crucially, Guest mode lets them practice with zero account at all, their progress kept in XOR-encrypted localStorage and migratable to a real account later if they decide to stay. The signal to the user is: we're confident enough in the product that we don't need to trap you behind a form.

▶ Design nuances most visitors miss (6)
  • The 3 OAuth providers (Google, GitHub, Microsoft) cover almost the entire population that already has some account at a major identity provider. Microsoft is non-optional for European/enterprise users where Microsoft accounts dominate.
  • Magic Link uses Resend as the email-send vendor rather than the enterprise defaults (SendGrid, Postmark, SES). Resend's developer-first API and Tailwind email templates suit a solo developer's stack, a pragmatic vendor pick that beats the enterprise default for this use case.
  • Guest mode creates a phantom user whose XOR-encrypted localStorage progress migrates atomically to a real account on signup. An explicit Sam intervention added an anti-abuse guard so that delete-then-restore cycles can't be used to game guest privileges back into the account model.
  • “Try without sign-up” is visually equal to the OAuth buttons, with no dark pattern relegating it to small grey text below the fold; the platform genuinely wants you to take this path if it's lowest friction for you.
  • After the user picks any path, the pre-signin intelligence (OS, keyboard, language) is already in the session, so the first lesson loads with correct keyboard layout and voice, and no configuration wizard interrupts the flow.
  • All 5 paths land the user on the same dashboard with the same state. Auth method is a transport concern, not a feature toggle; a guest who signs up later doesn't see a different product.
▶ What each hat contributed (5)
  • 🎯 #1 Product Manager · Designed the Guest → Registered → Subscriber state machine and insisted Guest mode be a first-class path, not a buried “skip for now”. Iterated 3 rounds with AI until the anti-abuse guard for delete/restore cycles was in.
  • 🤝 #11 Localization Lead · Translated every label, error string, and email template (magic-link subject, body, button copy) into all 7 languages so a German user who requests a sign-in link gets a German email, not an English one with their name in it.
  • 🎯 #13 Security Engineer · Ran 3-AI security audits on the auth module (Claude + Gemini + Codex independent passes) and triaged findings: accepted 4 real issues, rejected 2 false positives; e.g., the session-cookie SameSite policy was confirmed strict.
  • 🤝 #15 Integration Engineer · Integrated all 5 auth surfaces: 3 OAuth providers with different scope and redirect patterns, the Resend magic-link send/verify/callback flow, and the guest phantom-user creation path with XOR localStorage migration.
  • 🤝 #16 Privacy & Compliance Officer · Built explicit consent capture for the account creation path (account, IP, UA, document-version) so the GDPR audit trail can answer “what did this user consent to and when” for any regulator request.

🎩 The 20 Hats: How One Engineer Wore Them All

Every screenshot above. Every architectural decision below. Every line of those 155,000 lines. Producing a platform of this scope solo requires wearing 20 professional hats simultaneously. Each one shows up in the codebase as concrete artifacts: components, modules, infrastructure, tests, lessons, voices, legal pages. Here are the hats this particular build needed.

But not every hat represents twenty years of personal expertise, and that's the point of the tango. Some hats can only be worn by someone with deep personal experience (🎯). Others require knowing the domain abstractly enough to lead the dance while AI brings the implementation depth (🤝: the human provides the goal and the judgment, AI provides the domain depth). The mix below is what one specific build looked like; the mix for a different product would be different.

🔑 What the 🤝 hats actually mean
For GDPR, I knew “we need to let users delete and export their data; it's required.” AI knew the specific Articles (17 and 20), the 30-day processing window, that consent audits need IP + user agent + document version, and the anti-abuse patterns around delete → restore cycles. I brought the goal and the final approval. AI brought the domain expertise. Same pattern for SEO, accessibility, animation physics, and voice production. The judgment call on what to ship, and what the bar is, stays human.

All 20 Hats · Each a Verse in the Dance

🎯

1. Product Manager

Sam decided what NOT to build (no gamification, no leaderboards-for-leaderboards) and designed the guest β†’ registered β†’ subscriber state machine. AI executed; Sam iterated 3 rounds until the anti-abuse guard for delete β†’ restore cycles was in.

🎯

2. Creative Director

Sam named CosmicKeys, defined the cosmic theme, picked the dark-space-with-cyan-and-purple-nebula palette before a single line of code. AI followed the style guide; Sam rejected "generic dark mode" components until they actually felt cosmic.

🀝

3. UX / UI Designer

Sam designed the analytics tab order (moved Fingers to position 2 because it’s the most actionable insight). AI built 196 components to spec; Sam redesigned the SVG hit-targets to work cleanly on touch screens β€” prospects often evaluate the product on a phone before committing to a typing session on their laptop β€” and added the color-blind alternatives.

🀝

4. Animation Engineer

Sam: "pinky rotates more than index β€” anatomically accurate." AI: Framer Motion code. Sam tested across all 30 keyboard layouts and iterated until every ISO variant’s bottom row animated correctly.

🎯

5. Software Architect

Sam chose local-first sync (typing is latency-sensitive) and timestamp-based conflict resolution. AI built optimisticSync.ts. partner-review caught a race condition in the background sync retry.

🎯

6. Domain Expert Β· Typing Pedagogy

Sam knew per-finger latency matters more than aggregate WPM, and that rhythm is more diagnostic than speed. Specified that stutter detection be calibrated against each user's own typing variance (coefficient of variation against their own baseline) rather than a global constant. AI doesn’t understand motor-skill acquisition β€” those choices come from domain knowledge.

🎯

7. Infrastructure Engineer

Sam designed the 3-region Terraform topology and switched AI’s default from service-account keys to Workload Identity Federation (zero long-lived credentials). Added a manual promotion gate β€” AI defaulted to auto-deploy.

🀝

8. SEO Strategist

Sam: "rank for typing test and position against Monkeytype, TypingClub, Keybr, 10FastFingers, Typing.com." AI ran 12+ SEO audit skills and generated the /vs pages. Sam framed each competitor’s positioning. Result: ChatGPT now cites CosmicKeys as a search referral source.

🎯

9. QA / Testing Lead

Sam designed the Playwright locale-injection matrix (test German / Danish / Spanish via Accept-Language headers, no VPN needed). AI wrote the test code. Sam designed what to test at E2E vs unit layer.

🎯

10. DevOps / Release Engineer

Sam created 14 custom Claude Code skills (stage-it, ship-it, go-live, rollback, db-query, prod-proxy, green-test-reset, ai-partner-review, deep-scan, mermaid-sync, …). Replaces a dedicated DevOps engineer + DBA + QA lead.

🀝

11. Localization Lead

Sam: "translate the German lesson templates β€” but adapt, don’t translate. German recipes use metric. German legal style has its own conventions." AI generated; Sam caught American measurements in "German" recipes and insisted Swiss German voices be distinct from standard German.

🎯

12. Data Engineer

Sam: "compute analytics at write time, store the results, never replay the raw keystroke stream β€” it doesn’t scale." AI would have defaulted to "store everything, compute on read." The data-engineering tradeoff was Sam’s.

🎯

13. Security Engineer

Sam ran security audits (Claude Code as primary architect + and Codex as reviewers) on the auth module. Three scanned independently; Sam triaged β€” accepted 4 findings, rejected 2 false positives. AI found vulnerabilities; Sam decided which mattered. All at dev-time; nothing deployed.

🎯

14. Data Analytics Designer

Sam designed the per-finger latency tab, rhythm analysis, and neighbor-key confusion (did you hit the key NEXT to the correct one?) using neighbor_calculator.py data. No competitor has rhythm + finger-level confusion because no knows these are the actionable metrics for motor-skill learning.

🀝

15. Integration Engineer

Sam integrated 6+ external APIs β€” ElevenLabs (14 voices Γ— 7 languages, pre-generated at content-release time), Stripe (multi-currency + VPN-proof grandfathering), (called only for support-ticket translation β€” the deployed app's one external AI-related call, treated like any other third-party service call), MaxMind GeoIP (Docker-embedded), 3 OAuth providers + Resend Magic Link. AI wrote the calls; Sam discovered the edge cases (umlauts, Swiss-German pronunciation, currency arbitrage).

🀝

16. Privacy & Compliance Officer

Sam: "implement GDPR Article 17 + 20 β€” 30-day deletion window, consent audit with IP + UA + document-version, ZIP export of all user data, localized legal pages." AI built the endpoints. Sam added the anti-abuse guard for delete β†’ restore cycles β€” AI didn’t think of that.

🎯

17. Code Review Lead / Engineering Manager

Sam designed the code-review pattern: Claude Code as primary architect and code-generation agent + and Codex engaged as reviewers. Each reviews independently, on the code Claude Code produced; Sam triages findings. Equivalent to a senior team peer-reviewing every commit β€” entirely at dev-time, not deployed.

🀝

18. Customer Support Architect

Sam designed the support flow: html2canvas auto-captures the user’s screen, message auto-translates user-language ↔ English via a API call (-2.0-flash primary, -2.5-flash fallback) on the write path; both versions stored, so reads are cache hits with zero per-view cost. This is the deployed app's only external AI-related call. Architecture is built; the AI triage layer plugs in when message volume justifies it.

🎯

19. Growth & Social Marketing Strategist

Sam designed sharing as a growth loop, not an afterthought. 5 social-share components with auto-generated branded cards (WPM + accuracy + brand) so every shared result targets exactly the right audience. AI built the components; the growth-loop strategy was Sam’s.

🀝

20. Content Strategist

Sam: "8 lesson template categories that map to real jobs β€” nurse, lawyer, developer, support, journalist β€” each authored separately per language (7 locale files) rather than translated." AI generated; Sam caught that the legal-document template was too simple (real legal typing has citation styles + section numbering) and insisted custom content stay in localStorage (privacy by design).

💡 What this list isn't
It isn't a brag sheet of expert-level proficiency in 20 domains; that would be unbelievable, and it'd be a lie. What it IS: a working demonstration that one experienced builder, plus AI-native tooling, plus the discipline to lead the dance rigorously, can operate at the velocity of a 5–8 person team. That's the multiplier AI provides when the human knows how to wield it: the kind of leverage that turns one curious builder into a full product team in 2026, whether the title on the door says engineer, PM, designer, QA lead, or founder.

🤖 The AI-Partner Workflow: How the Dance Actually Works

The 20 hats above explain the choreography. This section explains the tools that made executing the dance at production scale possible: custom skills, multi-AI code review, and an institutional memory that survives session crashes.

14 Custom Claude Code Skills · Replacing a Team

Each skill is a reusable automation workflow triggered by natural language. Together they cover what a traditional 5-person team would distribute across roles.

Skill | What It Does | Traditional Team Equivalent
stage-it | Commits, merges to staging, pushes (“stage it”) | CI pipeline trigger
ship-it | Staging → main, green deploy + Vercel preview | Release manager
go-live | Auto-detects FE / BE changes, promotes accordingly | DevOps engineer
rollback | Emergency revert of backend + frontend | Incident responder
db-query | Pre-built production queries across 3 regions | DBA
prod-proxy | Manages Cloud SQL proxy to all regions | Infra access tool
green-test-reset | Clears test user data after green deploy verification | QA engineer
ai-partner-review | Orchestrates Gemini + Codex review consensus | Two senior code reviewers
deep-scan | Gemini-powered codebase exploration · 1M context | Senior engineer research
mermaid-sync | Generates / updates architecture diagrams from code | Architecture documentation

10 of the 14 shown. Each replaces a workflow that a 5-person team would distribute across roles.

The Council · Three Models, One Voting Bench

I call this pattern the Council: Claude Code is the primary architect and code-generation agent that produces the work; Gemini and Codex are engaged as reviewers on what Claude Code generated. These are three different models trained on different data with different priorities; their blind spots don't overlap, which is the whole point. Before every major deployment, the ai-partner-review skill delegates the diff to the Council and synthesizes the consensus; I triage the findings the way a tech lead triages PR comments. This is the entire LLM-usage story for CosmicKeys: it happens at development time and is never deployed. The one runtime call the deployed app makes is to Gemini for support-ticket translation, and that's treated like any other external API integration (Stripe, Resend), not as SDK plumbing.

Claude (Opus)

Primary development partner. Knows the full codebase context. Synthesizes the consensus into SYNTHESIS_*.md files.

Gemini

1M-token context window. Excels at multi-file root-cause analysis: finds bugs that span files Claude reviewed separately.

Codex

OpenAI's code-focused model. Independent security review. Provides a second opinion on the same diff.

🔑 Why three models, not one
Single-model review has model-specific blind spots. Three models with different training, different context-window strengths, and different inductive biases produce real consensus, and the disagreements are often more informative than the agreements (when two flag the same thing and one doesn't, the dissent is usually the clue worth investigating).
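The triage step described above can be sketched as a simple vote tally. This is a minimal illustration, not the ai-partner-review skill itself: the `Finding` shape, the reviewer labels, and the bucket names are all hypothetical, and it assumes findings have already been normalized to a shared (file, issue) key.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical reviewer labels for the three models named in the text.
REVIEWERS = ("claude", "gemini", "codex")


@dataclass(frozen=True)
class Finding:
    reviewer: str  # which model raised the finding
    file: str      # file the finding refers to
    issue: str     # short normalized label, e.g. "race-condition"


def tally(findings):
    """Group findings by (file, issue) and bucket them by reviewer agreement."""
    votes = defaultdict(set)
    for f in findings:
        votes[(f.file, f.issue)].add(f.reviewer)

    report = {"unanimous": [], "split": [], "single": []}
    for (file, issue), who in sorted(votes.items()):
        if len(who) == len(REVIEWERS):
            bucket = "unanimous"
        elif len(who) > 1:
            bucket = "split"   # two agree, one dissents: worth a closer look
        else:
            bucket = "single"  # only one reviewer flagged it
        report[bucket].append({
            "file": file,
            "issue": issue,
            "flagged_by": sorted(who),
            "dissent": sorted(set(REVIEWERS) - who),
        })
    return report
```

The human triage still happens afterwards; the tally only surfaces where the models disagree so that the dissent is easy to find.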

Session Recovery · Crash-Proof Development

Claude Code sessions have finite context windows. When a session hits its limit or crashes, the next one can resume exactly where the previous one stopped by reading four sources:

  • .claude/SESSION_STATE.md: current work state, files changed, verification checklist
  • .ai-context/diagrams/*.md: 21 mermaid diagrams providing architectural context
  • CLAUDE.md: project root instructions, full orientation for a new session
  • ~/.claude/projects/<path>/memory/: user-specific context that persists across all sessions

A new Claude session reads all four and can resume mid-deployment, including verifying a deploy that the crashed session was monitoring. This replaces the “team wiki + onboarding doc” overhead that traditional teams maintain.

💚 Free for Everyone: Sustainable by Design

Most consumer products either ship ads or paywall the experience. CosmicKeys does neither. The economics are intentional.

The core experience is designed to stay free.*

Signing up, picking lessons, practicing, seeing analytics, and sharing progress (the things people use daily) are intended to remain free. In normal use, the vast majority of users complete typical sessions without encountering the usage-based paywall.

Usage-based tiers gate the heavy paths, not the core.

The three plans (FREE, TRIAL, PRO) are defined in backend/app/services/entitlements.py: FREE gives 3 speed tests per day and the core typing practice; TRIAL gives 10 tests per day for 7 days; PRO removes the test cap and unlocks pro lessons and full analytics history. The thresholds gate operations that genuinely scale with usage (speed-test storage, deep analytics retention, expanded curriculum access), not the core practice loop, and not voice audio (voices are pre-generated at content-release time so adding users does not add per-user TTS cost). Infrastructure scaling across the three Cloud Run regions (US / EU / Asia) is the platform's actual marginal cost.
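The plan limits above can be sketched as a small lookup plus one check. This is an illustrative sketch, not the contents of entitlements.py: the `Plan` dataclass, the function names, and the assumption that an expired TRIAL falls back to FREE limits are all hypothetical. Only the numbers (3 per day, 10 per day for 7 days, no cap) come from the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Plan:
    name: str
    daily_speed_tests: Optional[int]      # None means no daily cap
    duration_days: Optional[int] = None   # None means the plan does not expire


PLANS = {
    "FREE": Plan("FREE", daily_speed_tests=3),
    "TRIAL": Plan("TRIAL", daily_speed_tests=10, duration_days=7),
    "PRO": Plan("PRO", daily_speed_tests=None),
}


def effective_plan(plan_name: str, started_at: datetime, now: datetime) -> Plan:
    """Return the plan that applies right now."""
    plan = PLANS[plan_name]
    if plan.duration_days is not None:
        if now >= started_at + timedelta(days=plan.duration_days):
            # Assumption: an expired trial is treated as FREE.
            return PLANS["FREE"]
    return plan


def can_start_speed_test(plan_name: str, started_at: datetime,
                         tests_today: int, now: datetime) -> bool:
    """True if the user may start another speed test today."""
    plan = effective_plan(plan_name, started_at, now)
    return plan.daily_speed_tests is None or tests_today < plan.daily_speed_tests
```

Only the speed-test cap is modelled here; the core practice loop never passes through a check like this, which matches the intent that the cap gates the heavy paths rather than the core.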

The result: a sustainable platform without an ad model.

Hosting stays predictable because the expensive operations are bounded by the algorithm. Power users pay because they actually use more than the free allowance provides, not because the free tier was intentionally crippled to force an upgrade. There's a meaningful difference between “free tier limited because of unit economics” and “free tier crippled to extract revenue.”

What This Looks Like In the Product

The actual subscription management screen: clear pricing, instant downgrade, no dark patterns.

Why this screen exists

Two things are happening on a subscription page that often get conflated: the user wants to change their billing state (upgrade, downgrade, cancel), and the company wants to keep PCI scope as small as possible. Most apps optimize the first against the user (friction-as-retention) and ignore the second until they get audited. The cleaner design separates them: the page itself should remove every dark pattern between the user and the action they came to take, and the payment data path should stay outside your servers entirely via a hosted checkout like Stripe Checkout. Those two decisions compound: the simpler the page, the easier it is to keep PCI scope minimal because there's less custom code to audit.

▶ Design nuances most visitors miss (5)
  • Current plan is highlighted with a check + “Your plan” label so the user always knows where they are; available plans show their price and feature delta from current, not a marketing comparison table.
  • Downgrade is a one-click action with one confirmation (“Are you sure? Your subscription will end on X”): no retention survey, no “Wait! Here's a 50% discount” popup, no dark-pattern friction designed to trap the user.
  • Upgrade flows through Stripe Checkout (no custom payment form), so the user's card data never touches CosmicKeys' servers. Staying out of the payment data path keeps PCI scope minimal, which is what solo developers should do unless they have a specific reason not to.
  • Multi-currency pricing is automatic via MaxMind GeoIP (embedded in Docker): a German user sees EUR, a Norwegian user sees NOK, a Swiss user sees CHF, a US user sees USD, with regional purchasing-power-parity adjustments calibrated to the seven supported-language markets.
  • Invoice history is listed below current plan with one-click PDF download via Stripe; every charge is documented and accessible without a support ticket, which is rare in the consumer-SaaS world.
▶ What each hat contributed (4)
  • 🎯 #1 Product Manager · Drew the line on no retention-survey dark patterns even if it costs short-term retention; the page's simplicity is itself a trust signal that compounds across every other surface in the product.
  • 🤝 #3 UX / UI Designer · Designed the “Your plan” highlight + plan-delta comparison so the user can see at a glance what changes if they upgrade or downgrade, with no marketing-table arms race of feature checkmarks.
  • 🤝 #15 Integration Engineer · Integrated Stripe Checkout + webhooks for the upgrade path and Stripe Customer Portal for invoice download, leveraging Stripe's infrastructure rather than rebuilding billing UI from scratch.
  • 🤝 #16 Privacy & Compliance Officer · Confirmed that subscription cancellation triggers GDPR-compliant data-retention rules: billing info is retained for 7 years for tax compliance; typing analytics fall under the standard 30-day deletion window if the user also deletes their account.
🔑 Why ads were rejected as a model
Banner ads break the focus that typing practice fundamentally requires. Interstitial ads between lessons train the user to associate the platform with friction. Once you ship an ad model, every product decision starts optimizing for impressions instead of mastery. The category is full of cautionary examples.

* About the free tier: CosmicKeys uses a usage-based paywall algorithm that continuously monitors infrastructure burn rate across the platform's three GCP regions (US / EU / Asia) and adjusts thresholds dynamically. The descriptions above reflect the platform's current design philosophy and typical behavior at the time of writing; they are not a binding commercial guarantee. The specific limits, which features are metered, and the pricing tiers may change over time as operating costs evolve. CosmicKeys reserves the right to adjust tiers, limits, and feature availability at its discretion. For current commercial terms, see cosmickeys.app.

🔍 What Most Visitors Don't See

You've now seen the surface: the lessons, the analytics, the polish. When people visit cosmickeys.app they see a beautiful typing practice platform with animated finger guides. What they don't see is the production architecture underneath: a multi-region setup across three Cloud Run regions (US, EU, Asia), a voice narration pipeline that pre-generates audio lessons in seven languages (ElevenLabs TTS, baked at content-release time), a client-side analytics layer that captures every keystroke and persists it to both localStorage and the database, and a deterministic adaptive-lesson selector that scores curriculum candidates against each learner's stored error patterns.

🔑 The polish that isn't visible is where the engineering lives
Anyone can ship a typing app in a weekend with Next.js. The interesting part is shipping one that handles users in Stockholm, Berlin, and San Francisco with sub-100ms keystroke feedback, native-language voice narration for each of the seven supported languages (English US, English UK, German, Swiss German, Danish, Spanish, Norwegian, Swedish; 8 selector entries because English splits into US and UK), platform-managed within-region resilience (Cloud Run auto-recovery + Cloud SQL Regional HA), and deterministic adaptive lesson selection driven by per-finger latency + error-pattern analytics. Most of this work is invisible to end users, which is exactly how it's supposed to be.

πŸ—οΈSystem Overview

CosmicKeys Global Architecture
[Diagram: users worldwide reach two lanes. Frontend lane: Vercel Edge Network (anycast, single global deploy, vercel.json regions: ["iad1"]) serves the Next.js 16 frontend (static + ISR), which fetches from api.cosmickeys.app. Backend lane: GCP Global Load Balancer (anycast IP, health-check failover) fronts three Cloud Run services running FastAPI + Traffic Cop: US us-central1 (directory), EU europe-west1 (home + replica), Asia asia-southeast1 (home + replica); writes proxy to the home region. Each region has its own Cloud SQL Postgres 15 with REGIONAL HA. Google Gemini API is used for support translation only (gemini-2.0-flash, gemini-2.5-flash fallback), the deployed app's one external AI-related call. Deployment and secrets: Terraform → blue-green Cloud Run (git push → deploy --tag=green --no-traffic → smoke test → manual ./scripts/promote.sh, via .github/workflows/deploy-prod.yaml); GCP Secret Manager → Cloud Run env vars (auto via IAM) and Vercel env vars (manual sync) for db_password, auth_secret, stripe, stripe_webhook.]
🌐

Two Anycast Edges

Vercel Edge serves the Next.js frontend as a single global deploy (vercel.json regions: ["iad1"]) β€” Vercel's anycast network caches static assets worldwide. GCP Global Load Balancer fronts the API (api.cosmickeys.app) with a single anycast IP that routes to the healthiest of three regional Cloud Run backends.

⚡

Compute: Single Frontend, Three Backends

Frontend is one Next.js 16 build on Vercel; the global edge does the rest. Backend is three independent Cloud Run services (us-central1, europe-west1, asia-southeast1) running FastAPI + a Traffic Cop middleware that re-routes writes back to a user's home region when they roam.

💾

Data: Per-Region Cloud SQL

Each region has its own Cloud SQL Postgres 15 with REGIONAL HA (zonal failover within the region). US holds the directory source-of-truth; EU and Asia point to it for shared directory data via app-level lookups, not native cross-region replication. Support-ticket translations are written through a Gemini API call (gemini-2.0-flash primary, gemini-2.5-flash fallback; the deployed app's one external AI-related call) and persisted to Postgres.

🚦

Blue-Green via Cloud Run

Terraform manages the regional infrastructure; the GitHub Actions workflow at.github/workflows/deploy-prod.yaml deploys new revisions withgcloud run deploy --tag=green --no-traffic. After smoke tests pass,./scripts/promote.sh shifts traffic β€” manual gate by design, so a bad build never auto-replaces a healthy one across three regions at once.

🔐

GCP Secret Manager as Source-of-Truth

Secrets (DB password, auth secret, Stripe live + webhook) live in GCP Secret Manager. The Cloud Run service account gets secretAccessor via Terraform IAM; secrets are injected at deploy time, never in code or container images. Vercel-side secrets are intentionally a manual sync (the Terraform Vercel provider is left disabled in vercel_secrets.tf) so a credential rotation needs a human ack before the frontend picks it up.

🛰️ The Traffic Cop: Multi-Region Anycast Routing

This is the part most engineers wave off with “just put it behind Cloudflare.” That works until you have stateful services, database write consistency, and region-specific localization. I designed a deliberate routing strategy that handles all three.

Three Problems the Traffic Cop Solves

1. Latency: Route to the nearest region

For a real-time typing platform, every 50ms of round-trip latency is user-visible. Anycast routing at the edge sends each user to the geographically nearest data center. A user in Stockholm hits EU (europe-west1); a user in London hits EU; a user in Seattle hits US (us-central1). The Asia region (asia-southeast1, Singapore) primarily serves prospects in Asia evaluating the marketing page; typing content itself is in the seven supported European/US languages, not localized for Asian languages. Zero configuration, automatic.

2. Failover: GCP handles it within the region; we don't try to cross regions

The vast majority of failure modes resolve themselves inside the region: Cloud Run keeps a minimum of one instance running and automatically replaces a crashed instance (typically within seconds), and Cloud SQL runs with availability_type = REGIONAL, which gives automatic zonal failover from primary to standby in another zone. At the routing layer, GCP's Global Load Balancer continuously health-checks each region's backend and stops sending new traffic to one that's failing. The deliberate trade-off: data is region-locked. An EU user's account, progress, and lesson history live in EU Cloud SQL only. We do NOT spill EU users into US Cloud Run if EU goes fully down, because US has no copy of their data. That preserves GDPR data residency and write consistency; the cost is that a true whole-region outage would briefly degrade users in that region rather than seamlessly route them elsewhere. If a connection blip interrupts a user, localStorage queues recent keystrokes and flushes back to the home region when it's reachable again.

3. Localization: Serve region-appropriate content

Each region serves localized content for the seven supported languages: German, Swiss German, Danish, Spanish, Norwegian, and Swedish are served from EU; English (US) and English (UK) are served from US. The traffic cop attaches a region header that the backend uses for content selection. The localization is cache-friendly: each region's CDN caches its own language variants, not a global cache invalidated by any single update.
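The write re-routing rule from the Traffic Cop description can be sketched as a pure decision function. This is a hedged illustration only: the region keys, the `Decision` type, and the set of methods treated as writes are assumptions, and the real middleware runs inside FastAPI rather than as a standalone function. Reads of region-locked user data are outside the scope of this sketch.

```python
from dataclasses import dataclass

# Hypothetical short keys for the three regions named in the text.
REGIONS = ("us", "eu", "asia")

# Assumption: these HTTP methods count as writes.
WRITE_METHODS = {"POST", "PUT", "PATCH", "DELETE"}


@dataclass(frozen=True)
class Decision:
    action: str  # "serve_local" or "proxy"
    target: str  # region that should handle the request


def route(method: str, serving_region: str, home_region: str) -> Decision:
    """Decide whether the serving region handles a request or proxies it home."""
    if serving_region not in REGIONS or home_region not in REGIONS:
        raise ValueError("unknown region")
    is_write = method.upper() in WRITE_METHODS
    if is_write and serving_region != home_region:
        # A roaming user's write goes back to the region that owns their data.
        return Decision("proxy", home_region)
    return Decision("serve_local", serving_region)
```

For example, a user whose home region is EU and who is served by US while travelling would have a POST proxied to EU, while a GET for shared content is served locally.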

🌍7 Languages Γ— 9 Regions Localization

Localization is usually treated as an afterthought β€” a post-launch β€œinternationalization” project. I designed it as a first-class architectural concern from day one because retrofitting i18n into a typing platform (where literally every character on the screen has to localize) is a nightmare.

The Languages (8 selector entries Β· 7 base languages)

English (US)English (UK)GermanSwiss GermanDanishSpanishNorwegianSwedish

Source of truth: SUPPORTED_LANGUAGES in frontend/lib/languageDetection.ts. The 8 entries map to 7 base languages because English has separate US and UK variants (each with its own default keyboard layout: ANSI for US, ISO for UK).
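The 8-entries-to-7-languages mapping can be made concrete with a small sketch. This is a Python mirror for illustration only; the actual list lives in TypeScript in languageDetection.ts, and the locale codes, tuple shape, and base-language keys below are assumptions rather than the file's real contents.

```python
# Hypothetical mirror of the selector entries: (locale code, label, base language).
SELECTOR_ENTRIES = [
    ("en-US", "English (US)", "english"),
    ("en-GB", "English (UK)", "english"),
    ("de-DE", "German", "german"),
    ("de-CH", "Swiss German", "swiss-german"),
    ("da-DK", "Danish", "danish"),
    ("es-ES", "Spanish", "spanish"),
    ("nb-NO", "Norwegian", "norwegian"),
    ("sv-SE", "Swedish", "swedish"),
]

# Default layouts stated in the text for the two English variants.
DEFAULT_LAYOUT = {"en-US": "ANSI", "en-GB": "ISO"}


def base_languages(entries):
    """Return the distinct base languages, preserving first-seen order."""
    seen = []
    for _code, _label, base in entries:
        if base not in seen:
            seen.append(base)
    return seen
```

Counting `SELECTOR_ENTRIES` gives 8 and counting `base_languages(SELECTOR_ENTRIES)` gives 7, which is the distinction the section heading draws.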

Localization Scope β€” Beyond UI Strings

🔤

Character Sets

All seven supported languages are Latin-script, but each adds keys that don't exist on US ANSI: German and Swiss German umlauts plus the German ß (ä ö ü ß), Spanish ñ ¿ ¡, and Nordic æ ø å. Each needs its own keyboard layout JSON, its own neighbor-adjacency map for typo detection, and its own finger-assignment for the extra keys.

⌨️

Keyboard Layouts

QWERTY (US ANSI), QWERTY-ISO (UK), QWERTZ (German, Swiss German), plus Nordic ISO and Spanish ISO variants, across Mac and Windows. Finger animation guides must match the user's actual keyboard, and the ISO bottom-row extra key shifts every other key's finger assignment by one position.

📖

Lesson Content

Typing exercises, sample sentences, and progressive difficulty curves are all localized per language. German beginner lessons drill umlauts and ß (ä ö ü ß) before moving to longer words; Spanish beginner lessons drill ñ and accented vowels; Nordic lessons drill æ ø å. The beginner curves differ per language because the hard keys are different.

🗣️

Voice Narration

Every instructional prompt has a native-speaker audio variant. The user hears lesson guidance in their language, not English subtitles.

📏

Units and Formatting

Dates, numbers, and progress metrics are formatted per locale: DD.MM.YYYY in German/Swiss German, DD/MM/YYYY in UK English/Spanish/Nordic, M/D/YYYY in US English. Decimal separators (1,234.56 vs 1.234,56) and thousand separators differ across the seven supported locales.

🎨

Cultural Theming

Color palettes and imagery adapted where colors carry cultural meaning. Error states don't use red in cultures where red signals success.
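The date and number rules from the Units and Formatting card can be sketched as follows. This is a minimal illustration under stated assumptions: the locale keys and style names are hypothetical, the date patterns follow the three given in the text, and number formatting covers only the two separator styles the text shows (mapped here to US English and German as an assumption).

```python
from datetime import date

# Hypothetical locale keys mapped to the three date patterns named in the text.
DATE_STYLE = {
    "de": "dmy_dot", "de-CH": "dmy_dot",             # DD.MM.YYYY
    "en-GB": "dmy_slash", "es": "dmy_slash",         # DD/MM/YYYY
    "da": "dmy_slash", "nb": "dmy_slash", "sv": "dmy_slash",
    "en-US": "mdy_slash",                            # M/D/YYYY
}


def format_date(d: date, locale: str) -> str:
    """Format a date using the pattern assigned to the locale."""
    style = DATE_STYLE[locale]
    if style == "dmy_dot":
        return f"{d.day:02d}.{d.month:02d}.{d.year}"
    if style == "dmy_slash":
        return f"{d.day:02d}/{d.month:02d}/{d.year}"
    return f"{d.month}/{d.day}/{d.year}"  # no zero padding for M/D/YYYY


def format_number(value: float, locale: str) -> str:
    """Format with two decimals in one of the two separator styles shown."""
    text = f"{value:,.2f}"  # e.g. 1,234.56
    if locale == "de":
        # Swap the separators to get 1.234,56.
        return text.translate(str.maketrans(",.", ".,"))
    return text
```

In the browser the same job is usually delegated to the platform's locale APIs; the sketch just makes the per-locale differences explicit.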

🎙️ Voice Narration Pipeline

One of the features that gets zero credit from first-time visitors is the voice narration system. Every instructional prompt (“place your left index finger on F, your right index on J”) has a professionally generated audio variant in every supported language.

Voice Generation Pipeline
Lesson Content
source material, English
canonical instructional prompts
Localization Translation
LLM-assisted + human review
produces per-language text variants
TTS Pipeline
per-locale voice model
native-speaker audio for each language
Audio Assets
CDN-cached per region
stored close to listeners, not at origin
Lazy-loaded at Lesson Start
downloaded on demand
minimal bandwidth for the 99% of lessons a user never opens
💡 Why the audio is per-region, not per-user
Each region's CDN caches its own audio variants for the seven supported languages, so a user in Madrid downloads Spanish audio from the EU CDN, a user in Stockholm downloads Swedish audio from the same EU CDN, and a user in New York downloads English (US) audio from the US CDN, not from an origin halfway around the world. This is one of the invisible wins: the user hears instant voice feedback because the audio was already cached 50ms away from them, not 200ms away.

📡 Analytics: Captured Locally, Persisted to the Cloud

Every keystroke is captured in the browser. WPM, accuracy, error patterns, and per-key heatmaps are all computed client-side (instantly, with zero network round-trip) and rendered to the UI as you type. The same data is persisted twice: in localStorage for session resilience, and in the backend database for cross-device access and long-term analytics.

🔑 Why local-first, not server-streamed
Typing feedback can't wait for the network. A 100ms server round-trip between hitting a key and seeing the WPM tick up would break the rhythm and make the platform feel laggy. Computing analytics client-side means the user sees their progress at native keyboard speed, with no server in the loop for feedback. The server's job is persistence and historical aggregation, not real-time delivery.

The Capture & Persistence Pipeline

Per-Session Analytics Flow
Keystroke Event
browser-captured
character pressed + high-precision timestamp
Client-Side Compute
WPM / accuracy / heatmap
updated in memory and rendered to the UI instantly
localStorage Write
periodic checkpoint
session state survives refresh and brief disconnects
Batched DB Sync
POST to FastAPI on session events
session completion, milestones, error patterns persisted to PostgreSQL
Cross-Device Read
dashboards query the DB
analytics history available on any device the user signs into
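The client-side compute step in the flow above can be sketched in a few lines. The production version runs in the browser; this Python sketch only illustrates the arithmetic. The `Keystroke` shape is hypothetical, and the five-characters-per-word convention and the choice to count only correct keystrokes toward WPM are assumptions, not confirmed details of CosmicKeys.

```python
from collections import Counter
from dataclasses import dataclass

CHARS_PER_WORD = 5  # common typing-test convention, assumed here


@dataclass(frozen=True)
class Keystroke:
    expected: str  # character the lesson expected
    correct: bool  # whether the typed key matched
    t_ms: float    # timestamp in milliseconds


def session_stats(keys):
    """Compute WPM, accuracy, and per-character error counts for a session."""
    if not keys:
        return {"wpm": 0.0, "accuracy": 0.0, "errors_by_char": {}}
    correct = sum(1 for k in keys if k.correct)
    elapsed_min = (keys[-1].t_ms - keys[0].t_ms) / 60_000
    # Guard against a single keystroke or identical timestamps.
    wpm = (correct / CHARS_PER_WORD) / elapsed_min if elapsed_min > 0 else 0.0
    errors = Counter(k.expected for k in keys if not k.correct)
    return {
        "wpm": wpm,
        "accuracy": correct / len(keys),
        "errors_by_char": dict(errors),
    }
```

Because everything is derived from the in-memory keystroke list, the numbers can be recomputed on every key press without touching the network, and only the aggregate needs to be persisted.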

What Gets Captured and Persisted

⌨️

Keystroke Events

Character pressed, timing delta, accuracy flag. The atomic unit captured in-memory, then persisted as aggregated session data.

⏱️

Words Per Minute

Rolling average computed client-side and ticked into the UI on every keystroke β€” no server round-trip.

🎯

Error Patterns

Which keys the user mistypes most often (errors_by_char, slowest_keys, rhythm). Aggregated client-side, persisted to lesson_attempts on lesson completion, and consumed by the deterministic adaptive lesson selector.

🔥

Session Heatmaps

Finger-level accuracy heatmap driving the visual feedback (glow intensity per finger). Computed live, snapshotted to the DB at lesson end.

📈

Progress Milestones

Lesson completions, new speed records, language milestones: fire-and-forget POSTs to the notification service.

💾

Resilient by Default

localStorage is the safety net. Lose the network mid-lesson, your session continues; analytics flush to DB when you reconnect.
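The safety-net behaviour described above (queue while offline, flush on reconnect) can be sketched as a small ordered queue. This is an illustration in Python, not the browser code: the class name, the `send` callback contract, and the size bound are hypothetical, and in the real app the pending items would live in localStorage rather than in memory.

```python
from collections import deque


class OfflineQueue:
    """Hold payloads while the backend is unreachable and flush them in order."""

    def __init__(self, send, max_items=1000):
        # send(payload) returns True on success; it may raise ConnectionError.
        self._send = send
        # Bounded queue: when full, the oldest pending item is dropped.
        self._pending = deque(maxlen=max_items)

    def __len__(self):
        return len(self._pending)

    def submit(self, payload):
        """Queue a payload and try to deliver everything pending."""
        self._pending.append(payload)
        return self.flush()

    def flush(self):
        """Send pending payloads oldest-first; stop at the first failure."""
        sent = 0
        while self._pending:
            try:
                ok = self._send(self._pending[0])
            except ConnectionError:
                ok = False
            if not ok:
                break  # keep the item for the next attempt
            self._pending.popleft()
            sent += 1
        return sent
```

Stopping at the first failure preserves ordering, so session analytics reach the database in the sequence they were produced.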

🎯Adaptive Lesson Selection β€” Deterministic, Not LLM

Up front: there is no runtime call in this loop. The deployed app does not make any inference calls for lesson recommendation β€” the β€œadaptive” behavior is deterministic code consuming rich analytics that are computed in the browser and persisted to the lesson_attemptstable at lesson completion. The intelligence lives in the analytics design and the curriculum, not in an inference call.

(Claude is part of the dev-time workflow β€” Claude Code as Sam's primary architect and code-generation agent in the pattern (Claude Code as primary + and Codex as reviewers) β€” but none of these are invoked at runtime by the deployed CosmicKeys app. The deployed app's one external AI-related call is to for support-ticket translation only; details in the support flow above.)

Adaptive Lesson Selection Loop (no LLM)
Lesson Result (in browser)
key_stream_json captured in-memory
every keystroke event captured with timing delta and accuracy flag
Compute Analytics (in browser)
rhythm, errors_by_char, slowest_keys, avg_keystroke, per-finger latency
computed client-side at lesson completion; no round-trip needed for live WPM/accuracy
Persist to lesson_attempts
POST /api/progress/attempts
aggregated analytics stored to the relational table; best scores upserted to lesson_progress
Deterministic Scoring
curriculum pool × user analytics
next lesson selected by scoring candidates against stored error patterns: algorithmic, repeatable, no LLM inference
Recommended Next Lesson
localized + pre-generated voice narration
served from the curriculum pool; difficulty matched to the user's current gaps
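The deterministic scoring step in the loop above can be sketched as a weighted count of the user's weak keys in each candidate lesson. This is an assumption-laden illustration, not the shipped selector: the scoring formula, the `SLOW_KEY_BONUS` weight, and the tie-break rule are hypothetical; only the inputs (`errors_by_char`, `slowest_keys`) are named in the text.

```python
SLOW_KEY_BONUS = 1.0  # hypothetical extra weight for keys the user types slowly


def score_lesson(text, errors_by_char, slowest_keys):
    """Average weakness weight per character: higher means more targeted practice."""
    if not text:
        return 0.0
    slow = set(slowest_keys)
    total = 0.0
    for ch in text.lower():
        total += errors_by_char.get(ch, 0)
        if ch in slow:
            total += SLOW_KEY_BONUS
    return total / len(text)


def pick_next(candidates, errors_by_char, slowest_keys):
    """Pick the best-scoring lesson id; ties resolve to the smallest id."""
    # Iterating ids in sorted order makes the result repeatable for equal scores.
    return max(
        sorted(candidates),
        key=lambda lesson_id: score_lesson(
            candidates[lesson_id], errors_by_char, slowest_keys
        ),
    )
```

The same inputs always produce the same recommendation, which is the property the text contrasts with an inference call.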
💡 Why deterministic, not LLM?
For a per-keystroke learning loop, an LLM call would add 200–500ms of latency, cost per session, and non-determinism, none of which this product needs. The hard part of typing-skill personalization isn't natural language understanding; it's knowing which analytics matter (per-finger latency, rhythm coefficient-of-variation, neighbor-key confusion) and designing a curriculum that can be scored against them. That's a curriculum-design problem and an analytics-design problem, not an LLM problem. The architecture leaves room to plug an LLM scorer in later if signal calls for it, but shipping today without one is the right call, not a missing piece.
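One of the analytics named above, rhythm coefficient of variation, can be sketched directly. The coefficient of variation is the standard deviation of inter-keystroke intervals divided by their mean. Comparing it against the user's own baseline, rather than a global constant, follows the calibration idea described for the Domain Expert hat; the `tolerance` factor and function names here are hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(intervals_ms):
    """Standard deviation divided by mean of inter-keystroke intervals."""
    if len(intervals_ms) < 2:
        return 0.0
    m = mean(intervals_ms)
    return pstdev(intervals_ms) / m if m > 0 else 0.0


def is_stutter(session_intervals_ms, baseline_cv, tolerance=1.5):
    """Flag a session whose rhythm is notably less even than the user's own norm.

    tolerance is a hypothetical multiplier on the user's baseline CV.
    """
    return coefficient_of_variation(session_intervals_ms) > baseline_cv * tolerance
```

A steady typist with intervals near 100 ms has a low CV, so a session alternating between 60 ms and 300 ms gaps stands out against that personal baseline even if its average speed looks similar.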

🛡️ Failover and Resilience

Most of failover on CosmicKeys is “GCP handles it within the region.” The interesting part (the part that is a real design decision rather than a vendor-provided default) is the one place we deliberately chose data integrity over cross-region failover. Here's what each layer does when something breaks.

♻️

Within-Region Auto-Recovery (Platform-Managed)

Each region's Cloud Run is configured with a minimum instance count ≥ 1, so there is always at least one warm instance ready. If an instance crashes, GCP automatically spins up a replacement, typically within seconds, and the failed instance is removed from the load-balancer pool. We do not write any of this logic; it is the Cloud Run platform.

🗄️

Cloud SQL Regional HA β€” Automatic Zonal Failover

Each region's Postgres 15 runs with availability_type=REGIONAL (configured in terraform/modules/cosmic_region/main.tf for the production environment). That gives automatic zonal failover from primary to a synchronous standby in a different zone within the same region. Zero data loss, no manual intervention, no application code involvement.

🩺

GCP Global Load Balancer Health Checks

GCP's Global Load Balancer continuously health-checks each region's backend and stops sending new requests to a failing one. This happens at the anycast routing layer; it is built into GCP, not custom code. New requests from users near the failed region naturally route to the next-healthy region; the in-app Traffic Cop middleware then proxies writes back to the user's home region for write consistency.

🔒

Region-Locked Data β€” The Deliberate Trade-Off

Data does NOT replicate across regions. An EU user's account, progress, and typing history live in EU Cloud SQL only. If the entire EU region went fully down (rare; GCP region-wide outages are infrequent), the EU user would be briefly degraded rather than seamlessly served from US Cloud Run, because US has no copy of their data. We chose this for GDPR data residency and write consistency; the cost is one failure mode we accept rather than engineer around.

💾

Client-Side Resilience (Connection Blips)

localStorage caches the current lesson and recent keystrokes locally. A brief connection loss does not interrupt the typing session: the user keeps typing, and analytics keep computing in the browser. When the home region is reachable again, queued writes flush to that region (not to whichever region the network happens to route to at flush time).
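
The queue-and-flush behavior can be sketched as follows. This is an illustration only: the real client keeps its queue in localStorage in the browser, whereas this Python version holds it in memory, and `HomeRegionQueue` and the injected `send` callable are hypothetical names.

```python
from collections import deque
from typing import Callable


class HomeRegionQueue:
    """Buffers writes while offline and replays them, in order,
    to the user's home region only (hypothetical helper)."""

    def __init__(self, home_region: str,
                 send: Callable[[str, dict], bool]) -> None:
        self.home_region = home_region
        self._send = send  # send(region, payload) returns True on success
        self._pending: deque[dict] = deque()

    def enqueue(self, payload: dict) -> None:
        self._pending.append(payload)

    def flush(self) -> int:
        """Send queued writes until one fails; return how many were sent."""
        sent = 0
        while self._pending:
            # Always target the home region, never the nearest one.
            if not self._send(self.home_region, self._pending[0]):
                break  # still unreachable: keep the rest queued
            self._pending.popleft()
            sent += 1
        return sent

    def __len__(self) -> int:
        return len(self._pending)
```

A payload is removed only after a successful send, so a flush that fails partway leaves the remaining writes queued in their original order for the next attempt.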

🚪

Cross-Region Reads for Directory Data Only

A small set of shared, read-mostly data (lesson catalog, plan definitions, global content metadata) lives in the US Cloud SQL as the directory source-of-truth; EU and Asia look it up via app-level cross-region reads. User-owned data (progress, account, history) never crosses regions in either direction.

⚙️ Other Engineering Decisions Worth Noting

A handful of architectural choices that distinguish “built a thing” from “built a thing the right way”: the kind of decisions that don't show up in a feature list but do show up in a postmortem when the wrong one was made.

Blue-Green with Manual Promotion Gate

Green revisions deploy to all 3 Cloud Run regions with 0% traffic. Testing happens against the green endpoint directly. Only after explicit verification does promote.sh flip traffic. Auto-deploy is convenient until it ships a breaking change at 3am.

GeoIP Embedded in the Docker Image

MaxMind GeoLite2-Country is baked into the Docker image at CI build time. Cold starts don't hit MaxMind's download servers. Zero runtime dependency on a third party. Refreshed on every build for current data.

Workload Identity Federation · Zero Long-Lived Credentials

GitHub Actions authenticates to GCP via OIDC tokens. No service account keys exist anywhere. Secret Manager access is IAM-scoped per region: EU Cloud Run can only read EU database passwords.

JSONB-First Analytics · Single Document Per User

Speed-test analytics live as JSONB documents (not normalized tables). Reading a user's full history is one document fetch. GIN-indexed for fast queries against specific keys. Schema-flexible: adding fields doesn't require a migration.

GDPR Audit Trail · The Real Articles

Every consent action is stored with IP, user agent, timestamp, and document version. Article 17 (Erasure) with a 30-day window + anti-abuse guard for delete → restore. Article 20 (Portability) ZIP export of all user data. Demonstrable consent for any document version, which is what an auditor needs.
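
As a rough sketch of the 30-day erasure window, the check below reads the window as a period during which a deletion request can still be restored, after which the data is purged. That reading, the state names, and `erasure_state` are assumptions; the anti-abuse guard is not modeled because its rules are not described on this page.

```python
from datetime import datetime, timedelta

ERASURE_WINDOW = timedelta(days=30)  # window length stated in the text


def erasure_state(requested_at: datetime, now: datetime) -> str:
    """Return 'restorable' inside the window, 'purge' once it has elapsed."""
    if now < requested_at:
        raise ValueError("now is earlier than the deletion request")
    return "restorable" if now - requested_at < ERASURE_WINDOW else "purge"
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the rule easy to test at the window boundary.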

XOR-Encrypted localStorage for Guest Users

Guest progress is XOR-encrypted + base64-encoded before localStorage write. Not bulletproof against determined attackers (key is in minified JS), but raises the bar against casual DevTools tampering. The right tradeoff for the “try without signup” zero-friction goal.
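
The XOR + base64 round trip can be sketched in a few lines. This is an illustration in Python rather than the browser code that actually ships, and the function names and the sample key are hypothetical. As noted above, a repeating-key XOR with a key shipped to the client is obfuscation, not strong encryption.

```python
import base64
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each byte with the repeating key; applying it twice restores data."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def encode_progress(plaintext: str, key: bytes) -> str:
    """Obfuscate a JSON string for storage: XOR, then base64."""
    scrambled = xor_bytes(plaintext.encode("utf-8"), key)
    return base64.b64encode(scrambled).decode("ascii")


def decode_progress(stored: str, key: bytes) -> str:
    """Reverse encode_progress: base64-decode, then XOR with the same key."""
    return xor_bytes(base64.b64decode(stored), key).decode("utf-8")
```

The base64 step matters because XOR output is arbitrary bytes, and localStorage stores strings; encoding keeps the stored value safe to write and read back unchanged.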

🔧 The Tech Stack

Frontend

  • Next.js 15 (App Router) + React 19
  • TypeScript strict mode
  • Tailwind CSS v4 for styling
  • Framer Motion for animations
  • localStorage + Fetch for client-side analytics capture and DB sync

Backend

  • FastAPI (Python 3.11) for REST endpoints, including the batched analytics-sync API
  • SQLAlchemy ORM + async Postgres driver
  • Pydantic for request/response validation
  • OAuth2 (Google + GitHub) for authentication; httpOnly + secure + SameSite session cookies; magic-link sign-in via Resend
  • Stripe for payments: Stripe Checkout for the hosted payment flow (card data never touches the backend, keeping PCI scope minimal), Stripe webhooks for subscription lifecycle events, and Stripe Customer Portal for self-serve invoice download. Multi-currency pricing driven by MaxMind GeoIP detection (USD / GBP / EUR / CHF / DKK / SEK / NOK).
  • API call for support-ticket translation only (-2.0-flash primary, -2.5-flash fallback): translate-on-write, both versions stored, no per-view cost. This is the deployed app's only external AI-related API call, treated like any other third-party service call alongside Stripe, OAuth, and Resend. There is no LLM SDK plumbing built into the product, no per-user inference, no LLM in the user data path. All other AI usage on this project (code generation, the code-review process, architecture conversations) happens at development time via Claude Code, not at runtime.

Data

  • PostgreSQL 15 on GCP Cloud SQL: per-region, availability_type=REGIONAL for zonal failover; no cross-region replication of user data. US holds the directory source-of-truth (lesson catalog, plan definitions); EU and Asia look it up via app-level reads.
  • localStorage for offline-first analytics capture; queued writes flush to the user's home region on reconnect
  • CDN cache for localized audio assets per region (ElevenLabs voices pre-generated at content-release time, not per-user runtime)
  • No Redis: in-process caches only; the codebase has a comment noting Redis “should be replaced” for multi-instance (future work, not shipped).

Infrastructure

  • Vercel Edge Network: anycast frontend (single global deploy, vercel.json regions:["iad1"])
  • GCP Global Load Balancer: anycast IP for backend API; health-checks each region's backend
  • Cloud Run across 3 regions (us-central1, europe-west1, asia-southeast1); min instances ≥ 1 for warm capacity + auto-replacement of crashed instances
  • Terraform for all GCP infra; blue-green deploys via --tag=green --no-traffic + manual promote.sh
  • GCP Secret Manager for secrets (auto-injected to Cloud Run via IAM; manual sync to Vercel by design)
  • Workload Identity Federation: GitHub Actions → GCP via OIDC tokens; zero long-lived service-account keys
  • MaxMind GeoLite2 embedded in the backend Docker image at build time (no runtime download)
🎯

Leadership Takeaway

A field note on building with AI as a partner: the question in 2026 isn't whether you personally have hands-on expertise in every domain a modern product touches. No one does. The question is whether you have the discipline to wield AI as a force multiplier: formulating the right question, leading the dance, evaluating AI's answer against a clear product vision, and making the final judgment call when the dance produces options. That's the meta-skill that turns one curious builder (engineer, PM, designer, QA lead, founder, whoever) into a multi-role product team.

What the dance demands, in any role:

  • Architectural taste under tradeoffs: you can articulate a personal decision between local-first vs server-authoritative, multi-region with data residency vs single-region, manual promotion vs auto-deploy. AI cannot make these calls for you; humans must, and the reasoning is the signal.
  • Domain judgment at the abstract level: for areas where you're not a personal expert (GDPR Articles, accessibility standards, animation physics, voice production), you have an AI-augmented workflow that produces production-grade results. The 🤝 hats matter as much as the 🎯 ones.
  • Workflow discipline: a documented choreography (human leads → AI executes → multi-AI review → human triages → ship). Not vibe-coding without a rigorous loop. You know the specific failure modes you've hit and how you encoded the lessons so they don't repeat.
  • Reusable infrastructure built for AI-native work: custom skills / tools / agents that scale beyond a single project. A 5-person team used to need a DevOps engineer + DBA + QA lead. The AI-native equivalent is a portfolio of automations that absorbs those roles into the builder's workflow.
  • Shipped proof at meaningful scope: production deployments to real users, not demo projects. Multi-region, multi-locale, with regulatory compliance and observability, the things that separate “built a thing” from “built a thing the right way.”

Why this scales beyond a solo build: every architectural component on this page (the traffic cop middleware, the localization pipeline, the pre-generated voice catalog, the client-side analytics, the deterministic adaptive lesson selection, the dev-time multi-AI review process) is a reusable pattern that scales from solo build to platform team. Multi-region consumer infrastructure is expensive to build once and cheap to replicate. The same engineering pattern that produces a solo build in six months produces a team multiplier when scaled to a five-person platform team serving multiple products. The opposite isn't true: a team of five vibe-coders does not become an AI-native engineering organization just by adding Claude licenses. The discipline has to live in the workflow, not the tooling.

CosmicKeys is the worked example. Each section above shows what evidence looks like for one item on the list: the 20 Hats show role-coverage discipline; the AI-Partner Workflow shows the choreography; the Engineering Decisions show architectural taste; the Platform Tour shows shipped proof; the Architecture sections show the systems underneath; the freemium economics show product judgment beyond technical depth. If you're learning the dance, this is what the receipts look like at production scale. If you're evaluating others who claim to do this kind of work, this is the bar to compare them against.