Multi-region GCP. 7 languages. 14 voices. 30 keyboard layouts. Stripe payments + GDPR. Blue-green deploys. 155,000 lines of production code across 1,237 commits in 6 months. The test bed was chosen to stress every dimension of production engineering at consumer scale β and to answer one question: what can an AI coding partner actually ship, end-to-end, under one experienced engineer's direction? Live at cosmickeys.app.
After wrapping a long corporate engineering engagement, the question worth answering was: what can an AI coding partner actually ship in production, end-to-end, under realistic engineering discipline? Not in a demo. Not in a tutorial. In a real consumer product with real users, real payments, real regional infrastructure, real failure modes.
CosmicKeys is the experiment's test bed. The page below is the result.
CosmicKeys is an agentic-coding-only project β and it's the FIRST product in a sequence. The story below is about how one experienced engineer used Claude Code as a coding partner (plus a dev-time 3-AI code review process: Claude + Gemini + Codex) to ship a production-grade consumer product solo in 6 months.
Why the explicit boundary: multi-agent orchestration and runtime LLM integration as product features came after CosmicKeys, with my next project WatchAlgo (which auto-generates solutions for user-submitted problems via a multi-agent pipeline). That's a different story for a different page. CosmicKeys is the agentic-coding success story that came first and made the rest possible.
For the experiment to mean anything, the product had to satisfy four constraints simultaneously:
Anyone with a computer should be able to use it without domain expertise β no field-specific learning curve, no regional cultural barrier. Theoretical user base in the billions.
Multi-language (with real cultural adaptation, not translated strings). Real-time analytics. Payments and subscription lifecycle. Multi-region deployment with data residency. Accessibility. Authentication. Observability. Support tooling. If the AI partner couldn't ship cleanly in any one of those dimensions, the brief would have failed there.
The design has to handle 1 user OR 2 billion users without rewriting the foundation. Otherwise the experiment is testing the AI on a toy.
The AI partner has to ship something real humans actually use, not internal tooling that papers over rough edges. Real users force real polish.
Touch typing fits all four criteria precisely. It's universal β every computer user types every day, in every language, in every profession. It's multi-language not as an afterthought but as the core product (typing in German is a different motion from typing in English; typing on a Swedish keyboard exercises different muscles than on an American one). It needs payments to be sustainable without ads. It needs to scale from 1 to billions because typing won't go away. And it's consumer-facing in the most demanding way: typists feel latency, animation jitter, and audio drift in milliseconds because their own hands are providing the input ground truth.
Most consumer products would test the AI partner on one or two dimensions. Typing β done seriously β tests it on all of them. The rest of this page is the answer to whether the AI partner could actually ship under that brief.
Let's give the dance a name: the Tango Development Pattern, or TDP if it catches on (and if it doesn't, you still got a useful mental model out of the bargain β coined here, free to take). Here's the bet behind the term: most engineers building with AI today are already doing some version of this dance, knowingly or unknowingly. The ones who get fluent in the choreography are the ones whose AI partner ships production-grade work; the ones who don't get fluent end up with impressive-but-fragile demos that fall apart at the edges.
The trick isn't extracting more out of AI. The trick is knowing when to lead (product vision, judgment calls, rejecting a βgood enoughβ answer that isn't really good, drawing the line on what NOT to build) and when to let AI lead (implementation depth across 20 specialties, breadth-of-knowledge moves you couldn't do alone in six months even if you tried). Learn the steps and the lead-swap, and you can ship pretty much any consumer product under a brief like this one. CosmicKeys is the worked example; TDP is the technique.
For 25 years pre-AI, a consumer product of this scope would have required a team β product manager, designer, frontend, backend, DevOps, DBA, QA, security, localization, content, growth. A team of 5 to 8 people, working for 12 to 18 months, optimistically.
Under the brief, one experienced engineer plus one AI coding partner shipped it in 6 months β wearing 20 professional hats simultaneously, some carrying 20+ years of deep personal expertise (π―), others where the human knew abstractly enough to lead the dance while AI brought the implementation depth (π€).
The real superpower anyone building with AI as a partner needs isn't expertise in 20 domains. It's the ability to formulate the right question, evaluate AI's answer against the product vision, and make the final judgment call. That's the meta-skill that makes the tango work β and it's the skill that matters most in any role where AI is a force multiplier.
Neither the human alone nor the AI alone could have shipped CosmicKeys. It took both. Dancing.
The touch-typing category is a graveyard of clumsy interfaces glued together with banner ads, gated at $9.99/month for features that should be free, and almost never localized beyond a translated UI string. The market accepts mediocrity because typing apps are seen as commodity software β pick one, suffer the ads, move on.
CosmicKeys is the counter-example the brief produced. When the engineering discipline says βtreat architecture, UX, and rich functionality as non-negotiable from day one β no trade-offs against each other,β the result looks structurally different from what the category typically ships.
Not in the lessons, not on the dashboard, not at session end. Typing practice depends on focus; ads break it.
7 languages Γ 9 regions. Native-speaker voice narration per locale. Region-specific keyboard layouts. Not a translated UI string β a fully localized product.
Every screen is hand-designed. Animations match your actual keyboard. Subscription management is one click, not a dark pattern.
Each completed lesson stores rich analytics (errors_by_char, slowest_keys, rhythm, per-finger latency) to the lesson_attempts table. The adaptive lesson recommender deterministically scores curriculum candidates against these analytics and surfaces the next lesson that targets your current weak keys. No runtime LLM call β the intelligence lives in the analytics design and the curriculum, not in an inference call per session.
Multi-region backend, client-side analytics pipeline, deterministic adaptive lesson selection, blue-green deploys to 3 Cloud Run regions β all shipped by one engineer in months, not years.
Core features free forever. ElevenLabs TTS voices are pre-generated at content-release time (not a per-user cost) and cached in the CDN. Subscription tiers exist for higher usage limits and advanced features, not for gating core typing practice.
The first foundational decision after the brief was the visual identity, deliberately spec'd before any component code was written. The first surface a visitor sees isn't a screen β it's a URL. The product had to live at a domain that hinted at the theme before the user landed: cosmickeys.app, picked deliberately over generic options like typing.io or typingpractice.com because the cosmic frame was the product's differentiator from the category's sea of bland white-and-blue UIs.
The visual identity reduces to three primitives that show up on every surface across the 196 production components (counted as .tsx files across frontend/components, contexts, hooks, and the App Router routes β verifiable by anyone with the repo):
Deep slate background, cyan-and-purple nebula gradients, dark surfaces with subtle light (the swirls and starlight you see inside this card are the cosmic palette running live). Picked because typing apps are used for long focused sessions; dark reduces eye strain and the cosmic palette signals depth where the category typically signals utilitarianism.
Its edge glows as if catching starlight (the soft purple aurora pulsing behind it on this card is exactly the effect the logo reads as in context); the cosmic radial gradient lives inside the keycap itself. Reads at every size from the 16px favicon to the 1200px share-card watermark β a single mark the eye recognizes whether it's a browser tab or a social preview.
A translucent surface with a soft glowing light bleeding from its edges (like this card itself β the green halo around its rim is the prismatic-card primitive rendered at full strength), used everywhere from lesson tiles to analytics widgets to social-share previews. When you see the prismatic glow in a screenshot, you're looking at a CosmicKeys artifact.
The style guide spec'd these decisions explicitly up front, so the AI coding partner had a constrained visual palette to execute against rather than improvising component-by-component. Every screen across 196 production components reads as part of the same family because the AI partner had rails to run on. Without the up-front spec, the same AI partner would have shipped a dozen subtly-inconsistent variations of the same card pattern β coherence is what disciplined direction produces, not what AI volunteers.
With the brief defined and the visual identity locked, the third foundational decision is the technology stack. Each choice was made to optimize for one of two things: ship velocity (so the AI partner could move fast without re-deriving common patterns) or response latency (because typing apps live or die on input-to-paint time). The choices below aren't name-brand defaults; each was the answer to a specific question this product asked.
/en, /de, /es, β¦)Typing is the most latency-sensitive interaction in any application. A user pressing J expects the J to appear instantly β anything over 50ms feels broken. So CosmicKeys uses a local-first architecture: every typing-session interaction (keystroke registration, WPM calculation, accuracy tracking) happens client-side first, with the result rendered immediately. Data is then batched and synced to the backend in the background, encrypted in localStorage during the gap.
The principle isn't βmake the network fasterβ β it's βremove the network from the critical path.β That's what makes the app feel native rather than web-app-y. The 9 sections of the Platform Tour below each apply this principle wherever input latency matters; the design-system and latency-principle callouts inside each section flag the specific instances.
Whether you're curious how the product looks, evaluating the work without creating an account, or studying the build for your own AI-native project β the whole platform is captured below. Click any section header to expand it, then click any screenshot inside to open it at full size in a modal viewer with horizontal and vertical scroll for the larger images.
37 screenshots across 9 feature areas. Skim the section headings, expand the ones that matter to you, decide whether the architecture deep-dive below is worth your time.
This is shipping at v1, and there are things an attentive reader (or a real user) might notice that I already know about and have made deliberate triage calls on:
None of these are killer bugs. Nothing crashes the typing experience. Payments work. Within-region failover (Cloud Run auto-recover + Cloud SQL Regional HA + GLB health checks) works as designed. The analytics pipeline is honest. But if you spot something that feels off, I almost certainly already know about it and have made a deliberate call. That's part of what shipping a v1 looks like under the brief β knowing where the rough edges are, leaving the ones real users don't actually notice, and waiting for evidence before re-prioritizing.
The landing page is the first surface where the Cosmic Identity and the Tech Stack established above meet a real user. The visual frame is the prismatic-card pattern; the input latency feel comes from local-first state. Both are intentional. What the user actually does on this page is two things: pick a language, then choose how to sign in.
The landing page auto-detects the user's preferred language from their computer settings and uses it as the default β about 99% of visitors never touch the dropdown because the system-detected language is correct. The dropdown is there for the other 1%: users who want to switch to one of the other 7 selector entries (the dropdown shows 8 in total because English splits into US and UK, each with its own default keyboard layout). The chosen language is then tied to the account going forward β typing-history analytics are scoped per-language, so switching languages mid-account is intentionally awkward (use a separate account if you actively practice multiple).
Sign-in offers 5 paths so the user picks the path of least friction for them: 3 OAuth providers (Google, GitHub, Microsoft β no userid/password to remember, nothing for CosmicKeys to store or leak), Magic Link via Resend (one-tap sign-in via email link, no password ever set), and a fully-functional Guest mode where the user practices with zero account at all β their entire history lives in XOR-encrypted localStorage and migrates atomically to a real account on sign-up if they decide to stay.
navigator.language, override-able), then 5 sign-in paths β 3 OAuth providers + Magic Link via Resend + a fully-functional Guest mode where the user's entire practice history lives in XOR-encrypted localStorage. Decided that the language choice should pre-configure not just UI strings but the voice family AND the keyboard layout (matched to OS + region detected by the pre-signin endpoint), so the first lesson loads in the correct visual + audio context with zero configuration wizard. Account-bound language means a German user who signs up in German sees German voice, German keyboard, German lesson content β consistency per learner, not per device.BROWSER_LANGUAGE_MAP detection layer with fallback through navigator.languages array, and the pre-signin endpoint that detects OS, keyboard type (ANSI/ISO), and language in parallel.The first 5 seconds on the landing page decide whether a visitor stays or bounces. CosmicKeys leads with 7 first-class languages as a top-of-page commitment rather than a buried settings toggle β that signals localization is baked in, not bolted on. The cosmic dark-space-with-cyan-and-purple-nebula palette signals visual confidence on a category dominated by sterile white-with-blue-accent typing apps, and the typing brand promise is implicit because nothing else is fighting for attention.
navigator.language β BROWSER_LANGUAGE_MAP β SUPPORTED_LANGUAGES[code] with a fallback through navigator.languages array; the user's best-match language is visually highlighted so they know the app already heard them.The biggest adoption blocker in the typing-app category is the sign-up wall β most users want to try the product before committing an account. CosmicKeys offers 5 sign-in paths so each user picks the path of least friction for them, and crucially, Guest mode lets them practice with zero account at all, their progress kept in XOR-encrypted localStorage and migratable to a real account later if they decide to stay. The signal to the user is: we're confident enough in the product that we don't need to trap you behind a form.
Every screenshot above. Every architectural decision below. Every line of those 155,000 lines. Producing a platform of this scope solo requires wearing 20 professional hats simultaneously. Each one shows up in the codebase as concrete artifacts β components, modules, infrastructure, tests, lessons, voices, legal pages. Here are the hats this particular build needed.
But not every hat represents twenty years of personal expertise β and that's the point of the tango. Some hats can only be worn by someone with deep personal experience (π―). Others require knowing the domain abstractly enough to lead the dance while AI brings the implementation depth (π€ β the human provides the goal and the judgment, AI provides the domain depth). The mix below is what one specific build looked like; the mix for a different product would be different.
Sam decided what NOT to build (no gamification, no leaderboards-for-leaderboards) and designed the guest β registered β subscriber state machine. AI executed; Sam iterated 3 rounds until the anti-abuse guard for delete β restore cycles was in.
Sam named CosmicKeys, defined the cosmic theme, picked the dark-space-with-cyan-and-purple-nebula palette before a single line of code. AI followed the style guide; Sam rejected "generic dark mode" components until they actually felt cosmic.
Sam designed the analytics tab order (moved Fingers to position 2 because itβs the most actionable insight). AI built 196 components to spec; Sam redesigned the SVG hit-targets to work cleanly on touch screens β prospects often evaluate the product on a phone before committing to a typing session on their laptop β and added the color-blind alternatives.
Sam: "pinky rotates more than index β anatomically accurate." AI: Framer Motion code. Sam tested across all 30 keyboard layouts and iterated until every ISO variantβs bottom row animated correctly.
Sam chose local-first sync (typing is latency-sensitive) and timestamp-based conflict resolution. AI built optimisticSync.ts. Gemini partner-review caught a race condition in the background sync retry.
Sam knew per-finger latency matters more than aggregate WPM, and that rhythm is more diagnostic than speed. Specified that stutter detection be calibrated against each user's own typing variance (coefficient of variation against their own baseline) rather than a global constant. AI doesnβt understand motor-skill acquisition β those choices come from domain knowledge.
Sam designed the 3-region Terraform topology and switched AIβs default from service-account keys to Workload Identity Federation (zero long-lived credentials). Added a manual promotion gate β AI defaulted to auto-deploy.
Sam: "rank for typing test and position against Monkeytype, TypingClub, Keybr, 10FastFingers, Typing.com." AI ran 12+ SEO audit skills and generated the /vs pages. Sam framed each competitorβs positioning. Result: ChatGPT now cites CosmicKeys as a search referral source.
Sam designed the Playwright locale-injection matrix (test German / Danish / Spanish via Accept-Language headers, no VPN needed). AI wrote the test code. Sam designed what to test at E2E vs unit layer.
Sam created 14 custom Claude Code skills (stage-it, ship-it, go-live, rollback, db-query, prod-proxy, green-test-reset, ai-partner-review, deep-scan, mermaid-sync, β¦). Replaces a dedicated DevOps engineer + DBA + QA lead.
Sam: "translate the German lesson templates β but adapt, donβt translate. German recipes use metric. German legal style has its own conventions." AI generated; Sam caught American measurements in "German" recipes and insisted Swiss German voices be distinct from standard German.
Sam: "compute analytics at write time, store the results, never replay the raw keystroke stream β it doesnβt scale." AI would have defaulted to "store everything, compute on read." The data-engineering tradeoff was Samβs.
Sam ran LLM Council security audits (Claude Code as primary architect + Gemini and Codex as LLM reviewers) on the auth module. Three LLMs scanned independently; Sam triaged β accepted 4 findings, rejected 2 false positives. AI found vulnerabilities; Sam decided which mattered. All at dev-time; nothing deployed.
Sam designed the per-finger latency tab, rhythm analysis, and neighbor-key confusion (did you hit the key NEXT to the correct one?) using neighbor_calculator.py data. No competitor has rhythm + finger-level confusion because no LLM knows these are the actionable metrics for motor-skill learning.
Sam integrated 6+ external APIs β ElevenLabs (14 voices Γ 7 languages, pre-generated at content-release time), Stripe (multi-currency + VPN-proof grandfathering), Google Gemini (called only for support-ticket translation β the deployed app's one external AI-related call, treated like any other third-party service call), MaxMind GeoIP (Docker-embedded), 3 OAuth providers + Resend Magic Link. AI wrote the calls; Sam discovered the edge cases (umlauts, Swiss-German pronunciation, currency arbitrage).
Sam: "implement GDPR Article 17 + 20 β 30-day deletion window, consent audit with IP + UA + document-version, ZIP export of all user data, localized legal pages." AI built the endpoints. Sam added the anti-abuse guard for delete β restore cycles β AI didnβt think of that.
Sam designed the LLM Council code-review pattern: Claude Code as primary architect and code-generation agent + Gemini and Codex engaged as LLM reviewers. Each LLM reviews independently, on the code Claude Code produced; Sam triages findings. Equivalent to a senior team peer-reviewing every commit β entirely at dev-time, not deployed.
Sam designed the support flow: html2canvas auto-captures the userβs screen, message auto-translates user-language β English via a Google Gemini API call (gemini-2.0-flash primary, gemini-2.5-flash fallback) on the write path; both versions stored, so reads are cache hits with zero per-view cost. This is the deployed app's only external AI-related call. Architecture is built; the AI triage layer plugs in when message volume justifies it.
Sam designed sharing as a growth loop, not an afterthought. 5 social-share components with auto-generated branded cards (WPM + accuracy + brand) so every shared result targets exactly the right audience. AI built the components; the growth-loop strategy was Samβs.
Sam: "8 lesson template categories that map to real jobs β nurse, lawyer, developer, support, journalist β each authored separately per language (7 locale files) rather than translated." AI generated; Sam caught that the legal-document template was too simple (real legal typing has citation styles + section numbering) and insisted custom content stay in localStorage (privacy by design).
The 20 hats above explain the choreography. This section explains the tools that made executing the dance at production scale possible β custom skills, multi-AI code review, and an institutional memory that survives session crashes.
Each skill is a reusable automation workflow triggered by natural language. Together they cover what a traditional 5-person team would distribute across roles.
| Skill | What It Does | Traditional Team Equivalent |
|---|---|---|
| stage-it | Commits, merges to staging, pushes β βstage itβ | CI pipeline trigger |
| ship-it | Staging β main, green deploy + Vercel preview | Release manager |
| go-live | Auto-detects FE / BE changes, promotes accordingly | DevOps engineer |
| rollback | Emergency revert of backend + frontend | Incident responder |
| db-query | Pre-built production queries across 3 regions | DBA |
| prod-proxy | Manages Cloud SQL proxy to all regions | Infra access tool |
| green-test-reset | Clears test user data after green deploy verification | QA engineer |
| ai-partner-review | Orchestrates Gemini + Codex review consensus | Two senior code reviewers |
| deep-scan | Gemini-powered codebase exploration Β· 1M context | Senior engineer research |
| mermaid-sync | Generates / updates architecture diagrams from code | Architecture documentation |
10 of the 14 shown. Each replaces a workflow that a 5-person team would distribute across roles.
I call this pattern the LLM Council: Claude Code is the primary architect and code-generation agent that produces the work; Gemini and Codex are engaged as LLM reviewers on what Claude Code generated. Three different models trained on different data with different priorities β their blind spots don't overlap, which is the whole point. Before every major deployment, the ai-partner-review skill delegates the diff to the Council and synthesizes the consensus; I triage the findings the way a tech lead triages PR comments. This is the entire LLM-usage story for CosmicKeys β happens at development time, never deployed. The one runtime LLM call the deployed app makes is Google Gemini for support-ticket translation, and that's treated like any other external API integration (Stripe, Resend) β not as LLM SDK plumbing.
Primary development partner. Knows the full codebase context. Synthesizes the consensus into SYNTHESIS_*.md files.
1M-token context window. Excels at multi-file root-cause analysis β finds bugs that span files Claude reviewed separately.
OpenAI's code-focused model. Independent security review. Provides a second opinion on the same diff.
Claude Code sessions have finite context windows. When a session hits its limit or crashes, the next one can resume exactly where the previous one stopped by reading four sources:
.claude/SESSION_STATE.md β current work state, files changed, verification checklist.ai-context/diagrams/*.md β 21 mermaid diagrams providing architectural contextCLAUDE.md β project root instructions, full orientation for a new session~/.claude/projects/<path>/memory/ β user-specific context that persists across all sessionsA new Claude session reads all four and can resume mid-deployment β including verifying a deploy that the crashed session was monitoring. This replaces the βteam wiki + onboarding docβ overhead that traditional teams maintain.
Most consumer products either ship ads or paywall the experience. CosmicKeys does neither. The economics are intentional.
Sign up, pick lessons, practice, see analytics, share progress β these are intended to remain free for the things people use daily. In normal use, the vast majority of users complete typical sessions without encountering the usage-based paywall.
The three plans (FREE, TRIAL, PRO) are defined in backend/app/services/entitlements.py: FREE gives 3 speed tests per day and the core typing practice; TRIAL gives 10 tests per day for 7 days; PRO removes the test cap and unlocks pro lessons and full analytics history. The thresholds gate operations that genuinely scale with usage (speed-test storage, deep analytics retention, expanded curriculum access) β not the core practice loop, and not voice audio (voices are pre-generated at content-release time so adding users does not add per-user TTS cost). Infrastructure scaling across the three Cloud Run regions (US / EU / Asia) is the platform's actual marginal cost.
Hosting stays predictable because the expensive operations are bounded by the algorithm. Power users pay because they actually use more than the free allowance provides β not because the free tier was intentionally crippled to force an upgrade. There's a meaningful difference between βfree tier limited because of unit economicsβ and βfree tier crippled to extract revenue.β
The actual subscription management screen β clear pricing, instant downgrade, no dark patterns. Click to enlarge.
Two things are happening on a subscription page that often get conflated: the user wants to change their billing state (upgrade, downgrade, cancel), and the company wants to keep PCI scope as small as possible. Most apps optimize the first against the user (friction-as-retention) and ignore the second until they get audited. The cleaner design separates them: the page itself should remove every dark pattern between the user and the action they came to take, and the payment data path should stay outside your servers entirely via a hosted checkout like Stripe Checkout. Those two decisions compound β the simpler the page, the easier it is to keep PCI scope minimal because there's less custom code to audit.
* About the free tier: CosmicKeys uses a usage-based paywall algorithm that continuously monitors infrastructure burn rate across the platform's three GCP regions (US / EU / Asia) and adjusts thresholds dynamically. The descriptions above reflect the platform's current design philosophy and typical behavior at the time of writing β they are not a binding commercial guarantee. The specific limits, which features are metered, and the pricing tiers may change over time as operating costs evolve. CosmicKeys reserves the right to adjust tiers, limits, and feature availability at its discretion. For current commercial terms, see cosmickeys.app.
You've now seen the surface β the lessons, the analytics, the polish. When people visit cosmickeys.app they see a beautiful typing practice platform with animated finger guides. What they don't see is the production architecture underneath: a multi-region setup across three Cloud Run regions (US, EU, Asia), a voice narration pipeline that pre-generates audio lessons in seven languages (ElevenLabs TTS, baked at content-release time), a client-side analytics layer that captures every keystroke and persists it to both localStorage and the database, and a deterministic adaptive-lesson selector that scores curriculum candidates against each learner's stored error patterns.
Vercel Edge serves the Next.js frontend as a single global deploy (vercel.json regions: ["iad1"]) β Vercel's anycast network caches static assets worldwide. GCP Global Load Balancer fronts the API (api.cosmickeys.app) with a single anycast IP that routes to the healthiest of three regional Cloud Run backends.
Frontend is one Next.js 16 build on Vercel β global edge does the rest. Backend is three independent Cloud Run services (us-central1, europe-west1,asia-southeast1) running FastAPI + a Traffic Cop middleware that re-routes writes back to a user's home region when they roam.
Each region has its own Cloud SQL Postgres 15 with REGIONAL HA (zonal failover within the region). US holds the directory source-of-truth; EU and Asia point to it for shared directory data via app-level lookups, not native cross-region replication. Support-ticket translations are written through a Google Gemini API call (gemini-2.0-flash primary, gemini-2.5-flash fallback β the deployed app's one external AI-related call) and persisted to Postgres.
Terraform manages the regional infrastructure; the GitHub Actions workflow at.github/workflows/deploy-prod.yaml deploys new revisions withgcloud run deploy --tag=green --no-traffic. After smoke tests pass,./scripts/promote.sh shifts traffic β manual gate by design, so a bad build never auto-replaces a healthy one across three regions at once.
Secrets (DB password, auth secret, Stripe live + webhook) live in GCP Secret Manager. The Cloud Run service account getssecretAccessor via Terraform IAM β secrets are injected at deploy time, never in code or container images. Vercel-side secrets are intentionally a manual sync(the Terraform Vercel provider is left disabled in vercel_secrets.tf) so a credential rotation needs a human ack before the frontend picks it up.
This is the part most engineers wave off with βjust put it behind Cloudflare.β That works until you have stateful services, database write consistency, and region-specific localization. I designed a deliberate routing strategy that handles all three.
For a real-time typing platform, every 50ms of round-trip latency is user-visible. Anycast routing at the edge sends each user to the geographically nearest data center. A user in Stockholm hits EU (europe-west1); a user in London hits EU; a user in Seattle hits US (us-central1). The Asia region (asia-southeast1, Singapore) primarily serves prospects in Asia evaluating the marketing page β typing content itself is in the seven supported European/US languages, not localized for Asian languages. Zero configuration, automatic.
The vast majority of failure modes resolve themselves inside the region β Cloud Run keeps a minimum of one instance running and automatically replaces a crashed instance (typically within seconds), and Cloud SQL runs withavailability_type = REGIONAL, which gives automatic zonal failover from primary to standby in another zone. At the routing layer, GCP's Global Load Balancer continuously health-checks each region's backend and stops sending new traffic to one that's failing. The deliberate trade-off:data is region-locked. An EU user's account, progress, and lesson history live in EU Cloud SQL only β we do NOT spill EU users into US Cloud Run if EU went fully down, because US has no copy of their data. That preserves GDPR data residency and write consistency; the cost is that a true whole-region outage would briefly degrade users in that region rather than seamlessly route them elsewhere. If a connection blip interrupts a user, localStorage queues recent keystrokes and flushes back to the home region when it's reachable again.
Each region serves localized content for the seven supported languages β German, Swiss German, Danish, Spanish, Norwegian, Swedish served from EU; English (US) and English (UK) served from US. The traffic cop attaches a region header that the backend uses for content selection. The localization is cache-friendly: each region's CDN caches its own language variants, not a global cache invalidated by any single update.
Localization is usually treated as an afterthought β a post-launch βinternationalizationβ project. I designed it as a first-class architectural concern from day one because retrofitting i18n into a typing platform (where literally every character on the screen has to localize) is a nightmare.
Source of truth: SUPPORTED_LANGUAGES in frontend/lib/languageDetection.ts β the 8 entries map to 7 base languages because English has separate US and UK variants (each with its own default keyboard layout: ANSI for US, ISO for UK).
All seven supported languages are Latin-script, but each adds keys that don't exist on US ANSI: German and Swiss German umlauts (Γ€ ΓΆ ΓΌ Γ), Spanish Γ± ΒΏ Β‘, and Nordic Γ¦ ΓΈ Γ₯. Each needs its own keyboard layout JSON, its own neighbor-adjacency map for typo detection, and its own finger-assignment for the extra keys.
QWERTY (US ANSI), QWERTY-ISO (UK), QWERTZ (German, Swiss German), plus Nordic ISO and Spanish ISO variants β across Mac and Windows. Finger animation guides must match the user's actual keyboard, and the ISO bottom-row extra key shifts every other key's finger assignment by one position.
Typing exercises, sample sentences, and progressive difficulty curves are all localized per language. German beginner lessons drill umlauts (Γ€ ΓΆ ΓΌ Γ) before moving to longer words; Spanish beginner lessons drill Γ± and accented vowels; Nordic lessons drill Γ¦ ΓΈ Γ₯ β different beginner curves per language because the hard keys are different.
Every instructional prompt has a native-speaker audio variant. The user hears lesson guidance in their language, not English subtitles.
Dates, numbers, and progress metrics are formatted per locale β DD.MM.YYYY in German/Swiss German, DD/MM/YYYY in UK English/Spanish/Nordic, M/D/YYYY in US English. Decimal separators (1,234.56 vs 1.234,56) and thousand separators differ across the seven supported locales.
Color palettes and imagery adapted where colors carry cultural meaning. Error states don't use red in cultures where red signals success.
One of the features that gets zero credit from first-time visitors is the voice narration system. Every instructional prompt β βplace your left index finger on F, your right index on Jβ β has a professionally-generated audio variant in every supported language.
Every keystroke is captured in the browser. WPM, accuracy, error patterns, and per-key heatmaps are all computed client-side β instantly, with zero network round-trip β and rendered to the UI as you type. The same data is persisted twice: in localStorage for session resilience, and in the backend database for cross-device access and long-term analytics.
Character pressed, timing delta, accuracy flag. The atomic unit captured in-memory, then persisted as aggregated session data.
Rolling average computed client-side and ticked into the UI on every keystroke β no server round-trip.
Which keys the user mistypes most often (errors_by_char, slowest_keys, rhythm). Aggregated client-side, persisted to lesson_attempts on lesson completion, and consumed by the deterministic adaptive lesson selector.
Finger-level accuracy heatmap driving the visual feedback (glow intensity per finger). Computed live, snapshotted to the DB at lesson end.
Lesson completions, new speed records, language milestones β fire-and-forget POSTs to the notification service.
localStorage is the safety net. Lose the network mid-lesson, your session continues; analytics flush to DB when you reconnect.
Up front: there is no runtime LLM call in this loop. The deployed app does not make any inference calls for lesson recommendation β the βadaptiveβ behavior is deterministic code consuming rich analytics that are computed in the browser and persisted to the lesson_attemptstable at lesson completion. The intelligence lives in the analytics design and the curriculum, not in an inference call.
(Claude is part of the dev-time workflow β Claude Code as Sam's primary architect and code-generation agent in the LLM Council pattern (Claude Code as primary + Gemini and Codex as LLM reviewers) β but none of these are invoked at runtime by the deployed CosmicKeys app. The deployed app's one external AI-related call is to Google Gemini for support-ticket translation only; details in the support flow above.)
Most of failover on CosmicKeys is βGCP handles it within the region.β The interesting part β the part that's actually a design decision rather than a vendor-provided default β is the one place we deliberately chose data integrity over cross-region failover. Here's what each layer actually does when something breaks.
Each region's Cloud Run is configured with a minimum instance count β₯ 1, so there is always at least one warm instance ready. If an instance crashes, GCP automatically spins up a replacement β typically within seconds β and the failed instance is removed from the load-balancer pool. We do not write any of this logic; it is the Cloud Run platform.
Each region's Postgres 15 runs with availability_type=REGIONAL (configured in terraform/modules/cosmic_region/main.tf for the production environment). That gives automatic zonal failover from primary to a synchronous standby in a different zone within the same region. Zero data loss, no manual intervention, no application code involvement.
GCP's Global Load Balancer continuously health-checks each region's backend and stops sending new requests to a failing one. This happens at the anycast routing layer β built into GCP, not custom code. New requests from users near the failed region naturally route to the next-healthy region; the in-app Traffic Cop middleware then proxies writes back to the user's home region for write consistency.
Data does NOT replicate across regions. An EU user's account, progress, and typing history live in EU Cloud SQL only. If the entire EU region went fully down (rare β GCP region-wide outages are infrequent), the EU user would be briefly degraded rather than seamlessly served from US Cloud Run, because US has no copy of their data. We chose this for GDPR data residency and write consistency; the cost is one failure mode we accept rather than engineer around.
localStorage caches the current lesson and recent keystrokes locally. A brief connection loss does not interrupt the typing session β the user keeps typing, analytics keep computing in the browser. When the home region is reachable again, queued writes flush to that region (not to whichever region the network happens to route to at flush time).
A small set of shared, read-mostly data (lesson catalog, plan definitions, global content metadata) lives in the US Cloud SQL as the directory source-of-truth; EU and Asia look it up via app-level cross-region reads. User-owned data (progress, account, history) never crosses regions in either direction.
A handful of architectural choices that distinguish βbuilt a thingβ from βbuilt a thing the right wayβ β the kind of decisions that don't show up in a feature list but do show up in a postmortem when the wrong one was made.
Green revisions deploy to all 3 Cloud Run regions with 0% traffic. Testing happens against the green endpoint directly. Only after explicit verification does promote.sh flip traffic. Auto-deploy is convenient until it ships a breaking change at 3am.
MaxMind GeoLite2-Country is baked into the Docker image at CI build time. Cold starts don't hit MaxMind's download servers. Zero runtime dependency on a third party. Refreshed on every build for current data.
GitHub Actions authenticates to GCP via OIDC tokens. No service account keys exist anywhere. Secret Manager access is IAM-scoped per region β EU Cloud Run can only read EU database passwords.
Speed-test analytics live as JSONB documents (not normalized tables). Reading a user's full history is one document fetch. GIN-indexed for fast queries against specific keys. Schema-flexible β adding fields doesn't require a migration.
Every consent action stored with IP, user agent, timestamp, document version. Article 17 (Erasure) with 30-day window + anti-abuse guard for delete β restore. Article 20 (Portability) ZIP export of all user data. Demonstrable consent for any document version β what an auditor needs.
Guest progress is XOR-encrypted + base64-encoded before localStorage write. Not bulletproof against determined attackers (key is in minified JS), but raises the bar against casual DevTools tampering. The right tradeoff for the βtry without signupβ zero-friction goal.
gemini-2.0-flash primary, gemini-2.5-flash fallback) β translate-on-write, both versions stored, no per-view cost. This is the deployed app's only external AI-related API call β treated like any other third-party service call alongside Stripe, OAuth, and Resend. There is no LLM SDK plumbing built into the product, no per-user inference, no LLM in the user data path. All other AI usage on this project β code generation, the LLM Council code-review process, architecture conversations β happens at development time via Claude Code, not at runtime.availability_type=REGIONAL for zonal failover; no cross-region replication of user data. US holds the directory source-of-truth (lesson catalog, plan definitions); EU and Asia look it up via app-level reads.vercel.json regions:["iad1"])--tag=green --no-traffic + manual promote.shA field note on building with AI as a partner: the question in 2026 isn't whether you personally have hands-on expertise in every domain a modern product touches. No one does. The question is whether you have the discipline to wield AI as a force multiplier β formulating the right question, leading the dance, evaluating AI's answer against a clear product vision, and making the final judgment call when the dance produces options. That's the meta-skill that turns one curious builder β engineer, PM, designer, QA lead, founder, whoever β into a multi-role product team.
What the dance demands, in any role:
Why this scales beyond a solo build: every architectural component on this page β the traffic cop middleware, the localization pipeline, the pre-generated voice catalog, the client-side analytics, the deterministic adaptive lesson selection, the dev-time multi-AI review process β is a reusable pattern that scales from solo build to platform team. Multi-region consumer infrastructure is expensive to build once and cheap to replicate. The same engineering pattern that produces a solo build in six months produces a team multiplier when scaled to a five-person platform team serving multiple products. The opposite isn't true β a team of five vibe-coders does not become an AI-native engineering organization just by adding Claude licenses. The discipline has to live in the workflow, not the tooling.
CosmicKeys is the worked example. Each section above shows what evidence looks like for one item on the list: the 20 Hats show role-coverage discipline; the AI-Partner Workflow shows the choreography; the Engineering Decisions show architectural taste; the Platform Tour shows shipped proof; the Architecture sections show the systems underneath; the freemium economics show product judgment beyond technical depth. If you're learning the dance, this is what the receipts look like at production scale. If you're evaluating others who claim to do this kind of work, this is the bar to compare them against.