An email inbox rebuilt as a production system — Gmail API, a SQLite sender catalog, a 4×/day cron, a human-in-the-loop over Telegram, and Claude Code as the interface layer. Ten days of focused evenings closed a battle I'd been losing for years.

Gmail's UI treats every email as a decision. The real decisions are about senders.
For years I built up filters one promotional email at a time, usually at 1 AM after a particularly irritating newsletter. Forty-plus filters later, I couldn't tell what they did without reading each one. When Google Pay sent me a confirmation that I'd added a card, I couldn't find the email.
The unlock was realizing my inbox isn't a list of messages — it's a flow of senders. Each message from a sender I've already decided about doesn't need to be decided again. So I made the sender the primary key: a SQLite catalog with one row per sender and a stored decision (archive / mark_read / keep_inbox / unsubscribe). Every other piece of the system became a dumb executor of that catalog.
Gmail API · SQLite catalog · triage script (+ sweeps) · me.
OAuth2 to a GCP project, scopes: mail.google.com, gmail.send, gmail.settings.basic. Token cached on disk, refresh-token-based. App published to production mode so tokens don't expire every 7 days.
One row per sender. Columns: email, domain, decision, decided_by (user / rule / gemini), confidence, lifetime counts, sample subjects. This file is the source of truth.
Core loop: fetch unread → look each sender up in catalog → apply decision. Plus two sweeps: spam purge (create domain filters for new primary domains, permanently delete current spam) and badge sweep (clear UNREAD on archived-but-unread messages so Gmail's sidebar badges don't lie).
New senders enter a review queue. Security keywords always surface even from auto-archived senders (so password-reset emails don't slip through). Telegram pings only when there's something genuinely new — silence is the feature.
email = 'specials@eml.cigar.com'
domain = 'eml.cigar.com'
display_name = 'CIGAR.com Specials'
decision = 'archive' # archive | mark_read | keep_inbox | unsubscribe
category = 'Marketing'
decided_by = 'user' # user | rule | gemini | inferred
confidence = 100
total_count = 483 # lifetime
unread_count = 0 # this run
sample_subjects = ['50% off...', ...]
first_seen = 2024-02-12
last_seen = 2026-04-22A single catalog row
The kind of invariants you don't know you need until you ship the bug.
Filtering specials@eml.cigar.com, support@cigar.com, and rewards@cigar.com as three separate rules is bureaucracy. Collapse the sender to its primary registered domain (cigar.com) using a public-suffix-aware heuristic. One domain, one rule.
An early version of the script auto-created a filter for every primary domain in Spam. One spam message came from a gmail.com address. The script happily created a filter that would have trashed every future email from any gmail.com sender. I also caught:
insidetracker.com — a paid subscriptionne.jp / org.bd / edu.pl — public suffixes, not registered domains (like .co.uk)frame.io — a tool I use; only the marketing subdomain should be filteredThe fix: two persistent lists baked into the script — a public-suffix list so the collapse doesn't over-shorten, and a DEFAULT_SKIP list of multi-tenant email hosts (gmail, yahoo, icloud, outlook…) and real services I use. Never blanket-filter these.
A huge chunk of a real inbox isn't spam — it's transactional receipts. Shipping notifications, order confirmations, utility bills, card alerts. I don't want them archived (I'll need to search later), but I also don't want them unread (false urgency). The mark_read decision handles exactly this class: stay in inbox, don't contribute to the counter. This was the single biggest contributor to going from 35,000 to zero.
One morning my Updates sidebar badge said 2. I clicked — empty. Gmail's sidebar counts every message with CATEGORY_UPDATES + UNREAD, regardless of INBOX state. But the Updates tab filters to INBOX. Some older filters had archived messages without also clearing UNREAD, leaving phantom, unreachable, permanently-unread messages lighting up the badge.
# Clear UNREAD on any archived-but-unread message — these are unreachable from any tab
resp = svc.users().messages().list(
userId='me',
q='is:unread -in:inbox -in:trash -in:spam -in:drafts',
maxResults=500,
).execute()
ids = [m['id'] for m in resp.get('messages', [])]
if ids:
svc.users().messages().batchModify(
userId='me',
body={'ids': ids, 'removeLabelIds': ['UNREAD']},
).execute()The ten-line fix — runs at the end of every triage
A Telegram bot that pings you only when something actually needs you.
Every triage run classifies mail into three buckets: known senders (rules apply automatically), new senders (added as pending and queued for my review), and security items (always surfaced, even if the sender is auto-archived — because a password-reset email from a marketing-archived domain still matters).
When I'm at my desk, I review the pending queue by typing something like “archive joolca, keep apple card, unsubscribe starlink” and the system translates that into --decide calls. When I'm not at my desk, Telegram pings me — only if there's a new sender, a security item, or a pending unread count above a threshold. A bot that pings every four hours with “nothing new” trains you to ignore the bot. Silence is the feature.
launchd cron → triage → spam sweep → badge sweep → notify (maybe).
Scheduled via launchd at 07:00, 12:00, 17:00, and 22:00 — the hours of my day when checking an inbox is actually useful. Each run does four things in order: apply the catalog's decisions to current unread, scan Spam and purge (creating filters for new primary domains in the process), sweep the ghost-unread archive, and fire a Telegram notification only if there's something worth firing about.
The wrapper is 30 lines of bash. The triage itself is around 400 lines of Python. The spam sweep is 150. The badge sweep is ten. All the interesting behavior lives in the catalog schema and the decision logic — the plumbing is boring on purpose.
Natural-language triage beats remembering shell commands.
I didn't build this in one sitting. I built it through conversation. When I type “check gmail”, Claude invokes the triage skill and returns structured output: N unread, M auto-archived, K pending. If there are pending senders, it surfaces them with sample subjects, and I can reply “archive X, keep Y” in plain English. Claude translates that into a batch of --decide calls.
When I have ideas like “always delete everything in the spam folder after you apply the rules”, Claude understands that's a standing behavior change, wires it into the launchd wrapper, and tests it end-to-end. The next scheduled run at 17:00 just does it. This is what I mean when I call LLMs engineering partners rather than autocomplete.
The stuff I'd do differently if I started again.