🧠 First Open-Source Project · 100% Local by Default · v0.11

Mnemos: Personal RAG. Drop a folder, ask a question, even from your phone.

A personal RAG (retrieval-augmented generation) system that defaults to 100% local: embeddings on your machine, chat on your machine, zero external inference calls in the default install. Drop a folder, ask questions in plain English, get answers with citations to your own files. Then, optionally, pair a private Telegram bot to ask the same questions from anywhere. MIT-licensed. github.com/cosmicflow-space/mnemos.

Mnemos demo. One brain, two surfaces: ask on your desktop, or from your phone via a private Telegram bot. Every answer is cited to your own files. 100% local by default.
  • v0.11: active development
  • MIT: license · Zen Algorithms LLC
  • 0: external inference calls (Tier 1)
  • < 90s: install to first answer
  • ~/.mnemos: one SQLite file · 384-dim vec
  • 📲: ask from phone via Telegram

📋 The Brief: Why Mnemos Exists

Most RAG tools today are either (a) cloud SaaS that requires uploading your private documents to someone else's servers, or (b) heavyweight self-hosted platforms that demand a separate vector database, a container orchestrator, and an afternoon of setup before you ask the first question. Both fail the "I want to ask my own files questions and keep them on my own machine" test.

Mnemos is the opposite of both: a local-first, single-user RAG that installs with one command, stores everything in a single SQLite file at ~/.mnemos/mnemos.db, and runs the embedding pipeline locally (BGE-small via ONNX), so you don't pay an API to index your own files. The chat model is also local by default (Ollama). External plugins (Claude / GPT / Gemini) are opt-in, not required.

What this page IS (and what it isn't)

Mnemos is my first open-source project, a different category from my solo consumer products (CosmicKeys and WatchAlgo). Those I built and host for end users; mnemos I built and release for other developers to run themselves.

✅ What mnemos IS
  • Personal RAG over your own files
  • 100% local by default: Ollama + BGE-small, zero external inference calls (one-time model weight fetch only)
  • Single-pane UI: everything behind a settings launcher; you never leave chat
  • Ask from your phone via private Telegram bot (default-deny, no public server)
  • MIT-licensed, contribution-welcome (CLA model)
❌ What mnemos IS NOT
  • Not multi-tenant: single user, one machine
  • Not a SaaS: you run it yourself
  • Not a no-code visual builder: opinionated pipeline, one strong default per stage
  • Not an agent platform: RAG only, deliberately scoped
  • Not production-grade yet: active development, v0.11 (core stable since v0.1)

The CTA is GitHub (clone, run, star, contribute), not "sign up to try." There's nothing to sign up for; the install is one command on your laptop.

🖼️ What It Looks Like: One Pane, Six Surfaces

Mnemos is built around a deliberate single-pane UI: no drag-and-drop canvas, no setting-storm before you can ask a question. The home page IS the chat page; everything else lives behind the glowing settings launcher at the bottom-left.

🔒 How Private Do You Want It? Three Tiers

Personal means personal. Privacy is the key, and the default. Mnemos gives you three tiers, in order of increasing data egress. You choose. Most users stay in Tier 1.

Tier 1: Fully local (default)
  • Embeddings: BGE-small via ONNX, on your machine (first use downloads ~120 MB of weights from Hugging Face once, then cached)
  • Chat: Ollama (v0.1 wired) or bundled llama.cpp (v0.2 stub), on your machine
  • Network: zero external inference calls after the one-time model fetch. No chunks of your data leave the machine.
  • Auth: none. No API keys, no OAuth
  • Audit log proves it: query events record provider: "ollama"; no external-provider call is ever recorded
  • Best for: sensitive data, offline use, full sovereignty

node setup.mjs defaults to this tier.
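
The audit claim in Tier 1 is the kind of property that can be checked mechanically. The following is a minimal sketch, assuming audit events can be read as objects with a `type` and a `provider` field; the `AuditEvent` shape and the `LOCAL_PROVIDERS` set are hypothetical, and only the `provider: "ollama"` value comes from the description above.

```typescript
// Hypothetical audit-event shape; only the provider value "ollama"
// comes from the description above.
interface AuditEvent {
  type: string;     // assumed discriminator, e.g. "query"
  provider: string; // e.g. "ollama"
}

// Providers this sketch treats as on-machine (assumption).
const LOCAL_PROVIDERS = new Set<string>(["ollama"]);

// Returns every query event that was served by a non-local provider.
// An empty result is what a Tier 1 install should produce.
function externalQueryEvents(events: AuditEvent[]): AuditEvent[] {
  return events.filter(
    (e) => e.type === "query" && !LOCAL_PROVIDERS.has(e.provider),
  );
}

const tier1Log: AuditEvent[] = [
  { type: "query", provider: "ollama" },
  { type: "query", provider: "ollama" },
];
const leaks = externalQueryEvents(tier1Log);
console.log(leaks.length === 0 ? "no external calls recorded" : "external calls found");
```

Filtering for violations, rather than counting local events, keeps the check meaningful if new local providers are added later.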

Tier 2: Hybrid (opt-in, per-question or per-session)
  • Embeddings: local default (or external if you switch)
  • Chat: Claude / GPT / Gemini. You pick, you provide auth
  • Network: only retrieved chunks (~500–2k tokens) cross the boundary. Never raw files. Never your full corpus.
  • Audit: provider, model, retrieved chunk IDs, prompt-size estimate, latency
  • Best for: frontier-model quality, when you've consciously chosen to share retrieved chunks

The chat UI lets you switch the model per question.

Tier 3: Fine-tuned local (v0.3 sketch)

Train a small open model on your own corpus. Stays on your machine. Becomes specifically yours over time. Roadmap consideration; community input welcome.

💡 What about vendor OAuth tokens?
Mnemos's credential auto-detection scans for first-party CLI OAuth files (~/.claude/.credentials.json, ~/.codex/auth.json) and surfaces them as detected, but marks them non-importable with a TOS-explanation note. Anthropic explicitly prohibits third-party OAuth reuse; OpenAI does not document it. An API key is the only sanctioned method for every external provider. Honesty about credential safety is itself a positioning point.

📲 Ask From Your Phone: The Telegram Remote Channel (v0.9)

Your personal RAG runs on your computer, but you don't have to be at your computer to use it. Pair a private Telegram bot and query your indexed documents from anywhere.

🔌

No public server, nothing exposed

Mnemos reaches out to Telegram via long polling, so it works behind home NAT with no port-forwarding, no tunnel, no public IP. Same outbound-only posture as everything else in mnemos.
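
To make the outbound-only posture concrete, here is a rough sketch of a long-polling loop against the Telegram Bot API's getUpdates method. It is not mnemos's actual poller: the transport is injected as a hypothetical `GetUpdates` function so the loop can be exercised without a network, and the 30-second timeout is an assumed value. The offset rule (highest `update_id` plus one) is how the Bot API confirms updates.

```typescript
// Minimal subset of the Telegram Bot API Update object.
interface TgUpdate {
  update_id: number;
  message?: { chat: { id: number; type: string }; text?: string };
}

// Telegram treats every update below the offset as confirmed, so the next
// request must ask for the highest update_id seen so far, plus one.
function nextOffset(current: number, updates: TgUpdate[]): number {
  return updates.reduce((max, u) => Math.max(max, u.update_id + 1), current);
}

// Injected transport (hypothetical). A real one would issue an outbound
// HTTPS request to the Bot API's getUpdates method.
type GetUpdates = (offset: number, timeoutSec: number) => Promise<TgUpdate[]>;

// The request is held open by Telegram until an update arrives or the
// timeout elapses; no inbound connection is ever accepted.
async function pollLoop(
  getUpdates: GetUpdates,
  handle: (u: TgUpdate) => Promise<void>,
  shouldStop: () => boolean,
): Promise<void> {
  let offset = 0;
  while (!shouldStop()) {
    const updates = await getUpdates(offset, 30); // assumed timeout
    for (const u of updates) await handle(u);
    offset = nextOffset(offset, updates);
  }
}
```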

🛡️

Default-deny, single-use pairing

The bot answers only you. Pairing happens via a single-use, time-boxed, CSPRNG-generated code entered in the UI; any other chat is ignored. DMs only, never groups. Query-only: no source-management commands are accepted via Telegram.
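
The pairing rules can be sketched as a small state machine. This is an illustration under assumptions, not the shipped logic: the `PairingGate` class, the six-digit code length, and the ten-minute lifetime are all hypothetical, as is the exact flow (the bot issues a code for a chat, the user types it into the UI). `randomInt` from `node:crypto` draws from the OS CSPRNG, and "private" is the Bot API's chat type for DMs.

```typescript
import { randomInt } from "node:crypto";

// Hypothetical parameters: the code length and lifetime are not stated above.
const CODE_DIGITS = 6;
const CODE_TTL_MS = 10 * 60 * 1000;

class PairingGate {
  private pending: { code: string; chatId: number; expiresAt: number } | null = null;
  private pairedChatId: number | null = null;

  // Issue a code for a chat that wants to pair.
  requestCode(chatId: number, now: number): string {
    const code = String(randomInt(0, 10 ** CODE_DIGITS)).padStart(CODE_DIGITS, "0");
    this.pending = { code, chatId, expiresAt: now + CODE_TTL_MS };
    return code;
  }

  // Called when the code is typed into the UI. The pending entry is cleared
  // on success or expiry, so a code can never be used twice.
  confirm(code: string, now: number): boolean {
    const p = this.pending;
    if (p === null) return false;
    if (now > p.expiresAt) {
      this.pending = null;
      return false;
    }
    if (code !== p.code) return false;
    this.pairedChatId = p.chatId;
    this.pending = null;
    return true;
  }

  // Default-deny: only the paired chat, and only private (DM) chats.
  isAllowed(chatId: number, chatType: string): boolean {
    return chatType === "private" && chatId === this.pairedChatId;
  }
}
```

A production version would likely also rate-limit failed attempts and compare codes in constant time; both are left out here for brevity.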

🏠

Your documents stay home

Only the question and answer pass through Telegram (and whichever model you've configured). Files never leave your machine. Bot token stored chmod-600 at ~/.mnemos/.env, never logged.

🤖

Uses your configured model

Local Ollama by default, or Claude / GPT / Gemini if you've set one; the bot mirrors your UI choice. The choice is persisted server-side (MNEMOS_DEFAULT_MODEL) so it survives restarts.

🔑 3-way AI security review at ship time

The Telegram channel was reviewed by an AI partner panel (Claude + Gemini + Codex) before v0.9 shipped. The review confirmed: no authorization bypass, no token leakage, atomic pairing logic, no path for unauthorized chats to receive answers. Set up in Settings → 📲 Telegram; there's a built-in step-by-step guide for users new to Telegram bots.

The catch, by design: your computer must be awake with mnemos running for the bot to reply. No cloud backend, by choice.

WhatsApp is on the radar but not yet supported: there's no free, local-first-friendly bot API for it the way Telegram offers.

✓ Verified-Answer Memory: Make Small Local Models Nail Facts (v0.5)

Small local models are great for sovereignty but sometimes fumble specific facts. Mnemos lets you tag a correct answer with ✓ Save verified: the Q→A pair is stored, and future closely-matching questions inject that confirmed answer into the prompt. Even a small local model then answers correctly.

Strict semantic match

A verified answer fires only when the new question is semantically close to the original (vector match against verified-Q embeddings). Loose paraphrases are caught; unrelated questions are left alone.
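
One way to picture the strict match is a cosine-similarity cut-off over question embeddings. The sketch below assumes cosine similarity as the metric and a threshold of 0.9; both are illustrative choices, and `VerifiedEntry`, `MATCH_THRESHOLD`, and `matchVerified` are hypothetical names. Toy 2-dimensional vectors stand in for the 384-dim embeddings.

```typescript
interface VerifiedEntry {
  question: string;
  answer: string;
  embedding: number[]; // embedding of the verified question
}

// Hypothetical cut-off; the real threshold is not stated above.
const MATCH_THRESHOLD = 0.9;

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Returns the closest verified entry at or above the threshold, else null.
function matchVerified(query: number[], entries: VerifiedEntry[]): VerifiedEntry | null {
  let best: VerifiedEntry | null = null;
  let bestScore = MATCH_THRESHOLD;
  for (const entry of entries) {
    const score = cosineSimilarity(query, entry.embedding);
    if (score >= bestScore) {
      best = entry;
      bestScore = score;
    }
  }
  return best;
}
```

A high threshold trades recall for precision: a missed paraphrase only falls back to normal retrieval, while a false match would inject the wrong "verified" answer.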

Lazy content-hash invalidation

A verified answer stops firing once its source chunks change, so when your underlying files update, stale verified answers drop out automatically. No manual cache invalidation.
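
"Lazy" here suggests the validity check runs when a verified answer is about to be used, not when a file changes. A minimal sketch of that idea, assuming each verified answer remembers the content hash of every source chunk it relied on; the `VerifiedAnswer` shape and the chunk ids are hypothetical.

```typescript
interface VerifiedAnswer {
  answer: string;
  // Content hash of each source chunk at the time the answer was saved,
  // keyed by a hypothetical chunk id.
  sourceHashes: Record<string, string>;
}

// Checked at lookup time: nothing has to run when a file changes.
// The answer is usable only if every source chunk still has the same hash.
function isStillValid(v: VerifiedAnswer, currentHashes: Map<string, string>): boolean {
  return Object.keys(v.sourceHashes).every(
    (chunkId) => currentHashes.get(chunkId) === v.sourceHashes[chunkId],
  );
}
```

A chunk that has disappeared entirely also fails the check, because its lookup yields `undefined`.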

"✓ verified" badge in chat

Boosted answers carry a visible badge so you always know which answers came from verified memory vs raw retrieval. Transparency over silent magic.

Management view in Settings

Review, edit, or delete your verified answers from Settings → Verified answers. Backed by the verified_answer + vec_verified tables; queryable via /api/verified.

Design rationale + considered-and-deferred alternatives documented in docs/design-notes/verified-answer-memory.md.

🔄 Self-Updating Sources: Auto Re-Scan + Pause/Resume

Static archives shouldn't burn background CPU. Hot folders should re-index when they change. Heavy ingests should be pausable. Mnemos handles all three with per-source scheduling, durable pause state, and a live launcher-ring indicator.

Per-source cadence: manual default, as fast as every 5 min

Each source has its own re-scan cadence (set when added, editable any time). Default is Manual, since most users add static documents and don't want background CPU spent re-scanning them. Point a changing folder at Daily or 5-minute cadence from a dropdown. Re-scans are incremental (content-hash skip), so only changed/new files re-embed. Implemented as a periodic poll via Next.js instrumentation.ts; tunable with MNEMOS_WATCH_TICK_MS, disable with MNEMOS_DISABLE_WATCHER=1.
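
The two mechanisms in this paragraph, the per-source due check and the content-hash skip, can be sketched as follows. The `Cadence` encoding (either "manual" or an interval in milliseconds) and the choice of SHA-256 are assumptions made for illustration; the text does not state which hash mnemos uses.

```typescript
import { createHash } from "node:crypto";

// Hypothetical encoding: "manual" never auto-scans; a number is an interval in ms.
type Cadence = "manual" | number;

interface SourceSchedule {
  cadence: Cadence;
  lastScanAt: number; // epoch ms of the last completed scan
}

// Evaluated on every watcher tick.
function isDue(s: SourceSchedule, now: number): boolean {
  return s.cadence !== "manual" && now - s.lastScanAt >= s.cadence;
}

// SHA-256 is an assumption; any stable content hash works for the skip.
function contentHash(content: string): string {
  return createHash("sha256").update(content).digest("hex");
}

// Returns the paths that need re-embedding: new files and files whose
// hash differs from the one stored at the previous scan.
function changedFiles(files: Map<string, string>, storedHashes: Map<string, string>): string[] {
  const changed: string[] = [];
  files.forEach((content, path) => {
    if (storedHashes.get(path) !== contentHash(content)) changed.push(path);
  });
  return changed;
}
```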

Pause / Resume: durable across restarts

Pause a running ingest per source, or all at once ("Pause all," useful before bed or on heavy CPU), from the Sources panel. Resume continues where it left off (already-finished files are hash-skipped). Pause is durable: it is persisted to the source.paused flag and survives a restart. The background watcher skips paused sources (no auto-resume on next tick). Pausing is cooperative: it stops at file boundaries and marks any interrupted file partial so resume reprocesses it.
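
Cooperative pausing means the ingest loop checks the flag itself, and only between files. The sketch below is a simplification under assumptions: an in-memory `state` map stands in for the hash-skip bookkeeping, and `ingest`, `FileState`, and the callback names are hypothetical.

```typescript
type FileState = "partial" | "done";

// Processes files in order and checks the pause flag only between files,
// so a file is never abandoned halfway by a pause request.
function ingest(
  files: string[],
  state: Map<string, FileState>,
  isPaused: () => boolean,
  embedFile: (path: string) => void,
): "paused" | "complete" {
  for (const file of files) {
    if (state.get(file) === "done") continue; // resume: finished files are skipped
    if (isPaused()) return "paused";          // file boundary: safe place to stop
    state.set(file, "partial");               // stays partial if embedFile throws
    embedFile(file);
    state.set(file, "done");
  }
  return "complete";
}
```

Marking a file partial before the work starts is what makes a crash or interruption recoverable: on resume, anything not marked done is simply processed again.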

Concurrency-safe: atomic DB lease prevents collisions

Manual ↻ Re-scan, the background watcher, and even multiple server processes coordinate through an atomic DB ingest lease (source.ingesting_since), so the same source is never ingested twice at once (which would race writes). A failed auto-scan stays due and retries next tick. A crashed lease self-heals after 30 minutes.
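
The lease semantics can be modelled in a few lines. This sketch uses an in-memory map in place of the database, so it shows the rule, not the atomicity: in SQLite the same rule would be a single conditional UPDATE, and the SQL in the comment is a guess at its shape, not the project's actual statement. The 30-minute expiry comes from the text; `LeaseTable` is a hypothetical name.

```typescript
const LEASE_TTL_MS = 30 * 60 * 1000; // crashed leases self-heal after 30 minutes

// In SQLite this would be one conditional UPDATE, roughly (assumed shape):
//   UPDATE source SET ingesting_since = :now
//   WHERE id = :id
//     AND (ingesting_since IS NULL OR ingesting_since < :now - :ttl);
// The caller proceeds only if exactly one row changed.
class LeaseTable {
  private ingestingSince = new Map<string, number>();

  // Returns true if the caller now holds the lease for this source.
  tryAcquire(sourceId: string, now: number): boolean {
    const heldSince = this.ingestingSince.get(sourceId);
    if (heldSince !== undefined && now - heldSince < LEASE_TTL_MS) return false;
    this.ingestingSince.set(sourceId, now);
    return true;
  }

  release(sourceId: string): void {
    this.ingestingSince.delete(sourceId);
  }
}
```

Putting the expiry test inside the same condition as the acquire is what lets a lease left behind by a crashed process be reclaimed without a separate cleanup job.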

Live status indicator: launcher ring breathes

The settings launcher (bottom-left avatar) reflects live ingestion: breathing cyan ring while ingesting (with a done/total tooltip), amber when paused, red on error. Backed by an in-memory status registry + GET /api/ingest/status; pinned to globalThis so the watcher and routes share one instance.
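
Pinning to globalThis is a common workaround when a framework may load the same module more than once in one process, which would otherwise give the watcher and the routes separate copies of a module-level variable. A minimal sketch of the pattern: the key name, the `IngestStatus` shape, and the priority order in `ringColor` (error over ingesting over paused) are assumptions.

```typescript
interface IngestStatus {
  state: "idle" | "ingesting" | "paused" | "error";
  done: number;
  total: number;
}

// Hypothetical key under which the shared registry is pinned.
const REGISTRY_KEY = "__mnemosIngestStatus";

// Returns the one registry shared by every copy of this module in the process.
function getRegistry(): Map<string, IngestStatus> {
  const g = globalThis as unknown as Record<string, Map<string, IngestStatus> | undefined>;
  let registry = g[REGISTRY_KEY];
  if (registry === undefined) {
    registry = new Map<string, IngestStatus>();
    g[REGISTRY_KEY] = registry;
  }
  return registry;
}

// Maps the per-source statuses to a single ring colour.
function ringColor(statuses: IngestStatus[]): "none" | "cyan" | "amber" | "red" {
  if (statuses.some((s) => s.state === "error")) return "red";
  if (statuses.some((s) => s.state === "ingesting")) return "cyan";
  if (statuses.some((s) => s.state === "paused")) return "amber";
  return "none";
}
```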

📁 Sources Without Surprises: Files, Folders, Metadata, Counts

📄

Folders or single files (v0.8)

Paste a folder path or an absolute path to a single file. Mnemos auto-detects which. A single explicitly-chosen file bypasses the soft noise filters (since the choice is deliberate), but the security hard-lock still always applies (a single .env or id_rsa is still refused, including via symlink).

📝

Per-file metadata (v0.8)

Every ingested file carries one extra retrievable chunk describing path, size (human + raw bytes), last-modified date, and type. Metadata questions like "how big is resume.pdf?" or "when was notes.md modified?" retrieve reliably even when no content ranks high. Re-scanning backfills metadata for files indexed before this feature without re-embedding content.

📊

Inventory questions (v0.11)

"How many documents do I have? What types? Which sources?" are now answered from a COUNT-based Library Overview (total files, chunks, by type, by source) injected into the prompt for inventory-intent questions. Gated to inventory questions so normal queries pay no aggregate cost.

🔍

Filename retrieval (v0.11)

Questions that name a file ("VIN in ipostal.pdf?") now co-retrieve that file's content chunks spliced adjacent, bounded by a fixed budget, so the file's synthetic metadata chunk doesn't out-rank and bury its own content at scale.

🛡️

Containment guard (v0.11)

Adding a folder that's already inside a registered source no longer creates a duplicate: mnemos detects the overlap (realpath + path-boundary check) and refreshes the parent so the subfolder is indexed under it. Adding a folder that contains existing sources warns and offers "Add anyway." This stops the silent double-ingestion that overlapping sources would otherwise cause.
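
The path-boundary part of the check is easy to get subtly wrong with a plain prefix test, so it is worth a small illustration. The sketch assumes both paths are already absolute, realpath-resolved, and POSIX-style; `isInside`, `classifyNewSource`, and the `Overlap` type are hypothetical names.

```typescript
// Both paths are assumed to be absolute, already realpath-resolved,
// and POSIX-style ("/" separators).
function stripTrailingSlash(p: string): string {
  return p.length > 1 && p.endsWith("/") ? p.slice(0, -1) : p;
}

function isInside(parent: string, child: string): boolean {
  const p = stripTrailingSlash(parent);
  const c = stripTrailingSlash(child);
  if (p === c) return true;
  // Comparing against "parent/" rather than "parent" is the boundary check:
  // "/docs-old" must not count as being inside "/docs".
  return c.startsWith(p === "/" ? "/" : p + "/");
}

type Overlap =
  | { kind: "inside"; parent: string }        // refresh the parent instead
  | { kind: "contains"; children: string[] }  // warn, offer "Add anyway"
  | { kind: "none" };

function classifyNewSource(candidate: string, registered: string[]): Overlap {
  const parent = registered.find((r) => isInside(r, candidate));
  if (parent !== undefined) return { kind: "inside", parent };
  const children = registered.filter((r) => isInside(candidate, r));
  return children.length > 0 ? { kind: "contains", children } : { kind: "none" };
}
```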

🔄

Hard-locked exclusions (since v0.1)

.env, *.pem, id_rsa*, anything under ~/.aws/: never indexed, even with explicit user opt-in. Soft-default exclusions (logs, lockfiles, hidden dotfiles, files >10 MB) are opt-in to include. The asymmetry (soft defaults for noise, hard locks for credentials) matches the actual risk profile.
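
The asymmetry between the two rule sets can be sketched as two predicates with different override behaviour. This is a simplified illustration: the soft rules cover only a subset of the list above, reading 10 MB as 10 * 1024 * 1024 bytes is an assumption, paths are assumed POSIX-style, and the function names are hypothetical.

```typescript
// POSIX-style absolute paths assumed; a real check would run on the
// realpath so that a symlink cannot smuggle a locked file in.
function baseName(path: string): string {
  return path.slice(path.lastIndexOf("/") + 1);
}

// Hard locks from the list above: .env, *.pem, id_rsa*, anything under ~/.aws/
// homeDir is expected without a trailing slash.
function isHardLocked(path: string, homeDir: string): boolean {
  const name = baseName(path);
  return (
    name === ".env" ||
    name.endsWith(".pem") ||
    name.startsWith("id_rsa") ||
    path.startsWith(homeDir + "/.aws/")
  );
}

// Soft defaults, simplified to logs, hidden dotfiles, and a size cap.
// Treating 10 MB as 10 * 1024 * 1024 bytes is an assumption.
const SOFT_MAX_BYTES = 10 * 1024 * 1024;

function isSoftExcluded(path: string, sizeBytes: number): boolean {
  const name = baseName(path);
  return name.endsWith(".log") || name.startsWith(".") || sizeBytes > SOFT_MAX_BYTES;
}

function shouldIndex(
  path: string,
  sizeBytes: number,
  homeDir: string,
  includeSoftExcluded: boolean,
): boolean {
  if (isHardLocked(path, homeDir)) return false; // no opt-in can override this
  return includeSoftExcluded || !isSoftExcluded(path, sizeBytes);
}
```

The hard-lock test runs first and returns early, so no combination of options can reach a credential file.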

🏗️ Architecture: Monorepo + Plugin SDK Contract

Mnemos is a pnpm monorepo with a strict separation between the RAG pipeline (in packages/core/) and the providers + loaders (in plugins/). The Plugin SDK in between is the contract that makes mnemos genuinely extensible.

mnemos/
├── apps/web/              # Next.js 15 UI + API routes + instrumentation.ts
│                          #   (background source watcher + Telegram poller)
├── packages/
│   ├── core/              # RAG pipeline: ingest, query, prompt, registry, crypto
│   ├── db/                # SQLite + sqlite-vec wrapper (384-dim vec)
│   ├── plugin-sdk/        # Plugin SDK barrel (apiVersion 0.1)
│   └── cli/               # `mnemos` CLI
└── plugins/
    ├── embed-local/       # BGE-small via ONNX, bundled default
    ├── anthropic/         # Claude (streaming)
    ├── openai/            # Chat + Embeddings (text-embedding-3-small @ 384)
    ├── gemini/            # [STUB, v0.2]
    ├── ollama/            # Chat + Embeddings, host-local, default chat
    ├── llama-cpp/         # [STUB, v0.2]
    ├── loader-pdf/        # via pdf-parse
    ├── loader-markdown/   # YAML frontmatter parsing
    ├── loader-plaintext/  # .txt, .log, .csv, .json
    ├── loader-web/        # URL ingestion
    └── loader-code/       # 25+ source-code extensions
🔑 The Plugin SDK contract

Plugins can only import from mnemos/plugin-sdk. They cannot reach into packages/core/** or other plugins' internals. The SDK is versioned (apiVersion 0.1, additive-only changes within 0.1.x; breaking changes will bump apiVersion) and backward-compatible.

This is what makes mnemos genuinely extensible: third-party plugins won't break when the core changes, because they only touch the SDK barrel. The barrel is small enough to audit and stable enough to depend on.
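
To give a feel for what such a contract might look like, here is a hypothetical sketch of a manifest type and a host-side compatibility check. None of these names come from the real SDK: `PluginManifest`, `EmbeddingProvider`, `ChatProvider`, `isCompatible`, and `register` are invented for illustration. Only the apiVersion rule (0.1.x is additive-only) and the 384-dim standard are taken from the text.

```typescript
// Hypothetical contract types; the real SDK's names and fields may differ.
interface EmbeddingProvider {
  dimensions: number;
  embed(texts: string[]): Promise<number[][]>;
}

interface ChatProvider {
  complete(prompt: string): Promise<string>;
}

interface PluginManifest {
  id: string;
  apiVersion: string;
  embedding?: EmbeddingProvider;
  chat?: ChatProvider;
}

const HOST_API_VERSION = "0.1";
const VECTOR_DIM = 384;

// "0.1" and any "0.1.x" are accepted, since changes within 0.1.x are
// additive-only; "0.2" or "1.0" signal a breaking change and are refused.
function isCompatible(pluginApiVersion: string): boolean {
  return (
    pluginApiVersion === HOST_API_VERSION ||
    pluginApiVersion.startsWith(HOST_API_VERSION + ".")
  );
}

function register(registry: Map<string, PluginManifest>, plugin: PluginManifest): void {
  if (!isCompatible(plugin.apiVersion)) {
    throw new Error(`plugin ${plugin.id}: unsupported apiVersion ${plugin.apiVersion}`);
  }
  if (plugin.embedding !== undefined && plugin.embedding.dimensions !== VECTOR_DIM) {
    throw new Error(`plugin ${plugin.id}: embedder must produce ${VECTOR_DIM}-dim vectors`);
  }
  registry.set(plugin.id, plugin);
}
```

Checking the version at registration time, before any plugin code runs, keeps an incompatible plugin from failing halfway through an ingest.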

⚙️ Tech Stack

Runtime + Build

  • Node 22+: only prerequisite for users
  • pnpm monorepo (pnpm-workspace.yaml)
  • TypeScript throughout, strict mode, ES2022, ESM
  • Next.js 15 for the web UI + API routes
  • instrumentation.ts for background services (source watcher + Telegram poller)

Storage

  • SQLite: single file at ~/.mnemos/mnemos.db
  • sqlite-vec: vector extension wrapped via packages/db/
  • 384-dim vectors standardized across all bundled embedders
  • No separate vector database: a deliberate "free by default" call

Models (local default)

  • BGE-small via ONNX (@xenova/): bundled local embedder, 384-dim
  • Ollama: default chat provider; /api/query defaults to providerId: "ollama"
  • OpenAI text-embedding-3-small (Matryoshka @ 384) + Ollama all-minilm as alternative embedders
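
"Matryoshka @ 384" refers to shortening a larger embedding to its leading components. As a rough sketch of the client-side version of that idea, the function below keeps the first 384 values and re-normalizes to unit length. Whether mnemos truncates locally or asks the provider for 384 dimensions directly (OpenAI's embeddings endpoint accepts a `dimensions` parameter for the text-embedding-3 models) is not stated; `truncateEmbedding` is a hypothetical helper.

```typescript
const TARGET_DIM = 384;

// Keeps the leading `dim` components and rescales to unit length, so that
// cosine and dot-product comparisons stay meaningful after truncation.
function truncateEmbedding(vec: number[], dim: number = TARGET_DIM): number[] {
  if (vec.length < dim) {
    throw new Error(`embedding has ${vec.length} dims, need at least ${dim}`);
  }
  const head = vec.slice(0, dim);
  const norm = Math.sqrt(head.reduce((sum, x) => sum + x * x, 0));
  return norm === 0 ? head : head.map((x) => x / norm);
}
```

Standardizing every embedder on one width is what lets a single 384-dim vector table serve all of them, although vectors from different models still live in different spaces and should not be mixed within one index.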

Deployment

  • node setup.mjs: cross-OS installer (macOS / Linux / Windows). Reads INSTALL.md (executable docs).
  • docker compose up -d for Docker preference
  • pnpm install && pnpm dev manual path for contributors
  • Port + state-dir overrides: MNEMOS_PORT, MNEMOS_STATE_DIR, MNEMOS_BIND, all cross-platform (Windows cmd.exe included since v0.10)

🔌 Pluggable Providers: What's Wired in v0.11

The plugin SDK supports two kinds of providers: chat (LLM completions for answering questions) and embedding (used at ingest time to vectorize chunks). Mix and match: for example, use local BGE-small for free ingest, then Claude for high-quality chat.

| Plugin | Chat | Embedding | Status (v0.11) | Notes |
|---|---|---|---|---|
| embed-local | - | ✓ default | Wired | BGE-small via ONNX (bundled, 384-dim); zero API cost for ingest |
| ollama | ✓ default | ✓ | Wired | Host-local chat (default), no auth, fully offline |
| anthropic | ✓ | - | Wired | Claude streaming; bring your own key |
| openai | ✓ | ✓ | Wired | GPT chat + text-embedding-3-small @ 384 |
| gemini | - | - | Stub (v0.2) | Planned; not wired yet. API-key route only at v0.2; OAuth + ADC tracked for later |
| llama-cpp | - | - | Stub (v0.2) | Planned bundled local LLM, no separate daemon |

Document loaders (current)

  • loader-pdf (v0.1)
  • loader-markdown (v0.1)
  • loader-plaintext (v0.1)
  • loader-web (v0.1)
  • loader-code (v0.1)
  • loader-docx · Word (v0.11)
  • loader-xlsx · Excel (v0.11)

The Word loader uses mammoth; the Excel loader uses exceljs, rendered per-sheet to CSV-style text. Both are externalized the same way loader-pdf is, with a loader-resolution regression test. Image / audio / video files are detected and politely deferred rather than ingested in v0.11; OCR / vision / local speech-to-text are explicit roadmap items.

🗺️ Roadmap: Shipped Since v0.1

The core has been stable since v0.1; the ten minor releases since then (v0.2 through v0.11) have layered on features. The full CHANGELOG tells the version-by-version story; the highlights:

Shipped (v0.2 → v0.11)

Current
  • v0.2: single-pane UI + settings launcher, per-provider model picker with inline pricing, token cost tracking, rich Markdown output, light/dark theme
  • v0.3: credential detection (with TOS-honest non-importable OAuth/ADC), per-response Sources + Data sent transparency
  • v0.4: shell-style chat input (↑/↓ history, Ctrl+C clears), detect newly-installed local models
  • v0.5: verified-answer memory (Q→A boost for small local models, with lazy invalidation)
  • v0.8: automatic per-source re-scan (concurrency-safe lease), single-file sources, per-file metadata
  • v0.9: Telegram remote channel (ask your RAG from your phone), server-persisted model selection, default re-scan switched to Manual
  • v0.10: PDF ingestion fix in next dev, cross-platform MNEMOS_PORT override, refreshed hero GIF + screenshots
  • v0.11: pause/resume ingestion (durable), launcher-ring status indicator, inventory question answers, filename-mention retrieval fix, source-containment guard

Directions being explored

Ideas, not promises
  • More file types beyond text + PDF: images (OCR / vision), audio / video (local speech-to-text). Today these are detected and politely deferred.
  • Better answers: re-ranking of retrieved chunks; Gemini + bundled llama.cpp providers fully wired
  • Easier installs: npm global install, native macOS/Linux installers
  • More places to ask from: WhatsApp (pending a local-first-friendly API path), email ingestion
  • Tier 3: fine-tuned local models on your own corpus

These follow the project's north star: local-first, single-operator RAG, not an agent platform. Rather ship a few things well than promise many.

📜 Open Source: License + Contribution

MIT-licensed · Zen Algorithms LLC

The MIT-licensed code stays MIT-licensed. You can use, fork, and modify mnemos for any purpose, commercial or personal.

A CLA via cla-assistant.io applies on first PR (one click, covers all future PRs). It's standard open-source legal hygiene: it keeps the licensing flexible if terms ever need to evolve for a specific use case, without having to track down and re-permission every prior contributor. The MIT-licensed code stays MIT-licensed regardless.

AI-assisted contributions welcome

See AGENTS.md for collaboration patterns. Whether you're writing a plugin, fixing a bug, or polishing the UI, contributions made with Claude Code, Cursor, or Copilot are all welcome. The PR review process is the same whether the code came from your hands directly or via a coding agent under your direction.

Security vulnerabilities: report privately via GitHub Security Advisory.

🧠

Ready to try it?

git clone https://github.com/cosmicflow-space/mnemos.git && cd mnemos && node setup.mjs runs end-to-end on a typical laptop in under 90 seconds, with Ollama as the default chat provider.

🎯

Leadership Takeaway

Why I'm releasing this as open source: the personal-RAG category needs a default that isn't "sign up for someone else's SaaS and upload your private docs" or "spend a Saturday wiring six services together." A local-first, single-user RAG that installs with one command, and answers from your phone over a private channel, is the minimum viable shape of that default. Mnemos is my first try at it, released so others can run it, fork it, and improve it.

Why the framing is "100% local by default" specifically: the audit log proves it by inspection. Open a Tier 1 install, run a query, and inspect /api/audit: every query event records provider: "ollama". No external-provider call is ever recorded. "Privacy by default" isn't a marketing claim; it's a property you can verify from the database.

What I want from the community: opinionated feedback. Run it against your folders. Pair the Telegram bot and ask questions from your phone. Tell me which file types broke, which prompts produced weird answers, which security defaults are too tight or too loose. File issues. Send PRs. Eleven minor releases in under two weeks because the architecture is absorbing real use. That window is still open.