A personal RAG (retrieval-augmented generation) system that defaults to 100% local: embeddings on your machine, chat on your machine, and zero external inference calls in the default install. Drop a folder, ask questions in plain English, get answers with citations to your own files. Then, optionally, pair a private Telegram bot to ask the same questions from anywhere. MIT-licensed. github.com/cosmicflow-space/mnemos.

Most RAG tools today are either (a) cloud SaaS that requires uploading your private documents to someone else's servers, or (b) heavyweight self-hosted platforms that demand a separate vector database, a container orchestrator, and an afternoon of setup before you ask the first question. Both fail the "I want to ask my own files questions and keep them on my own machine" test.
Mnemos is the opposite of both: a local-first single-user RAG that installs with one command, stores everything in a single SQLite file at ~/.mnemos/mnemos.db, and runs the embedding pipeline locally (BGE-small via ONNX), so you don't pay for an API to index your own files. The chat model is also local by default (Ollama). External LLM plugins (Claude / GPT / Gemini) are opt-in, not required.
Mnemos is my first open-source project, a different category from my solo consumer products (CosmicKeys and WatchAlgo). Those I built and host for end users; mnemos I built and release for other developers to run themselves.
The CTA is GitHub (clone, run, star, contribute), not "sign up to try." There's nothing to sign up for; the install is one command on your laptop.
Mnemos is built around a deliberate single-pane UI: no drag-and-drop canvas, no settings storm before you can ask a question. The home page IS the chat page; everything else lives behind the glowing settings launcher at the bottom-left.
Personal RAG means personal RAG. Privacy is the key, and the default. Mnemos gives you three tiers, in order of increasing data egress. You choose. Most users stay in Tier 1.
llama.cpp (v0.2 stub), on your machine. Query events record provider: "ollama"; no external-provider call is ever recorded. node setup.mjs defaults to this tier.
The chat UI lets you switch the model per question.
Train a small open model on your own corpus. Stays on your machine. Becomes specifically yours over time. Roadmap consideration; community input welcome.
~/.claude/.credentials.json, ~/.codex/auth.json) and surfaces them as detected, but marks them non-importable with a TOS-explanation note. Anthropic explicitly prohibits third-party OAuth reuse; OpenAI does not document it. An API key is the only sanctioned method for every external provider. Honesty about credential safety is itself a positioning point.
Your personal RAG runs on your computer, but you don't have to be at your computer to use it. Pair a private Telegram bot and query your indexed documents from anywhere.
Mnemos reaches out to Telegram via long polling, so it works behind home NAT with no port-forwarding, no tunnel, and no public IP. Same outbound-only posture as everything else in mnemos.
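As a rough illustration of why long polling needs no inbound connectivity, the sketch below builds a Telegram Bot API `getUpdates` request and advances the `offset` after each batch. The `getUpdates` method and its `offset` / `timeout` parameters are part of the public Bot API; the function names, the injected `FetchJson` helper, and the 30-second timeout are assumptions made for this example, not mnemos's actual code.

```typescript
// Minimal subset of a Telegram Bot API update (real field names).
interface TelegramUpdate {
  update_id: number;
  message?: { text?: string; chat: { id: number; type: string } };
}

// Hypothetical injected HTTP helper, so the sketch stays transport-agnostic.
type FetchJson = (url: string) => Promise<{ ok: boolean; result: TelegramUpdate[] }>;

// `timeout` (seconds) asks Telegram to hold the request open until an update
// arrives. The connection is always outbound, so no open port is needed.
function buildGetUpdatesUrl(token: string, offset: number, timeoutSec: number = 30): string {
  return `https://api.telegram.org/bot${token}/getUpdates?offset=${offset}&timeout=${timeoutSec}`;
}

// The next offset is one past the highest update_id seen. Sending it back
// acknowledges the earlier updates so they are not delivered again.
function nextOffset(current: number, updates: TelegramUpdate[]): number {
  return updates.reduce((max, u) => Math.max(max, u.update_id + 1), current);
}

// One iteration of the poll loop: fetch a batch, then advance the offset.
async function pollOnce(
  fetchJson: FetchJson,
  token: string,
  offset: number,
): Promise<{ updates: TelegramUpdate[]; offset: number }> {
  const body = await fetchJson(buildGetUpdatesUrl(token, offset));
  const updates = body.ok ? body.result : [];
  return { updates, offset: nextOffset(offset, updates) };
}
```

A caller would invoke `pollOnce` in a loop, feeding each returned `offset` into the next call.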
The bot answers only you. Pairing happens via a single-use, time-boxed, CSPRNG-generated code entered in the UI; any other chat is ignored. DMs only, never groups. Query-only: no source-management commands are accepted via Telegram.
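The rules above can be pictured as a small gate in front of the query pipeline. This is a minimal sketch under stated assumptions: the chat `type` values are the ones the Telegram Bot API defines, but `decide`, `redeem`, the `PendingPairing` shape, and the choice to ignore every slash command are hypothetical and only illustrate the policy.

```typescript
// Chat types as defined by the Telegram Bot API.
interface IncomingMessage {
  text?: string;
  chat: { id: number; type: "private" | "group" | "supergroup" | "channel" };
}

// Hypothetical shape for a pending pairing code.
interface PendingPairing {
  code: string;      // assumed to come from a CSPRNG elsewhere
  expiresAt: number; // epoch milliseconds
  used: boolean;
}

// Single-use and time-boxed: a code works once, and only before it expires.
// (Real code would likely also use a constant-time comparison.)
function redeem(p: PendingPairing, submitted: string, now: number): boolean {
  if (p.used || now > p.expiresAt || submitted !== p.code) return false;
  p.used = true;
  return true;
}

// Answer only the paired private chat, and only plain-text questions.
function decide(msg: IncomingMessage, pairedChatId: number | null): "answer" | "ignore" {
  if (pairedChatId === null) return "ignore";        // nothing paired yet
  if (msg.chat.type !== "private") return "ignore";  // DMs only, never groups
  if (msg.chat.id !== pairedChatId) return "ignore"; // any other chat is ignored
  const text = (msg.text ?? "").trim();
  if (text === "" || text.startsWith("/")) return "ignore"; // query-only (assumption: all commands ignored)
  return "answer";
}
```

Keeping the gate a pure function of the message and the paired chat ID makes the "no path for unauthorized chats" property easy to unit-test.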
Only the question and answer pass through Telegram (and whichever model you've configured). Files never leave your machine. Bot token stored chmod-600 at ~/.mnemos/.env, never logged.
Local Ollama by default, or Claude / GPT / Gemini if you've set one; the bot mirrors your UI choice. The choice is persisted server-side (MNEMOS_DEFAULT_MODEL), so it survives restarts.
The Telegram integration was reviewed by an AI partner panel (Claude + Gemini + Codex) before v0.9 shipped. The review confirmed: no authorization bypass, no token leakage, atomic pairing logic, no path for unauthorized chats to receive answers. Set up in Settings → 📲 Telegram; there's a built-in step-by-step guide for users new to Telegram bots.
The catch, by design: your computer must be awake with mnemos running for the bot to reply. No cloud backend, by choice.
WhatsApp is on the radar but not yet supported; there's no free, local-first-friendly bot API for it the way Telegram offers.
Small local models are great for sovereignty but sometimes fumble specific facts. Mnemos lets you tag a correct answer with "Save verified": the question-answer pair is stored, and future closely-matching questions inject that confirmed answer into the prompt. Even a small local model then answers correctly.
A verified answer fires only when the new question is semantically close to the original (vector match against verified-Q embeddings). Loose paraphrases are caught; unrelated questions are left alone.
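One common way to implement "semantically close" is cosine similarity against a threshold. The sketch below assumes that approach; the `VerifiedAnswer` shape, the `findVerified` name, and the 0.85 threshold are illustrative choices, not values taken from mnemos.

```typescript
// Hypothetical record: a verified question, its answer, and the question's embedding.
interface VerifiedAnswer {
  question: string;
  answer: string;
  embedding: number[];
}

// Cosine similarity: 1 means same direction, 0 means unrelated.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Return the closest verified answer at or above the threshold, or null.
function findVerified(
  queryEmbedding: number[],
  store: VerifiedAnswer[],
  threshold: number = 0.85, // assumed cutoff for this example
): VerifiedAnswer | null {
  let best: VerifiedAnswer | null = null;
  let bestScore = threshold;
  for (const candidate of store) {
    const score = cosine(queryEmbedding, candidate.embedding);
    if (score >= bestScore) {
      best = candidate;
      bestScore = score;
    }
  }
  return best;
}
```

A higher threshold trades recall on loose paraphrases for fewer false matches on unrelated questions.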
A verified answer stops firing once its source chunks change, so when your underlying files update, stale verified answers drop out automatically. No manual cache invalidation.
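One way such self-invalidation could work is to store the content hashes of the chunks an answer was verified against and check them at query time. This is a sketch of that idea only; the `VerifiedRecord` shape and `isStillValid` are hypothetical, and the real schema may track source chunks differently.

```typescript
// Hypothetical record: the answer plus the hashes of the chunks that backed it.
interface VerifiedRecord {
  answer: string;
  sourceChunkHashes: string[];
}

// The answer is valid only while every chunk it was verified against
// still exists, unchanged, in the current index.
function isStillValid(record: VerifiedRecord, currentChunkHashes: Set<string>): boolean {
  return record.sourceChunkHashes.every((hash) => currentChunkHashes.has(hash));
}
```

Because validity is derived from the index itself, editing or deleting a file needs no separate cache-clearing step.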
Boosted answers carry a visible badge so you always know which answers came from verified memory vs raw retrieval. Transparency over silent magic.
Review, edit, or delete your verified answers from Settings → Verified answers. Backed by the verified_answer + vec_verified tables; queryable via /api/verified.
Design rationale + considered-and-deferred alternatives documented in docs/design-notes/verified-answer-memory.md.
Static archives shouldn't burn background CPU. Hot folders should re-index when they change. Heavy ingests should be pausable. Mnemos handles all three with per-source scheduling, durable pause state, and a live launcher-ring indicator.
Each source has its own re-scan cadence (set when added, editable any time). Default is Manual, since most users add static documents and don't want background CPU spent re-scanning them. Point a changing folder at a Daily or 5-minute cadence from a dropdown. Re-scans are incremental (content-hash skip), so only changed or new files re-embed. Implemented as a periodic poll via Next.js instrumentation.ts; tunable with MNEMOS_WATCH_TICK_MS, or disabled with MNEMOS_DISABLE_WATCHER=1.
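The two decisions a watcher tick makes, whether a source is due and which files actually need re-embedding, can be sketched as pure functions. The cadence names mirror the dropdown described above; the `Source` shape, `isDue`, and `filesToEmbed` are hypothetical names for this example.

```typescript
type Cadence = "manual" | "5min" | "daily";

// Hypothetical per-source scheduling state.
interface Source {
  cadence: Cadence;
  lastScanAt: number | null; // epoch milliseconds, null if never scanned
  paused: boolean;
}

const INTERVAL_MS: Record<"5min" | "daily", number> = {
  "5min": 5 * 60 * 1000,
  daily: 24 * 60 * 60 * 1000,
};

// Manual and paused sources are never picked up by the background tick.
function isDue(source: Source, now: number): boolean {
  const cadence = source.cadence;
  if (cadence === "manual" || source.paused) return false;
  if (source.lastScanAt === null) return true;
  return now - source.lastScanAt >= INTERVAL_MS[cadence];
}

// Content-hash skip: re-embed only files that are new or whose hash changed.
function filesToEmbed(
  currentHashes: Record<string, string>,
  indexedHashes: Record<string, string>,
): string[] {
  return Object.keys(currentHashes).filter(
    (path) => indexedHashes[path] !== currentHashes[path],
  );
}
```

With this split, a short watcher tick stays cheap: most ticks find nothing due, and a due source with no changed hashes embeds nothing.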
Pause a running ingest per source, or all at once ("Pause all," useful before bed or on heavy CPU), from the Sources panel. Resume continues where it left off (already-finished files are hash-skipped). Pause is durable: it is persisted to the source.paused flag and survives a restart. The background watcher skips paused sources (no auto-resume on next tick). Pausing is cooperative: it stops at file boundaries and marks any interrupted file partial so that resume reprocesses it.
Manual ↻ Re-scan, the background watcher, and even multiple server processes coordinate through an atomic DB ingest lease (source.ingesting_since), so the same source is never ingested twice at once (which would race chunk writes). A failed auto-scan stays due and retries on the next tick. A crashed lease self-heals after 30 minutes.
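The lease semantics can be modeled in memory to show the acquire, release, and self-heal rules. In a real database the check-and-set would have to be a single conditional UPDATE to be atomic; the SQL in the comment is a hypothetical shape of that statement, and `SourceRow`, `tryAcquireLease`, and `releaseLease` are names invented for this sketch.

```typescript
const LEASE_TTL_MS = 30 * 60 * 1000; // a crashed holder's lease expires after 30 minutes

// In-memory stand-in for a row of the source table.
interface SourceRow {
  id: string;
  ingestingSince: number | null; // epoch milliseconds while a lease is held
}

// Roughly what one conditional UPDATE would do (hypothetical SQL):
//   UPDATE source SET ingesting_since = :now
//   WHERE id = :id
//     AND (ingesting_since IS NULL OR ingesting_since < :now - :ttl);
// The caller holds the lease only if exactly one row was changed.
function tryAcquireLease(row: SourceRow, now: number): boolean {
  const free = row.ingestingSince === null || now - row.ingestingSince > LEASE_TTL_MS;
  if (!free) return false;
  row.ingestingSince = now;
  return true;
}

// Released on normal completion so the next scan can start immediately.
function releaseLease(row: SourceRow): void {
  row.ingestingSince = null;
}
```

The timestamp doubles as both the lock and the crash detector, so no separate heartbeat table is needed.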
The settings launcher (bottom-left avatar) reflects live ingestion: breathing cyan ring while ingesting (with a done/total tooltip), amber when paused, red on error. Backed by an in-memory status registry + GET /api/ingest/status; pinned to globalThis so the watcher and routes share one instance.
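Pinning a registry to `globalThis` is a common pattern when several module instances in one process must see the same state. The sketch below shows the pattern plus one possible mapping from statuses to ring color; the `__ingestStatus` key, the type names, and the precedence (error over ingesting over paused) are assumptions for illustration.

```typescript
type IngestState = "idle" | "ingesting" | "paused" | "error";

interface IngestStatus {
  state: IngestState;
  done: number;
  total: number;
}

type Registry = Map<string, IngestStatus>;

// Attach the registry to globalThis so separately loaded modules share one instance.
const g = globalThis as typeof globalThis & { __ingestStatus?: Registry };

function getRegistry(): Registry {
  if (!g.__ingestStatus) g.__ingestStatus = new Map<string, IngestStatus>();
  return g.__ingestStatus;
}

// Assumed precedence: any error wins, then active ingestion, then pause.
function ringColor(registry: Registry): "red" | "cyan" | "amber" | "none" {
  const states: IngestState[] = [];
  registry.forEach((status) => states.push(status.state));
  if (states.includes("error")) return "red";
  if (states.includes("ingesting")) return "cyan";
  if (states.includes("paused")) return "amber";
  return "none";
}
```

A status endpoint can then serialize `getRegistry()` directly, and the UI derives the ring from the same data.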
Paste a folder path or an absolute path to a single file. Mnemos auto-detects which. A single explicitly-chosen file bypasses the soft noise filters (since the choice is deliberate), but the security hard-lock still always applies (a single .env or id_rsa is still refused, including via symlink).
Every ingested file carries one extra retrievable chunk describing path, size (human + raw bytes), last-modified date, and type. Metadata questions like "how big is resume.pdf?" or "when was notes.md modified?" retrieve reliably even when no content chunk ranks high. Re-scanning backfills metadata chunks for files indexed before this feature without re-embedding content.
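A synthetic metadata chunk is just a short text rendering of the file's attributes, so that ordinary retrieval can find it. The exact wording mnemos uses is not shown here; the layout, `FileInfo`, `humanSize`, and `buildMetadataChunk` below are assumptions for illustration.

```typescript
// Hypothetical input: what a file stat plus type detection would provide.
interface FileInfo {
  path: string;
  sizeBytes: number;
  modified: Date;
  type: string;
}

// Human-readable size using 1024-based units.
function humanSize(bytes: number): string {
  const units = ["B", "KB", "MB", "GB", "TB"];
  let value = bytes;
  let i = 0;
  while (value >= 1024 && i < units.length - 1) {
    value /= 1024;
    i++;
  }
  return `${i === 0 ? value : value.toFixed(1)} ${units[i]}`;
}

// One retrievable chunk carrying both the human and the raw byte size.
function buildMetadataChunk(file: FileInfo): string {
  return [
    `File: ${file.path}`,
    `Size: ${humanSize(file.sizeBytes)} (${file.sizeBytes} bytes)`,
    `Last modified: ${file.modified.toISOString().slice(0, 10)}`,
    `Type: ${file.type}`,
  ].join("\n");
}
```

Including the raw byte count alongside the rounded size lets the model answer both "roughly how big" and "exactly how many bytes" from one chunk.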
"How many documents do I have? What types? Which sources?" are now answered from a COUNT-based Library Overview (total files, chunks, by type, by source) injected into the prompt for inventory-intent questions. Gated to inventory questions so normal queries pay no aggregate cost.
Questions that name a file ("VIN in ipostal.pdf?") now co-retrieve that file's content chunks, spliced adjacent and bounded by a fixed budget, so the file's synthetic metadata chunk doesn't out-rank and bury its own content at scale.
Adding a folder that's already inside a registered source no longer creates a duplicate: mnemos detects the overlap (realpath + path-boundary check) and refreshes the parent so the subfolder is indexed under it. Adding a folder that contains existing sources warns and offers "Add anyway." This stops the silent double-ingestion that overlapping sources would otherwise cause.
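The path-boundary part of the overlap check matters because a naive prefix test would treat `/home/me/docs-old` as being inside `/home/me/docs`. The sketch below assumes paths are already absolute, symlink-resolved (for example via a realpath call), and use `/` separators; the function names and the `Overlap` type are hypothetical.

```typescript
// Strip trailing slashes, but keep the root "/" intact.
function normalize(p: string): string {
  return p.length > 1 ? p.replace(/\/+$/, "") : p;
}

// True when `child` equals `parent` or lies beneath it, compared on whole
// path segments rather than raw string prefixes.
function isInside(child: string, parent: string): boolean {
  const c = normalize(child);
  const p = normalize(parent);
  if (c === p) return true;
  const prefix = p === "/" ? "/" : `${p}/`;
  return c.startsWith(prefix);
}

type Overlap = "inside-existing" | "contains-existing" | "none";

// Decide how a candidate folder relates to the already-registered sources.
function classifyOverlap(candidate: string, registered: string[]): Overlap {
  if (registered.some((r) => isInside(candidate, r))) return "inside-existing";
  if (registered.some((r) => isInside(r, candidate))) return "contains-existing";
  return "none";
}
```

In this model, `inside-existing` would map to refreshing the parent source and `contains-existing` to the "Add anyway" warning.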
.env, *.pem, id_rsa*, anything under ~/.aws/: never indexed, even with explicit user opt-in. Soft-default exclusions (logs, lockfiles, hidden dotfiles, files >10 MB) are opt-in to include. The asymmetry (soft defaults for noise, hard locks for credentials) matches the actual risk profile.
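The asymmetry can be expressed as two checks evaluated in a fixed order, where the hard lock runs first and ignores the opt-in flag. The hard-lock patterns and the 10 MB limit come from the text above; the lockfile names, the function names, and the assumption that paths are already symlink-resolved are illustrative.

```typescript
type Verdict = "hard-locked" | "soft-excluded" | "included";

const TEN_MB = 10 * 1024 * 1024;

function basename(path: string): string {
  const parts = path.split("/");
  return parts[parts.length - 1] ?? "";
}

// Credentials: refused regardless of what the user asks for.
function isHardLocked(path: string, homeDir: string): boolean {
  const name = basename(path);
  return (
    name === ".env" ||
    name.endsWith(".pem") ||
    name.startsWith("id_rsa") ||
    path.startsWith(`${homeDir}/.aws/`)
  );
}

// Noise: skipped by default, but the user may opt in.
function isSoftExcluded(path: string, sizeBytes: number): boolean {
  const name = basename(path);
  const lockfiles = ["package-lock.json", "pnpm-lock.yaml", "yarn.lock"]; // example list
  return (
    name.endsWith(".log") ||
    lockfiles.includes(name) ||
    name.startsWith(".") ||
    sizeBytes > TEN_MB
  );
}

function classify(path: string, sizeBytes: number, homeDir: string, userOptedIn: boolean): Verdict {
  if (isHardLocked(path, homeDir)) return "hard-locked"; // opt-in cannot override
  if (!userOptedIn && isSoftExcluded(path, sizeBytes)) return "soft-excluded";
  return "included";
}
```

Checking the hard lock before the opt-in flag is what guarantees that an explicitly chosen `.env` is still refused.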
Mnemos is a pnpm monorepo with a strict separation between the RAG pipeline (in packages/core/) and the providers + loaders (in plugins/). The Plugin SDK in between is the contract that makes mnemos genuinely extensible.
mnemos/
├── apps/web/              # Next.js 15 UI + API routes + instrumentation.ts
│                          # (background source watcher + Telegram poller)
├── packages/
│   ├── core/              # RAG pipeline: ingest, query, prompt, registry, crypto
│   ├── db/                # SQLite + sqlite-vec wrapper (384-dim vec)
│   ├── plugin-sdk/        # Plugin SDK barrel (apiVersion 0.1)
│   └── cli/               # `mnemos` CLI
└── plugins/
    ├── embed-local/       # BGE-small via ONNX (bundled default)
    ├── anthropic/         # Claude (streaming)
    ├── openai/            # Chat + Embeddings (text-embedding-3-small @ 384)
    ├── gemini/            # [STUB, v0.2]
    ├── ollama/            # Chat + Embeddings (host-local, default chat)
    ├── llama-cpp/         # [STUB, v0.2]
    ├── loader-pdf/        # via pdf-parse
    ├── loader-markdown/   # YAML frontmatter parsing
    ├── loader-plaintext/  # .txt, .log, .csv, .json
    ├── loader-web/        # URL ingestion
    └── loader-code/       # 25+ source-code extensions

Plugins can only import from mnemos/plugin-sdk. They cannot reach into packages/core/** or other plugins' internals. The SDK is versioned (apiVersion 0.1, additive-only changes within 0.1.x; breaking changes will bump apiVersion) and backward-compatible.
This is what makes mnemos genuinely extensible: third-party plugins won't break when the core changes, because they only touch the SDK barrel. The barrel is small enough to audit and stable enough to depend on.
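To make the idea of a narrow, versioned contract concrete, here is a hypothetical sketch of what provider interfaces and a registration check could look like. None of these names are taken from the real mnemos plugin SDK; only the apiVersion value (0.1) and the 384-dimension vector size come from the text above.

```typescript
const HOST_API_VERSION = "0.1"; // additive-only changes within 0.1.x
const VECTOR_DIMENSIONS = 384;  // size of the vectors the database stores

// Hypothetical provider contracts; the real SDK types may differ.
interface ChatProvider {
  kind: "chat";
  id: string;
  apiVersion: string;
  complete(prompt: string): Promise<string>;
}

interface EmbeddingProvider {
  kind: "embeddings";
  id: string;
  apiVersion: string;
  dimensions: number;
  embed(texts: string[]): Promise<number[][]>;
}

type Provider = ChatProvider | EmbeddingProvider;

// Return a list of problems; an empty list means the provider can be registered.
function validateProvider(provider: Provider): string[] {
  const problems: string[] = [];
  if (provider.apiVersion !== HOST_API_VERSION) {
    problems.push(`apiVersion ${provider.apiVersion} does not match host ${HOST_API_VERSION}`);
  }
  if (provider.kind === "embeddings" && provider.dimensions !== VECTOR_DIMENSIONS) {
    problems.push(`embedding size ${provider.dimensions} does not match ${VECTOR_DIMENSIONS}`);
  }
  return problems;
}
```

A host that validates at registration time can reject an incompatible plugin up front instead of failing in the middle of an ingest.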
pnpm-workspace.yaml)
instrumentation.ts for background services (source watcher + Telegram poller)
~/.mnemos/mnemos.db
packages/db/
@xenova/transformers): bundled local embedder, 384-dim
/api/query defaults to providerId: "ollama"
node setup.mjs: cross-OS installer (macOS / Linux / Windows). Reads INSTALL.md (executable docs).
docker compose up -d for Docker preference
pnpm install && pnpm dev manual path for contributors
MNEMOS_PORT, MNEMOS_STATE_DIR, MNEMOS_BIND all cross-platform (Windows cmd.exe included since v0.10)

The plugin SDK supports two kinds of providers: chat (LLM completions for answering questions) and embeddings (used at ingest time to vectorize chunks). Mix and match: for example, use local BGE-small for free ingest, then Claude for high-quality chat.
| Plugin | Chat | Embeddings | Status (v0.11) | Notes |
|---|---|---|---|---|
| embed-local | no | yes (default) | Wired | BGE-small via ONNX (bundled, 384-dim); zero API cost for ingest |
| ollama | yes (default) | yes | Wired | Host-local chat (default), no auth, fully offline |
| anthropic | yes | no | Wired | Claude streaming; bring your own key |
| openai | yes | yes | Wired | GPT chat + text-embedding-3-small @ 384 |
| gemini | not wired | not wired | Stub (v0.2) | Planned; not wired yet. API-key route only at v0.2; OAuth + Vertex ADC tracked for later |
| llama-cpp | not wired | not wired | Stub (v0.2) | Planned bundled local LLM, no separate daemon |
The Word loader uses mammoth; the Excel loader uses exceljs, rendered per-sheet to CSV-style text. Both are externalized the same way loader-pdf is, with a loader-resolution regression test. Image / audio / video files are detected and politely deferred rather than ingested in v0.11; OCR / vision / local speech-to-text are explicit roadmap items.
The core has been stable since v0.1; eleven minor releases since then have layered on features. The full CHANGELOG tells the version-by-version story; the highlights:
next dev, cross-platform MNEMOS_PORT override, refreshed hero GIF + screenshots
llama.cpp providers fully wired
These follow the project's north star: local-first, single-operator, RAG, not an agent platform. I would rather ship a few things well than promise many.
The MIT-licensed code stays MIT-licensed. You can use, fork, and modify mnemos for any purpose, commercial or personal.
A CLA via cla-assistant.io applies on first PR (one click, covers all future PRs). It's standard open-source legal hygiene: it keeps the licensing flexible if terms ever need to evolve for a specific use case, without having to track down and re-permission every prior contributor. The MIT-licensed code stays MIT-licensed regardless.
See AGENTS.md for collaboration patterns. Whether you're writing a plugin, fixing a bug, or polishing the UI, all tools are welcome: Claude Code, Cursor, Copilot. The PR review process is the same whether the code came from your hands directly or via a coding agent under your direction.
Security vulnerabilities: report privately via GitHub Security Advisory.
git clone https://github.com/cosmicflow-space/mnemos.git && cd mnemos && node setup.mjs: end-to-end on a typical laptop in under 90 seconds, with Ollama as the default chat provider.
Why I'm releasing this as open source: the personal-RAG category needs a default that isn't "sign up for someone else's SaaS and upload your private docs" or "spend a Saturday wiring six services together." A local-first single-user RAG that installs with one command, and answers from your phone over a private channel, is the minimum viable shape of that default. Mnemos is my first try at it, released so others can run it, fork it, and improve it.
Why the framing is "100% local by default" specifically: the audit log proves it by inspection. Open a Tier 1 install, run a query, and inspect /api/audit: every query event records provider: "ollama". No external-provider call is ever recorded. "Privacy by default" isn't a marketing claim; it's a property you can verify from the database.
What I want from the community: opinionated feedback. Run it against your folders. Pair the Telegram bot and ask questions from your phone. Tell me which file types broke, which prompts produced weird answers, which security defaults are too tight or too loose. File issues. Send PRs. Eleven minor releases in under two weeks because the architecture is absorbing real use. That window is still open.