Vertical Case Study #1

HR Knowledge Base — where access control is the hard part

Employees asking policy questions is the most universal enterprise RAG use case — and the one where naive implementations leak data fastest. This walkthrough specializes the generic pipeline for HR, with the access-control, jurisdiction, and reporting-chain gotchas that every real HR deployment has to solve.

πŸ›€οΈRAG Learning Pathβ€”Read in order to build a production RAG system
Not building a RAG system? The Model Committee deep-dive is a parallel track covering the eight specialized model families and routing patterns β€” read it after Foundations instead of RAG Anatomy if model composition is what you're after.

🏢Business context — who's asking, what are they asking

The users
  • πŸ§‘β€πŸ’Ό Individual contributors β€” asking about their own benefits, PTO, policies
  • πŸ‘” People managers β€” asking about comp bands, hiring policies, leave approval rules for their reports
  • πŸ“‹ HR business partners β€” cross-employee policy lookups, jurisdiction comparisons
  • πŸ” Compliance team β€” audit queries, policy change history
The query shapes
  • πŸ’¬ β€œWhat's our parental leave policy for California employees?”
  • πŸ’¬ β€œAm I eligible for the 401(k) match yet?”
  • πŸ’¬ β€œHow many vacation days do I have left this year?”
  • πŸ’¬ β€œWhat's the process for taking bereavement leave?”
  • πŸ’¬ β€œWhen does my equity grant vest?”
⚠️Why RAG is the right pattern here (not fine-tuning)
HR policies change constantly — every benefits enrollment, every jurisdiction update, every compensation review cycle. Fine-tuning a model on policies would mean retraining every time a policy changes, which is a non-starter. RAG with a well-maintained document store lets policy updates flow into the system the moment they're published, with full audit trails of which version was cited. This is the single most important architectural decision for HR knowledge bases, and it's non-obvious to people coming from fine-tuning backgrounds.

πŸ—οΈThe architecture

Below is the full production architecture, with the same 15 numbered steps as the Anatomy page, but specialized for an HR deployment. Read the bands top-to-bottom, follow the numbered arrows, and the whole request path is laid out visually.

💡Haven't read the Anatomy page yet?

Start with the Anatomy walkthrough for the generic 15-step explanation of each hop. This vertical page assumes you understand the generic pipeline and focuses on what's different for HR.

Enterprise RAG — Full Architecture (Naive RAG)
[Architecture diagram — 15 numbered steps (follow 1→15 to trace the full flow) across five bands. Legend: request path, parallel call, response path, observability (non-blocking).]
  • User layer: 👤 User (HR employee), 💻 Browser (Next.js chat UI)
  • Application layer: ⚙️ App Server / Orchestrator (Next.js API + FastAPI), 🛡️ Guardrails (PII scrub + refusal filter)
  • Identity & context: 🔐 Auth Service (Okta / Auth0), 👥 HRIS (Workday / Rippling)
  • Retrieval layer: 🧬 Embedding Model (text-embedding-3-small), 🗃️ Vector DB (pgvector / Qdrant), 🔍 BM25 Index (Postgres tsvector), ⚖️ Reranker (Cohere Rerank-v3)
  • Generation layer: 🤖 LLM (Claude Sonnet 4.5), 📜 Audit Log (append-only)

🔑What's different for HR — three wrinkles that make this hard

Wrinkle 1 — The hardest one

Permission scoping must happen BEFORE the vector search

In the generic anatomy, Step 4 fetches employee attributes and Step 6 passes them as a filter to the vector DB. For HR, this is the step where most naive implementations leak data. The failure mode: developers retrieve first, then try to filter permissions in the LLM prompt, which fails on two counts — the LLM can't reliably self-enforce permissions, AND retrieved chunks can leak through the response even when the LLM tries to hide them.

The correct pattern is filter-first retrieval: compute the full set of filters from the asker's attributes BEFORE calling the vector DB, and pass them as a metadata filter the DB enforces at the index level. Chunks the asker isn't allowed to see never enter the candidate pool, so they can't leak.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Employee:
    work_state: str
    role_level: int
    is_manager: bool
    groups: list[str] = field(default_factory=list)

def today() -> str:
    # ISO date strings compare correctly lexicographically in metadata filters
    return date.today().isoformat()

def build_retrieval_filter(employee: Employee) -> dict:
    """
    Compute the full metadata filter BEFORE retrieval.
    Every retrieved chunk is gated through this filter at the vector DB.
    """
    return {
        # Policy scope: only global + asker's specific jurisdiction
        "applies_to": {"$in": ["global", employee.work_state]},

        # Only policies in effect as of today
        "effective_date": {"$lte": today()},
        "expires_date": {"$gte": today()},

        # Compensation-band policies filtered by asker's role level
        "min_role_level": {"$lte": employee.role_level},

        # Manager-only content excluded unless the asker manages people
        # (managers still see non-manager content, hence $in, not $eq)
        "requires_manager": {"$in": [False, True] if employee.is_manager else [False]},

        # Confidential content requires explicit group membership
        "confidential_group": {
            "$in": employee.groups + ["public"]
        },
    }

# Called BEFORE embed + vector search
filter_ = build_retrieval_filter(employee)
results = vector_db.search(
    vector=embed(query),
    top_k=50,
    filter=filter_,  # <-- enforced at the INDEX level, not in the prompt
)
```

The filter is computed from the asker's HRIS attributes and passed as a metadata filter the vector DB enforces at index level.

Wrinkle 2

Jurisdictional versioning — the same policy has 50 variants

Parental leave in California is not the same as parental leave in Texas. 401(k) match eligibility varies by employment classification. A single “policy” in the eyes of the employee is actually dozens of jurisdiction-specific variants in the eyes of the document store.

Two wrong ways to solve this: (1) embed all variants as one blob — the query for “CA parental leave” retrieves an averaged representation of every state, and the LLM hallucinates details. (2) Ask the LLM to pick the right variant from a list — the LLM often picks the wrong one, especially if the user's question doesn't explicitly mention the state.

The right way: filter the candidate pool to the asker's jurisdiction at retrieval time, using the same attribute filter pattern above. The employee's work_state comes from HRIS, not from the user's question, so it's reliable.

Wrinkle 3

Citations aren't optional — they're the trust mechanism

Employees asking HR questions are often anxious (leave, comp, termination) and the answers have legal weight. A conversational summary without citations is almost worse than no answer at all — the employee might rely on it and be wrong. The system must render a clickable link back to the exact policy document and section for every claim in the response.

Implementation: the prompt template explicitly instructs the LLM to include chunk IDs in its response; post-processing parses them out and renders them as inline citations. If a response contains a claim without a citation, the guardrail strips the claim.
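A minimal sketch of that post-processing step. The inline marker format `[chunk:<id>]` and the sentence-level granularity are illustrative assumptions, not a fixed convention from the pipeline:

```python
import re

CITATION = re.compile(r"\[chunk:([A-Za-z0-9_-]+)\]")

def enforce_citations(response: str, retrieved_ids: set[str]) -> tuple[str, list[str]]:
    """Keep only sentences whose citations point at actually-retrieved chunks.

    Returns the filtered response plus the cited chunk IDs, which the UI
    can render as links back to the exact policy document and section.
    """
    kept: list[str] = []
    cited: list[str] = []
    # Naive sentence split; a production system would use a real segmenter.
    for sentence in re.split(r"(?<=[.!?])\s+", response.strip()):
        ids = CITATION.findall(sentence)
        # Drop claims with no citation, or claims citing chunks never retrieved.
        if ids and all(i in retrieved_ids for i in ids):
            clean = CITATION.sub("", sentence)
            kept.append(re.sub(r"\s+([.!?,])", r"\1", clean).strip())
            cited.extend(ids)
    return " ".join(kept), cited
```

Dropping uncited sentences entirely (rather than flagging them) is a deliberately conservative choice for answers with legal weight.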

📥Ingestion pipeline

Documents flow in from several sources on different schedules. The pipeline is incremental — only changed documents are re-embedded — and every chunk is tagged with the metadata used in the retrieval filter above.

Sources
Confluence (policies), SharePoint (benefits PDFs), Workday (employee handbook), legal repository (compliance docs)
Ingestion schedule
Webhook-driven for Confluence/SharePoint; nightly batch for Workday and legal repo
Chunking strategy
Recursive character splitter at 800 tokens with 100-token overlap. Policy documents have strong section structure, so we also split on h1/h2 boundaries.
Embedding model
text-embedding-3-small (1,536 dims). General-purpose works fine for HR — no need for a domain-tuned model.
Metadata per chunk
doc_id, doc_title, section, effective_date, expires_date, applies_to (list of jurisdictions), min_role_level, confidential_group, requires_manager
Incremental indexing
Content-hash per chunk — unchanged chunks skip re-embedding. A policy update that touches 3 sections costs 3 embeddings, not the whole document.
Version retention
Past policy versions remain in the index with expires_date set to the supersession date. Enables historical queries (‘what was my leave policy in Q2 2024?’) and audit trails.
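The content-hash skip can be sketched roughly as follows. The chunk dict shape and the `stored_hashes` lookup are illustrative assumptions about how the index tracks what it already holds:

```python
import hashlib

def chunk_hash(text: str, metadata: dict) -> str:
    """Stable hash over text AND metadata: a metadata-only change (say, a new
    expires_date) must also trigger re-indexing, since filters depend on it."""
    payload = text + "|" + "|".join(f"{k}={metadata[k]}" for k in sorted(metadata))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def plan_reembedding(new_chunks: list[dict], stored_hashes: dict[str, str]) -> list[dict]:
    """Return only chunks whose hash differs from what the index holds.

    new_chunks: [{"chunk_id": ..., "text": ..., "metadata": {...}}, ...]
    stored_hashes: chunk_id -> hash currently stored in the vector DB.
    """
    to_embed = []
    for chunk in new_chunks:
        h = chunk_hash(chunk["text"], chunk["metadata"])
        if stored_hashes.get(chunk["chunk_id"]) != h:
            to_embed.append({**chunk, "hash": h})
    return to_embed
```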

πŸ—„οΈStorage & infrastructure

Vector DB
pgvector inside the main app Postgres. Keeps operational complexity low; hybrid search (BM25 + vector) runs in a single query.
BM25 index
Postgres tsvector with GIN index — same database as pgvector, so one query handles both.
Original documents
S3 with versioning. The vector index stores chunk text + metadata; full original documents stay in S3 for display and audit.
Reranker
Cohere Rerank-v3 via managed API. Alternative: self-hosted BGE-reranker-large if cost or data residency requires on-prem.
LLM
Claude Sonnet 4.5. Chosen for instruction-following on citation requirements — stays on the retrieved context instead of filling gaps from training data.
Cache layer
Redis for two things: (1) HRIS attributes by user_id, TTL 1 hour; (2) semantic cache for identical queries from users in the same jurisdiction, TTL 15 minutes.
Audit store
Append-only Postgres table: user_id, query, retrieved_chunk_ids, response, timestamp, latency_ms. Retained 7 years per HR record retention policy.
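A sketch of what the single-query hybrid search could look like, fusing the two arms with reciprocal rank fusion. The table and column names (`chunks`, `embedding`, `tsv`, `metadata`) are assumptions; `<=>` is pgvector's cosine-distance operator, and the jsonb `@>` containment stands in for the richer metadata filter described in Wrinkle 1:

```python
def hybrid_search_sql(top_k: int = 50) -> str:
    """Build one round-trip query: vector KNN and BM25-style ranking,
    fused with RRF, with the metadata filter applied in BOTH arms so
    disallowed chunks never enter either candidate pool."""
    return f"""
    WITH vec AS (
        SELECT chunk_id,
               ROW_NUMBER() OVER (ORDER BY embedding <=> %(qvec)s) AS r
        FROM chunks
        WHERE metadata @> %(filter)s
        ORDER BY embedding <=> %(qvec)s
        LIMIT {top_k}
    ),
    kw AS (
        SELECT chunk_id,
               ROW_NUMBER() OVER (
                   ORDER BY ts_rank(tsv, plainto_tsquery('english', %(qtext)s)) DESC) AS r
        FROM chunks
        WHERE metadata @> %(filter)s
          AND tsv @@ plainto_tsquery('english', %(qtext)s)
        LIMIT {top_k}
    )
    -- Reciprocal rank fusion: 1/(60+rank), summed across both arms
    SELECT chunk_id, SUM(1.0 / (60 + r)) AS rrf_score
    FROM (SELECT * FROM vec UNION ALL SELECT * FROM kw) fused
    GROUP BY chunk_id
    ORDER BY rrf_score DESC
    LIMIT {top_k};
    """
```

The fused candidates then go to the reranker; the constant 60 is the conventional RRF damping value, not tuned for this system.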

πŸ›‘οΈSecurity & compliance

Access control
  • πŸ” Filter-first retrieval (see Wrinkle 1)
  • πŸ” Attribute cache TTL 1 hour β€” termination events invalidate the cache
  • πŸ” Manager scope validation β€” manager queries about direct reports verified against the org chart at retrieval time, not prompt time
  • πŸ” Row-level security on S3 β€” even if metadata filtering fails, the original document fetch is re-checked
PII & compliance
  • πŸ›‘οΈ No PII in chunks β€” employee names, IDs, SSNs stripped at ingestion (policies are generic)
  • πŸ›‘οΈ PII in queries is masked in logs β€” audit retains query shape, not raw PII
  • πŸ›‘οΈ GDPR / CCPA delete rights β€” user_id references in audit log can be purged on request
  • πŸ›‘οΈ SOC 2 controls β€” all access logged, encryption at rest and in transit
  • πŸ›‘οΈ Data residency β€” EU employee queries routed to EU-hosted infrastructure

📊Evaluation strategy

“How do you know your RAG system is working?” is the question that separates demo RAG from production RAG. HR has the luxury of a cleaner golden set than most domains because HR policies are authoritative — there's exactly one right answer per query.

🎯

Retrieval recall

Does the top-5 contain the policy the HR team says is canonical for this query? Measured against a curated golden set of 500 representative queries, refreshed quarterly. Target: 95%+ top-5 recall.

📝

Answer faithfulness

Does the response only make claims supported by the retrieved chunks? LLM-as-judge with a separate model (GPT-4o) scoring faithfulness. Target: 98%+, with sampled human review of the 2% failures weekly.

🛡️

Permission leak eval

Adversarial test set of queries designed to extract confidential content (e.g., “what does my manager's manager make?”). Every failure is a P0. Target: zero leaks, ever.
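A sketch of the leak eval harness. The case tuple shape and the `retrieve` callable are assumptions about how the test suite plugs into the pipeline; the key design point is that it checks the retrieved candidate pool, not the final LLM response:

```python
def run_leak_eval(adversarial_cases, retrieve):
    """Each case: (employee, query, forbidden_chunk_ids).

    A P0 failure is any forbidden chunk appearing in the candidate pool,
    regardless of whether the LLM would have surfaced it -- if it was
    retrieved at all, the filter-first guarantee is already broken.
    """
    failures = []
    for employee, query, forbidden in adversarial_cases:
        retrieved = {c["chunk_id"] for c in retrieve(employee, query)}
        leaked = retrieved & set(forbidden)
        if leaked:
            failures.append({"query": query, "leaked": sorted(leaked)})
    return failures
```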

⚠️Gotchas I've learned the hard way

βš οΈβ€œJust filter in the prompt” doesn't work
A common shortcut: include the asker's permissions in the system prompt and ask the LLM to refuse queries it shouldn't answer. This fails because (1) LLMs can't reliably self-enforce complex permission rules, (2) retrieved context leaks through the response even when the LLM tries to hide it, and (3) prompt injection attacks can override the refusal logic. Filter at the vector DB level, not in the prompt.
⚠️Cache invalidation on terminations
HRIS attribute caching improves latency, but a terminated employee's cache entry must be purged instantly — otherwise they can query the HR knowledge base with their old permissions for up to the TTL window. Wire the termination event stream into the cache invalidation path; don't rely on TTL alone.
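Event-driven purge is a few lines once the event stream is wired in. The event dict shape and the cache key scheme here are hypothetical; `cache` is anything with a Redis-style `delete`:

```python
def handle_hris_event(event: dict, cache) -> None:
    """Purge cached permissions the moment a termination event arrives,
    instead of waiting out the attribute-cache TTL."""
    if event.get("type") == "employee.terminated":
        user_id = event["user_id"]
        cache.delete(f"hris:attrs:{user_id}")
        # Also drop semantic-cache entries tied to this user, so cached
        # answers scoped to their old permissions can't be replayed.
        cache.delete(f"semcache:user:{user_id}")
```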
⚠️Jurisdictional drift in policy authors' documents
Policy authors often write one document that handles multiple jurisdictions with inline callouts (“California employees see section 4.2”). The chunker splits these in ways that break the jurisdictional assignment. Solution: a pre-ingestion step that splits jurisdiction-tagged sections into separate chunks, each tagged with its specific applies_to list, instead of treating the whole document as “all jurisdictions.”
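A sketch of that pre-ingestion splitter, under the assumption that authors (or an upstream classifier) mark sections with a tag line like `[jurisdiction: CA, NY]`; the tag syntax is invented for illustration:

```python
import re

# A tag line on its own line, e.g. "[jurisdiction: CA, NY]"
SECTION_TAG = re.compile(r"^\[jurisdictions?:\s*([^\]]+)\]\s*$",
                         re.IGNORECASE | re.MULTILINE)

def split_by_jurisdiction(doc_text: str) -> list[dict]:
    """Split a multi-jurisdiction document into per-jurisdiction chunks,
    each carrying its own applies_to list for the retrieval filter."""
    parts = SECTION_TAG.split(doc_text)
    # parts = [preamble, tag1, body1, tag2, body2, ...]
    chunks = []
    if parts[0].strip():
        # Untagged preamble applies everywhere.
        chunks.append({"applies_to": ["global"], "text": parts[0].strip()})
    for i in range(1, len(parts), 2):
        states = [s.strip() for s in parts[i].split(",")]
        body = parts[i + 1].strip()
        if body:
            chunks.append({"applies_to": states, "text": body})
    return chunks
```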
⚠️The ‘recent update’ problem
When a policy changes, the old version is superseded but its embedding still lives in the index. Without expires_date filtering, retrieval will still happily return outdated chunks. Always filter on effective_date and expires_date, and make them required fields in the chunk schema — don't trust authors to remember.