Employees asking policy questions is the most universal enterprise RAG use case, and the one where naive implementations leak data fastest. This walkthrough specializes the generic pipeline for HR, with the access-control, jurisdiction, and reporting-chain gotchas that every real HR deployment has to solve.
Below is the full production architecture, with the same 15 numbered steps as the Anatomy page, but specialized for an HR deployment. Read the bands top-to-bottom, follow the numbered arrows, and the whole request path is laid out visually.
Start with the Anatomy walkthrough for the generic 15-step explanation of each hop. This vertical page assumes you understand the generic pipeline and focuses on what's different for HR.
In the generic anatomy, Step 4 fetches employee attributes and Step 6 passes them as a filter to the vector DB. For HR, this is the step where most naive implementations leak data. The failure mode: developers retrieve first, then try to filter permissions in the LLM prompt, which fails on two counts: the LLM can't reliably self-enforce permissions, and retrieved chunks can leak through the response even when the LLM tries to hide them.
The correct pattern is filter-first retrieval: compute the full set of filters from the asker's attributes BEFORE calling the vector DB, and pass them as a metadata filter the DB enforces at the index level. Chunks the asker isn't allowed to see never enter the candidate pool, so they can't leak.
def build_retrieval_filter(employee: Employee) -> dict:
    """
    Compute the full metadata filter BEFORE retrieval.
    Every retrieved chunk is gated through this filter at the vector DB.
    """
    return {
        # Policy scope: only global + asker's specific jurisdiction
        "applies_to": {"$in": ["global", employee.work_state]},
        # Only policies in effect as of today
        "effective_date": {"$lte": today()},
        "expires_date": {"$gte": today()},
        # Compensation-band policies filtered by asker's role level
        "min_role_level": {"$lte": employee.role_level},
        # Manager-only content gated by management scope:
        # non-managers match only requires_manager=False chunks,
        # managers match both manager-only and general chunks
        "requires_manager": {"$in": [False, True] if employee.is_manager else [False]},
        # Confidential content requires explicit group membership
        "confidential_group": {"$in": employee.groups + ["public"]},
    }

# Called BEFORE embed + vector search
filter_ = build_retrieval_filter(employee)
results = vector_db.search(
    vector=embed(query),
    top_k=50,
    filter=filter_,  # <-- enforced at the INDEX level, not in the prompt
)

The filter is computed from the asker's HRIS attributes and passed as a metadata filter the vector DB enforces at the index level.
Parental leave in California is not the same as parental leave in Texas. 401(k) match eligibility varies by employment classification. A single "policy" in the eyes of the employee is actually dozens of jurisdiction-specific variants in the eyes of the document store.
Two wrong ways to solve this: (1) embed all variants as one blob, so the query for "CA parental leave" retrieves an averaged representation of every state and the LLM hallucinates details. (2) Ask the LLM to pick the right variant from a list, where the LLM often picks the wrong one, especially if the user's question doesn't explicitly mention the state.
The right way: filter the candidate pool to the asker's jurisdiction at retrieval time, using the same attribute filter pattern above. The employee's work_state comes from HRIS, not from the user's question, so it's reliable.
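Concretely, this means each jurisdiction-specific variant lives as its own chunk, tagged with an `applies_to` field the retrieval filter matches against. A minimal sketch (chunk contents and field names are illustrative, mirroring the `$in` clause in the filter above):

```python
# Each state-specific variant is a separate chunk with its own metadata.
parental_leave_chunks = [
    {"text": "Parental leave (California): ...", "applies_to": "CA"},
    {"text": "Parental leave (Texas): ...", "applies_to": "TX"},
    {"text": "Parental leave (all other states): ...", "applies_to": "global"},
]

def variants_for(work_state: str) -> list[dict]:
    """Mirror of the $in filter: global plus the asker's own state."""
    allowed = {"global", work_state}
    return [c for c in parental_leave_chunks if c["applies_to"] in allowed]
```

A California employee's candidate pool contains only the CA and global variants; the Texas chunk never enters retrieval, so it can't be averaged in or hallucinated from.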
Employees asking HR questions are often anxious (leave, comp, termination) and the answers have legal weight. A conversational summary without citations is almost worse than no answer at all: the employee might rely on it and be wrong. The system must render a clickable link back to the exact policy document and section for every claim in the response.
Implementation: the prompt template explicitly instructs the LLM to include chunk IDs in its response; post-processing parses them out and renders them as inline citations. If a response contains a claim without a citation, the guardrail strips the claim.
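The guardrail step can be sketched as follows. This is a simplified illustration, not the production parser: the `[chunk:...]` tag syntax and sentence-level splitting are assumptions, and it only keeps sentences whose cited chunks were actually retrieved.

```python
import re

CITATION = re.compile(r"\[chunk:([\w-]+)\]")

def enforce_citations(response: str, retrieved_ids: set[str]) -> str:
    """Keep only sentences whose every citation points at a retrieved chunk."""
    kept = []
    for sentence in re.split(r"(?<=[.!?])\s+", response.strip()):
        ids = CITATION.findall(sentence)
        # Uncited claims, or citations to chunks the DB never returned, are dropped.
        if ids and all(i in retrieved_ids for i in ids):
            kept.append(sentence)
    return " ".join(kept)
```

Downstream rendering would then swap each `[chunk:...]` tag for a link to the policy document and section that chunk was cut from.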
Documents flow in from several sources on different schedules. The pipeline is incremental β only changed documents are re-embedded β and every chunk is tagged with the metadata used in the retrieval filter above.
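One common way to make the pipeline incremental is a content hash per document: re-embed only when the hash changes. A minimal sketch under that assumption (the store and function names are illustrative):

```python
import hashlib

def needs_reembed(doc_id: str, text: str, seen: dict[str, str]) -> bool:
    """Record the document's content hash; True only when content changed."""
    digest = hashlib.sha256(text.encode()).hexdigest()
    if seen.get(doc_id) == digest:
        return False  # unchanged since last ingest: skip chunking + embedding
    seen[doc_id] = digest
    return True
```

Documents that pass this check get chunked, embedded, and written with the `applies_to`, `effective_date`, `min_role_level`, `requires_manager`, and `confidential_group` metadata the retrieval filter expects.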
"How do you know your RAG system is working?" is the question that separates demo RAG from production RAG. HR has the luxury of a cleaner golden set than most domains because HR policies are authoritative: there's exactly one right answer per query.
Does the top-5 contain the policy the HR team says is canonical for this query? Measured against a curated golden set of 500 representative queries, refreshed quarterly. Target: 95%+ top-5 recall.
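The recall metric itself is a one-liner once the golden set exists. A sketch, assuming `retrieve` returns a ranked list of document IDs (names are illustrative):

```python
def top5_recall(golden: list[tuple[str, str]], retrieve) -> float:
    """
    golden: (query, canonical_doc_id) pairs curated by the HR team.
    retrieve: callable mapping a query to a ranked list of doc IDs.
    """
    hits = sum(1 for query, want in golden if want in retrieve(query)[:5])
    return hits / len(golden)
```

Run against the 500-query golden set in CI; a drop below the 95% target blocks the release.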
Does the response only make claims supported by the retrieved chunks? LLM-as-judge with a separate model (GPT-4o) scoring faithfulness. Target: 98%+, with weekly human review of a sample of the failures.
Adversarial test set of queries designed to extract confidential content (e.g., "what does my manager's manager make?"). Every failure is a P0. Target: zero leaks, ever.
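The leak check can be expressed as an assertion over what the attacker persona actually retrieved. A hedged sketch reusing the `confidential_group` metadata from the filter above (chunk shape is illustrative):

```python
def leaked_chunks(results: list[dict], asker_groups: list[str]) -> list[str]:
    """
    IDs of retrieved chunks outside the asker's allowed groups.
    Any non-empty result is a P0: the index-level filter failed.
    """
    allowed = set(asker_groups) | {"public"}
    return [c["id"] for c in results if c["confidential_group"] not in allowed]
```

In the red-team harness, each adversarial query is run as a low-privilege persona and `leaked_chunks` must come back empty for every one.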