In 2025 a large federal customer had to migrate AI vendors mid-program. The teams that survived it had built one thing: a model-agnostic architecture, where the provider is a swappable part. This page is the full blueprint (the provider gateway, the model-selection guardrails, and the data-security layers: multi-tenant isolation, encryption in transit with mTLS, and encryption at rest), decomposed far enough that a high-schooler and a 25-year veteran both finish it thinking “nothing was left hand-wavy.”
Build the system so that which AI vendor you use and whose data is flowing through it are both decisions the architecture enforces, not things a developer has to remember to get right on every call.
That sentence is the entire page. Everything below is the decomposition, because an abstraction you cannot break down into concrete, configured, diagrammed parts is just a slogan. We will take it apart into six layers, and for each one show the actual code, the actual config, and the exact place a private key or a tenant ID physically lives.
Here is the full request lifecycle: one user request, from the edge of the network to the model and back. Watch where identity gets attached, where the encryption changes shape, and where the data finally comes to rest. Every box below has its own section.

If the image has not rendered yet, here is the same lifecycle as plain text; this is the spine of the whole page:

```text
User / Agent
   │  TLS 1.3  (encryption in transit, Layer 5)
   ▼
┌──────────────────────────────────────────────────────┐
│ EDGE / LOAD BALANCER    terminates public TLS        │
└──────────────────────────────────────────────────────┘
   │  mTLS  (client certificate from here inward)
   ▼
┌──────────────────────────────────────────────────────┐
│ CONTROLLER: identity injected ONCE (Layer 4)         │
│   tenant_id · workspace_id · role                    │
│   read from the verified token → request-scoped ctx  │
└──────────────────────────────────────────────────────┘
   ▼
┌──────────────────────────────────────────────────────┐
│ INPUT GUARDRAILS (Layer 3)                           │
│   prompt-injection · jailbreak · PII scan            │
└──────────────────────────────────────────────────────┘
   ▼
┌──────────────────────────────────────────────────────┐
│ MODEL GATEWAY: one neutral interface (Layer 1)       │
│   decision matrix picks a provider (Layer 2)         │
│ ┌─────────┐ ┌────────┐ ┌─────────┐ ┌──────────────┐  │
│ │Anthropic│ │ OpenAI │ │ Bedrock │ │ Vertex / ... │  │
│ └─────────┘ └────────┘ └─────────┘ └──────────────┘  │
└──────────────────────────────────────────────────────┘
   │  mTLS egress: client cert; the private key
   ▼  stays in the pod / HSM, never on the wire
[ LLM PROVIDER ]
   ▼
┌──────────────────────────────────────────────────────┐
│ OUTPUT GUARDRAILS (Layer 3)                          │
│   PII leak · policy violation · hallucination check  │
└──────────────────────────────────────────────────────┘
   ▼
┌──────────────────────────────────────────────────────┐
│ PERSIST + AUDIT (Layer 6)                            │
│   AES-256-GCM at rest · per-tenant data key          │
│   KMS holds the root key · append-only audit log     │
└──────────────────────────────────────────────────────┘
```

Before the layers, the motivation, because if you do not feel the problem, the solution looks like over-engineering.
When you call an LLM provider, you are taking a hard dependency on a company you do not control. That company can (and routinely does) have an outage, change its prices, deprecate the exact model your prompts were tuned for, change its data-handling terms, or lose a compliance certification your customer requires. None of those are hypothetical. All of them happened to someone in the last year.
If your application code is full of openai.chat.completions.create(...) calls, then “switch providers” is a multi-week refactor touching every file that talks to a model. You will be asked to do that refactor precisely when there is an outage, a price shock, or a compliance deadline; that is, under maximum pressure. The vendor-agnostic architecture moves that cost forward and shrinks it: switching becomes a config change plus, at most, one new adapter.
This is not anti-vendor. You will still pick the best model for each job; that is Layer 2. The point is that the choice stays yours, reversible, and cheap to change. The same instinct that says “do not hard-code your database vendor into every query” says “do not hard-code your model vendor into every prompt.”
The model gateway is one internal module with one job: expose a provider-neutral way to call an LLM. Application code calls the gateway. The gateway, and only the gateway, knows that Anthropic, OpenAI, Bedrock, and Vertex exist.
Step one is a neutral request and response shape. It is deliberately small, the common denominator of what every provider can do:
```typescript
// The shape the WHOLE application speaks. No vendor words in here.
export interface ModelRequest {
  messages: { role: 'system' | 'user' | 'assistant'; content: string }[]
  maxTokens: number
  temperature?: number
  // a capability tag, NOT a vendor model name; Layer 2 turns this into a real model
  task: 'cheap-extract' | 'general' | 'hard-reasoning' | 'long-context'
}

export interface ModelResponse {
  text: string
  provider: string // which vendor actually served it (for logs/audit)
  model: string // the concrete model id that was used
  usage: { inputTokens: number; outputTokens: number; costUsd: number }
}

// Every vendor adapter implements exactly this. That's the entire seam.
export interface ModelProvider {
  readonly name: string
  complete(req: ModelRequest): Promise<ModelResponse>
}
```

Step two is one thin adapter per vendor. Each adapter translates the neutral shape into that vendor's wire format and back. This is the only file in the codebase that imports that vendor's SDK:
```typescript
import Anthropic from '@anthropic-ai/sdk'
import { ModelProvider, ModelRequest, ModelResponse } from '../types'
import { resolveModel } from '../decision-matrix'
import { priceFor } from '../pricing' // per-model token prices (module not shown)

export class AnthropicAdapter implements ModelProvider {
  readonly name = 'anthropic'
  // With no options the SDK reads ANTHROPIC_API_KEY; inject it from the KMS-backed secret store.
  private client = new Anthropic()

  async complete(req: ModelRequest): Promise<ModelResponse> {
    const model = resolveModel('anthropic', req.task) // Layer 2 decides the concrete model

    // Anthropic takes the system prompt as a separate field, not as a message.
    const system = req.messages.find(m => m.role === 'system')?.content
    const turns = req.messages.filter(m => m.role !== 'system')

    const res = await this.client.messages.create({
      model,
      max_tokens: req.maxTokens,
      temperature: req.temperature ?? 0.7,
      system,
      messages: turns as Anthropic.MessageParam[],
    })

    // A response can hold several content blocks; keep only the text ones.
    // The type guard narrows the union so b.text type-checks.
    const text = res.content
      .filter((b): b is Anthropic.TextBlock => b.type === 'text')
      .map(b => b.text)
      .join('')

    return {
      text,
      provider: this.name,
      model,
      usage: {
        inputTokens: res.usage.input_tokens,
        outputTokens: res.usage.output_tokens,
        costUsd: priceFor(model, res.usage),
      },
    }
  }
}

// gateway/adapters/openai.ts, bedrock.ts, vertex.ts are the same shape:
// import the vendor SDK, map the neutral request in, map the response out.
```

Step three is the gateway itself. It owns the cross-cutting concerns (provider selection, fallback, retries, timeouts, cost accounting, and tenant tagging) so every call gets them for free. The version below shows selection, fallback, and tenant-tagged auditing:
```typescript
import { ModelRequest, ModelResponse, ModelProvider } from './types'
import { AnthropicAdapter } from './adapters/anthropic'
import { OpenAIAdapter } from './adapters/openai'
import { BedrockAdapter } from './adapters/bedrock'
import { VertexAdapter } from './adapters/vertex'
import { chooseProvider } from './decision-matrix' // Layer 2
import { tenantContext } from '../context/tenant' // Layer 4
import { auditLog } from '../audit' // Layer 6

const providers: Record<string, ModelProvider> = {
  anthropic: new AnthropicAdapter(),
  openai: new OpenAIAdapter(),
  bedrock: new BedrockAdapter(),
  vertex: new VertexAdapter(),
}

export async function complete(req: ModelRequest): Promise<ModelResponse> {
  const { tenantId, workspaceId } = tenantContext.get() // ambient; see Layer 4

  // Layer 2 returns an ordered list: [primary, ...fallbacks]
  const order = chooseProvider(req, { tenantId })

  let lastErr: unknown
  for (const name of order) {
    let res: ModelResponse
    try {
      res = await providers[name].complete(req)
    } catch (err) {
      lastErr = err // outage / rate limit / deprecation: try the next provider
      continue
    }
    // Audit OUTSIDE the try: an audit-log failure must surface as an error,
    // not be mistaken for a provider failure and trigger a second paid call.
    await auditLog({ tenantId, workspaceId, provider: name, model: res.model, usage: res.usage })
    return res
  }
  throw new Error('All providers failed: ' + String(lastErr))
}
```

When a new provider is required (say a customer mandates AWS Bedrock for FedRAMP reasons), you write one new file, gateway/adapters/bedrock.ts, implementing the same ModelProvider interface. You register it in the providers map. You add it to the decision matrix. Zero application files change, because no application file ever imported a vendor SDK in the first place. That is the whole payoff of Layer 1.
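To see why a new vendor costs one file, here is a minimal, self-contained sketch of the same seam and fallback loop. It assumes trimmed-down copies of the Layer 1 shapes. The two adapters, FlakyProvider and EchoProvider, are hypothetical stubs standing in for real vendor SDK calls: one simulates an outage, the other plays the newly added provider.

```typescript
// Trimmed-down copies of the Layer 1 shapes (usage accounting omitted).
interface ModelRequest {
  messages: { role: 'system' | 'user' | 'assistant'; content: string }[]
  maxTokens: number
  task: 'cheap-extract' | 'general' | 'hard-reasoning' | 'long-context'
}

interface ModelResponse {
  text: string
  provider: string
  model: string
}

interface ModelProvider {
  readonly name: string
  complete(req: ModelRequest): Promise<ModelResponse>
}

// Hypothetical stub: a vendor that is currently failing every call.
class FlakyProvider implements ModelProvider {
  readonly name = 'flaky'
  async complete(_req: ModelRequest): Promise<ModelResponse> {
    throw new Error('503 from upstream')
  }
}

// Hypothetical stub: the newly added vendor. A real adapter would call its SDK here.
class EchoProvider implements ModelProvider {
  readonly name = 'echo'
  async complete(req: ModelRequest): Promise<ModelResponse> {
    const last = req.messages[req.messages.length - 1]
    return { text: `echo: ${last.content}`, provider: this.name, model: 'echo-1' }
  }
}

// Registering the new vendor is one entry here; the loop below never changes.
const providers: Record<string, ModelProvider> = {
  flaky: new FlakyProvider(),
  echo: new EchoProvider(),
}

// Try providers in the order the decision matrix returned; the first success wins.
async function completeWithFallback(req: ModelRequest, order: string[]): Promise<ModelResponse> {
  let lastErr: unknown = new Error('empty provider order')
  for (const name of order) {
    const provider = providers[name]
    if (!provider) continue // a name the config knows but no adapter implements
    try {
      return await provider.complete(req)
    } catch (err) {
      lastErr = err // remember why, then fall through to the next provider
    }
  }
  throw new Error('All providers failed: ' + String(lastErr))
}

completeWithFallback(
  { messages: [{ role: 'user', content: 'hi' }], maxTokens: 16, task: 'general' },
  ['flaky', 'echo'],
).then(r => console.log(r.provider, r.text)) // prints: echo echo: hi
```

Nothing that calls completeWithFallback learns which vendor answered, except through the provider field it can log.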
The gateway can reach four providers. Which one should serve a given request? That is not a vibe; it is a checklist. This is what the LinkedIn mock interview was really asking: what criteria do you use to choose a model, and what are the guardrails on that choice?
Here is the matrix. Every row is a question the router asks before a single token is spent:
| Criterion | The question it answers | Example consequence |
|---|---|---|
| Capability fit | Is this task simple extraction, or hard multi-step reasoning? | cheap-extract → Haiku / Flash; hard-reasoning → Opus / o-series |
| Cost per token | What does this task cost at each provider, at expected volume? | A 10M-call/day classifier on a frontier model is a budget fire |
| Latency budget | Is a human waiting on this, or is it a background job? | Interactive chat β fast model; nightly batch β cheapest model |
| Context window | Does the prompt plus retrieved context fit the model's window? | A 400K-token document review forces a long-context model |
| Data residency | Is this tenant's data allowed to leave a region or a boundary? | An EU or GovCloud tenant pins to an in-boundary provider only |
| Compliance posture | Does this provider hold the certification this tenant requires? | FedRAMP / SOC 2 / HIPAA: a missing cert removes a provider |
| Provider health | Is the primary provider currently degraded or rate-limiting? | Circuit breaker open → skip straight to the fallback |
| Fallback chain | If the chosen provider fails mid-request, who is next? | Always return an ordered list, never a single provider |
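The Provider health row, and the circuitState import in the matrix code, assume a circuit breaker per provider, but the ./health module is never shown. Here is a minimal sketch of what it might contain, assuming a consecutive-failure threshold and a fixed cool-down. CircuitBreaker, breakerFor, and the default numbers (5 failures, 30 seconds) are hypothetical choices, not recommendations.

```typescript
type BreakerState = 'closed' | 'open' | 'half-open'

// Hypothetical per-provider breaker: opens after `threshold` consecutive failures,
// stays open for `cooldownMs`, then goes half-open so traffic can probe the provider.
class CircuitBreaker {
  private failures = 0
  private openedAt: number | null = null
  private readonly threshold: number
  private readonly cooldownMs: number
  private readonly now: () => number

  constructor(threshold = 5, cooldownMs = 30_000, now: () => number = Date.now) {
    this.threshold = threshold
    this.cooldownMs = cooldownMs
    this.now = now // injectable clock, so tests do not have to sleep
  }

  state(): BreakerState {
    if (this.openedAt === null) return 'closed'
    return this.now() - this.openedAt >= this.cooldownMs ? 'half-open' : 'open'
  }

  recordSuccess(): void {
    this.failures = 0
    this.openedAt = null // any success (including a half-open probe) closes the breaker
  }

  recordFailure(): void {
    this.failures += 1
    // The count is not reset while open, so a failed half-open probe re-opens immediately.
    if (this.failures >= this.threshold) this.openedAt = this.now()
  }
}

const breakers = new Map<string, CircuitBreaker>()

// The gateway would call breakerFor(name).recordSuccess() / recordFailure() after each attempt.
function breakerFor(provider: string): CircuitBreaker {
  let breaker = breakers.get(provider)
  if (!breaker) {
    breaker = new CircuitBreaker()
    breakers.set(provider, breaker)
  }
  return breaker
}

// What the decision matrix reads; providers with no history count as healthy.
function circuitState(provider: string): BreakerState {
  return breakers.get(provider)?.state() ?? 'closed'
}
```

Because chooseProvider only drops providers whose state is 'open', a half-open provider becomes eligible again and the next real request acts as the probe.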
In code, the matrix is a function: request in, ordered provider list out. Its only inputs besides the request are the tenant's policy and the circuit-breaker state, both of which a test can stub, so it is trivially testable. A model-selection decision you cannot unit-test is a decision you cannot defend in an audit.

```typescript
import { ModelRequest } from './types'
import { getTenantPolicy } from '../tenancy/policy' // per-tenant residency + compliance rules
import { circuitState } from './health'
import { ALL_PROVIDERS, capabilityScore, costFor } from './catalog' // provider catalog + scoring (module not shown)

// Returns an ORDERED list: [primary, ...fallbacks]. Never a single value.
export function chooseProvider(
  req: ModelRequest,
  ctx: { tenantId: string },
): string[] {
  const policy = getTenantPolicy(ctx.tenantId) // e.g. { residency: 'us-gov', certs: ['fedramp'] }

  // 1. HARD FILTER: drop any provider this tenant is not allowed to use.
  //    Residency and compliance are guardrails, not preferences: they cannot be traded away.
  let eligible = ALL_PROVIDERS.filter(p =>
    p.regions.includes(policy.residency) &&
    policy.certs.every(c => p.certifications.includes(c)),
  )

  // 2. HEALTH FILTER: drop providers whose circuit breaker is currently open.
  eligible = eligible.filter(p => circuitState(p.name) !== 'open')

  // 3. RANK the survivors by capability fit, then latency, then cost.
  const ranked = eligible.sort((a, b) =>
    capabilityScore(b, req.task) - capabilityScore(a, req.task) ||
    a.p50LatencyMs - b.p50LatencyMs ||
    costFor(a, req) - costFor(b, req),
  )

  if (ranked.length === 0) {
    throw new Error(`No compliant, healthy provider for tenant ${ctx.tenantId}`) // fail closed
  }
  return ranked.map(p => p.name) // primary first, fallbacks after
}
```

Notice the two-stage shape: a hard filter first, then a ranking. The hard filter encodes the things that are never negotiable; a GovCloud tenant's data does not go to a non-GovCloud provider to save money, ever. The ranking encodes the things you optimize once the non-negotiables are satisfied. Mixing those two into one weighted score is the classic mistake: it lets a cost saving silently outvote a compliance requirement. Keep them separate and the system fails closed; if nothing is compliant, it errors rather than guessing.
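To make the "hard filter, then rank" property testable in isolation, here is a self-contained sketch with a toy provider catalog. Every provider name, region, certification, latency, and price below is made up for illustration, and a single capability number stands in for the per-task capabilityScore. Only the two-stage structure mirrors chooseProvider.

```typescript
interface ProviderInfo {
  name: string
  regions: string[]
  certifications: string[]
  capability: number // toy single score; the real matrix scores per task
  p50LatencyMs: number
  usdPer1kTokens: number
}

interface TenantPolicy {
  residency: string
  certs: string[]
}

// Hypothetical catalog: every name and number here is illustrative.
const CATALOG: ProviderInfo[] = [
  { name: 'cheap-commercial', regions: ['us'], certifications: ['soc2'], capability: 2, p50LatencyMs: 300, usdPer1kTokens: 0.1 },
  { name: 'gov-frontier', regions: ['us', 'us-gov'], certifications: ['soc2', 'fedramp'], capability: 3, p50LatencyMs: 900, usdPer1kTokens: 3.0 },
  { name: 'gov-small', regions: ['us-gov'], certifications: ['soc2', 'fedramp'], capability: 2, p50LatencyMs: 400, usdPer1kTokens: 0.5 },
]

function rankProviders(policy: TenantPolicy, catalog: ProviderInfo[] = CATALOG): string[] {
  // Stage 1, hard filter: a provider that fails residency or compliance is gone,
  // and no price or latency advantage can bring it back.
  const eligible = catalog.filter(
    p => p.regions.includes(policy.residency) && policy.certs.every(c => p.certifications.includes(c)),
  )
  if (eligible.length === 0) {
    throw new Error(`No compliant provider for residency ${policy.residency}`) // fail closed
  }
  // Stage 2, ranking: only the survivors are ordered (capability, then latency, then cost).
  return [...eligible]
    .sort(
      (a, b) =>
        b.capability - a.capability ||
        a.p50LatencyMs - b.p50LatencyMs ||
        a.usdPer1kTokens - b.usdPer1kTokens,
    )
    .map(p => p.name)
}

console.log(rankProviders({ residency: 'us-gov', certs: ['fedramp'] })) // [ 'gov-frontier', 'gov-small' ]
```

The cheapest entry, cheap-commercial, never appears for the us-gov tenant however its price compares, which is exactly what a single weighted score could not guarantee.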
Guardrails are the code that sits between the user and the model, and again between the model and anything the model's output touches. The model is powerful and gullible; the guardrails are the seatbelt.
In the lifecycle, guardrails wrap the gateway call, so nothing reaches a provider unchecked and nothing leaves the model unchecked:

```typescript
import { complete } from '../gateway'
import { ModelRequest, ModelResponse } from '../gateway/types'
import { scanInput, scanOutput, GuardrailError } from './checks'
import { tenantContext } from '../context/tenant'

export async function guardedComplete(req: ModelRequest): Promise<ModelResponse> {
  const { tenantId } = tenantContext.get()

  // ── INPUT GATE ──────────────────────────────────────────────
  const inVerdict = await scanInput(req, tenantId)
  if (inVerdict.blocked) {
    throw new GuardrailError('input', inVerdict.reason) // never reaches a provider
  }

  // ── THE MODEL CALL (Layers 1 + 2) ───────────────────────────
  const res = await complete(inVerdict.sanitizedRequest)

  // ── OUTPUT GATE ─────────────────────────────────────────────
  const outVerdict = await scanOutput(res, tenantId)
  if (outVerdict.blocked) {
    throw new GuardrailError('output', outVerdict.reason) // never reaches the user
  }
  return outVerdict.sanitizedResponse
}
```

Because guardrails wrap the gateway and not a specific vendor, they keep working unchanged when you switch providers. The injection scanner does not care whether the model underneath is Claude, GPT, or a Bedrock-hosted model; it inspects the neutral request and the neutral response. One more thing the abstraction buys you.
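The ./checks module is not shown. Here is a deliberately naive sketch of what an input scanner could look like, assuming a verdict shape of { blocked, reason, sanitizedRequest } inferred from how guardedComplete uses it. scanInputSketch and both regex lists are hypothetical and illustrative only; real injection and PII detection needs much more than regexes. The sketch is synchronous for brevity, and it shows why the scanner is vendor-neutral: it only ever sees the neutral ModelRequest.

```typescript
interface ModelRequest {
  messages: { role: 'system' | 'user' | 'assistant'; content: string }[]
  maxTokens: number
  task: string
}

interface InputVerdict {
  blocked: boolean
  reason?: string
  sanitizedRequest: ModelRequest
}

// Illustrative heuristics only; not a real prompt-injection defense.
const INJECTION_PATTERNS = [
  /ignore (all )?(previous|prior) instructions/i,
  /you are now in developer mode/i,
]

// Illustrative PII patterns: US-SSN-shaped numbers and email addresses.
const SSN = /\b\d{3}-\d{2}-\d{4}\b/g
const EMAIL = /\b[\w.+-]+@[\w-]+\.[\w.-]+\b/g

function scanInputSketch(req: ModelRequest): InputVerdict {
  // The controller passes req.body.messages straight through, so every turn is untrusted.
  for (const m of req.messages) {
    if (INJECTION_PATTERNS.some(p => p.test(m.content))) {
      return { blocked: true, reason: 'possible prompt injection', sanitizedRequest: req }
    }
  }
  // PII is redacted rather than blocked, so the request can still proceed.
  const messages = req.messages.map(m => ({
    ...m,
    content: m.content.replace(SSN, '[REDACTED-SSN]').replace(EMAIL, '[REDACTED-EMAIL]'),
  }))
  return { blocked: false, sanitizedRequest: { ...req, messages } }
}

const verdict = scanInputSketch({
  messages: [{ role: 'user', content: 'mail me at a.b@example.com, SSN 123-45-6789' }],
  maxTokens: 64,
  task: 'general',
})
console.log(verdict.sanitizedRequest.messages[0].content)
// prints: mail me at [REDACTED-EMAIL], SSN [REDACTED-SSN]
```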
This is the layer the mock interview pushed hardest on, and rightly so. In a multi-tenant system, the catastrophic failure is not downtime; it is Tenant A seeing Tenant B's data. And here is the uncomfortable truth about how that happens:
Nobody writes an unfiltered SELECT * FROM documents on purpose. What happens is this: a query is written correctly with WHERE tenant_id = ?, and then six months later someone adds a new query in a hurry and forgets the filter, or forgets to thread the tenantId argument through the fourth function in the call chain. The leak is a dropped parameter. So the architecture's job is to make the parameter impossible to drop.
Do not pass tenantId, workspaceId, and role as ordinary function arguments that every layer has to remember to forward. Instead, the controller establishes them once, at the very edge of the request, from the verified token, and injects them into a request-scoped context that every lower layer reads automatically.
- tenant_id: which customer organization. The hard partition; data never crosses it.
- workspace_id: which project / team inside that customer. A softer partition for scoping within a tenant.
- role: what this user may do (RBAC): admin, member, read-only. Also injected, also enforced below.
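The role is enforced the same way the tenant is: read from the injected claims, never from a request parameter. A minimal sketch, assuming the three roles form a strict hierarchy (readonly < member < admin); requireRole, RANK, and ForbiddenError are hypothetical names.

```typescript
type Role = 'admin' | 'member' | 'readonly'

// Assumed strict hierarchy: each role may do everything the roles below it may.
const RANK: Record<Role, number> = { readonly: 0, member: 1, admin: 2 }

class ForbiddenError extends Error {
  name = 'ForbiddenError'
}

// Throws unless the role injected from the verified token is at least `minimum`.
// The role never comes from the request body, so a client cannot promote itself.
function requireRole(claims: { role: Role }, minimum: Role): void {
  if (RANK[claims.role] < RANK[minimum]) {
    throw new ForbiddenError(`role '${claims.role}' cannot perform an action that requires '${minimum}'`)
  }
}

requireRole({ role: 'admin' }, 'member') // passes silently
try {
  requireRole({ role: 'readonly' }, 'member')
} catch (err) {
  console.log((err as Error).name) // prints: ForbiddenError
}
```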
Step one: the controller reads identity from the verified token and injects it. This is the only place identity is set:

```typescript
import { Router } from 'express'
import { verifyToken } from '../auth'
import { tenantContext } from '../context/tenant'
import { guardedComplete } from '../guardrails/wrap'

export const chatRouter = Router()

chatRouter.post('/chat', async (req, res, next) => {
  try {
    // 1. Verify the token. This is the trust boundary: nothing before it is trusted.
    const claims = await verifyToken(req.headers.authorization)
    // claims = { tenantId, workspaceId, role, userId }, cryptographically verified

    // 2. INJECT identity into the request-scoped context, then run the handler INSIDE it.
    //    Everything downstream of tenantContext.run() can read this, and nothing can
    //    run a query without it, because the data layer refuses to (see step 3).
    await tenantContext.run(claims, async () => {
      const answer = await guardedComplete({
        messages: req.body.messages, // requires express.json() on the app
        maxTokens: 1024,
        task: 'general',
      })
      res.json({ answer: answer.text })
    })
  } catch (err) {
    next(err) // bad token, guardrail block, or provider failure: hand off to error middleware
  }
})
```

Step two: the context itself. In Node this is AsyncLocalStorage (Python: contextvars; Java: a request-scoped bean). It is “ambient”: available to any code running inside the request, without being handed down explicitly:
```typescript
import { AsyncLocalStorage } from 'node:async_hooks'

export interface TenantClaims {
  tenantId: string
  workspaceId: string
  role: 'admin' | 'member' | 'readonly'
  userId: string
}

const als = new AsyncLocalStorage<TenantClaims>()

export const tenantContext = {
  run: <T>(claims: TenantClaims, fn: () => Promise<T>): Promise<T> => als.run(claims, fn),

  // Throws if called outside a request scope. That throw is a FEATURE:
  // it means no code path can quietly run "tenant-less".
  get(): TenantClaims {
    const claims = als.getStore()
    if (!claims) throw new Error('No tenant context: query attempted outside a request scope')
    return claims
  },
}
```

Step three: the data layer reads the tenant from the context itself. There is no public way to query without a tenant filter, because the function that would let you do that does not exist:
```typescript
import { pool } from './pool'
import { tenantContext } from '../context/tenant'

// Every query goes through here. There is no exported "raw query" function.
export async function scopedQuery<T>(sql: string, params: unknown[] = []): Promise<T[]> {
  const { tenantId } = tenantContext.get() // ambient: cannot be forgotten, cannot be faked
  const client = await pool.connect()
  try {
    // A transaction-local setting only lives inside a transaction, so open one.
    await client.query('BEGIN')
    // Wall 1 (application): bind the tenant into a setting for THIS transaction.
    // SET cannot take bind parameters; set_config(name, value, is_local = true) can.
    await client.query("SELECT set_config('app.tenant_id', $1, true)", [tenantId])
    // Wall 2 (database): Postgres row-level security reads that same variable (see below).
    const result = await client.query(sql, params)
    await client.query('COMMIT')
    return result.rows as T[]
  } catch (err) {
    await client.query('ROLLBACK')
    throw err
  } finally {
    client.release()
  }
}
```

```sql
-- Row-level security: even a query with NO tenant filter only sees its own tenant's rows.
-- This is the independent second wall: it holds even if application code has a bug.
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
-- Table owners bypass RLS unless it is forced. Connect the app as a non-owner,
-- non-superuser role: superusers and BYPASSRLS roles always bypass it.
ALTER TABLE documents FORCE ROW LEVEL SECURITY;

-- missing_ok = true plus NULLIF: an unset tenant yields NULL, which matches no rows (fail closed).
CREATE POLICY tenant_isolation ON documents
  USING (tenant_id = NULLIF(current_setting('app.tenant_id', true), '')::uuid);

-- Now: SELECT * FROM documents  -- returns ONLY the current tenant's rows.
-- The filter is no longer the developer's responsibility; it is the database's.
```

A request now passes three walls that each stop a cross-tenant leak on their own:
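- The controller sets identity once, from the cryptographically verified token, so no request parameter can forge it.
- scopedQuery reads it from tenantContext.get(), which throws when it is absent, so no query can run tenant-less.
- Row-level security means even a query with no WHERE clause only sees the current tenant's rows, because the database enforces it.

“Independent” means a single bug breaches at most one wall. You do not get a cross-tenant leak from one mistake; you would need three simultaneous, unrelated failures. That is what defense-in-depth actually means, made concrete.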
And the model call inherits all of this for free: back in Layer 1, the gateway read tenantContext.get() to tag every provider call and every audit row with the tenant. Prompts, logs, and cost accounting stay partitioned by the same identity that partitions the database, because it is the same identity, injected once.
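To see the ambient context hold up under concurrency, here is a self-contained sketch using Node's AsyncLocalStorage. An in-memory array stands in for the documents table. The data, the tenant IDs, and the scopedFind and handler functions are hypothetical. Two interleaved requests each see only their own rows, and a call outside any request scope fails instead of running tenant-less.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks'

interface Claims {
  tenantId: string
}

const als = new AsyncLocalStorage<Claims>()

// Same contract as tenantContext.get(): no store means no query.
function currentTenant(): string {
  const claims = als.getStore()
  if (!claims) throw new Error('No tenant context')
  return claims.tenantId
}

// In-memory stand-in for the documents table (hypothetical data).
const documents = [
  { tenantId: 'A', title: 'A-roadmap' },
  { tenantId: 'B', title: 'B-payroll' },
  { tenantId: 'A', title: 'A-budget' },
]

// The only read path: the filter value comes from the ambient context, never from a caller.
async function scopedFind(): Promise<string[]> {
  // Simulated I/O delay so that concurrent requests genuinely interleave.
  await new Promise(resolve => setTimeout(resolve, Math.random() * 10))
  const tenantId = currentTenant()
  return documents.filter(d => d.tenantId === tenantId).map(d => d.title)
}

// A "feature" handler: no signature below the controller mentions the tenant.
async function handler(): Promise<string[]> {
  return scopedFind()
}

// Two requests in flight at once, each run inside its own context (what the controller does).
Promise.all([
  als.run({ tenantId: 'A' }, handler),
  als.run({ tenantId: 'B' }, handler),
]).then(([a, b]) => console.log(a, b)) // prints: [ 'A-roadmap', 'A-budget' ] [ 'B-payroll' ]
```

AsyncLocalStorage carries the store across awaits and timers, which is why the tenant read after the simulated I/O still belongs to the right request.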
Encryption in transit means that while data is moving across a network, anyone who taps the wire sees ciphertext. The statement is easy. The engineering is in the details: which hops, what kind of TLS, and where the keys physically live. Let us not leave any of that hand-wavy.
No hop, however “internal”, travels in plaintext. The network perimeter is not a substitute for encryption; that assumption is exactly what zero-trust architecture exists to kill.

| Hop | Protection | Who proves identity |
|---|---|---|
| browser → load balancer | TLS 1.3 | server only |
| load balancer → controller | mTLS | server + client |
| controller → gateway / guardrails | mTLS | server + client |
| gateway → LLM provider (egress) | mTLS | server + client |
| service → database | TLS 1.3 + cert | server (+ client cert) |

The public edge uses ordinary TLS 1.3: the browser checks the server's certificate, and the browser itself stays anonymous (the user authenticates separately, with a token). From the load balancer inward, every hop upgrades to mTLS (mutual TLS), where the client also presents a certificate and the server refuses the connection without one.
This is the question the interview asked directly. In a client-certificate exchange there are two key pairs, and the entire security of the scheme rests on where the private halves live and never leave:
- Server public certificate: sent to the client during the handshake. It travels the wire freely; it is public.
- Server private key: never leaves the server. Either mounted into the pod from a secrets store with file permissions 0400, readable only by the server process's user, or kept inside a hardware security module that performs the signing itself. Not in the container image. Not in source control. Not in an environment variable. Not in a log line.
- Client public certificate: sent to the server during the handshake so the server can verify the caller. Also public, also fine on the wire.
- Client private key: never leaves the client. Same treatment: a mounted secret (0400, process-readable only) or an HSM. The client uses it to sign the handshake transcript, proving it holds the key without ever transmitting it.

Only public certificates ever cross the wire. Private keys are mounted, never sent. A private key's entire life is: generated inside a boundary (a KMS, an HSM, or a sealed secrets process), delivered to exactly one workload as a read-only mount, used in memory, and destroyed when the pod dies. If a private key has ever been in a git repo, a Slack message, a Dockerfile, or a log, it is not a private key anymore; it is a public one, and the certificate must be revoked.
The mTLS handshake, step by step. Note that the private keys are used but never sent:

```text
CLIENT ──▶ SERVER   1. ClientHello
CLIENT ◀── SERVER   2. ServerHello + server's PUBLIC certificate
                       + a signature made WITH the server's PRIVATE key
                       + "I require a client certificate too"
CLIENT              3. verify the server cert against the trusted CA bundle
                       (is this really who I meant to call?) and check its signature
CLIENT ──▶ SERVER   4. client's PUBLIC certificate
CLIENT ──▶ SERVER   5. a signature made WITH the client's PRIVATE key
                       (the private key never leaves; only the signature does)
           SERVER   6. verify the client cert against the trusted CA bundle,
                       and verify the signature
CLIENT ◀─▶ SERVER   7. encrypted channel open: both sides proven
```
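Steps 5 and 6 can be reproduced in a few lines of Node's built-in crypto module. This is a sketch of the proof-of-possession idea only, not of TLS. A freshly generated Ed25519 key pair stands in for the client's certificate key. The random bytes are a made-up stand-in for the handshake transcript, and there is no CA or X.509 parsing. The names onTheWire and serverVerifies are hypothetical.

```typescript
import { generateKeyPairSync, randomBytes, sign, verify } from 'node:crypto'

// Client side: the key pair is generated here, and the private half never leaves this scope.
const { publicKey, privateKey } = generateKeyPairSync('ed25519')

// Stand-in for the handshake transcript both sides have seen. It is fresh per
// connection, so a signature captured from one handshake is useless on another.
const transcript = Buffer.concat([Buffer.from('handshake-transcript:'), randomBytes(32)])

// Step 5: sign the transcript. For Ed25519, Node's one-shot sign() takes null as the digest.
const signature = sign(null, transcript, privateKey)

// Everything that actually crosses the wire: the public key (normally inside the
// certificate) and the signature. No private key material.
const onTheWire = {
  publicKeyPem: publicKey.export({ type: 'spki', format: 'pem' }).toString(),
  signature,
}

// Step 6: the server checks the signature against the presented public key.
// (A real server would first check that the certificate chains to the pinned CA.)
function serverVerifies(wire: typeof onTheWire, transcriptSeen: Buffer): boolean {
  return verify(null, transcriptSeen, wire.publicKeyPem, wire.signature)
}

console.log(serverVerifies(onTheWire, transcript)) // prints: true
```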
Concretely, the configuration is just this: issue a cert + key pair per service from an internal certificate authority, mount them, point the client and server at the file paths, and pin the CA bundle so only internally-issued certificates are trusted:
```yaml
# Pod spec fragment. The cert + key pair is issued by the internal CA
# (cert-manager / Vault PKI) and delivered as a secret. The pod MOUNTS it;
# the key is never in the image.
volumes:
  - name: mtls-certs
    secret:
      secretName: gateway-client-cert   # contains tls.crt, tls.key, ca.crt
      defaultMode: 0400                 # read-only, owner-only
containers:
  - name: model-gateway
    volumeMounts:
      - name: mtls-certs
        mountPath: /etc/mtls
        readOnly: true                  # the workload cannot even modify it
```

```typescript
import { Agent } from 'undici'
import { readFileSync } from 'node:fs'

// The private key is READ from the mount at startup, held in memory, and used to
// sign handshakes. It is never logged, never serialized, never sent.
export const mtlsAgent = new Agent({
  connect: {
    cert: readFileSync('/etc/mtls/tls.crt'), // our PUBLIC certificate: ok to present
    key: readFileSync('/etc/mtls/tls.key'), // our PRIVATE key: used in memory only
    // Pin: setting `ca` REPLACES Node's default root store, so ONLY internally-issued
    // certs are trusted and a peer with a publicly-issued cert is rejected.
    ca: readFileSync('/etc/mtls/ca.crt'),
    rejectUnauthorized: true, // refuse any peer we cannot verify
  },
})

// Every egress call from the gateway uses this agent (pass it as the `dispatcher`
// option to undici's fetch/request). Because of the pin, a public provider endpoint
// is reached through an internal egress proxy (or its CA must be added to the bundle).
// In a service mesh (Istio, Linkerd) the sidecar does all of the above and the app
// code stays clean, but the model is identical: mounted key, public cert on the
// wire, pinned CA.
```

Encryption at rest means that while data sits in storage (the database, the object store, the backups, the logs), it is ciphertext. A stolen disk or a leaked snapshot is useless without the keys. Again the statement is simple; the engineering is in which keys, held where, and who can ask them to do what.
Naive encryption uses one key for everything; if it leaks, everything is exposed, and rotating it means re-encrypting the world. Envelope encryption fixes both problems with a hierarchy of keys:
```text
┌────────────────────────────────────────────────────────────┐
│ KMS (hardware security module)                             │
│                                                            │
│   ROOT KEY: never leaves the KMS hardware. Ever.           │
│     │  the application cannot read it; it can only ASK     │
│     │  the KMS to use it, and every ask is logged.         │
│     ▼                                                      │
│   wraps (encrypts) each tenant's DATA KEY                  │
└──────┬─────────────────────────────────────────────────────┘
       │ the app stores only the WRAPPED (encrypted) data key
       ▼
┌───────────────────────┐    ┌───────────────────────┐
│ Tenant A data key     │    │ Tenant B data key     │   ... one per tenant
│ (stored wrapped)      │    │ (stored wrapped)      │
└──────────┬────────────┘    └──────────┬────────────┘
           │ to read Tenant A's data:
           │ 1. ask KMS to unwrap A's data key (in memory only)
           │ 2. AES-256-GCM decrypt the rows with it
           │ 3. discard the unwrapped key
           ▼
   ciphertext rows in Postgres / object store / backups
```

So “where does the key live” has a precise answer at this layer:
- The root key lives only inside the KMS; the application can ask the KMS to use it, never read it.
- Each tenant's data key lives in your own storage, but only in wrapped (encrypted) form.
- The unwrapped data key lives in process memory for a single operation and is then zeroed.

The read path, in code:
```typescript
import { KMSClient, DecryptCommand } from '@aws-sdk/client-kms'
import { createDecipheriv } from 'node:crypto'
import { tenantContext } from '../context/tenant'
import { getWrappedDataKey } from './keystore'

const kms = new KMSClient()

// Decrypt a stored field for the CURRENT tenant (identity comes from Layer 4).
export async function decryptField(ciphertext: Buffer, iv: Buffer, tag: Buffer): Promise<string> {
  const { tenantId } = tenantContext.get()

  // 1. Fetch this tenant's data key; it is WRAPPED, so we cannot use it yet.
  const wrapped = await getWrappedDataKey(tenantId)

  // 2. Ask the KMS to unwrap it. The ROOT key does the work inside the KMS;
  //    we get back the data key in memory only. We never see the root key.
  const { Plaintext: dataKey } = await kms.send(new DecryptCommand({ CiphertextBlob: wrapped }))
  if (!dataKey) throw new Error(`KMS returned no plaintext for tenant ${tenantId}`)

  try {
    // 3. AES-256-GCM decrypt the field with the unwrapped data key.
    //    final() throws if the auth tag does not match (tampered data or wrong key).
    const decipher = createDecipheriv('aes-256-gcm', dataKey, iv)
    decipher.setAuthTag(tag)
    const plain = Buffer.concat([decipher.update(ciphertext), decipher.final()])
    return plain.toString('utf8')
  } finally {
    // 4. Discard the unwrapped key, even if decryption failed.
    dataKey.fill(0)
  }
}
```

Because each tenant has its own data key, deleting a tenant for real does not mean hunting down every row and every backup. You destroy that tenant's key. Instantly, every copy of their data, including snapshots and backups you cannot even reach, becomes permanently unreadable ciphertext. That is crypto-shredding, and for “right to be forgotten” and federal data-handling requirements it is the difference between a deletion you can prove and one you can only claim. One detail makes or breaks it: the key you destroy must be one that no surviving copy can recover. If a single shared root key wraps every tenant's data key, a backed-up copy of the wrapped data key could still be unwrapped; giving each tenant its own KMS key, and destroying that, closes the gap.
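Here is a self-contained sketch of the whole envelope, including the encrypt side that decryptField assumes already happened. InMemoryKms is a hypothetical stand-in for a real KMS: a real root key never leaves KMS hardware, while here it is just a Buffer in a private field. The sketch keeps one root key per tenant, so destroying that key crypto-shreds the tenant. That per-tenant choice is an assumption; createTenant, encryptField, and this decryptField are illustrative helpers, not the article's keystore API.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from 'node:crypto'

interface Sealed {
  iv: Buffer
  tag: Buffer
  ciphertext: Buffer
}

// AES-256-GCM with a fresh random 12-byte IV per encryption.
function seal(key: Buffer, plaintext: Buffer): Sealed {
  const iv = randomBytes(12)
  const cipher = createCipheriv('aes-256-gcm', key, iv)
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()])
  return { iv, tag: cipher.getAuthTag(), ciphertext }
}

function unseal(key: Buffer, s: Sealed): Buffer {
  const decipher = createDecipheriv('aes-256-gcm', key, s.iv)
  decipher.setAuthTag(s.tag)
  return Buffer.concat([decipher.update(s.ciphertext), decipher.final()]) // throws if tampered
}

// Hypothetical in-memory KMS. Callers can ask it to wrap/unwrap; they never get a root key.
class InMemoryKms {
  private rootKeys = new Map<string, Buffer>() // one root key per tenant (an assumption)

  createKey(tenantId: string): void {
    this.rootKeys.set(tenantId, randomBytes(32))
  }
  wrap(tenantId: string, dataKey: Buffer): Sealed {
    return seal(this.root(tenantId), dataKey)
  }
  unwrap(tenantId: string, wrapped: Sealed): Buffer {
    return unseal(this.root(tenantId), wrapped)
  }
  destroyKey(tenantId: string): void {
    this.rootKeys.get(tenantId)?.fill(0)
    this.rootKeys.delete(tenantId)
  }
  private root(tenantId: string): Buffer {
    const key = this.rootKeys.get(tenantId)
    if (!key) throw new Error(`no KMS key for tenant ${tenantId}`)
    return key
  }
}

const kms = new InMemoryKms()

// Onboarding: generate a data key, keep only its wrapped form, zero the plaintext copy.
function createTenant(tenantId: string): Sealed {
  kms.createKey(tenantId)
  const dataKey = randomBytes(32)
  const wrapped = kms.wrap(tenantId, dataKey)
  dataKey.fill(0)
  return wrapped
}

// Each field operation unwraps the data key, uses it once, and zeroes it.
function encryptField(tenantId: string, wrapped: Sealed, text: string): Sealed {
  const dataKey = kms.unwrap(tenantId, wrapped)
  try {
    return seal(dataKey, Buffer.from(text, 'utf8'))
  } finally {
    dataKey.fill(0)
  }
}

function decryptField(tenantId: string, wrapped: Sealed, field: Sealed): string {
  const dataKey = kms.unwrap(tenantId, wrapped)
  try {
    return unseal(dataKey, field).toString('utf8')
  } finally {
    dataKey.fill(0)
  }
}

const wrappedA = createTenant('A')
const row = encryptField('A', wrappedA, 'quarterly numbers')
console.log(decryptField('A', wrappedA, row)) // prints: quarterly numbers
kms.destroyKey('A') // crypto-shred: from here on, decryptField('A', ...) throws
```

Destroying tenant A's root key makes every stored copy of A's wrapped data key, and therefore every row sealed with it, unrecoverable in this sketch; other tenants' keys are untouched.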
The layers of at-rest encryption stack, and each is independent, like the tenancy walls:
- Disk encryption: the volume itself is encrypted. Stops a physically stolen disk. Always on.
- Transparent data encryption (TDE): the database files are ciphertext. Stops a leaked snapshot.
- Field-level encryption: the most sensitive columns (prompts, documents, PII) are encrypted per tenant on top.
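Walk a single request through everything we built. This is the lifecycle diagram at the top, narrated:
1. The request crosses the public edge over TLS 1.3; the load balancer terminates it, and every hop from there inward is mTLS (Layer 5).
2. The controller verifies the token, and the verified identity (tenant_id, workspace_id, role) is injected once into the request-scoped context (Layer 4). Nothing downstream has to be handed identity; it is ambient and enforced.
3. Input guardrails scan the neutral request for prompt injection, jailbreaks, and PII before anything reaches a provider (Layer 3).
4. The model gateway asks the decision matrix for an ordered provider list, hard-filtered on residency and compliance, and calls the primary over mTLS egress, falling back if it fails (Layers 1 and 2).
5. Output guardrails check the response for PII leaks, policy violations, and hallucinations before it reaches the user (Layer 3).
6. The result is persisted with AES-256-GCM under the tenant's data key, whose root key stays in the KMS, and the call lands in the append-only audit log (Layer 6).

In that whole lifecycle, the application's feature code never imported a vendor SDK, never passed a tenant ID by hand, never touched a private key, and never built an unscoped query. Every one of those was the architecture's job, not the developer's memory. That is the abstraction statement from the top, fully decomposed.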
Put one model gateway between your app and every AI vendor, so switching providers is a config change and adding one is a single adapter. Choose the provider per request with a decision matrix that hard-filters on residency and compliance before it ever optimizes for cost. Make tenant identity injected once at the controller and ambient everywhere below, backed by row-level security and per-tenant keys, so a cross-tenant leak needs three simultaneous failures, not one. Encrypt every hop in transit (TLS, then mTLS, with private keys mounted and never sent) and every byte at rest (AES-256-GCM, envelope-encrypted, root key sealed in a KMS), so the vendor is a dependency you control, and the data is safe whether it is moving or still.