Google's managed, billion-scale ANN database, built on ScaNN, the same Approximate Nearest Neighbor library that powers Google Search's embedding lookup. Formerly named Matching Engine.
What it is: Managed vector database built on ScaNN. Operates at billion-vector scale with p99 latency < 100ms. Indexes are tree-quantized at build time for fast retrieval.
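Conceptually, the build-time step partitions the vectors so that a query only has to scan a few partitions. The sketch below shows that idea in pure Python, using a single level of k-means partitions followed by exact rescoring. It is a simplified stand-in, not ScaNN: it has no quantization and no multi-level tree. `build_partitioned_index`, `search` and the `n_probe` knob are hypothetical names. The managed service exposes its own tuning parameters for how much of the tree to search.

```python
import random
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(v: Vector, centroids: List[List[float]]) -> int:
    return min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))


def kmeans(vectors: List[Vector], k: int, iters: int = 10, seed: int = 0) -> List[List[float]]:
    """Plain Lloyd's k-means: the 'learned' partitioning step, run once at build time."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        buckets: List[List[Vector]] = [[] for _ in range(k)]
        for v in vectors:
            buckets[nearest_centroid(v, centroids)].append(v)
        for c, members in enumerate(buckets):
            if members:  # keep the previous centroid if a bucket empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def build_partitioned_index(vectors: List[Vector], n_leaves: int) -> Dict:
    """Build time: cluster the vectors and record which leaf each one lands in."""
    centroids = kmeans(vectors, n_leaves)
    leaves: Dict[int, List[int]] = {c: [] for c in range(n_leaves)}
    for i, v in enumerate(vectors):
        leaves[nearest_centroid(v, centroids)].append(i)
    return {"vectors": vectors, "centroids": centroids, "leaves": leaves}


def search(index: Dict, query: Vector, k: int, n_probe: int) -> List[int]:
    """Query time: open the n_probe leaves whose centroids score highest, rescore only their members."""
    centroids = index["centroids"]
    probed = sorted(range(len(centroids)), key=lambda c: -dot(query, centroids[c]))[:n_probe]
    candidates = [i for leaf in probed for i in index["leaves"][leaf]]
    candidates.sort(key=lambda i: -dot(query, index["vectors"][i]))
    return candidates[:k]


rng = random.Random(1)
data = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(200)]
index = build_partitioned_index(data, n_leaves=8)
query = [0.5, -0.2, 0.1, 0.9]
print(search(index, query, k=5, n_probe=2))  # only members of 2 of the 8 leaves are scored
```

Raising `n_probe` trades latency for recall. When `n_probe` equals the number of leaves, the search degenerates to exact brute force.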
Why ScaNN matters: On public recall/latency benchmarks at scale, it outperforms most open-source ANN libraries (FAISS, HNSWLIB). It combines anisotropic vector quantization with a learned tree partitioning of the vectors. This is the same approach used by Google Search's production embedding lookup.
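The "anisotropic" part can be illustrated numerically. Following the idea in the ScaNN paper, the quantization residual is split into two components: one parallel to the datapoint and one orthogonal to it. The parallel part is penalized more, because it shifts inner products with exactly the queries most likely to rank that datapoint highly. In this sketch the weights `h_par` and `h_perp` are arbitrary illustrative constants; the paper derives them from a score threshold. `anisotropic_loss` is a hypothetical helper, not ScaNN's API.

```python
from typing import Sequence


def anisotropic_loss(x: Sequence[float], x_quant: Sequence[float], h_par: float, h_perp: float) -> float:
    """Weighted quantization loss: parallel residual weighted by h_par, orthogonal by h_perp."""
    r = [a - b for a, b in zip(x, x_quant)]             # residual x - x_quant
    scale = sum(ri * xi for ri, xi in zip(r, x)) / sum(xi * xi for xi in x)
    r_par = [scale * xi for xi in x]                    # projection of r onto x
    r_perp = [ri - pi for ri, pi in zip(r, r_par)]      # what is left is orthogonal to x
    return h_par * sum(p * p for p in r_par) + h_perp * sum(p * p for p in r_perp)


x = [1.0, 0.0]
shrunk = [0.9, 0.0]  # candidate code with an error of 0.1 along x (parallel)
tilted = [1.0, 0.1]  # candidate code with an error of 0.1 across x (orthogonal)

# Isotropic weighting (plain squared error): both candidates cost the same.
print(anisotropic_loss(x, shrunk, 1.0, 1.0), anisotropic_loss(x, tilted, 1.0, 1.0))
# Anisotropic weighting (h_par > h_perp): the parallel error is penalized more.
print(anisotropic_loss(x, shrunk, 4.0, 1.0), anisotropic_loss(x, tilted, 4.0, 1.0))
```

With `h_par > h_perp`, a quantizer minimizing this loss would choose `tilted`. That candidate keeps the inner product with `x` itself at exactly 1.0, whereas `shrunk` drops it to 0.9.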
When to migrate from pgvector to Vector Search: once you pass ~10M vectors and need sub-100ms p99 latency. Below that, pgvector's operational simplicity wins; above it, ScaNN's recall advantage and the managed operations story dominate.
```
INGEST TIME:

  Docs ──> chunker ──> embedding model ──> N x D vectors
                      (Gemini text-embedding-005,
                       OpenAI text-embedding-3,
                       or any model)
                                                |
                                                v
                                     ┌──────────────────────┐
                                     │    Vector Search     │
                                     │    Index Builder     │
                                     │       (ScaNN)        │
                                     │  - tree-quantize     │
                                     │  - prune branches    │
                                     │  - calibrate ANN     │
                                     └──────────────────────┘
                                                |
                                                v
                                       [ Deployed Index ]
                                      (immutable snapshot,
                                       rebuild on update)
```
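The chunker stage of the ingest path can be sketched as a sliding window over tokens. This sketch assumes the document is already tokenized; a whitespace split stands in for a real tokenizer. The default 1000-token window with 200 tokens of overlap mirrors the enterprise search example later in this section. `chunk_tokens` is a hypothetical helper, not part of any Vertex AI SDK.

```python
from typing import List, Sequence


def chunk_tokens(tokens: Sequence[str], chunk_size: int = 1000, overlap: int = 200) -> List[List[str]]:
    """Split a token sequence into fixed-size windows; consecutive windows share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    stride = chunk_size - overlap  # each window starts this many tokens after the previous one
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), stride):
        chunks.append(list(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end; another would be pure overlap
    return chunks


words = "the quick brown fox jumps over the lazy dog".split()  # stand-in for a real tokenizer
print(chunk_tokens(words, chunk_size=4, overlap=1))
# [['the', 'quick', 'brown', 'fox'], ['fox', 'jumps', 'over', 'the'], ['the', 'lazy', 'dog']]
```

Overlap keeps a sentence that straddles a window boundary retrievable from at least one chunk. The cost is roughly `chunk_size / stride` times more vectors to embed and index (1.25x for 1000/200).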
```
QUERY TIME:

  Query ──> embed ──> Vector Search ──> top-K (e.g. K=100)
                                          |
                                          v
                                  [ optional rerank
                                    w/ cross-encoder ]
                                          |
                                          v
                                        top-5 ──> Gemini
```

Problem: E-commerce platform with 100M SKUs. Search needs to handle "wireless headphones for runners under $200": a query that mixes semantic intent, attribute filters, and price constraints.
Architecture: Each product is embedded with Gemini text-embedding-005 (title + description + reviews). Categorical filters (brand, in_stock) are stored as namespace tokens on each vector. Price, a continuous value, is better stored as a numeric restrict so that "under $200" can be applied as a range filter. Vertex Vector Search performs filtered ANN retrieval in < 30ms. The top 100 results pass through Cohere Rerank v3 for query-specific reordering, and the top 20 are served to the front end. End-to-end latency: ~80ms.
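The retrieval path for this use case can be sketched end to end with brute-force stand-ins. Everything here is hypothetical:
- `Product`, `passes`, `filtered_ann` and `search_products` are illustrative names.
- The token matching (OR within a namespace, AND across namespaces) is our simplified assumption about restrict semantics.
- The linear scan replaces the ANN index.
- A caller-supplied `rerank_score` function replaces Cohere Rerank; no vendor API is called.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence, Set


@dataclass
class Product:
    sku: str
    embedding: Sequence[float]
    tokens: Dict[str, Set[str]]  # categorical restricts, e.g. {"brand": {"acme"}}
    price: float                 # numeric attribute, filtered as a range


def passes(p: Product, allow: Dict[str, Set[str]], max_price: Optional[float]) -> bool:
    # Assumed semantics: OR within a namespace, AND across namespaces.
    for namespace, allowed in allow.items():
        if not p.tokens.get(namespace, set()) & allowed:
            return False
    return max_price is None or p.price < max_price


def filtered_ann(products: List[Product], query: Sequence[float],
                 allow: Dict[str, Set[str]], max_price: Optional[float], k: int = 100) -> List[Product]:
    # Brute-force stand-in for filtered ANN: only matching products compete for the k slots.
    hits = [p for p in products if passes(p, allow, max_price)]
    hits.sort(key=lambda p: -sum(a * b for a, b in zip(query, p.embedding)))
    return hits[:k]


def search_products(products: List[Product], query: Sequence[float],
                    allow: Dict[str, Set[str]], max_price: Optional[float],
                    rerank_score: Callable[[Product], float],
                    k_retrieve: int = 100, k_serve: int = 20) -> List[str]:
    candidates = filtered_ann(products, query, allow, max_price, k_retrieve)
    candidates.sort(key=rerank_score, reverse=True)  # stand-in for the cross-encoder rerank
    return [p.sku for p in candidates[:k_serve]]


catalog = [
    Product("a", [1.0, 0.0], {"brand": {"acme"}, "in_stock": {"true"}}, 149.0),
    Product("b", [0.9, 0.1], {"brand": {"acme"}, "in_stock": {"false"}}, 99.0),
    Product("c", [0.8, 0.2], {"brand": {"zeta"}, "in_stock": {"true"}}, 249.0),
    Product("d", [0.7, 0.3], {"brand": {"zeta"}, "in_stock": {"true"}}, 179.0),
]
toy_rerank = {"a": 0.2, "d": 0.9}  # made-up relevance scores in place of a real reranker
print(search_products(catalog, [1.0, 0.0], {"in_stock": {"true"}}, 200.0,
                      rerank_score=lambda p: toy_rerank.get(p.sku, 0.0)))  # ['d', 'a']
```

The ANN stage orders the in-stock, under-$200 products by embedding similarity (`a` before `d`). The rerank stage is then free to reorder that shortlist using signals the embedding missed.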
Problem: A Fortune 500 company has internal docs spread across 50M files (Confluence, SharePoint, Google Drive, Slack archives). Employees need "ask anything" with citations.
Architecture: Nightly ingestion via Vertex AI Pipelines: docs → chunking (1000 tokens, 200 overlap) → Gemini embeddings → Vector Search index. Per-doc access permissions are encoded as ACL namespace tokens. Query path: user query embedded → filtered ANN → cross-encoder rerank → top-5 → Gemini Pro synthesizes an answer with citations. ACL enforcement happens at the Vector Search filter level, NOT after retrieval. This is critical for compliance. Filtering after retrieval lets chunks the user may not read pass through the retrieval stage, and it can leave fewer than K permitted results.
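Why filter-level enforcement matters can be shown with a toy comparison of post-filtering and pre-filtering. In this sketch ACL groups play the role of namespace tokens, and brute-force scans stand in for ANN. `Chunk`, `post_filter` and `pre_filter` are hypothetical names.

```python
from typing import List, Sequence, Set, Tuple

# (chunk_id, embedding, groups allowed to read it); ACL groups play the role of namespace tokens.
Chunk = Tuple[str, Sequence[float], Set[str]]


def score(query: Sequence[float], vec: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(query, vec))


def post_filter(chunks: List[Chunk], query: Sequence[float], user_groups: Set[str], k: int) -> List[str]:
    # Anti-pattern: take the top k over ALL chunks, then drop the ones the user can't read.
    top = sorted(chunks, key=lambda c: -score(query, c[1]))[:k]
    return [cid for cid, _, acl in top if acl & user_groups]


def pre_filter(chunks: List[Chunk], query: Sequence[float], user_groups: Set[str], k: int) -> List[str]:
    # Filter inside retrieval: only readable chunks compete for the k slots.
    readable = [c for c in chunks if c[2] & user_groups]
    return [cid for cid, _, _ in sorted(readable, key=lambda c: -score(query, c[1]))[:k]]


chunks = [
    ("secret1", [1.0, 0.0], {"finance"}),
    ("secret2", [0.95, 0.05], {"finance"}),
    ("pub1", [0.6, 0.4], {"everyone"}),
    ("pub2", [0.5, 0.5], {"everyone"}),
]
print(post_filter(chunks, [1.0, 0.0], {"everyone"}, k=2))  # []
print(pre_filter(chunks, [1.0, 0.0], {"everyone"}, k=2))   # ['pub1', 'pub2']
```

Post-filtering returns nothing here even though readable matches exist, and the restricted chunks have already passed through retrieval. Pre-filtering returns the best chunks the user is entitled to see.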
Problem: Stock photo library wants "find images that look like THIS reference image and ALSO match this description."
Architecture: Each image is embedded with CLIP-style multimodal embeddings (Gemini or the Vertex AI multimodalembedding model), which place images and text in the same vector space. The query combines the image vector and the text vector as a weighted sum. ANN search then returns results that are both visually and semantically similar. ScaNN handles 100M+ images with p99 < 80ms.
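The weighted-sum query can be sketched in a few lines. This assumes, as the paragraph states, that both vectors come from the same multimodal model and share one space. Normalizing each modality before blending is a common choice but an assumption here. `combined_query`, `top_k` and the `image_weight` parameter are hypothetical names, and the 3-dimensional vectors are toys.

```python
import math
from typing import Dict, List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def combined_query(image_vec: Sequence[float], text_vec: Sequence[float],
                   image_weight: float = 0.5) -> List[float]:
    # Normalize each modality so neither dominates by magnitude, blend, then
    # renormalize so dot products against unit-length vectors act as cosine similarity.
    img, txt = normalize(image_vec), normalize(text_vec)
    return normalize([image_weight * a + (1 - image_weight) * b for a, b in zip(img, txt)])


def top_k(corpus: Dict[str, List[float]], query: Sequence[float], k: int) -> List[str]:
    # Brute-force stand-in for the ANN search.
    return sorted(corpus, key=lambda key: -sum(a * b for a, b in zip(query, corpus[key])))[:k]


corpus = {  # toy 3-d unit vectors; real multimodal embeddings are much larger
    "looks_right": [1.0, 0.0, 0.0],
    "reads_right": [0.0, 1.0, 0.0],
    "both": normalize([1.0, 1.0, 0.0]),
    "neither": [0.0, 0.0, 1.0],
}
image_vec, text_vec = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]
print(top_k(corpus, combined_query(image_vec, text_vec, 0.5), 1))  # ['both']
print(top_k(corpus, combined_query(image_vec, text_vec, 0.9), 1))  # ['looks_right']
```

With equal weights, the item that matches both the reference image and the description wins. Pushing `image_weight` toward 1 turns the search into plain "looks like this".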
Problem: Recommendation system needs "users similar to this user" for cold-start and exploration. 100M+ user embeddings, updated continuously as user behavior evolves.
Architecture: 256-dimensional user embeddings are generated nightly by a two-tower model running in Vertex AI Pipelines. They are stored in a Vector Search streaming-update index rather than the standard batch index, which is an immutable snapshot. Recommendation serving queries the index with a user's vector to find the K nearest users, and their consumption history seeds candidate retrieval. Sub-50ms ANN latency makes this feasible in real time.
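The "nearest users seed candidates" step can be sketched as follows. A brute-force scan stands in for the ANN query. A plain dict write mimics a streaming upsert, where a new vector becomes queryable without an index rebuild. `nearest_users` and `seed_candidates` are hypothetical names, and ranking items by how many neighbours consumed them is our illustrative choice.

```python
from typing import Dict, List, Sequence, Set


def nearest_users(user_vecs: Dict[str, Sequence[float]], user_id: str, k: int) -> List[str]:
    # Brute-force stand-in for querying the user-embedding index with this user's vector.
    q = user_vecs[user_id]
    others = [u for u in user_vecs if u != user_id]
    return sorted(others, key=lambda u: -sum(a * b for a, b in zip(q, user_vecs[u])))[:k]


def seed_candidates(user_vecs: Dict[str, Sequence[float]], history: Dict[str, Set[str]],
                    user_id: str, k: int) -> List[str]:
    # Items the k nearest users consumed that this user has not; most widely shared first.
    seen = history.get(user_id, set())
    counts: Dict[str, int] = {}
    for neighbour in nearest_users(user_vecs, user_id, k):
        for item in history.get(neighbour, set()) - seen:
            counts[item] = counts.get(item, 0) + 1
    return sorted(counts, key=lambda item: (-counts[item], item))


user_vecs = {"u1": [1.0, 0.0], "u2": [0.9, 0.1], "u3": [0.8, 0.2], "u4": [0.0, 1.0]}
history = {"u1": {"x"}, "u2": {"x", "y", "z"}, "u3": {"y"}, "u4": {"w"}}
print(seed_candidates(user_vecs, history, "u1", k=2))  # ['y', 'z']

# Streaming-style upsert: u4's behaviour shifted, and the new vector is queryable immediately.
user_vecs["u4"] = [1.0, 0.0]
print(nearest_users(user_vecs, "u1", k=1))  # ['u4']
```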
When to migrate: when a full index rebuild on pgvector takes more than ~30 min, OR p99 query latency creeps past 100ms despite HNSW tuning, OR you need filter-aware ANN. pgvector applies WHERE filters after the approximate index scan, so selective filters on high-cardinality attributes can return too few results.
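The migration criteria here and in the earlier pgvector comparison can be collected into a simple checklist function. The thresholds are this guide's rules of thumb, not hard limits. `PgvectorStats` and `migration_triggers` are hypothetical names, and the inputs would come from your own monitoring.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PgvectorStats:
    num_vectors: int
    p99_target_ms: float         # latency requirement
    p99_observed_ms: float       # measured after HNSW tuning
    full_rebuild_minutes: float
    needs_filtered_ann: bool


def migration_triggers(s: PgvectorStats) -> List[str]:
    """Return which of this guide's rule-of-thumb thresholds a deployment crosses."""
    reasons = []
    if s.num_vectors > 10_000_000 and s.p99_target_ms < 100:
        reasons.append("past ~10M vectors with a sub-100 ms p99 target")
    if s.full_rebuild_minutes > 30:
        reasons.append("full rebuild takes more than ~30 min")
    if s.p99_observed_ms > 100:
        reasons.append("p99 above 100 ms despite HNSW tuning")
    if s.needs_filtered_ann:
        reasons.append("needs filter-aware ANN")
    return reasons


print(migration_triggers(PgvectorStats(2_000_000, 80.0, 40.0, 5.0, False)))  # []
print(migration_triggers(PgvectorStats(50_000_000, 80.0, 140.0, 45.0, True)))
```

An empty list means pgvector's operational simplicity still wins. Any single trigger is a reason to evaluate the move, not an automatic verdict.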
Migration steps:
Gotchas: Batch-flavor Vector Search indexes are immutable snapshots, so any large update means re-indexing. Use streaming-update indexes if you need realtime additions; they cost somewhat more. Filter namespace tokens must be planned at ingest time: you can't add new filters retroactively without re-indexing.
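One way to enforce "plan namespaces at ingest time" is to validate every record against a fixed namespace list before writing the index input files. This is a hedged sketch. The JSON field names (`id`, `embedding`, `restricts`, `namespace`, `allow`) follow our understanding of Vertex AI Vector Search's JSON input format and should be checked against the current documentation. `PLANNED_NAMESPACES` and `to_datapoint` are hypothetical.

```python
import json
from typing import Dict, List, Sequence

# Hypothetical: every filter namespace you expect to need, fixed before the first build.
PLANNED_NAMESPACES = ("brand", "in_stock", "acl")


def to_datapoint(dp_id: str, embedding: Sequence[float], tokens: Dict[str, List[str]]) -> str:
    """Serialize one datapoint as a JSON line, rejecting records that don't match the plan."""
    missing = [ns for ns in PLANNED_NAMESPACES if ns not in tokens]
    unplanned = sorted(set(tokens) - set(PLANNED_NAMESPACES))
    if missing or unplanned:
        raise ValueError(f"{dp_id}: missing={missing} unplanned={unplanned}")
    record = {
        "id": dp_id,
        "embedding": list(embedding),
        # Field names are assumptions modeled on the Vertex AI JSON input format.
        "restricts": [{"namespace": ns, "allow": list(tokens[ns])} for ns in PLANNED_NAMESPACES],
    }
    return json.dumps(record)


print(to_datapoint("sku-1", [0.1, 0.2], {"brand": ["acme"], "in_stock": ["true"], "acl": ["public"]}))
```

Failing fast at serialization time is far cheaper than discovering a missing namespace after a billion-vector build. For records that genuinely lack a value, an explicit placeholder token (for example a hypothetical `"unknown"`) keeps the namespace present on every datapoint.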