Vector Search · Deep Dive

Google's managed billion-scale vector database, built on ScaNN, the same library that powers Google Search's lookup. Formerly named Matching Engine.

TL;DR

What it is: Managed vector search service built on ScaNN. Operates at billion-vector scale with p99 latency < 100ms. Indexes are tree-quantized at build time for fast retrieval.

Why ScaNN matters: Beats most open-source ANN libraries (FAISS, HNSWLIB) on recall/latency curves at scale. Uses anisotropic vector quantization plus a learned tree structure, the same algorithm the production Google Search lookup uses.
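To make the anisotropic idea concrete, here is a minimal sketch in plain Python. It assumes the loss form described in the published ScaNN work: the quantization error is split into a part parallel to the original vector and a part orthogonal to it, and the parallel part is penalized more heavily. The function name and the weights `w_parallel` and `w_orthogonal` are illustrative choices, not ScaNN's API or defaults.

```python
def anisotropic_loss(x, x_quantized, w_parallel=4.0, w_orthogonal=1.0):
    """Weighted quantization error: parallel residual costs more than orthogonal."""
    residual = [a - b for a, b in zip(x, x_quantized)]
    norm_sq = sum(a * a for a in x)
    if norm_sq == 0.0:
        # No direction to project onto; fall back to plain squared error.
        return w_orthogonal * sum(r * r for r in residual)
    # Project the residual onto the direction of x.
    scale = sum(r * a for r, a in zip(residual, x)) / norm_sq
    parallel = [scale * a for a in x]
    orthogonal = [r - p for r, p in zip(residual, parallel)]
    par_sq = sum(p * p for p in parallel)
    orth_sq = sum(o * o for o in orthogonal)
    return w_parallel * par_sq + w_orthogonal * orth_sq


x = [1.0, 0.0]
along = anisotropic_loss(x, [0.9, 0.0])   # error of size 0.1 along x
across = anisotropic_loss(x, [1.0, 0.1])  # error of size 0.1 across x
```

With these illustrative weights, the same-sized error costs four times more when it lies along the vector than across it, which is the asymmetry that separates this objective from plain squared-error quantization.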

When to migrate from pgvector to Vector Search: when you cross ~10M vectors with sub-100ms p99 requirements. Below that, operational simplicity of pgvector wins. Above, ScaNN's recall advantage and managed operational story dominate.

How Vector Search Works

INGEST TIME:
  Docs ──> chunker ──> embedding model ──> N x D vectors
                       (Gemini text-embedding-005,
                        OpenAI text-embedding-3,
                        or any model)
                                                  |
                                                  v
                                       ┌────────────────────┐
                                       │  Vector Search     │
                                       │  Index Builder     │
                                       │  (ScaNN)           │
                                       │  - tree-quantize   │
                                       │  - prune branches  │
                                       │  - calibrate ANN   │
                                       └────────────────────┘
                                                  |
                                                  v
                                          [ Deployed Index ]
                                          (immutable snapshot,
                                          rebuild on update)

QUERY TIME:
  Query ──> embed ──> Vector Search ──> top-K (e.g. K=100)
                                                  |
                                                  v
                                       [ optional rerank
                                         w/ cross-encoder ]
                                                  |
                                                  v
                                             top-5 ──> Gemini
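The query-time path above can be sketched as a two-stage function. This is a toy model, not the service client: a brute-force dot product stands in for the ANN lookup, and the `rerank_score` callable is a hypothetical placeholder for a cross-encoder.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def retrieve_then_rerank(query_vec, index, rerank_score, k=100, final_n=5):
    """Stage 1: cheap vector similarity over everything. Stage 2: costly rerank over k."""
    # Stage 1: brute-force stand-in for the ANN lookup.
    candidates = sorted(
        index, key=lambda item: dot(query_vec, item["vec"]), reverse=True
    )[:k]
    # Stage 2: rerank_score plays the role of the cross-encoder.
    reranked = sorted(candidates, key=rerank_score, reverse=True)
    return [item["id"] for item in reranked[:final_n]]


index = [
    {"id": "a", "vec": [1.0, 0.0], "quality": 0.2},
    {"id": "b", "vec": [0.9, 0.1], "quality": 0.9},
    {"id": "c", "vec": [0.0, 1.0], "quality": 1.0},
]
top = retrieve_then_rerank(
    [1.0, 0.0], index, lambda item: item["quality"], k=2, final_n=1
)
```

Item `c` has the best rerank score but never reaches the reranker because stage 1 drops it. That is the trade-off behind choosing a generous K such as 100 before reranking down to 5.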

Production Use Cases

Use Case 1: Search over a 100M product catalog

Problem: E-commerce platform with 100M SKUs. Search needs to handle "wireless headphones for runners under $200", a query that mixes semantic intent, attribute filters, and price constraints.

Architecture: Each product is embedded with text-embedding-005 (title + description + reviews). Filters (price, brand, in_stock) are stored as namespace tokens on each vector. Vector Search performs filtered retrieval in < 30ms. The top-100 results pass through Cohere Rerank v3 for query-specific reordering, and the top-20 are served to the front end. End-to-end: ~80ms.
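A simplified model of token-based filtering helps show what "filters stored as namespace tokens" means. In this sketch, a candidate is eligible only if it carries at least one allowed token in every namespace the query restricts. The data layout, the `price_bucket` namespace (price pre-bucketed into tokens), and the brute-force scoring are all assumptions for illustration, not the service's implementation.

```python
def passes(restricts, query_allow):
    """True if the datapoint has an allowed token in every restricted namespace."""
    return all(
        set(restricts.get(namespace, ())) & set(allowed)
        for namespace, allowed in query_allow.items()
    )


def filtered_search(query_vec, datapoints, query_allow, k):
    # Filter first, then rank only the eligible vectors by dot product.
    eligible = [d for d in datapoints if passes(d["restricts"], query_allow)]
    eligible.sort(
        key=lambda d: sum(q * v for q, v in zip(query_vec, d["embedding"])),
        reverse=True,
    )
    return [d["id"] for d in eligible[:k]]


catalog = [
    {"id": "sku-1", "embedding": [0.9, 0.1],
     "restricts": {"in_stock": ["yes"], "price_bucket": ["under_200"]}},
    {"id": "sku-2", "embedding": [1.0, 0.0],
     "restricts": {"in_stock": ["yes"], "price_bucket": ["over_200"]}},
    {"id": "sku-3", "embedding": [0.2, 0.8],
     "restricts": {"in_stock": ["no"], "price_bucket": ["under_200"]}},
]
hits = filtered_search(
    [1.0, 0.0], catalog,
    {"in_stock": ["yes"], "price_bucket": ["under_200"]}, k=10,
)
```

`sku-2` is the closest vector to the query but fails the price filter, so it is never returned.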

Use Case 2: Enterprise document RAG at 50M docs

Problem: Fortune 500 internal docs across 50M files (Confluence, SharePoint, Google Drive, Slack archives). Employees need "ask anything" with citations.

Architecture: Nightly ingestion pipeline: docs ─> chunking (1000 tokens, 200 overlap) ─> embedding ─> Vector Search index. Per-doc access permissions are encoded as ACL namespace tokens. Query path: user query embedded ─> filtered Vector Search retrieval ─> rerank ─> top-5 ─> Gemini synthesizes an answer with citations. ACL enforcement happens at the Vector Search filter level, NOT after retrieval, which is critical for compliance.
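The chunking step (1000 tokens, 200 overlap) is a sliding window. A minimal sketch, assuming the caller has already tokenized the document with whatever tokenizer matches the embedding model (the function name `chunk_tokens` is hypothetical):

```python
def chunk_tokens(tokens, size=1000, overlap=200):
    """Split a token list into windows of `size` that share `overlap` tokens."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            # This window already reaches the end; avoid a redundant tail chunk.
            break
    return chunks


tokens = list(range(2500))  # stand-in for real token ids
chunks = chunk_tokens(tokens)
```

Each chunk starts 800 tokens after the previous one, so the last 200 tokens of one chunk repeat as the first 200 of the next. This keeps sentences that straddle a boundary retrievable from at least one chunk.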

Use Case 3: Multimodal image + text search

Problem: Stock photo library wants "find images that look like THIS reference image and ALSO match this description."

Architecture: Each image is embedded with a CLIP-style model (for example, the multimodalembedding model), which places images and text in the same vector space. The query combines the image vector and the text vector as a weighted sum, then the search returns results that are both visually and semantically similar. ScaNN handles 100M+ images with p99 < 80ms.
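The weighted-sum query can be sketched in a few lines. This assumes both vectors come from the same multimodal model, so they share one space and one dimensionality. The `image_weight` parameter and the choice to L2-normalize before and after mixing are illustrative decisions, not something the architecture above prescribes.

```python
import math


def l2_normalize(vec):
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [v / norm for v in vec]


def combine_query(image_vec, text_vec, image_weight=0.5):
    """Blend an image vector and a text vector into one unit-length query."""
    if len(image_vec) != len(text_vec):
        raise ValueError("vectors must come from the same embedding space")
    if not 0.0 <= image_weight <= 1.0:
        raise ValueError("image_weight must be in [0, 1]")
    # Normalize first so neither modality dominates because of its magnitude.
    image_unit = l2_normalize(image_vec)
    text_unit = l2_normalize(text_vec)
    mixed = [
        image_weight * i + (1.0 - image_weight) * t
        for i, t in zip(image_unit, text_unit)
    ]
    return l2_normalize(mixed)


query = combine_query([2.0, 0.0], [0.0, 3.0], image_weight=0.5)
```

Raising `image_weight` pulls results toward the look of the reference image; lowering it favors the text description.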

Use Case 4: Recommendation embedding store

Problem: Recommendation system needs "users similar to this user" for cold-start and exploration. 100M+ user embeddings, updated continuously as user behavior evolves.

Architecture: User embeddings (256-dim) are generated nightly via a two-tower model and stored in a Vector Search streaming-update index (rather than the standard immutable batch index). Recommendation serving queries the index by user vector to find the K-nearest users; their consumption history seeds candidate retrieval. Sub-50ms latency makes this realtime-feasible.
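The "consumption history seeds candidate retrieval" step might look like the sketch below. It assumes the nearest-user ids have already come back from the index; the `history` mapping and the vote-counting heuristic are hypothetical, one simple way to turn neighbors into candidates.

```python
from collections import Counter


def seed_candidates(user_id, neighbor_ids, history, limit=3):
    """Rank items consumed by similar users that the target user has not seen."""
    seen = set(history.get(user_id, ()))
    votes = Counter()
    for neighbor_id in neighbor_ids:
        for item in history.get(neighbor_id, ()):
            if item not in seen:
                votes[item] += 1
    return [item for item, _ in votes.most_common(limit)]


history = {
    "u1": ["a"],
    "u2": ["a", "b", "c"],
    "u3": ["b", "d"],
}
# neighbor_ids would be the K-nearest users returned by the vector index.
candidates = seed_candidates("u1", ["u2", "u3"], history)
```

Items consumed by more neighbors rank higher, and anything the target user already consumed is excluded.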

Migrating from pgvector to Vector Search

When to migrate: when a full index rebuild on pgvector takes longer than ~30 min, OR p99 query latency creeps past 100ms even after tuning, OR you need filter-aware ANN search (pgvector applies filters after the approximate index scan, which degrades at high cardinality).
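These criteria are easy to encode as a check you can run against your own metrics. The thresholds below are the rules of thumb from this section (30 minutes, 100ms); the function and parameter names are hypothetical.

```python
def migration_reasons(full_rebuild_minutes, tuned_p99_ms, needs_filtered_ann):
    """Return the list of migration triggers that currently apply."""
    reasons = []
    if full_rebuild_minutes > 30:
        reasons.append("full index rebuild exceeds ~30 min")
    if tuned_p99_ms > 100:
        reasons.append("p99 latency above 100ms after tuning")
    if needs_filtered_ann:
        reasons.append("filter-aware ANN search required")
    return reasons


reasons = migration_reasons(
    full_rebuild_minutes=45, tuned_p99_ms=80, needs_filtered_ann=False
)
```

An empty list means none of the triggers apply, and staying on pgvector is the simpler choice.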

Migration steps:

  1. Export vectors + metadata from Postgres as JSONL files to GCS.
  2. Create a Vector Search Index resource pointing at the GCS path; choose dimensions, distance measure (cosine / dot / Euclidean), and shard count.
  3. Build the index (Google does this; takes minutes to hours depending on size).
  4. Create an Index Endpoint and deploy the built index to it.
  5. Swap your application code: replace the pgvector query with a Vector Search query. The retrieval interface is similar: embed the query, fetch the top-K neighbors, return the results. Metadata filters use namespace tokens.
  6. Run dual-write for 1 week to verify recall parity, then cut over.
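Steps 1 and 6 are the two places where a small amount of glue code is usually needed. The sketch below serializes one Postgres row into a JSONL line and measures recall parity between the two systems. The record layout (`id`, `embedding`, `restricts` with `namespace` and `allow`) is an assumption about the import format and should be checked against the current Vector Search input documentation. Fetching rows and uploading to GCS are left out.

```python
import json


def to_jsonl_line(row_id, embedding, filters):
    """Serialize one exported row; `filters` maps namespace -> list of tokens."""
    record = {
        "id": str(row_id),
        "embedding": [float(v) for v in embedding],
        "restricts": [
            {"namespace": namespace, "allow": [str(t) for t in tokens]}
            for namespace, tokens in sorted(filters.items())
        ],
    }
    return json.dumps(record)


def recall_at_k(baseline_ids, candidate_ids, k):
    """Fraction of the baseline top-k that the candidate system also returned."""
    if k <= 0:
        raise ValueError("k must be positive")
    baseline = set(baseline_ids[:k])
    if not baseline:
        return 1.0
    return len(baseline & set(candidate_ids[:k])) / len(baseline)


line = to_jsonl_line(42, [0.1, 0.2], {"brand": ["acme"]})
# Baseline: pgvector results; candidate: Vector Search results for the same query.
parity = recall_at_k(["a", "b", "c", "d"], ["a", "c", "d", "x"], k=4)
```

During the dual-write week, averaging `recall_at_k` over a sample of real queries gives a single number to track before cutting over.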

Gotchas: Vector Search indexes are immutable in the batch flavor, so any large update means a re-index. Use streaming-update indexes if you need realtime additions (slightly more expensive). Filter namespace tokens must be planned at ingest time: you can't add new filters retroactively without re-indexing.
