Vertex AI Vector Search

Google's managed, billion-scale approximate nearest neighbor (ANN) vector database. It is built on ScaNN, the ANN library Google reports using for embedding retrieval in its own products, including Google Search. Formerly named Matching Engine.

TL;DR

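What it is: A managed vector database built on ScaNN, designed for billion-vector scale with p99 query latency under 100 ms (actual latency depends on index configuration, shard size, and filters). At build time, the default tree-AH algorithm partitions vectors into tree leaves and compresses them with asymmetric hashing (a product-quantization scheme), so each query scans only a fraction of the data.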

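Why ScaNN matters: In published ann-benchmarks results, ScaNN compares favorably with popular open-source ANN libraries such as FAISS and HNSWLIB on the recall/latency trade-off. It combines tree-based (k-means) partitioning with anisotropic vector quantization, a loss that penalizes quantization error parallel to a datapoint more heavily than orthogonal error, because parallel error distorts the highest inner-product scores the most. These are the same techniques Google reports using for embedding retrieval in Search.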
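To make "anisotropic" concrete, here is a minimal Python sketch of the score-aware loss described in the ScaNN paper, written from scratch for illustration; it is not ScaNN's API. The residual between a datapoint and its quantized version is split into a component parallel to the datapoint and one orthogonal to it, and the parallel part is up-weighted by `eta` (our name for the ratio of the two weights).

```python
from typing import Sequence


def anisotropic_loss(x: Sequence[float], x_quantized: Sequence[float], eta: float) -> float:
    """Score-aware quantization loss in the spirit of ScaNN's anisotropic VQ.

    The residual r = x - x_quantized is decomposed into r_par (parallel to x)
    and r_perp (orthogonal to x). The loss is eta * |r_par|^2 + |r_perp|^2;
    eta > 1 penalizes error that distorts inner products with queries
    pointing roughly in x's direction, i.e. the high-scoring ones.
    """
    if len(x) != len(x_quantized):
        raise ValueError("vectors must have the same dimension")
    norm_sq = sum(xi * xi for xi in x)
    if norm_sq == 0:
        raise ValueError("x must be non-zero")
    r = [xi - qi for xi, qi in zip(x, x_quantized)]
    # Projection coefficient of the residual onto x.
    coef = sum(ri * xi for ri, xi in zip(r, x)) / norm_sq
    r_par = [coef * xi for xi in x]
    r_perp = [ri - pi for ri, pi in zip(r, r_par)]
    return eta * sum(p * p for p in r_par) + sum(q * q for q in r_perp)


# Two quantizations of x = [1, 0] with the same plain squared error (0.25):
print(anisotropic_loss([1.0, 0.0], [0.5, 0.0], eta=4.0))  # parallel error   -> 1.0
print(anisotropic_loss([1.0, 0.0], [1.0, 0.5], eta=4.0))  # orthogonal error -> 0.25
```

A plain squared-error quantizer treats both cases as equally good; the anisotropic loss prefers the second, which is the intuition behind ScaNN preserving top inner-product scores better at a given compression rate.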

When to migrate from pgvector to Vector Search: when you cross ~10M vectors with sub-100 ms p99 requirements. Below that, pgvector's operational simplicity usually wins; above it, ScaNN's recall advantage and fully managed operations tend to dominate.

How Vertex Vector Search Works

INGEST TIME:
  Docs ──> chunker ──> embedding model ──> N x D vectors
                       (Vertex AI text-embedding-005,
                        OpenAI text-embedding-3,
                        or any model)
                                                  |
                                                  v
                                       ┌────────────────────┐
                                       │  Vector Search     │
                                       │  Index Builder     │
                                       │  (ScaNN)           │
                                       │  - tree-quantize   │
                                       │  - prune branches  │
                                       │  - calibrate ANN   │
                                       └────────────────────┘
                                                  |
                                                  v
                                          [ Deployed Index ]
                                          (immutable snapshot,
                                          rebuild on update)

QUERY TIME:
  Query ──> embed ──> Vector Search ──> top-K (e.g. K=100)
                                                  |
                                                  v
                                       [ optional rerank
                                         w/ cross-encoder ]
                                                  |
                                                  v
                                             top-5 ──> Gemini
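The query path above is a two-stage retrieve-then-rerank pipeline. The following minimal, dependency-free Python sketch shows its shape under stated assumptions: an exact brute-force scan stands in for the Vector Search ANN call, and `rerank_score` is a hypothetical placeholder for a cross-encoder (which in practice would score the query text against each candidate's text).

```python
import heapq
from typing import Callable, Dict, List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def retrieve_then_rerank(
    query_vec: Sequence[float],
    corpus: Dict[str, Sequence[float]],
    rerank_score: Callable[[str], float],
    k_retrieve: int = 100,
    k_final: int = 5,
) -> List[str]:
    # Stage 1: exact top-K by dot product. This O(N) scan stands in for the
    # ANN request to Vector Search and is only for illustration.
    candidates = heapq.nlargest(
        k_retrieve, corpus, key=lambda doc_id: dot(query_vec, corpus[doc_id])
    )
    # Stage 2: the expensive reranker scores only the K candidates.
    reranked = sorted(candidates, key=rerank_score, reverse=True)
    return reranked[:k_final]


corpus = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
rerank = {"a": 0.1, "b": 0.9, "c": 1.0}.get  # hypothetical cross-encoder scores
print(retrieve_then_rerank([1.0, 0.0], corpus, rerank, k_retrieve=2, k_final=1))  # ['b']
```

Document "c" has the best rerank score but is never retrieved when K=2: the reranker can only reorder the candidate set, which is why the first stage fetches a generous K (e.g. 100) before narrowing to the top 5.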

Production Use Cases

Use Case 1 — Semantic search over 100M product catalog

Problem: E-commerce platform with 100M SKUs. Search needs to handle "wireless headphones for runners under $200" — a query that mixes semantic intent, attribute filters, and price constraints.

Architecture: Each product is embedded with Vertex AI text-embedding-005 (title + description + reviews). Categorical filters (brand, in_stock) are stored as token restricts (namespaces) on each vector, and price as a numeric restrict so a query can express "under $200". Vector Search performs filtered ANN retrieval in under 30 ms. The top 100 results pass through Cohere Rerank v3 for query-specific reordering, and the top 20 are served to the front end. End-to-end latency: about 80 ms.
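Vector Search's input format distinguishes token restricts (categorical namespaces with allowed tokens) from numeric restricts, which suit a range filter like price. A minimal sketch of serializing one product as a JSON input line follows. The field names reflect our reading of the documented JSON input format and should be checked against current Vertex AI docs; `product_record` and the sample values are hypothetical.

```python
import json
from typing import List


def product_record(sku: str, embedding: List[float], brand: str, in_stock: bool, price: float) -> str:
    """Serialize one product as a line of Vector Search JSON input.

    Field names (id, embedding, restricts, numeric_restricts) follow our
    reading of the documented input format; verify before relying on them.
    """
    record = {
        "id": sku,
        "embedding": embedding,
        # Token restricts: categorical filters, one namespace per attribute.
        "restricts": [
            {"namespace": "brand", "allow": [brand]},
            {"namespace": "in_stock", "allow": ["true" if in_stock else "false"]},
        ],
        # Numeric restricts: range filters such as "price under $200".
        "numeric_restricts": [
            {"namespace": "price", "value_float": price},
        ],
    }
    return json.dumps(record)


line = product_record("sku-123", [0.12, -0.08, 0.33], brand="acme", in_stock=True, price=179.99)
print(line)
```

One JSON object per line, written to Cloud Storage, is what the index build consumes; the query side then allows specific tokens per namespace and compares the numeric namespace against a bound.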

Use Case 2 — Enterprise document RAG at 50M docs

Problem: Fortune 500 internal docs across 50M files (Confluence, SharePoint, Google Drive, Slack archives). Employees need to "ask anything" and get answers with citations.

Architecture: Nightly ingestion via Vertex AI Pipelines: docs ─> chunking (1000 tokens, 200 overlap) ─> Gemini embeddings ─> Vector Search index. Per-doc access permissions are encoded as token restricts in an ACL namespace on every chunk. Query path: user query embedded ─> ANN filtered to the user's permissions ─> cross-encoder rerank ─> top-5 ─> Gemini Pro synthesizes an answer with citations. ACL enforcement happens at the Vector Search filter level, NOT after retrieval. This is critical for compliance: chunks the user may not see never enter the candidate set, and filtering after retrieval could also leave fewer than K permitted results.
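The chunking step (1000-token windows with 200 tokens of overlap, i.e. a stride of 800) can be sketched as a sliding window. `chunk_tokens` is a hypothetical helper, not part of Vertex AI Pipelines, and whitespace splitting stands in for the embedding model's tokenizer.

```python
from typing import List, Sequence


def chunk_tokens(tokens: Sequence[str], size: int = 1000, overlap: int = 200) -> List[List[str]]:
    """Split a token sequence into fixed-size windows that overlap by `overlap` tokens."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    if not tokens:
        return []
    stride = size - overlap  # 800 with the defaults
    chunks = []
    start = 0
    while True:
        chunks.append(list(tokens[start:start + size]))
        if start + size >= len(tokens):  # this window reached the end
            break
        start += stride
    return chunks


# Whitespace splitting is a stand-in for the embedding model's tokenizer.
doc = " ".join(f"w{i}" for i in range(2000)).split()
print([(c[0], len(c)) for c in chunk_tokens(doc)])
# [('w0', 1000), ('w800', 1000), ('w1600', 400)]
```

The overlap means a sentence that straddles a window boundary still appears whole in at least one chunk, at the cost of embedding about 25% more tokens.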

Use Case 3 — Multi-modal image + text search

Problem: Stock photo library wants "find images that look like THIS reference image and ALSO match this description."

Architecture: Each image is embedded with a CLIP-style multimodal embedding model (Gemini or the Vertex AI multimodalembedding model) that maps images and text into the same vector space. The query combines the image vector and the text vector with a weighted sum, and ANN search then returns results that are both visually and semantically similar. ScaNN handles 100M+ images with p99 < 80 ms.
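The weighted-sum query can be sketched in a few lines. This is one common way to do it, assumed rather than taken from the source: each embedding is L2-normalized first so neither modality dominates because of scale, then the mix is renormalized so it behaves well under dot-product or cosine search. `combine_query` and `image_weight` are hypothetical names.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def combine_query(image_vec: Sequence[float], text_vec: Sequence[float], image_weight: float = 0.5) -> List[float]:
    """Weighted sum of L2-normalized image and text embeddings, renormalized."""
    if len(image_vec) != len(text_vec):
        raise ValueError("embeddings must share one vector space (same dimension)")
    if not 0.0 <= image_weight <= 1.0:
        raise ValueError("image_weight must be in [0, 1]")
    img, txt = _normalize(image_vec), _normalize(text_vec)
    mixed = [image_weight * a + (1.0 - image_weight) * b for a, b in zip(img, txt)]
    return _normalize(mixed)


print(combine_query([2.0, 0.0], [0.0, 3.0], image_weight=0.5))
```

Raising `image_weight` toward 1.0 makes results look more like the reference image; lowering it favors the text description.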

Use Case 4 — Recommendation embedding store

Problem: Recommendation system needs "users similar to this user" for cold-start and exploration. 100M+ user embeddings, updated continuously as user behavior evolves.

Architecture: User embeddings (256-dim) are generated nightly by a two-tower model in Vertex AI Pipelines and stored in a Vector Search streaming-update index rather than a batch-update index, so individual user vectors can also be upserted between nightly runs as behavior changes. Recommendation serving queries the index with a user's vector to find the K nearest users; their consumption history seeds candidate retrieval. Sub-50 ms ANN latency makes this feasible in real time.
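Once the nearest-user query returns, seeding candidates from those users' consumption histories might look like the following. `seed_candidates` is a hypothetical helper; counting each item at most once per neighbor and ranking by frequency is one simple choice among many.

```python
from collections import Counter
from typing import Iterable, List, Sequence


def seed_candidates(neighbor_histories: Iterable[Sequence[str]], seen: Iterable[str], limit: int = 10) -> List[str]:
    """Rank items consumed by nearest-neighbor users, excluding items already seen."""
    seen_set = set(seen)
    counts = Counter()
    for history in neighbor_histories:
        # dict.fromkeys dedupes while keeping order, so each neighbor votes
        # at most once per item and a single heavy user cannot dominate.
        counts.update(item for item in dict.fromkeys(history) if item not in seen_set)
    return [item for item, _ in counts.most_common(limit)]


histories = [["x", "y", "z"], ["y", "z"], ["z", "w"]]  # from the K nearest users
print(seed_candidates(histories, seen={"w"}, limit=2))  # ['z', 'y']
```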

Migrating from pgvector to Vertex Vector Search

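When to migrate: when a full index rebuild on pgvector takes longer than ~30 min, OR p99 query latency creeps past 100 ms despite HNSW tuning, OR you need filter-aware ANN. pgvector applies WHERE clauses after scanning its approximate index, so selective filters (for example, on high-cardinality attributes such as tenant or user ID) can return fewer than K results.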
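Why filter placement matters can be shown with exact search over a toy score table. This is a simplification: the `scores` dict stands in for both pgvector's HNSW scan and Vector Search's ANN, and real systems examine more than K candidates (pgvector's `hnsw.ef_search`), which softens but does not remove the effect.

```python
import heapq
from typing import Dict, List, Set


def post_filter_search(scores: Dict[str, float], allowed: Set[str], k: int) -> List[str]:
    """Search first, filter second: can return fewer than k results."""
    top = heapq.nlargest(k, scores, key=scores.__getitem__)
    return [doc for doc in top if doc in allowed]


def filtered_search(scores: Dict[str, float], allowed: Set[str], k: int) -> List[str]:
    """Filter inside the search: returns up to k allowed results."""
    eligible = [doc for doc in scores if doc in allowed]
    return heapq.nlargest(k, eligible, key=scores.__getitem__)


# Similarity of each document to the query; only c and e pass the filter.
scores = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.6, "e": 0.5}
allowed = {"c", "e"}
print(post_filter_search(scores, allowed, k=2))  # []
print(filtered_search(scores, allowed, k=2))     # ['c', 'e']
```

The more selective the filter, the more often the post-filtered result comes back short or empty, which is the degradation described above.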

Migration steps:

1. Export vectors and metadata from Postgres to Cloud Storage (GCS) as JSONL files in Vector Search's input format (one JSON object per line with an id, the embedding, and any restricts).
2. Create a Vector Search Index resource pointing at the GCS path; choose dimensions, distance measure (dot product, cosine, squared L2, or L1), and shard size.
3. Build the index (Google runs the build; it takes minutes to hours depending on size).
4. Create an Index Endpoint and deploy the built index to it.
5. Swap your application code: replace the pgvector query with a Vector Search query. The retrieval interface is similar (embed the query, fetch top-K, return chunks). Metadata filters become restricts (token namespaces and numeric restricts).
6. Dual-write to both stores for about a week, compare results on mirrored queries to verify recall parity, then cut over.
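For the final migration step (dual-write and verify recall parity), a minimal sketch of recall@K over mirrored queries follows. The function names are hypothetical. Ideally the reference lists come from exact (brute-force) search on a sample of queries, since pgvector's HNSW results are themselves approximate.

```python
from typing import Sequence, Tuple


def recall_at_k(reference: Sequence[str], candidate: Sequence[str], k: int) -> float:
    """Fraction of the reference top-k ids that also appear in the candidate top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    ref = set(reference[:k])
    if not ref:
        raise ValueError("reference result list is empty")
    return len(ref & set(candidate[:k])) / len(ref)


def mean_recall(pairs: Sequence[Tuple[Sequence[str], Sequence[str]]], k: int) -> float:
    """Average recall@k over (reference, candidate) pairs from mirrored queries."""
    if not pairs:
        raise ValueError("no queries to compare")
    return sum(recall_at_k(ref, cand, k) for ref, cand in pairs) / len(pairs)


pairs = [
    (["a", "b", "c", "d"], ["a", "c", "x", "y"]),  # 2 of 4 reference ids found
    (["p", "q"], ["q", "p"]),                      # order differs, same set
]
print(mean_recall(pairs, k=4))  # 0.75
```

Comparing sets rather than positions keeps the metric insensitive to small ranking differences, which a downstream reranker would reorder anyway.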

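Gotchas: Batch-update indexes are not modified in place. Updates are submitted as new data files in Cloud Storage and take effect only after an index update job completes, which is slow for large updates. Use a streaming-update index if you need near-real-time additions (slightly more expensive). Filter namespaces must be planned at ingest time: restricts are stored per datapoint, so adding a new filter later means re-ingesting every affected vector.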

Glossary

Vertex AI Vector Search

Vector Database

Semantic Search

Hybrid Search

Cross-Encoder Reranker

RAG

Agentic RAG

Gemini Embeddings

Vertex AI

Vertex AI Pipelines
