Vector databases in one paragraph
A vector database stores embeddings — numerical vector representations of content produced by an embedding model. Given a query embedding, the database returns the most similar stored embeddings (and their associated content) using approximate nearest-neighbour algorithms. Vector databases are the operational substrate of most production RAG systems: ingest documents, embed them, store them; at query time embed the question, retrieve the closest matches, give them to Claude.
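The retrieve step above reduces to "find the stored vectors closest to the query vector". A minimal sketch of that idea, using brute-force cosine similarity over a toy in-memory index (the document IDs, contents, and embedding values are made up for illustration — a real system would use an embedding model and an approximate nearest-neighbour index):

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of magnitudes
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "index": id -> (embedding, content). Embeddings are hypothetical.
index = {
    "doc1": ([0.90, 0.10, 0.00], "How to cancel a subscription"),
    "doc2": ([0.10, 0.80, 0.20], "Billing dates explained"),
    "doc3": ([0.85, 0.20, 0.10], "Stopping your plan"),
}

def search(query_vec, k=2):
    # Rank every stored embedding by similarity to the query; return top-k
    scored = sorted(index.items(),
                    key=lambda kv: cosine(query_vec, kv[1][0]),
                    reverse=True)
    return [(doc_id, content) for doc_id, (_vec, content) in scored[:k]]

query = [0.88, 0.15, 0.05]  # hypothetical embedding of "I want to stop my plan"
results = search(query)
```

Production databases replace the exhaustive scan with approximate nearest-neighbour structures (e.g. HNSW graphs) so retrieval stays fast at millions of vectors.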
Why vector databases matter
Traditional keyword search (BM25, full-text) finds documents that share words with the query. Vector search finds documents that share meaning. "How do I cancel my subscription" and "I want to stop my plan" share no content keywords but mean the same thing; vector search matches both, while keyword search finds only the literal match. For retrieval over enterprise content, where users phrase the same question in many different ways, vector search is qualitatively better.
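The keyword-side failure is easy to demonstrate. A small sketch using Jaccard overlap of content words as a stand-in for keyword matching (the stopword list here is a hypothetical minimal one for these two sentences):

```python
# A minimal stopword list, sufficient for this illustration only
STOPWORDS = {"how", "do", "i", "my", "want", "to"}

def keyword_overlap(a, b):
    # Jaccard similarity over lowercase content-word sets:
    # |intersection| / |union|, ignoring stopwords
    ta = set(a.lower().split()) - STOPWORDS
    tb = set(b.lower().split()) - STOPWORDS
    return len(ta & tb) / len(ta | tb)

q1 = "How do I cancel my subscription"
q2 = "I want to stop my plan"
score = keyword_overlap(q1, q2)  # 0.0 — no content words in common
```

A keyword scorer gives these two queries a similarity of zero, even though they express the same intent; an embedding model would place them close together in vector space.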
Vector database options
Common production options:
- pgvector — Postgres extension, ideal when you already operate Postgres and corpus size is moderate (low millions of vectors). Lower operational overhead.
- Pinecone — hosted vector database with operational simplicity. Common choice for teams that want to outsource the operations.
- Weaviate — hybrid search (vector + keyword) is first-class. Open-source with hosted offering.
- Qdrant — high-performance, especially under heavy query load. Open-source with hosted offering.
- Chroma — lightweight, common for prototyping and smaller corpora.
- AWS / GCP / Azure native — hyperscaler vector services, useful when consolidating on a single cloud.
NINtec's Discovery phase recommends a vector database based on corpus size, query rate, latency budget, and operational posture.
Beyond simple vector search
Production retrieval is rarely pure vector search. Common augmentations:
- Hybrid search — run vector and keyword (BM25) searches in parallel, then merge the ranked result lists
- Re-ranking — use a cross-encoder model to re-score the top-N candidates
- Metadata filtering — restrict search to chunks matching tenant, date range, or document type
- Multi-query — generate multiple query variants and union the results
- Recursive retrieval — retrieve, summarise, retrieve again with the summary
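The hybrid-search merge step above is often done with Reciprocal Rank Fusion (RRF), which combines ranked lists without needing comparable scores. A sketch, with hypothetical document IDs; `k=60` is the conventional RRF smoothing constant:

```python
def rrf_merge(rankings, k=60):
    # Reciprocal Rank Fusion: each list contributes 1 / (k + rank)
    # per document; documents ranked highly in multiple lists win.
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["d3", "d1", "d7"]    # ranked by embedding similarity
keyword_hits = ["d1", "d9", "d3"]   # ranked by BM25 score
merged = rrf_merge([vector_hits, keyword_hits])
```

Because RRF uses only rank positions, it sidesteps the problem that cosine similarities and BM25 scores live on incompatible scales. The merged list then typically feeds the re-ranking step.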
NINtec's RAG practice tunes the retrieval pattern against your specific eval set rather than starting from a generic recipe.