Vector Database
A vector database is a data store designed to persist high-dimensional numerical vectors (embeddings) and execute similarity searches across them at scale. Where a relational database answers queries like "find rows where country = 'IT'", a vector database answers queries like "find the 10 vectors closest in meaning to this query vector". This is a fundamentally different operation, and one that standard database indexes such as B-trees, which are built for exact-match and range lookups, cannot perform efficiently.
How It Works
When a document is converted to an embedding, the resulting vector (e.g., 1536 floating-point numbers) is stored in the vector database alongside a reference to the source text. To query the database, the incoming query is embedded with the same model, and the database performs an approximate nearest-neighbor (ANN) search, returning the k stored vectors closest to the query vector under a distance metric such as cosine similarity or Euclidean distance.
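The store-then-query flow above can be sketched in a few lines. This is a minimal illustration, not how any particular product is implemented: the `TinyVectorStore` class and its method names are hypothetical, the three-dimensional vectors are hand-written stand-ins for real embeddings, and the search is an exact scan rather than ANN.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """Hypothetical in-memory store: each row is (vector, source reference)."""

    def __init__(self):
        self._rows = []

    def add(self, vector, source_ref):
        self._rows.append((list(vector), source_ref))

    def search(self, query_vector, k=10):
        # Exact scan: score every stored vector, keep the k most similar.
        scored = [
            (cosine_similarity(query_vector, vec), ref)
            for vec, ref in self._rows
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


store = TinyVectorStore()
# Toy vectors standing in for embeddings produced by a real model.
store.add([0.9, 0.1, 0.0], "doc-1: refund policy")
store.add([0.0, 0.2, 0.9], "doc-2: shipping times")
store.add([0.8, 0.3, 0.1], "doc-3: returns process")

# The query must be embedded with the same model as the stored documents.
top_hits = store.search([1.0, 0.2, 0.0], k=2)
for score, ref in top_hits:
    print(f"{score:.3f}  {ref}")
```

A production system would replace the exact scan in `search` with an ANN index so that query time does not grow linearly with the number of stored vectors.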
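ANN index types such as HNSW (a layered proximity graph) and IVF (an inverted file index that partitions vectors into clusters) trade a small degree of precision for a large gain in speed. A flat index, by contrast, is not approximate: it compares the query against every stored vector and serves as the exact baseline. On millions of vectors, an exact exhaustive search can take seconds per query; a well-tuned ANN index typically returns results in milliseconds, with accuracy loss that is negligible for most applications.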
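The speed-for-precision trade-off can be illustrated with a toy inverted-file (IVF) index. This is a simplified sketch under stated assumptions: the `ToyIVFIndex` class is hypothetical, the centroids are fixed by hand (real systems usually learn them with k-means), and the `nprobe` parameter name follows common usage for "how many clusters to scan".

```python
import math


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_search(vectors, query, k):
    """Exhaustive baseline: rank every vector by distance to the query."""
    return sorted(range(len(vectors)), key=lambda i: l2(vectors[i], query))[:k]


class ToyIVFIndex:
    """Hypothetical IVF sketch: vectors are bucketed by nearest centroid."""

    def __init__(self, centroids):
        self.centroids = centroids
        self.buckets = {i: [] for i in range(len(centroids))}
        self.vectors = []

    def add(self, vector):
        vec_id = len(self.vectors)
        self.vectors.append(vector)
        nearest = min(
            range(len(self.centroids)),
            key=lambda c: l2(vector, self.centroids[c]),
        )
        self.buckets[nearest].append(vec_id)
        return vec_id

    def search(self, query, k, nprobe=1):
        # Scan only the nprobe buckets whose centroids are closest to the query.
        ranked = sorted(
            range(len(self.centroids)),
            key=lambda c: l2(query, self.centroids[c]),
        )
        candidates = [vid for c in ranked[:nprobe] for vid in self.buckets[c]]
        candidates.sort(key=lambda vid: l2(self.vectors[vid], query))
        return candidates[:k]


index = ToyIVFIndex(centroids=[[0.0, 0.0], [10.0, 10.0]])
points = [[1.0, 1.0], [2.0, 0.5], [9.0, 9.5], [10.5, 10.0], [4.9, 4.9]]
for p in points:
    index.add(p)

query = [5.2, 5.2]
approx = index.search(query, k=2, nprobe=1)  # scans one bucket only
full = index.search(query, k=2, nprobe=2)    # scans every bucket
exact = exact_search(points, query, k=2)
print(approx, full, exact)
```

In this deliberately awkward example the true nearest neighbor sits near a cluster boundary, so scanning a single bucket misses it, while scanning both buckets matches the exact result. Raising `nprobe` recovers accuracy at the cost of scanning more vectors, which is the trade-off the paragraph above describes.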
Popular vector databases include Pinecone, Weaviate, Qdrant, and Milvus. pgvector takes a different route: it is a PostgreSQL extension that adds vector search to relational tables.
Common Use Cases
- RAG knowledge bases — storing chunks of documents as vectors so a language model can retrieve relevant context before generating a response.
- Semantic search engines — powering meaning-based search over product catalogs, CRM records, or support articles.
- Recommendation systems — finding items most similar to a user's embedding profile.
- Anomaly detection — flagging records whose vectors are unusually distant from all known clusters.
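As one illustration of the use cases above, the anomaly-detection idea can be sketched as a distance check against known cluster centroids. The centroids, record names, and threshold below are made-up values for demonstration; a real system would derive them from its own data.

```python
import math


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def is_anomaly(vector, centroids, threshold):
    """Flag a vector whose nearest centroid is farther than the threshold."""
    nearest_distance = min(l2(vector, c) for c in centroids)
    return nearest_distance > threshold


# Hypothetical cluster centers and records (toy 2-D vectors).
centroids = [[0.0, 0.0], [5.0, 5.0]]
records = {
    "rec-a": [0.2, -0.1],
    "rec-b": [5.1, 4.8],
    "rec-c": [12.0, -7.0],
}

flagged = [
    name
    for name, vec in records.items()
    if is_anomaly(vec, centroids, threshold=1.5)
]
print(flagged)
```

Only the record that lies far from both centroids is flagged; the right threshold depends on the spread of the clusters and is an application-specific choice.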
Vector Database vs. Traditional Database
A relational database can store vectors as array columns, but it cannot perform ANN search efficiently without an extension like pgvector. A dedicated vector database is purpose-built for this workload: its indexing structures, storage layout, and query planner are all optimized for similarity search rather than exact-match retrieval.
For applications where vector search is one capability among many (e.g., filtering by date AND semantic similarity), a hybrid approach using pgvector within an existing Postgres instance is often simpler than operating a separate service.
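The "filter by date AND semantic similarity" pattern can be sketched as a metadata filter followed by a similarity ranking. This is an in-memory illustration only: the row layout, the `hybrid_search` function, and the sample data are all hypothetical, and a real deployment would push both steps into the database.

```python
import math
from datetime import date


def cosine_distance(a, b):
    """1 minus cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical rows mixing ordinary columns with an embedding column.
rows = [
    {"id": 1, "created": date(2024, 1, 15), "embedding": [0.9, 0.1]},
    {"id": 2, "created": date(2023, 6, 1), "embedding": [1.0, 0.0]},
    {"id": 3, "created": date(2024, 3, 2), "embedding": [0.1, 0.9]},
    {"id": 4, "created": date(2024, 2, 20), "embedding": [0.7, 0.7]},
]


def hybrid_search(rows, query_vec, created_after, k):
    # Step 1: structured filter, as a WHERE clause would do.
    eligible = [r for r in rows if r["created"] >= created_after]
    # Step 2: rank the survivors by vector distance and keep the top k.
    eligible.sort(key=lambda r: cosine_distance(r["embedding"], query_vec))
    return [r["id"] for r in eligible[:k]]


result = hybrid_search(rows, [1.0, 0.0], created_after=date(2024, 1, 1), k=2)
print(result)
```

Row 2 is the closest vector to the query but is excluded by the date filter, which shows why the two conditions have to be evaluated together. In Postgres with pgvector, the same shape is typically expressed as a `WHERE` clause combined with an `ORDER BY` on a distance operator such as `<=>` (cosine distance) and a `LIMIT`.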
Knowlee's Approach
Knowlee's knowledge graph stores account intelligence as a combination of structured graph relationships and embedding vectors, enabling both structured traversal (which company is connected to which person) and semantic retrieval (which account signals are most relevant to this outreach). The vector layer sits inside the graph store rather than as a separate service, keeping the retrieval architecture coherent. This architecture is explored in depth in The Enterprise Knowledge Graph Moat.