AI & Machine Learning

Vector Databases Explained: What Every Developer Building AI Features Needs to Know

By Madhukar · May 14, 2026 · 6 min read

If you have started building any feature that involves semantic search, LLM context retrieval, or recommendation systems, you have probably hit the point where a traditional SQL database, queried with exact matches and keyword filters, is the wrong tool on its own.

Vector databases exist to solve a specific problem: finding things that are semantically similar, not just exact matches.

What a Vector Is

A vector, in this context, is a list of floating-point numbers that represents the meaning of some content. An embedding model (like OpenAI's text-embedding-3-small or an open-source equivalent) takes text and converts it into a high-dimensional vector.

Similar content produces similar vectors. "How do I center a div" and "flexbox centering techniques" produce vectors that are close together in the vector space. "Quarterly earnings report" produces a vector far from both.

python
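from openai import OpenAI

# The client reads the API key from the OPENAI_API_KEY environment variable
client = OpenAI()

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="How do I deploy a FastAPI app to Railway?",
)
embedding = response.data[0].embedding  # List of 1536 floats (this model's default size)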

Similarity Search

Once you have vectors for all your documents, finding semantically similar ones is a matter of measuring the distance between vectors. The most common metric is cosine similarity, which measures how closely two vectors point in the same direction, regardless of their length.
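To make the metric concrete, here is a minimal sketch of cosine similarity in plain Python. The three-dimensional vectors and their names are made up for illustration; real embeddings have hundreds or thousands of dimensions, and in practice you would likely use a numerical library rather than hand-rolled loops.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors (hypothetical values, not real embeddings)
centering_div = [0.9, 0.1, 0.0]
flexbox_tips = [0.8, 0.2, 0.1]
earnings_report = [0.0, 0.1, 0.9]

print(cosine_similarity(centering_div, flexbox_tips))     # close to 1: similar
print(cosine_similarity(centering_div, earnings_report))  # close to 0: unrelated
```

A score near 1 means the vectors point in almost the same direction, a score near 0 means they are unrelated, and the length of each vector does not affect the result.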

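Traditional SQL cannot do this efficiently. A B-tree index cannot order rows by their distance to an arbitrary query vector, so every search becomes a full scan. Finding the 10 vectors most similar to a query among 1 million high-dimensional vectors in reasonable time calls for approximate nearest neighbor (ANN) algorithms such as HNSW (Hierarchical Navigable Small World), which trade a small amount of accuracy for large speed gains.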
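For comparison, this is roughly what the exact, exhaustive baseline looks like. It is a sketch using only the standard library, with a hypothetical in-memory dict of toy vectors; it is not HNSW. It scores every stored vector against the query, so the cost grows linearly with the number of documents, which is the work an ANN index is designed to avoid.

```python
import heapq
import math


def normalize(v):
    """Scale v to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def top_k_exhaustive(query, vectors, k):
    """Return the k best (score, doc_id) pairs, highest similarity first.

    Visits every stored vector: O(n * d) work per query.
    """
    q = normalize(query)
    scored = (
        (sum(a * b for a, b in zip(q, normalize(vec))), doc_id)
        for doc_id, vec in vectors.items()
    )
    return heapq.nlargest(k, scored)


# Hypothetical toy index: doc_id -> vector
docs = {
    "css-centering": [0.9, 0.1, 0.0],
    "flexbox": [0.8, 0.2, 0.1],
    "earnings": [0.0, 0.1, 0.9],
}

for score, doc_id in top_k_exhaustive([1.0, 0.0, 0.0], docs, k=2):
    print(f"{doc_id}: {score:.3f}")
```

An HNSW index returns a similar top-k list but only visits a small fraction of the vectors by walking a layered proximity graph, which is why it can occasionally miss a true nearest neighbor.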

When You Actually Need a Vector Database

You need dedicated vector search when:

  • Your dataset is large enough that exhaustive search is too slow (as a rough rule of thumb, more than 100k documents)
  • You need metadata filtering combined with semantic search
  • You need to update the vector index frequently

For smaller datasets, PostgreSQL with the pgvector extension handles vector search surprisingly well and may save you the operational overhead of a separate database.

sql
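-- pgvector: search for the 5 most similar documents
-- <=> is pgvector's cosine distance operator (lower means more similar)
-- $1 is the query embedding, bound as a parameter by your driver
SELECT content, embedding <=> $1::vector AS distance
FROM documents
ORDER BY distance
LIMIT 5;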
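From application code, the query embedding has to reach PostgreSQL in a form pgvector understands. The sketch below formats a Python list in pgvector's text representation (`[0.1,0.2,0.3]`) and pairs it with a parameterized query. The helper names `to_vector_literal` and `build_search` are hypothetical, the `%s` placeholders assume a psycopg-style driver, and the `documents` table with `content` and `embedding` columns is taken from the query above.

```python
def to_vector_literal(embedding):
    """Format floats in pgvector's text form, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


# %s placeholders assume a psycopg-style driver; other drivers use $1, ? etc.
SEARCH_SQL = """
SELECT content, embedding <=> %s::vector AS distance
FROM documents
ORDER BY distance
LIMIT %s;
"""


def build_search(embedding, limit=5):
    """Return (sql, params) ready to hand to a DB-API cursor.execute call."""
    return SEARCH_SQL, (to_vector_literal(embedding), limit)


sql, params = build_search([0.1, 0.2, 0.3])
print(params)  # ('[0.1,0.2,0.3]', 5)

# With an open cursor (assumed, not created here):
# cur.execute(sql, params)
# rows = cur.fetchall()
```

Binding the vector as a parameter instead of concatenating it into the SQL string keeps the query safe from injection and lets the driver handle quoting.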

Practical Starting Point

Before reaching for Pinecone, Weaviate, or Qdrant, try pgvector if you are already on PostgreSQL. Migrate to a dedicated vector database when you hit scale or operational limits — not before.

Madhukar

Founder & Lead Engineer, Devpads

Building lightweight, high-performance, and privacy-first developer utilities. Madhukar specializes in modern web architectures, code editor tooling, and developer workspace experiences.

Stack: React · Vite · Tailwind · FastAPI · PostgreSQL