Where you put the embeddings matters less at 1k docs than you think and more at 1M docs than you realize. For fewer than 10k chunks, a NumPy array plus brute-force cosine similarity works. For 10k to 1M chunks, use FAISS or pgvector. For 1M+ chunks AND multi-tenant production traffic, use a managed vector DB (Pinecone, Weaviate, Qdrant Cloud, Turbopuffer). The right choice depends on (1) corpus size, (2) write throughput, (3) filter requirements (do you need to restrict every query by user_id?), and (4) operational appetite. Don't pay for a managed vector DB at 10k chunks; don't run FAISS at 50M chunks. Match the tool to the scale.
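For the small-corpus tier, "a numpy array + cosine" can be as little as the sketch below: L2-normalise the embeddings once, then each query is one matrix-vector product plus a partial sort. The function names, the optional boolean `mask` (standing in for a user_id or tenant filter), and the random stand-in data are illustrative assumptions, not part of any library.

```python
import numpy as np

def build_index(embeddings: np.ndarray) -> np.ndarray:
    """L2-normalise rows once, so cosine similarity becomes a dot product."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / np.clip(norms, 1e-12, None)

def search(index: np.ndarray, query: np.ndarray, k: int = 10, mask=None):
    """Exact top-k by cosine similarity.

    mask: optional boolean array (hypothetical, e.g. rows owned by one user_id);
    rows where it is False are excluded before ranking, so results stay exact.
    """
    q = query / max(float(np.linalg.norm(query)), 1e-12)
    sims = index @ q                              # one dot product per chunk
    if mask is not None:
        sims = np.where(mask, sims, -np.inf)      # filtered-out rows never rank
    k = min(k, int(np.isfinite(sims).sum()))
    if k == 0:
        return np.array([], dtype=int), np.array([])
    top = np.argpartition(-sims, k - 1)[:k]       # unordered top-k in O(N)
    top = top[np.argsort(-sims[top])]             # sort only those k
    return top, sims[top]

# usage with random stand-in embeddings
rng = np.random.default_rng(0)
emb = rng.normal(size=(2_000, 64)).astype(np.float32)
index = build_index(emb)
ids, scores = search(index, emb[10], k=5)         # ids[0] is 10: its own nearest neighbour
tenant_of = np.arange(2_000) % 20                 # hypothetical tenant id per chunk
ids_t, _ = search(index, emb[10], k=5, mask=tenant_of == 3)
```

At 10k chunks of 1536-dim float32 the matrix is about 61 MB and every query scans all of it, which is why brute force stops paying off as the corpus grows. Because the filter is applied before ranking, filtered results here are exact, unlike post-filtered approximate search.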
For most teams under 100k chunks, pgvector wins by default: your existing Postgres becomes the vector DB, transactional guarantees and joins come for free, and there is no new infrastructure to operate. At larger scale, Qdrant and Weaviate are the open-source standouts among purpose-built vector DBs; Pinecone is the managed default but pricey. Newer entrants (Turbopuffer, LanceDB) give up some features in exchange for lower cost or better developer experience (DX). Pick deliberately, document why, and re-evaluate every 6 months.
Use these three in order. Each builds on the one before.
Compare four vector store options (numpy, pgvector, Qdrant/Weaviate, Pinecone) — when is each one the right choice?
Walk me through HNSW (hierarchical navigable small world) at a conceptual level. Why is search sub-linear in N, and what do its knobs (m, ef_construction, ef_search) trade off?
I have 50M chunks, 100 queries/sec, multi-tenant, need to filter by tenant_id. Design the vector store: which engine, what shard topology, what index type, expected p99 latency?
-- pgvector: the boring default that scales fine to 100k+ chunks
-- Enable the extension once per database (pgvector must be installed on the server):
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE chunks (
    id         bigserial PRIMARY KEY,
    doc_id     bigint NOT NULL,
    content    text NOT NULL,
    embedding  vector(1536),  -- match your model's dim
    metadata   jsonb,
    created_at timestamptz NOT NULL DEFAULT now()
);
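-- HNSW index: fast approximate nearest-neighbour search. vector_cosine_ops
-- pairs with the <=> (cosine distance) operator used in the query below.
CREATE INDEX chunks_emb_idx ON chunks
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);  -- pgvector's defaults; higher = better recall, slower build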
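-- query: 10 nearest chunks for one tenant
SELECT id, content, 1 - (embedding <=> $1::vector) AS similarity
FROM chunks
WHERE metadata->>'tenant_id' = $2  -- filtering lives in the same query, but with HNSW it runs
                                   -- AFTER the index scan (hnsw.ef_search candidates, default 40),
                                   -- so a selective filter can return fewer than 10 rows; raise
                                   -- hnsw.ef_search or, on pgvector 0.8+, set hnsw.iterative_scan
ORDER BY embedding <=> $1::vector  -- cosine distance, ascending = most similar first
LIMIT 10;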
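# Python (psycopg 3 + the pgvector package):
import numpy as np
import psycopg
from pgvector.psycopg import register_vector
from psycopg.types.json import Jsonb

# DB_URL, doc_id, text, vector and query_vec come from your application.
with psycopg.connect(DB_URL) as conn:  # commits on clean exit
    register_vector(conn)  # adapts numpy arrays to/from the vector type
    with conn.cursor() as cur:
        # insert: Jsonb() is needed because psycopg 3 does not adapt plain dicts
        cur.execute(
            "INSERT INTO chunks (doc_id, content, embedding, metadata) VALUES (%s, %s, %s, %s)",
            (doc_id, text, np.asarray(vector), Jsonb({"tenant_id": "t-42"})),
        )
        # query: 10 nearest chunks by cosine distance
        cur.execute(
            "SELECT content FROM chunks ORDER BY embedding <=> %s LIMIT 10",
            (np.asarray(query_vec),),
        )
        results = [r[0] for r in cur.fetchall()]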
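The tenant filter in the SQL query is easy to write but not free with an approximate index: pgvector applies the WHERE clause after the HNSW scan, which yields at most hnsw.ef_search candidates (default 40). The sketch below is a simplified simulation of that effect, not pgvector's implementation. It treats the ef_search nearest rows across all tenants as the index's candidate list, applies the tenant filter afterwards, and compares that with exact pre-filtering. The corpus, the 100 evenly sized tenants, and the function names are hypothetical.

```python
import numpy as np

def normalise(x: np.ndarray) -> np.ndarray:
    """Unit-length rows, so a dot product equals cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def post_filter(index, tenants, query, tenant, k=10, ef_search=40):
    """Simplified ANN-then-filter: keep the ef_search nearest rows across
    ALL tenants (stand-in for the index scan), then drop other tenants' rows."""
    sims = index @ normalise(query)
    candidates = np.argsort(-sims)[:ef_search]
    return [int(i) for i in candidates if tenants[i] == tenant][:k]

def pre_filter(index, tenants, query, tenant, k=10):
    """Exact search over one tenant's rows only, ranked after filtering."""
    rows = np.flatnonzero(tenants == tenant)
    sims = index[rows] @ normalise(query)
    return [int(i) for i in rows[np.argsort(-sims)[:k]]]

rng = np.random.default_rng(0)
index = normalise(rng.normal(size=(10_000, 64)))
tenants = np.arange(10_000) % 100          # 100 tenants, 1% of rows each
query = rng.normal(size=64)

print(len(pre_filter(index, tenants, query, tenant=42)))   # 10: tenant 42 owns 100 rows
print(len(post_filter(index, tenants, query, tenant=42)))  # ~0.4 on average (40 candidates x 1%)
```

The more selective the filter, the fewer post-filtered rows survive, which is the gap that a larger ef_search or iterative scans try to close.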