Capstok — learn by doing

Why this matters

Where you put the embeddings matters less at 1k docs than you think and more at 1M docs than you realize. For fewer than 10k chunks, a numpy array plus brute-force cosine similarity works. For 10k to 1M chunks, use FAISS or pgvector. For 1M+ chunks and multi-tenant production traffic, use a managed vector DB (Pinecone, Weaviate, Qdrant Cloud, Turbopuffer). The right choice depends on (1) corpus size, (2) write throughput, (3) filter requirements (do you need to query by user_id?), and (4) operational appetite. Don't pay for a managed vector DB at 10k chunks; don't run FAISS at 50M chunks. Match the tool to the scale.
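To make the tiers above concrete, here is a minimal sketch that encodes them as a lookup and estimates the raw storage for the vectors alone. The function names (`raw_vector_bytes`, `suggest_store`) and the returned labels are illustrative assumptions, not part of any library; the 1536-dimension float32 default is also an assumption, chosen to match the table definition later in this lesson. The thresholds are the rough ones from the paragraph above, not hard limits.

```python
def raw_vector_bytes(n_chunks: int, dim: int = 1536, bytes_per_float: int = 4) -> int:
    """Bytes for the raw float32 vectors only (no index overhead, no metadata)."""
    return n_chunks * dim * bytes_per_float


def suggest_store(n_chunks: int, multi_tenant_prod: bool = False) -> str:
    """Map corpus size to the tiers named in the text (thresholds are rough)."""
    if n_chunks < 10_000:
        return "numpy"
    if n_chunks <= 1_000_000:
        return "faiss-or-pgvector"
    if multi_tenant_prod:
        return "managed-vector-db"
    # The text gives no rule for 1M+ chunks without multi-tenant traffic.
    return "evaluate-case-by-case"


if __name__ == "__main__":
    for n in (5_000, 200_000, 50_000_000):
        gb = raw_vector_bytes(n) / 1e9
        print(f"{n:>11,} chunks: ~{gb:.2f} GB raw, suggest {suggest_store(n, True)}")
```

Corpus size is only the first of the four factors; write throughput, filter requirements, and operational appetite can each push you up or down a tier.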

Demo

For most teams under 100k chunks, pgvector wins by default — your existing Postgres becomes the vector DB, transactional guarantees and joins come for free, and the operational cost is zero new infrastructure. For purpose-built vector DBs at larger scale, Qdrant and Weaviate are open-source standouts; Pinecone is the managed default but pricey. The newer entrants (Turbopuffer, LanceDB) trade off some features for cost or DX. Pick deliberately, document why, and re-evaluate every 6 months.

Try it yourself

Pick a vector store based on your scale (numpy if < 10k chunks, pgvector if you already use Postgres, managed if you want zero-ops). Document the choice.
Index 1,000 docs. Time the index build, the query latency, and the storage cost.
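Scale to 50,000 docs. Re-measure. With an approximate index such as HNSW, the 50x growth should raise query latency far less than 50x, because search cost grows roughly logarithmically with corpus size (log 50,000 / log 1,000 ≈ 1.6). A brute-force numpy scan, by contrast, grows roughly linearly.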
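Add a metadata filter (e.g. tenant_id) so the query combines vector similarity with a filter. Most managed DBs support this; in pgvector it is a plain WHERE clause, though with an approximate index the filter is applied after the index scan, so a selective filter can return fewer rows than the LIMIT.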
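Before measuring a real store, it helps to have an exact baseline to compare against. The sketch below is a pure-Python stand-in for the numpy option: it builds a synthetic corpus, runs an exact cosine scan with an optional tenant filter, and times a query. Everything here is an assumption for illustration: the helper names, the 32-dimension random vectors, and the four synthetic tenants. Swap in your own embeddings and store when you do the exercise.

```python
import math
import random
import time


def normalize(vec):
    """Scale a vector to unit length so dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def build_corpus(n_docs, dim=32, n_tenants=4, seed=0):
    """Synthetic docs with random unit vectors and a round-robin tenant_id."""
    rng = random.Random(seed)
    return [
        {
            "id": i,
            "tenant_id": f"t-{i % n_tenants}",
            "embedding": normalize([rng.gauss(0.0, 1.0) for _ in range(dim)]),
        }
        for i in range(n_docs)
    ]


def search(corpus, query, k=10, tenant_id=None):
    """Exact top-k by cosine similarity, optionally restricted to one tenant."""
    q = normalize(query)
    scored = [
        (sum(a * b for a, b in zip(doc["embedding"], q)), doc["id"])
        for doc in corpus
        if tenant_id is None or doc["tenant_id"] == tenant_id
    ]
    scored.sort(reverse=True)
    return scored[:k]


def time_query(corpus, query, repeats=3, **kwargs):
    """Average wall-clock seconds per query over a few repeats."""
    start = time.perf_counter()
    for _ in range(repeats):
        search(corpus, query, **kwargs)
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    for n in (1_000, 50_000):
        corpus = build_corpus(n)
        query = corpus[7]["embedding"]
        ms = time_query(corpus, query) * 1e3
        print(f"{n:>6} docs: {ms:.1f} ms/query (exact scan)")
```

Because this scan touches every vector, its latency should grow roughly in proportion to the corpus. Running the same two sizes against an indexed store shows how much of that growth the index removes.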

Prompt your AI

Use these three in order. Each builds on the one before.

1. Basics & terminology

Compare four vector store options (numpy, pgvector, Qdrant/Weaviate, Pinecone) — when is each one the right choice?

2. Why it works (the mechanism)

Walk me through HNSW (hierarchical navigable small world) at a conceptual level. Why is search sub-linear in N, and what do the knobs (m, ef) trade off?

3. Advanced — application & what's next

I have 50M chunks, 100 queries/sec, multi-tenant, need to filter by tenant_id. Design the vector store: which engine, what shard topology, what index type, expected p99 latency?


-- pgvector: the boring default that scales to 100k+ chunks fine
-- Enable the extension once per database:
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE chunks (
  id          bigserial PRIMARY KEY,
  doc_id      bigint NOT NULL,
  content     text NOT NULL,
  embedding   vector(1536),       -- match your model's dim
  metadata    jsonb,
  created_at  timestamptz NOT NULL DEFAULT now()
);

-- HNSW index: fast approximate search; vector_cosine_ops is the operator class for cosine distance
CREATE INDEX chunks_emb_idx ON chunks
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);

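-- Query: top 10 nearest chunks for one tenant ($1 = query embedding, $2 = tenant id)
SELECT id, content, 1 - (embedding <=> $1::vector) AS similarity
FROM chunks
WHERE metadata->>'tenant_id' = $2     -- metadata filter in the same statement
ORDER BY embedding <=> $1::vector     -- <=> is cosine distance; ascending = most similar first
LIMIT 10;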

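# Python (psycopg 3 + the pgvector package):
import os

import numpy as np
import psycopg
from pgvector.psycopg import register_vector
from psycopg.types.json import Jsonb

DB_URL = os.environ["DATABASE_URL"]  # assumed env var, e.g. postgresql://user:pass@host/db
DIM = 1536                           # must match vector(1536) in the table

# Placeholder data; in practice both vectors come from your embedding model.
doc_id, text = 1, "example chunk text"
vector = np.random.rand(DIM).astype(np.float32)
query_vec = np.random.rand(DIM).astype(np.float32)

with psycopg.connect(DB_URL) as conn:
    register_vector(conn)  # lets psycopg send numpy arrays as vector values
    with conn.cursor() as cur:
        # insert (a dict must be wrapped in Jsonb for the jsonb column)
        cur.execute(
            "INSERT INTO chunks (doc_id, content, embedding, metadata) VALUES (%s, %s, %s, %s)",
            (doc_id, text, vector, Jsonb({"tenant_id": "t-42"})),
        )
        # query, filtered to one tenant
        cur.execute(
            "SELECT content FROM chunks WHERE metadata->>'tenant_id' = %s "
            "ORDER BY embedding <=> %s LIMIT 10",
            ("t-42", query_vec),
        )
        results = [r[0] for r in cur.fetchall()]
# leaving the connect() block commits the transaction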

Run: python3 main.py

Vector stores — your first index