Project · Submit: live hybrid RAG + eval results + writeup

Module project · Difficulty: hard

Ship something real. Submit your work when you're done.

Brief

Build a hybrid-search RAG system over a real corpus (your team's docs or a public dataset). Submit: the live URL (or a repo plus demo video); the full eval results table, measured against the targets recall@5 ≥ 0.85, accuracy ≥ 0.80, p99 latency < 2 s, and cost < $0.01 per query; traces of 5 example queries; and a 2-page writeup of what surprised you in the eval.
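
One way to keep the results table honest is to check the four targets in code instead of by eye. The sketch below is an illustration only: the metric keys (`recall_at_5`, `p99_latency_s`, and so on) and the nearest-rank percentile method are assumptions, not something the brief prescribes.

```python
# Targets copied from the brief; the dictionary keys are hypothetical names.
TARGETS = {
    "recall_at_5": (">=", 0.85),
    "accuracy": (">=", 0.80),
    "p99_latency_s": ("<", 2.0),
    "cost_per_query_usd": ("<", 0.01),
}

def p99(latencies):
    """Nearest-rank 99th percentile of a non-empty list of latencies (seconds)."""
    ordered = sorted(latencies)
    # ceil(0.99 * n) in integer arithmetic, giving a 1-based rank.
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]

def check_targets(results):
    """Return {metric: (value, passed)} for every target in the brief."""
    report = {}
    for name, (op, threshold) in TARGETS.items():
        value = results[name]
        passed = value >= threshold if op == ">=" else value < threshold
        report[name] = (value, passed)
    return report
```

With only 50 eval cases, nearest-rank p99 is simply the slowest query, so a larger latency sample makes the number more meaningful.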

Deliverables

Live demo URL (or repo + video + run instructions).
Golden eval set (50+ cases) + harness producing the 4 metrics (recall@5, accuracy, p99 latency, cost per query).
Hybrid pipeline (dense + BM25 + RRF + rerank + citations).
Observability: traces visible for every query.
Refresh pipeline: edit one doc; verify the new content surfaces in search within minutes.
Writeup: what worked, what didn't, what's next.
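
The fusion step of the hybrid pipeline can be sketched with Reciprocal Rank Fusion (RRF), which merges the dense and BM25 rankings using ranks alone, so the two score scales never need to be normalized. This is a minimal sketch: `rrf_fuse`, the document ids, and the two hit lists are hypothetical, and `k=60` is a commonly used default, not a value the brief requires.

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60, top_n=10):
    """Fuse ranked lists of doc ids with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending; break ties by doc id for deterministic output.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ordered[:top_n]]

dense_hits = ["d3", "d1", "d7"]  # hypothetical dense-retriever output
bm25_hits = ["d1", "d9", "d3"]   # hypothetical BM25 output
fused = rrf_fuse([dense_hits, bm25_hits])
print(fused)  # ['d1', 'd3', 'd9', 'd7']
```

Documents that both retrievers rank highly (`d1`, `d3`) rise to the top; the fused list is what would then go to the reranker.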

How we grade it

All four eval targets met (or honest documentation of why not).
Hybrid measurably beats dense-only on the eval set.
Citations work and are verifiable.
Refresh tested by editing a doc + querying.
Traces capture full request flow.
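
To show that hybrid measurably beats dense-only, run both retrievers over the same golden set and compare mean recall@5. The sketch below assumes each golden case lists its relevant document ids and that a retriever is any callable from a query string to a ranked list of ids. The case format and function names are hypothetical, and recall@k is taken here as the fraction of relevant documents found in the top k (some teams use hit rate instead).

```python
def recall_at_k(retrieved, relevant, k=5):
    """Fraction of relevant doc ids that appear in the top-k retrieved ids."""
    if not relevant:
        return 0.0
    top_k = set(retrieved[:k])
    return len(top_k & set(relevant)) / len(relevant)

def mean_recall_at_k(cases, retrieve, k=5):
    """Average recall@k of one retriever over a non-empty list of golden cases.

    cases: list of {"query": str, "relevant": [doc ids]} dicts (assumed format).
    retrieve: callable mapping a query string to a ranked list of doc ids.
    """
    total = sum(recall_at_k(retrieve(c["query"]), c["relevant"], k) for c in cases)
    return total / len(cases)
```

Calling `mean_recall_at_k(cases, dense_only)` and `mean_recall_at_k(cases, hybrid)` on identical cases gives the two numbers for the results table, where `dense_only` and `hybrid` stand in for your own retrieval functions.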
