Build a hybrid-search RAG against a real corpus (your team docs OR a public dataset). Submit: the live URL (or repo + demo video), the full eval results table (recall@5 ≥ 0.85, accuracy ≥ 0.80, p99 < 2s, cost < $0.01/query), traces of 5 example queries, and a 2-page writeup of what surprised you in the eval.
# Hybrid RAG over Engineering Wiki
- Corpus: 412 markdown docs, ~1900 chunks
- Eval: 50 cases (15 user-Qs, 25 from team logs, 10 adversarial)
Final metrics:
- recall@5: 0.91 (target ≥ 0.85 ✓)
- accuracy: 0.86 (target ≥ 0.80 ✓)
- p99 latency: 1.7s (target < 2.0s ✓)
- avg cost/query: $0.0072 (target < $0.01 ✓)
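
The metrics above can be recomputed from per-case eval records. The sketch below is one way to do that. It assumes recall@5 means the fraction of a case's relevant chunk ids that appear in the top 5 retrieved, and that p99 uses the nearest-rank method. Neither definition is stated in this writeup, and all function names and metric keys are hypothetical.

```python
def recall_at_k(retrieved, relevant, k=5):
    """Fraction of relevant ids found in the top-k retrieved ids (assumed definition)."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    return len(set(retrieved[:k]) & relevant_set) / len(relevant_set)


def p99(latencies_s):
    """Nearest-rank 99th percentile; integer math avoids float rounding."""
    ordered = sorted(latencies_s)
    rank = (99 * len(ordered) + 99) // 100  # ceil(0.99 * n)
    return ordered[rank - 1]


def meets_targets(metrics):
    """Check the four thresholds from the assignment brief."""
    return (
        metrics["recall_at_5"] >= 0.85
        and metrics["accuracy"] >= 0.80
        and metrics["p99_s"] < 2.0
        and metrics["cost_usd"] < 0.01
    )
```

With 50 eval cases, the nearest-rank p99 is simply the slowest query, so a single outlier decides whether the latency target is met.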
Surprises:
- Version-number queries (e.g. "v2.1") needed BM25; dense retrieval missed them.
- Adding the Citations API increased user trust the most; users started clicking through to the cited sources.
- Reranking lifted accuracy by 14% over hybrid-only retrieval.
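
To illustrate the first surprise, here is a minimal sketch of how a hybrid step can keep a document that only the lexical ranker finds. It uses reciprocal rank fusion (RRF) as an assumed fusion method; this writeup does not say how the BM25 and dense rankings were actually combined. The function name, the constant `k=60`, and the document ids are hypothetical.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, top_n=5):
    """Fuse ranked id lists with reciprocal rank fusion: score = sum of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for deterministic output.
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_n]


# Hypothetical rankings for the query "v2.1".
bm25_ranking = ["release-v2.1", "changelog", "upgrade-guide"]    # exact token match
dense_ranking = ["upgrade-guide", "release-v2.0", "changelog"]   # misses release-v2.1

fused = rrf_fuse([bm25_ranking, dense_ranking])
```

In this toy case `release-v2.1` is absent from the dense ranking but still reaches the fused top 5, where a reranker can then promote it.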