Open this lesson in your favourite AI. It'll walk you through the why, explain the demo, and quiz you on the try-it list.
A demo RAG works; a production RAG survives weekly. Add: traces (every query logged with retrieved chunks + answer + timing), embedding cache (same query → cached vector), incremental refresh (changed docs re-embedded), and basic monitoring (latency, cost, refusal rate). The polish takes 1-2 days and makes the difference between 'demo' and 'team uses daily'.
Observability: LangFuse or equivalent — every query is a trace with spans. Embedding cache: Redis SETEX on query_hash. Refresh: webhook (or daily cron) detects changed docs by source_hash; only changed chunks get re-embedded; old chunks dropped. Monitoring: 3 dashboards (latency, cost, quality signals — thumbs up rate if you have it).
Use these three in order. Each builds on the one before.
Name 4 'production polish' items every shipped RAG needs.
Walk me through hash-based incremental refresh. Why is it better than full rebuild?
Design a 'staleness watchdog' that surfaces 'this answer is based on a doc updated 18 months ago'.
# Observability
from langfuse import Langfuse
fuse = Langfuse()
def chat(question, user_id):
trace = fuse.trace(name="rag_query", user_id=user_id, input={"question": question})
with trace.span(name="retrieve") as span:
chunks = hybrid_then_rerank(question, k=5)
span.update(output={"chunk_ids": [c["id"] for c in chunks]})
with trace.span(name="generate") as span:
ans = answer(question, chunks)
span.update(output=ans["text"][:500])
trace.update(output=ans["text"])
return ans
# Embedding cache (Redis)
import hashlib, json, redis
r = redis.from_url(os.environ["REDIS_URL"])
def cached_embed(text):
key = "emb:" + hashlib.sha256(text.encode()).hexdigest()
cached = r.get(key)
if cached:
return json.loads(cached)
vec = embed_batch([text])[0]
r.setex(key, 86400, json.dumps(vec))
return vec
# Refresh on doc change
def refresh_doc(doc_path, new_content):
new_hash = hashlib.sha256(new_content.encode()).hexdigest()
existing_hashes = {r["id"]: r["source_hash"] for r in db.query("SELECT id, source_hash FROM chunks WHERE doc_id = %s", (doc_id_for(doc_path),))}
new_chunks = ingest(doc_path, new_content, ...)
# diff: re-embed only changed; delete missing
...python3 main.py