Garbage in, garbage out: context quality dominates

hard

Why this matters

Teams spend weeks tuning prompts and model choice while feeding the model stale, malformed, or contradictory context, then wonder why quality is stuck. In applied AI, context quality is usually the binding constraint: a mediocre model with clean, relevant, well-ordered context often beats a frontier model with messy context. This task is the mindset shift: when output is bad, suspect the context first. Context is almost always cheaper to fix than the model is to replace.

Demo

A quick triage routine: before blaming the model, dump exactly what you sent it and eyeball it. Most 'the model is dumb' bugs are visible the moment you print the assembled context: duplicated chunks, text truncated mid-sentence, the wrong document, or yesterday's data.
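
The triage routine above can be sketched as a small helper. This is a minimal sketch, not the lesson's own code: it assumes your app assembles context as a list of blocks, and the `Block` fields (`id`, `source`, `text`) and the `preview` parameter are hypothetical names chosen for illustration.

```python
from typing import TypedDict


class Block(TypedDict):
    id: str      # hypothetical: stable identifier of the retrieved chunk
    source: str  # hypothetical: document or URL the chunk came from
    text: str    # the text that is actually sent to the model


def debug_context(blocks: list[Block], preview: int = 80) -> str:
    """Return a one-line-per-block dump of the assembled context."""
    lines = []
    for i, block in enumerate(blocks):
        # Collapse whitespace so each block fits on a single line.
        flat = " ".join(block["text"].split())
        shown = flat if len(flat) <= preview else flat[:preview] + "..."
        lines.append(
            f"[{i}] id={block['id']} source={block['source']} "
            f"chars={len(block['text'])} | {shown}"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    example = [
        {"id": "kb-12", "source": "refunds.md", "text": "Refunds are allowed within 30 days."},
        {"id": "kb-12", "source": "refunds.md", "text": "Refunds are allowed within 30 days."},
        {"id": "kb-40", "source": "shipping.md", "text": "Standard shipping takes"},
    ]
    print(debug_context(example))
```

Printing the dump right before the model call makes the duplicated `kb-12` block and the cut-off shipping sentence obvious at a glance.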

Try it yourself

Add debug_context to your own app and run a failing query. Read what you actually sent — not what you think you sent.
Look for the four classic defects: duplicates, mid-sentence truncation, wrong/stale document, contradictions. You'll usually find at least one.
Log the assembled context for 10 real queries and tally how many had a context defect vs. a genuine model error. The ratio surprises most people.
Add an assertion that fails the request if the same block id appears twice — turn a silent defect into a loud one.
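
The checks in this list could look roughly like the following. It is a sketch under stated assumptions: each block is a dict with hypothetical `id`, `text`, and `fetched_at` (timezone-aware datetime) keys, the truncation test is a crude punctuation heuristic, and the one-day staleness threshold is arbitrary. Contradictions are left out because they generally need a human or a model to spot.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def duplicate_ids(blocks):
    """Return block ids that appear more than once, in first-seen order."""
    counts = Counter(b["id"] for b in blocks)
    return [block_id for block_id, n in counts.items() if n > 1]


def looks_truncated(text):
    """Heuristic: non-empty text that does not end in closing punctuation."""
    stripped = text.rstrip()
    return bool(stripped) and stripped[-1] not in ".!?:\"')]"


def is_stale(fetched_at, max_age=timedelta(days=1), now=None):
    """True if the block was fetched longer ago than max_age (assumed threshold)."""
    now = now or datetime.now(timezone.utc)
    return now - fetched_at > max_age


def context_defects(blocks, now=None):
    """Return a sorted list of defect names found in one assembled context."""
    found = set()
    if duplicate_ids(blocks):
        found.add("duplicate")
    for b in blocks:
        if looks_truncated(b["text"]):
            found.add("truncated")
        if is_stale(b["fetched_at"], now=now):
            found.add("stale")
    return sorted(found)


def require_unique_blocks(blocks):
    """Fail the request loudly if the same block id appears twice."""
    dupes = duplicate_ids(blocks)
    if dupes:
        raise ValueError(f"duplicate context block ids: {dupes}")
```

For the ten-query tally, one option is to call `context_defects` on each logged context and count how many return a non-empty list. The sketch raises `ValueError` rather than using a bare `assert`, since Python strips `assert` statements when run with `-O`.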

Prompt your AI

Use these three in order. Each builds on the one before.

1. Basics & terminology

Explain the claim 'context quality dominates model choice' for applied AI. Why should I debug my context before switching to a bigger model?

2. Why it works (the mechanism)

List the most common context defects (duplication, truncation, staleness, contradiction, wrong doc) and how each one degrades the model's output.

3. Advanced — application & what's next

Design a lightweight 'context linter' that runs before every LLM call in production and flags likely-bad context. What checks would it include and which should hard-fail vs. warn?

References

Working within free-tier limits. Free / low-tier provider keys rate-limit aggressively, and eval or agent loops that fan out calls will hit 429 Too Many Requests fast. Survive it: read Retry-After and the x-ratelimit-* headers and back off (exponential backoff with jitter + a max-retry cap) instead of hammering; cap in-flight requests with a small concurrency limiter so you stay under the RPM/TPM ceiling; cache identical requests so retries don't re-spend quota; downshift to a smaller/cheaper model for practice runs; use the provider for non-interactive jobs; or sidestep hosted limits entirely by running a small model locally (Ollama / llama.cpp) or on a free Colab/Kaggle GPU while you learn.
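
The backoff advice above might be sketched like this. It is illustrative only: `RateLimited` is a hypothetical exception that your own client wrapper would raise on HTTP 429 (with `retry_after` parsed from the `Retry-After` header when present), and the default `base`, `cap`, and `max_retries` values are assumptions, not provider recommendations.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical: raised by your client wrapper on HTTP 429."""

    def __init__(self, retry_after=None):
        super().__init__("rate limited")
        self.retry_after = retry_after  # seconds from Retry-After, if the server sent one


def call_with_backoff(fn, max_retries=5, base=1.0, cap=30.0, sleep=time.sleep):
    """Call fn(), retrying on RateLimited with capped exponential backoff and jitter."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimited as exc:
            if attempt == max_retries:
                raise  # retry cap reached: surface the error instead of hammering
            if exc.retry_after is not None:
                delay = exc.retry_after  # the server told us how long to wait
            else:
                # "Full jitter": random wait up to the capped exponential bound.
                delay = random.uniform(0, min(cap, base * 2 ** attempt))
            sleep(delay)
```

The `sleep` parameter exists so the function can be exercised in tests without actually waiting.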