Open this lesson in your favourite AI. It'll walk you through the why, explain the demo, and quiz you on the try-it list.
Teams spend weeks tuning prompts and model choice while feeding the model stale, malformed, or contradictory context — then wonder why quality is stuck. In applied AI, context quality is usually the binding constraint. A mediocre model with clean, relevant, well-ordered context beats a frontier model with messy context. This task is the mindset shift: when output is bad, suspect the context first. It's almost always cheaper to fix than the model.
A quick triage routine: before blaming the model, dump exactly what you sent it and eyeball it. Most 'the model is dumb' bugs are visible the moment you print the assembled context — duplicated chunks, truncated mid-sentence, the wrong document, or yesterday's data.
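Two of the defects named above, duplicated chunks and text cut off mid-sentence, can also be caught mechanically. The following is a minimal sketch, assuming your context is assembled from a list of text chunks; the function name `find_context_defects` and the end-of-sentence heuristic are illustrative assumptions, not part of the lesson's demo.

```python
import re
from collections import Counter


def find_context_defects(chunks: list[str]) -> list[str]:
    """Return human-readable warnings for a list of context chunks (hypothetical helper)."""
    warnings = []

    # Normalise whitespace and case so near-identical copies still match.
    normalised = [re.sub(r"\s+", " ", c).strip().lower() for c in chunks]
    for text, count in Counter(normalised).items():
        if text and count > 1:
            warnings.append(f"duplicate chunk x{count}: {text[:40]!r}")

    # Heuristic (an assumption): a chunk that does not end in closing
    # punctuation was probably cut off mid-sentence.
    for i, chunk in enumerate(chunks):
        stripped = chunk.rstrip()
        if not stripped:
            warnings.append(f"chunk {i} is empty")
        elif stripped[-1] not in ".!?\"')]`:":
            warnings.append(f"chunk {i} may be truncated: ...{stripped[-30:]!r}")
    return warnings


example_chunks = [
    "v2 adds streaming responses.",
    "v2 adds  streaming responses.",       # same text, extra space
    "The old endpoint was deprecated in",  # cut off mid-sentence
]
for warning in find_context_defects(example_chunks):
    print(warning)
```

A check like this only flags suspects. Staleness and wrong-document errors still need a human to read the dump.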
Add debug_context to your own app and run a failing query. Read what you actually sent, not what you think you sent.
Use these three prompts in order. Each builds on the one before.
Explain the claim 'context quality dominates model choice' for applied AI. Why should I debug my context before switching to a bigger model?
List the most common context defects (duplication, truncation, staleness, contradiction, wrong doc) and how each one degrades the model's output.
Design a lightweight 'context linter' that runs before every LLM call in production and flags likely-bad context. What checks would it include and which should hard-fail vs. warn?
Working within free-tier limits. Free / low-tier provider keys rate-limit aggressively, and eval or agent loops that fan out calls will hit 429 Too Many Requests fast. Survive it: read Retry-After and the x-ratelimit-* headers and back off (exponential backoff with jitter + a max-retry cap) instead of hammering; cap in-flight requests with a small concurrency limiter so you stay under the RPM/TPM ceiling; cache identical requests so retries don't re-spend quota; downshift to a smaller/cheaper model for practice runs; use the provider for non-interactive jobs; or sidestep hosted limits entirely by running a small model locally (Ollama / llama.cpp) or on a free Colab/Kaggle GPU while you learn.
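The back-off advice above can be sketched as a small retry wrapper. This is a minimal illustration, not any provider's SDK: `RateLimited` is a hypothetical exception that your own client code would raise on a 429 (carrying the Retry-After value if the response had one), and `call_with_backoff` and its parameters are likewise assumed names.

```python
import random
import time
from typing import Callable, Optional


class RateLimited(Exception):
    """Hypothetical error your client wrapper raises on HTTP 429."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("429 Too Many Requests")
        self.retry_after = retry_after  # seconds from Retry-After, if present


def call_with_backoff(
    fn: Callable[[], object],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
):
    """Call fn(); on RateLimited, wait and retry at most max_retries times."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimited as err:
            if attempt == max_retries:
                raise  # retry cap reached: surface the error to the caller
            if err.retry_after is not None:
                delay = err.retry_after  # the server told us how long to wait
            else:
                # Exponential backoff with full jitter: a random wait between
                # 0 and min(max_delay, base_delay * 2**attempt).
                delay = random.uniform(0, min(max_delay, base_delay * 2 ** attempt))
            sleep(delay)


# Usage: a fake call that is rate-limited twice, then succeeds.
attempts = {"n": 0}


def flaky_call() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimited(retry_after=0.01)
    return "ok"


print(call_with_backoff(flaky_call))
```

The `sleep` parameter is injectable so the wrapper can be tested without real waiting. A concurrency limiter and a request cache would sit around this function, not inside it.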
def debug_context(system: str, messages: list[dict]):
    """Print the system prompt and every message exactly as they would be sent."""
    print("=== SYSTEM ===\n", system[:2000])
    for m in messages:
        role = m["role"]
        content = m["content"]
        # Content may be a plain string or a structured list of parts.
        text = content if isinstance(content, str) else str(content)
        print(f"=== {role.upper()} ({len(text)} chars) ===\n{text[:1000]}\n")

# Before debugging the model, debug the input you actually sent:
debug_context(
    system="Answer from CONTEXT only.\nCONTEXT:\n[doc-1] ...\n[doc-1] ...",  # dup!
    messages=[{"role": "user", "content": "What changed in v2?"}],
)

python3 main.py