Open this lesson in your favourite AI. It'll walk you through the why, explain the demo, and quiz you on the try-it list.
Prompt engineering is about the words you write. Context engineering is about everything the model can see when it generates the next token — the system prompt, the conversation so far, retrieved documents, tool results, and the formatting that holds them together. As soon as you move past one-shot chat into RAG, agents, or long sessions, the prompt is a tiny fraction of the context, and the hard problems all live in what else you put in the window and in what order. Treating the context as a budgeted, engineered artifact — not a string you concatenate — is the difference between a demo and a system that holds up.
Here's the whole discipline in one frame: a model call is f(context) -> tokens, where context is a system prompt plus a list of messages, all measured in tokens. Prompt engineering optimizes one message. Context engineering optimizes the whole list: what's included, what's dropped, what order it's in, and how it's formatted. The demo code further down answers the same question with a bare prompt and with an engineered context: same model, very different reliability.
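Separately from the lesson's demo, the "what's included, what's dropped" part of that frame can be sketched as a small budgeting function. This is a minimal sketch, not a definitive implementation: `estimate_tokens` and `fit_to_budget` are hypothetical helpers, the 4-characters-per-token ratio is a rough assumption (a real system would use the provider's token counter), and the rule that the trimmed history should open with a user turn is also an assumption.

```python
from typing import Dict, List

Message = Dict[str, str]


def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token. Swap in a real tokenizer
    # or the provider's token-counting endpoint for actual budgets.
    return max(1, len(text) // 4)


def fit_to_budget(system: str, messages: List[Message], budget: int) -> List[Message]:
    """Keep the newest messages that still fit after the system prompt is paid for."""
    remaining = budget - estimate_tokens(system)
    kept: List[Message] = []
    # Walk newest-to-oldest so the most recent turns survive.
    for msg in reversed(messages):
        cost = estimate_tokens(msg["content"])
        if cost > remaining:
            break
        kept.append(msg)
        remaining -= cost
    kept.reverse()  # restore chronological order
    # Assumption: the conversation should open with a user turn.
    while kept and kept[0]["role"] != "user":
        kept.pop(0)
    return kept


history = [
    {"role": "user", "content": "a" * 400},       # ~100 tokens, oldest
    {"role": "assistant", "content": "b" * 200},  # ~50 tokens
    {"role": "user", "content": "c" * 80},        # ~20 tokens
    {"role": "assistant", "content": "d" * 80},   # ~20 tokens
    {"role": "user", "content": "e" * 40},        # ~10 tokens, newest
]
system = "s" * 80  # ~20 tokens

trimmed = fit_to_budget(system, history, budget=100)
print([m["content"][0] for m in trimmed])  # → ['c', 'd', 'e']
```

Dropping the oldest turns first is only one possible policy. Summarizing old turns, or pinning retrieved documents ahead of chat history, are equally valid choices under the same budget.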
Use these three prompts in order. Each builds on the one before.
In one paragraph, explain the difference between prompt engineering and context engineering, and why the distinction starts to matter once I build RAG or agents.
A model call is f(context) -> tokens. Walk me through everything that counts as 'context' in a chat API request, and how each part influences the next-token distribution.
I'm getting inconsistent answers from an LLM feature even though my prompt is good. Give me a checklist of context-level causes (ordering, role placement, stale history, missing grounding) to diagnose before I touch the prompt.
When the model call fails, read the error and decide: fix the request,
retry, or fall back. 400/422 (bad params, context length exceeded),
401/403 (auth / no access to that model), and 404 (wrong model id) are
fatal: fix the request and don't retry. 429, 500/502/503, Anthropic 529
(overloaded), and timeouts are transient: retry with backoff. Watch for
non-HTTP failures too: truncation at the output limit (finish_reason:
"length", or stop_reason: "max_tokens" in Anthropic's API; raise
max_tokens or continue), safety refusals, malformed JSON or failed
tool-call parsing (validate against a schema and repair-retry), and
mid-stream disconnects. Always log the provider request id with the
error so you can trace it later.
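The fatal-versus-transient split above can be sketched as a small retry wrapper. This is an illustration under stated assumptions: `ApiError` is a hypothetical stand-in for whatever exception your SDK raises (real SDKs expose the status code and request id in their own ways), `call_with_retry` is a hypothetical helper, and the delays are arbitrary.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

# Status codes the text above treats as transient; anything else is fatal.
TRANSIENT = {429, 500, 502, 503, 529}


class ApiError(Exception):
    """Hypothetical stand-in for an SDK error; status=None models a timeout."""

    def __init__(self, status: Optional[int], request_id: str = "unknown"):
        super().__init__(f"status={status} request_id={request_id}")
        self.status = status
        self.request_id = request_id


def is_transient(status: Optional[int]) -> bool:
    return status is None or status in TRANSIENT


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except ApiError as err:
            # Log the request id with the error so it can be traced later.
            print(f"attempt {attempt} failed: {err}")
            if not is_transient(err.status) or attempt == max_attempts:
                raise  # fatal, or out of attempts: surface it to the caller
            # Exponential backoff with a little jitter.
            sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.1))
    raise ValueError("max_attempts must be at least 1")


# Usage: a fake call that is overloaded twice, then succeeds.
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ApiError(529, request_id=f"req_{calls['n']}")
    return "ok"


result = call_with_retry(flaky, sleep=lambda seconds: None)
print(result, calls["n"])  # → ok 3
```

Passing `sleep` as a parameter keeps the wrapper testable without real delays. Many SDKs already retry some errors on their own, so check the client's defaults before layering a wrapper like this on top.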
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

QUESTION = "Is the X200 power supply compatible with the M-series board?"

# Bare prompt: the model guesses from training data
bare = client.messages.create(
    model="claude-sonnet-4-6", max_tokens=300,
    messages=[{"role": "user", "content": QUESTION}],
)

# Engineered context: system role + retrieved facts + an explicit grounding rule
SPECS = "X200 PSU: 12V/40A, ATX. M-series board: requires 12V/30A min, ATX 24-pin."
engineered = client.messages.create(
    model="claude-sonnet-4-6", max_tokens=300,
    system=(
        "You are a hardware compatibility assistant. Answer ONLY from the SPECS "
        "block. If the specs don't settle it, say what's missing."
        f"\n\nSPECS:\n{SPECS}"
    ),
    messages=[{"role": "user", "content": QUESTION}],
)
print(bare.content[0].text, "\n---\n", engineered.content[0].text)

Run it with: python3 main.py
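The demo hard-codes SPECS as one string. In a retrieval setting the facts usually arrive as several documents, so the formatting and ordering become a function of their own. The sketch below is one possible layout, not the lesson's method: `build_system` is a hypothetical helper, and the bracketed title labels are an assumed convention.

```python
from typing import List, Tuple


def build_system(role: str, rule: str, docs: List[Tuple[str, str]]) -> str:
    """Assemble role, grounding rule, then labeled documents, in a fixed order."""
    # Each document gets its own labeled block so the model can tell them apart.
    blocks = [f"[{title}]\n{text}" for title, text in docs]
    return f"{role} {rule}\n\nSPECS:\n" + "\n\n".join(blocks)


system = build_system(
    role="You are a hardware compatibility assistant.",
    rule=(
        "Answer ONLY from the SPECS block. "
        "If the specs don't settle it, say what's missing."
    ),
    docs=[
        ("X200 PSU", "12V/40A, ATX."),
        ("M-series board", "requires 12V/30A min, ATX 24-pin."),
    ],
)
print(system)
```

The resulting string can be passed as the `system` argument in the demo. Keeping assembly in one function makes the order of role, rule, and documents an explicit decision that you can test and change in one place.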