Agentic and Applied AI / Course

Context Engineering

Engineer the context window as a budgeted resource: select, order, compress, cache, and measure what the model sees. The discipline underneath every reliable RAG, agent, and long-session assistant.


Certificate: 1 of 5 capstones

Ten modules, ~100 challenges on the discipline that decides whether an LLM feature is reliable: what goes into the context window, in what order, how it's compressed and cached, and how you measure it. Goes beyond prompt engineering into the anatomy of the window (roles, tokens, position effects), retrieval as selection, three-tier memory, tool-result and structured context, compression, prompt + semantic caching, long-context strategy, and a full eval/regression discipline. Python-first with runnable code against real provider APIs (Anthropic + OpenAI), and built on 2026 production practice: prompt caching, citations, lost-in-the-middle, and LLM-as-judge evals done right.

Built by Lakshya Kumar

context-engineering

llm

rag

prompt-caching

memory

long-context

evals

Before you start

You're comfortable in Python and have called an LLM API before (Anthropic or OpenAI).
You understand basic prompting (system/user roles, instructions) — this course starts where prompt engineering ends.
Helpful but not required: you've built or used a RAG system. The RAG Systems course is a great companion.
An API key (free tier is fine — the course teaches you to work within rate limits).

Is this course for you? Ask an AI

Get access to Context Engineering

$3.99

30-day access

Prefer the whole catalog? See all-access membership.

Ask for access

We grant free access case-by-case — students, career-switchers, builders on a tight budget. Sign in to send us a note.

Capstone projects

Submit any 1 of 5 to earn the certificate

Complete all modules, then submit the required number of capstone projects. Each must earn a passing rating from an admin reviewer.

capstone: A production context pipeline

Build a complete context pipeline for a real corpus and use case: gather → score → select-under-budget → order → format, with dedup, recency, retrieval + rerank, three-tier memory, compression with fidelity guarantees, and a layered cache. Ship it behind an API and prove quality with an eval harness (retrieval + generation metrics) and a CI regression gate. Submit the repo, the eval report, and a cost/latency breakdown.
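As a rough illustration of the select-under-budget, order, and format stages named above, here is a minimal sketch. Everything in it is an assumption made for illustration, not the course's reference implementation: the `Chunk` type, the four-characters-per-token estimate, the greedy selection rule, and the edge-first ordering heuristic are all hypothetical choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    score: float  # relevance from a hypothetical upstream scorer


def estimate_tokens(text: str) -> int:
    # Crude assumption: about 4 characters per token.
    # Swap in the provider's tokenizer for real budgets.
    return max(1, len(text) // 4)


def select_under_budget(chunks, budget):
    """Greedy pick by score, skipping duplicates and chunks that do not fit."""
    seen = set()
    selected = []
    used = 0
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        key = " ".join(chunk.text.lower().split())  # whitespace/case-insensitive dedup
        if key in seen:
            continue
        cost = estimate_tokens(chunk.text)
        if used + cost > budget:
            continue  # too large for the remaining budget; try smaller chunks
        seen.add(key)
        selected.append(chunk)
        used += cost
    return selected


def order_for_position(chunks):
    """Put the strongest chunks at the edges and the weakest in the middle."""
    ranked = sorted(chunks, key=lambda c: c.score, reverse=True)
    front, back = [], []
    for i, chunk in enumerate(ranked):
        (front if i % 2 == 0 else back).append(chunk)
    return front + back[::-1]


def format_context(chunks):
    """Number each chunk so the model can cite it."""
    return "\n\n".join(f"[doc {i}] {c.text}" for i, c in enumerate(chunks, 1))
```

A caller would chain the stages as `format_context(order_for_position(select_under_budget(chunks, budget)))`. Greedy selection is simple but not optimal; a real pipeline might weigh score per token instead.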

Minimum rating for approval: 3/5

context-eval-pack: A context eval + regression pack

Further reading & study material (6 sources)

Paste this into any AI chat. Fill in the bracketed parts with your context — you'll get back a straight answer on whether this belongs on your plate.

Prompt

I'm taking a "Context Engineering" course — engineering the LLM context window as a budgeted resource: what to include, in what order, how to compress and cache it, and how to measure it. It covers the anatomy of the window (roles, tokens, position effects), retrieval as selection, three-tier memory, tool results, compression, prompt + semantic caching, long-context strategy, and evals/regression gates. Python-first against Anthropic + OpenAI.

Here's my context:
1. What I'm building: [describe the feature/product]
2. My current context approach: [naive concatenation / basic RAG / agent / long-context stuffing]
3. Where it's failing: [inconsistent answers / too expensive / too slow / hallucinations / forgets things]
4. My model + window: [model and context size]

Given that, answer:
- Which module should I prioritize and why?
- Which of the five levers (select / order / compress / cache / measure) is most likely my bottleneck?
- Name 3 concrete changes I could make this week, and how I'd measure that each one helped.
- Name 1 thing this course won't fix so I have the right expectations.

Build a reusable eval pack another team could drop onto their RAG/agent: labeled-set tooling, retrieval + generation metrics, a calibrated LLM-as-judge, ablation runner, online-signal ingester, and a CI gate. Submit it as a small package or repo with docs.
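To make the retrieval-metrics and CI-gate pieces concrete, here is a small sketch of two common retrieval metrics and a baseline comparison. The function names, the metric dictionary shape, and the 0.02 tolerance are assumptions chosen for illustration; an actual pack would pick its own metrics and thresholds.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant ids that appear in the top k retrieved ids."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)


def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant id, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, 1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def regressions(current, baseline, tolerance=0.02):
    """Names of metrics that dropped more than `tolerance` below the baseline."""
    return [
        name
        for name, base in baseline.items()
        if current.get(name, 0.0) < base - tolerance
    ]
```

In a CI job, a non-empty result from `regressions(...)` would fail the build, for example by exiting with a non-zero status.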

Minimum rating for approval: 3/5

memory-system: A three-tier memory system

Ship a memory system (short-term + summarized + long-term) with salience extraction, relevance-gated recall, PII redaction, and cross-session continuity, integrated into a working assistant. Submit the live demo + a writeup of what it remembers and forgets and why.

Minimum rating for approval: 3/5

caching-stack: A layered caching stack with measured savings

Build prompt caching + semantic caching + embedding caching for a real workload with version-based invalidation, then report measured hit rate, cost savings, latency reduction, and false-hit rate over real traffic. Submit the implementation + the metrics dashboard.
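One way to get version-based invalidation is to fold a version string into the cache key, so bumping the version makes old entries unreachable without deleting them. The sketch below is an exact-match response cache with hit-rate counters. It is not provider-side prompt caching and not semantic caching, and the names `cache_key`, `VersionedCache`, and the `"example-model"` label used in the note are hypothetical.

```python
import hashlib
import json


def cache_key(prompt_version, model, payload):
    """Stable key: version and model prefix plus a hash of the canonical payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return f"{prompt_version}:{model}:{digest}"


class VersionedCache:
    """In-memory exact-match cache that tracks hits and misses."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        return None

    def put(self, key, value):
        self._store[key] = value

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Calling `cache_key("v1", "example-model", payload)` and later `cache_key("v2", "example-model", payload)` yields different keys, so a prompt-template change never serves a stale answer. The hit and miss counters feed the hit-rate figure the capstone asks you to report.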

Minimum rating for approval: 3/5

long-context-benchmark: A long-context vs. retrieval benchmark

Pick a real corpus and produce a rigorous decision report: needle-in-haystack recall map for your model, long-context and retrieval (and hybrid) implementations, and an accuracy/latency/cost comparison ending in a defensible recommendation. Submit the benchmark code + report.
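A needle-in-haystack recall map can be produced by placing a known fact at varying depths in filler text, asking the model to retrieve it, and aggregating correctness per depth. The sketch below covers only prompt construction and aggregation; the model call is left out, and `build_haystack` and `recall_map` are hypothetical helper names.

```python
from collections import defaultdict


def build_haystack(filler_sentences, needle, depth):
    """Insert `needle` at a relative depth (0.0 = start, 1.0 = end)."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    index = round(depth * len(filler_sentences))
    parts = filler_sentences[:index] + [needle] + filler_sentences[index:]
    return " ".join(parts)


def recall_map(results):
    """Map each depth to the fraction of trials answered correctly.

    `results` is an iterable of (depth, correct) pairs collected from runs.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for depth, ok in results:
        totals[depth] += 1
        correct[depth] += 1 if ok else 0
    return {depth: correct[depth] / totals[depth] for depth in sorted(totals)}
```

Repeating the sweep at several total context lengths turns the per-depth map into the two-dimensional grid usually reported for this kind of test.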

Minimum rating for approval: 3/5

The position-effects paper behind Modules 2 and 9. Required reading.

Context Engineering

What Context Engineering Is

Anatomy of the Context Window

What to Put In (and Leave Out)

Retrieval as Context

Memory

Tool Results & Structured Context

Context Compression

Prompt & Context Caching

Long-Context Strategies

Measuring Context Quality