End-to-end: tools, memory, planning, guardrails, evals, observability, cost, safety, shipping. Build an agent that survives real users.
Ten modules from 'what an agent actually is' to 'how to ship one'. Covers the practical 2026 stack: tool calling, MCP, memory layers, plan-then-execute, guardrails at three layers, multi-axis evals with CI gates, prompt caching + model routing, full observability with traces+metrics+logs, safety against direct and indirect prompt injection, and the launch+improvement cadence. Python-first with TypeScript where it matters.
Built by Lakshya Kumar
Paste this into any AI chat. Fill in the bracketed parts with your context — you'll get back a straight answer on whether this belongs on your plate.
We grant free access case-by-case: students, career-switchers, builders on a tight budget.
Complete all modules, then submit the required number of capstone projects. Each must earn a passing rating from an admin reviewer.
Pick a real or realistic use case. Build the agent end-to-end through every module: tools, memory, plan-execute, guardrails, evals, observability, cost optimization, safety, shipping. Ship to ≥10 beta users for 2+ weeks. Submit the live URL, metrics, traces, and a 5-page writeup of what surfaced in beta.
Build a small reusable toolkit (npm/pypi package) implementing the core patterns: tool dispatcher with authz + audit, multi-layer memory, guardrail middleware, eval harness, tracing wrapper. Open-source it.
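As a starting point for the "tool dispatcher with authz + audit" piece of this toolkit, here is a minimal sketch. It is an illustration under stated assumptions, not the course's reference implementation: the `Tool` and `Dispatcher` names, the single `required_role` check, and the in-memory `audit_log` list are all hypothetical choices made for brevity.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Set


@dataclass
class Tool:
    name: str
    func: Callable[..., Any]
    required_role: str  # hypothetical: one role label gates each tool


@dataclass
class Dispatcher:
    tools: Dict[str, Tool] = field(default_factory=dict)
    audit_log: List[dict] = field(default_factory=list)  # assumption: in-memory only

    def register(self, tool: Tool) -> None:
        self.tools[tool.name] = tool

    def call(self, user_roles: Set[str], name: str, **kwargs: Any) -> Any:
        # Record the attempt first so denied and failed calls are audited too.
        entry = {"ts": time.time(), "tool": name, "args": kwargs, "status": "denied"}
        self.audit_log.append(entry)

        tool = self.tools.get(name)
        if tool is None:
            entry["status"] = "unknown_tool"
            raise KeyError(f"unknown tool: {name}")

        # Authorization check happens before the tool function ever runs.
        if tool.required_role not in user_roles:
            raise PermissionError(f"role {tool.required_role!r} required for {name}")

        try:
            result = tool.func(**kwargs)
        except Exception:
            entry["status"] = "error"
            raise
        entry["status"] = "ok"
        return result


dispatcher = Dispatcher()
dispatcher.register(Tool("add", lambda a, b: a + b, required_role="analyst"))
print(dispatcher.call({"analyst"}, "add", a=2, b=3))
```

Writing the audit entry before the authorization check means a rejected call still leaves a record. A real package would likely persist the log and redact sensitive arguments.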
I'm taking a "Building Production Agents" course covering the practical 2026 stack: tool calling, MCP, memory layers, plan-then-execute, guardrails, multi-axis evals with CI gates, prompt caching + model routing, full observability, safety against prompt injection, and the launch + improvement cadence.

My context:
1. My current product / project is: [describe]
2. Current agent state: [haven't built one / prototype / shipping to users]
3. My stack: [language, model provider]
4. My biggest agent problem: [hallucinations? cost? latency? safety?]

Given that, answer:
- Which module should I prioritize?
- Name 3 concrete wins this course would unlock for my situation.
- Name 1 thing the course won't help with so I don't have wrong expectations.
- If I only had 2 hours this week, which single technique gives me the biggest lift?
Build the eval harness from Module 6 as a standalone tool: trajectory matcher, LLM-judge with calibration, CI gate, online metric integration. Document a sample integration.
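To make "trajectory matcher" and "CI gate" concrete, the sketch below shows one possible shape for those two pieces. The scoring rule (fraction of expected tool calls found in order), the function names, and the 0.9 threshold are assumptions chosen for illustration. Module 6 may define them differently.

```python
from typing import List, Sequence


def trajectory_score(expected: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of expected tool calls that appear, in order, in the actual run."""
    if not expected:
        return 1.0
    actual_list = list(actual)
    matched = 0
    pos = 0  # only advance past a step once it has been matched
    for step in expected:
        try:
            idx = actual_list.index(step, pos)
        except ValueError:
            continue  # step missing; later steps can still match
        matched += 1
        pos = idx + 1
    return matched / len(expected)


def ci_gate(scores: List[float], threshold: float = 0.9) -> bool:
    """Pass only if the mean score meets the threshold (hypothetical default)."""
    if not scores:
        return False  # no eval results should never count as a pass
    return sum(scores) / len(scores) >= threshold


if __name__ == "__main__":
    runs = [
        (["search", "fetch", "answer"], ["search", "fetch", "answer"]),
        (["search", "fetch", "answer"], ["search", "answer"]),
    ]
    scores = [trajectory_score(exp, act) for exp, act in runs]
    print(scores, "PASS" if ci_gate(scores) else "FAIL")
```

In a CI job the gate result would typically become the process exit code, for example `sys.exit(0 if ci_gate(scores) else 1)`, so that a regression blocks the merge.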
Build the full safety stack (Module 5 + 9) for an agent: input/output filters, tool authz, indirect-injection defenses, abuse detection, kill switches, incident playbook. Plus a 30-prompt red team report.
Take an existing agent and cut cost ≥50% via the techniques in Module 7. Submit before/after dashboards, the eval showing no regression, and a writeup ranking which lever paid off most.
Module 2 reads almost directly from this. Required.