
Advanced AI Engineering

Fine-tuning, quantization, and production deployment of large language models. For engineers who want to go beyond API calls.

Free preview

Certificate: earned by passing any 1 of 5 capstones

Go beyond prompting and RAG. This course covers the full model adaptation lifecycle: LoRA and QLoRA fine-tuning from first principles, SFT pipelines with TRL, preference optimization with DPO and RLHF, quantization to int8 and int4 (including GPTQ and AWQ), production serving with vLLM and TGI, distributed training across multiple GPUs, custom training objectives, evaluation suites that catch regressions before they ship, multi-model deployment infrastructure, and safety and alignment engineering. Every module has working Python code (no pseudocode, no toy examples) and a real project, with 100 Python challenges in total. The capstone is a model you fine-tuned, quantized, evaluated, and deployed yourself, end to end.

Built by Lakshya Kumar

Tags: llm, fine-tuning, lora, qlora, quantization, vllm, rlhf, dpo, alignment, production

Before you start (5 items)

Completed 'Applied AI: From ML to Modern Systems' or equivalent: familiar with transformers, Hugging Face, and PyTorch at an intermediate level.
Has trained at least one model in PyTorch (even a simple MLP). Understands loss functions, optimizers, and the training loop.
Comfortable with the Linux command line and Docker. Has run a FastAPI server.
Access to at least one GPU (a Google Colab T4 is enough for Modules 1–4; an A100 or RTX 4090 is needed from Module 5 onward).
No prior fine-tuning experience required — Module 1 starts from LoRA first principles.
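
To put rough numbers on "LoRA first principles" and on the GPU requirements above, here is a back-of-envelope sketch. It assumes a 4096 × 4096 projection matrix, chosen only as an illustrative hidden size for a 7B-class model. It counts weight storage alone and ignores activations, optimizer state, the KV cache, and the small extra overhead that 4-bit formats such as QLoRA's NF4 add for quantization constants.

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """LoRA freezes W (d_out x d_in) and trains B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Storage for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


d = 4096  # assumed hidden size, for illustration only
full_update = d * d
lora_update = lora_trainable_params(d, d, rank=16)
print(f"full update: {full_update:,} trainable params")
print(f"LoRA r=16:   {lora_update:,} trainable params ({lora_update / full_update:.2%} of full)")

# Weight storage for a 7B-parameter model at different precisions.
for label, bits in [("bf16", 16), ("int8", 8), ("4-bit", 4)]:
    print(f"7B weights in {label}: {weight_memory_gb(7e9, bits):.1f} GB")
```

At rank 16, LoRA trains under 1% of the parameters that a full update to this matrix would need. Storing weights in 4 bits cuts them to about a quarter of their bf16 size, and that reduction is what QLoRA builds on.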


Get access to Advanced AI Engineering

$3.99

30-day access

Prefer the whole catalog? See all-access membership.

Ask for access

We grant free access case by case to students, career-switchers, and builders on a tight budget.

Capstone projects

Submit any 1 of 5 to earn the certificate

Complete all modules, then submit the required number of capstone projects. Each must earn a passing rating from an admin reviewer.

Fine-tune, evaluate, quantize, and deploy a production LLM

Pick a domain (customer support, code review, legal summarization, medical Q&A, or similar). Fine-tune a 7B model with LoRA or QLoRA using TRL. Run DPO on top with 200+ preference pairs. Quantize to int4 with AWQ. Evaluate with a custom 50-example eval set showing ≥70% win rate vs base model. Deploy behind a vLLM server with a FastAPI gateway that handles auth, rate limiting, and structured logging. Ship as a GitHub repo with README, model card (training procedure, eval results, known limitations), and a working Docker setup.

Minimum rating for approval: 3/5
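
The ≥70% win-rate bar needs an exact scoring rule. The following is a minimal sketch. It assumes each of the 50 eval examples has already been judged as a win, tie, or loss for the fine-tuned model against the base model, whether by a human or an LLM judge. It also assumes a tie counts as half a win. That is a common convention, not something the course specifies, and the judgments below are hypothetical.

```python
from collections import Counter


def win_rate(judgments: list[str], tie_weight: float = 0.5) -> float:
    """Share of eval examples the fine-tuned model wins against the base model.

    judgments: one "win", "tie", or "loss" per example (fine-tuned vs base).
    tie_weight: credit given for a tie; 0.5 is an assumed convention.
    """
    counts = Counter(judgments)
    unexpected = set(counts) - {"win", "tie", "loss"}
    if unexpected:
        raise ValueError(f"unexpected labels: {sorted(unexpected)}")
    if not judgments:
        raise ValueError("no judgments to score")
    return (counts["win"] + tie_weight * counts["tie"]) / len(judgments)


# Hypothetical judgments for a 50-example eval set.
judgments = ["win"] * 33 + ["tie"] * 4 + ["loss"] * 13
rate = win_rate(judgments)
print(f"win rate: {rate:.0%}, meets the 70% bar: {rate >= 0.70}")
```

On 50 examples, flipping a single judgment moves the result by 2 points. For that reason, it is worth reporting the raw win/tie/loss counts and the tie rule in the model card alongside the percentage.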

RLHF Pipeline on a Small Model

Run an RLHF (or DPO) pipeline on a small open-source model (e.g., Llama 3.2 1B). Generate preference data, train a reward model (or use direct preference optimization), fine-tune the base model, and evaluate before/after on alignment + capability benchmarks.

Minimum rating for approval: 3/5
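
If you take the DPO route, the per-example objective from the original DPO paper (Rafailov et al., 2023) measures how far the policy has moved toward the chosen response versus the rejected one, relative to a frozen reference model. Below is a minimal pure-Python sketch of that loss for a single preference pair. It assumes each input is the summed token log-probability of a full response, and beta = 0.1 is an illustrative value. TRL's `DPOTrainer` computes this objective in batched PyTorch; the functions here are hypothetical helpers for building intuition, not its API.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z)); avoids overflow in exp for large |z|."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """DPO loss for one (prompt, chosen, rejected) triple.

    Each argument is log pi(y | x) summed over the response tokens.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    return -log_sigmoid(margin)


# Policy identical to the reference: margin is 0, so the loss equals log 2.
print(dpo_loss(-42.0, -40.0, -42.0, -40.0))
# Policy has shifted probability toward the chosen response: the loss drops.
print(dpo_loss(-38.0, -44.0, -42.0, -40.0))
```

Since the loss depends only on log-ratios against the reference, the reference model's log-probabilities can be computed once per preference pair and cached before training.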

Speculative Decoding Implementation

Implement speculative decoding (draft + verify) with a small draft model and large target model. Measure throughput gain vs vanilla decoding on a 200-token generation benchmark. Identify when speculation helps and when overhead dominates.

Minimum rating for approval: 3/5
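
Before benchmarking, it helps to have a model of when speculation should pay off. This sketch follows the standard analysis from Leviathan et al. (2023), which assumes each draft token is accepted independently with probability alpha. Here gamma is the number of draft tokens proposed per step, and cost_ratio is the draft model's forward-pass time divided by the target's. The 0.1 cost ratio below is an illustrative assumption. Real acceptance is not independent across positions, so treat the output as an estimate to compare against your measured 200-token results.

```python
def expected_tokens_per_target_pass(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target forward pass (accepted drafts plus one target token)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if alpha == 1.0:
        return gamma + 1.0  # every draft accepted, plus the target's extra token
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)


def expected_speedup(alpha: float, gamma: int, cost_ratio: float) -> float:
    """Expected wall-time speedup over vanilla decoding; below 1.0 means overhead dominates."""
    return expected_tokens_per_target_pass(alpha, gamma) / (gamma * cost_ratio + 1)


for alpha in (0.3, 0.6, 0.9):
    row = ", ".join(
        f"gamma={g}: {expected_speedup(alpha, g, cost_ratio=0.1):.2f}x" for g in (2, 4, 8)
    )
    print(f"alpha={alpha}: {row}")
```

When the acceptance rate is low, longer drafts mostly add draft-model cost and the estimate falls below 1.0. That is the "overhead dominates" regime the capstone asks you to locate empirically.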

Long-Context Extension via RoPE Scaling

Take a 4k-context model and extend its effective context to 32k via RoPE scaling (linear or YaRN). Validate with the needle-in-a-haystack test at 8k, 16k, and 32k. Document the degradation curve and the practical context limit.

Minimum rating for approval: 3/5
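
Linear RoPE scaling (position interpolation) is the simpler of the two options named above. It divides every position index by a factor s, so positions up to 32k map back into the 0–4k range the model saw during training. Below is a minimal sketch of the rotation angles and their application to one query or key vector. It assumes the interleaved pair layout (x[0], x[1]), (x[2], x[3]), and so on. Hugging Face's Llama code instead pairs dimension i with i + d/2, which is the same rotation over a different pairing of dimensions. YaRN rescales frequencies non-uniformly and is not shown.

```python
import math


def rope_angles(position: float, dim: int, base: float = 10000.0, scale: float = 1.0) -> list[float]:
    """RoPE rotation angle for each of the dim // 2 dimension pairs at one position.

    scale > 1 applies linear position interpolation: position m is treated as m / scale.
    """
    if dim % 2:
        raise ValueError("dim must be even")
    return [(position / scale) * base ** (-2 * i / dim) for i in range(dim // 2)]


def apply_rope(vec: list[float], position: float, base: float = 10000.0, scale: float = 1.0) -> list[float]:
    """Rotate consecutive pairs (vec[2i], vec[2i+1]) by the RoPE angle for that pair."""
    out = list(vec)
    for i, theta in enumerate(rope_angles(position, len(vec), base, scale)):
        x, y = vec[2 * i], vec[2 * i + 1]
        out[2 * i] = x * math.cos(theta) - y * math.sin(theta)
        out[2 * i + 1] = x * math.sin(theta) + y * math.cos(theta)
    return out


trained_ctx, target_ctx = 4096, 32768
s = target_ctx / trained_ctx  # 8.0
# With interpolation, the last position of the 32k window gets exactly the angles
# of position 4095.875, which lies inside the range seen during training.
scaled = rope_angles(target_ctx - 1, dim=128, scale=s)
unscaled = rope_angles((target_ctx - 1) / s, dim=128)
print(scaled == unscaled)  # True
```

Interpolation keeps every position in range, but it packs positions 8× closer together. This is one reason scaled models are often fine-tuned briefly on long sequences, and it is worth noting in your degradation curve.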

Production Inference Optimization

Take a fine-tuned model and produce 4 deployable variants: bf16 baseline, INT8 quantized, INT4 (AWQ or GPTQ), and llama.cpp Q4_K_M. Benchmark each for latency (P50/P99), throughput (tok/s), quality (3 benchmark suites), and cost per million tokens. Pick the winner with justification.

Minimum rating for approval: 3/5
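
Two of the four benchmark columns reduce to small formulas. This sketch assumes you have already collected per-request latencies and a sustained throughput figure for one GPU. It uses the nearest-rank percentile definition. Other definitions, such as linear interpolation, give slightly different P99 values, so state which one you used. The latencies and the $/hour price below are hypothetical inputs, not measured or quoted figures.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def cost_per_million_tokens(gpu_dollars_per_hour: float, tokens_per_second: float) -> float:
    """Serving cost per 1M generated tokens at a sustained throughput on one GPU."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_dollars_per_hour / tokens_per_hour * 1_000_000


# Hypothetical measurements for one variant.
latencies_ms = [120, 135, 128, 410, 131, 126, 140, 122, 133, 129]
print(f"P50: {percentile(latencies_ms, 50)} ms, P99: {percentile(latencies_ms, 99)} ms")
print(f"cost: ${cost_per_million_tokens(2.00, 1500):.3f} per 1M tokens")
```

For a variant served across several GPUs, multiply the hourly price by the GPU count before dividing. With only a handful of requests, P99 is simply the slowest sample, so collect enough requests for the tail to be meaningful.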

Further reading: the QLoRA paper. Read it before Module 2.

Advanced AI Engineering

Fine-Tuning Fundamentals

SFT at Scale — Multi-GPU Training, Data Quality, Catastrophic Forgetting

Preference Optimization — DPO, SimPO, ORPO, Reward Models

Advanced Quantization — GPTQ, AWQ, GGUF, Quantization-Aware Training

Efficient Inference — Speculative Decoding, Batching, KV Cache

Distributed Training — Data Parallelism, Tensor Parallelism, Pipeline Parallelism

Custom Training Objectives — Contrastive, Distillation, Constitutional AI

Production Eval Suites — Benchmark Selection, Regression Testing, Red-Teaming

Deployment at Scale — Multi-Model Serving, Autoscaling, Cost Optimization

Safety & Alignment Engineering