
Data-Intensive Systems

100 challenges from CQRS and CDC through Kafka, distributed transactions, time-series, object storage, streaming video, multi-region deployments, Lambda/Kappa architectures, and database internals deep cuts.


Certificate: 1 of 5 capstones

Data-Intensive Systems takes you from the first principles of event-driven architecture (CQRS, change data capture, and the Kafka log) through the hard problems that appear at scale: distributed transactions across services, time-series ingestion at write rates that strain a general-purpose RDBMS, multi-region deployments where you measure replication lag rather than guess at it, and real-time analytics pipelines that must stay correct under failure. The final two modules go below the query interface into PostgreSQL internals: buffer pools, WAL replay, MVCC row versions, autovacuum, B-tree splits, statistics histograms, and write amplification. This is the layer where production databases break in ways that `EXPLAIN` alone can't diagnose. Every challenge is a runnable program, every project has a testable correctness criterion, and the capstone requires you to design and build a production-grade data platform from scratch.

Built by Lakshya Kumar

data-engineering

kafka

cqrs

cdc

streaming

engineering

Before you start

Comfortable writing concurrent programs in at least one of Go, Python, Rust, or Node.js
Familiar with SQL at the level of window functions, CTEs, and EXPLAIN output
Has run a PostgreSQL or MySQL instance in production or in a serious side project
Understands the basic Kafka producer/consumer model (partitions, offsets, consumer groups)


Get access to Data-Intensive Systems

$3.99

30-day access


Capstone projects

Submit any 1 of 5 to earn the certificate

Complete all modules, then submit the required number of capstone projects. Each must earn a passing rating from an admin reviewer.

Data Platform

Design and implement a production-grade data platform that ingests events from at least two sources via Kafka, stores them in a time-series store and a relational store using a CQRS pattern, exposes a read API backed by materialized views, and includes a Kappa-style analytics pipeline that computes rolling 1-minute and session-level aggregations. The platform must handle at least 10,000 events/second, survive a simulated node failure without data loss (demonstrated via WAL replay or Kafka offset replay), and produce an ops runbook covering vacuum, index maintenance, and upgrade decisions.
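The rolling 1-minute aggregation named in this capstone could be prototyped along the following lines. This is a minimal sketch and not the course's reference solution: the class name `RollingWindow`, its methods, and the assumption that events arrive in event-time order are all hypothetical. A real pipeline would consume from Kafka and would need a policy for late or out-of-order events.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class RollingWindow:
    """Rolling count and sum over the last `window_s` seconds of event time.

    Hypothetical helper for illustration only.
    """
    window_s: float = 60.0
    _events: deque = field(default_factory=deque)  # (timestamp, value), oldest first
    _total: float = 0.0

    def add(self, ts: float, value: float) -> None:
        # Assumption: timestamps are non-decreasing (no late events).
        self._events.append((ts, value))
        self._total += value
        self._evict(ts)

    def _evict(self, now: float) -> None:
        # Keep only events inside the half-open window (now - window_s, now].
        while self._events and self._events[0][0] <= now - self.window_s:
            _, old_value = self._events.popleft()
            self._total -= old_value

    def snapshot(self) -> tuple:
        # Returns (event count, sum of values) for the current window.
        return len(self._events), self._total
```

Each `add` call is amortized O(1) because every event is appended and evicted at most once. Keeping a running total avoids rescanning the window on every read.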

Minimum rating for approval: 3/5

Exactly-Once Streaming Pipeline

Further reading & study material (5 sources)

Data-Intensive Systems

CQRS — Command Query Responsibility Segregation

CDC — Change Data Capture

Kafka — Event Streaming at Scale

Distributed Transactions — Sagas, 2PC, and the Outbox Pattern

Time-Series Databases & Metrics

Object Storage — S3, MinIO, and Blob Patterns

Streaming Media — Video, HLS, and Adaptive Bitrate

Multi-Region Deployments — Global Consistency and Regional Failover

Lambda & Kappa Architectures

Database Internals Deep Cuts