Context Switching Costs

hard

Why this matters

Every time the OS switches from one thread to another, it pays a tax that your application feels but that rarely shows up as its own line in a profile: saving and restoring 40+ registers, flushing TLB entries when the switch crosses address spaces on CPUs without tagged TLBs, and, more critically, displacing the L1/L2 cache lines the previous thread was actively working with. The switch itself takes 1–5 µs on Linux, but the subsequent cache-miss cascade can add far more, up to around 100 µs of effective slowdown per switch in bad cases. The arithmetic adds up quickly: at 100,000 context switches per second, easily reached on a thread-per-request server under high load, an average total cost of 10 µs per switch is already a full CPU core spent on overhead that produces zero user-visible work. This is one reason event-loop servers like nginx tend to outperform Apache's thread-per-connection model at high concurrency, even on the same hardware.
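The back-of-the-envelope arithmetic can be sketched in a few lines. This is only an illustration: the function name `cores_spent_switching` and the sample inputs are made up for this example, and the per-switch costs are the rough figures quoted in this lesson, not measurements.

```python
def cores_spent_switching(switches_per_sec: float,
                          direct_us: float,
                          indirect_us: float = 0.0) -> float:
    """Return how many cores' worth of time is consumed by switching.

    1.0 means one full core; values above 1.0 mean more than one core.
    direct_us   -- kernel cost of the switch itself, in microseconds
    indirect_us -- assumed average cache/TLB refill penalty, in microseconds
    """
    cost_per_switch_s = (direct_us + indirect_us) * 1e-6
    return switches_per_sec * cost_per_switch_s


if __name__ == "__main__":
    # (switch rate, direct cost in us, assumed indirect cost in us)
    scenarios = [
        (100_000, 1, 0),
        (100_000, 5, 5),
        (1_000_000, 1, 0),
    ]
    for rate, direct, indirect in scenarios:
        cores = cores_spent_switching(rate, direct, indirect)
        print(f"{rate:>9,} switches/s at {direct}+{indirect} us -> {cores:.2f} cores")
```

The useful takeaway is that the switch rate and the per-switch cost trade off directly: a tenfold increase in either one has the same effect on the total.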

Demo

Rough numbers on a modern Linux box:

OS thread context switch: ~1–5 µs direct cost, plus a cache-miss penalty that is often the real cost
Virtual thread / coroutine switch: ~100 ns, with no kernel entry and much less cache disturbance
Process switch: ~2× an OS thread switch, since the address space changes and the TLB may be flushed on some CPUs

At 1M context switches per second, even ~1 µs of direct cost per switch adds up to a full CPU core of overhead, before counting any cache-miss penalty. This is why event-loop servers like nginx and Node.js can outperform thread-per-request servers under high concurrency.
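One rough way to sanity-check these numbers on your own machine is a pipe ping-pong: two processes pass a single byte back and forth, so each round trip forces at least two hand-offs between them. The sketch below is Unix-only (it uses `os.fork`), and the function name `pingpong_roundtrip_us` and the default round count are arbitrary choices for illustration. Treat the result as an upper bound: it includes system-call and Python interpreter overhead, and if the two processes land on different cores it measures cross-core wake-up latency rather than a same-core switch.

```python
import os
import time


def pingpong_roundtrip_us(rounds: int = 20_000) -> float:
    """Average round-trip time, in microseconds, for a one-byte ping-pong."""
    p2c_r, p2c_w = os.pipe()  # parent -> child
    c2p_r, c2p_w = os.pipe()  # child -> parent

    pid = os.fork()
    if pid == 0:
        # Child: echo every byte back until the parent closes its write end.
        os.close(p2c_w)
        os.close(c2p_r)
        while os.read(p2c_r, 1):
            os.write(c2p_w, b"x")
        os._exit(0)

    # Parent: close the ends it does not use, then time the exchange.
    os.close(p2c_r)
    os.close(c2p_w)
    start = time.perf_counter()
    for _ in range(rounds):
        os.write(p2c_w, b"x")  # wake the child
        os.read(c2p_r, 1)      # block until the child replies
    elapsed = time.perf_counter() - start

    os.close(p2c_w)            # EOF: the child's read returns b"" and it exits
    os.waitpid(pid, 0)
    os.close(c2p_r)
    return elapsed / rounds * 1e6


if __name__ == "__main__":
    rt = pingpong_roundtrip_us()
    print(f"round trip: {rt:.2f} us (about {rt / 2:.2f} us per hand-off)")
```

Pinning both processes to the same core (for example with `taskset -c 0` on Linux) brings the measurement closer to an actual context switch, since neither process can run until the other blocks.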

Try it yourself

On Linux, run vmstat 1 while running a thread-heavy benchmark. Watch the cs (context switch) column climb. Record the rate at peak load.
Use perf stat -e context-switches <command> (Linux) or sudo dtrace -n 'sched:::off-cpu { @[execname] = count(); }' (macOS) to count context switches for a simple program in which 100 threads each sleep for a very short interval in a loop.
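Benchmark a tight loop of 10M iterations on one goroutine vs. split across 10 goroutines, with GOMAXPROCS=1 so that everything shares a single core. Compare wall time. A pure compute loop rarely yields, so the gap may be small; call runtime.Gosched() inside the loop to force frequent switches and make the overhead visible. Without the GOMAXPROCS=1 limit, the 10-goroutine version will usually be faster because it runs in parallel.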
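For the thread-counting exercise, a program can also read its own context-switch counters, which is handy when perf or dtrace is not available. The following is a minimal sketch, assuming a Unix system where Python's `resource` module is present; the function name `count_switches` and the thread and sleep counts are illustrative defaults, not part of the exercise.

```python
import resource
import threading
import time


def count_switches(n_threads: int = 100,
                   sleeps_per_thread: int = 50,
                   sleep_s: float = 0.001) -> tuple[int, int]:
    """Run sleeping threads; return (voluntary, involuntary) switch deltas."""
    def worker() -> None:
        # Each sleep blocks the thread, which normally gives up the CPU.
        for _ in range(sleeps_per_thread):
            time.sleep(sleep_s)

    before = resource.getrusage(resource.RUSAGE_SELF)
    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    after = resource.getrusage(resource.RUSAGE_SELF)

    # ru_nvcsw: switches where the thread blocked or yielded.
    # ru_nivcsw: switches where the scheduler preempted it.
    return (after.ru_nvcsw - before.ru_nvcsw,
            after.ru_nivcsw - before.ru_nivcsw)


if __name__ == "__main__":
    voluntary, involuntary = count_switches()
    print(f"voluntary: {voluntary}, involuntary: {involuntary}")
```

On Linux the voluntary count should land in the same ballpark as the total number of sleeps, and it can be compared against what perf stat reports for the same run. Some sandboxed or virtualized environments do not populate these counters and report zero.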

Prompt your AI

Use these three in order. Each builds on the one before.

1. Basics & terminology

How expensive is a thread context switch on Linux, and what exactly happens during one?

2. Why it works (the mechanism)

Explain why L1/L2 cache-miss penalties often dominate the measured cost of a context switch, even more than the kernel overhead itself.

3. Advanced — application & what's next

Design a micro-benchmark that isolates context-switch cost from the work done between switches. How do I avoid accidentally measuring something else?