Requesting a GPU in a pod spec

medium

Why this matters

All the infrastructure exists so that a workload can ask for a GPU, and the way it asks is a single line under resources. But GPUs differ from CPU and memory in one critical way: they are neither compressible nor fractional by default. You request whole GPUs, the value must be an integer, and the container that is allocated a GPU gets exclusive use of that device. Getting the request syntax right, and understanding that a request of 1 means one entire physical GPU, is the difference between a pod that schedules and one that sits Pending or, worse, silently runs on CPU. These are the most-used five lines of YAML in the entire course.

Demo

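A GPU request goes under resources.limits with the key nvidia.com/gpu. Setting only the limit is enough, because Kubernetes uses the limit as the request when no request is given; if you specify both, the two values must be equal. The pod below asks for one GPU and runs a CUDA workload; the scheduler will place it only on a node with an unallocated GPU.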
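
A minimal sketch of such a pod is shown here. The pod name gpu-demo is hypothetical, and the image tag is an assumption (NVIDIA's vectorAdd CUDA sample); substitute any CUDA sample image your cluster can pull. It also assumes the NVIDIA device plugin is already advertising nvidia.com/gpu on at least one node.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-demo                  # hypothetical name, pick your own
spec:
  restartPolicy: OnFailure        # the sample runs once and exits
  containers:
    - name: cuda-vectoradd
      # Assumed image: NVIDIA's vectorAdd CUDA sample. Swap in any CUDA test image.
      image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04
      resources:
        limits:
          nvidia.com/gpu: 1       # one whole GPU; fractional values are rejected
```

After kubectl apply -f on this file, kubectl logs gpu-demo should show the sample's output once the pod has completed. With the vectorAdd sample, the success line is expected to read "Test PASSED".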

Try it yourself

Apply the pod and confirm its logs print the CUDA sample's success line.
Set nvidia.com/gpu to a number larger than any node has and confirm the pod stays Pending with a FailedScheduling event that reports 'Insufficient nvidia.com/gpu'.
Remove the GPU request entirely and confirm the same image runs without GPU access (it will fail or fall back to CPU).
Schedule two single-GPU pods on a 1-GPU node while the first is still running and confirm the second stays Pending, because each GPU is allocated exclusively.
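
The last exercise only works if the first pod is still holding its GPU when the second one is scheduled, since a pod that has already completed no longer counts against the node. One way to set that up, as a sketch: the manifest below defines two pods that each request one GPU and then sleep. The pod names are hypothetical, the image is the same assumed CUDA sample image, and the sleep command assumes the image ships a standard shell userland.

```yaml
# Two pods that each ask for one whole GPU and keep running.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-hold-a                # hypothetical name
spec:
  restartPolicy: Never
  containers:
    - name: hold
      image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04  # assumed image
      command: ["sleep", "3600"]  # stay Running so the GPU remains allocated
      resources:
        limits:
          nvidia.com/gpu: 1
---
apiVersion: v1
kind: Pod
metadata:
  name: gpu-hold-b                # hypothetical name
spec:
  restartPolicy: Never
  containers:
    - name: hold
      image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04  # assumed image
      command: ["sleep", "3600"]
      resources:
        limits:
          nvidia.com/gpu: 1       # same request; no GPU is left on a 1-GPU node
```

On a cluster whose only GPU node has a single GPU, kubectl get pods should show one of the two pods Running and the other Pending, and kubectl describe pod on the Pending one should list the scheduling failure. Deleting the Running pod frees the GPU and lets the other one start.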

Prompt your AI

Use these three in order. Each builds on the one before.

1. Basics & terminology

In one paragraph, how does a pod request a GPU in Kubernetes, and what does requesting '1' actually grant?

2. Why it works (the mechanism)

Walk me through what the scheduler does with a pod that has nvidia.com/gpu: 1 — how it finds a node and reserves the device.

3. Advanced — application & what's next

Given that GPUs are scheduled as whole, exclusive devices, what scheduling and utilization problems does that create for small workloads, and which later techniques (MIG, time-slicing) address them?