The total cost of owning a serving stack

hard

Learn with your AI

Open this lesson in your favourite AI. It'll walk you through the why, explain the demo, and quiz you on the try-it list.

Open in Claude Open in ChatGPT

Why this matters

The decision to run your own serving stack is ultimately economic, and the cost is more than GPU rental. It includes the engineering time to build and maintain the stack, the on-call burden, the idle capacity you pay for to meet peak SLAs, and the opportunity cost of not shipping product features. A hosted API has a clear per-token price; a self-hosted stack trades that for fixed infrastructure plus human cost that only pays off at scale. Being able to model total cost of ownership honestly — including the parts that don't show up on the GPU invoice — is what makes the build-vs-buy decision defensible to leadership and keeps you from self-hosting a workload that a hosted API would serve more cheaply.

Demo

The demo builds a simple TCO model comparing a hosted API's per-token cost against a self-hosted stack's GPU plus amortized engineering cost, revealing the break-even volume.

Try it yourself

Find the monthly token volume where self-hosting becomes cheaper than the hosted API for your numbers.
Add the on-call and idle-capacity cost explicitly and watch the break-even volume rise.
Reason about how low GPU utilization (paying for idle capacity) hurts the self-hosted case.
Decide, for your real volume and team size, whether self-hosting is justified yet.

Prompt your AI

Use these three in order. Each builds on the one before.

1. Basics & terminology

What goes into the total cost of owning a self-hosted serving stack beyond GPU rental?

2. Why it works (the mechanism)

Walk me through a TCO model comparing a hosted API's per-token price to a self-hosted stack's GPU plus amortized engineering and on-call cost.

3. Advanced — application & what's next

Given my token volume, GPU costs, team size, and utilization, compute the break-even point where self-hosting beats a hosted API, and name the hidden costs that most often flip the decision.