Open this lesson in your favourite AI. It'll walk you through the why, explain the demo, and quiz you on the try-it list.
There are three ways to get AI into your product — call a hosted API, run an open model yourself, or fine-tune/train something custom — and for 95% of product teams the right first answer is 'call the API'. Builders waste enormous effort self-hosting or fine-tuning before they've proven the feature works at all, when a hosted frontier model would have validated the idea in an afternoon. The decision hinges on volume, latency, privacy, and how differentiated the model needs to be. Knowing where each option pays off keeps you from over-engineering the plumbing before you've earned the right to optimize it.
A decision routine over the few variables that actually matter — volume, data sensitivity, latency floor, and whether a generic model is good enough — that lands on API, self-host, or fine-tune.
Use these three in order. Each builds on the one before.
In one paragraph, explain the difference between using a hosted AI API, self-hosting a model, and fine-tuning your own.
Walk me through how to decide between calling an API and self-hosting an open model for a product feature.
At what point does self-hosting or fine-tuning actually beat a hosted API for a product team, and how would I know I've reached it?
def build_or_buy(monthly_calls, data_is_sensitive, needs_offline, generic_model_good_enough):
if generic_model_good_enough and monthly_calls < 5_000_000 and not needs_offline:
if data_is_sensitive:
return "Hosted API with a zero-retention / enterprise tier"
return "Hosted API — fastest to validate, no infra"
if needs_offline or (data_is_sensitive and monthly_calls > 5_000_000):
return "Self-host an open model (vLLM/Ollama) — control + privacy at scale"
return "Consider fine-tuning ONLY after the API version proves the feature"
print(build_or_buy(50_000, False, False, True))
print(build_or_buy(20_000_000, True, True, True))python3 main.py