Open this lesson in your favourite AI. It'll walk you through the why, explain the demo, and quiz you on the try-it list.
Once you understand the serving problem, the real decision is what to build yourself versus what to adopt. Rolling your own gives maximum control and zero per-seat cost, but it means you reimplement batching, metrics, versioning, and GPU sharing: the exact wheels Triton already provides. Triton is the open, framework-agnostic platform you assemble and tune. NIM is the opinionated, prepackaged, optimized microservice you deploy in minutes, at the price of less flexibility and a licensing/registry dependency. Choosing well requires honestly weighing engineering time, the heterogeneity of your model fleet, your latency targets, and your tolerance for vendor coupling. This task frames the trade-off the rest of the course equips you to make.
The demo scores the three options against the dimensions that actually decide the choice (engineering effort, flexibility, time-to-production, and lock-in), so the decision becomes explicit rather than a matter of vibes.
Use these three in order. Each builds on the one before.
Explain the build-vs-buy choice for model serving: rolling your own, using Triton, or using NIM.
Walk me through the dimensions (engineering cost, flexibility, time-to-production, lock-in) that distinguish roll-your-own vs. Triton vs. NIM and how they trade off.
Given my team size, model-fleet heterogeneity, latency targets, and lock-in tolerance, how should I reason about choosing among roll-your-own, Triton, and NIM?
# A decision matrix for the build-vs-buy question. Higher = better for that option
# (so a high "eng_cost" score means LOW engineering cost, not high).
options = {
    "roll-your-own": {"control": 5, "time_to_prod": 1, "eng_cost": 1, "lock_in_freedom": 5},
    "triton": {"control": 4, "time_to_prod": 3, "eng_cost": 3, "lock_in_freedom": 4},
    "nim": {"control": 2, "time_to_prod": 5, "eng_cost": 5, "lock_in_freedom": 2},
}

# Weight the dimensions by what YOUR team values right now:
weights = {"control": 1, "time_to_prod": 3, "eng_cost": 2, "lock_in_freedom": 1}

def score(opt):
    # Weighted sum of one option's per-dimension scores.
    return sum(opt[k] * weights[k] for k in weights)

# Rank option names from highest to lowest weighted score.
ranked = sorted(options, key=lambda o: score(options[o]), reverse=True)
for o in ranked:
    print(f"{o:>14}: {score(options[o])}")
print("-> with these weights, prefer:", ranked[0])

python3 main.py
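Because the ranking depends entirely on the weights, it can help to re-run the same matrix under more than one set of priorities. The sketch below is illustrative only: the per-option scores are copied from the demo, but the profile names (`ship_fast`, `control_first`, `mixed_fleet`), their weights, and the helper `best_option` are assumptions made up for this example, not part of the lesson.

```python
# Sensitivity check: the demo's scores evaluated under several weight profiles.
# Higher score = better for that option on that dimension.
OPTIONS = {
    "roll-your-own": {"control": 5, "time_to_prod": 1, "eng_cost": 1, "lock_in_freedom": 5},
    "triton": {"control": 4, "time_to_prod": 3, "eng_cost": 3, "lock_in_freedom": 4},
    "nim": {"control": 2, "time_to_prod": 5, "eng_cost": 5, "lock_in_freedom": 2},
}

# Hypothetical profiles (assumed for illustration; replace with your own priorities).
PROFILES = {
    "ship_fast": {"control": 1, "time_to_prod": 3, "eng_cost": 2, "lock_in_freedom": 1},
    "control_first": {"control": 3, "time_to_prod": 1, "eng_cost": 1, "lock_in_freedom": 3},
    "mixed_fleet": {"control": 3, "time_to_prod": 2, "eng_cost": 2, "lock_in_freedom": 2},
}


def score(option_scores, weights):
    """Weighted sum of one option's per-dimension scores."""
    return sum(option_scores[k] * weights[k] for k in weights)


def best_option(weights):
    """Name of the highest-scoring option under the given weights."""
    return max(OPTIONS, key=lambda name: score(OPTIONS[name], weights))


for profile, weights in PROFILES.items():
    print(f"{profile:>14} -> {best_option(weights)}")
```

With these assumed weights, the three profiles pick nim, roll-your-own, and triton respectively. The matrix does not produce an objective answer; it makes your priorities explicit, and the outcome changes as soon as those priorities do.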