Open this lesson in your favourite AI. It'll walk you through the why, explain the demo, and quiz you on the try-it list.
In enterprise serving, a model is never a single thing; it is a sequence of versions, and changing the live version is a risky operation that can regress quality, latency, or cost for real users. You need to run new and old versions side by side, route a slice of traffic to the new one, compare metrics, and roll back instantly if it's worse. Without first-class versioning, every model update is a fingers-crossed full cutover. Understanding version policies (serve latest, serve specific, serve all, canary a fraction) is what lets you ship model changes the way mature teams ship code: gradually and reversibly.
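To make the four version policies concrete, here is a minimal sketch of a resolver that decides which loaded version(s) should handle a request. The policy dictionaries and the `resolve_versions` function are hypothetical names made up for illustration, not the API of any particular serving system, and the sketch assumes versions are plain integers where a higher number means newer.

```python
import random

# Hypothetical sketch: the policy shapes and resolve_versions() are
# illustrative only and do not mirror a specific serving framework.
def resolve_versions(policy, available, rng=random):
    """Return the list of versions that should serve this request."""
    ordered = sorted(available)  # assumes integer versions, higher = newer
    kind = policy["kind"]
    if kind == "latest":
        return [ordered[-1]]
    if kind == "specific":
        if policy["version"] not in available:
            raise ValueError(f"version {policy['version']} is not loaded")
        return [policy["version"]]
    if kind == "all":
        return ordered
    if kind == "canary":
        # Send `fraction` of requests to the candidate, the rest to stable.
        use_candidate = rng.random() < policy["fraction"]
        return [policy["candidate"] if use_candidate else policy["stable"]]
    raise ValueError(f"unknown policy kind: {kind}")

available = {1, 2, 3}
print(resolve_versions({"kind": "latest"}, available))                  # [3]
print(resolve_versions({"kind": "specific", "version": 2}, available))  # [2]
print(resolve_versions({"kind": "all"}, available))                     # [1, 2, 3]

# Canary: version 2 stays the stable default, version 3 gets about 10%.
canary = {"kind": "canary", "stable": 2, "candidate": 3, "fraction": 0.10}
rng = random.Random(0)  # seeded so repeated runs give the same split
picks = [resolve_versions(canary, available, rng)[0] for _ in range(1000)]
print("canary share:", picks.count(3) / len(picks))
```

Pinning with "specific" is what makes rollback cheap in this sketch: reverting is a policy change, not a redeploy.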
The demo models a version policy and a canary split: most traffic stays on the proven version while a small fraction tries the new one, and a guard reverts if its error rate climbs.
Use these three in order. Each builds on the one before.
Why is model versioning a first-class concern in production serving, and what risks does a naive full cutover create?
Walk me through how running two model versions simultaneously with a canary traffic split lets me detect a regression before full rollout.
Design a safe model-version rollout: version policy, canary fraction, the metrics that gate promotion, and the automatic rollback condition.
import random

# Two versions loaded at once; canary 10% of traffic to the new version.
VERSIONS = {"v1": {"ready": True, "error_rate": 0.01},
            "v2": {"ready": True, "error_rate": 0.03}}  # v2 is currently worse!
CANARY_FRACTION = 0.10

def route(request_id):
    return "v2" if random.random() < CANARY_FRACTION else "v1"

# Simulate 1000 requests and watch the canary's behavior.
hits = {"v1": 0, "v2": 0}
for i in range(1000):
    hits[route(i)] += 1
print("traffic split:", hits)

# Guard: if the canary's error rate exceeds the baseline by a margin, roll back.
baseline = VERSIONS["v1"]["error_rate"]
if VERSIONS["v2"]["error_rate"] > baseline * 2:
    print("ROLLBACK: v2 error rate too high; pin all traffic to v1")

python3 main.py
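The demo's guard compares error rates that are written into the config. In a live system those rates would have to be measured from traffic, so here is a hedged sketch of the same 2x guard applied to observed counts. `TRUE_ERROR_RATE` exists only to simulate failures, and `MIN_CANARY_REQUESTS` and `should_roll_back` are assumptions added for illustration; the minimum-sample check is one simple way to avoid judging the canary on a handful of requests, not a statistically rigorous test.

```python
import random

# Assumed failure probabilities, used only to simulate request outcomes.
TRUE_ERROR_RATE = {"v1": 0.01, "v2": 0.03}
CANARY_FRACTION = 0.10
MIN_CANARY_REQUESTS = 200  # hypothetical: wait for this many canary requests
MAX_RATIO = 2.0            # same 2x margin as the demo guard

def should_roll_back(stats, min_requests=MIN_CANARY_REQUESTS, max_ratio=MAX_RATIO):
    """Return True if the canary's observed error rate breaches the margin."""
    canary, base = stats["v2"], stats["v1"]
    if canary["requests"] < min_requests or base["requests"] == 0:
        return False  # not enough evidence yet; keep observing
    canary_rate = canary["errors"] / canary["requests"]
    base_rate = base["errors"] / base["requests"]
    return canary_rate > base_rate * max_ratio

rng = random.Random(42)  # seeded so repeated runs give the same counts
stats = {v: {"requests": 0, "errors": 0} for v in TRUE_ERROR_RATE}
for _ in range(20_000):
    version = "v2" if rng.random() < CANARY_FRACTION else "v1"
    stats[version]["requests"] += 1
    if rng.random() < TRUE_ERROR_RATE[version]:
        stats[version]["errors"] += 1

for version, s in stats.items():
    print(version, s, f"observed rate = {s['errors'] / s['requests']:.4f}")
print("roll back canary:", should_roll_back(stats))
```

Because the canary only sees a fraction of traffic, its observed rate is noisier than the baseline's, which is why the sketch refuses to decide until enough canary requests have accumulated.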