Most AI projects die between 'it works in my notebook' and 'users are using it'. The gap is rarely technical complexity; it is integration work: picking an API, handling errors and rate limits, structuring the prompt reliably, validating the output before showing it to users, and adding the handful of logging lines that let you debug failures later. This lesson builds a complete, minimal AI feature end to end: a FastAPI endpoint that accepts a piece of text, asks an LLM to classify it using a structured prompt, validates the response, logs the exchange, and returns a typed JSON response.
Concretely, the integration layer has four parts: a structured prompt that constrains the output format, JSON parsing plus Pydantic validation that rejects malformed or out-of-spec output before it reaches users, typed error handling that maps each failure to the correct HTTP status code, and request-level logging that makes failures reproducible. The endpoint below is a minimal version of that layer, and you can adapt it to other tasks.
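The parse-and-validate step can be tried on its own, without an API key. This sketch feeds a few made-up model replies through `json.loads` and a Pydantic model whose `label` and `confidence` fields are restricted with `Literal` (an assumption: plain `str` fields would accept any value). It also assumes Pydantic v2, which provides `model_validate`.

```python
import json
from typing import Literal, Optional

from pydantic import BaseModel, ValidationError


class ClassifyResponse(BaseModel):
    # Literal restricts each field to the values the prompt allows (stricter than plain str)
    label: Literal["finance", "sports", "science", "politics", "other"]
    confidence: Literal["high", "medium", "low"]
    reason: str


def parse_model_output(raw: str) -> Optional[ClassifyResponse]:
    """Return a validated response, or None if the raw reply is unusable."""
    try:
        data = json.loads(raw)  # step 1: is it JSON at all?
        return ClassifyResponse.model_validate(data)  # step 2: right shape and values?
    except (json.JSONDecodeError, ValidationError) as e:
        print(f"rejected ({type(e).__name__}): {raw[:40]!r}")
        return None


# Made-up model replies, one per failure mode
samples = [
    '{"label": "finance", "confidence": "high", "reason": "Mentions interest rates."}',
    'Sure! Here is the JSON: {"label": "finance"}',  # prose before JSON -> JSONDecodeError
    '{"label": "economics", "confidence": "high", "reason": "Rates."}',  # unknown label -> ValidationError
    '["finance"]',  # valid JSON, wrong shape -> ValidationError
]
for reply in samples:
    print(parse_model_output(reply))
```

Splitting the check into two steps mirrors the two distinct ways a model reply goes wrong: it is not JSON at all, or it is JSON with the wrong fields or values.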
```python
# pip install fastapi uvicorn openai pydantic
# Requires the OPENAI_API_KEY environment variable (or swap in any OpenAI-compatible API)
import json
import logging
import os
from typing import Literal

import openai
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel, Field, ValidationError

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

app = FastAPI()
# Async client, so the API call does not block FastAPI's event loop inside an async endpoint
client = openai.AsyncOpenAI(api_key=os.environ.get("OPENAI_API_KEY", "sk-demo"))


class ClassifyRequest(BaseModel):
    text: str = Field(..., min_length=1, max_length=2000)


class ClassifyResponse(BaseModel):
    # Literal types make Pydantic reject labels or confidence values outside the allowed set
    label: Literal["finance", "sports", "science", "politics", "other"]
    confidence: Literal["high", "medium", "low"]
    reason: str


SYSTEM_PROMPT = """You are a text classifier. Classify the input as one of:
finance, sports, science, politics, other.
Respond with ONLY valid JSON in this exact format:
{"label": "<category>", "confidence": "high|medium|low", "reason": "<one sentence>"}"""


@app.post("/classify", response_model=ClassifyResponse)
async def classify(req: ClassifyRequest):
    raw = None  # defined up front so the error log below can always reference it
    try:
        resp = await client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": req.text},
            ],
            temperature=0,
            max_tokens=100,
        )
        raw = (resp.choices[0].message.content or "").strip()  # content can be None
        data = json.loads(raw)  # raises JSONDecodeError if the model returned non-JSON
        result = ClassifyResponse.model_validate(data)  # raises ValidationError on bad fields
        logger.info({"text": req.text[:50], "label": result.label})
        return result
    except (json.JSONDecodeError, ValidationError) as e:
        logger.error(f"LLM output parse failed: {e}, raw={raw!r}")
        raise HTTPException(status_code=502, detail="Model returned invalid output")
    except openai.RateLimitError:
        raise HTTPException(status_code=429, detail="Rate limit, try again later")
    except openai.APIError as e:
        # Timeouts, connection errors, and other upstream failures (RateLimitError is caught above)
        logger.error(f"LLM API call failed: {e}")
        raise HTTPException(status_code=502, detail="Upstream model API error")
```

Try it:

1. Save the code as `main.py`, start the server (`uvicorn main:app --reload`), and call it with curl: `curl -X POST http://localhost:8000/classify -H 'Content-Type: application/json' -d '{"text": "The Fed raised rates"}'`. Inspect the response.
2. Modify `SYSTEM_PROMPT` so the model no longer returns bare JSON. Re-run the same curl. Does the error handling catch the parse failure? Check the log output.
3. Add retry logic: when `json.JSONDecodeError` is raised, retry the API call once with the original prompt plus `'\n\nIMPORTANT: Respond ONLY with JSON, no other text.'` appended. How often does the retry succeed?
4. Add a request ID: `import uuid; req_id = str(uuid.uuid4())[:8]`. Log both the request and the response with the same `req_id`. This is the first step toward distributed tracing.

Use these three prompts in order; each builds on the one before.
In one paragraph, explain why `temperature=0` is used in a classification endpoint. What would happen to the responses if you set `temperature=1.0`?
Walk me through the failure modes of an LLM-backed API endpoint: malformed JSON response, hallucinated label not in the allowed set, network timeout, rate limit, and context window exceeded. For each: what exception is raised, and what is the correct HTTP status code to return to the client?
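For the first of these failure modes, malformed JSON, the retry exercise from the try-it list can be sketched without network access. `call_model` and `fake_model` are hypothetical stand-ins for the real API call, and the reminder is appended to the system prompt, which is one reading of "the original prompt".

```python
import json
from typing import Callable

RETRY_SUFFIX = "\n\nIMPORTANT: Respond ONLY with JSON, no other text."


def classify_with_retry(
    call_model: Callable[[str, str], str], system_prompt: str, text: str
) -> dict:
    """Parse the model's reply as JSON, retrying once with a stricter prompt."""
    raw = call_model(system_prompt, text)
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Second and final attempt: same input, system prompt with the reminder appended
        raw = call_model(system_prompt + RETRY_SUFFIX, text)
        return json.loads(raw)  # a second failure propagates to the caller (e.g. -> 502)


# Hypothetical fake model: chatty on the first call, compliant on the second
calls = []


def fake_model(system_prompt: str, text: str) -> str:
    calls.append(system_prompt)
    if len(calls) == 1:
        return "Sure! The label is finance."
    return '{"label": "finance", "confidence": "high", "reason": "Mentions rates."}'


result = classify_with_retry(fake_model, "You are a text classifier.", "The Fed raised rates")
print(result["label"], len(calls))
```

Capping the retry at one attempt bounds the extra latency and cost per request; a reply that is still not JSON after the reminder is treated as a genuine upstream failure.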
I'm running this classify endpoint at 50K requests/day and the OpenAI bill is too high. Walk me through three cost-reduction strategies: (1) caching identical or near-identical inputs, (2) switching to a smaller model for easy cases (routing), (3) fine-tuning a tiny classifier to replace the LLM entirely for this narrow task. For each: implementation complexity, cost reduction estimate, and accuracy risk.
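Strategy (1) is the simplest to sketch. The in-memory LRU cache below keys on whitespace- and case-normalized text, so exact and trivially different repeats skip the API call. `classify_uncached` is a hypothetical stand-in for the LLM call; lower-casing the key is an assumption that case never changes the label, near-duplicate matching (for example via embeddings) is not shown, and a multi-worker deployment would need a shared store rather than a per-process dict. Caching suits this endpoint because `temperature=0` makes replies close to deterministic, though the API does not guarantee identical outputs.

```python
from collections import OrderedDict
from typing import Callable


class ClassificationCache:
    """Exact-match LRU cache in front of an expensive classify function."""

    def __init__(self, fn: Callable[[str], dict], max_size: int = 10_000):
        self.fn = fn
        self.max_size = max_size
        self.store: OrderedDict = OrderedDict()
        self.hits = 0
        self.misses = 0

    @staticmethod
    def normalize(text: str) -> str:
        # Collapse whitespace and case so trivially different inputs share a key
        return " ".join(text.lower().split())

    def classify(self, text: str) -> dict:
        key = self.normalize(text)
        if key in self.store:
            self.hits += 1
            self.store.move_to_end(key)  # mark as most recently used
            return self.store[key]
        self.misses += 1
        result = self.fn(text)
        self.store[key] = result
        if len(self.store) > self.max_size:
            self.store.popitem(last=False)  # evict the least recently used entry
        return result


def classify_uncached(text: str) -> dict:
    # Hypothetical stand-in for the paid API call
    return {"label": "finance", "confidence": "high", "reason": "Mentions rates."}


cache = ClassificationCache(classify_uncached, max_size=2)
cache.classify("The Fed raised rates")
cache.classify("  the fed RAISED rates ")  # same key after normalization, so a cache hit
print(cache.hits, cache.misses)
```

The hit rate on real traffic determines the saving, so it is worth logging `hits` and `misses` before deciding whether the more involved strategies are needed.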