Putting it together — a minimal AI feature from API to user

hard

Why this matters

Most AI projects die between 'it works in my notebook' and 'users are using it'. The gap is rarely the model. It is the integration work: picking an API, handling errors and rate limits, structuring the prompt so the output format is predictable, validating the output before showing it to users, and adding the few lines of logging that let you debug failures later. This lesson builds a complete, minimal AI feature end to end: a FastAPI endpoint that accepts a piece of text, calls an LLM API with a structured prompt, validates the response, logs the exchange, and returns a typed JSON response.

Demo

The endpoint below is a minimal version of that integration layer. It combines four pieces: a structured prompt that constrains the output format, JSON parsing plus Pydantic validation that rejects malformed output before it reaches users, typed error handling that returns an appropriate HTTP status code, and request-level logging that makes failures easier to reproduce. You can extend it to other tasks by swapping the prompt and the response model.

# pip install fastapi uvicorn openai pydantic
# Requires OPENAI_API_KEY env var (or swap for any compatible API)

import json
import logging
import os
from typing import Literal

import openai
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel, Field, ValidationError

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

app = FastAPI()
# The "sk-demo" fallback only lets the module import without a key; real calls need OPENAI_API_KEY.
client = openai.OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "sk-demo"))

class ClassifyRequest(BaseModel):
    text: str = Field(..., min_length=1, max_length=2000)

class ClassifyResponse(BaseModel):
    # Literal types make Pydantic reject values outside the allowed sets
    label: Literal["finance", "sports", "science", "politics", "other"]
    confidence: Literal["high", "medium", "low"]
    reason: str

SYSTEM_PROMPT = """You are a text classifier. Classify the input as one of:
finance, sports, science, politics, other.

Respond with ONLY valid JSON in this exact format:
{"label": "<category>", "confidence": "high|medium|low", "reason": "<one sentence>"}"""

# Plain `def` (not `async def`): the OpenAI client call is blocking, so FastAPI
# runs this handler in a worker thread instead of stalling the event loop.
@app.post("/classify", response_model=ClassifyResponse)
def classify(req: ClassifyRequest):
    raw = None  # set before the try so the error log below can always reference it
    try:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": req.text},
            ],
            temperature=0,
            max_tokens=100,
        )
        raw = (resp.choices[0].message.content or "").strip()
        data = json.loads(raw)             # parse: raises if the model returned non-JSON
        result = ClassifyResponse(**data)  # validate fields and allowed values via Pydantic
        logger.info({"text": req.text[:50], "label": result.label})
        return result
    except (json.JSONDecodeError, TypeError, ValidationError) as e:
        # TypeError covers valid JSON that is not an object (e.g. a list)
        logger.error(f"LLM output validation failed: {e}, raw={raw!r}")
        raise HTTPException(status_code=502, detail="Model returned invalid output")
    except openai.RateLimitError:
        raise HTTPException(status_code=429, detail="Rate limit reached, try again later")

Run: save the code as main.py, then start it with uvicorn main:app --reload
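
To see what the parse-and-validate step accepts and rejects without spending API calls, the same checks can be run on hand-written strings. The sketch below is a standard-library stand-in for the Pydantic model, not the demo's own code. `parse_model_output`, `ALLOWED_LABELS`, and `ALLOWED_CONFIDENCE` are names made up for illustration, and the allowed values are assumed to mirror SYSTEM_PROMPT.

```python
import json

# Assumed to mirror the categories and confidence levels named in SYSTEM_PROMPT
ALLOWED_LABELS = ("finance", "sports", "science", "politics", "other")
ALLOWED_CONFIDENCE = ("high", "medium", "low")

def parse_model_output(raw: str) -> dict:
    """Parse raw model text and check it against the expected shape.

    Raises ValueError for anything the endpoint should not pass on to a user.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as e:
        raise ValueError(f"not JSON: {e}") from e
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if data.get("label") not in ALLOWED_LABELS:
        raise ValueError(f"unexpected label: {data.get('label')!r}")
    if data.get("confidence") not in ALLOWED_CONFIDENCE:
        raise ValueError(f"unexpected confidence: {data.get('confidence')!r}")
    if not isinstance(data.get("reason"), str):
        raise ValueError("missing or non-string reason")
    return data

if __name__ == "__main__":
    samples = [
        '{"label": "finance", "confidence": "high", "reason": "Mentions interest rates."}',
        "Sure! This text is about finance.",                                        # prose, not JSON
        '{"label": "economics", "confidence": "high", "reason": "Mentions rates."}',  # label not in the set
    ]
    for raw in samples:
        try:
            print("ok:", parse_model_output(raw)["label"])
        except ValueError as e:
            print("rejected:", e)
```

The second and third samples show two different failure modes: output that is not JSON at all, and well-formed JSON whose label falls outside the allowed set.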

Try it yourself

Run the server locally (uvicorn main:app --reload) and call it with curl: curl -X POST http://localhost:8000/classify -H 'Content-Type: application/json' -d '{"text": "The Fed raised rates"}'. Inspect the response.

Deliberately break the prompt by removing the JSON format instruction from SYSTEM_PROMPT. Re-run the same curl. Does the error handling catch the parse failure? Check the log output.

Add a retry: if json.JSONDecodeError is raised, retry the API call once with the original prompt plus '\n\nIMPORTANT: Respond ONLY with JSON, no other text.' appended. How often does the retry succeed?

Add a request-ID to every log entry: import uuid; req_id = str(uuid.uuid4())[:8]. Log both the request and the response with the same req_id. This is the first step toward distributed tracing.
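
One way the last two exercises might fit together is sketched below, under some assumptions: `call_model(system_prompt, text)` is a hypothetical wrapper that returns the raw model text (in the demo it would wrap `client.chat.completions.create`), and `classify_with_retry` and `RETRY_SUFFIX` are illustrative names. Keeping the model call behind a plain function makes the retry logic testable without network access.

```python
import json
import logging
import uuid
from typing import Callable

logger = logging.getLogger("classify")

RETRY_SUFFIX = "\n\nIMPORTANT: Respond ONLY with JSON, no other text."

def classify_with_retry(call_model: Callable[[str, str], str],
                        system_prompt: str, text: str) -> dict:
    """Call the model; on non-JSON output, retry once with a stricter prompt.

    `call_model` is a hypothetical wrapper around the real API call.
    """
    req_id = str(uuid.uuid4())[:8]  # short ID shared by every log line for this request
    logger.info("req_id=%s request text=%r", req_id, text[:50])

    # First attempt uses the original prompt, the second appends the reminder
    prompts = [system_prompt, system_prompt + RETRY_SUFFIX]
    for attempt, prompt in enumerate(prompts, start=1):
        raw = call_model(prompt, text)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            logger.warning("req_id=%s attempt=%d non-JSON output: %r",
                           req_id, attempt, raw[:80])
            continue
        logger.info("req_id=%s response attempt=%d data=%s", req_id, attempt, data)
        return data

    raise ValueError(f"req_id={req_id}: model returned non-JSON output on both attempts")
```

Because every log line carries the same `req_id`, searching the logs for that ID shows the request, any failed attempt, and the final response together.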

Prompt your AI

Use these three in order. Each builds on the one before.

1. Basics & terminology

In one paragraph, explain why `temperature=0` is used in a classification endpoint. What would happen to the responses if you set `temperature=1.0`?

2. Why it works (the mechanism)

Walk me through the failure modes of an LLM-backed API endpoint: malformed JSON response, hallucinated label not in the allowed set, network timeout, rate limit, and context window exceeded. For each: what exception is raised, and what is the correct HTTP status code to return to the client?
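
As a starting point for comparing against your AI's answer, here is one possible mapping from those failure modes to status codes. It is a sketch, not a definitive answer: the status choices are judgment calls, `status_for` and `STATUS_BY_EXCEPTION` are illustrative names, and the exception names are assumed to match the openai v1 Python SDK and Pydantic. Matching on class names keeps the sketch runnable without those packages installed; real code would catch the classes directly.

```python
import json

# Keys are exception class names (assumed: openai v1 SDK and Pydantic).
STATUS_BY_EXCEPTION = {
    "JSONDecodeError": 502,   # malformed JSON: the upstream model gave a bad response
    "ValidationError": 502,   # e.g. a label outside the allowed set, if the model type restricts it
    "APITimeoutError": 504,   # network timeout while waiting on the upstream API
    "RateLimitError": 429,    # upstream rate limit; 503 with Retry-After is another common choice
    "BadRequestError": 400,   # e.g. context window exceeded; 413 is also defensible
}

def status_for(exc: Exception) -> int:
    """Return an HTTP status for an exception, checking its class and base classes."""
    for cls in type(exc).__mro__:
        if cls.__name__ in STATUS_BY_EXCEPTION:
            return STATUS_BY_EXCEPTION[cls.__name__]
    return 500  # anything unrecognized is treated as an internal error

if __name__ == "__main__":
    try:
        json.loads("Sure! This text is about finance.")
    except json.JSONDecodeError as e:
        print(status_for(e))
```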

3. Advanced — application & what's next

I'm running this classify endpoint at 50K requests/day and the OpenAI bill is too high. Walk me through three cost-reduction strategies: (1) caching identical or near-identical inputs, (2) switching to a smaller model for easy cases (routing), (3) fine-tuning a tiny classifier to replace the LLM entirely for this narrow task. For each: implementation complexity, cost reduction estimate, and accuracy risk.
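
Strategy (1) from the prompt above is the simplest to try. A minimal sketch, assuming an in-memory dictionary is acceptable and that inputs differing only in case or whitespace should share a result: `CachedClassifier` and `cache_key` are illustrative names, and `classify` stands for any hypothetical function that calls the LLM and returns a dict.

```python
import hashlib
from typing import Callable

def cache_key(text: str) -> str:
    """Normalize case and whitespace so trivially different inputs share a key."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

class CachedClassifier:
    """Wrap a classify function with an in-memory cache keyed on normalized text."""

    def __init__(self, classify: Callable[[str], dict]):
        self._classify = classify  # hypothetical function that calls the LLM
        self._cache = {}           # maps cache_key(text) -> classification result
        self.hits = 0
        self.misses = 0

    def __call__(self, text: str) -> dict:
        key = cache_key(text)
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        result = self._classify(text)  # only cache misses reach the paid API
        self._cache[key] = result
        return result
```

The `hits` and `misses` counters give a measured hit rate on real traffic, which is the number that determines how much this strategy saves. The dictionary here is unbounded and lives in one process; a size limit or a shared store would be needed before relying on it in production.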