Structured outputs — when free-form fails

hard

Learn with your AI

Open this lesson in your favourite AI. It'll walk you through the why, explain the demo, and quiz you on the try-it list.

Open in Claude Open in ChatGPT

Why this matters

Agents often produce structured data, not prose: a JSON object, a function-call schema, a row of fields. Free-form text outputs are fragile to parse, hallucinate fields, and break downstream code. Structured outputs (Anthropic 'tool use as response', OpenAI JSON mode, Outlines/Instructor for local models) force the model to emit data matching a schema. The result is bulletproof parsing and dramatically lower failure rates.

Demo

Patterns: (1) Tool calling as response — define an output tool with your schema; force the model to call it. (2) OpenAI structured outputs with response_format: json_schema. (3) Pydantic + Instructor for local validation. The trick: a schema with explicit field types and descriptions prevents 95% of 'the model gave me a string when I wanted a number' bugs. Combine with retry-on-validation-fail for the remaining 5%.

Try it yourself

Find a place in your agent that parses LLM output text into structure. Switch to structured outputs. Watch the bug rate drop.
Define schemas with field descriptions, not just types. 'category: str' is less useful than 'category: str (one of: A, B, C)'.
Add retry-on-validation-error: if the model's output fails Pydantic validation, send it back with the error message and ask for a fix.
For complex schemas (10+ fields), break them into nested objects. Easier for the model to reason about.

Prompt your AI

Use these three in order. Each builds on the one before.

1. Basics & terminology

Why prefer structured outputs over text-parsing? Give one concrete bug each prevents.

2. Why it works (the mechanism)

Walk me through 'tool_choice: forced' in Anthropic: how does forcing a specific tool change the model's generation?

3. Advanced — application & what's next

Design a structured output for an agent that produces a multi-step plan (5 steps, each with 'tool_to_call', 'reasoning', 'expected_output'). What's the schema and how do you validate?