Streaming the agent — what to render and when

hard

Why this matters

An agent that shows a blank screen for 10 seconds feels broken. Streaming makes the same wall-clock time feel much shorter: render the model's text as it is generated, render tool_call events as they fire, and render tool results as they return. The UI becomes a live transcript of the agent's reasoning. Streaming does not reduce total latency; it reduces the time until the user first sees something happen, and that changes the whole experience.

Demo

Stream three event types: text deltas (assembled into the final answer), tool_call events (shown as 'searching docs for X...'), and tool_result events (shown as 'found 3 results'). Use Server-Sent Events (SSE) or a WebSocket as the transport. Render a collapsible 'reasoning' section above the final answer. Most production AI UIs (Claude, ChatGPT, Perplexity) do something like this. The gains are real, and the implementation is little more than SSE plus the model's streaming API.
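As a rough illustration of the server side, here is a minimal sketch of how the three event types could be encoded as SSE frames. It assumes a hypothetical `fake_agent_run` generator standing in for a real agent loop, and the payload fields (`id`, `name`, `input`, `summary`, `text`) and the closing `done` event are illustrative choices, not part of any particular model API.

```python
import json
from typing import Iterator


def sse_frame(event: str, data: dict) -> str:
    """Encode one SSE frame: an event name, one JSON data line, then a blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def fake_agent_run(question: str) -> Iterator[tuple[str, dict]]:
    """Hypothetical stand-in for a real agent loop; yields (event_name, payload) pairs."""
    yield "tool_call", {"id": "t1", "name": "search_docs", "input": {"query": question}}
    yield "tool_result", {"id": "t1", "summary": "found 3 results"}
    for chunk in ["Streaming ", "cuts ", "perceived wait."]:
        yield "text_delta", {"text": chunk}
    # Assumed end-of-stream marker so the client knows when to stop listening.
    yield "done", {}


def stream_agent(question: str) -> Iterator[str]:
    """Turn agent events into SSE frames, ready to be written to an HTTP response body."""
    for event, payload in fake_agent_run(question):
        yield sse_frame(event, payload)


if __name__ == "__main__":
    for frame in stream_agent("rate limits"):
        print(frame, end="")
```

In a real service these frames would be written to a response served with the `text/event-stream` content type, and a browser could subscribe to each named event through `EventSource`.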

Try it yourself

Add streaming to your agent. SSE is the easiest transport for browser-based UIs.
Render tool calls as they happen, not just the final answer. A visible 'Searching for X... done' makes a 5s wait feel much shorter.
Test with real users. Most prefer streaming so strongly that going back feels broken.
Pre-warm the stream: start the first LLM call as soon as the user stops typing rather than waiting for the submit click, and discard it if the input changes. This cuts perceived latency.
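To make "render tool calls as they happen" concrete, here is one possible sketch of the UI side: a small reducer that folds each incoming event into a transcript state. The `Transcript` shape, the event names, and the payload fields are assumptions made for this example; a browser UI would do the same thing in its own framework.

```python
from dataclasses import dataclass, field


@dataclass
class Transcript:
    """UI state built up from streamed events (hypothetical shape for this sketch)."""
    steps: dict[str, str] = field(default_factory=dict)  # tool call id -> status line
    answer: str = ""
    finished: bool = False


def apply_event(state: Transcript, event: str, payload: dict) -> Transcript:
    """Fold one streamed event into the transcript; unknown events are ignored."""
    if event == "tool_call":
        query = payload["input"].get("query", "")
        state.steps[payload["id"]] = f"Searching for {query}..."
    elif event == "tool_result":
        # Append the result summary to the status line of the matching call.
        pending = state.steps.get(payload["id"], "")
        state.steps[payload["id"]] = f"{pending} {payload['summary']}".strip()
    elif event == "text_delta":
        state.answer += payload["text"]
    elif event == "done":
        state.finished = True
    return state


def render(state: Transcript) -> str:
    """Render the reasoning steps above the answer, as plain text."""
    lines = [f"[{status}]" for status in state.steps.values()]
    lines.append(state.answer)
    return "\n".join(lines)


if __name__ == "__main__":
    events = [
        ("tool_call", {"id": "t1", "name": "search_docs", "input": {"query": "rate limits"}}),
        ("tool_result", {"id": "t1", "summary": "found 3 results"}),
        ("text_delta", {"text": "Limits reset "}),
        ("text_delta", {"text": "every minute."}),
        ("done", {}),
    ]
    state = Transcript()
    for name, payload in events:
        apply_event(state, name, payload)
        print(render(state), end="\n---\n")  # re-render after every event
```

Re-rendering after every event is what gives the user something to watch: the status line appears first, gains its result, and the answer then grows chunk by chunk.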

Prompt your AI

Use these three in order. Each builds on the one before.

1. Basics & terminology

Why does streaming an agent make it feel faster even though total latency doesn't change?

2. Why it works (the mechanism)

Walk me through SSE for agent streaming: events, payload shape, browser API, how the UI assembles them.

3. Advanced — application & what's next

Design a 'verbose mode' that shows the agent's reasoning and a 'quick mode' that shows only the final answer. What UX states and controls does each need?