An agent that shows a blank screen for 10 seconds is broken UX. Streaming makes the same wall-clock time feel much shorter: render the model's text as it is generated, render tool_use events as they fire, and render tool results when they return. The UI becomes a live transcript of the agent's reasoning. Streaming does not reduce total latency; it cuts the time until the user sees the first output, and that is what changes the experience.
Stream three event types: text deltas (assembled into the final answer), tool_call events (shown as 'searching docs for X...'), and tool_result events (shown as 'found 3 results'). Deliver them over Server-Sent Events (SSE) or a WebSocket, and render a collapsible 'reasoning' section above the final answer. Most production AI UIs (Claude, ChatGPT, Perplexity) work this way; the UX gain is real, and the implementation is just SSE plus the model's streaming API.
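To make the SSE option concrete, here is a minimal server-side sketch in Python. `to_sse` and `sse_frames` are hypothetical helpers, not part of any SDK. `to_sse` writes one event as an SSE frame (an `event:` line naming the type, a `data:` line carrying JSON, then a blank line). `sse_frames` adapts an agent that reports through an `on_event` callback, like `stream_agent` below, into a generator of frames by running it on a worker thread. The `error` event type is an assumption added so failures reach the UI instead of leaving it hanging. In most Python web frameworks you would return this generator as a streaming response with `Content-Type: text/event-stream`; in the browser, `new EventSource(url)` plus `addEventListener("text_delta", ...)` receives the frames. EventSource only issues GET requests, so if the prompt travels in a POST body you would read the stream with `fetch` instead.

```python
import json
import queue
import threading
from typing import Callable, Iterator

def to_sse(event: dict) -> str:
    """Encode one agent event as a Server-Sent Events frame."""
    # `event:` names the type so a browser EventSource can route it with
    # addEventListener(type, ...). json.dumps escapes newlines inside strings,
    # so the payload stays on one line and a single `data:` field suffices.
    return f"event: {event['type']}\ndata: {json.dumps(event)}\n\n"

def sse_frames(run_agent: Callable[[Callable[[dict], None]], None]) -> Iterator[str]:
    """Run an agent that reports via an on_event callback; yield SSE frames.

    The agent runs in a worker thread and its callbacks land on a queue that
    this generator drains, so each frame goes out as soon as its event fires.
    """
    q: queue.Queue = queue.Queue()
    done = object()  # sentinel marking the end of the stream

    def worker() -> None:
        try:
            run_agent(q.put)  # q.put(event) serves as on_event
        except Exception as exc:
            # Hypothetical "error" event so the UI can show the failure.
            q.put({"type": "error", "message": str(exc)})
        finally:
            q.put(done)

    threading.Thread(target=worker, daemon=True).start()
    while (item := q.get()) is not done:
        yield to_sse(item)
```

A thread plus a queue is the simplest bridge because `stream_agent` is synchronous; with an async framework you could use the SDK's async client and yield frames directly instead.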
Work through these three prompts in order; each builds on the one before.
Why does streaming an agent make it feel faster even though total latency doesn't change?
Walk me through SSE for agent streaming: events, payload shape, browser API, how the UI assembles them.
Design a 'verbose mode' that shows the agent's reasoning vs 'quick mode' that only shows the final answer. What UX states and controls?
```python
# Anthropic streaming in agent loop
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
# TOOLS (tool definitions) and dispatch(name, input) are assumed to be defined elsewhere (not shown here).

def stream_agent(user_msg, on_event, max_turns=6):
    messages = [{"role": "user", "content": user_msg}]
    for _ in range(max_turns):
        with client.messages.stream(
            model="claude-sonnet-4-6", max_tokens=1500,
            tools=TOOLS, messages=messages,
        ) as stream:
            for event in stream:
                if event.type == "content_block_start":
                    if event.content_block.type == "tool_use":
                        # A tool call has begun; its arguments stream in next.
                        on_event({"type": "tool_start", "name": event.content_block.name})
                elif event.type == "content_block_delta":
                    if event.delta.type == "text_delta":
                        on_event({"type": "text_delta", "delta": event.delta.text})
                    elif event.delta.type == "input_json_delta":
                        # Fragments of the tool's JSON arguments; not parseable until the block ends.
                        on_event({"type": "tool_arg_delta", "delta": event.delta.partial_json})
            # The stream is exhausted; fetch the fully assembled message.
            final = stream.get_final_message()

        messages.append({"role": "assistant", "content": final.content})
        if final.stop_reason != "tool_use":
            # end_turn, max_tokens, etc.: no tools to run, so the answer is complete.
            return final

        # Execute each requested tool, reporting progress through on_event.
        tool_results = []
        for block in final.content:
            if block.type == "tool_use":
                on_event({"type": "tool_running", "name": block.name, "input": block.input})
                result = dispatch(block.name, block.input)
                on_event({"type": "tool_done", "name": block.name, "preview": str(result)[:200]})
                tool_results.append(
                    {"type": "tool_result", "tool_use_id": block.id, "content": str(result)}
                )
        messages.append({"role": "user", "content": tool_results})

    # Gave up after max_turns model calls without a final answer.
    return None
```
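On the client side, the events `stream_agent` emits (`text_delta`, `tool_start`, `tool_arg_delta`, `tool_running`, `tool_done`) have to be folded into UI state: a collapsible reasoning section plus the final answer. The sketch below does this in Python for illustration; in a browser the same logic would live in the EventSource handlers. `Transcript`, its step dictionaries, and `render` are hypothetical. It assumes tools run in the order their `tool_use` blocks streamed, which is how `stream_agent` executes them, and it treats text written before a tool call as narration that belongs in the reasoning section rather than in the answer.

```python
from dataclasses import dataclass, field

@dataclass
class Transcript:
    """Hypothetical UI state assembled from stream_agent's on_event callbacks.

    reasoning: steps for the collapsible section (narration and tool calls).
    answer: text streamed since the most recent tool call.
    """
    reasoning: list = field(default_factory=list)
    answer: str = ""

    def _find_tool(self, name: str, status: str) -> dict:
        # Tools run in the order they streamed, so the first step for this
        # tool that is still in `status` is the one the event refers to.
        for step in self.reasoning:
            if step["kind"] == "tool" and step["name"] == name and step["status"] == status:
                return step
        raise KeyError(f"no {status!r} step for tool {name!r}")

    def on_event(self, ev: dict) -> None:
        kind = ev["type"]
        if kind == "text_delta":
            self.answer += ev["delta"]
        elif kind == "tool_start":
            # Text written before a tool call was narration, not the final answer.
            if self.answer.strip():
                self.reasoning.append({"kind": "note", "text": self.answer})
            self.answer = ""
            self.reasoning.append(
                {"kind": "tool", "name": ev["name"], "args_json": "", "status": "calling"}
            )
        elif kind == "tool_arg_delta":
            # Argument fragments belong to the tool block currently streaming.
            self.reasoning[-1]["args_json"] += ev["delta"]
        elif kind == "tool_running":
            self._find_tool(ev["name"], "calling")["status"] = "running"
        elif kind == "tool_done":
            step = self._find_tool(ev["name"], "running")
            step["status"] = "done"
            step["preview"] = ev["preview"]
        # Unknown event types are ignored, so the server can add new ones safely.

    def render(self) -> str:
        """Plain-text view: reasoning steps first, final answer last."""
        lines = [f"Reasoning ({len(self.reasoning)} steps)"]
        for step in self.reasoning:
            if step["kind"] == "note":
                lines.append(f"  {step['text']}")
            else:
                detail = f": {step['preview']}" if "preview" in step else "..."
                lines.append(f"  [{step['status']}] {step['name']} {step['args_json']}{detail}")
        lines.append(self.answer)
        return "\n".join(lines)
```

Passing `Transcript().on_event` as the `on_event` argument to `stream_agent` builds this state live; re-rendering after every event is what produces the live-transcript effect, and a "quick mode" could simply hide `reasoning`.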