The bot will make thousands of decisions before you know whether it's helping or hurting. Without a log, you'll form your judgment from three or four salient memories (probably the two times it embarrassed you and the one time it saved you an hour), and your policy will drift on vibes. Reading the meter means designing the log FIRST, before any decision code, so that every action the bot takes writes a structured line: {timestamp, action_type, rung, stakes, outcome, human_override?, tokens_spent, latency_ms}. Six weeks in, this log answers questions your intuition can't: what fraction of AUTO actions later get overridden (those should have been NOTIFY); which action types have the worst outcome distribution (those should be paused); and how much you're actually spending per action (whether the ROI is real). Every module in this course will emit at least one log line per action, and Module 10 is about actually reading them.
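To make the last two questions concrete, here is a minimal sketch that ranks action types by how often they go wrong and shows what each one costs. It assumes rows are dicts shaped like the demo's `ActionLog` (`action_type`, `outcome`, `cost_usd`). The sample rows and the helper names `bad_outcome_rate` and `cost_per_action` are hypothetical, and counting every outcome other than `"clean"` as bad is an assumption.

```python
from collections import defaultdict

# Hypothetical sample rows; real ones would be parsed from the JSONL log.
ROWS = [
    {"action_type": "draft_reply", "outcome": "clean", "cost_usd": 0.004},
    {"action_type": "draft_reply", "outcome": "user_override", "cost_usd": 0.004},
    {"action_type": "archive_email", "outcome": "clean", "cost_usd": 0.001},
    {"action_type": "archive_email", "outcome": "clean", "cost_usd": 0.001},
]

def bad_outcome_rate(rows):
    """Fraction of non-'clean' outcomes per action type (assumed definition of 'bad')."""
    counts = defaultdict(lambda: [0, 0])  # action_type -> [bad, total]
    for r in rows:
        c = counts[r["action_type"]]
        c[1] += 1
        if r["outcome"] != "clean":
            c[0] += 1
    return {a: bad / total for a, (bad, total) in counts.items()}

def cost_per_action(rows):
    """Mean cost_usd per action type."""
    totals = defaultdict(lambda: [0.0, 0])  # action_type -> [sum_cost, n]
    for r in rows:
        t = totals[r["action_type"]]
        t[0] += r["cost_usd"]
        t[1] += 1
    return {a: s / n for a, (s, n) in totals.items()}

rates = bad_outcome_rate(ROWS)
costs = cost_per_action(ROWS)
# Worst action types first: the top of this list is where pausing is worth considering.
for action in sorted(rates, key=rates.get, reverse=True):
    print(f"{action:15s} bad={rates[action]:.0%}  cost/action=${costs[action]:.4f}")
```

Keeping the two questions in separate functions means each one can later be pointed at a filtered slice of the log (one rung, one stakes level) without rewriting the other.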
A minimum-viable log line plus a query that turns 5,000 log lines into an actionable decision. Structured, append-only, cheap.
notes field for free-text observations, and use it whenever you override an action.

Use these three prompts in order. Each builds on the one before.
In one paragraph, explain why a structured action log matters more than the bot's decision code.
Walk me through how a 5,000-row action log answers questions that my gut memory can't — availability bias, salience, and log-scale outcome distributions.
My bot has been running for 4 weeks. Design me 3 specific SQL queries against the log that would tell me: (1) which actions to promote to AUTO, (2) which to demote to ASK, (3) what my per-week bot budget should be.
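For comparing against whatever the third prompt produces, here is one possible shape for those three queries. It is a sketch under stated assumptions: the JSONL log is loaded into an in-memory SQLite table named `actions` using Python's built-in `sqlite3` module, and every threshold is illustrative. Promotion requires at least 20 NOTIFY samples with zero overrides. Demotion reuses the demo's 5% override rule of thumb and, as an extra assumption, applies it to NOTIFY as well as AUTO (each flagged action would step down one rung). The weekly budget is the busiest observed week plus 20% headroom. The helper names and the demo rows are hypothetical.

```python
import json
import sqlite3

WEEK_SECONDS = 7 * 24 * 3600

def load(jsonl_lines):
    """Load JSONL log lines into an in-memory SQLite table named 'actions'."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE actions (ts REAL, action_type TEXT, rung TEXT, stakes TEXT,"
        " outcome TEXT, human_override TEXT, tokens_in INTEGER, tokens_out INTEGER,"
        " latency_ms INTEGER, cost_usd REAL)"
    )
    rows = [json.loads(line) for line in jsonl_lines if line.strip()]
    db.executemany(
        "INSERT INTO actions VALUES (:ts, :action_type, :rung, :stakes, :outcome,"
        " :human_override, :tokens_in, :tokens_out, :latency_ms, :cost_usd)",
        rows,
    )
    return db

# (1) Promote: NOTIFY actions with >= 20 samples and zero human overrides.
PROMOTE_SQL = """
SELECT action_type FROM actions
WHERE rung = 'NOTIFY'
GROUP BY action_type
HAVING COUNT(*) >= 20 AND SUM(human_override IS NOT NULL) = 0
ORDER BY action_type
"""

# (2) Demote: AUTO or NOTIFY actions overridden more than 5% of the time.
DEMOTE_SQL = """
SELECT action_type, rung, AVG(human_override IS NOT NULL) AS rate
FROM actions
WHERE rung IN ('AUTO', 'NOTIFY')
GROUP BY action_type, rung
HAVING AVG(human_override IS NOT NULL) > 0.05
ORDER BY rate DESC
"""

# (3) Budget: total spend per week, counted from the first logged action.
WEEKLY_SQL = f"""
SELECT CAST((ts - (SELECT MIN(ts) FROM actions)) / {WEEK_SECONDS} AS INTEGER) AS week,
       SUM(cost_usd) AS spend
FROM actions
GROUP BY week
ORDER BY week
"""

def promote_candidates(db):
    return [action for (action,) in db.execute(PROMOTE_SQL)]

def demote_candidates(db):
    return [(a, rung, round(rate, 3)) for a, rung, rate in db.execute(DEMOTE_SQL)]

def weekly_budget(db, headroom=1.2):
    """Assumed policy: budget = busiest observed week plus 20% headroom."""
    spends = [spend for _, spend in db.execute(WEEKLY_SQL)]
    return max(spends) * headroom if spends else 0.0

def demo_rows():
    """Hypothetical data: 'archive' is a clean NOTIFY action; 'send' is an
    AUTO action overridden 2 times in 20. Rows span two weeks."""
    rows = []
    for i in range(20):
        base = {"ts": i * WEEK_SECONDS / 10, "stakes": "low", "tokens_in": 100,
                "tokens_out": 20, "latency_ms": 500, "cost_usd": 0.001}
        rows.append({**base, "action_type": "archive", "rung": "NOTIFY",
                     "outcome": "clean", "human_override": None})
        overridden = i % 10 == 0
        rows.append({**base, "action_type": "send", "rung": "AUTO",
                     "outcome": "user_override" if overridden else "clean",
                     "human_override": "canceled" if overridden else None})
    return [json.dumps(r) for r in rows]

db = load(demo_rows())
print("promote:", promote_candidates(db))
print("demote:", demote_candidates(db))
print(f"weekly budget: ${weekly_budget(db):.3f}")
```

With a real log, pass the open file object (`load(open("bot_actions.jsonl"))`) instead of `demo_rows()`. SQLite evaluates `human_override IS NOT NULL` to 0 or 1, so `SUM` counts overrides and `AVG` gives the override rate directly.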
```python
import json
import os
import time
from collections import defaultdict
from dataclasses import dataclass, asdict
from typing import Optional

LOG_PATH = "bot_actions.jsonl"

@dataclass
class ActionLog:
    ts: float
    action_type: str
    rung: str
    stakes: str
    outcome: str
    human_override: Optional[str]  # "approved" | "edited" | "canceled" | None
    tokens_in: int
    tokens_out: int
    latency_ms: int
    cost_usd: float

def log(entry: ActionLog):
    # Append-only: one JSON object per line; existing lines are never rewritten.
    with open(LOG_PATH, "a") as f:
        f.write(json.dumps(asdict(entry)) + "\n")

def read_log():
    if not os.path.exists(LOG_PATH):
        return []
    with open(LOG_PATH) as f:
        # Skip blank lines so a stray newline doesn't crash json.loads.
        return [json.loads(line) for line in f if line.strip()]

# Simulate a few actions (each run appends 10 more rows to the same file)
for i in range(10):
    log(ActionLog(
        ts=time.time() + i,
        action_type="draft_reply",
        rung="NOTIFY",
        stakes="medium",
        outcome="clean" if i != 3 else "user_override",
        human_override="edited" if i == 3 else None,
        tokens_in=1200, tokens_out=250,
        latency_ms=1800,
        cost_usd=0.0042,
    ))

# The one query that matters: what's the human-override rate per (action_type, rung)?
rows = read_log()
overrides = defaultdict(lambda: {"total": 0, "overridden": 0})
for r in rows:
    key = (r["action_type"], r["rung"])
    overrides[key]["total"] += 1
    if r["human_override"] is not None:
        overrides[key]["overridden"] += 1

for (a, rung), s in sorted(overrides.items()):
    pct = 100 * s["overridden"] / s["total"] if s["total"] else 0
    print(f"{a:20s} rung={rung:6s} overrides={pct:4.1f}% ({s['overridden']}/{s['total']})")

# Rule of thumb: if AUTO gets >5% overridden, downgrade to NOTIFY.
```
Save the code as `main.py` and run it:

```shell
python3 main.py
```
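The try-it item about a `notes` field for free-text observations can be sketched as one extra dataclass field. This is one possible approach, not the course's: `notes` gets a default of `None` and goes last (dataclass fields with defaults must follow those without), so log lines written before the field existed still load. The `from_row` helper, the temp-file path, and the sample note text are hypothetical.

```python
import json
import os
import tempfile
import time
from dataclasses import asdict, dataclass, fields
from typing import Optional

@dataclass
class ActionLog:
    ts: float
    action_type: str
    rung: str
    stakes: str
    outcome: str
    human_override: Optional[str]  # "approved" | "edited" | "canceled" | None
    tokens_in: int
    tokens_out: int
    latency_ms: int
    cost_usd: float
    # New: free-text note; defaulted so rows written before it existed still load.
    notes: Optional[str] = None

def from_row(row: dict) -> ActionLog:
    """Hypothetical helper: rebuild an ActionLog from a parsed JSON row,
    ignoring unknown keys and letting a missing 'notes' fall back to None."""
    known = {f.name for f in fields(ActionLog)}
    return ActionLog(**{k: v for k, v in row.items() if k in known})

def log(path: str, entry: ActionLog) -> None:
    with open(path, "a") as f:
        f.write(json.dumps(asdict(entry)) + "\n")

path = os.path.join(tempfile.mkdtemp(), "bot_actions.jsonl")

# An "old" row written before the notes field existed.
with open(path, "a") as f:
    f.write(json.dumps({"ts": time.time(), "action_type": "draft_reply",
                        "rung": "NOTIFY", "stakes": "medium", "outcome": "clean",
                        "human_override": None, "tokens_in": 1200, "tokens_out": 250,
                        "latency_ms": 1800, "cost_usd": 0.0042}) + "\n")

# A new row: the override carries a note explaining why.
log(path, ActionLog(ts=time.time(), action_type="draft_reply", rung="NOTIFY",
                    stakes="medium", outcome="user_override", human_override="edited",
                    tokens_in=1200, tokens_out=250, latency_ms=1800, cost_usd=0.0042,
                    notes="tone too formal for a close colleague"))

with open(path) as f:
    entries = [from_row(json.loads(line)) for line in f if line.strip()]
for e in entries:
    print(e.human_override, "|", e.notes)
```

Filtering on `human_override is not None and notes is None` later gives a quick check for overrides you forgot to annotate.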