Every action the bot takes on your behalf either earns trust or spends it. A well-drafted reply that lands cleanly earns a small amount. A weird auto-send that confuses a friend spends a large amount. A wrongly deleted email spends a catastrophic amount. Trust is not a linear meter: it is asymmetric, path-dependent, and easily zeroed out.

This means the design goal is not 'maximize helpful actions' but 'never take an action whose worst-case trust cost exceeds its expected value'. In practice, aggressive automations on low-stakes, high-frequency tasks (deleting newsletter spam, categorizing routine mail) earn small amounts of trust steadily. Automations on high-stakes tasks (auto-replying to your investor) must be conservative or gated, because a single misfire can zero out months of earned trust. The rest of the course is about building the second kind while never letting it destroy the trust earned by the first.
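The design goal above can be read as a simple gate. The sketch below is one possible reading, not a prescribed implementation: `ProposedAction` and `decide` are hypothetical names, and the numbers are borrowed from this lesson's demo score matrix, with the "clean" delta standing in for expected value (an assumption).

```python
from dataclasses import dataclass


@dataclass
class ProposedAction:
    name: str
    expected_value: float    # assumed: trust earned when the action lands cleanly
    worst_case_cost: float   # assumed: trust lost on the worst outcome, as a positive number


def decide(action: ProposedAction) -> str:
    """Return "gate" when the worst case costs more than the action is worth."""
    if action.worst_case_cost > action.expected_value:
        return "gate"   # draft only, or wait for human approval
    return "auto"       # safe enough to run unattended


# Low stakes: +0.5 when clean, -0.5 at worst (user override).
print(decide(ProposedAction("delete_newsletter", 0.5, 0.5)))
# High stakes: +5.0 when clean, -50.0 at worst (burned).
print(decide(ProposedAction("auto_reply_investor", 5.0, 50.0)))
```

Under these assumed numbers the newsletter cleanup runs automatically and the investor reply is gated. A real policy would likely weigh outcome probabilities as well, which this sketch leaves out.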
A trust-ledger primitive: every bot action gets a signed trust delta based on its stakes and its outcome. Watch how a single high-stakes misfire wipes out weeks of steady wins.
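To make "weeks of steady wins" concrete, here is a small arithmetic sketch using deltas copied from the demo's score matrix. The helper `wins_to_recover` is a hypothetical name introduced only for illustration.

```python
import math

# Deltas copied from the demo's score matrix in this lesson.
LOW_CLEAN = 0.5
HIGH_BURNED = -50.0
CATASTROPHIC_BURNED = -200.0


def wins_to_recover(loss: float, win: float = LOW_CLEAN) -> int:
    """Number of clean wins of size `win` needed to earn back `loss`."""
    return math.ceil(abs(loss) / win)


print(wins_to_recover(HIGH_BURNED))          # 100
print(wins_to_recover(CATASTROPHIC_BURNED))  # 400
```

At the demo's pace of 120 low-stakes wins a month, one burned high-stakes action costs most of a month of cleanup work, and a catastrophic one costs more than three months.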
Use these three in order. Each builds on the one before.
In one paragraph, explain why trust with an automated agent is asymmetric — small wins earn slowly, one bad event costs a lot.
Walk me through the mechanism: why can 120 clean automations be undone by one bad send, in terms of memory salience, storytelling, and stakeholder response?
Design a policy: below what trust threshold should the bot enter 'quiet mode' (draft-only, no auto-sends) until the human resets it?
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TrustEvent:
    when: datetime
    action: str
    stakes: str   # "low" | "medium" | "high" | "catastrophic"
    outcome: str  # "clean" | "minor_fix" | "user_override" | "burned"
    delta: float = 0.0  # not used by this demo; score() computes the delta


def score(event: TrustEvent) -> float:
    """Return the signed trust delta for an event; unlisted pairs score 0.0."""
    # Gains grow slowly with stakes, losses grow much faster.
    matrix = {
        ("low", "clean"): +0.5,
        ("low", "minor_fix"): +0.1,
        ("low", "user_override"): -0.5,
        ("medium", "clean"): +2.0,
        ("medium", "minor_fix"): +0.5,
        ("medium", "user_override"): -3.0,
        ("high", "clean"): +5.0,
        ("high", "minor_fix"): -1.0,
        ("high", "user_override"): -10.0,
        ("high", "burned"): -50.0,
        ("catastrophic", "burned"): -200.0,
    }
    return matrix.get((event.stakes, event.outcome), 0.0)


# Simulate a month of a good bot
events = []
for _ in range(120):  # 120 low-stakes wins
    events.append(TrustEvent(datetime.now(), "delete_newsletter", "low", "clean"))
for _ in range(40):  # 40 medium-stakes wins
    events.append(TrustEvent(datetime.now(), "draft_reply", "medium", "clean"))
for _ in range(5):  # 5 medium-stakes drafts that needed a minor fix
    events.append(TrustEvent(datetime.now(), "draft_reply", "medium", "minor_fix"))

trust = sum(score(e) for e in events)
print(f"After 165 clean/minor-fix events: trust = {trust:+.1f}")

# One high-stakes misfire
events.append(TrustEvent(datetime.now(), "auto_reply_investor", "high", "burned"))
trust = sum(score(e) for e in events)
print(f"After one high-stakes misfire: trust = {trust:+.1f}")

print("\nDesign implication: high-stakes actions MUST be gated by grace window + escalation.")
python3 main.py
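The demo sums deltas, so the order of events does not change the total. One way to sketch the path-dependent side of trust, along with the "quiet mode" idea from the third prompt, is a running ledger that switches to draft-only when a loss pushes the balance under a threshold. `TrustLedger`, `QUIET_THRESHOLD`, and the value 20.0 are all assumptions made for illustration; the lesson does not prescribe them.

```python
from dataclasses import dataclass

QUIET_THRESHOLD = 20.0  # hypothetical value, not taken from the lesson


@dataclass
class TrustLedger:
    balance: float = 0.0
    quiet: bool = False

    def record(self, delta: float) -> None:
        """Apply a trust delta; a loss that leaves the balance low triggers quiet mode."""
        self.balance += delta
        # Only losses trigger quiet mode, so a new bot starting at 0 is not muted.
        if delta < 0 and self.balance < QUIET_THRESHOLD:
            self.quiet = True  # stays on until a human calls reset()

    def reset(self) -> None:
        """Human decision to let the bot auto-send again."""
        self.quiet = False

    def mode(self) -> str:
        return "draft_only" if self.quiet else "auto_send"


# Same misfire (-50.0), two different histories.
early = TrustLedger()
for _ in range(20):          # only 20 low-stakes wins banked so far
    early.record(0.5)
early.record(-50.0)

late = TrustLedger()
for delta, count in ((0.5, 120), (2.0, 40), (0.5, 5)):  # the demo's month of events
    for _ in range(count):
        late.record(delta)
late.record(-50.0)

print(early.balance, early.mode())  # -40.0 draft_only
print(late.balance, late.mode())    # 92.5 auto_send
```

The same misfire silences a bot with little banked trust but only dents an established one. Whether quiet mode should also lift automatically after enough clean drafts is a design choice this sketch leaves to the human reset.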