Before you tell anyone about your agent, it should reliably serve YOU. The dogfood checklist: the deploy works, auth works, the agent answers correctly on a small eval, audit logs capture traces, cost caps prevent runaway spend, and the rollback procedure has been tested. If the service can't survive ONE user (yourself) for a week, scaling is irrelevant.
Day-0 checklist:

1. Service deployed at a public URL with HTTPS.
2. Auth gates the API.
3. Agent answers correctly on 10 representative queries.
4. Audit log captures every request + response.
5. Cost caps prevent disaster.
6. Rollback works (deliberately deploy a bad version; revert in <5 min).
7. Health endpoint + uptime monitoring.
8. One alert (high error rate) wired to YOU.
9. Daily $ cost visible somewhere you check.

Pass all 9 before announcing.
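
The cost-cap item above can be sketched as a small guard that is consulted before each model call. This is a minimal in-memory sketch under stated assumptions, not a definitive implementation: `CostGuard`, `CostCapExceeded`, and the dollar limits are hypothetical names and values, and a real service would persist the counters and reset them on a schedule.

```python
from collections import defaultdict


class CostCapExceeded(Exception):
    """Raised when a charge would push spend past a configured cap."""


class CostGuard:
    """Tracks spend per user and globally (hypothetical helper, in memory only)."""

    def __init__(self, per_user_cap_usd: float, global_cap_usd: float):
        self.per_user_cap = per_user_cap_usd
        self.global_cap = global_cap_usd
        self.spent_by_user = defaultdict(float)
        self.spent_total = 0.0

    def charge(self, user_id: str, cost_usd: float) -> None:
        """Record spend, refusing any charge that would cross a cap."""
        if self.spent_total + cost_usd > self.global_cap:
            raise CostCapExceeded("global cap reached")
        if self.spent_by_user[user_id] + cost_usd > self.per_user_cap:
            raise CostCapExceeded(f"cap reached for user {user_id}")
        # Only record the spend once both checks have passed.
        self.spent_by_user[user_id] += cost_usd
        self.spent_total += cost_usd
```

Calling `charge` with an estimated cost before the model call, and rejecting the request when it raises, keeps one runaway loop from consuming the whole budget.
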
Use these three prompts in order. Each builds on the one before.

1. What's the 'one user works' checklist? Why does it exist?
2. Walk me through dogfooding: why is YOUR daily usage the most undervalued QA signal?
3. Adapt the checklist for an enterprise B2B agent (single-tenant initially). What changes?
# Day-0 checklist
## Live service
- [ ] Deployed to public URL
- [ ] HTTPS enforced
- [ ] /health endpoint returns 200 when ready
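
A `/health` endpoint of the kind listed above might look like the following sketch, which uses only Python's standard library. The `READY` flag, the `health_response` helper, the port, and the 503 status for "not ready yet" are assumptions for illustration; the checklist only asks for a 200 when the service is ready.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple

# Hypothetical readiness flag; a real service would set this once its
# dependencies (database, model client, and so on) are initialised.
READY = {"ok": False}


def health_response(ready: bool) -> Tuple[int, bytes]:
    """Return (status, body): 200 when ready, 503 otherwise."""
    status = 200 if ready else 503
    body = json.dumps({"status": "ok" if ready else "starting"}).encode("utf-8")
    return status, body


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        status, body = health_response(READY["ok"])
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Start the server; blocks until interrupted."""
    READY["ok"] = True
    HTTPServer(("127.0.0.1", port), HealthHandler).serve_forever()
```

Calling `serve()` starts the server locally. An external uptime monitor would poll the public URL for `/health` and treat anything other than 200 as down.
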
## Auth + identity
- [ ] Auth provider configured
- [ ] User model + tier in DB
- [ ] Test user can sign up + auth + call API
## Agent
- [ ] Answers correctly on 10 representative queries (manual eval)
- [ ] System prompt + model pinned in version control
- [ ] Tool definitions are real (not placeholder)
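
The checklist calls for a manual eval, but writing the representative queries down as data makes the same check repeatable after every prompt or model change. The sketch below is one possible shape: the two cases, the substring match, and `stub_agent` are all hypothetical stand-ins, and a substring check is a crude proxy for "answers correctly".

```python
from typing import Callable, List, Tuple

# Hypothetical eval cases: (query, substring the answer must contain).
CASES: List[Tuple[str, str]] = [
    ("What is 2 + 2?", "4"),
    ("What is the capital of France?", "Paris"),
]


def run_eval(agent: Callable[[str], str],
             cases: List[Tuple[str, str]]) -> Tuple[int, List[str]]:
    """Return (number passed, list of queries whose answer missed)."""
    failures = []
    for query, expected in cases:
        answer = agent(query)
        # Case-insensitive substring match; replace with a stricter check as needed.
        if expected.lower() not in answer.lower():
            failures.append(query)
    return len(cases) - len(failures), failures


def stub_agent(query: str) -> str:
    """Stand-in for the real agent call, used only to exercise run_eval."""
    return "The answer is 4." if "2 + 2" in query else "I am not sure."
```

Running `run_eval(stub_agent, CASES)` reports one pass and names the query that failed, which is the output worth reading before each deploy.
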
## Audit + safety
- [ ] Trace logged per request
- [ ] PII handling decided (log or hash)
- [ ] Cost caps per user + global
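
For the trace and PII items above, one option is to write a JSON line per request and store a keyed hash of the user identifier instead of the raw value. This is a sketch under assumptions: `pseudonymize`, `build_trace`, `append_trace`, and the field names are hypothetical, and only the user identifier is hashed here. Request and response text are logged as-is, so any PII inside them still needs its own decision.

```python
import hashlib
import hmac
import json
import time
import uuid
from typing import Optional


def pseudonymize(value: str, secret_key: str) -> str:
    """Keyed SHA-256 (HMAC) of a PII value, so logs never hold the raw value."""
    return hmac.new(secret_key.encode("utf-8"),
                    value.encode("utf-8"),
                    hashlib.sha256).hexdigest()


def build_trace(user_email: str, request_text: str, response_text: str,
                secret_key: str, now: Optional[float] = None) -> dict:
    """Build one trace record for a single request and response."""
    return {
        "trace_id": str(uuid.uuid4()),
        "timestamp": time.time() if now is None else now,
        "user": pseudonymize(user_email, secret_key),
        "request": request_text,
        "response": response_text,
    }


def append_trace(path: str, trace: dict) -> None:
    """Append the trace as one JSON line (assumes a local, writable file)."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(trace) + "\n")
```

The same key always maps a user to the same hash, so traces for one user can still be grouped without storing the email itself.
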
## Operations
- [ ] Rollback procedure tested (revert a deploy in <5 min)
- [ ] One alert: error rate > 5% for 5 min
- [ ] Uptime monitor (external) configured
- [ ] Daily cost visible on a dashboard or in email
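
The single alert in this list can be approximated with a sliding window over recent requests. The sketch below fires when the error rate across the trailing five minutes exceeds 5%, which is a simplification of "above 5% for 5 minutes". `ErrorRateAlert` is a hypothetical helper, it assumes events are recorded in chronological order, and delivering the notification (email, pager, chat) is left out.

```python
from collections import deque


class ErrorRateAlert:
    """Sliding-window error-rate check (hypothetical helper)."""

    def __init__(self, threshold: float = 0.05, window_seconds: float = 300.0):
        self.threshold = threshold
        self.window = window_seconds
        self.events = deque()  # (timestamp, is_error), oldest first

    def record(self, timestamp: float, is_error: bool) -> None:
        """Record the outcome of one request."""
        self.events.append((timestamp, is_error))

    def should_alert(self, now: float) -> bool:
        """True if the error rate over the trailing window exceeds the threshold."""
        # Drop events that have fallen out of the window.
        while self.events and self.events[0][0] <= now - self.window:
            self.events.popleft()
        if not self.events:
            return False
        errors = sum(1 for _, is_error in self.events if is_error)
        return errors / len(self.events) > self.threshold
```

Checking `should_alert(time.time())` once a minute from a scheduled job is enough for a single-user service.
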
## Pre-announcement
- [ ] Used it daily for a week, no major issues
- [ ] At least 2 friends used it for 30 minutes, found no critical bugs
- [ ] You know exactly what to do if it breaks at 3am