Capstok — learn by doing

Why this matters

Before you tell anyone about your agent, it should reliably serve YOU. The dogfood checklist: deploy works, auth works, agent answers correctly on a small eval, audit logs capture traces, cost caps prevent runaway, rollback procedure tested. If you can't survive ONE user (yourself) for a week, scaling is irrelevant.

Demo

Day-0 checklist: (1) Service deployed at a public URL with HTTPS. (2) Auth gates the API. (3) Agent answers correctly on 10 representative queries. (4) Audit log captures every request + response. (5) Cost caps prevent disaster. (6) Rollback works (deliberately deploy a bad version; revert in <5 min). (7) Health endpoint + uptime monitoring. (8) One alert (high error rate) wired to YOU. (9) Daily $ cost visible somewhere you check. Pass all 9 before announcing.

Try it yourself

Go through the 9-item checklist. Fail any? Fix before announcing.
Use your own agent daily for a week. The bugs you find here are the bugs your users won't find later.
Have a friend use it for 30 minutes. The 'fresh eyes' bugs matter.
Don't share publicly until all 9 items pass. The first impression matters.

Prompt your AI

Use these three in order. Each builds on the one before.

1. Basics & terminology

What's the 'one user works' checklist? Why does it exist?

2. Why it works (the mechanism)

Walk me through dogfooding: why is YOUR daily usage the most-undervalued QA signal?

3. Advanced — application & what's next

Adapt the checklist for an enterprise B2B agent (single-tenant initially). What changes?

References

Chat about this lesson

# Day-0 checklist

## Live service
- [ ] Deployed to public URL
- [ ] HTTPS enforced
- [ ] /health endpoint returns 200 when ready

## Auth + identity
- [ ] Auth provider configured
- [ ] User model + tier in DB
- [ ] Test user can sign up + auth + call API

## Agent
- [ ] Answers correctly on 10 representative queries (manual eval)
- [ ] System prompt + model pinned in version control
- [ ] Tool definitions are real (not placeholder)

## Audit + safety
- [ ] Trace logged per request
- [ ] PII handling decided (log or hash)
- [ ] Cost caps per user + global

## Operations
- [ ] Rollback procedure tested (revert a deploy in <5 min)
- [ ] One alert: error rate > 5% for 5 min
- [ ] Uptime monitor (external) configured
- [ ] Daily cost visible on a dashboard or in email

## Pre-announcement
- [ ] Used it daily for a week, no major issues
- [ ] At least 2 friends used it for 30 minutes, found no critical bugs
- [ ] You know exactly what to do if it breaks at 3am

Run: node main.js

The 'one user works' checklist