Open this lesson in your favourite AI. It'll walk you through the why, explain the demo, and quiz you on the try-it list.
On Day 0 your service has no users, so losing the DB feels theoretical. By the time you have 1,000 paying users, losing the DB means losing the company. The backup conversation must happen at Day 0 because it's a five-minute task then and a multi-week project at 100k users. Three things matter: (1) backups run automatically every day, (2) they are stored off the server they came from, and (3) you have actually restored from one. If you haven't restored, you don't have backups: you have a directory of files that might be backups.
Managed Postgres providers (RDS, Supabase, Neon, DigitalOcean, Crunchy) generally offer automated daily snapshots, often free or near-free, and PITR (point-in-time recovery) on paid tiers. Check what your plan actually includes, then use it. If you're self-hosting, run pg_dump from cron, write the dumps to S3, and keep a rolling 14 to 30 days of them. Then schedule a quarterly restore drill: spin up an empty DB, restore the latest backup into it, run a couple of queries. The drill is the part everyone skips, and the part that turns 'we have backups' from a hope into a fact.
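The quarterly drill can be scripted so that it is cheap to repeat. What follows is a minimal sketch, not a definitive implementation. It assumes backups are gzip-compressed plain SQL dumps, symmetrically encrypted with gpg, stored under timestamped names in an S3 prefix. The names `BUCKET_PREFIX`, `PASSFILE`, `SCRATCH_DATABASE_URL`, `latest_key`, and `restore_drill` are hypothetical, and the `users` table stands in for whichever table you know best.

```shell
# Restore-drill sketch (bash). All names and paths below are assumptions.
BUCKET_PREFIX="${BUCKET_PREFIX:-s3://my-backups/postgres/}"  # prefix the backup job writes to
PASSFILE="${PASSFILE:-/etc/backup-pass}"                     # passphrase used at encryption time

# Read `aws s3 ls` output on stdin and print the newest object name.
# Relies on timestamped names sorting lexicographically; skips "PRE" and blank lines.
latest_key() {
  awk 'NF >= 4 {print $4}' | sort | tail -n 1
}

# Restore the newest backup into an EMPTY scratch database and run a sanity query.
restore_drill() {
  local scratch_url="$1"
  local key
  key=$(aws s3 ls "$BUCKET_PREFIX" | latest_key)
  [ -n "$key" ] || { echo "no backups found under $BUCKET_PREFIX" >&2; return 1; }

  # Stream: download -> decrypt -> decompress -> load. Stop at the first SQL error.
  aws s3 cp "${BUCKET_PREFIX}${key}" - \
    | gpg --batch --pinentry-mode loopback --passphrase-file "$PASSFILE" --decrypt \
    | gunzip \
    | psql --set ON_ERROR_STOP=1 --quiet "$scratch_url"

  # Compare this number with production by hand.
  psql --tuples-only --no-align -c 'SELECT count(*) FROM users;' "$scratch_url"
}

# Only touches S3 or a database when invoked with --run.
if [ "${1:-}" = "--run" ]; then
  set -euo pipefail
  restore_drill "${SCRATCH_DATABASE_URL:?set SCRATCH_DATABASE_URL to an empty scratch DB}"
fi
```

Run it as `SCRATCH_DATABASE_URL=... bash restore-drill.sh --run` against a throwaway database, never against production. With `ON_ERROR_STOP` set, a dump that references roles or extensions missing on the scratch server fails loudly, which is exactly what the drill is meant to surface.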
#!/bin/bash
# /etc/cron.daily/pg-backup: self-hosted Postgres (no .sh suffix; Debian's run-parts skips file names containing a dot)
set -euo pipefail
TS=$(date -u +%Y-%m-%dT%H-%M-%SZ)
DUMP_FILE=/tmp/pg-backup-$TS.sql.gz
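# 1. Dump and compress in one pipe, so the dump never touches disk uncompressed.
#    cron starts with a minimal environment: DATABASE_URL must be set in this script or sourced from a file.
pg_dump "$DATABASE_URL" | gzip > "$DUMP_FILE"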
# 2. Encrypt before it leaves the box (GnuPG 2.1+ needs loopback pinentry to honour --passphrase-file)
gpg --batch --yes --pinentry-mode loopback --passphrase-file /etc/backup-pass --symmetric "$DUMP_FILE"
rm "$DUMP_FILE"
# 3. Push to off-box storage, then remove the local encrypted copy
aws s3 cp "$DUMP_FILE.gpg" "s3://my-backups/postgres/$TS.sql.gz.gpg"
rm "$DUMP_FILE.gpg"
# 4. Prune: keep the 30 newest dumps (about 30 days of daily backups), delete the rest
aws s3 ls s3://my-backups/postgres/ \
| awk 'NF >= 4 {print $4}' \
| sort -r | tail -n +31 \
| xargs -I{} aws s3 rm "s3://my-backups/postgres/{}"
# Test the restore quarterly. This script alone isn't enough: you have to actually run a restore.
[...] pg_dump on a cron). Note where the backups physically live: they should NOT be on the same machine as your DB.
[...] SELECT count(*) FROM users; does the number match prod, minus the last day's growth? If it errors, your backup is broken.
Use these three in order. Each builds on the one before.
Explain RPO, RTO, PITR, and the difference between a logical dump (pg_dump) and a physical backup (pg_basebackup / managed snapshot). When is each one the right choice?
Walk me through what happens during a Postgres PITR: how does WAL archiving work, what does the DB do when you say 'restore to 2pm yesterday', and where can it go wrong?
I want an RPO of 5 minutes and RTO of 15 minutes for a 200 GB Postgres on a managed provider. Cost is a real constraint. Design the backup + replica + restore-drill setup and estimate the monthly cost.