Open this lesson in your favourite AI. It'll walk you through the why, explain the demo, and quiz you on the try-it list.
Every well-written test follows the same three-phase structure: Arrange (set up the state needed for the test), Act (call the code under test exactly once), Assert (verify the outcome). The pattern is not tied to any tool: it applies to unit tests, integration tests, API tests, and UI tests, whatever the language or framework. When tests don't follow AAA, they become confusing: assertions are scattered through setup code, there is no clear line between what is being tested and what is merely setup, and a failure does not tell you which behavior is broken, because the first failing assertion stops the test before the later ones run. Enforcing AAA keeps tests readable and maintainable by people who did not write them, and lets you diagnose a failure from the test name and a single assertion.
Arrange-Act-Assert is the universal grammar of well-written tests: Arrange sets up the exact state the test requires, Act calls the code under test exactly once, and Assert verifies the single observable outcome. When tests skip this structure — interleaving setup with assertions, or testing multiple behaviors in one function — failures stop being self-explaining because you can no longer tell which assertion failed or which setup step is responsible. Enforcing AAA is what makes a failing test useful rather than merely red.
import pytest
from dataclasses import dataclass, field
from typing import List
from datetime import datetime


@dataclass
class Order:
    items: List[dict] = field(default_factory=list)
    discount_pct: float = 0.0
    created_at: datetime = field(default_factory=datetime.now)

    def add_item(self, name: str, price: float, qty: int = 1):
        self.items.append({"name": name, "price": price, "qty": qty})

    def subtotal(self) -> float:
        return sum(i["price"] * i["qty"] for i in self.items)

    def total(self) -> float:
        return round(self.subtotal() * (1 - self.discount_pct / 100), 2)

    def apply_coupon(self, code: str):
        valid = {"SAVE10": 10.0, "SAVE25": 25.0}
        if code not in valid:
            raise ValueError(f"Invalid coupon: {code}")
        self.discount_pct = valid[code]


# ✗ BAD: No AAA, mixed concerns, unclear what's being tested
def test_order_bad():
    o = Order()
    o.add_item("Widget", 9.99)
    assert o.subtotal() == 9.99
    o.add_item("Gadget", 19.99, qty=2)
    assert o.subtotal() == 49.97  # wait, what are we testing?
    o.apply_coupon("SAVE10")
    assert o.total() == 44.97  # which assertion is the meaningful one?


# ✓ GOOD: One behavior per test, clear AAA sections
def test_subtotal_sums_all_item_prices():
    # Arrange
    order = Order()
    order.add_item("Widget", 9.99, qty=1)
    order.add_item("Gadget", 19.99, qty=2)
    # Act
    result = order.subtotal()
    # Assert
    assert result == 49.97  # 9.99 + 19.99*2 = 49.97


def test_discount_coupon_reduces_total_by_percentage():
    # Arrange
    order = Order()
    order.add_item("Widget", 100.00)
    order.apply_coupon("SAVE25")
    # Act
    result = order.total()
    # Assert
    assert result == 75.00  # 100 - 25% = 75


def test_invalid_coupon_raises_value_error():
    # Arrange
    order = Order()
    # Act + Assert (exception testing combines these)
    with pytest.raises(ValueError, match="Invalid coupon: BOGUS"):
        order.apply_coupon("BOGUS")

python3 main.py

Take the `test_order_bad` function and split it into 3 separate tests following AAA. Give each a descriptive name of the form `test_[thing under test]_[condition]_[expected result]`. Run both versions and compare the failure messages when you intentionally break the subtotal calculation.
Write `test_empty_cart_total_is_zero`. How many lines is your Arrange section? Sometimes Arrange is just one line (or zero lines if the initial state is enough). This is fine: AAA doesn't mean verbose.
[...] (Multiple `assert` lines are OK if they verify one concept, e.g., checking both status code and response body of one API call.)
[...] `pytest-bdd` or plain comments.

Use these three in order. Each builds on the one before.
In one paragraph, explain the AAA pattern: what happens in each phase, and why having exactly one Act (one call to the code under test) per test is important for diagnosing failures.
Walk me through why the 'bad' test above (`test_order_bad`) is harder to maintain: if the `subtotal()` calculation breaks, which assertion fails, and how would you know which feature is broken? Compare to the AAA versions.
I'm reviewing a PR with a test that has 15 asserts, 30 lines of setup, and tests 4 different behaviors. Walk me through how to refactor it into multiple focused tests: how to identify the 'one thing' each sub-test is testing, how to extract shared setup into a fixture, and what to name each new test function.
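For the refactoring described in the last prompt, here is a rough sketch of what the result might look like. It assumes a trimmed copy of the lesson's `Order` class so the block runs on its own, and a hypothetical helper named `make_order_with_two_items` that holds the shared Arrange step. With pytest, the same helper could instead be declared with the `@pytest.fixture` decorator and requested as a test argument.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Order:
    """Trimmed copy of the lesson's Order, repeated so this sketch is self-contained."""
    items: List[dict] = field(default_factory=list)
    discount_pct: float = 0.0

    def add_item(self, name: str, price: float, qty: int = 1):
        self.items.append({"name": name, "price": price, "qty": qty})

    def subtotal(self) -> float:
        return sum(i["price"] * i["qty"] for i in self.items)

    def total(self) -> float:
        return round(self.subtotal() * (1 - self.discount_pct / 100), 2)


def make_order_with_two_items() -> Order:
    """Shared Arrange step (hypothetical helper): a fresh order for every test."""
    order = Order()
    order.add_item("Widget", 10.00, qty=1)
    order.add_item("Gadget", 20.00, qty=2)
    return order


def test_subtotal_with_two_items_sums_price_times_qty():
    # Arrange
    order = make_order_with_two_items()
    # Act
    result = order.subtotal()
    # Assert
    assert result == 50.00  # 10*1 + 20*2


def test_total_with_ten_percent_discount_reduces_subtotal():
    # Arrange
    order = make_order_with_two_items()
    order.discount_pct = 10.0
    # Act
    result = order.total()
    # Assert
    assert result == 45.00  # 50 - 10%


def test_total_without_discount_equals_subtotal():
    # Arrange
    order = make_order_with_two_items()
    # Act
    result = order.total()
    # Assert
    assert result == 50.00
```

Each test name states the thing under test, the condition, and the expected result, and each test keeps a single Act line. The helper returns a new `Order` on every call, so no test can see state left behind by another.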