Writing unit tests is one of the highest-ROI applications of AI assistance in software engineering. The task is well defined (test this function), the success criteria are clear (tests pass, mutants are killed), and the output can be checked mechanically by running it. A senior SDET with good prompting skills can generate a first draft of 20 unit tests for a function in 2 minutes, then spend their time reviewing for coverage gaps and edge cases rather than typing boilerplate. But AI-generated tests have consistent blind spots: they favor happy paths, miss boundary values, under-test error handling, and often over-specify implementation details. The review checklist is what makes AI test generation useful rather than misleading.
AI-assisted unit test generation works best when the prompt gives the model a concrete role, a specific function to test, an explicit list of cases to cover, and a format to follow — vague prompts produce vague tests. Even with a good prompt, AI-generated test suites have systematic blind spots: they favor happy paths, under-test error handling, miss boundary values at numeric and string edges, and often over-specify implementation details rather than behavior. A review checklist applied after generation is what separates useful AI-generated tests from dangerous ones that pass easily while testing nothing meaningful.
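To make the "passes easily while testing nothing meaningful" point concrete, here is a minimal sketch that contrasts a weak test with one that would survive the review checklist. It uses the `validate_credit_card` function from the demo below, copied here so the block runs on its own. The helper `future_expiry` and both test names are illustrative assumptions, not part of the lesson's code.

```python
import re
from datetime import datetime


def validate_credit_card(number: str, expiry: str, cvv: str) -> dict:
    """Copy of the lesson's function under test."""
    errors = []
    if not re.match(r"^\d{16}$", number):
        errors.append("Card number must be 16 digits")
    if not re.match(r"^(0[1-9]|1[0-2])/\d{2}$", expiry):
        errors.append("Expiry must be MM/YY format")
    else:
        month, year = expiry.split("/")
        exp_date = datetime(2000 + int(year), int(month), 1)
        if exp_date < datetime.now().replace(day=1):
            errors.append("Card is expired")
    if not re.match(r"^\d{3}$", cvv):
        errors.append("CVV must be 3 digits")
    return {"valid": len(errors) == 0, "errors": errors}


def future_expiry(years_ahead: int = 2) -> str:
    """Hypothetical helper: an MM/YY expiry in the future, computed without mocking."""
    now = datetime.now()
    return f"{now.month:02d}/{(now.year + years_ahead) % 100:02d}"


def test_invalid_cvv_weak():
    # Weak: every field is invalid, so this would still pass if the
    # CVV check were deleted from the function.
    result = validate_credit_card("1234", "13/99", "12")
    assert not result["valid"]


def test_validate_credit_card_two_digit_cvv_returns_cvv_error():
    # Arrange: only the CVV is invalid
    number, expiry, cvv = "4111111111111111", future_expiry(), "12"
    # Act
    result = validate_credit_card(number, expiry, cvv)
    # Assert: exact result, so a missing or unrelated error fails the test
    assert result == {"valid": False, "errors": ["CVV must be 3 digits"]}
```

Both tests pass against the current function, which is exactly the problem: only the second one isolates the CVV rule and pins down the error message. The current-month expiry boundary is left out of this sketch on purpose, because its outcome depends on how the function's date comparison treats the time of day, and that is worth confirming by hand during review.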
# AI unit test generation prompt templates
# Raw string so the regex backslashes (\d) reach the prompt unchanged.
FUNCTION_UNDER_TEST = r'''
def validate_credit_card(number: str, expiry: str, cvv: str) -> dict:
    """
    Validate credit card details.
    - number: 16-digit string (Visa/Mastercard format)
    - expiry: MM/YY format, must be in the future
    - cvv: 3-digit string
    Returns: {"valid": bool, "errors": list[str]}
    """
    import re
    from datetime import datetime
    errors = []
    if not re.match(r"^\d{16}$", number):
        errors.append("Card number must be 16 digits")
    if not re.match(r"^(0[1-9]|1[0-2])/\d{2}$", expiry):
        errors.append("Expiry must be MM/YY format")
    else:
        month, year = expiry.split("/")
        exp_date = datetime(2000 + int(year), int(month), 1)
        if exp_date < datetime.now().replace(day=1):
            errors.append("Card is expired")
    if not re.match(r"^\d{3}$", cvv):
        errors.append("CVV must be 3 digits")
    return {"valid": len(errors) == 0, "errors": errors}
'''
# Prompt 1: Comprehensive test generation
GENERATE_PROMPT = f"""You are a senior SDET. Generate comprehensive pytest unit tests for:
{FUNCTION_UNDER_TEST}
Cover:
1. Happy path (all fields valid)
2. Each invalid field independently (invalid number, invalid expiry format, expired card, invalid CVV)
3. Boundary values (expiry this month, expiry last month, expiry in 12 months)
4. Multiple simultaneous errors
5. Edge cases: empty strings, None inputs, very long inputs, whitespace
For each test:
- Use descriptive name: test_[function]_[condition]_[expected_result]
- Follow AAA structure with clear comments
- Assert exact values, not just truthy/falsy
Do NOT mock datetime.now() — assume tests run in the present.
Output only the test code, no explanations."""
# Prompt 2: Review the AI output
REVIEW_CHECKLIST = """
After AI generates tests, check each item:
☐ Happy path test exists
☐ Each error case tested independently (not all in one test)
☐ Boundary values: at the limit, just inside, just outside
☐ Empty/None inputs tested (if applicable)
☐ Multiple errors tested simultaneously
☐ Error messages asserted (not just "errors is not empty")
☐ No implementation-coupled assertions (testing behavior, not internals)
☐ Test names describe behavior, not implementation
☐ Each test has one clear Act line
☐ Tests would FAIL if the corresponding condition was removed from the function
"""
print("Prompt template:")
print(GENERATE_PROMPT[:500] + "...")
print("\nReview checklist:")
print(REVIEW_CHECKLIST)
python3 main.py
Run the GENERATE_PROMPT for the validate_credit_card function through an LLM. Apply the REVIEW_CHECKLIST to the output. How many checklist items does the output miss? Add the missing tests manually.
Use these three prompts in order. Each builds on the one before.
In one paragraph, explain what class of test cases AI test generation consistently misses and what a human SDET needs to add. Why does AI tend to favor happy-path tests?
Walk me through a code review of AI-generated unit tests: what specific things do you check for (assertion quality, boundary coverage, test independence, name clarity), and what would cause you to reject and re-prompt vs accept and manually enrich?
I want to integrate AI test generation into our SDET workflow: developer writes a function → AI generates first-draft tests → SDET reviews and enriches → tests are merged with the code in the same PR. Walk me through: the prompt template standard, the review SLA (how long should enrichment take), the quality gate (minimum mutation score before merge), and how to measure whether this workflow improves test quality over 3 months.
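The quality gate in the last prompt relies on a mutation score: the fraction of deliberately broken variants (mutants) of the code that the test suite detects. The following is a minimal hand-rolled sketch of that idea, assuming a simplified CVV-only validator and three hand-written mutants. Every name in it (`validate_cvv`, `mutation_score`, the mutants, the suites) is hypothetical; a real project would generate mutants with a mutation testing tool such as mutmut rather than writing them by hand.

```python
import re
from typing import Callable, List

Validator = Callable[[str], List[str]]
Check = Callable[[Validator], None]


def validate_cvv(cvv: str) -> List[str]:
    """Hypothetical stand-in for the CVV branch of validate_credit_card."""
    return [] if re.match(r"^\d{3}$", cvv) else ["CVV must be 3 digits"]


# Hand-written mutants: each breaks the original in exactly one way.
def mutant_check_removed(cvv: str) -> List[str]:
    return []  # the CVV condition is gone entirely


def mutant_four_digits(cvv: str) -> List[str]:
    return [] if re.match(r"^\d{4}$", cvv) else ["CVV must be 3 digits"]


def mutant_wrong_message(cvv: str) -> List[str]:
    return [] if re.match(r"^\d{3}$", cvv) else ["Invalid input"]


MUTANTS: List[Validator] = [mutant_check_removed, mutant_four_digits, mutant_wrong_message]


# Each check receives the validator to exercise, so it can run against mutants.
def check_valid_cvv(validate: Validator) -> None:
    assert validate("123") == []


def check_short_cvv_truthy(validate: Validator) -> None:
    assert validate("12")  # weak: any non-empty error list passes


def check_short_cvv_exact(validate: Validator) -> None:
    assert validate("12") == ["CVV must be 3 digits"]


def check_long_cvv_exact(validate: Validator) -> None:
    assert validate("1234") == ["CVV must be 3 digits"]


HAPPY_PATH_SUITE: List[Check] = [check_valid_cvv]
WEAK_ASSERT_SUITE: List[Check] = [check_valid_cvv, check_short_cvv_truthy]
STRONG_SUITE: List[Check] = [check_valid_cvv, check_short_cvv_exact, check_long_cvv_exact]


def suite_passes(suite: List[Check], validate: Validator) -> bool:
    """Return True if every check in the suite passes for this validator."""
    try:
        for check in suite:
            check(validate)
    except AssertionError:
        return False
    return True


def mutation_score(suite: List[Check], mutants: List[Validator]) -> float:
    """Fraction of mutants for which at least one check fails (mutant killed)."""
    killed = sum(1 for mutant in mutants if not suite_passes(suite, mutant))
    return killed / len(mutants)


for name, suite in [
    ("happy path only", HAPPY_PATH_SUITE),
    ("weak assertions", WEAK_ASSERT_SUITE),
    ("exact assertions", STRONG_SUITE),
]:
    print(f"{name}: {mutation_score(suite, MUTANTS):.2f}")
# happy path only: 0.33
# weak assertions: 0.67
# exact assertions: 1.00
```

All three suites pass against the unmodified `validate_cvv`, so a green test run alone cannot tell them apart. The score does: the happy-path suite lets the removed check survive, and the truthy assertion lets the wrong error message survive, which corresponds to the checklist items about asserting error messages and failing when a condition is removed.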