Using AI for unit test writing — prompt patterns and review checklist

Difficulty: hard

Why this matters

Unit test writing is one of the highest-ROI applications of AI assistance in software engineering. The task is well-defined (test this function), the success criteria are clear (tests pass, mutations are killed), and the output can be verified mechanically by running it. A senior SDET with good prompting skills can get a first draft of around 20 unit tests for a function in a couple of minutes, then spend their time reviewing for coverage gaps and edge cases rather than typing boilerplate. But AI-generated tests have consistent blind spots: they favor happy paths, miss boundary values, under-test error handling, and often over-specify implementation details. The review checklist is what makes AI test generation useful rather than misleading.

Demo

AI-assisted unit test generation works best when the prompt gives the model a concrete role, a specific function to test, an explicit list of cases to cover, and an output format to follow; vague prompts produce vague tests. Even a good prompt does not remove the blind spots described above, so the demo pairs a generation prompt with a review checklist applied to the output. The checklist is what separates useful AI-generated tests from tests that pass easily while verifying nothing meaningful.
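To reuse those four ingredients across many functions, a team can put them into a small template. The sketch below is a minimal one. `build_test_prompt` and its parameters are hypothetical names, not part of any library, and the wording simply mirrors the demo prompt.

```python
def build_test_prompt(role: str, source: str, cases: list[str], name_pattern: str) -> str:
    """Assemble a generation prompt from a role, the code, explicit cases, and an output format."""
    numbered = "\n".join(f"{i}. {case}" for i, case in enumerate(cases, start=1))
    sections = [
        f"You are {role}. Generate pytest unit tests for:",
        source.strip(),
        "Cover:\n" + numbered,
        "For each test:\n"
        f"- Name tests like {name_pattern}\n"
        "- Follow Arrange-Act-Assert with one Act line\n"
        "- Assert exact values, not just truthy/falsy",
        "Output only the test code, no explanations.",
    ]
    # Blank lines between sections keep the role, code, cases, and format visually separate.
    return "\n\n".join(sections)


prompt = build_test_prompt(
    role="a senior SDET",
    source="def add(a: int, b: int) -> int:\n    return a + b",
    cases=["Happy path", "Negative numbers", "Zero"],
    name_pattern="test_[function]_[condition]_[expected_result]",
)
print(prompt)
```

Keeping the case list as a parameter makes it the part a reviewer edits per function. The role and the format rules stay fixed team-wide.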

# AI unit test generation prompt templates

FUNCTION_UNDER_TEST = '''
def validate_credit_card(number: str, expiry: str, cvv: str) -> dict:
    """
    Validate credit card details.
    - number: 16-digit string (Visa/Mastercard format)
    - expiry: MM/YY format, current month or later
    - cvv: 3-digit string
    Returns: {"valid": bool, "errors": list[str]}
    """
    import re
    from datetime import datetime
    errors = []
    # fullmatch + [0-9] rejects trailing newlines and non-ASCII digits
    if not re.fullmatch(r"[0-9]{16}", number):
        errors.append("Card number must be 16 digits")
    if not re.fullmatch(r"(0[1-9]|1[0-2])/[0-9]{2}", expiry):
        errors.append("Expiry must be MM/YY format")
    else:
        month, year = expiry.split("/")
        exp_date = datetime(2000 + int(year), int(month), 1)
        # Compare with midnight on the 1st of this month; a bare
        # .replace(day=1) keeps the time of day and rejects this month's cards
        this_month = datetime.now().replace(day=1, hour=0, minute=0, second=0, microsecond=0)
        if exp_date < this_month:
            errors.append("Card is expired")
    if not re.fullmatch(r"[0-9]{3}", cvv):
        errors.append("CVV must be 3 digits")
    return {"valid": len(errors) == 0, "errors": errors}
'''

# Prompt 1: Comprehensive test generation
GENERATE_PROMPT = f"""You are a senior SDET. Generate comprehensive pytest unit tests for:

{FUNCTION_UNDER_TEST}

Cover:
1. Happy path (all fields valid)
2. Each invalid field independently (invalid number, invalid expiry format, expired card, invalid CVV)
3. Boundary values (expiry this month, expiry last month, expiry in 12 months)
4. Multiple simultaneous errors
5. Edge cases: empty strings, None inputs, very long inputs, whitespace

For each test:
- Use descriptive name: test_[function]_[condition]_[expected_result]
- Follow AAA structure with clear comments
- Assert exact values, not just truthy/falsy

Do NOT mock datetime.now(). Compute expiry strings relative to today's date (handling the December/January year rollover) so the tests never go stale.
Output only the test code, no explanations."""

# Step 2: Human review checklist for the AI output
REVIEW_CHECKLIST = """
After AI generates tests, check each item:

☐ Happy path test exists
☐ Each error case tested independently (not all in one test)
☐ Boundary values: at the limit, just inside, just outside
☐ Empty/None inputs tested (if applicable)
☐ Multiple errors tested simultaneously
☐ Error messages asserted (not just "errors is not empty")
☐ No implementation-coupled assertions (testing behavior, not internals)
☐ Test names describe behavior, not implementation
☐ Each test has one clear Act line
☐ Tests would FAIL if the corresponding condition was removed from the function
"""

print("Prompt template:")
print(GENERATE_PROMPT[:500] + "...")
print("\nReview checklist:")
print(REVIEW_CHECKLIST)

Run: python3 main.py
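The sketch below shows roughly what a suite looks like after review for `validate_credit_card`. It is illustrative only and is not the output of any particular model. It rests on these assumptions:

- `expiry_in` is a hypothetical helper that builds MM/YY strings relative to today, so the boundary tests never go stale and "last month" wraps correctly in January.
- `4111111111111111` is the common Visa test number.
- The local copy of the function compares against midnight on the 1st of the current month.

That last point matters. Without the normalization, `datetime.now().replace(day=1)` keeps the time of day, and a card expiring this month is reported as expired. That is exactly the bug the "expiry this month" boundary case is there to catch. The tests use plain `assert` so they run under pytest or as a script.

```python
import re
from datetime import date, datetime


def validate_credit_card(number: str, expiry: str, cvv: str) -> dict:
    # Local copy of the function under test so this file runs on its own.
    errors = []
    if not re.fullmatch(r"[0-9]{16}", number):
        errors.append("Card number must be 16 digits")
    if not re.fullmatch(r"(0[1-9]|1[0-2])/[0-9]{2}", expiry):
        errors.append("Expiry must be MM/YY format")
    else:
        month, year = expiry.split("/")
        exp_date = datetime(2000 + int(year), int(month), 1)
        # Midnight on the 1st of this month, so time of day cannot leak in.
        this_month = datetime.now().replace(day=1, hour=0, minute=0, second=0, microsecond=0)
        if exp_date < this_month:
            errors.append("Card is expired")
    if not re.fullmatch(r"[0-9]{3}", cvv):
        errors.append("CVV must be 3 digits")
    return {"valid": len(errors) == 0, "errors": errors}


def expiry_in(months: int) -> str:
    """Hypothetical helper: MM/YY for the month `months` away from today, with year rollover."""
    today = date.today()
    index = today.year * 12 + (today.month - 1) + months
    year, month0 = divmod(index, 12)
    return f"{month0 + 1:02d}/{year % 100:02d}"


VALID_NUMBER, VALID_CVV = "4111111111111111", "123"


def test_validate_credit_card_all_fields_valid_returns_no_errors():
    result = validate_credit_card(VALID_NUMBER, expiry_in(12), VALID_CVV)
    assert result == {"valid": True, "errors": []}


def test_validate_credit_card_expiry_this_month_is_valid():
    result = validate_credit_card(VALID_NUMBER, expiry_in(0), VALID_CVV)
    assert result == {"valid": True, "errors": []}


def test_validate_credit_card_expiry_last_month_is_expired():
    result = validate_credit_card(VALID_NUMBER, expiry_in(-1), VALID_CVV)
    assert result == {"valid": False, "errors": ["Card is expired"]}


def test_validate_credit_card_trailing_newline_in_number_is_rejected():
    result = validate_credit_card(VALID_NUMBER + "\n", expiry_in(12), VALID_CVV)
    assert result["errors"] == ["Card number must be 16 digits"]


def test_validate_credit_card_all_fields_invalid_reports_three_errors():
    result = validate_credit_card("", "13/99", "12")
    assert result == {
        "valid": False,
        "errors": [
            "Card number must be 16 digits",
            "Expiry must be MM/YY format",
            "CVV must be 3 digits",
        ],
    }


def test_validate_credit_card_none_number_raises_type_error():
    # The function has no None guard, so this pins down current behaviour
    # (under pytest, pytest.raises(TypeError) expresses this more directly).
    try:
        validate_credit_card(None, expiry_in(12), VALID_CVV)
    except TypeError:
        return
    raise AssertionError("expected TypeError for number=None")


if __name__ == "__main__":
    # pytest would collect these automatically; this loop lets the file run without it.
    for name, fn in list(globals().items()):
        if name.startswith("test_") and callable(fn):
            fn()
    print("all tests passed")
```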

Try it yourself

Take the validate_credit_card function and run the GENERATE_PROMPT through an LLM. Apply the REVIEW_CHECKLIST to the output. How many items are missing? Add the missing tests manually.
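You can pre-screen a few checklist items mechanically before the human pass. The sketch below is a minimal one that uses only the standard-library `ast` module. `find_weak_assertions` is a hypothetical helper, and it is a heuristic: it flags asserts that only check truthiness or non-emptiness. That covers the "Error messages asserted" item and the prompt's "assert exact values" rule. It can also flag legitimate checks such as `assert isinstance(...)`, so treat its output as hints.

```python
import ast


def find_weak_assertions(test_source: str) -> list[tuple[int, str]]:
    """Return (line, expression) for asserts that only check truthiness or non-emptiness."""
    findings = []
    for node in ast.walk(ast.parse(test_source)):
        if not isinstance(node, ast.Assert):
            continue
        expr = node.test
        # `assert x`, `assert obj.attr`, `assert d["key"]`, `assert f()`, `assert not x`
        truthy_only = isinstance(expr, (ast.Name, ast.Attribute, ast.Subscript, ast.Call)) or (
            isinstance(expr, ast.UnaryOp) and isinstance(expr.op, ast.Not)
        )
        # `assert len(x) > 0` / `assert len(x) != 0`: says "something is there", not what
        nonempty_only = (
            isinstance(expr, ast.Compare)
            and isinstance(expr.left, ast.Call)
            and isinstance(expr.left.func, ast.Name)
            and expr.left.func.id == "len"
            and len(expr.ops) == 1
            and isinstance(expr.ops[0], (ast.Gt, ast.NotEq))
            and isinstance(expr.comparators[0], ast.Constant)
            and expr.comparators[0].value == 0
        )
        if truthy_only or nonempty_only:
            findings.append((node.lineno, ast.unparse(expr)))
    return sorted(findings)


generated = '''
def test_invalid_cvv():
    result = validate_credit_card("4111111111111111", "12/30", "12")
    assert not result["valid"]
    assert len(result["errors"]) > 0

def test_invalid_cvv_reports_message():
    result = validate_credit_card("4111111111111111", "12/30", "12")
    assert result == {"valid": False, "errors": ["CVV must be 3 digits"]}
'''

for line, expr in find_weak_assertions(generated):
    print(f"line {line}: weak assertion `{expr}`")
```

The second test in `generated` passes the check because it asserts the exact result, including the error message.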

Ask the LLM to generate tests for a function you wrote recently. Then run mutation testing on the generated tests. What mutation score do they achieve? What's the lowest-hanging fruit to improve?
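A toy version of mutation testing shows what the score measures. Make small changes ("mutants") to the code, rerun the tests, and count how many mutants make at least one test fail. Tools such as mutmut automate this for Python. The sketch below is a minimal one that uses hand-written mutants of a hypothetical `is_adult` function. Real tools generate mutants automatically and run your actual test files.

```python
from typing import Callable


def is_adult(age: int) -> bool:
    return age >= 18


# Hand-written mutants: each changes one operator or constant, as a mutation tool would.
MUTANTS: dict[str, Callable[[int], bool]] = {
    ">= -> >": lambda age: age > 18,
    ">= -> <=": lambda age: age <= 18,
    "18 -> 19": lambda age: age >= 19,
}


def weak_suite(fn: Callable[[int], bool]) -> None:
    # Typical happy-path-only tests: values far from the boundary.
    assert fn(30) is True
    assert fn(5) is False


def boundary_suite(fn: Callable[[int], bool]) -> None:
    # Adds the limit and the value just outside it.
    weak_suite(fn)
    assert fn(18) is True
    assert fn(17) is False


def mutation_score(suite: Callable[[Callable[[int], bool]], None]) -> float:
    suite(is_adult)  # the suite must pass on the real code first
    killed = 0
    for mutant in MUTANTS.values():
        try:
            suite(mutant)
        except AssertionError:
            killed += 1  # a failing test "kills" the mutant
    return killed / len(MUTANTS)


print(f"weak suite:     {mutation_score(weak_suite):.0%}")
print(f"boundary suite: {mutation_score(boundary_suite):.0%}")
```

The happy-path suite kills only the `<=` mutant and scores 33%. The two boundary asserts raise it to 100%. Adding boundary tests is usually the lowest-hanging fruit, and it is also what the checklist item "Tests would FAIL if the corresponding condition was removed" is probing.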

Ask an LLM: 'What test cases for this function are you most likely to miss?' This metacognitive prompt often surfaces the edge cases the AI itself is biased toward missing (security edge cases, concurrency, timezone-related logic).

Compare AI test generation for a pure function (easy, deterministic input/output) vs a function with side effects (harder — AI often over-mocks). Which class of function does AI do better on, and how do you adjust the prompt for the harder case?
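For the side-effect case, one way to adjust the prompt is to ask the model to test through injected fakes (an in-memory store, a fixed clock). The prompt should also ask it to assert on the resulting state rather than on which methods were called. The sketch below is a minimal example of tests in that style. `register_card`, `CardStore`, `InMemoryCardStore`, and `fixed_today` are hypothetical names invented for this example. Injecting the clock also sidesteps patching `date.today`, which fails because `datetime.date` is a built-in type whose attributes cannot be reassigned.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Callable, Protocol


class CardStore(Protocol):
    def save(self, card_id: str, expiry: str) -> None: ...


@dataclass
class InMemoryCardStore:
    """Hand-written fake: records saves in a dict instead of touching a database."""
    saved: dict[str, str] = field(default_factory=dict)

    def save(self, card_id: str, expiry: str) -> None:
        self.saved[card_id] = expiry


def register_card(card_id: str, expiry: str, store: CardStore,
                  today: Callable[[], date] = date.today) -> bool:
    """Save the card unless its MM/YY expiry is before the current month."""
    month, year = (int(part) for part in expiry.split("/"))
    now = today()
    if (2000 + year, month) < (now.year, now.month):
        return False
    store.save(card_id, expiry)
    return True


def fixed_today() -> date:
    # Injected clock, so no patching of date.today is needed.
    return date(2025, 6, 15)


def test_register_card_unexpired_card_is_saved():
    store = InMemoryCardStore()
    assert register_card("c1", "06/25", store, today=fixed_today) is True
    assert store.saved == {"c1": "06/25"}  # assert on state, not on calls


def test_register_card_expired_card_is_not_saved():
    store = InMemoryCardStore()
    assert register_card("c1", "05/25", store, today=fixed_today) is False
    assert store.saved == {}
```

A prompt line such as "Do not use unittest.mock; use the InMemoryCardStore fake and pass a fixed `today`" is one way to steer the model away from over-mocking.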

Prompt your AI

Use these three in order. Each builds on the one before.

1. Basics & terminology

In one paragraph, explain what class of test cases AI test generation consistently misses and what a human SDET needs to add. Why does AI tend to favor happy-path tests?

2. Why it works (the mechanism)

Walk me through a code review of AI-generated unit tests: what specific things do you check for (assertion quality, boundary coverage, test independence, name clarity), and what would cause you to reject and re-prompt vs accept and manually enrich?

3. Advanced — application & what's next

I want to integrate AI test generation into our SDET workflow: developer writes a function → AI generates first-draft tests → SDET reviews and enriches → tests are merged with the code in the same PR. Walk me through: the prompt template standard, the review SLA (how long should enrichment take), the quality gate (minimum mutation score before merge), and how to measure whether this workflow improves test quality over 3 months.