A flaky test is a test that sometimes passes and sometimes fails against the same code, with no changes in between. Flaky tests are more damaging than reliably failing ones. They erode trust in the suite and teach developers to dismiss failures ('it's probably just flaky'), which lets real regressions hide in the noise. Google has reported that about 1.5% of all its test runs produce a flaky result, and that almost 16% of its tests show some level of flakiness. The root causes are almost always the same: explicit sleeps, missing awaits, state shared between tests, network and timing races, random test data, and environment-specific behavior. Each one is fixable once identified, and recognizing the pattern is the first step.
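To make 'passes sometimes, fails other times' concrete, here is a minimal sketch that needs no browser and no real timers. Purely for illustration, it assumes a table takes between 1.5 s and 2.5 s to load. It then compares two checks:

- a fixed 2 s sleep followed by a single check;
- a check that polls until the condition holds, which is roughly how Playwright's auto-retrying assertions behave.

The repeatEach helper plays the role of --repeat-each. All function names here are hypothetical. A seeded generator keeps the run reproducible, which is also a reasonable way to keep varied test data deterministic.

```typescript
// Simulation only: no browser, no real timers. All timings are illustrative assumptions.

// Small seeded PRNG (mulberry32) so every run of this demo sees the same "random" load times.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // value in [0, 1)
  };
}

// Assumed: the table finishes loading somewhere between 1500 and 2500 ms.
function simulatedLoadMs(rand: () => number): number {
  return 1500 + rand() * 1000;
}

// FLAKY: sleep a fixed 2000 ms, then check exactly once.
function sleepThenCheck(loadMs: number): boolean {
  const sleepMs = 2000;
  return loadMs <= sleepMs;
}

// ROBUST: re-check every intervalMs until the data is there or timeoutMs elapses.
function pollUntil(loadMs: number, timeoutMs = 10000, intervalMs = 100): boolean {
  for (let elapsed = 0; elapsed <= timeoutMs; elapsed += intervalMs) {
    if (elapsed >= loadMs) return true;
  }
  return false;
}

// Run the same check n times (like --repeat-each=n) and count the failures.
function repeatEach(n: number, check: (loadMs: number) => boolean, seed: number): number {
  const rand = mulberry32(seed);
  let failures = 0;
  for (let i = 0; i < n; i++) {
    if (!check(simulatedLoadMs(rand))) failures++;
  }
  return failures;
}

const runs = 100;
console.log(`sleep(2000) then check: ${repeatEach(runs, sleepThenCheck, 42)}/${runs} failed`);
console.log(`poll until loaded:      ${repeatEach(runs, pollUntil, 42)}/${runs} failed`);
```

Both checks see the same seeded load times. Roughly half of the sleep-based runs fail, while the polling version passes every time. The code is identical across runs; only the timing differs.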
import { test, expect, Page } from '@playwright/test';
// ── CAUSE 1: Explicit sleeps instead of waiting for conditions ──────────────
// ✗ FLAKY: Sleep assumes loading takes < 2s — fails when it takes 2.1s
// await page.waitForTimeout(2000);
// expect(await page.locator('.data-table tr').count()).toBe(10); // one-shot count, never retried
// ✓ FIX: Wait for the condition itself
async function waitForTableLoaded(page: Page) {
  // toHaveCount re-checks the row count until it equals 10 or 10 s elapse
  await expect(page.locator('.data-table tr')).toHaveCount(10, { timeout: 10000 });
}
// ── CAUSE 2: Missing await on assertions ────────────────────────────────────
// ✗ FLAKY: Without await, the test moves on (and may finish) before the retrying assertion settles, so a failure is missed or reported against a later step
// expect(page.locator('.result')).toHaveText('Success'); // missing await
// ✓ FIX: Always await Playwright assertions
// await expect(page.locator('.result')).toHaveText('Success');
// ── CAUSE 3: Tests sharing state (order-dependent) ─────────────────────────
// ✗ FLAKY: Test B relies on data created by Test A
// test('B: delete user created in test A', ...) — breaks if A runs later
// ✓ FIX: Each test creates its own data (test isolation)
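// Read userEmail inside each test body; it is reassigned before every test
let userEmail: string;
test.beforeEach(async ({ page }, testInfo) => {
  // Create a unique user for this test only; the worker index avoids collisions
  // when parallel workers hit the same millisecond
  userEmail = `test-${testInfo.workerIndex}-${Date.now()}@example.com`;
  await page.goto('/admin/create-user');
  await page.getByLabel('Email').fill(userEmail);
  await page.getByRole('button', { name: 'Create' }).click();
  // Keep the value in test-side state rather than on window: window globals are
  // wiped by the next navigation and are untyped (TS rejects window.__testUser)
});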
// ── CAUSE 4: Non-deterministic test data ────────────────────────────────────
// ✗ FLAKY: Random data occasionally triggers a validation rule
// const amount = Math.random() * 1000;
// ✓ FIX: Use deterministic test data
const TEST_AMOUNTS = { valid: 50.00, belowMin: 0.99, aboveMax: 10001.00 };
// ── CAUSE 5: Hardcoded timeouts based on network speed ─────────────────────
// ✗ FLAKY in CI (slower network): waitForResponse with short timeout
// await page.waitForResponse('/api/data', { timeout: 1000 });
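// ✓ FIX: Prefer waiting for the UI state the response produces; if you must wait on the network, use an environment-aware timeout
// const timeout = process.env.CI ? 15000 : 5000;
// await page.waitForResponse('**/api/data', { timeout });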
console.log('Flaky test patterns documented: run npx playwright test --repeat-each=5 to detect flakiness');
Run one of your tests with --repeat-each=10 (npx playwright test --repeat-each=10). Does it always pass? If it fails even once in 10 runs, it's flaky. This is the simplest flakiness detector.
Find a test that uses waitForTimeout. Replace it with waitForSelector or an expect assertion. Run both versions 5 times each. Does the timeout-based version fail more often?
Quarantine a known-flaky test by declaring it with test.skip('flaky - see BUG-123', ...) instead of test(...). Add a comment explaining the root cause and the fix needed. This is more professional than deleting the test or commenting it out.
Paste these three prompts into an AI assistant in order. Each builds on the one before.
In one paragraph, explain what a flaky test is and why it's more damaging to a test suite than a consistently failing test. What does 'the boy who cried wolf' have to do with flaky tests?
Walk me through the 5 most common root causes of flaky Playwright tests (explicit sleeps, missing awaits, shared state, non-deterministic data, environment-specific timing) and for each: a specific code pattern that causes it and a specific fix.
My team's CI test suite has a 15% failure rate on every run, but only 3% of the time does the same test fail twice in a row — classic flakiness. We've accepted it and just 'retry on failure'. Walk me through why this is dangerous (what real bugs it hides), and a structured 4-week program to eliminate flakiness from a 300-test suite.
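The arithmetic behind two ideas in this lesson is short enough to sketch: the --repeat-each detector in the try-it list, and the 'retry on failure' habit in the last prompt. Assume each run of a flaky test fails independently with probability p. This is a simplification, since real flakes often correlate with load or time of day. Under that assumption:

- the chance that n repeats expose the flake at least once is 1 - (1 - p)^n;
- the chance that a run with r retries still reports a failure is p^(r + 1).

The helper names below are illustrative, not part of Playwright.

```typescript
// Illustrative helpers; assumes each run fails independently with probability p.

// Chance that at least one of n repeated runs fails (what --repeat-each=n relies on).
function detectionProbability(p: number, n: number): number {
  return 1 - Math.pow(1 - p, n);
}

// Smallest number of repeats that exposes the flake with at least the given confidence.
function repeatsNeeded(p: number, confidence: number): number {
  return Math.ceil(Math.log(1 - confidence) / Math.log(1 - p));
}

// Chance a run still reports red when every failure is retried `retries` times.
function reportedFailureRate(p: number, retries: number): number {
  return Math.pow(p, retries + 1);
}

const failRate = 0.05; // assumed: a test that fails about 1 run in 20
console.log(detectionProbability(failRate, 10).toFixed(3)); // prints 0.401
console.log(repeatsNeeded(failRate, 0.95)); // prints 59
console.log(reportedFailureRate(failRate, 1).toFixed(4)); // prints 0.0025
```

Under these assumptions, ten repeats catch a 1-in-20 flake only about 40% of the time. With a single retry, a genuine intermittent bug that breaks 1 run in 20 turns the build red only about 1 time in 400. That is how 'retry on failure' can hide real race conditions in the product.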