Flaky tests — root causes, detection, and elimination

hard


Why this matters

A flaky test is a test that sometimes passes and sometimes fails against the same code, with no changes in between. Flaky tests are toxic: they erode trust in the test suite, lead developers to ignore failures ('it's probably just flaky'), and hide real bugs. Google has reported that about 1.5% of its test runs produce a flaky result and that almost 16% of its tests show some level of flakiness. Each flaky failure also costs more engineering time than a reliable one, because someone has to triage it before the result can be trusted. The causes are almost always the same: timing assumptions (explicit sleeps, missing awaits), shared state between tests, network/timing races, random test data, and environment-specific behavior.

Demo

Flaky tests are more dangerous than reliably failing tests because they train the team to ignore failures, which lets real regressions hide in the noise. The demo below walks through five root causes: explicit sleeps, missing awaits, shared state between tests, non-deterministic data, and environment-specific timing. Each is fixable once identified, and recognizing the pattern is the first step.

import { test, expect, Page } from '@playwright/test';

// ── CAUSE 1: Explicit sleeps instead of waiting for conditions ──────────────
// ✗ FLAKY: Sleep assumes loading takes < 2s, so it fails when loading takes 2.1s
// await page.waitForTimeout(2000);
// expect(await page.locator('.data-table tr').count()).toBe(10);  // one-shot check, no retry

// ✓ FIX: Wait for the condition itself
async function waitForTableLoaded(page: Page) {
  await expect(page.locator('.data-table tr')).toHaveCount(10, { timeout: 10000 });
}

// ── CAUSE 2: Missing await on assertions ────────────────────────────────────
// ✗ FLAKY: Without await the promise is never awaited, so the test can move on
// (or finish) before the assertion settles and a failure may go unnoticed
// expect(page.locator('.result')).toHaveText('Success');  // missing await

// ✓ FIX: Always await Playwright assertions
// await expect(page.locator('.result')).toHaveText('Success');

// ── CAUSE 3: Tests sharing state (order-dependent) ─────────────────────────
// ✗ FLAKY: Test B relies on data created by Test A
// test('B: delete user created in test A', ...) — breaks if A runs later

// ✓ FIX: Each test creates its own data (test isolation)
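test.beforeEach(async ({ page }, testInfo) => {
  // Create a unique user for this test only. Date.now() alone can collide when
  // parallel workers start in the same millisecond, so include the worker index.
  const email = `test-${testInfo.workerIndex}-${Date.now()}@example.com`;
  await page.goto('/admin/create-user');
  await page.getByLabel('Email').fill(email);
  await page.getByRole('button', { name: 'Create' }).click();
  // Stash the email on the page for the test body (it is lost on the next navigation)
  await page.evaluate((e) => { (window as any).__testUser = e; }, email);
});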

// ── CAUSE 4: Non-deterministic test data ────────────────────────────────────
// ✗ FLAKY: Random data occasionally triggers a validation rule
// const amount = Math.random() * 1000;

// ✓ FIX: Use deterministic test data
const TEST_AMOUNTS = { valid: 50.00, belowMin: 0.99, aboveMax: 10001.00 };

// ── CAUSE 5: Hardcoded timeouts based on network speed ─────────────────────
// ✗ FLAKY in CI (slower network): waitForResponse with short timeout
// await page.waitForResponse('/api/data', { timeout: 1000 });

// ✓ FIX: Use environment-aware timeouts or wait for UI state
// const timeout = process.env.CI ? 15000 : 5000;
console.log('Flaky test patterns documented — run with --repeat-each=5 to detect flakiness');

Run: npx playwright test --repeat-each=5
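The fix for Cause 1 works because Playwright keeps retrying the assertion until it passes or the timeout expires. As a rough illustration of that idea outside Playwright, the sketch below polls a condition instead of sleeping for a fixed time. `waitUntil` and `demo` are hypothetical names made up for this lesson, not Playwright APIs, and the 120 ms delay only simulates a slow load.

```typescript
// Poll a condition until it holds or the deadline passes.
// Unlike a fixed sleep, this returns as soon as the condition is true
// and only fails if the condition never becomes true in time.
async function waitUntil(
  condition: () => boolean | Promise<boolean>,
  timeoutMs = 5000,
  intervalMs = 50,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await condition()) return;
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeoutMs} ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Hypothetical usage: data that "loads" after a delay the test does not control.
async function demo(): Promise<number> {
  let rows = 0;
  setTimeout(() => { rows = 10; }, 120); // simulated slow load
  await waitUntil(() => rows === 10, 2000, 20);
  return rows;
}
```

The generous timeout is an upper bound, not a cost: a fast environment pays only for the time the load really takes, while a slow CI machine still gets room to finish.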

Try it yourself

Run a Playwright test with --repeat-each=10. Does it always pass? If it fails even once in 10 runs, it's flaky. This is the simplest flakiness detector.
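Ten green runs do not prove a test is stable, because a low flake rate can easily survive ten repetitions. The sketch below puts numbers on that, assuming each run fails independently with the same probability (a simplification that real suites only approximate). The function names are illustrative and not part of any tool.

```typescript
// Chance that at least one of `runs` repetitions fails, for a test that
// fails independently with probability `flakeRate` on each run.
function detectionProbability(flakeRate: number, runs: number): number {
  return 1 - Math.pow(1 - flakeRate, runs);
}

// Smallest number of repetitions that catches the flake with the given confidence.
function runsNeeded(flakeRate: number, confidence: number): number {
  return Math.ceil(Math.log(1 - confidence) / Math.log(1 - flakeRate));
}

type Verdict = "passed every run" | "flaky" | "failed every run";

// Classify a test from the pass/fail results of repeated runs on the same code.
function classify(results: boolean[]): Verdict {
  if (results.length === 0) throw new Error("no results to classify");
  const passes = results.filter((passed) => passed).length;
  if (passes === results.length) return "passed every run";
  if (passes === 0) return "failed every run";
  return "flaky";
}

// A test that flakes 5% of the time is caught by 10 repeats only about 40% of the time.
const caughtInTen = detectionProbability(0.05, 10); // ≈ 0.40
// Catching it with 95% confidence takes 59 repeats.
const repeatsFor95 = runsNeeded(0.05, 0.95); // 59
```

In practice this means a single failure in 10 runs is conclusive, but 10 passes are only weak evidence. Raise the repeat count for tests you already suspect.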

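Find a test in your suite that uses waitForTimeout. Replace it with a web-first expect assertion or locator.waitFor() (page.waitForSelector also works, but Playwright's docs discourage it in favor of locators). Run both versions 5 times each. Does the timeout-based version fail more often?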

Add a flakiness quarantine: mark a known-flaky test by calling test.fixme(true, 'flaky - see BUG-123') at the top of its body (test.skip takes the same arguments). Write a comment explaining the root cause and the fix needed. This is more professional than deleting the test or commenting it out.

Research Google's principle of hermetic testing: a test should be independent of external systems and of other tests. List the external dependencies in your current test suite (real API, real database, real time, random data). For each, describe how you'd make the test hermetic.
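For two of the dependencies listed above, real time and random data, one common approach is to inject them so that a test can pin both. The sketch below is a minimal illustration under that assumption. `Clock`, `seededRng`, `fixedClock`, and `makeOrder` are hypothetical names, and the generator is a simple Lehmer (Park-Miller) generator chosen for brevity, which is not suitable for anything security-related.

```typescript
interface Clock { now(): Date; }
type Rng = () => number;

// Deterministic pseudo-random generator: the same seed always yields the same sequence.
function seededRng(seed: number): Rng {
  let state = (Math.floor(Math.abs(seed)) % 2147483646) + 1; // keep state in 1..2147483646
  return () => {
    state = (state * 48271) % 2147483647;
    return state / 2147483647; // value strictly between 0 and 1
  };
}

// Clock frozen at a fixed instant, so time-dependent output never drifts.
function fixedClock(iso: string): Clock {
  return { now: () => new Date(iso) };
}

interface Order { id: string; amount: number; createdAt: string; }

// Hypothetical code under test: it takes its clock and randomness as parameters
// instead of calling Date.now() and Math.random() directly.
function makeOrder(clock: Clock, rng: Rng): Order {
  const amount = Math.round((1 + rng() * 999) * 100) / 100;
  const id = `order-${Math.floor(rng() * 1e9).toString(36)}`;
  return { id, amount, createdAt: clock.now().toISOString() };
}

// In a test, pin both dependencies. A failure can then be reproduced from the seed.
const clock = fixedClock("2024-01-15T12:00:00.000Z");
const first = makeOrder(clock, seededRng(42));
const second = makeOrder(clock, seededRng(42));
// first and second are identical, run after run.
```

Production code would pass a real clock and `Math.random`, so only the tests pay for the determinism.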

Prompt your AI

Use these three in order. Each builds on the one before.

1. Basics & terminology

In one paragraph, explain what a flaky test is and why it's more damaging to a test suite than a consistently-failing test. What does 'the boy who cried wolf' have to do with flaky tests?

2. Why it works (the mechanism)

Walk me through the 5 most common root causes of flaky Playwright tests (explicit sleeps, missing awaits, shared state, non-deterministic data, environment-specific timing) and for each: a specific code pattern that causes it and a specific fix.

3. Advanced — application & what's next

My team's CI test suite has a 15% failure rate on every run, but only 3% of the time does the same test fail twice in a row — classic flakiness. We've accepted it and just 'retry on failure'. Walk me through why this is dangerous (what real bugs it hides), and a structured 4-week program to eliminate flakiness from a 300-test suite.
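To get a feel for the numbers in the third prompt, the sketch below works through two back-of-the-envelope calculations. Both assume that failures are independent, which real suites only approximate, and the 20% intermittent bug is an invented example rather than data from any project.

```typescript
// Per-test flake rate implied by a suite-level failure rate,
// assuming every test flakes independently with the same probability.
function impliedPerTestFlakeRate(suiteFailureRate: number, testCount: number): number {
  return 1 - Math.pow(1 - suiteFailureRate, 1 / testCount);
}

// Probability that a test ends up green when it is retried up to `retries`
// times and fails each attempt with probability `failureRate`.
function passesWithRetries(failureRate: number, retries: number): number {
  return 1 - Math.pow(failureRate, retries + 1);
}

// A 15% failure rate across 300 tests needs only about 0.054% flakiness per test.
const perTest = impliedPerTestFlakeRate(0.15, 300); // ≈ 0.00054

// A real bug that breaks a test 20% of the time is reported green
// about 96% of the time with a single retry.
const maskedOnce = passesWithRetries(0.2, 1); // ≈ 0.96
```

The second figure is the danger of 'retry on failure': an intermittent product bug and a flaky test look identical to the retry logic, so both are waved through.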