E2E Testing: Playwright as Your Development Eyes

Let me tell you about the most expensive mock in our codebase.

We had a trading engine that fetched market data from an API with pagination. The tests mocked the API client and returned canned responses. Coverage was above 90%. Every test passed. CI was green.

In production, the system silently fetched page 1 of results forever. Every page request returned the same data. The pagination was broken. Positions were being evaluated against stale data.

The root cause: urljoin(base, path) discards the base URL's path when path starts with /. Our mocked tests never constructed real URLs; they returned canned responses regardless of which URL was called. The mocks could not catch a bug in code they never exercised.
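A minimal sketch of that behavior, using urljoin from Python's standard urllib.parse module. The base URL and path are hypothetical placeholders, not the endpoints from the incident.

```python
from urllib.parse import urljoin

# Hypothetical base URL with a versioned path prefix (an assumption for illustration).
BASE_URL = "https://api.example.com/v2/"


def build_url_naive(path: str) -> str:
    """Join as-is: a leading slash makes the path absolute, so /v2/ is discarded."""
    return urljoin(BASE_URL, path)


def build_url_relative(path: str) -> str:
    """Strip the leading slash so the path resolves relative to the base."""
    return urljoin(BASE_URL, path.lstrip("/"))


naive = build_url_naive("/markets")
relative = build_url_relative("/markets")
print(naive)     # https://api.example.com/markets
print(relative)  # https://api.example.com/v2/markets
```

The relative form also depends on the base ending in a slash. Without it, urljoin treats the last segment of the base as a file name and replaces it.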

The Case for E2E

End-to-end tests run your actual code against actual dependencies. No mocks. No stubs. No canned responses.

This is expensive. E2E tests are slower, flakier, and harder to maintain. But they catch an entire class of bugs that mock-based unit tests structurally cannot:

API response shape changes
URL construction errors
Authentication flow failures
State file corruption
Timing-dependent race conditions
Visual rendering bugs

The urljoin bug would have been caught by a single E2E test that hit the real API and verified the second page contained different data than the first page. One test. The mock-based suite had 47 tests covering pagination. All passed. All lied.
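To make the failure mode concrete, here is a sketch of a mocked test that passes while the requested URL is wrong. The fetch_page function and the URLs are hypothetical stand-ins, not the trading engine's real client. The sketch also mocks one layer lower (the HTTP session) than the tests described above, so that the wrong URL is visible.

```python
from unittest.mock import Mock
from urllib.parse import urljoin


def fetch_page(session, base_url: str, path: str, page: int):
    """Hypothetical client function: builds a URL and requests one page."""
    url = urljoin(base_url, path)
    return session.get(url, params={"page": page}).json()


# The mock returns the same canned payload for any URL and any page.
session = Mock()
session.get.return_value.json.return_value = [{"id": 1}]

result = fetch_page(session, "https://api.example.com/v2/", "/markets", page=2)

# The typical mock-based assertion: it passes.
mock_test_passed = result == [{"id": 1}]

# The URL that was actually requested: the /v2/ prefix is gone.
requested_url = session.get.call_args.args[0]
print(mock_test_passed)  # True
print(requested_url)     # https://api.example.com/markets
```

Asserting on the requested URL would expose this particular bug, but only if someone thinks to write that assertion. A test against the real API fails without anyone having to anticipate the mistake.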

Playwright as Development Eyes

Most teams use Playwright for post-deployment smoke tests. That is like using a smoke detector to decide whether to build the house out of wood.

Playwright is most powerful during development. Not after. During.

The workflow:

1. Auth-Navigate-Screenshot-Review

// playwright/visual-retro.spec.ts
import { test } from '@playwright/test';

const BREAKPOINTS = [
  { name: 'desktop', width: 1280, height: 800 },
  { name: 'tablet', width: 768, height: 1024 },
  { name: 'mobile', width: 375, height: 812 },
];

test.describe('Visual Retro', () => {
  for (const bp of BREAKPOINTS) {
    test(`dashboard renders at ${bp.name}`, async ({ page }) => {
      await page.setViewportSize({ width: bp.width, height: bp.height });

      // Auth if needed (relative URLs assume baseURL is set in playwright.config.ts)
      await page.goto('/login');
      await page.fill('#token', process.env.AUTH_TOKEN!);
      await page.click('button[type="submit"]');

      // Navigate to target
      await page.goto('/dashboard');
      await page.waitForSelector('[data-testid="dashboard-loaded"]');

      // Screenshot
      await page.screenshot({
        path: `screenshots/dashboard-${bp.name}.png`,
        fullPage: true,
      });
    });
  }
});

Run this after every significant UI change. Not at the end of the sprint. After every change.

2. Three Breakpoints, Every Time

Desktop (1280px), tablet (768px), mobile (375px). These are the minimum. Every screenshot suite captures all three because responsive bugs hide at the transitions.

We caught a category navigation bar that disappeared entirely at 768px. The CSS had a display: none at the wrong breakpoint. Code review — human and AI — read the CSS and saw nothing wrong. The screenshot showed an empty space where navigation should be.

3. Screenshot Diffs, Not Just Screenshots

Playwright supports visual comparison out of the box:

await expect(page).toHaveScreenshot('dashboard-desktop.png', {
  maxDiffPixelRatio: 0.01,  // 1% tolerance for anti-aliasing
});

The first run writes the baseline images, and Playwright reports that run as failed because there was nothing to compare against yet. Every subsequent run compares against the baseline. If more than 1% of pixels differ, the test fails with a visual diff showing exactly what changed.

This catches:

Font size regressions
Spacing changes from dependency updates
Color shifts from CSS variable overrides
Layout breaks from new content
Missing icons or images
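The 1% tolerance is a ratio of differing pixels to total pixels. The sketch below illustrates that arithmetic only. It is not Playwright's comparator, which also applies a per-pixel color threshold before counting a pixel as different. Images are modeled as hypothetical grids of RGB tuples.

```python
def diff_pixel_ratio(baseline, actual) -> float:
    """Return the fraction of pixels that differ between two same-sized grids."""
    if len(baseline) != len(actual) or any(
        len(b_row) != len(a_row) for b_row, a_row in zip(baseline, actual)
    ):
        raise ValueError("images must have identical dimensions")
    total = sum(len(row) for row in baseline)
    if total == 0:
        raise ValueError("images must contain at least one pixel")
    differing = sum(
        b_px != a_px
        for b_row, a_row in zip(baseline, actual)
        for b_px, a_px in zip(b_row, a_row)
    )
    return differing / total


WHITE, RED = (255, 255, 255), (255, 0, 0)

# A 10x10 white baseline and a copy with two pixels changed.
baseline_img = [[WHITE] * 10 for _ in range(10)]
actual_img = [row[:] for row in baseline_img]
actual_img[0][0] = RED
actual_img[5][5] = RED

ratio = diff_pixel_ratio(baseline_img, actual_img)
exceeds_tolerance = ratio > 0.01
print(ratio)              # 0.02
print(exceeds_tolerance)  # True
```

At a 1280x800 viewport, 1% is roughly ten thousand pixels, so a small missing icon can slip under the threshold. A tighter ratio may be worth trying for pages where small elements matter.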

E2E Against Live APIs

For backend systems — trading engines, data pipelines, content generators — Playwright is not the right tool. But the principle is the same: test against real dependencies.

# tests/e2e/test_pagination_live.py
import pytest
from app.client import MarketClient

@pytest.mark.e2e
def test_pagination_returns_different_pages():
    """The urljoin bug: verify page 2 is not page 1."""
    client = MarketClient()

    page_1 = client.fetch_markets(page=1, limit=10)
    page_2 = client.fetch_markets(page=2, limit=10)

    # Pages must contain data
    assert len(page_1) > 0
    assert len(page_2) > 0

    # Pages must be different
    page_1_ids = {m['id'] for m in page_1}
    page_2_ids = {m['id'] for m in page_2}
    assert page_1_ids != page_2_ids, "Page 2 returned same data as page 1"

Mark E2E tests separately from unit tests. Run unit tests on every commit. Run E2E tests on every PR merge and on a schedule (daily or per-deploy). This gives you the speed of unit tests during development and the accuracy of E2E tests before shipping.

# Unit tests: fast, every commit (--timeout requires the pytest-timeout plugin)
pytest tests/unit/ -x --timeout=30

# E2E tests: slower, on PR merge and on a schedule
pytest tests/e2e/ -x --timeout=120 -m e2e
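For -m e2e to select tests without an unknown-marker warning, the marker needs to be registered. One way to do that, sketched here as an assumption about project layout rather than the original setup, is a pytest_configure hook in a conftest.py at the repository root.

```python
# conftest.py (assumed to live at the repository root)


def pytest_configure(config):
    """Register the custom e2e marker so pytest recognizes @pytest.mark.e2e."""
    config.addinivalue_line(
        "markers",
        "e2e: end-to-end test that talks to real external dependencies",
    )
```

With the marker registered, pytest -m "not e2e" also works as a guard for the fast run if an E2E test ever ends up outside tests/e2e/.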

The Docker Gotcha

One E2E lesson that cost us a full debugging session: if your service runs in Docker with a build: context (not a pre-built image), docker compose restart serves stale assets.

The container restarts, but it uses the old build. Your code changes are not reflected. Your E2E tests pass against the old code. You think the bug is fixed. It is not.

The fix:

# Wrong: serves stale assets
docker compose restart my-service

# Right: rebuilds and deploys
docker compose build my-service && docker compose up -d my-service

This is not a Playwright bug. It is an infrastructure bug. But E2E tests that run against Docker services hit it constantly, and the symptom — "my fix is not working" — wastes hours before you realize the container is running old code.
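One way to spot a stale image before trusting a test run is to check how old it is. The sketch below assumes that docker image inspect reports the Created field as a UTC timestamp ending in Z, possibly with nanosecond precision. The image name is hypothetical and must be supplied by the caller.

```python
import re
import subprocess
from datetime import datetime, timezone

# Assumed format: 2024-05-01T12:34:56.123456789Z (fraction optional).
_TIMESTAMP = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z")


def parse_created(raw: str) -> datetime:
    """Parse the Created timestamp, truncating the fraction to microseconds."""
    match = _TIMESTAMP.fullmatch(raw.strip())
    if match is None:
        raise ValueError(f"unexpected timestamp format: {raw!r}")
    whole, fraction = match.groups()
    micros = int((fraction or "")[:6].ljust(6, "0"))
    parsed = datetime.strptime(whole, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(microsecond=micros, tzinfo=timezone.utc)


def image_age_seconds(image: str) -> float:
    """Ask the Docker CLI when the image was created and return its age."""
    raw = subprocess.run(
        ["docker", "image", "inspect", "--format", "{{.Created}}", image],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return (datetime.now(timezone.utc) - parse_created(raw)).total_seconds()
```

Calling image_age_seconds("myproject-my-service") before an E2E run would show whether the image predates your last code change. That image name is only a guess at what Compose generates; docker compose images lists the real ones.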

Mocks are public relations. E2E tests are Nature. Build your quality system on what cannot be fooled. This is the Rewired Minds principle in practice: rewire your testing assumptions before they rewire your production stability.

Lesson 140 Drill

Set up a Playwright screenshot suite for one page of your application. Capture all three breakpoints. Run it, review the screenshots, and identify at least one visual issue you did not know existed.
Find one mocked test in your codebase that tests an API integration. Write a corresponding E2E test that hits the real API. Compare what each test actually validates.
If you run Docker services, verify right now: is your dev environment running the latest build, or stale assets? Run docker compose build and check if anything changes. Track how long ago you last rebuilt.