ASK KNOX
LESSON 142

Visual QA and UI Retros: Seeing What Code Review Can't

Code review reads code. It does not see UI. A visual retro with Playwright screenshots at three breakpoints catches the rendering bugs, layout shifts, and interaction failures that no amount of code review will ever find.

10 min read · Quality Engineering Mastery

Code review is excellent at catching logic errors, structural problems, and anti-patterns. It is terrible at catching visual bugs.

This is not a criticism. It is a structural limitation. Code review reads code. UI bugs live in the rendered output — the pixels on the screen, the layout at a specific viewport width, the interaction when a user clicks a button that triggers a state change that re-renders a component that was not designed for that state.

No amount of reading code will tell you that a stat card is 4 pixels too wide at 768px.

The Mission Control Incident

We shipped a Mission Control dashboard update. Code review was clean — both human and AI reviewers approved. Tests passed. CI green. Coverage above 90%.

Then I ran a visual retro. Playwright captured screenshots at three breakpoints. Four bugs, immediately visible:

  1. Bold text not rendering — Markdown content had **bold** syntax but the renderer was not applying font-weight. The text appeared normal. Code review saw the markdown parser config and found nothing wrong. The screenshot showed plain text where bold text should be.

  2. Stat cards misaligned — At 1024px (between tablet and desktop breakpoints), stat cards in a flex container wrapped unevenly, creating a 2-1 layout instead of 3-across. The CSS was correct for 1280px and 768px. Nobody checked 1024px.

  3. Category bar disappeared on mobile — A navigation bar with category filters had display: none at a breakpoint that was meant to collapse it into a hamburger menu. The hamburger menu was never implemented. The categories simply vanished at 768px and below.

  4. Activity tab showing stale data — The activity feed component fetched data on mount but never set up a polling interval. On first load, it showed current data. After 5 minutes, it showed 5-minute-old data. After an hour, it showed hour-old data. No visual indication that the data was stale.

Four bugs. Zero detected by code review. All detected in under 5 minutes by looking at screenshots.
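
Bug 2 is worth a closer look, because the arithmetic is trivial once you know to do it. Here is a minimal sketch of how a wrapping flex row decides its layout. The 340px minimum card width and the 24px gap are hypothetical (the real values are not given here), and the container is assumed to span the full viewport width.

```typescript
// Hypothetical numbers: the real card width and gap are not stated in this lesson.
const MIN_CARD_WIDTH = 340; // px, assumed minimum width of one stat card
const CARD_GAP = 24;        // px, assumed gap between cards

/** How many cards fit on one row of a wrapping flex container. */
function cardsPerRow(containerWidth: number, minCardWidth: number, gap: number): number {
  // n cards need n * minCardWidth + (n - 1) * gap pixels of width.
  return Math.max(1, Math.floor((containerWidth + gap) / (minCardWidth + gap)));
}

/** Card count on each row, top to bottom. */
function rowLayout(cardCount: number, perRow: number): number[] {
  const rows: number[] = [];
  for (let remaining = cardCount; remaining > 0; remaining -= perRow) {
    rows.push(Math.min(perRow, remaining));
  }
  return rows;
}

// Three stat cards at 1280px give one row of 3; at 1024px they give rows of 2 and 1.
for (const width of [1280, 1024]) {
  console.log(width, rowLayout(3, cardsPerRow(width, MIN_CARD_WIDTH, CARD_GAP)));
}
```

With these assumed numbers, three cards fit across at 1280px but only two fit at 1024px, so the third card wraps onto its own row. Nothing in the CSS is wrong at the widths that were checked. The bug only exists at the width nobody rendered.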

The Visual Retro Workflow

A visual retro is a structured screenshot review after every UI delivery. Not after every sprint. After every delivery.

Step 1: Capture

Run Playwright against the running application at three breakpoints:

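import { chromium } from 'playwright';

// Run as an ES module (top-level await). Widths are this lesson's three breakpoints; heights are assumed.
const BREAKPOINTS = [
  { name: 'desktop', width: 1280, height: 800 },
  { name: 'tablet', width: 768, height: 1024 },
  { name: 'mobile', width: 375, height: 812 },
];

const pages = [
  { path: '/dashboard', name: 'dashboard' },
  { path: '/dashboard/activity', name: 'activity' },
  { path: '/dashboard/settings', name: 'settings' },
];

// baseURL is an assumption: point it at wherever the app is running.
const browser = await chromium.launch();
const page = await browser.newPage({ baseURL: 'http://localhost:3000' });

for (const pg of pages) {
  for (const bp of BREAKPOINTS) {
    await page.setViewportSize({ width: bp.width, height: bp.height });
    await page.goto(pg.path); // resolved against baseURL
    await page.waitForLoadState('networkidle');
    await page.screenshot({
      path: `retro/${pg.name}-${bp.name}.png`,
      fullPage: true,
    });
  }
}

await browser.close();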

This produces 9 screenshots (3 pages × 3 breakpoints). For a larger application, prioritize the pages that changed.
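
One way to prioritize is to build the capture list up front and filter it by the routes a delivery touched. This is a sketch, not a prescribed tool: `buildCapturePlan`, the viewport names, and the viewport heights are all hypothetical, and how you decide which paths changed is left to you.

```typescript
interface PageSpec { path: string; name: string }
interface Breakpoint { name: string; width: number; height: number }
interface Capture { path: string; file: string; width: number; height: number }

/** Expand pages x breakpoints into a flat capture list, optionally limited to changed paths. */
function buildCapturePlan(
  pages: PageSpec[],
  breakpoints: Breakpoint[],
  changedPaths: string[] = [],
): Capture[] {
  // An empty changedPaths list means "capture everything".
  const selected = changedPaths.length > 0
    ? pages.filter((pg) => changedPaths.indexOf(pg.path) !== -1)
    : pages;

  const plan: Capture[] = [];
  for (const pg of selected) {
    for (const bp of breakpoints) {
      plan.push({
        path: pg.path,
        file: `retro/${pg.name}-${bp.name}.png`,
        width: bp.width,
        height: bp.height,
      });
    }
  }
  return plan;
}

const allPages: PageSpec[] = [
  { path: '/dashboard', name: 'dashboard' },
  { path: '/dashboard/activity', name: 'activity' },
  { path: '/dashboard/settings', name: 'settings' },
];
// Heights are assumed values; only the widths come from this lesson.
const viewports: Breakpoint[] = [
  { name: 'desktop', width: 1280, height: 800 },
  { name: 'tablet', width: 768, height: 1024 },
  { name: 'mobile', width: 375, height: 812 },
];

console.log(buildCapturePlan(allPages, viewports).length); // 9
console.log(buildCapturePlan(allPages, viewports, ['/dashboard/activity']).map((c) => c.file));
```

Each entry in the plan carries everything the capture loop needs: the path to visit, the viewport to set, and the file to write.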

Step 2: Review

Open every screenshot. Do not glance. Open and review with intent. You are looking for:

  • Elements that are missing or displaced
  • Text that is cut off, overlapping, or invisible
  • Spacing that is inconsistent with the design
  • Colors or styles that differ from expectations
  • Interactive elements that appear broken (buttons without hover state indicators, inputs without focus rings)
  • Data that is wrong, stale, or missing

Step 3: Log and Fix

Every issue gets logged with the screenshot as evidence. Then fix in the same session. Do not create a ticket for "fix later." Fix now, re-screenshot, confirm fixed.
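
The log does not need to be fancy. A minimal sketch of the idea, assuming an in-memory log for a single session: every issue carries its evidence screenshot, and an issue only counts as fixed once a re-screenshot is attached. `RetroLog` and the file paths are hypothetical names for illustration.

```typescript
type IssueStatus = 'open' | 'fixed';

interface RetroIssue {
  id: number;
  description: string;
  evidence: string;       // screenshot that shows the bug
  status: IssueStatus;
  confirmation?: string;  // re-screenshot that shows the fix
}

class RetroLog {
  private issues: RetroIssue[] = [];

  /** Record a new issue together with the screenshot that proves it. */
  log(description: string, evidence: string): RetroIssue {
    const issue: RetroIssue = {
      id: this.issues.length + 1,
      description,
      evidence,
      status: 'open',
    };
    this.issues.push(issue);
    return issue;
  }

  /** Mark an issue fixed; a confirmation screenshot is required. */
  confirmFixed(id: number, confirmation: string): void {
    const issue = this.issues.find((i) => i.id === id);
    if (!issue) throw new Error(`No issue with id ${id}`);
    issue.status = 'fixed';
    issue.confirmation = confirmation;
  }

  openIssues(): RetroIssue[] {
    return this.issues.filter((i) => i.status === 'open');
  }

  /** The retro is done only when nothing is left open. */
  isDone(): boolean {
    return this.openIssues().length === 0;
  }
}

const retro = new RetroLog();
const bold = retro.log('Bold text renders as plain text', 'retro/dashboard-desktop.png');
retro.log('Category bar missing on mobile', 'retro/dashboard-mobile.png');
retro.confirmFixed(bold.id, 'retro/fixed/dashboard-desktop.png');
console.log(retro.openIssues().map((i) => i.description), retro.isDone());
```

The design choice that matters is `isDone()`: the session ends when the open list is empty, not when the tickets are filed.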

The Functional Checklist

Before evaluating aesthetics, verify function. A visually perfect component that crashes on empty state is worse than an ugly one that handles every edge case.

## Functional Checklist
- [ ] Every component renders without errors (check console)
- [ ] All links navigate to correct destinations
- [ ] Tabs and navigation switches work at all breakpoints
- [ ] API calls return data and populate components
- [ ] Markdown/rich text parses and renders correctly
- [ ] Empty states display appropriate messages (not blank space or errors)
- [ ] Loading states show indicators (not frozen UI)
- [ ] Error states show user-friendly messages (not stack traces)
- [ ] Forms validate input and show validation messages
- [ ] Auth-gated routes redirect correctly when unauthenticated

Run this checklist by interacting with the application, not by reading the code. Click every link. Switch every tab. Submit an empty form. Navigate while unauthenticated. These are the interactions that surface bugs code review cannot see.
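
The first item, "check console", is the one piece of this checklist that is easy to hand to a script. The sketch below collects console errors and uncaught page errors while you click around. It is typed against a small interface of my own (`PageEvents`) rather than importing Playwright, so it runs without a browser; the interface is intended to be structurally compatible with Playwright's `page.on('console', ...)` and `page.on('pageerror', ...)`, but treat that as an assumption to verify in your setup.

```typescript
// Minimal shapes for the two events used below (assumed to match Playwright's Page).
interface ConsoleMessageLike {
  type(): string;
  text(): string;
}

interface PageEvents {
  on(event: 'console', listener: (msg: ConsoleMessageLike) => void): unknown;
  on(event: 'pageerror', listener: (err: Error) => void): unknown;
}

/**
 * Start collecting errors from a page. Call this before navigating,
 * then inspect the returned array after each interaction.
 */
function collectConsoleErrors(page: PageEvents): string[] {
  const errors: string[] = [];

  // console.error(...) calls made by the application
  page.on('console', (msg) => {
    if (msg.type() === 'error') {
      errors.push(msg.text());
    }
  });

  // Uncaught exceptions thrown inside the page
  page.on('pageerror', (err) => {
    errors.push(err.message);
  });

  return errors;
}
```

In a Playwright script you would call `collectConsoleErrors(page)` before `page.goto(...)`, exercise the page, and then fail the run if the array is not empty. It covers one checkbox. The rest of the list still needs a human clicking through the application.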

The Aesthetic Checklist

After function is verified, evaluate form:

## Aesthetic Checklist
- [ ] Visual hierarchy is clear (most important content is most prominent)
- [ ] Color coding is meaningful and consistent (not decorative)
- [ ] Information density is balanced (not too sparse, not overwhelming)
- [ ] Typography is consistent (font sizes, weights, line heights)
- [ ] Spacing is rhythmic (consistent gaps between sections)
- [ ] Alignment is precise (no elements offset by 1-2 pixels)
- [ ] Responsive transitions are smooth (no jarring layout shifts between breakpoints)
- [ ] Dark mode/light mode are both complete (no unthemed components)
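
Most of this list is judgment, but the alignment item can be spot-checked with numbers. A minimal sketch, assuming you already have bounding boxes for elements that are meant to share a top edge (Playwright's `locator.boundingBox()` returns this `x`/`y`/`width`/`height` shape). The function name and the 2px default are my own choices, not part of any library.

```typescript
interface Box {
  name: string;
  x: number;
  y: number;
  width: number;
  height: number;
}

interface Misalignment {
  name: string;
  offset: number; // px relative to the most common top edge
}

/**
 * Flag boxes whose top edge is off by a small amount (1..maxOffset px)
 * from the most common top edge. Larger differences are treated as a
 * different row, not as a misalignment.
 */
function findMisalignedTops(boxes: Box[], maxOffset: number = 2): Misalignment[] {
  if (boxes.length === 0) return [];

  // Find the most common y value and use it as the reference edge.
  const counts = new Map<number, number>();
  boxes.forEach((b) => counts.set(b.y, (counts.get(b.y) ?? 0) + 1));

  let reference = boxes[0].y;
  let best = 0;
  counts.forEach((count, y) => {
    if (count > best) {
      best = count;
      reference = y;
    }
  });

  return boxes
    .map((b) => ({ name: b.name, offset: b.y - reference }))
    .filter((m) => m.offset !== 0 && Math.abs(m.offset) <= maxOffset);
}

// Hypothetical measurements: the third card sits 1px lower than its neighbors.
const cards: Box[] = [
  { name: 'card-1', x: 0, y: 120, width: 300, height: 96 },
  { name: 'card-2', x: 324, y: 120, width: 300, height: 96 },
  { name: 'card-3', x: 648, y: 121, width: 300, height: 96 },
];
console.log(findMisalignedTops(cards)); // flags card-3 with offset 1
```

A 1px offset is hard to spot in a full-page screenshot, which is why it helps to measure it once you suspect it.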

The Docker Gotcha (Again)

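This matters enough to repeat: if your UI runs in Docker with a build: context, restarting the container does not apply your code changes. A restart reuses the image that was already built, so it serves whatever assets were baked in at the last build.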

# This serves the OLD build:
docker compose restart frontend

# This serves the NEW build:
docker compose build frontend && docker compose up -d frontend

I have watched engineers spend 30 minutes debugging a "CSS bug" that was actually stale assets from a container that was never rebuilt. The visual retro shows the old UI. The engineer checks the code — the code is correct. They change the CSS. Re-screenshot. Same old UI. More confusion.

The fix is always the same: rebuild the image, then bring the container back up. Build the muscle memory.
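
If stale assets keep fooling you, one option is to make the running build identify itself. The sketch below is hypothetical throughout: it assumes you bake a commit SHA into the frontend at build time and can read it back somehow (the `fetchServedCommit` callback stands in for that), then compares it with your local commit before the retro starts.

```typescript
/**
 * True when two commit identifiers refer to the same commit.
 * Allows a short SHA on either side by comparing prefixes.
 */
function sameCommit(a: string, b: string): boolean {
  const x = a.trim().toLowerCase();
  const y = b.trim().toLowerCase();
  if (x.length === 0 || y.length === 0) return false;
  return x.startsWith(y) || y.startsWith(x);
}

/**
 * Hypothetical pre-retro check. `fetchServedCommit` is whatever you use to
 * read the commit baked into the running build (for example, a version file
 * written during the image build). Nothing here is a Docker or Playwright API.
 */
async function isServingStaleBuild(
  fetchServedCommit: () => Promise<string>,
  localCommit: string,
): Promise<boolean> {
  const served = await fetchServedCommit();
  return !sameCommit(served, localCommit);
}
```

If the check reports a stale build, stop and rebuild before capturing anything. Screenshots of the old UI are worse than no screenshots.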

Automating the Visual Retro

For teams and CI pipelines, the visual retro can be partially automated:

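// In CI: compare against baseline screenshots
import { test, expect } from '@playwright/test';

test('visual regression check', async ({ page }) => {
  // Relative URL: assumes baseURL is set in the Playwright config
  await page.goto('/dashboard');
  await page.waitForLoadState('networkidle');

  // Compares against the stored baseline
  await expect(page).toHaveScreenshot('dashboard-desktop.png', {
    maxDiffPixelRatio: 0.01, // at most 1% of pixels may differ
    threshold: 0.2, // per-pixel color tolerance, 0 (strict) to 1 (lax)
  });
});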

Automated visual regression catches unintentional changes. But it does not replace the human retro for new features — because there is no baseline to compare against. For new features, the first retro is manual. After that, the screenshots become the baseline.
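
It also helps to know how much a threshold like `maxDiffPixelRatio: 0.01` actually lets through. A rough worked example, assuming Playwright Test's default 1280×720 viewport, a device scale factor of 1, and a viewport-only screenshot. The 4×120px strip is a hypothetical stand-in for a stat card that grew 4 pixels wider; the exact comparison Playwright performs may differ in detail from this arithmetic.

```typescript
/** Approximate number of pixels allowed to differ for a given screenshot size and ratio. */
function maxDifferingPixels(
  width: number,
  height: number,
  maxDiffPixelRatio: number,
  deviceScaleFactor: number = 1,
): number {
  const totalPixels = width * deviceScaleFactor * height * deviceScaleFactor;
  return Math.round(totalPixels * maxDiffPixelRatio);
}

const budget = maxDifferingPixels(1280, 720, 0.01); // 9216 pixels
const fourPixelStrip = 4 * 120;                      // 480 pixels (hypothetical card height)

console.log({ budget, fourPixelStrip, withinBudget: fourPixelStrip <= budget });
```

Under these assumptions, a 1% ratio tolerates roughly nine thousand changed pixels, and a 4-pixel-wide strip is a small fraction of that. A regression that small could pass the automated check, which is one more reason the human review stays in the loop. Tighten the ratio if your pages render deterministically enough to allow it.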

The visual retro catches the bugs you cannot see in code. It is the final lens between "it compiles" and "it works." This is the InDecision principle applied to UI: decisions made without complete visual information compound into user-facing failures.

Lesson 142 Drill

  1. Run a visual retro on your current project right now. Capture screenshots at 1280px, 768px, and 375px for your three most important pages. Review each screenshot and log every issue you find.
  2. Run the functional checklist against one page of your application. Click every link, switch every tab, submit every form. Log every failure.
  3. Run the aesthetic checklist against the same page. Identify at least two spacing or alignment inconsistencies. Fix them, re-screenshot, and confirm the fix.
  4. If you use Docker: check whether your current running container reflects your latest code. Rebuild and compare. Track how often stale assets fool you.