ASK KNOX
LESSON 143

The Delivery Checklist: From Merged PR to Running System

A merged PR is not a delivery. A running system is. The full quality gate, the compound learning retro, and the operational discipline that separates shipping from done.

12 min read·Quality Engineering Mastery

You have written the test strategy. You have run E2E tests against live APIs. You have audited the code with a multi-agent swarm. You have run a visual retro at three breakpoints.

The PR is merged. CI is green. The team moves on.

And then the system breaks in production because nobody verified that the process actually starts.

The Full Quality Gate

This is the consolidated checklist from this entire track. Every item exists because we shipped without it at least once and paid the price.

## Quality Gate — Complete Checklist

### Code Quality
- [ ] 90% coverage floor met (pytest+cov / vitest+coverage-v8)
- [ ] CI green in clean environment (not just local)
- [ ] No new lint warnings or errors
- [ ] Regression test exists for every bug fix in this PR
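The coverage floor is only a gate if CI enforces it automatically. One way to wire that in for a Python project is shown below; the `src` path is a placeholder for your package, and vitest projects can set `coverage.thresholds` in their vitest config to the same effect:

```toml
# pyproject.toml — make pytest fail whenever coverage drops below the floor,
# so the gate holds in CI without anyone reading a coverage report.
[tool.pytest.ini_options]
addopts = "--cov=src --cov-fail-under=90"

[tool.coverage.report]
show_missing = true  # list uncovered lines so the gap is actionable
```

With this in place, "90% coverage floor met" stops being a judgment call and becomes a red or green CI run.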

### E2E Validation
- [ ] E2E tests pass against real APIs (not just mocks)
- [ ] Process starts and stays running for > 5 minutes
- [ ] Logs are clean (no unexpected errors or warnings)
- [ ] Data flows through the full pipeline (input → processing → output)
- [ ] State files are created/updated correctly
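The "logs are clean" item above can be made mechanical rather than eyeballed. A minimal sketch, assuming your failure signatures are simple substrings (extend `SUSPECT` with the patterns your system actually emits):

```python
# Hypothetical log-cleanliness check for the E2E gate: return every line
# that matches a known failure signature. An empty result means clean logs.
SUSPECT = ("error", "traceback", "warning")

def unclean_lines(log_text: str) -> list[str]:
    """Return log lines that match any suspect pattern (case-insensitive)."""
    return [line for line in log_text.splitlines()
            if any(pattern in line.lower() for pattern in SUSPECT)]
```

Running this against the last few hundred lines of the service log after the soak period turns "logs look fine" into a checkable assertion.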

### Visual QA (UI deliveries)
- [ ] Playwright screenshots captured at 1280px, 768px, 375px
- [ ] Functional checklist passed (renders, links, tabs, API calls, empty states)
- [ ] Aesthetic checklist passed (hierarchy, color, density, alignment)
- [ ] Visual diff shows only intentional changes

### External Dependencies
- [ ] All external prerequisites confirmed (API keys, portal settings, permissions)
- [ ] Third-party API contracts verified (response shapes match expectations)
- [ ] Rate limits understood and respected
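Contract verification can also be scripted instead of trusted. A minimal sketch, where the required fields and type checks are hypothetical examples; replace them with the shape your code actually depends on:

```python
# Hypothetical contract check: confirm a third-party response still has the
# shape our code expects, and report every violation rather than the first.
REQUIRED_FIELDS = {"id", "status", "created_at"}

def verify_contract(payload: dict) -> list[str]:
    """Return a list of contract violations; an empty list means it matches."""
    problems = []
    missing = REQUIRED_FIELDS - payload.keys()
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    if "status" in payload and not isinstance(payload["status"], str):
        problems.append("status is not a string")
    return problems

# A conforming payload produces no violations; a drifted one is caught loudly.
good = {"id": 1, "status": "open", "created_at": "2026-03-20T00:00:00Z"}
bad = {"id": 2, "status": 404}
```

Run this against a live response during E2E validation, not just against a recorded fixture, and the "response shapes match expectations" item becomes evidence rather than hope.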

### Operational
- [ ] Process monitored (watchdog, health check, or equivalent)
- [ ] Alerting configured for failure signatures
- [ ] Rollback plan documented (or deployment is reversible by default)

Not every delivery requires every item. A documentation update does not need Playwright screenshots. A backend-only change does not need visual QA. But the items that apply must all pass. No exceptions for "we will fix it after deploy."

Regression Tests for Every Bug

This rule is non-negotiable: every bug fix ships with a regression test. The test must fail before the fix and pass after.

Why? Because bugs recur. The conditions that created the bug the first time will create it again — through a refactor, a dependency update, a configuration change. The regression test is the insurance policy.

def test_pagination_returns_different_pages():
    """
    Regression: urljoin drops base path with leading slash.
    Fixed in commit abc123. Must not regress.
    """
    # client is the API client under test (e.g., provided by a pytest fixture)
    page_1 = client.fetch(page=1)
    page_2 = client.fetch(page=2)
    assert set(page_1) != set(page_2)

The comment matters. It links the test to the original bug. When someone reads this test in 6 months and wonders why it exists, the comment tells the story.

External Prerequisites Are Blocking

We shipped a feature that required a Discord bot permission ("Server Members Intent") to be enabled in the Developer Portal. The code was correct. The tests passed. The deployment succeeded. The bot crashed on startup because it could not read the member list.

The fix was a checkbox in a web portal. It took 30 seconds. But the time from deploy to realizing the crash to investigating the logs to identifying the missing permission to finding the right portal page to enabling the setting was 2 hours.

That 30-second checkbox should have been a blocking prerequisite in the PR description. Not a comment. Not a TODO. A blocking item with a checkbox that was checked before merge.

## External Prerequisites (BLOCKING — do not merge until confirmed)
- [x] Discord Developer Portal: Server Members Intent enabled
- [x] API key provisioned and added to .env on target server
- [ ] DNS record updated for new subdomain  ← NOT CONFIRMED, DO NOT MERGE
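Prerequisites that live in environment variables can be enforced in code as well as in the PR description. A small sketch of a startup preflight; the variable names are hypothetical stand-ins for whatever your service requires:

```python
import os

# Hypothetical startup preflight: fail loudly at boot when a required
# secret or setting is absent, instead of crashing deep in a request path.
REQUIRED_ENV = ("DISCORD_TOKEN", "API_KEY")

def preflight() -> list[str]:
    """Return the names of required environment variables that are unset."""
    return [name for name in REQUIRED_ENV if not os.environ.get(name)]
```

Portal-side settings like the Discord intent cannot be checked this way, which is exactly why they must be blocking checkboxes in the PR; but everything that *can* be checked at startup should be.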

State File Validation

E2E validation is not just about outputs. It is about state.

Many systems maintain state files — databases, JSON files, lock files, position logs. If the output looks correct but the state file is corrupted, the next run will fail. Or worse: the next run will silently use corrupted state and produce incorrect output that looks correct.

After every delivery that touches state management:

## State Validation
- [ ] State file exists at expected path
- [ ] State file schema matches expected format
- [ ] State file contains correct data after a complete run
- [ ] State file is cleaned up correctly on shutdown/restart
- [ ] Concurrent access does not corrupt state (lock mechanism works)
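The first three items above can be scripted. A minimal sketch, assuming a JSON state file with a known set of top-level keys; the path, keys, and type checks are hypothetical and should mirror your system's real schema:

```python
import json
from pathlib import Path

# Hypothetical post-run state validation for a JSON state file: existence,
# parseability, schema keys, and one basic type check, reported together.
EXPECTED_KEYS = {"last_run", "position", "sequence"}

def validate_state(path: Path) -> list[str]:
    """Return human-readable problems; an empty list means the state passes."""
    if not path.exists():
        return [f"state file missing: {path}"]
    try:
        state = json.loads(path.read_text())
    except json.JSONDecodeError as exc:
        return [f"state file is not valid JSON: {exc}"]
    if not isinstance(state, dict):
        return ["state root is not an object"]
    problems = []
    missing = EXPECTED_KEYS - state.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if not isinstance(state.get("sequence"), int):
        problems.append("sequence is not an integer")
    return problems
```

Run it after a complete E2E pass and again after a restart; the "silently corrupted state" failure mode described above is exactly what this is designed to catch early.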

At Tesseract Intelligence, we run 49+ applications that manage trading positions, market data, and decision state. A corrupted state file in a trading engine does not just cause a bug — it causes financial loss. State validation is not optional.

Process Monitoring

The delivery is not done when the process starts. It is done when you have verified the process stays running.

# Start the process (replace com.example.service with your actual launchd label)
launchctl load ~/Library/LaunchAgents/com.example.service.plist

# Wait 5 minutes
sleep 300

# Verify it is still running
launchctl list | grep com.example.service

# Check logs for errors
tail -50 /tmp/service.log | grep -i error

If the process crashes 3 minutes after start due to a timing-dependent bug, a "starts successfully" check will not catch it. You need the 5-minute soak test.

For critical services, this extends to a watchdog — an external process that monitors your service and restarts it on failure. But the first-deploy validation is manual: start, wait, verify.
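The core primitive a watchdog needs is a liveness check. A minimal, portable sketch using signal 0 (which delivers nothing but verifies the process exists); the restart loop, sleep interval, and relaunch command are left to your deployment:

```python
import os

# Liveness primitive behind a watchdog: "is this PID alive?" Sending
# signal 0 performs the existence/permission check without delivering
# any signal to the target process.
def is_alive(pid: int) -> bool:
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False          # no such process
    except PermissionError:
        return True           # exists, but owned by another user
    return True
```

A real watchdog wraps this in a loop that sleeps, checks, logs, and restarts on failure; but even the manual first-deploy soak test is just this check, performed five minutes after start.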

The Compound Learning Retro

The final component of the delivery checklist is the retro. Not a weekly retrospective meeting. A per-session retro that runs after every coding session.

The format:

## [2026-03-20] — Category: Quality Engineering

**Mistake:** Deployed without verifying the category bar renders on mobile.

**Root Cause:** Visual retro was skipped due to time pressure. Assumed code review
was sufficient for a CSS-only change.

**Rule:** Every UI delivery requires Playwright screenshots at 3 breakpoints.
No exceptions for "small CSS changes."

**Detection Latency:** 48 hours (user report)

**Detection Method:** User screenshot in Discord support channel

**Alerting Gap:** No automated visual regression tests in CI pipeline.
Should have Playwright visual comparison on every PR that touches CSS.

Each retro entry answers six questions:

  1. What went wrong?
  2. Why did it go wrong?
  3. What rule prevents it from happening again?
  4. How long did it take to detect?
  5. How was it detected?
  6. What alerting should have caught it earlier?

The retro goes into the project's lessons.md file. Critical lessons get escalated to CLAUDE.md so they apply to every future session. The most important lessons get indexed in long-term memory so they survive across projects.
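The six-field format can be linted before an entry lands in lessons.md, so an incomplete retro is rejected the same way an incomplete checklist is. A small sketch, assuming the bold-label markdown format shown in the example entry above:

```python
import re

# Lint a retro entry: report which of the six required sections are absent.
# Section labels match the example format ("**Mistake:** ..."); adjust the
# list if your template uses different field names.
REQUIRED_SECTIONS = [
    "Mistake", "Root Cause", "Rule",
    "Detection Latency", "Detection Method", "Alerting Gap",
]

def missing_retro_fields(entry: str) -> list[str]:
    """Return required section labels not present in the retro entry."""
    return [s for s in REQUIRED_SECTIONS
            if not re.search(rf"\*\*{re.escape(s)}:\*\*", entry)]
```

Hook this into a pre-commit check on lessons.md and the retro discipline enforces itself.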

The Delivery Is Not the PR

The PR is the proposal. The deployment is the attempt. The running system is the delivery.

PR merged        → you proposed a change
Deploy succeeded → the change reached the environment
Process starts   → the change is running
Logs clean       → the change is not crashing
Data flows       → the change is doing its job
State correct    → the change is not corrupting anything
Retro logged     → the learning is captured

Only when all seven lines are true is the delivery complete.

Lesson 143 Drill

  1. Take your last merged PR. Run through the full quality gate checklist above. Check every item that was actually verified before merge. Count the items that were not verified. That gap is your quality debt.
  2. Find one bug fix in your recent history that does not have a regression test. Write the test. Make it fail against the old code. Make it pass against the fix.
  3. Write a compound learning retro for the last bug you encountered. Fill in all six fields: mistake, root cause, rule, detection latency, detection method, alerting gap. Put it in your project's lessons.md.
  4. For one running service, verify right now: Is the process running? Are the logs clean? Is the state file correct? How long has it been since you last checked?

You have completed the Quality Engineering Mastery track.

You now have:

  • A mental model that treats quality as a lens, not a phase
  • A test strategy discipline that plans verification before code
  • E2E testing that catches what mocks cannot
  • Multi-agent audits that find bugs no single reviewer sees
  • Visual QA that sees what code review cannot
  • A delivery checklist that defines done as a running system

The quality gate is not bureaucracy. It is the difference between shipping and shipping with confidence. Go build something and verify it actually works.