The Six Checkpoints
Between 'I want a feature' and 'the feature is shipped' there are six quality checkpoints. Most vibe coders use zero. Adding even one (tests) catches roughly 70% of bugs before production. This lesson teaches all six and how they compound.
Between the thought "I want a feature" and the reality "the feature is shipped and working," there are six opportunities to catch bugs. Six gates where a defect can be intercepted before it reaches users.
Most vibe coders use zero of these gates. They prompt the AI, paste the output, push to production. The bugs go straight from the AI's imagination to the user's experience with nothing in between.
This lesson teaches all six checkpoints, what each one catches, and — critically — how they compound. Because the math here is not intuitive. Adding tests alone does not give you 10% improvement. It gives you 40 percentage points. The compounding effect of multiple checkpoints is what separates production systems from prototypes.
Checkpoint 1: The Design Spec
Before you prompt the AI, write down what you want. Three to five sentences. Not a novel. Not a requirements document. A spec.
What does the feature do? What inputs does it take? What outputs does it produce? What constraints must it satisfy? What should it explicitly NOT do?
This sounds trivial. It eliminates 15% of bugs.
Here is why: without a spec, you prompt the AI with vague instructions. The AI makes assumptions. Those assumptions are often wrong — because the AI does not know your system, your users, or your constraints. A spec forces you to think before you prompt, which means the AI gets better instructions, which means the output is closer to what you actually need.
The spec template:
Feature: [name]
Purpose: [what it does and why]
Inputs: [what data it receives]
Outputs: [what it produces]
Constraints: [what it must NOT do, edge cases to handle]
Five lines. Takes 90 seconds to write. Catches scope creep, misunderstood requirements, and vague prompts before they produce vague code.
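A filled-in example of the template (the feature and its details are hypothetical, chosen only to show the level of specificity that works):

```
Feature: Password reset
Purpose: Let a user who forgot their password set a new one via an emailed link
Inputs: Account email address; new password on the reset form
Outputs: Time-limited reset token sent by email; updated password hash
Constraints: Must NOT reveal whether the email exists; token expires in 15 minutes; all existing sessions are invalidated on reset
```

Note how the Constraints line does most of the work: it is the line that stops the AI from guessing.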
Checkpoint 2: Output Review
The AI generated code. Read it.
Not "glance at it." Not "scroll past it." Read every function, every conditional, every import. Look for:
- Hallucinated APIs: The AI invented a function that does not exist in the library it claims to use.
- Wrong assumptions: The AI assumed a variable is always a string when it can be null.
- Architectural mismatches: The AI used a pattern that contradicts how the rest of your system works.
- Missing error handling: The happy path works. The error path crashes.
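Here is what that review looks like in practice. The function below is an illustrative stand-in for a typical AI draft, with the review findings recorded as comments; the names and the fallback value are hypothetical:

```python
def get_username(user):
    # AI draft was a one-liner: return user["name"].strip().title()
    # Review finding 1 (wrong assumption): user can be None and
    # "name" can be missing, so the draft crashes on the error path.
    if user is None:
        return "anonymous"
    name = user.get("name")
    # Review finding 2 (missing error handling): empty or
    # whitespace-only names should fall back, not return "".
    if not name or not name.strip():
        return "anonymous"
    return name.strip().title()

print(get_username(None))                        # anonymous
print(get_username({"name": " ada lovelace "}))  # Ada Lovelace
```

The draft passed a quick glance and the happy path worked. Only a line-by-line read surfaced the two crashes waiting in the error path.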
Output review is the checkpoint that most separates experienced AI-assisted developers from vibe coders. The experienced developer treats AI output as a first draft from a junior engineer — useful, but requiring review before it enters the codebase.
Checkpoint 3: Tests
This is the checkpoint that changes everything. The single largest jump in quality from any one investment.
Tests come in three flavors, and you need all three:
Happy path tests verify that the feature works when given valid input. If the feature is "create a user," the happy path test creates a user with valid data and verifies the user exists afterward.
Error path tests verify that the feature fails gracefully when given invalid input. Null values, empty strings, negative numbers, missing fields. Every input that is not the happy path is an error path, and every error path needs a test.
Edge case tests verify behavior at the boundaries. What happens at zero? At the maximum? When the list is empty? When the database is down? Edge cases are where the most damaging bugs hide.
The target is coverage with assertions, not a coverage percentage chased for its own sake. Coverage without assertions is theater. A test that runs a function but does not check the result catches nothing. Every test must assert a specific expected outcome.
# Bad: coverage without meaning
def test_create_user():
    create_user("test@email.com", "password123")
    # No assertion. This test proves the function runs. Not that it works.

# Good: meaningful assertion
def test_create_user():
    user = create_user("test@email.com", "password123")
    assert user.email == "test@email.com"
    assert user.id is not None
    assert user.created_at is not None
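The error path and edge case flavors look like this. The sketch below includes a minimal stand-in for create_user (the real implementation, its validation rules, and the 8-character password minimum are assumptions for illustration) so the tests are runnable:

```python
def create_user(email, password):
    # Stand-in implementation with hypothetical validation rules.
    if not email or "@" not in email:
        raise ValueError("invalid email")
    if len(password) < 8:
        raise ValueError("password too short")
    return {"email": email, "password_len": len(password)}

def test_create_user_rejects_bad_email():
    # Error path: invalid input must fail loudly, not silently succeed.
    try:
        create_user("not-an-email", "password123")
        assert False, "expected ValueError"
    except ValueError:
        pass

def test_create_user_password_boundary():
    # Edge case: exactly at the assumed 8-character minimum should pass.
    user = create_user("test@email.com", "12345678")
    assert user["email"] == "test@email.com"
    assert user["password_len"] == 8

test_create_user_rejects_bad_email()
test_create_user_password_boundary()
print("all tests passed")
```

The error path test asserts that the failure happens, not just that nothing crashed; the edge case test sits exactly on the boundary, where off-by-one bugs live.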
Checkpoint 4: CI/CD Validation
Tests on your local machine are useful. Tests that run automatically on every commit are transformative.
CI/CD (Continuous Integration / Continuous Deployment) means every commit triggers an automated pipeline that runs your tests, checks your linting, validates your types, and only allows the code to proceed if everything passes.
The key word is "automatically." You do not have to remember to run tests. You do not have to trust that the developer ran them before pushing. The pipeline enforces it. Every time. No exceptions.
This is the difference between a policy ("we should write tests") and enforcement ("you cannot merge without passing tests"). Policies get forgotten under deadline pressure. Enforcement does not.
At a minimum, your CI pipeline should:
- Run the full test suite
- Check for linting errors
- Verify type safety (if using TypeScript, Python type hints, etc.)
- Block merges on failure
This takes 30 minutes to set up with GitHub Actions. It saves hundreds of hours over the life of a project.
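A minimal GitHub Actions workflow along these lines looks roughly like this. This is a sketch, not a drop-in file: it assumes a Python project using ruff, mypy, and pytest, so swap in your own stack's commands:

```yaml
# .github/workflows/test.yml -- sketch; adjust commands to your stack
name: CI
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements.txt
      - run: ruff check .   # linting
      - run: mypy .         # type safety
      - run: pytest         # full test suite
```

The "block merges on failure" half comes from branch protection: require this workflow to pass before merging, and the pipeline becomes enforcement rather than policy.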
Checkpoint 5: PR Review
A second pair of eyes. This can be a human teammate, an AI code reviewer like CodeRabbit, or both. The point is that someone other than the author examines the code before it merges.
PR reviews catch a specific class of bugs that tests miss: structural and design problems. A function that works correctly but is in the wrong layer. A pattern that contradicts the system's conventions. A security vulnerability that passes all functional tests but exposes sensitive data.
The reviewer does not need to be more experienced than the author. They need to be a different person (or a different AI session) with a different perspective. Fresh eyes catch what familiar eyes skip.
Checkpoint 6: E2E Validation
The final gate. Deploy to the real environment. Test with real data. Verify that the entire system works end to end — not in isolation, not with mocks, not in a test environment with fake data. In reality.
E2E validation catches the bugs that every other checkpoint misses: integration failures between real services, state management bugs that only appear with real data volumes, configuration mismatches between development and production environments.
This is not optional. This is the difference between "the code works" and "the product works." Merged PRs are not features. Running systems are features.
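The spirit of E2E validation, in miniature: exercise the real request path against a real running service, with no mocks between you and the system. The sketch below stands up an actual HTTP server in-process as a stand-in for a staging deployment; the /health endpoint and its payload are illustrative assumptions:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    # Stand-in for a deployed service with a hypothetical /health endpoint.
    def do_GET(self):
        if self.path == "/health":
            body = json.dumps({"status": "ok"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        pass  # keep output quiet

# Start a real server on an OS-assigned port, then hit it over real HTTP.
server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_port}/health"
with urllib.request.urlopen(url) as resp:
    status = resp.status
    payload = json.loads(resp.read())

assert status == 200
assert payload["status"] == "ok"
server.shutdown()
print("E2E smoke check passed")
```

Against your actual product, the same shape applies: point the check at your staging or production URL and assert on what a real user's request actually returns.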
The Gap Analysis
No single checkpoint catches everything. That is the entire point of having six of them. Each gate is specialized — it catches specific bug types that the others miss.
The pattern: logic errors are caught early (tests, review). Visual bugs slip through almost everything until E2E. Integration bugs require the real environment. Security holes need expert review.
This is why the pipeline is six checkpoints, not one. The system works because the gates complement each other. Remove any single gate and an entire category of bugs gets a free pass to production.
The Compounding Math
Here is the number that should motivate every builder reading this:
Adding zero checkpoints: 0% of bugs caught before production.
Adding just tests: 70% of bugs caught. One investment, and a forty percentage point jump over what spec and output review achieve on their own. This is the single highest-ROI quality investment you can make.
Adding tests plus CI/CD: 85%. Automated enforcement of the tests you already wrote. Fifteen more percentage points for 30 minutes of setup.
All six checkpoints: 97%. Near-zero bug escapes. The remaining 3% are the truly novel edge cases that only real-world usage surfaces — and at that point, circuit breakers (Lesson 151) contain their blast radius.
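The compounding works because each gate only sees the bugs that escaped the earlier ones. A quick sketch makes the arithmetic concrete; the per-gate catch rates below are assumptions tuned to reproduce the lesson's headline numbers (15%, 70%, 85%, 97%), not measured values:

```python
# Assumed per-gate catch rates (fraction of remaining bugs caught).
rates = {
    "spec": 0.15,
    "output review": 0.18,
    "tests": 0.57,
    "ci/cd": 0.50,
    "pr review": 0.40,
    "e2e": 0.67,
}

escape = 1.0  # fraction of bugs that slip past all gates so far
for gate, rate in rates.items():
    escape *= 1 - rate
    print(f"through {gate}: {100 * (1 - escape):.0f}% of bugs caught")
# through spec: 15% ... through tests: 70% ... through e2e: 97%
```

Notice that the later gates add fewer points despite respectable catch rates: by the time code reaches E2E, most bugs are already dead. That is the compounding, and it is why removing any one gate costs more than its headline percentage suggests.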
The InDecision analysis engine runs all six checkpoints. Every signal processing change goes through spec, review, tests, CI, PR review, and E2E validation against live market data. The result is a system that processes thousands of data points daily with near-zero quality escapes.
Lesson 152 Drill
Implement your first quality checkpoint:
- Current state: How many of the six checkpoints does your project currently use? Write them down honestly. If the answer is zero, that is fine — it means every checkpoint you add will have massive impact.
- Write three tests: Pick the most critical function in your codebase. Write one happy path test, one error path test, and one edge case test. If you have never written tests, ask your AI tool: "Write a test file for [function name] with happy path, error path, and edge case tests."
- Set up CI: Create a .github/workflows/test.yml file that runs your tests on every push. This is a 15-line YAML file. The investment-to-return ratio is astronomical.
- Review the last AI output: Open the last piece of AI-generated code you committed. Read every line. Look for hallucinated functions, wrong assumptions, and missing error handling. Write down what you find.
- E2E one feature: Pick one feature and test it end to end in your production or staging environment. Not with mocks. With real data. Document what works and what breaks. The gap between "tests pass" and "feature works in production" is often wider than expected.