ASK KNOX
LESSON 156

The PR Review Checkpoint: Reviewing AI-Generated Code Before It Ships

AI-generated code passes tests, follows patterns, and looks clean. That is exactly why it is dangerous to review casually. The PR review checkpoint is where you catch what automation misses — and for AI code, what automation misses is bigger than you think.

10 min read·Ship, Don't Just Generate

Here is the trap with AI-generated code: it looks correct.

The syntax is clean. The formatting is consistent. The variable names are reasonable. The tests pass. The linting is green. Every surface-level signal says "this is good code."

And that is exactly what makes it dangerous to review.

Human-written code has tells. Inconsistent formatting. Typos in comments. That one developer who always forgets semicolons. These imperfections actually help reviewers — they create friction that forces careful reading. AI-generated code has no tells. It is uniformly clean, uniformly structured, and uniformly confident. The bugs are not on the surface. They are in the assumptions.

Why Review Matters More for AI Code

When a human writes code, they carry context. They know why they chose this approach over that one. They know the tradeoffs they considered. They know the edge cases they thought about but decided not to handle yet. You can ask them during review, and they can explain.

When AI writes code, there is no "why." There is only "what." The AI chose the most likely completion given the prompt and context. It did not weigh tradeoffs. It did not consider your deployment environment. It did not think about the three services that depend on the function it just modified.

This means the reviewer has a different job. You are not just checking for bugs. You are checking for assumptions — the invisible decisions the AI made that might not align with your system's reality.

In our experience, roughly one in four AI-generated PRs carries an assumption bug that passes all automated checks. These are not syntax errors. These are logical errors: wrong API endpoint assumptions, missing error states, incorrect data type assumptions, or behavior that works with test data but fails with production data.
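To make "works with test data but fails with production data" concrete, here is a hypothetical assumption bug — the function, field names, and data shapes are all illustrative, not from any real codebase. The generated version assumes every user record has an email; the test fixtures always include one, so every automated check stays green.

```python
# Hypothetical assumption bug: the generated version assumes every user
# record has an "email" key. Test fixtures always include it, so tests
# pass -- production rows do not.

def notify_user_generated(user: dict) -> str:
    # The AI's version: clean, confident, green on fixture-based tests.
    return f"Sending notification to {user['email']}"

def notify_user_reviewed(user: dict) -> str:
    # After review: the missing-email state is handled explicitly.
    email = user.get("email")
    if email is None:
        return "Skipped: user has no email on file"
    return f"Sending notification to {email}"

test_fixture = {"id": 1, "email": "dev@example.com"}  # what the tests see
production_row = {"id": 2}                            # what production sends

print(notify_user_generated(test_fixture))   # fine
print(notify_user_reviewed(production_row))  # handled
# notify_user_generated(production_row) would raise KeyError: 'email'
```

The bug is invisible to linting, typing, and the test suite — it only exists in the gap between the fixture's shape and production's shape. That gap is what the review checkpoint is for.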

The Five-Point Review Checklist

Every PR. Five questions. No exceptions.

This checklist takes 5 minutes per PR. The return on those 5 minutes is asymmetric — a single caught bug that would have reached production saves hours of debugging, rollback, and customer impact.

The checklist is ordered by importance. "Does it work?" is first because everything else is irrelevant if the feature does not actually function. "Could it break existing behavior?" is last because it requires the most context and is the most commonly skipped — which is exactly why it needs to be on the list.
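The ordering can be enforced mechanically. A minimal sketch of an ordered, priority-first checklist runner — note that only the first and last questions are named in this lesson; the middle three below are illustrative placeholders, not the lesson's actual list:

```python
# Minimal sketch of running the review checklist in priority order.
# Only the first and last questions are named in the lesson; the middle
# three are illustrative placeholders, not the lesson's actual list.

CHECKLIST = [
    "Does it work?",                      # the lesson's first question
    "Placeholder question 2",             # illustrative only
    "Placeholder question 3",             # illustrative only
    "Placeholder question 4",             # illustrative only
    "Could it break existing behavior?",  # the lesson's last question
]

def failed_checks(answers: dict) -> list:
    """Return every question not explicitly answered True, in priority order."""
    return [q for q in CHECKLIST if not answers.get(q, False)]

# An unanswered question counts as a failure -- skipping is not passing.
remaining = failed_checks({"Does it work?": True})
print(remaining)
```

The design choice worth copying is the default: a question you did not answer fails. That is what "no exceptions" looks like in code.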

The Self-Review Technique

You just spent 30 minutes working with AI to generate a feature. The code looks great. You are confident. You are about to push.

Stop. Wait 10 minutes.

This is not arbitrary. The 10-minute gap breaks anchoring bias. When you just generated the code, your brain assumes it is correct because you watched it being created. You saw the reasoning. You saw the tests pass. Your brain filed it as "reviewed."

After 10 minutes, you read it fresh. As if someone else wrote it. As if you are reviewing a PR from a colleague you do not entirely trust.

This is the self-review technique, and it catches more bugs than any automated tool because it applies human judgment at the exact moment when judgment is most compromised — right after generation, when confidence is highest and vigilance is lowest.

The code you just generated feels important and correct because you are thinking about it right now. The 10-minute gap lets System 2 (slow, analytical thinking) override System 1 (fast, pattern-matching thinking). This is where you catch the assumption bugs.
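If you want the gap enforced rather than remembered, a pre-push guard can refuse to proceed until the cooldown has elapsed. This is a hypothetical sketch — the function name and wiring are assumptions; in a real Git hook you might read the last commit's timestamp via `git log -1 --format=%ct` and pass it in:

```python
import time

# Hypothetical pre-push guard for the 10-minute self-review gap.
# In a real hook, generated_at would come from the last commit's
# timestamp (e.g. `git log -1 --format=%ct`); here it is passed in.

COOLDOWN_SECONDS = 10 * 60  # the lesson's 10-minute gap

def ready_for_review(generated_at, now=None):
    """True once the fresh-eyes window has elapsed since generation."""
    now = time.time() if now is None else now
    return (now - generated_at) >= COOLDOWN_SECONDS

# Three minutes after generation: still anchored, still waiting.
print(ready_for_review(generated_at=1000.0, now=1000.0 + 3 * 60))
# Twelve minutes after: System 2 gets its turn.
print(ready_for_review(generated_at=1000.0, now=1000.0 + 12 * 60))
```

Injecting `now` keeps the function pure and testable; only the hook itself touches the wall clock.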

AI-Assisted Review: The Two-AI Pattern

Here is a powerful technique: use a second AI to review the first AI's output.

The logic is simple. The first AI generated code within a specific context, with specific assumptions, optimizing for a specific prompt. The second AI has none of those biases. It reads the code fresh, with different assumptions, and catches the assumptions the first AI baked in.

We use this pattern on every significant PR at Tesseract Intelligence. CodeRabbit and Gemini code review run automatically on every PR. They catch structural issues, naming inconsistencies, and common anti-patterns. They do not catch visual bugs, state issues, or integration failures — but they catch enough to be worth the automated pass.

The key insight: AI review is a layer, not the layer. It stacks with human review. Neither alone is sufficient. Together, they cover significantly more ground.
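The mechanical part of the two-AI pattern is just packaging the diff into a fresh-context prompt. A sketch, with the actual send step deliberately left as a placeholder — which reviewer you call (CodeRabbit, Gemini, a second Claude session) depends on your setup, and the instruction wording here is an assumption, not a prescribed prompt:

```python
# Sketch of the two-AI pattern: wrap a diff in a reviewer prompt that
# carries none of the generator's context. The instruction wording is
# illustrative; the send step is a placeholder for your own API or CLI.

REVIEW_INSTRUCTIONS = (
    "You did not write this code. Review the diff below for assumption "
    "bugs: wrong API endpoint assumptions, missing error states, "
    "incorrect data types, and behavior that only works on test data. "
    "List concrete findings; do not restyle the code."
)

def build_review_prompt(diff: str) -> str:
    """Build a second-pass review prompt from a unified diff."""
    return f"{REVIEW_INSTRUCTIONS}\n\n```diff\n{diff}\n```"

sample_diff = "+    return f\"Sending notification to {user['email']}\""
prompt = build_review_prompt(sample_diff)
print(prompt)

# send_to_reviewer(prompt)  # placeholder: your reviewer's API/CLI call
```

The point of the wrapper is the first sentence: the second model is told it is a reviewer, not a generator, so it is not invested in the code being right.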

The "Explain This Code" Test

This is the simplest and most powerful review technique for AI-generated code.

The test is binary. Either you can explain the code in one sentence — what it does, why it does it, how it handles failure — or you cannot.

If you cannot explain it, you have two options: ask the AI to explain it (and verify the explanation matches your intent), or rewrite it until you can explain it yourself.

This test is especially critical for AI-generated code because AI can produce code that is more sophisticated than what you would write yourself. That sophistication is not always a feature. If the AI used a pattern you do not understand, you cannot debug it when it breaks. And it will break.

The Review Cadence

For AI-generated code, I recommend this cadence:

Every PR: 5-point checklist. 5 minutes. Non-negotiable.

Every significant PR: Self-review after 10-minute gap + AI-assisted review. 20 minutes total.

Every architectural change: Full walkthrough with "explain this code" test on every new function. 30-60 minutes.

This scales. Even at high AI generation volume — 5-10 PRs per day — the total review time is 1-2 hours. Compare that to the cost of a single production incident caused by an unreviewed assumption bug.
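The budget is easy to sanity-check. A back-of-envelope sketch — the mix of PR types on the "heavy day" is an assumption for illustration; the per-type minutes come from the cadence above (5, 20, and the midpoint of 30-60):

```python
# Back-of-envelope check of the review cadence budget. Per-type minutes
# follow the cadence above (45 = midpoint of 30-60); the daily PR mix
# is a hypothetical heavy day, chosen for illustration.

MINUTES = {"every": 5, "significant": 20, "architectural": 45}

def daily_review_minutes(counts: dict) -> int:
    """Total review minutes for a day's mix of PR types."""
    return sum(MINUTES[kind] * n for kind, n in counts.items())

busy_day = {"every": 7, "significant": 2, "architectural": 1}
print(daily_review_minutes(busy_day))  # 120 minutes: 35 + 40 + 45
```

Even a ten-PR day with two significant PRs and an architectural change lands at two hours — the top of the stated range, and still cheaper than one production incident.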

The InDecision Framework applies here: the cost of making a wrong decision (merging bad code) is far higher than the cost of slowing down to make a right decision (reviewing thoroughly). The asymmetry makes the investment obvious.

Lesson 156 Drill

  1. Pick your most recent AI-generated PR. Apply the 5-point checklist. Write down what you would have caught that you did not catch during the original review.
  2. Practice the self-review technique: generate a feature with AI, then wait 10 minutes before reviewing. Note what you catch in the fresh-eyes pass that you did not notice during generation.
  3. Set up automated AI review on one repository (CodeRabbit, or use a second Claude session to review PRs). Run it for one week and count how many issues it catches that human review missed — and vice versa.