ASK KNOX
beta
LESSON 171

Verify Before Fix

Fifteen percent of audit findings in a production codebase were false positives — bugs flagged by agents that didn't exist. One grep before writing code would have caught every one. This is the discipline.

10 min read · AI Code Audit Patterns

A 10-agent portfolio audit across 11 repos produced approximately 100 real bugs. It also produced findings for bugs that were not there.

On the InDecision Engine alone, 2 of 13 audit findings — 15% — were false positives. Both would have generated code changes, test additions, and PRs to fix problems that did not exist. One grep command each would have caught them before a single line was written.

This lesson is about that discipline: verify before fix.

The Two False Positives

False positive 1: the undefined methods.

The audit flagged _find_peaks() and _find_troughs() as undefined. Severity: P1. Label: potential crash on execution.

Both methods exist. They are @staticmethod definitions at line 768 of an 814-line file. The audit agent scanned the file top-down, processed it in chunks, and stopped reading before it reached those definitions. It reported what it could not find as what does not exist.

False positive 2: the hallucinated pattern.

The audit flagged the absence of a "tiered momentum scoring pattern" as a missing feature. The agent had seen enough codebases with this pattern to treat it as expected architecture. It assumed the pattern should be present, detected its absence, and filed a finding.

The pattern was never there. It was never supposed to be there. The finding was constructed entirely from training-data expectation, not from the actual codebase.

Why This Happens

Neither of these is a bug in the audit agent. Both are structural properties of how LLMs read code: a finite context window truncates long files, and training data supplies expectations about what a codebase should contain. The correct response is not to trust the agent less — it is to add a verification step that is immune to these failure modes.

grep does not have a context window. It does not have training expectations. It searches the entire file and reports what is there.

The Verify-Before-Fix Rule

The workflow is five steps:

  1. Read the audit finding
  2. Write the grep command that would confirm it
  3. Run the grep
  4. If found: write the fix
  5. If not found: mark as false positive, skip, document it

Step 4 and step 5 are equally valid outcomes. Neither wastes time. The waste comes from landing in step 4 when the answer should have been step 5.
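The gate in steps 3 through 5 can be sketched as a small shell function. This is a minimal sketch: the name verify_finding is hypothetical, and it assumes one grep pattern and one target file per finding.

```shell
# Hypothetical helper: run the verification grep for one audit finding.
# $1 is the grep pattern derived from the finding, $2 is the target file.
verify_finding() {
  pattern="$1"
  target="$2"
  if grep -q "$pattern" "$target" 2>/dev/null; then
    echo "CONFIRMED"       # step 4: proceed to write the fix
  else
    echo "FALSE POSITIVE"  # step 5: skip and document
  fi
}
```

For example, `verify_finding "_find_peaks" src/formation_scorer.py` prints CONFIRMED only if the method actually appears in the file — the agent's claim never decides the status on its own.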

The Grep Commands

For an "undefined method" finding:

# Verify a method exists before fixing "undefined method" finding
grep -n "_find_peaks\|_find_troughs" src/formation_scorer.py

For a "missing pattern" finding:

# Verify a pattern exists before fixing "missing pattern" finding
grep -rn "tiered_momentum\|momentum_tier" src/

For a config mismatch finding:

# Verify a config mismatch before fixing it
grep -n "max_concurrent_positions" src/config.py backend/config.py

These are not complex commands. They take thirty seconds. The finding is either confirmed or collapses in that time.
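For a longer audit, the same checks can run in a batch. A sketch under stated assumptions: the findings file is tab-separated (description, grep pattern, target path), and both its layout and the function name verify_findings are invented here.

```shell
# Hypothetical batch verifier: one grep per finding, one status line out.
# $1 is a file of tab-separated lines: description <TAB> pattern <TAB> target.
verify_findings() {
  while IFS="$(printf '\t')" read -r desc pattern target; do
    if grep -qE "$pattern" "$target" 2>/dev/null; then
      printf 'CONFIRMED\t%s\n' "$desc"
    else
      printf 'FALSE POSITIVE\t%s\n' "$desc"
    fi
  done < "$1"
}
```

The output is the raw material for the checklist below: one status per finding, produced by grep rather than by the agent that filed the finding.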

The Verification Checklist Format

Track your findings with a simple text checklist. One entry per finding:

Finding: _find_peaks() undefined (P1)
Verify: grep -n "_find_peaks" src/formation_scorer.py
Result: Found at line 768 (@staticmethod)
Status: FALSE POSITIVE — skip

Finding: tiered momentum scoring missing (P1)
Verify: grep -rn "tiered_momentum\|momentum_tier" src/
Result: No matches
Status: FALSE POSITIVE — pattern was never part of this codebase

Finding: 0.0.0.0 binding in plist (P0)
Verify: grep -n "0.0.0.0" deploy/com.example.indecision-api.plist
Result: Found at line 12
Status: CONFIRMED — fix

Finding: Duplicate route registration on /webhooks (P1)
Verify: grep -n "router.add_route.*webhooks\|app.include_router" src/main.py
Result: Found at lines 44 and 89 — both register the same path
Status: CONFIRMED — fix

The Status line is the gate. Nothing moves to a fix until the status is CONFIRMED.
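The gate itself can be mechanical. A sketch, assuming the checklist lives in a plain text file and every entry keeps the four-line shape above (the function name confirmed_findings is hypothetical):

```shell
# Hypothetical gate: print only the findings whose entry is CONFIRMED.
# Relies on each checklist entry being four lines, so the Finding line
# sits exactly three lines above its Status line.
confirmed_findings() {
  grep -B 3 '^Status: CONFIRMED' "$1" | grep '^Finding:'
}
```

Piping this output into whatever creates your fix branches means nothing reaches the fix stage without a CONFIRMED status on record.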

The Cost of Skipping Verification

Writing a fix for a false positive is not neutral. It is harmful.

New code means new surface area. New tests mean new assumptions baked into the test suite. A PR for a non-existent bug consumes reviewer attention and CI minutes. If the method you "added" conflicts with the one that already existed at line 768, you have introduced a real bug to fix a phantom one.

The fix cost is not zero. It is negative.

Calibrating on the False Positive Rate

15% is not an argument against AI audits. It is an argument for a verification step.

The 85% real findings represent genuine value. In a 100-finding audit, that is 85 real bugs fixed — bugs that would otherwise have remained in production. You do not abandon the audit because 15 findings required a second look. You add the second look.

The economics are clear: 30 seconds per finding to verify, multiplied by 100 findings, is 50 minutes. That 50 minutes protects you from writing broken fixes for 15 non-existent bugs. The ROI is strongly positive.

The One-Line Version

Every audit finding is a claim. Claims require evidence. Grep is the evidence.

Verify, then fix. In that order, every time.