Ask Knox

PR #28 had a bug that no test caught and no human had time to spot. The calibrator component had a piecewise formula with five branches separated by four boundaries:

def calibrator_score(p: float) -> float:
    if p < 0.10:
        return 0.0
    elif p < 0.15:
        return 20.0 * (p - 0.10) / 0.05  # ramps 0 → 20 linearly
    elif p < 0.25:
        return 22.0 - 8.0 * (p - 0.15) / 0.10  # starts at 22, drops to 14
    elif p < 0.40:
        return 18.0 + 2.0 * (p - 0.25) / 0.15  # starts at 18, ends at 20
    else:
        return 20.0

Look at the boundary at p = 0.15. The lower branch at p = 0.15 gives 20.0 * 0.05 / 0.05 = 20.0. The upper branch at p = 0.15 gives 22.0 - 8.0 * 0.0 / 0.10 = 22.0. A two-point jump up as p crosses the boundary.

Now look at p = 0.25. The lower branch at p = 0.25 gives 22.0 - 8.0 * 0.10 / 0.10 = 14.0. The upper branch at p = 0.25 gives 18.0 + 0.0 = 18.0. A four-point jump up.
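The arithmetic above can be reproduced mechanically. This is a minimal sketch, assuming the three middle branch expressions are copied out of calibrator_score into standalone functions; the names ramp_up, slope_down, slope_up, and jump are illustrative and do not come from the PR.

```python
# Branch expressions copied from calibrator_score. Each one is a standalone
# function so it can be evaluated at a boundary no matter which side the
# if/elif chain would normally pick. All names here are hypothetical.

def ramp_up(p: float) -> float:
    """Branch used for 0.10 <= p < 0.15."""
    return 20.0 * (p - 0.10) / 0.05


def slope_down(p: float) -> float:
    """Branch used for 0.15 <= p < 0.25."""
    return 22.0 - 8.0 * (p - 0.15) / 0.10


def slope_up(p: float) -> float:
    """Branch used for 0.25 <= p < 0.40."""
    return 18.0 + 2.0 * (p - 0.25) / 0.15


def jump(lower, upper, boundary: float) -> float:
    """Upper-branch value minus lower-branch value at the boundary.

    Rounded to 6 decimals to hide floating-point noise such as
    0.15 - 0.10 not being exactly 0.05.
    """
    return round(upper(boundary) - lower(boundary), 6)


print(jump(ramp_up, slope_down, 0.15))   # 2.0
print(jump(slope_down, slope_up, 0.25))  # 4.0
```

A positive result means the score steps up as p crosses the boundary; a negative one would be a drop.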

These are not bugs in isolation; they are deliberate shape choices. But PR #28's original version had a different formula at the 0.15 boundary, where the lower branch produced 18.4 and the upper branch produced 17.6. A small improvement in the calibration input caused a 0.8-point drop in the score. That is a cliff. It is invisible to any test that does not happen to probe exactly at 0.15.

CodeRabbit noticed it. The PR was blocked until the math was fixed.

Inline Diagram — Continuity Audit Steps

The Audit Pattern

For every piecewise scoring function, run this audit before shipping:

1. List every branch boundary. Every if/elif threshold. Every case cutoff. Every comparator like p < 0.15.
2. For each boundary B: compute the lower-branch expression at exactly B, and the upper-branch expression at exactly B. They should be equal or differ by a design-justified amount.
3. If they differ unexpectedly: either the formula is wrong or the shape choice is wrong. Either way, an owner needs to explain it.
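The steps above can be approximated in code without pulling the branch expressions apart. The sketch below is one possible approach, not the tooling used on the PR: it compares f(B) with f(B - eps) for a tiny eps. It assumes every branch condition has the form p < B, so that f(B) lands in the upper branch. The helper name audit_boundaries and the eps value are assumptions.

```python
from typing import Callable, Dict, Iterable


def calibrator_score(p: float) -> float:
    # Copied from the listing earlier in the article.
    if p < 0.10:
        return 0.0
    elif p < 0.15:
        return 20.0 * (p - 0.10) / 0.05
    elif p < 0.25:
        return 22.0 - 8.0 * (p - 0.15) / 0.10
    elif p < 0.40:
        return 18.0 + 2.0 * (p - 0.25) / 0.15
    else:
        return 20.0


def audit_boundaries(
    f: Callable[[float], float],
    boundaries: Iterable[float],
    eps: float = 1e-9,
) -> Dict[float, float]:
    """Map each boundary B to f(B) - f(B - eps), an estimate of the jump at B.

    Hypothetical helper. f(B - eps) is evaluated by the lower branch and
    f(B) by the upper branch, so the difference approximates the jump.
    Rounding to 4 decimals absorbs the slope contribution over eps.
    """
    return {b: round(f(b) - f(b - eps), 4) for b in boundaries}


report = audit_boundaries(calibrator_score, [0.10, 0.15, 0.25, 0.40])
print(report)  # {0.1: 0.0, 0.15: 2.0, 0.25: 4.0, 0.4: 0.0}
```

Any non-zero entry in the report is a discontinuity that step 3 asks someone to justify. The numeric shortcut is convenient but less exact than evaluating both branch expressions at B by hand, so a surprising value is worth confirming with the arithmetic.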

A ten-line function with four boundaries gets four boundary checks. Five minutes of arithmetic.

Why Reviewers Catch This

Code-writing agents have a consistent blind spot for numerical continuity. They pattern-match on "piecewise function with thresholds" and produce branches that look symmetrical but are not. The test suite usually picks round numbers that never land on a boundary. The author trusts the tests. The bug ships.

CodeRabbit, and code-review LLMs in general, are better at this specific task because they operate on the code itself as a mathematical object. Given a piecewise expression, asking "are the branches continuous at the boundaries?" is a clean analysis task that does not require running the code.

The Rule

Piecewise functions get boundary audits. Every branch point gets tested at the exact transition value. Every discontinuity needs an owner. And CodeRabbit comments on scoring math are treated as blockers until resolved — not nits to be skimmed.
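One way to make the rule enforceable is a regression test that pins every boundary to its intended jump. This is a sketch under assumptions: the DESIGNED_JUMPS registry, its placeholder rationale strings, and the tolerance values are hypothetical and not taken from the PR. The test function follows the pytest naming convention but runs as plain Python too.

```python
import math


def calibrator_score(p: float) -> float:
    # Copied from the listing earlier in the article.
    if p < 0.10:
        return 0.0
    elif p < 0.15:
        return 20.0 * (p - 0.10) / 0.05
    elif p < 0.25:
        return 22.0 - 8.0 * (p - 0.15) / 0.10
    elif p < 0.40:
        return 18.0 + 2.0 * (p - 0.25) / 0.15
    else:
        return 20.0


# Hypothetical registry: boundary -> (intended jump, owner or rationale).
# The rationale strings are placeholders to be filled in by whoever
# signs off on the shape.
DESIGNED_JUMPS = {
    0.10: (0.0, "continuous by design"),
    0.15: (2.0, "placeholder: owner and rationale"),
    0.25: (4.0, "placeholder: owner and rationale"),
    0.40: (0.0, "continuous by design"),
}


def test_boundary_jumps_match_design(eps: float = 1e-9, tol: float = 1e-6) -> None:
    """Fail if the jump at any boundary differs from the registered value."""
    for boundary, (expected, rationale) in DESIGNED_JUMPS.items():
        at_boundary = calibrator_score(boundary)        # upper branch
        just_below = calibrator_score(boundary - eps)   # lower branch
        actual = at_boundary - just_below
        assert math.isclose(actual, expected, abs_tol=tol), (
            f"jump at p={boundary} is {actual:.4f}, "
            f"expected {expected} ({rationale})"
        )


test_boundary_jumps_match_design()
```

With a registry like this, changing a branch formula without updating the intended jump fails the test, which forces the conversation about whether the new shape is deliberate.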
