Acceptance Criteria as Backstops
Defining 'done' before the agent starts. The regression check rule — previously-passing signals must still pass. Target ranges for backtest hit rates. Why acceptance criteria are the difference between 'looks plausible' and 'is actually correct'.
The agent finished the PR. The diff looked clean. Tests passed. The author ran the backtest and got 52 cleared signals for the last 14 days. The target range was 3-12.
Fifty-two is not a success. It means the rebalance over-corrected and lowered the effective quality bar far enough to admit signals that should not be trading. Without a defined range, a "more is always better" reading of the result would have shipped a broken fix. The range is the backstop.
The Two-Sided Range
Every acceptance criterion for a scoring change needs both bounds:
- Lower bound: Did the fix actually do anything? (cleared > 2)
- Upper bound: Did the fix over-correct? (cleared < 15)
The target range expresses what "correct" looks like. Too few clearances means the rebalance was too conservative and needs another pass. Too many means the rebalance was too aggressive and needs to be dialed back. Neither outcome ships without explicit sign-off.
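As a minimal sketch, the two-sided check can be written as a single function that returns one of three verdicts. The names `RangeVerdict` and `check_two_sided` are hypothetical and not part of Hermes; the default bounds are taken from the two conditions listed above (cleared > 2 and cleared < 15) and are treated as exclusive.

```python
from enum import Enum


class RangeVerdict(Enum):
    """Possible outcomes of a two-sided acceptance check (hypothetical names)."""
    TOO_FEW = "too_few"      # fix was too conservative
    IN_RANGE = "in_range"    # fix landed inside the target range
    TOO_MANY = "too_many"    # fix over-corrected


def check_two_sided(cleared: int,
                    lower_exclusive: int = 2,
                    upper_exclusive: int = 15) -> RangeVerdict:
    """Evaluate a cleared-signal count against both bounds.

    Bounds are exclusive, mirroring `cleared > 2` and `cleared < 15`.
    """
    if cleared <= lower_exclusive:
        return RangeVerdict.TOO_FEW
    if cleared >= upper_exclusive:
        return RangeVerdict.TOO_MANY
    return RangeVerdict.IN_RANGE


if __name__ == "__main__":
    # Illustrative counts only; 52 is the backtest result from the story above.
    for count in (1, 8, 52):
        print(count, check_two_sided(count).value)
```

Returning a three-way verdict rather than a boolean keeps the two failure modes distinct, so a report can say whether the rebalance needs another pass or needs to be dialed back.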
The Regression Check
The second mandatory backstop is a regression check — verifying that the fix does not break previously-working behavior. For Hermes PR #28:
Every signal in the last 14 days that previously cleared threshold 70 under the old weights must still clear 70 under the new weights. Zero regressions allowed.
This is an absolute constraint, not a soft target. A regression check that says "acceptable" or "mostly good" is not a check. If even one previously-clearing signal fails to clear under the new weights, the diff is wrong and the spec needs to be reopened.
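One way to make the zero-regression rule mechanical is to compare per-signal scores under the old and new weights. The sketch below is illustrative only: `find_regressions`, the dictionary-of-scores shape, and the signal IDs are hypothetical, and it assumes that "clears" means a score greater than or equal to the threshold and that a signal missing from the new results counts as a regression.

```python
def find_regressions(old_scores: dict[str, float],
                     new_scores: dict[str, float],
                     threshold: float = 70.0) -> list[str]:
    """Return IDs of signals that cleared under the old weights but not the new.

    Assumptions (not confirmed by the source): clearing means score >= threshold,
    and a signal absent from `new_scores` is treated as a regression.
    """
    regressions = []
    for signal_id, old_score in old_scores.items():
        if old_score < threshold:
            continue  # never cleared before, so it cannot regress
        new_score = new_scores.get(signal_id)
        if new_score is None or new_score < threshold:
            regressions.append(signal_id)
    return sorted(regressions)


if __name__ == "__main__":
    # Made-up scores for illustration; not real backtest data.
    old = {"sig-a": 75.0, "sig-b": 71.0, "sig-c": 60.0}
    new = {"sig-a": 80.0, "sig-b": 69.0, "sig-c": 90.0}
    regressions = find_regressions(old, new)
    # Absolute constraint: any non-empty list fails the check.
    print("PASS" if not regressions else f"FAIL: {regressions}")
```

The check passes only when the returned list is empty, which keeps it binary: there is no "mostly good" outcome to negotiate.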
[Inline diagram: Two-Sided Acceptance]
The Self-Verification Loop
Good acceptance criteria allow the agent to verify its own work before opening the PR. The flow:
- Agent applies the diff.
- Agent runs the backtest query.
- Agent checks the result against the target range.
- If inside the range, open the PR with the backtest result in the description.
- If outside the range, stop and report the discrepancy instead of opening the PR.
This is a precondition check. The agent is not allowed to ship a PR whose result falls outside acceptance criteria. Without that check, the spec is doing only half its job — defining "done" but not enforcing it.
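The flow above can be sketched as a gate that the agent runs before opening the PR. Everything here is hypothetical: `self_verify`, `Verification`, and the `run_backtest` callable stand in for whatever backtest query and PR tooling are actually in use, and the default bounds reuse the exclusive conditions cleared > 2 and cleared < 15.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verification:
    """Outcome of the pre-PR gate (hypothetical structure)."""
    open_pr: bool   # True only when the result is inside the target range
    report: str     # text for the PR description, or the discrepancy report


def self_verify(run_backtest: Callable[[], int],
                lower_exclusive: int = 2,
                upper_exclusive: int = 15) -> Verification:
    """Run the backtest after the diff is applied and gate the PR on the result."""
    cleared = run_backtest()
    target = f"target: > {lower_exclusive} and < {upper_exclusive}"
    if lower_exclusive < cleared < upper_exclusive:
        return Verification(True, f"Backtest: {cleared} cleared signals ({target}).")
    # Outside the range: stop and report instead of opening the PR.
    return Verification(False, f"STOP: {cleared} cleared signals is outside the range ({target}).")


if __name__ == "__main__":
    # A stubbed backtest returning the count from the opening story.
    result = self_verify(lambda: 52)
    print(result.open_pr, result.report)
```

Passing the backtest in as a callable keeps the gate independent of how the query is executed, so the same precondition can wrap a SQL query, a script, or a test fixture.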
The Rule
Every acceptance criterion is a binary-evaluable condition with a number. Every number has a target range with both bounds. Every fix includes a regression check that allows zero regressions. The agent self-verifies before opening the PR. Anything that cannot pass the self-verification comes back for another round, not a merge.