ASK KNOX
LESSON 257

Load-Bearing vs Additive Components

Hermes had a 25-point calibrator that contributed zero on 96.2% of signals. That single dead component made the 70-point threshold mathematically unreachable. Load-bearing vs additive decomposition is the pre-flight check every scoring system needs.


Hermes had 340 tests passing. 95% code coverage. A runbook. A watchdog. And it had not placed a single real trade in weeks.

The bug was not in any line of code. The bug was in the math of the scoring rubric.

The Anatomy of the Failure

Hermes assigns every political signal a composite score from four components:

  • Grok — narrative signal from X search (max 30)
  • Perplexity — cited research synthesis (max 30)
  • News — headline sentiment (max 15)
  • Calibrator — Metaculus/Manifold probability match (max 25)

Total possible: 100. Threshold to fire a trade: 70.

On paper, this looks fine. Even if you lost the calibrator entirely, you could still cross 70 with strong performance on the other three. The problem is that "strong performance" is relative to the distribution of real signals, not the theoretical maximum. And the calibrator wasn't just weak — it was returning exactly zero on 96.2% of signals because the semantic matcher couldn't find corresponding markets on Metaculus or Manifold for political questions.

On those 96.2% of signals, the effective ceiling was 30 + 30 + 15 = 75.

The threshold was 70.

To clear the bar without the calibrator, you needed to hit 93% of the maximum on all three remaining components simultaneously. Almost no signal in the distribution did that. The bot had been mathematically locked out of trading for weeks.

The Decomposition Exercise

Before shipping any scoring system, decompose every component into load-bearing or additive:

  1. List every component with its maximum value.
  2. For each component C: compute sum_of_maxes - C.max.
  3. Compare the result to your threshold.
  4. If (sum_of_maxes - C.max) < threshold: component C is load-bearing.

For Hermes before PR #28:

Component     Max   Sum without it   Below threshold 70?
Grok           30               70   No (equal)
Perplexity     30               70   No (equal)
News           15               85   No
Calibrator     25               75   No, but barely

The table looks safe. But ceiling analysis with real fire rates tells a different story. If the calibrator only fires on 3.8% of signals, the effective ceiling for 96.2% of inputs is 75. The difference between 75 and 70 is 5 points — which sounds like a buffer, until you realize that clearing it requires near-perfect scores on everything else. Real-world distributions never cooperate.

The Fix: Rebalance, Don't Move the Threshold

The instinct when signals are not clearing is to lower the threshold. That instinct is wrong. A lowered threshold loses its meaning as a quality gate and opens the door to bad signals. The correct fix is to restructure the component weights so the effective ceiling lives comfortably above the threshold — even when the weakest component is silent.

PR #28 did exactly that. Grok went 30 → 35. Perplexity went 30 → 50. Calibrator went 25 → 20. News stayed at 15. Total possible rose from 100 to 120.

The threshold remained 70. The effective ceiling on the 96.2% of signals without calibrator data rose from 75 to 100. Eight signals cleared the bar in the next 24 hours.
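Re-running the ceiling check against the PR #28 weights confirms the fix. This sketch uses the before/after weights stated above:

```python
# Verify the rebalance: with the calibrator silent, the remaining
# components must still be able to clear the 70-point bar comfortably.

old = {"grok": 30, "perplexity": 30, "news": 15, "calibrator": 25}
new = {"grok": 35, "perplexity": 50, "news": 15, "calibrator": 20}
threshold = 70

for label, weights in (("before", old), ("after", new)):
    ceiling = sum(weights.values()) - weights["calibrator"]
    print(label, ceiling, f"need {threshold / ceiling:.0%} of remaining max")
# before: ceiling 75, need 93% of remaining max
# after:  ceiling 100, need 70% of remaining max
```

The threshold never moved; only the shape of the rubric did.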

The Rule

Every scoring system gets a load-bearing analysis before it ships. Every component gets a column in the decomposition table. Every row gets checked against the threshold. The question is not "does this component contribute?" The question is: does the system still clear the bar without this component?

If the answer is no, the component is load-bearing, and you now need to monitor its fire rate as a first-class metric. Silence from a load-bearing component is the scoring-system equivalent of a plane losing an engine. It is not a degradation — it is a failure mode.
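Monitoring a fire rate as a first-class metric can be as simple as a rolling window over recent scores. A minimal sketch; the class name, window size, and alert floor are all illustrative choices, not Hermes internals:

```python
# Fire-rate monitor: track the share of recent signals on which a
# load-bearing component returned a nonzero score, and flag when that
# share drops below a floor.

from collections import deque

class FireRateMonitor:
    def __init__(self, window: int = 200, floor: float = 0.5):
        self.scores = deque(maxlen=window)  # rolling window of recent scores
        self.floor = floor                  # minimum acceptable fire rate

    def record(self, score: float) -> None:
        self.scores.append(score)

    @property
    def fire_rate(self) -> float:
        if not self.scores:
            return 0.0
        return sum(1 for s in self.scores if s > 0) / len(self.scores)

    def healthy(self) -> bool:
        return self.fire_rate >= self.floor

m = FireRateMonitor(window=100, floor=0.5)
for s in [0, 0, 12, 0, 0]:   # component silent on 4 of 5 signals
    m.record(s)
print(m.fire_rate, m.healthy())  # 0.2 False
```

An unhealthy reading here is an engine-out alarm, not a dashboard curiosity.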

The Hermes calibrator was load-bearing by accident. It was additive by design. The gap between the two is where dead components hide.