ASK KNOX
LESSON 257

Load-Bearing vs Additive Components

Hermes had a 25-point calibrator that contributed zero on 96.2% of signals. That single dead component made the 70-point threshold mathematically unreachable. Load-bearing vs additive decomposition is the pre-flight check every scoring system needs.


Hermes had 340 tests passing. 95% code coverage. A runbook. A watchdog. And it had not placed a single real trade in weeks.

The bug was not in any line of code. The bug was in the math of the scoring rubric.

The Anatomy of the Failure

Hermes assigns every political signal a composite score from four components:

  • Grok — narrative signal from X search (max 30)
  • Perplexity — cited research synthesis (max 30)
  • News — headline sentiment (max 15)
  • Calibrator — Metaculus/Manifold probability match (max 25)

Total possible: 100. Threshold to fire a trade: 70.

On paper, this looks fine. Even if you lost the calibrator entirely, you could still cross 70 with strong performance on the other three. The problem is that "strong performance" is relative to the distribution of real signals, not the theoretical maximum. And the calibrator wasn't just weak — it was returning exactly zero on 96.2% of signals because the semantic matcher couldn't find corresponding markets on Metaculus or Manifold for political questions.

On those 96.2% of signals, the effective ceiling was 30 + 30 + 15 = 75.

The threshold was 70.

To clear the bar without the calibrator, you needed to hit 93% of the maximum on all three remaining components simultaneously. Almost no signal in the distribution did that. The bot had been mathematically locked out of trading for weeks.

The Decomposition Exercise

Before shipping any scoring system, decompose every component into load-bearing or additive:

  1. List every component with its maximum value.
  2. For each component C: compute sum_of_maxes - C.max.
  3. Compare the result to your threshold.
  4. If (sum_of_maxes - C.max) < threshold: component C is load-bearing.

For Hermes before PR #28:

Component     Max   Sum without it   Below threshold 70?
Grok           30               70   No (equal)
Perplexity     30               70   No (equal)
News           15               85   No
Calibrator     25               75   No, but barely

The table looks safe. But ceiling analysis with real fire rates tells a different story. If the calibrator only fires on 3.8% of signals, the effective ceiling for 96.2% of inputs is 75. The difference between 75 and 70 is 5 points — which sounds like a buffer, until you realize that clearing it requires near-perfect scores on everything else. Real-world distributions never cooperate.

The Fix: Rebalance, Don't Move the Threshold

The instinct when signals are not clearing is to lower the threshold. That instinct is wrong. A lowered threshold loses its meaning as a quality gate and opens the door to bad signals. The correct fix is to restructure the component weights so the effective ceiling lives comfortably above the threshold — even when the weakest component is silent.

PR #28 did exactly that. Grok went 30 → 35. Perplexity went 30 → 50. Calibrator went 25 → 20. News stayed at 15. Total possible rose from 100 to 120.

The threshold remained 70. The effective ceiling on the 96.2% of signals without calibrator data rose from 75 to 100. Eight signals cleared the bar in the next 24 hours.
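Re-running the ceiling check against the PR #28 weights confirms the fix. This sketch uses the before/after weights stated above:

```python
# Verify the rebalance: with the calibrator silent, the remaining
# components must still be able to clear the 70-point bar comfortably.

old = {"grok": 30, "perplexity": 30, "news": 15, "calibrator": 25}
new = {"grok": 35, "perplexity": 50, "news": 15, "calibrator": 20}
threshold = 70

for label, weights in (("before", old), ("after", new)):
    ceiling = sum(weights.values()) - weights["calibrator"]
    print(label, ceiling, f"need {threshold / ceiling:.0%} of remaining max")
# before: ceiling 75, need 93% of remaining max
# after:  ceiling 100, need 70% of remaining max
```

The threshold never moved; only the shape of the rubric did.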

The Rule

Every scoring system gets a load-bearing analysis before it ships. Every component gets a column in the decomposition table. Every row gets checked against the threshold. The question is not "does this component contribute?" The question is: does the system still clear the bar without this component?

If the answer is no, the component is load-bearing, and you now need to monitor its fire rate as a first-class metric. Silence from a load-bearing component is the scoring-system equivalent of a plane losing an engine. It is not a degradation — it is a failure mode.
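Monitoring a fire rate as a first-class metric can be as simple as a rolling window over recent scores. A minimal sketch; the class name, window size, and alert floor are all illustrative choices, not Hermes internals:

```python
# Fire-rate monitor: track the share of recent signals on which a
# load-bearing component returned a nonzero score, and flag when that
# share drops below a floor.

from collections import deque

class FireRateMonitor:
    def __init__(self, window: int = 200, floor: float = 0.5):
        self.scores = deque(maxlen=window)  # rolling window of recent scores
        self.floor = floor                  # minimum acceptable fire rate

    def record(self, score: float) -> None:
        self.scores.append(score)

    @property
    def fire_rate(self) -> float:
        if not self.scores:
            return 0.0
        return sum(1 for s in self.scores if s > 0) / len(self.scores)

    def healthy(self) -> bool:
        return self.fire_rate >= self.floor

m = FireRateMonitor(window=100, floor=0.5)
for s in [0, 0, 12, 0, 0]:   # component silent on 4 of 5 signals
    m.record(s)
print(m.fire_rate, m.healthy())  # 0.2 False
```

An unhealthy reading here is an engine-out alarm, not a dashboard curiosity.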

The Hermes calibrator was load-bearing by accident. It was additive by design. The gap between the two is where dead components hide.