ASK KNOX
LESSON 278

The Dead Component Pattern

The Hermes calibrator story, end to end. Silence is not health: without fire-rate tracking, a component that always returns zero is indistinguishable from a broken one. This is the canonical silent failure mode that every scoring system eventually hits.


The Hermes calibrator was alive, well, and producing zero signal. Every call succeeded. Every log line was healthy. Every unit test passed. And every signal it contributed to cleared the threshold on its own merits — or, in 96% of cases, failed to clear because the 25 points the calibrator theoretically could have added were zero instead.

This is the dead component pattern. It is the single most common silent failure mode in production ML and scoring systems, and it exists because error-based monitoring was never designed to catch it.

The Anatomy

A dead component has four properties that collectively make it invisible to standard health checks:

  1. No exceptions. The component gracefully handles all failure cases internally. Network errors, missing data, malformed inputs — all caught, all swallowed, all replaced with a default return value.
  2. No warnings. Nothing in the code path logs at warning or error level. The default path looks identical to the successful path in log output.
  3. Stable latency. The component returns quickly because there is nothing to compute — returning zero is faster than running the full pipeline. Latency dashboards show improvement, not degradation.
  4. Correct output. The return value is a well-formed zero (or empty list, or False) that satisfies the caller's type expectations. Downstream code continues execution with no indication that anything went wrong.

The component is simultaneously doing nothing and passing every health check. The bug is the gap between "responding" and "contributing" — and the gap is invisible unless you measure it.
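The four properties compress into just a few lines of code. Here is a minimal sketch of a dead component — the function names, the stand-in failure, and the 25-point cap (taken from the Hermes story above) are illustrative, not the actual calibrator code:

```python
def fetch_external_data(signal):
    # Stand-in for a network call; raises whenever the upstream source is down.
    raise ConnectionError("upstream unreachable")

def raw_score(data):
    return 25.0

def calibrator_contribution(signal):
    """A dead component: every failure is swallowed into a valid zero."""
    try:
        data = fetch_external_data(signal)   # can fail for many reasons
        return min(25.0, raw_score(data))    # up to 25 points when alive
    except Exception:
        # Properties 1-4 in one line: no exception escapes, nothing is
        # logged, the return is fast, and the caller receives a well-typed
        # 0.0 indistinguishable from a genuine zero contribution.
        return 0.0
```

Every call to `calibrator_contribution` here "succeeds" and returns a clean float, which is exactly why no error-based monitor will ever notice it.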

Inline Diagram — The Invisibility Matrix

DEAD COMPONENT — INVISIBLE TO STANDARD MONITORING
The gap: five error-based metrics all show healthy. Only fire-rate tracking exposes the dead component.
"Responding correctly" ≠ "contributing." The calibrator was responding on every call. It was contributing on 3.8% of calls.

Why It Hides for So Long

The dead component is usually hidden by a second failure: the downstream pipeline does not distinguish between "component contributed zero" and "component was not yet computed." Both look like a zero in the composite score. The bot processes the signal, writes the score to the database, and moves on. The next signal comes in. The same thing happens. The pipeline never stops running; it just stops producing output that clears the threshold.
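One way to close that second gap is to reserve a sentinel for "not computed" instead of reusing zero. A sketch under assumptions — the `Optional` return and the `composite_score` shape are illustrative, not the Hermes pipeline's actual structure:

```python
from typing import Callable, Optional

def safe_contribution(compute: Callable, signal) -> Optional[float]:
    """Return None on failure so callers can tell 'failed' from 'scored zero'."""
    try:
        return compute(signal)
    except Exception:
        return None  # distinct from a legitimate 0.0 contribution

def composite_score(components: dict, signal):
    contributions = {name: safe_contribution(fn, signal)
                     for name, fn in components.items()}
    # Surface which components never computed, instead of folding them into 0.
    missing = [name for name, v in contributions.items() if v is None]
    total = sum(v for v in contributions.values() if v is not None)
    return total, missing
```

With this shape, a calibrator that silently fails on every call shows up in `missing` on every signal, rather than hiding inside the composite as a zero.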

For Hermes, the second failure was that signals_cleared_threshold_24h was not an alerted metric. Zero signals per day looked identical to "it was a slow news cycle." The calibrator's role in the blockage was invisible until someone — Knox — thought to ask: "What does the component distribution look like?"

The Detection Pattern

The only reliable detection for dead components is direct fire-rate monitoring (Lesson 258):

# In the scoring component itself
value = compute_contribution(signal)
fired = value > 0.001  # or whatever epsilon makes sense
metrics.increment(f"component.{name}.total")
if fired:
    metrics.increment(f"component.{name}.fired")

# In alerting config (pseudocode; a real rule should guard a zero total)
fire_rate_24h = metrics.rate(f"component.{name}.fired", "24h") / \
                metrics.rate(f"component.{name}.total", "24h")
alert_if: fire_rate_24h < 0.20  # warning
alert_if: fire_rate_24h < 0.05  # critical
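The pseudocode above assumes a metrics backend with `increment` and `rate`. The same logic can be exercised with a plain in-memory counter; this `FireRateMonitor` class is an illustrative stand-in, not part of any real metrics library, with the call counts chosen to mirror the ~4% contribution rate from the Hermes story:

```python
from collections import Counter

class FireRateMonitor:
    """Track total vs. fired counts per component; flag low fire rates."""
    WARN, CRIT = 0.20, 0.05  # same thresholds as the alerting config above

    def __init__(self):
        self.counts = Counter()

    def record(self, name, value, epsilon=0.001):
        self.counts[f"{name}.total"] += 1
        if value > epsilon:  # "fired" = contributed a non-trivial value
            self.counts[f"{name}.fired"] += 1

    def fire_rate(self, name):
        total = self.counts[f"{name}.total"]
        return self.counts[f"{name}.fired"] / total if total else 0.0

    def status(self, name):
        rate = self.fire_rate(name)
        if rate < self.CRIT:
            return "critical"
        if rate < self.WARN:
            return "warning"
        return "ok"

monitor = FireRateMonitor()
for _ in range(96):
    monitor.record("calibrator", 0.0)    # dead: well-formed zeros
for _ in range(4):
    monitor.record("calibrator", 12.5)   # the rare real contribution
print(monitor.fire_rate("calibrator"))   # 0.04 — below the 5% critical line
print(monitor.status("calibrator"))      # critical
```

Note that the dead calibrator never errors and never slows down; the only metric that moves is the fire rate, which is exactly why it is the metric to alert on.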

A few lines of instrumentation and two alert thresholds. That is the entire fix, and it is among the cheapest investments any scoring system can make against silent failure.

The Rule

Silence is not health. Every scoring component emits fire rate. Every fire rate has alert thresholds. The dead component pattern has been observed in Hermes, in Foresight's news feed, and in Shiva's market-weather signal — it is not a rare edge case. It is a default failure mode that appears wherever scoring systems depend on external data sources. Monitor for it, or keep finding it in post-mortems.