ASK KNOX
beta
LESSON 258

Fire-Rate Monitoring

Every scoring component needs a fire_rate_24h metric. Below 20%, it warns. Below 5%, it pages. The Hermes calibrator contributed zero on 96% of signals for weeks and nothing alerted — because the system was counting errors, not contributions.


Error rate is not the right metric for a scoring component. Fire rate is.

Error rate tells you whether the component crashed. Fire rate tells you whether the component contributed anything. These are completely different questions, and conflating them is how you end up with a bot that has 340 tests passing, no alerts firing, and zero trades for a month.

The Metric

For every scoring component C in your system, emit a standard log line once per score computation:

[score] component=calibrator value=0.0 fire=false threshold_contribution=0.0
[score] component=grok value=27.4 fire=true threshold_contribution=27.4

Roll this up into a fire_rate_24h metric: the percentage of the last 24 hours of signals where fire=true. Ship it to your existing metrics backend next to error rate, latency, and request count. The metric is as fundamental as any of those.

The Alerting Pattern

The warning threshold catches slow drift. A component that used to fire on 60% of signals and now fires on 18% is telling you something changed — maybe the data source shifted, maybe the matching logic is stale, maybe the input distribution moved. Investigate before the component goes fully silent.

The critical threshold catches dead components. At 5% or below, the component is contributing essentially nothing to your scoring. If it was load-bearing by design, the system is broken. If it was additive by design, you have dead weight to remove. Either way, you need to know right now, not when someone finally notices the bot has not traded in two weeks.
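Expressed as code, the two thresholds are a three-way classifier. A sketch using the 20% and 5% cut-offs from above (the function name is illustrative):

```python
def alert_level(fire_rate: float, warn: float = 0.20, critical: float = 0.05) -> str:
    """Map a 24h fire rate (as a fraction, 0.0-1.0) to an alert severity."""
    if fire_rate < critical:
        return "critical"  # effectively dead: page someone now
    if fire_rate < warn:
        return "warning"   # slow drift: investigate before it goes silent
    return "healthy"
```

The Hermes calibrator's 3.8% maps straight to critical; the 60%-to-18% drift maps to warning.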

Why Error Rate Fails

A dead component is typically well-behaved. It catches its exception, returns 0 or an empty default, logs nothing, and moves on. The error rate is zero because no error occurred. The component looks perfect on every dashboard.

This is exactly what happened to the Hermes calibrator. The semantic matcher would query Metaculus and Manifold, find no corresponding market, return an empty candidate list, and the scoring function would gracefully return 0. No exception. No warning. No error metric. Just a silent, steady stream of zeros that collapsed the effective ceiling of the scoring rubric and blocked every trade.
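The failure mode is easy to reproduce. A hypothetical sketch of a component that swallows its own failure — `find_candidates` stands in for the semantic matcher, and none of these names come from the real calibrator:

```python
def find_candidates(signal: str) -> list[str]:
    # Stand-in for the Metaculus/Manifold matcher: no market ever matches.
    return []

def calibrator_score(signal: str) -> float:
    try:
        candidates = find_candidates(signal)
        if not candidates:
            return 0.0  # "graceful" empty default: no exception, no log line
        return float(len(candidates))  # placeholder scoring
    except Exception:
        return 0.0  # even a crash would surface as a clean zero

signals = ["cpi-print", "fed-minutes", "election-odds"]
fires = sum(calibrator_score(s) > 0 for s in signals)
# Nothing ever raises, so error rate is 0% — while fire rate is also 0%.
```

Every dashboard built on error rate shows this component as perfectly healthy.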

If the calibrator had been emitting fire_rate_24h, the 3.8% rate would have paged on day one.

Fire-Rate Gauge

[Figure: fire-rate gauge, 24h alert zones — critical below 5%, warning from 5% to 20%, healthy above 20%. The Hermes calibrator sat at 3.8% for weeks; the Grok narrative component fires at 82%.]

The Detection Template

Add this to every scoring service:

import logging

logger = logging.getLogger(__name__)
# `metrics` is your existing statsd-style client; it only needs increment().

def record_score(component: str, value: float, maximum: float) -> None:
    """Emit the standard score log line and tick the fire-rate counters."""
    fired = value > 0
    logger.info(
        "score.component",
        extra={
            "component": component,
            "value": value,
            "maximum": maximum,
            "fire": fired,
            # Guard the ratio so a zero maximum cannot crash the scoring path.
            "ratio": value / maximum if maximum else 0.0,
        },
    )
    # One total tick per computation, one fired tick per contribution:
    # fire_rate_24h is the ratio of the two.
    metrics.increment(f"score.{component}.total")
    if fired:
        metrics.increment(f"score.{component}.fired")

Then set two alerts in your monitoring config:

- alert: ComponentFireRateWarning
  expr: sum by (component) (rate(score_fired[24h])) / sum by (component) (rate(score_total[24h])) < 0.20
  labels:
    severity: warning
- alert: ComponentFireRateCritical
  expr: sum by (component) (rate(score_fired[24h])) / sum by (component) (rate(score_total[24h])) < 0.05
  labels:
    severity: critical
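Both expressions reduce to the same ratio of two counters. A dict-backed sketch of that arithmetic, standing in for rate() over the 24h window:

```python
from collections import Counter

counters: Counter[str] = Counter()

def tick(component: str, fired: bool) -> None:
    counters[f"score.{component}.total"] += 1
    if fired:
        counters[f"score.{component}.fired"] += 1

def fire_rate(component: str) -> float:
    total = counters[f"score.{component}.total"]
    return counters[f"score.{component}.fired"] / total if total else 0.0

# 1 contribution in 25 signals: a 4% fire rate, below both thresholds.
for i in range(25):
    tick("calibrator", fired=(i == 0))
```

If the `fired` counter never moves while the `total` counter climbs, the ratio decays toward zero and the critical alert fires on its own.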

Thirty minutes of setup. A failure mode that eats weeks of production if you skip it.

The Rule

Silence is not health. Every scoring component emits fire rate. Every fire rate has two alert thresholds. No component ships without them. The Hermes calibrator ran dead for weeks because fire-rate monitoring did not exist. Now it does — and the pattern travels to every scoring bot in the ecosystem.