Ask Knox

The simplest metric that would have caught the Hermes failure is "hours since last real trade." It is one number. It is easy to compute. It cannot hide silence behind counting tricks. And it would have been red for weeks before the session retro surfaced the actual root cause.

The Metric

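-- Hours elapsed since the most recent filled trade (PostgreSQL syntax).
-- MAX() over zero rows is NULL, so this returns NULL when no filled trades
-- exist; that case is worth treating as an alert rather than as zero.
SELECT
  EXTRACT(EPOCH FROM (NOW() - MAX(filled_at))) / 3600 AS hours_since_last_real_trade
FROM hermes_trades
WHERE status = 'filled';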

One row. One number. Small number = recent activity. Large number = silence. Push it to your metrics backend as bot.liveness.hours_since_last_trade and display it on the bot's main dashboard.
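As a rough illustration of the push step, the sketch below computes the same number in Python and formats it as a gauge. It assumes the metrics backend accepts StatsD-style gauge lines (name:value|g), which the original does not specify. The function names and the example timestamps are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional

# Metric name taken from the text above.
METRIC_NAME = "bot.liveness.hours_since_last_trade"


def hours_since_last_trade(last_filled_at: Optional[datetime],
                           now: datetime) -> Optional[float]:
    """Hours elapsed since the last filled trade, or None if there is none."""
    if last_filled_at is None:
        # Mirrors the SQL: MAX() over zero rows yields NULL.
        return None
    return (now - last_filled_at).total_seconds() / 3600


def statsd_gauge_line(name: str, value: float) -> str:
    """Format a StatsD-style gauge line (assumed backend format)."""
    return f"{name}:{value:.2f}|g"


# Illustrative timestamps only, not real Hermes data.
now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
last_fill = datetime(2024, 1, 9, 0, 0, tzinfo=timezone.utc)

hours = hours_since_last_trade(last_fill, now)
line = statsd_gauge_line(METRIC_NAME, hours)
print(line)
```

Returning None for a bot with no fills keeps "never traded" distinct from "traded just now", so the caller can decide how to alert on it.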

Why It Works

A staleness counter captures a property that count-based metrics cannot. "Trades today" resets at midnight and can read zero on both a quiet day and a fully broken bot. "Hours since last real trade" is monotonically increasing until a real trade resets it, so extended silence accumulates visibly no matter how you slice the display window.
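The contrast can be made concrete with a small sketch. The two bots and their timestamps are invented for illustration. Both report zero trades today, and only the staleness counter separates them.

```python
from datetime import datetime, timedelta, timezone


def trades_today(fills, now):
    """Count fills that fall on the same UTC calendar day as `now`."""
    return sum(1 for f in fills if f.date() == now.date())


def hours_since_last_fill(fills, now):
    """Hours elapsed since the most recent fill."""
    return (now - max(fills)).total_seconds() / 3600


# Hypothetical data: one bot had a quiet night, the other has been silent
# for roughly three weeks.
now = datetime(2024, 3, 21, 9, 0, tzinfo=timezone.utc)
quiet_bot = [now - timedelta(hours=14)]
broken_bot = [now - timedelta(hours=500)]

for name, fills in (("quiet", quiet_bot), ("broken", broken_bot)):
    print(name, trades_today(fills, now), hours_since_last_fill(fills, now))
```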

This matters because the failure modes that matter most are the long ones. A bot that stops trading for an hour is probably fine — one of its data sources is slow, one of its signals is noisy, nothing to do. A bot that stops trading for 500 hours is not fine. The staleness counter tells the difference at a glance.

Inline Diagram — The Dashboard Layout

The Dashboard Rule

Every bot dashboard should have two columns: code integrity on the left, operational validity on the right. The left column is the CI-style metrics everyone builds by default. The right column is the set of metrics that directly measure whether the bot is producing its intended outcomes.

Both columns appear on the same screen. No tab-switching. No drill-down. A glance should be enough to see whether the bot is healthy in both senses — and the Hermes pattern (left column fully green, right column fully red) should be impossible to miss.
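One way to picture the layout is a minimal text rendering with both columns side by side, plus a check for the all-green-left, all-red-right pattern. This is a sketch only. The Tile type, the tile names, and the column width are assumptions, not part of any existing dashboard tool.

```python
from dataclasses import dataclass
from itertools import zip_longest
from typing import List, Optional


@dataclass
class Tile:
    name: str
    green: bool


def cell(tile: Optional[Tile]) -> str:
    """Render one tile, or an empty cell when a column runs out of tiles."""
    if tile is None:
        return ""
    return f"[{'OK' if tile.green else 'RED'}] {tile.name}"


def render(left: List[Tile], right: List[Tile], width: int = 28) -> str:
    """Render both columns on one screen: code integrity left, validity right."""
    rows = ["CODE INTEGRITY".ljust(width) + "OPERATIONAL VALIDITY"]
    for l, r in zip_longest(left, right):
        rows.append(cell(l).ljust(width) + cell(r))
    return "\n".join(rows)


def hermes_pattern(left: List[Tile], right: List[Tile]) -> bool:
    """True when every left tile is green and every right tile is red."""
    return (bool(left) and bool(right)
            and all(t.green for t in left)
            and not any(t.green for t in right))


# Hypothetical tiles for illustration.
left = [Tile("unit tests", True), Tile("lint", True)]
right = [Tile("hours since last real trade", False)]

print(render(left, right))
print("Hermes pattern:", hermes_pattern(left, right))
```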

The Rule

Liveness is a staleness counter, not a count. One number per bot. Displayed next to code integrity metrics so the gap is visible. Alerted against a bot-specific threshold. The Hermes failure would have been unmissable with this one metric — and adding it across the bot ecosystem is the cheapest post-incident insurance policy available.
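A bot-specific threshold could be as simple as a lookup table. The sketch below uses made-up bot names and threshold values. Sensible limits depend on how often each bot is expected to trade.

```python
from typing import Dict

# Hypothetical thresholds in hours; a slow strategy tolerates longer silence.
ALERT_THRESHOLD_HOURS: Dict[str, float] = {
    "hermes": 24.0,
    "slow_bot": 168.0,
}
DEFAULT_THRESHOLD_HOURS = 48.0  # assumed fallback for bots not listed above


def liveness_status(bot: str, hours_since_last_trade: float) -> str:
    """Return 'red' once the staleness counter exceeds the bot's threshold."""
    threshold = ALERT_THRESHOLD_HOURS.get(bot, DEFAULT_THRESHOLD_HOURS)
    return "red" if hours_since_last_trade > threshold else "green"


print(liveness_status("hermes", 1.0))
print(liveness_status("hermes", 500.0))
```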
