ASK KNOX
LESSON 120

Monitoring Autonomous Systems: Signals, Alerts, and Kill Switches

What to monitor, when to alert, and how to stop a misbehaving agent system before damage compounds. Output quality scores, error rate trending, cost per action, escalation frequency — and the circuit breaker design that translates alert conditions into safe system states.

12 min read · Autonomous Agent Trust

You do not know your system is broken until you measure it.

This statement seems obvious. Its implications are not. Most engineers building autonomous systems do not have a complete answer to: "What would you see in your logs and dashboards if your system started producing wrong outputs at a rate 20% above normal?" If the answer is "I would see elevated error rates" — that is wrong. Wrong outputs do not produce errors. They produce confident-but-incorrect actions that look correct in the logs right up until they cause visible downstream damage.

The monitoring gap in autonomous AI systems is not a technical problem. It is a measurement design problem. You are not monitoring for errors. You are monitoring for quality, correctness, and behavioral drift — metrics that require intentional design to capture.

Monitoring Signals and Circuit Breaker

The Six Core Monitoring Signals

Six metrics form the monitoring baseline for any autonomous agent system. All six must be tracked. Missing any one creates a blind spot that adversarial conditions, edge cases, or gradual drift will exploit.

Signal 1: Output quality score. The numerical confidence score produced by the scoring system for each agent output. Track the distribution and trailing average. The trailing 50-run average (or the 7-day average, whichever window covers more runs) establishes the baseline. Alert when the trailing average drops 15%+ below the 7-day rolling baseline.

This is the primary quality signal. A drop in average confidence score is the leading indicator of a system beginning to struggle with its task domain — before that struggle manifests as escalations or user complaints.
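The 15%-drop rule can be sketched as follows — the `quality_alert` helper and its window sizes are illustrative, not a prescribed API:

```python
# Illustrative check for Signal 1: alert when the trailing 50-run average
# quality score drops 15%+ below the 7-day rolling baseline.
def quality_alert(scores, baseline_avg, window=50, drop_pct=0.15):
    """scores: per-run quality scores, oldest first.
    baseline_avg: the 7-day rolling average quality score."""
    recent = scores[-window:]
    if not recent:
        return False  # no data yet; nothing to compare against
    trailing_avg = sum(recent) / len(recent)
    return trailing_avg < baseline_avg * (1 - drop_pct)
```

In practice the baseline average would come from the same log stream, lagged so that a slow degradation cannot drag the baseline down with it.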

Signal 2: Error rate. The percentage of agent runs that produce unhandled exceptions, timeout failures, or explicit error states. Track per task type. The aggregate error rate can look fine while a specific task type has a rapidly rising error rate that will cause failures when volume increases.

Alert threshold: error rate spike greater than 3× the rolling 7-day average for any task type. Immediate escalation: error rate greater than 10% for any task type.
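A minimal per-task-type check covering both thresholds — the function name and data shapes are assumptions for illustration:

```python
def error_rate_alerts(window_counts, baseline_rates,
                      spike_factor=3.0, hard_limit=0.10):
    """window_counts: {task_type: (error_count, total_count)} for the window.
    baseline_rates: {task_type: rolling 7-day error rate}.
    Returns {task_type: "alert" | "escalate"} for task types over threshold."""
    findings = {}
    for task, (errors, total) in window_counts.items():
        if total == 0:
            continue
        rate = errors / total
        if rate > hard_limit:
            findings[task] = "escalate"  # >10% error rate: immediate escalation
        elif rate > spike_factor * baseline_rates.get(task, 0.0):
            findings[task] = "alert"     # >3x the rolling 7-day average
    return findings
```

Segmenting by task type is the point: the aggregate rate hides exactly the per-type spike this check surfaces.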

Signal 3: Escalation frequency. The percentage of tasks that are escalated to human review. This is a critical signal because it directly measures where the system is uncertain. A rising escalation rate means the system is encountering more uncertainty — which could indicate task distribution shift, model drift, or context changes that the system was not designed for.

Alert: escalation frequency rising more than 5 percentage points above the 7-day average. Investigation trigger: sustained escalation rate above 20%.

Signal 4: Cost per action. The total compute cost (token cost + inference infrastructure) divided by successfully completed tasks. Track the delta from the 7-day rolling average. A rising cost per action indicates the system is doing more work per task — longer chains, more retries, more validation passes — which is a behavioral signal that something has changed.

Alert: cost per action trending more than 20% above the 7-day baseline.

Signal 5: Latency P95. The 95th percentile task completion time. Latency spikes indicate resource exhaustion, API throttling, or increased retry behavior. Track against the SLA defined for each task type.

Alert: latency P95 exceeding SLA for 5% or more of tasks in a rolling 30-minute window.
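The rule can be checked in its fraction-over-SLA form (P95 exceeding the SLA is equivalent to 5%+ of tasks exceeding the SLA). A sketch, assuming completion records are (finish time, duration) pairs and a single per-task-type SLA — names are illustrative:

```python
import time

def latency_p95_alert(completions, sla_s, window_s=1800,
                      breach_frac=0.05, now=None):
    """completions: list of (finished_at_epoch_s, duration_s) tuples.
    Alerts when >=5% of tasks in the rolling 30-minute window exceed the SLA."""
    now = time.time() if now is None else now
    recent = [dur for (ts, dur) in completions if now - ts <= window_s]
    if not recent:
        return False
    over_sla = sum(1 for dur in recent if dur > sla_s)
    return over_sla / len(recent) >= breach_frac
```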

Signal 6: User complaint rate. For systems with user-facing outputs, the rate of user feedback indicating incorrect, unhelpful, or problematic outputs. This is the most lagging of the six signals — users complain after problems have existed long enough to affect their experience. It is also the most externally valid — it measures whether the system is actually serving its users well.

Alert: complaint rate exceeding 3% of interactions in a rolling 24-hour window.
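Taken together, the six rules can live in one alert configuration. A sketch with illustrative field names — the thresholds are the ones stated for each signal above:

```python
# One place for all six alert rules. Field names are illustrative;
# thresholds match the rules stated per signal.
ALERT_RULES = {
    "quality_score":   {"type": "pct_drop_vs_baseline", "threshold": 0.15},
    "error_rate":      {"type": "spike_vs_baseline", "factor": 3.0,
                        "hard_limit": 0.10, "segment_by": "task_type"},
    "escalation_freq": {"type": "pct_point_rise", "threshold": 0.05,
                        "investigate_above": 0.20},
    "cost_per_action": {"type": "pct_rise_vs_baseline", "threshold": 0.20},
    "latency_p95":     {"type": "sla_breach_rate", "threshold": 0.05,
                        "window_minutes": 30},
    "complaint_rate":  {"type": "absolute_rate", "threshold": 0.03,
                        "window_hours": 24},
}
```

Keeping the rules in data rather than scattered through alerting code makes the coverage check trivial: six signals, six entries, no blind spots.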

Setting Alert Thresholds Correctly

Alert thresholds set before production data exists are guesses. Alert thresholds calibrated against a measured baseline are defensible.

The process for setting thresholds:

  1. Run for two weeks without automated alerts. Log all six signals continuously. Do not alert; just collect data.
  2. Establish baseline distributions. For each signal, compute: mean, standard deviation, and 7-day rolling average. Identify natural variation — how much the signal moves on a normal day.
  3. Set thresholds at meaningful deviation. Alerts should fire when the signal is outside normal variation. 1.5–2 standard deviations above the mean is a reasonable starting point for most signals. The 15% degradation rule for quality score is calibrated against the observation that 15%+ degradation is a strong signal of systemic problems, not random variation.
  4. Tune based on false positive rate. After the first 30 days with alerts active, review the alert log. If any alert is firing more than 2× per week without corresponding real issues, the threshold is too sensitive and should be raised.
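Steps 2 and 3 above can be sketched as follows — the helper name and the `higher_is_worse` flag are illustrative; quality score is the one signal where lower is worse:

```python
import statistics

def calibrate_threshold(baseline_samples, sigmas=1.5, higher_is_worse=True):
    """Derive an alert threshold from the two-week baseline log for one signal.
    Use higher_is_worse=False for quality score, where a drop is the problem."""
    mean = statistics.mean(baseline_samples)
    stdev = statistics.stdev(baseline_samples)
    delta = sigmas * stdev
    return mean + delta if higher_is_worse else mean - delta
```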

The monitoring system does not eliminate fog. It illuminates the areas of the fog that are changing. Alert thresholds are calibrated judgments about which changes in the signals represent real problems versus normal variation. The calibration process is ongoing — production systems change, and the thresholds must evolve with them.

Circuit Breaker Design

The circuit breaker is the mechanism that translates alert conditions into safe system states. It has three states:

CLOSED (normal operation). All requests flow through normally. The circuit breaker monitors the signals. If any alert threshold is crossed, the circuit breaker opens.

OPEN (tripped). No new requests are processed. In-flight requests that can be completed safely are allowed to finish; new requests are queued or rejected with a clear degraded-service response. The circuit breaker remains open for a configured cooldown period — typically 5–15 minutes, long enough for a human to assess the situation and for a transient condition to resolve.

HALF-OPEN (testing). After the cooldown period, the circuit breaker allows exactly one probe request to be processed normally. If the probe succeeds and the monitoring signals look healthy, the circuit breaker moves to CLOSED. If the probe fails, the circuit breaker moves back to OPEN with an extended cooldown.

The circuit breaker is automatic. It does not wait for human intervention to open or close — that would defeat the purpose of autonomous failure protection. It does, however, alert humans when it opens, so the situation can be investigated and resolved.
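The three states translate directly into a small state machine. A minimal sketch — class and method names are assumptions, and the monitoring layer that calls `trip()` is omitted:

```python
import time

class CircuitBreaker:
    """CLOSED -> OPEN on any alert; OPEN -> HALF_OPEN after cooldown;
    HALF_OPEN -> CLOSED on a healthy probe, back to OPEN otherwise."""

    def __init__(self, cooldown_s=300):      # 5-minute default cooldown
        self.state = "CLOSED"
        self.cooldown_s = cooldown_s
        self._opened_at = None

    def trip(self):
        """Called by the monitoring layer when any alert threshold is crossed."""
        self.state = "OPEN"
        self._opened_at = time.monotonic()

    def allow_request(self):
        if self.state == "CLOSED":
            return True
        if self.state == "OPEN":
            if time.monotonic() - self._opened_at >= self.cooldown_s:
                self.state = "HALF_OPEN"     # exactly one probe gets through
                return True
            return False                     # queue or reject during cooldown
        return False                         # HALF_OPEN: probe already in flight

    def record_probe(self, healthy):
        """Report the outcome of the HALF_OPEN probe request."""
        if healthy:
            self.state = "CLOSED"
        else:
            self.cooldown_s *= 2             # extended cooldown on failed probe
            self.trip()
```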

Kill Switch Design

The circuit breaker handles gradual or transient failures. Kill switches handle situations where the system must be stopped immediately, with more force than a circuit breaker provides.

Hard stop. All agent processes are terminated immediately. No in-flight tasks are completed. The system does not accept new requests. This is appropriate when active harm is in progress — when the system is taking actions that are causing real damage and continuing to take them would make the situation worse. The hard stop is the emergency brake.

Trigger conditions for hard stop: active exploitation of a discovered vulnerability, system taking actions outside its authorized scope, cascading failure spreading to adjacent systems.

Graceful drain. New tasks are blocked from starting. In-flight tasks are allowed to complete normally. The system enters a planned wind-down. This is appropriate for situations where the system needs to be stopped but is not actively causing harm — rising cost per action, scheduled maintenance, deployment of a patched version.
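Both stop modes reduce to two pieces of shared state: whether new work is accepted, and whether in-flight work may finish. A sketch — class, method, and event names are illustrative:

```python
import threading

class KillSwitch:
    def __init__(self):
        self.accepting = True
        self.hard_stopped = False
        self._inflight = 0
        self._lock = threading.Lock()
        self.drained = threading.Event()   # set once wind-down is complete

    def start_task(self):
        """Workers call this before each task; False means the task is blocked."""
        with self._lock:
            if not self.accepting:
                return False               # new work blocked in both modes
            self._inflight += 1
            return True

    def finish_task(self):
        with self._lock:
            self._inflight -= 1
            if not self.accepting and self._inflight == 0:
                self.drained.set()

    def graceful_drain(self):
        """Block new tasks; let in-flight tasks complete normally."""
        with self._lock:
            self.accepting = False
            if self._inflight == 0:
                self.drained.set()

    def hard_stop(self):
        """Stop immediately; workers must check hard_stopped and abort."""
        with self._lock:
            self.accepting = False
            self.hard_stopped = True
            self.drained.set()
```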

Runbook trigger. Not a kill switch but an escalation path. When a specific alert fires, instead of (or in addition to) opening the circuit breaker, the system triggers a runbook — a documented sequence of investigation and remediation steps. The runbook trigger ensures that the on-call engineer receives not just an alert but a guided response procedure.

Runbook Design for Common Failure Modes

The runbook is the operational counterpart to the monitoring system. It should exist before the first alert fires.

Runbook template for quality score degradation:

Alert: Quality score trailing average dropped 15%+ from 7-day baseline
Step 1: Check which task type has the largest drop — segment by task type in dashboard
Step 2: Examine the last 10 outputs of the degraded task type — are there common failure patterns?
Step 3: Check if any dependency (model API, retrieval system, external data source) had issues in the same window
Step 4: If the degradation is isolated to one task type → isolate that task type, route to human review, investigate
Step 5: If the degradation is broad → open circuit breaker, notify on-call, initiate incident response
Step 6: Log outcome in incident tracker with root cause and resolution

The runbook translates a monitoring signal into a sequence of specific, executable steps. The engineer on call does not need to decide how to investigate — the runbook tells them. They execute the steps, reach a conclusion, and take the appropriate action.

Lesson 120 Drill

Build the monitoring baseline for your most autonomous agent:

  1. Instrument all six signals. Verify that every run produces log events for: output quality score, task completion status (success/error/escalation), latency, and cost.
  2. Run for one week and collect baseline distributions for each signal.
  3. Set alert thresholds at 1.5 standard deviations from the mean for each signal — above the mean for signals where higher is worse (error rate, cost, latency), below it for quality score — plus the specific percentage-based rules (15% quality drop, 3× error spike).
  4. Write a runbook for each alert type: what the alert means, what to check first, what to do if the condition confirms.
  5. Implement the circuit breaker in CLOSED state. Test OPEN and HALF-OPEN transitions in staging.

The system is not monitored until you can answer: "If output quality drops 15% tomorrow, what will I see in my dashboards, what alert will fire, and what will the runbook tell me to do?"

Bottom Line

Monitoring autonomous systems is not about capturing errors. Errors are rare and often catch themselves. It is about capturing drift — the gradual degradation of quality, the rising uncertainty, the increasing cost, the accumulating user friction.

Build the six signals. Calibrate the thresholds against measured baselines. Design the circuit breaker with three clean states. Define kill switch types for different severity levels. Write runbooks before the alerts fire.

The monitoring system is what lets you sleep while the agents run. It is also what wakes you up — specifically and usefully — when something is wrong.