Ask Knox

Agents fail in ways that are not immediately obvious. They do not always crash. They do not always return error codes. Sometimes they produce plausible-looking output that is wrong. Sometimes they enter patterns that escalate cost without making progress. Sometimes they simply get stuck and stay in-progress indefinitely.

A platform without health monitoring is a platform operating blind. You learn about problems when users notice, not when they start. This lesson builds the visibility layer that detects problems early — before they cascade.

The Three Failure Modes to Monitor

Hallucination — the agent produces output that is syntactically correct and structurally plausible but factually wrong or fabricated. The agent does not know it is hallucinating. Its confidence is unaffected. Detection requires external validation.

Doom spiral — the agent enters a non-convergent error-correction loop. An error triggers a fix attempt. The fix introduces a new error. The new error triggers a larger fix. The pattern diverges until token budget is exhausted or the session becomes unrecoverable. Detection requires recognizing the divergent pattern, not just counting errors.

Stale execution — the agent is in in_progress state significantly longer than its baseline for this task type. It is stuck — possibly waiting for a dependency that will never arrive, possibly in an infinite loop, possibly out of context. Detection requires time-in-state tracking against a per-task-type baseline.

Each failure mode has a different detection mechanism and a different response.

The Behavioral Baseline

Behavioral monitoring requires a reference point: what does healthy look like for this agent on this task type? Without a baseline, every metric is ambiguous.

@dataclass
class BehavioralBaseline:
    agent_id: str
    task_type: str
    established_from: int          # number of sessions used to build baseline

    # Time metrics
    median_completion_seconds: float
    p95_completion_seconds: float
    stale_threshold_seconds: float  # p95 × 2.0

    # Quality metrics
    median_confidence: float
    min_acceptable_confidence: float   # median - 2σ

    # Error metrics
    baseline_error_rate: float
    error_spike_threshold: float       # baseline × 3.0

    # Cost metrics
    median_token_cost: float
    cost_spike_threshold: float        # median × 2.5


class BaselineManager:
    def compute_baseline(
        self,
        agent_id: str,
        task_type: str,
        session_window: int = 20
    ) -> BehavioralBaseline:
        """
        Compute baseline from last N validated healthy sessions.
        Excludes sessions flagged as anomalous.
        """
        sessions = self.store.get_healthy_sessions(
            agent_id=agent_id,
            task_type=task_type,
            limit=session_window
        )

        if len(sessions) < 5:
            return self.get_default_baseline(task_type)

        durations = [s.duration_seconds for s in sessions]
        confidences = [s.avg_confidence for s in sessions]
        error_rates = [s.error_rate for s in sessions]
        costs = [s.token_cost for s in sessions]

        return BehavioralBaseline(
            agent_id=agent_id,
            task_type=task_type,
            established_from=len(sessions),
            median_completion_seconds=statistics.median(durations),
            p95_completion_seconds=statistics.quantiles(durations, n=20)[18],
            stale_threshold_seconds=statistics.quantiles(durations, n=20)[18] * 2.0,
            median_confidence=statistics.median(confidences),
            min_acceptable_confidence=statistics.mean(confidences) - 2 * statistics.stdev(confidences),
            baseline_error_rate=statistics.mean(error_rates),
            error_spike_threshold=statistics.mean(error_rates) * 3.0,
            median_token_cost=statistics.median(costs),
            cost_spike_threshold=statistics.median(costs) * 2.5,
        )

Hallucination Detection

Hallucination detection is a validation problem. The agent cannot detect its own hallucinations, so the platform must.

class HallucinationDetector:
    def __init__(self, validators: list[Validator]):
        self.validators = validators

    async def validate(
        self,
        output: AgentOutput,
        context: ValidationContext
    ) -> HallucinationResult:
        violations = []

        for validator in self.validators:
            result = await validator.check(output, context)
            if result.is_violation:
                violations.append(result)

        confidence = 1.0 - (len(violations) / max(len(self.validators), 1))

        return HallucinationResult(
            has_violations=len(violations) > 0,
            violations=violations,
            confidence=confidence
        )


class FactualConsistencyValidator:
    """Cross-check agent claims against authoritative sources."""

    async def check(
        self,
        output: AgentOutput,
        context: ValidationContext
    ) -> ValidationResult:
        # Extract claims from output
        claims = extract_factual_claims(output.content)

        violations = []
        for claim in claims:
            # Check against authoritative data
            authoritative = await context.knowledge_base.lookup(claim.subject)
            if authoritative and not claim.is_consistent_with(authoritative):
                violations.append(ClaimViolation(
                    claim=claim,
                    expected=authoritative,
                    actual=claim.value
                ))

        return ValidationResult(
            validator="factual-consistency",
            is_violation=len(violations) > 0,
            violations=violations
        )


class SchemaValidator:
    """Validate structured outputs against expected schema."""

    async def check(
        self,
        output: AgentOutput,
        context: ValidationContext
    ) -> ValidationResult:
        if not output.expected_schema:
            return ValidationResult(validator="schema", is_violation=False)

        try:
            jsonschema.validate(output.structured, output.expected_schema)
            return ValidationResult(validator="schema", is_violation=False)
        except jsonschema.ValidationError as e:
            return ValidationResult(
                validator="schema",
                is_violation=True,
                violations=[SchemaViolation(path=e.path, message=e.message)]
            )

Doom Spiral Detection

A doom spiral is characterized by a specific pattern: error rates increasing over iterations, with each iteration making the problem worse rather than better.

class DoomSpiralDetector:
    def __init__(
        self,
        error_threshold: float = 0.40,
        divergence_threshold: float = 0.05,  # mean per-iteration increase in error rate
        window: int = 5
    ):
        self.error_threshold = error_threshold
        self.divergence_threshold = divergence_threshold
        self.window = window

    def detect(self, session: ActiveSession) -> DoomSpiralResult:
        if len(session.iterations) < self.window:
            return DoomSpiralResult(detected=False)

        recent = session.iterations[-self.window:]
        error_rates = [i.error_rate for i in recent]

        # Is the error rate above threshold?
        current_error_rate = error_rates[-1]
        if current_error_rate < self.error_threshold:
            return DoomSpiralResult(detected=False)

        # Is it diverging (getting worse, not better)?
        trend = self._compute_trend(error_rates)
        if trend < self.divergence_threshold:
            return DoomSpiralResult(detected=False)

        # Are the errors different each time (not the same error looping)?
        error_diversity = len(set(i.error_type for i in recent)) / len(recent)

        return DoomSpiralResult(
            detected=True,
            current_error_rate=current_error_rate,
            trend=trend,
            error_diversity=error_diversity,
            iteration_count=len(session.iterations),
            recommendation="halt_and_reset" if error_diversity > 0.6 else "halt_and_retry"
        )

    def _compute_trend(self, series: list[float]) -> float:
        """Mean per-iteration change in error rate — positive means diverging.

        Error rates are 0-1 fractions, so the trend is small in absolute
        terms: a session climbing from 0.40 to 0.60 over a 5-iteration
        window has a trend of 0.05, which clears the default threshold.
        """
        if len(series) < 2:
            return 0.0
        deltas = [series[i+1] - series[i] for i in range(len(series)-1)]
        return statistics.mean(deltas)

Worked example: a session whose last five iterations show error rates [0.40, 0.45, 0.50, 0.55, 0.60] has a current error rate of 0.60 (above error_threshold = 0.40) and a trend of (0.60 - 0.40) / 4 = 0.05 — at the divergence_threshold, so the detector fires. A session oscillating around 0.45 with no upward trend stays below the divergence threshold and does not.

Stale Execution Detection

Stale execution is the simplest failure to detect and one of the most common.

class StaleExecutionDetector:
    def __init__(self, baselines: BaselineManager):
        self.baselines = baselines

    def check_session(
        self,
        session: ActiveSession
    ) -> StaleResult:
        if session.status != "in_progress":
            return StaleResult(is_stale=False)

        elapsed = (datetime.now(timezone.utc) - session.started_at).total_seconds()
        baseline = self.baselines.get(session.agent_id, session.task_type)

        if elapsed > baseline.stale_threshold_seconds:
            return StaleResult(
                is_stale=True,
                elapsed_seconds=elapsed,
                threshold_seconds=baseline.stale_threshold_seconds,
                factor=elapsed / baseline.median_completion_seconds,
                action=self.recommend_action(elapsed, baseline)
            )

        return StaleResult(is_stale=False)

    def recommend_action(
        self,
        elapsed: float,
        baseline: BehavioralBaseline
    ) -> str:
        # Escalation ladder — must match RECOVERY_PROTOCOLS below
        factor = elapsed / baseline.median_completion_seconds
        if factor > 5.0:
            return "force_terminate_no_retry"
        elif factor >= 3.0:
            return "soft_terminate_and_retry"
        else:
            return "send_heartbeat_check"

Circuit Breakers

A circuit breaker prevents a failing agent from receiving new work until it demonstrates it has recovered. It tracks failure rates and opens the circuit when failures exceed the threshold.

class AgentCircuitBreaker:
    def __init__(
        self,
        agent_id: str,
        health_monitor: "AgentHealthMonitor",  # injected — no global state
        failure_threshold: float = 0.30,  # 30% failure rate opens circuit
        recovery_timeout_seconds: int = 300,
        half_open_test_count: int = 3
    ):
        self.agent_id = agent_id
        self.health_monitor = health_monitor
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout_seconds
        self.half_open_tests = half_open_test_count

        self.state: str = "closed"  # closed | open | half-open
        self.failure_count: int = 0
        self.total_count: int = 0
        self.last_failure: Optional[datetime] = None
        self.half_open_successes: int = 0

    def record_result(self, success: bool) -> None:
        self.total_count += 1
        if not success:
            self.failure_count += 1
            self.last_failure = datetime.now(timezone.utc)

        if self.state == "half-open":
            if success:
                self.half_open_successes += 1
            else:
                # A failure during the recovery probe reopens the circuit
                self._open_circuit()
                return

        self._evaluate_state()

    def _evaluate_state(self) -> None:
        if self.state == "closed":
            if self.total_count >= 10:  # minimum sample
                failure_rate = self.failure_count / self.total_count
                if failure_rate >= self.failure_threshold:
                    self._open_circuit()

        elif self.state == "open":
            if self.last_failure:
                elapsed = (datetime.now(timezone.utc) - self.last_failure).total_seconds()
                if elapsed >= self.recovery_timeout:
                    self._half_open()

        elif self.state == "half-open":
            if self.half_open_successes >= self.half_open_tests:
                self._close_circuit()

    def can_accept_task(self) -> bool:
        if self.state == "closed":
            return True
        if self.state == "half-open":
            return True  # Allow limited tasks to test recovery
        return False  # open — no new tasks

    def _open_circuit(self) -> None:
        self.state = "open"
        # Notify the injected health monitor
        self.health_monitor.alert(
            agent_id=self.agent_id,
            event="circuit_breaker_opened",
            failure_rate=self.failure_count / self.total_count
        )

    def _half_open(self) -> None:
        self.state = "half-open"
        self.half_open_successes = 0

    def _close_circuit(self) -> None:
        self.state = "closed"
        self.failure_count = 0
        self.total_count = 0

The Health Monitor: Assembling the System

The health monitor runs as a separate process — not part of any agent, not part of the broker. It polls agent state, runs detection checks, and acts on findings.

class AgentHealthMonitor:
    def __init__(
        self,
        broker: AgentBroker,
        check_interval_seconds: int = 30
    ):
        self.broker = broker
        self.check_interval = check_interval_seconds
        self.stale_detector = StaleExecutionDetector(BaselineManager())
        self.spiral_detector = DoomSpiralDetector()
        self.circuit_breakers: dict[str, AgentCircuitBreaker] = {}

    async def run(self) -> None:
        """Main monitoring loop — runs forever."""
        while True:
            try:
                await self.check_all_agents()
            except Exception as e:
                logger.error(f"Health monitor error: {e}")
            await asyncio.sleep(self.check_interval)

    async def check_all_agents(self) -> None:
        active_sessions = await self.broker.get_active_sessions()

        for session in active_sessions:
            await self.check_session(session)

    async def check_session(self, session: ActiveSession) -> None:
        issues = []

        # Stale execution check
        stale = self.stale_detector.check_session(session)
        if stale.is_stale:
            issues.append(HealthIssue(
                type="stale_execution",
                severity="warning" if stale.factor < 3 else "error",
                details=stale
            ))

        # Doom spiral check
        spiral = self.spiral_detector.detect(session)
        if spiral.detected:
            issues.append(HealthIssue(
                type="doom_spiral",
                severity="error",
                details=spiral
            ))

        if issues:
            await self.act_on_issues(session, issues)

    async def act_on_issues(
        self,
        session: ActiveSession,
        issues: list[HealthIssue]
    ) -> None:
        # Never compare severity strings directly — lexicographically
        # "warning" > "error", which would misclassify mixed sessions.
        has_error = any(i.severity == "error" for i in issues)

        if has_error:
            # Halt the session, open the circuit breaker
            await self.broker.halt_session(session.id, reason="health-monitor")
            cb = self.get_circuit_breaker(session.agent_id)
            cb.record_result(success=False)
            await self.escalate_to_ceo(session, issues)
        else:
            # Warning — flag and monitor
            await self.broker.flag_session(session.id, issues=issues)

Automated Recovery Protocols

When the health monitor halts a session, the recovery protocol determines what happens next.

RECOVERY_PROTOCOLS = {
    "stale_execution": {
        "factor < 3": "send_heartbeat_check",
        "factor 3-5": "soft_terminate_and_retry",
        "factor > 5": "force_terminate_no_retry",
    },
    "doom_spiral": {
        "iteration_count < 10": "halt_and_reset_context",
        "iteration_count >= 10": "halt_escalate_human",
    },
    "hallucination_detected": {
        "confidence > 0.80": "reject_output_retry",
        "confidence 0.60-0.80": "reject_output_validate_manually",
        "confidence < 0.60": "halt_escalate_human",
    },
}

Summary

Three failure modes require distinct detection mechanisms: hallucination (external validation), doom spiral (divergence pattern), stale execution (time-in-state vs baseline)
Behavioral baselines must be built from a rolling window of validated healthy sessions
Hallucination detection requires multiple validation layers — schema, factual consistency, and cross-referencing
Circuit breakers isolate failing agents from new work, allowing recovery without cascading failure
The health monitor is a separate process that polls agent state, runs detection checks, and triggers automated recovery

What's Next

The final lesson assembles all components into a complete platform — walking end-to-end through how expertise, team architecture, organizational wiring, authority delegation, and behavioral monitoring combine into a running system.

Behavioral Health Monitoring