Loop Detection — Jaccard Similarity and Session Termination
The Loop Detector uses Jaccard token overlap to score consecutive agent turns. Three consecutive similar turns trigger a VP warning; five trigger forced termination. Here is how it works and why it is the cheapest insurance in your agent stack.
Budget enforcement stops spend once it has accumulated. Loop detection stops it before it accumulates.
An agent in a reasoning loop will hit its daily budget eventually — but "eventually" might be $0.80 into a $1.00 budget, or $2.50 into an override that was granted for a legitimate extended session. The loop detector does not wait. It detects the behavioral signature of a loop — consecutive turns with high content similarity — and intervenes at the session level.
What a Loop Looks Like
Agent loops are not always obvious. They do not always look like exact repetition. Common patterns:
Direct repetition. The agent produces nearly identical turns: "I need to check the current price and make a trading decision." Each time slightly rephrased but semantically identical.
Alternating stuck states. The agent oscillates between two responses: state A ("I need more information"), state B ("Let me check the data"), state A, state B. Neither state makes progress.
Reformulation loops. The agent rephrases its task on each turn, never actually executing: "Let me approach this differently." "Perhaps I should start by..." "A better framing might be..." Each turn is the agent restarting the same failing attempt.
Tool retry loops. A tool call fails. The agent tries again with identical or near-identical input. The tool fails again. Repeat.
All of these produce a detectable signal: turns with high lexical overlap. The loop detector measures this signal.
Jaccard Similarity
The similarity metric is Jaccard token overlap:
@staticmethod
def _text_similarity(a: str, b: str) -> float:
    """
    Simple token overlap similarity (Jaccard).
    Production would use sentence-transformers embeddings.
    """
    if not a or not b:
        return 0.0
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    intersection = tokens_a & tokens_b
    union = tokens_a | tokens_b
    return len(intersection) / len(union)
Jaccard similarity is: |intersection| / |union|
The intersection is the set of tokens that appear in both strings. The union is the set of all tokens that appear in either string. A score of 1.0 means the strings have identical token sets. A score of 0.0 means they share no tokens.
Working through an example:
Turn A: "check price and decide trade"
tokens_a = {"check", "price", "and", "decide", "trade"}
Turn B: "check current price and make trade decision"
tokens_b = {"check", "current", "price", "and", "make", "trade", "decision"}
intersection = {"check", "price", "and", "trade"} → 4 tokens
union = {"check", "price", "and", "decide", "trade", "current", "make", "decision"} → 8 tokens
similarity = 4 / 8 = 0.50
A 0.50 score is below the 0.85 threshold — these turns would not trigger the loop counter. But:
Turn A: "check price and decide trade"
Turn C: "check price and decide trade action"
tokens_c = {"check", "price", "and", "decide", "trade", "action"}
intersection = {"check", "price", "and", "decide", "trade"} → 5 tokens
union = {"check", "price", "and", "decide", "trade", "action"} → 6 tokens
similarity = 5 / 6 = 0.833
Still below 0.85. Even another one-token variation stays under the threshold:
Turn D: "check price and decide on trade"
tokens_d = {"check", "price", "and", "decide", "on", "trade"}
intersection with Turn A = {"check", "price", "and", "decide", "trade"} → 5 tokens
union = {"check", "price", "and", "decide", "trade", "on"} → 6 tokens
similarity = 5 / 6 = 0.833
The 0.85 threshold is intentionally conservative. It allows natural variation in agent language without triggering false positives. Two turns must share almost all of their vocabulary to exceed 0.85.
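The worked examples above can be reproduced with a standalone copy of the similarity function (a re-implementation for illustration, not the detector's actual module):

```python
def text_similarity(a: str, b: str) -> float:
    """Jaccard token overlap, mirroring _text_similarity above."""
    if not a or not b:
        return 0.0
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

turn_a = "check price and decide trade"
print(text_similarity(turn_a, "check current price and make trade decision"))  # 0.5
print(text_similarity(turn_a, "check price and decide trade action"))  # 0.8333...
print(text_similarity(turn_a, turn_a))  # 1.0 — identical token sets
```

Note how only identical token sets reach 1.0; a single extra token on a five-token turn already drops the score to 5/6 ≈ 0.833, just under the threshold.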
The Check Turn Logic
Each turn goes through check_turn():
def check_turn(
    self, session_id: str, turn_content: str
) -> LoopCheckResult:
    """Check if the current turn is too similar to recent turns."""
    history = self._session_history[session_id]
    if not history:
        history.append(turn_content)
        return LoopCheckResult()

    # Compare against last N turns (window), not just the last one.
    # This catches alternating patterns (A, B, A, B...) too.
    window = history[-5:]  # last 5 turns
    max_similarity = max(
        self._text_similarity(prev, turn_content) for prev in window
    )
    history.append(turn_content)

    # Cap history to prevent unbounded memory growth
    if len(history) > MAX_HISTORY_PER_SESSION:
        self._session_history[session_id] = history[-MAX_HISTORY_PER_SESSION:]

    if max_similarity >= SIMILARITY_THRESHOLD:
        self._consecutive_similar[session_id] += 1
    else:
        self._consecutive_similar[session_id] = 0

    consecutive = self._consecutive_similar[session_id]
    if consecutive >= TERMINATE_CONSECUTIVE:
        return LoopCheckResult(
            looping=True,
            should_terminate=True,
            consecutive_similar=consecutive,
            reason=(
                f"Loop detected: {consecutive} consecutive similar "
                f"turns (>{SIMILARITY_THRESHOLD} similarity). "
                f"Session will be terminated."
            ),
        )
    if consecutive >= WARN_CONSECUTIVE:
        return LoopCheckResult(
            looping=True,
            should_terminate=False,
            consecutive_similar=consecutive,
            reason=(
                f"Loop warning: {consecutive} consecutive similar "
                f"turns. VP will be notified."
            ),
        )
    return LoopCheckResult(consecutive_similar=consecutive)
Three behaviors to note:
The first turn always passes. If there is no history for the session, the turn is added and returns clean. The detector needs at least one previous turn to compare against.
The window is the last 5 turns, not all history. Comparing against all history would flag an agent that legitimately returns to a topic it discussed ten turns ago. The window limits comparison to recent context.
The consecutive counter resets to 0 on any dissimilar turn. A session that has 3 similar turns, then one dissimilar turn, then 2 more similar turns is not looping — it has made genuine progress in the middle. The counter does not accumulate across the gap.
The Two-Stage Response
# Thresholds
SIMILARITY_THRESHOLD = 0.85
WARN_CONSECUTIVE = 3
TERMINATE_CONSECUTIVE = 5
At 3 consecutive similar turns: warning. At 5 consecutive similar turns: termination.
The two-stage design mirrors the budget system's 80%/100% pattern. The warning at 3 turns gives the broker the opportunity to intervene — notify the controlling VP agent, send a Discord alert, give the operator a chance to inspect. Many real loops are recoverable: the operator can send a clarifying message that gives the agent new information, breaking the loop.
The termination at 5 turns is the hard cut. Five consecutive turns with 0.85+ similarity is not ambiguous — the session has failed. Continuing it would burn budget without making progress. The broker terminates the session, logs the termination reason, and fires a session.terminated event.
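The two-stage escalation can be sketched in a compressed, self-contained form. This is an illustrative re-implementation — the names `MiniLoopDetector` and the `"ok"`/`"warn"`/`"terminate"` string results are stand-ins; the real detector returns `LoopCheckResult` objects:

```python
from collections import defaultdict

SIMILARITY_THRESHOLD = 0.85
WARN_CONSECUTIVE = 3
TERMINATE_CONSECUTIVE = 5


def text_similarity(a: str, b: str) -> float:
    # Jaccard token overlap, as defined earlier in this lesson.
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


class MiniLoopDetector:
    """Stripped-down sketch of the two-stage warn/terminate logic."""

    def __init__(self):
        self._history = defaultdict(list)
        self._similar = defaultdict(int)

    def check_turn(self, session_id: str, content: str) -> str:
        history = self._history[session_id]
        if not history:  # first turn always passes
            history.append(content)
            return "ok"
        max_sim = max(text_similarity(p, content) for p in history[-5:])
        history.append(content)
        if max_sim >= SIMILARITY_THRESHOLD:
            self._similar[session_id] += 1
        else:
            self._similar[session_id] = 0  # dissimilar turn resets the counter
        n = self._similar[session_id]
        if n >= TERMINATE_CONSECUTIVE:
            return "terminate"
        if n >= WARN_CONSECUTIVE:
            return "warn"
        return "ok"


det = MiniLoopDetector()
stuck = "check price and decide trade"
results = [det.check_turn("sess-1", stuck) for _ in range(6)]
print(results)  # ['ok', 'ok', 'ok', 'warn', 'warn', 'terminate']
print(det.check_turn("sess-1", "completely different new topic entirely"))  # 'ok'
```

The first turn passes with no comparison, the counter reaches 3 on the fourth identical turn (warning), 5 on the sixth (termination), and a single dissimilar turn drops it straight back to zero.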
Memory Management
The session history is bounded:
MAX_HISTORY_PER_SESSION = 50
A session can accumulate at most 50 turns in the comparison history. This is not a loop detection threshold — it is a memory management limit. An agent that runs 200 turns (which would be unusual, given the turn limiter covered in Lesson 208) does not need all 200 turns in the comparison window. The comparison window is 5 turns. Keeping 50 in memory is generous storage with a cap.
When the history exceeds 50, it is trimmed to the last 50:
if len(history) > MAX_HISTORY_PER_SESSION:
    self._session_history[session_id] = history[-MAX_HISTORY_PER_SESSION:]
This is the rolling-window pattern. Old turns fall off. Current state is maintained. No unbounded growth.
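The trimming can be seen in isolation with a plain list (hypothetical turn labels):

```python
MAX_HISTORY_PER_SESSION = 50

history = [f"turn-{i}" for i in range(60)]  # 60 turns accumulated
if len(history) > MAX_HISTORY_PER_SESSION:
    history = history[-MAX_HISTORY_PER_SESSION:]

print(len(history))   # 50
print(history[0])     # 'turn-10' — the 10 oldest turns fell off
print(history[-1])    # 'turn-59' — the newest turn is kept
```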
Why Jaccard and Not Embeddings
The comment in the source code is explicit:
"""
Simple token overlap similarity (Jaccard).
Production would use sentence-transformers embeddings.
"""
Embeddings would be more semantically accurate. Two turns that express the same idea with completely different words — "analyze the price movement" vs "examine how prices changed" — would score as highly similar with embeddings, and near-zero with Jaccard.
Jaccard was chosen for three reasons:
- Zero dependencies. No model loading, no network calls, no inference latency. It runs in microseconds.
- Deterministic. Given the same inputs, it always produces the same output. Embeddings can vary with model versions.
- Good enough. Real loops — agents that are genuinely stuck — do tend to use similar vocabulary. The agent saying "check price and decide" ten times in a row does not need semantic understanding to detect.
The comment is a signal, not a criticism. If false negatives become a problem — loops that use varied vocabulary and evade Jaccard detection — upgrading to embeddings is a well-defined path. The interface (_text_similarity(a, b) -> float) is a single function swap.
What Happens on Termination
When should_terminate=True is returned, the broker:
- Fires a session.terminated event with the loop detection reason attached
- Sends a Discord alert to the operator channel with the session details
- Notifies the controlling VP agent
- Records the termination in the audit log
- Calls loop_detector.reset_session(session_id) to clear the history
The session is terminated cleanly. The broker does not let the agent issue one more turn. Once 5 consecutive similar turns are detected, the session is done.
The agent's state at termination is whatever it was when the 5th turn was received. The broker does not attempt to synthesize a clean exit. The operator can inspect the turn history in the audit log, diagnose why the agent got stuck, and restart a fresh session with whatever corrective context is needed.
Integration With Budget Enforcement
Loop detection and budget enforcement are complementary but independent. Both run on every turn. A loop caught at the 3-turn warning has burned 3 turns of budget; one that runs to the 5-turn termination has burned 5.
For a session running on Sonnet at $0.054/turn:
- Caught at warning (3 turns): $0.162 additional spend beyond legitimate turns
- Caught at termination (5 turns): $0.27 additional spend
These are small numbers per session. At scale — if an agent class is prone to loops and hits this pattern multiple times per day — the numbers compound. But the bigger value of loop detection is not the direct cost savings — it is the signal it provides.
An agent that hits the loop detector multiple times per week is telling you something about its prompt, its tooling, or the tasks it is being asked to perform. Loop frequency is an early indicator of agent quality problems, surfaced before they become P&L problems.
Session Isolation
Each session maintains independent state in the detector:
self._session_history: dict[str, list[str]] = defaultdict(list)
self._consecutive_similar: dict[str, int] = defaultdict(int)
Keyed by session_id, not agent_id. Two simultaneous sessions from the same agent do not interfere with each other's loop counters. A looping session does not affect a healthy parallel session.
When reset_session() is called — either on termination or when a session ends normally — the state for that session ID is cleared. No stale history accumulates between sessions.
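A sketch of that per-session state and reset, assuming a dict pair like the one above (`reset_session` is written here as a free function for brevity; in the source it is a method on the detector):

```python
from collections import defaultdict

# Per-session state, keyed by session_id as in the detector.
session_history: dict[str, list[str]] = defaultdict(list)
consecutive_similar: dict[str, int] = defaultdict(int)


def reset_session(session_id: str) -> None:
    # Drop all state for one session; parallel sessions are untouched.
    session_history.pop(session_id, None)
    consecutive_similar.pop(session_id, None)


session_history["sess-1"].append("check price and decide trade")
consecutive_similar["sess-1"] = 4  # one turn away from termination
session_history["sess-2"].append("summarize yesterday's fills")

reset_session("sess-1")
print("sess-1" in session_history)  # False — cleared
print(session_history["sess-2"])    # ["summarize yesterday's fills"] — unaffected
```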
Next: cost.attributed Events
Loop detection and budget enforcement both depend on accurate turn-level cost data. Lesson 208 covers the event layer that makes this data available: the cost.attributed event emitted on every LLM call, the emit callback pattern, and the CFO daily report structure that aggregates it.