Production Patterns for Managed Agents
Shipping a Managed Agents pipeline to production requires versioning discipline, health verification, cost modeling, and a tested migration path — not just a working prototype.
A working prototype and a production-ready system are separated by a specific set of engineering decisions: versioning strategy, rate limit awareness, cost modeling, health verification, and a tested path for when things break. None of these are visible in a demo. All of them determine whether the system you ship is reliable in six months.
This lesson covers the production patterns that distinguish deployed pipelines from experiments.
Agent Versioning
Agents are not ephemeral — they are durable resources that production systems reference by ID. Version them like APIs.
Semantic versioning for agents:
- Patch (v1.0.0 → v1.0.1): System prompt improvements that do not change output format or fundamental behavior. Safe to update in place.
- Minor (v1.0.0 → v1.1.0): New tools or skills added. May change behavior but is backward compatible in output format. Update in place with monitoring.
- Major (v1.0.0 → v2.0.0): Output format change, model change, removal of tools callers depend on, fundamental behavior shift. Create a new agent — do not update in place.
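The decision rule above can be encoded so deployment tooling applies it consistently. This is a minimal sketch assuming plain "MAJOR.MINOR.PATCH" version strings; `migration_action` and the action names are illustrative, not part of any API.

```python
# Sketch: map a semantic-version bump to the migration action described above.
# Assumes simple "MAJOR.MINOR.PATCH" strings; not a full semver parser.

def migration_action(current: str, target: str) -> str:
    """Return the deployment action implied by a version bump."""
    cur = [int(p) for p in current.split(".")]
    new = [int(p) for p in target.split(".")]
    if new[0] > cur[0]:
        return "create-new-agent"        # major: new agent, parallel validation
    if new[1] > cur[1]:
        return "update-with-monitoring"  # minor: in place, watch behavior
    return "update-in-place"             # patch: safe in-place update
```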
For major versions, the migration pattern:
# Step 1: Create the new version
agent_v2 = client.beta.agents.create(
    name="intelligence-pipeline-v2",  # Versioned name
    model="claude-sonnet-4-6",
    system="[updated system prompt with new output format]",
    ...
)

# Step 2: Run parallel sessions to validate equivalence
# Run the same inputs through v1 and v2, compare outputs

# Step 3: Migrate callers to v2 by updating agent_id references

# Step 4: Archive v1 (do not delete)
# Archival preserves the agent definition for audit and rollback
# while preventing new sessions from being started against it
Never delete old agents immediately after migration. Retained agent definitions are your rollback path. If v2 produces unexpected outputs in production, you need to know what v1 was configured with. Delete only after a deliberate retention window (90 days minimum for production agents).
Rate Limits
Managed Agents has its own rate limit tier, separate from the standard Messages API. The limits apply to:
- Sessions created per minute
- Concurrent active sessions
- Events streamed per minute
Check the current limits in the API documentation — they change as the product matures. The critical operational implication: if you build a pipeline that starts 100 sessions simultaneously, you will hit the concurrent session limit before all of them start.
Design for rate limits from the start:
import time
from typing import List

def start_sessions_with_rate_limit(
    agent_id: str,
    tasks: List[str],
    max_concurrent: int = 10,
    delay_between_batches: float = 1.0,
) -> List[str]:
    """Start sessions in batches to respect rate limits.

    Note: batching throttles the session *creation* rate. Staying under
    the concurrent-active-session limit also requires tracking when
    earlier sessions complete before dispatching the next batch.
    """
    session_ids = []
    for i in range(0, len(tasks), max_concurrent):
        batch = tasks[i:i + max_concurrent]
        for task in batch:
            session = client.beta.agents.sessions.create(
                agent_id=agent_id,
                input={"role": "user", "content": task},
            )
            session_ids.append(session.id)
        if i + max_concurrent < len(tasks):
            time.sleep(delay_between_batches)
    return session_ids
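Batching controls how fast you create sessions, but individual calls can still be rejected when limits tighten. A common complement is a retry wrapper with exponential backoff and jitter. This is a sketch, not an SDK feature: `with_backoff` and its default rate-limit detection (matching "rate" in the error message) are assumptions, and both the error check and the sleep function are injectable so the policy can be tested without real API calls.

```python
import time
import random

def with_backoff(fn, max_retries: int = 5, base_delay: float = 1.0,
                 is_rate_limited=lambda e: "rate" in str(e).lower(),
                 sleep=time.sleep):
    """Call fn(), retrying with exponential backoff plus jitter when a
    rate-limit error is detected. Non-rate-limit errors are re-raised
    immediately; the final attempt's error propagates to the caller."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception as e:
            if not is_rate_limited(e) or attempt == max_retries - 1:
                raise
            # Delay doubles each attempt; jitter avoids thundering herds.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```

In production you would wrap the session-creation call, e.g. `with_backoff(lambda: client.beta.agents.sessions.create(...))`, with a check tuned to the SDK's actual rate-limit exception type.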
Cost Modeling
Managed Agents pricing has two components: per-session cost plus per-token cost. Both matter for pipeline economics.
Per-session cost applies regardless of session duration or token usage. Short sessions with minimal tokens are more expensive per token than long sessions, because the session overhead is amortized over more tokens in longer runs.
Per-token cost follows the standard model pricing for the model assigned to the agent. Longer sessions and more tool calls mean more tokens and higher token costs.
Modeling a pipeline's cost:
def estimate_session_cost(
    sessions_per_day: int,
    avg_tokens_per_session: int,
    session_cost: float,
    token_cost_per_million: float,
) -> dict:
    daily_session_cost = sessions_per_day * session_cost
    daily_token_cost = (
        sessions_per_day * avg_tokens_per_session / 1_000_000
    ) * token_cost_per_million
    return {
        "daily_session_cost": daily_session_cost,
        "daily_token_cost": daily_token_cost,
        "daily_total": daily_session_cost + daily_token_cost,
        "monthly_total": (daily_session_cost + daily_token_cost) * 30,
    }
For pipelines running at scale, the session cost component can dominate for short-running agents. If your agent completes in under 30 seconds, evaluate whether the task could be handled more cost-effectively by the Messages API with your own orchestration loop.
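To see why short sessions skew expensive, compute the fraction of total cost that is session overhead. The prices below are placeholders chosen for illustration, not actual Managed Agents pricing; substitute current rates from the pricing page.

```python
# Placeholder prices -- NOT real pricing; substitute current rates.
SESSION_COST = 0.05        # hypothetical $ per session
TOKEN_COST_PER_M = 3.00    # hypothetical $ per million tokens

def session_cost_share(avg_tokens_per_session: int) -> float:
    """Fraction of a session's total cost that is flat session overhead."""
    token_cost = avg_tokens_per_session / 1_000_000 * TOKEN_COST_PER_M
    return SESSION_COST / (SESSION_COST + token_cost)

short = session_cost_share(5_000)    # short run: overhead dominates
long_ = session_cost_share(500_000)  # long run: overhead is amortized
```

With these placeholder rates, the flat session fee is most of the bill for a 5k-token session but a small fraction for a 500k-token one, which is the arithmetic behind the advice above about very short-running agents.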
Health Verification
Knowing that your agent was deployed is not the same as knowing it is working correctly. Implement a health check that validates actual agent behavior:
def verify_agent_health(agent_id: str) -> dict:
    """
    Run a canary session with a known input and validate the output.
    Returns health status and any failures detected.
    """
    canary_input = "Generate a one-sentence summary of your capabilities."
    expected_keywords = ["research", "synthesize", "report"]  # Adjust for your agent
    try:
        session = client.beta.agents.sessions.create(
            agent_id=agent_id,
            input={"role": "user", "content": canary_input},
        )
        # Wait for completion (short timeout for health checks).
        # wait_for_completion is a polling helper that returns the
        # session's terminal status and output.
        result = wait_for_completion(session.id, poll_interval=5, timeout=120)
        if result["status"] != "completed":
            return {"healthy": False, "reason": f"Session ended with status: {result['status']}"}
        output_text = result.get("output", "")
        missing_keywords = [kw for kw in expected_keywords if kw not in output_text.lower()]
        if missing_keywords:
            return {
                "healthy": False,
                "reason": f"Output missing expected keywords: {missing_keywords}",
                "output": output_text,
            }
        return {"healthy": True, "output": output_text}
    except Exception as e:
        return {"healthy": False, "reason": str(e)}
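The health check relies on a `wait_for_completion` polling helper. One way to implement it is sketched below; the status fetcher is injected as a parameter because the exact retrieval call (something like `client.beta.agents.sessions.retrieve`, which is an assumption here) depends on the SDK, and injection also makes the loop testable.

```python
import time

def wait_for_completion(session_id: str, poll_interval: float = 5,
                        timeout: float = 120, get_status=None,
                        sleep=time.sleep) -> dict:
    """Poll a session until it reaches a terminal state or times out.

    get_status(session_id) must return a dict with at least a "status"
    key; in production it would wrap the SDK's session-retrieval call.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = get_status(session_id)
        if result["status"] in ("completed", "failed", "cancelled"):
            return result
        sleep(poll_interval)
    return {"status": "timeout"}
```

For scheduled health checks, event streaming would avoid polling entirely, but a bounded poll loop like this is simpler to reason about under a short timeout.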
Run health checks:
- After every agent deployment
- On a schedule (daily for production agents)
- As part of any infrastructure change that could affect agent execution
The Migration Path from Messages API
When migrating an existing Messages API pipeline to Managed Agents:
Step 1: Identify the orchestration code you are replacing. The Python code that loops over documents, maintains state, retries on failures, and accumulates results — this is what you are moving into the agent.
Step 2: Convert your orchestration loop into a system prompt. The logic in your loop becomes instructions in the agent's system prompt. The data your loop maintained becomes the agent's working state within a session.
Step 3: Identify tool requirements. What external APIs or data sources does your loop call? Those become custom tools or built-in tools attached to the agent.
Step 4: Run both pipelines in parallel. Same inputs through the Messages API loop and through the Managed Agent. Compare outputs for accuracy and consistency.
Step 5: Validate cost and latency. Managed Agents may be more expensive per task than an efficient Messages API loop. Validate that the operational simplicity is worth the cost delta for your use case.
Step 6: Cut over with monitoring. Switch production traffic to Managed Agents. Keep the Messages API implementation available for at least 30 days as a fallback.
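Step 4's parallel run benefits from a small comparison harness. The sketch below is pipeline-agnostic: `run_legacy` and `run_managed` are whatever callables wrap your Messages API loop and your Managed Agent session (both hypothetical names here), and `equivalent` lets you plug in fuzzy comparison for free-text outputs where exact match is too strict.

```python
def compare_pipelines(inputs, run_legacy, run_managed, equivalent=None):
    """Run the same inputs through both pipelines and report mismatches.

    run_legacy / run_managed: callables mapping an input to an output.
    equivalent: optional comparison function; exact equality by default.
    """
    equivalent = equivalent or (lambda a, b: a == b)
    mismatches = []
    for item in inputs:
        legacy_out = run_legacy(item)
        managed_out = run_managed(item)
        if not equivalent(legacy_out, managed_out):
            mismatches.append({"input": item, "legacy": legacy_out,
                               "managed": managed_out})
    return {"total": len(inputs),
            "mismatches": mismatches,
            "match_rate": 1 - len(mismatches) / max(len(inputs), 1)}
```

A low match rate is not automatically a failure (the new agent may be better), but every mismatch should be explainable before cutover.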
The Operational Checklist
Before calling a Managed Agents pipeline production-ready:
- Agent versioned with a clear naming convention
- System prompt specifies error handling for expected edge cases
- Tools scoped to minimum necessary with appropriate permission policies
- Rate limit awareness built into session dispatch logic
- Cost model documented with monthly estimate at expected volume
- Health check implemented and scheduled
- Event logging to observability layer from day one
- Fallback path documented (either old agent version or Messages API equivalent)
- Retention policy defined for completed session data