Production Patterns for Managed Agents
Shipping a Managed Agents pipeline to production requires versioning discipline, health verification, cost modeling, and a tested migration path — not just a working prototype.
A working prototype and a production-ready system are separated by a specific set of engineering decisions: versioning strategy, rate limit awareness, cost modeling, health verification, and a tested path for when things break. None of these are visible in a demo. All of them determine whether the system you ship is reliable in six months.
This lesson covers the production patterns that distinguish deployed pipelines from experiments.
Agent Versioning
Agents are not ephemeral — they are durable resources that production systems reference by ID. Version them like APIs.
Semantic versioning for agents:
- Patch (v1.0.0 → v1.0.1): System prompt improvements that do not change output format or fundamental behavior. Safe to update in place.
- Minor (v1.0.0 → v1.1.0): New tools or skills added. May change behavior but is backward compatible in output format. Update in place with monitoring.
- Major (v1.0.0 → v2.0.0): Output format change, model change, removal of tools callers depend on, fundamental behavior shift. Create a new agent — do not update in place.
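The decision rule above can be encoded so deployment tooling applies it consistently. This is a minimal sketch assuming plain "MAJOR.MINOR.PATCH" version strings; `migration_action` and the action names are illustrative, not part of any API.

```python
# Sketch: map a semantic-version bump to the migration action described above.
# Assumes simple "MAJOR.MINOR.PATCH" strings; not a full semver parser.

def migration_action(current: str, target: str) -> str:
    """Return the deployment action implied by a version bump."""
    cur = [int(p) for p in current.split(".")]
    new = [int(p) for p in target.split(".")]
    if new[0] > cur[0]:
        return "create-new-agent"        # major: new agent, parallel validation
    if new[1] > cur[1]:
        return "update-with-monitoring"  # minor: in place, watch behavior
    return "update-in-place"             # patch: safe in-place update
```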
For major versions, the migration pattern:
# Step 1: Create the new version
agent_v2 = client.beta.agents.create(
    name="intelligence-pipeline-v2",  # Versioned name
    model="claude-sonnet-4-6",
    system="[updated system prompt with new output format]",
    ...
)

# Step 2: Run parallel sessions to validate equivalence
# Run the same inputs through v1 and v2, compare outputs

# Step 3: Migrate callers to v2 by updating agent_id references

# Step 4: Archive v1 (do not delete)
# Archival preserves the agent definition for audit and rollback
# while preventing new sessions from being started against it
Never delete old agents immediately after migration. Retained agent definitions are your rollback path. If v2 produces unexpected outputs in production, you need to know what v1 was configured with. Delete only after a deliberate retention window (90 days minimum for production agents).
Rate Limits
Managed Agents has its own rate limit tier, separate from the standard Messages API. The limits apply to:
- Sessions created per minute
- Concurrent active sessions
- Events streamed per minute
Check the current limits in the API documentation — they change as the product matures. The critical operational implication: if you build a pipeline that starts 100 sessions simultaneously, you will hit the concurrent session limit before all of them start.
Design for rate limits from the start:
import time
from typing import List

def start_sessions_with_rate_limit(
    agent_id: str,
    tasks: List[str],
    max_concurrent: int = 10,
    delay_between_batches: float = 1.0,
) -> List[str]:
    """Start sessions in batches to respect rate limits.

    Note: batching throttles the session *creation* rate. Staying under
    the concurrent-active-session limit also requires tracking when
    earlier sessions complete before dispatching the next batch.
    """
    session_ids = []
    for i in range(0, len(tasks), max_concurrent):
        batch = tasks[i:i + max_concurrent]
        for task in batch:
            session = client.beta.agents.sessions.create(
                agent_id=agent_id,
                input={"role": "user", "content": task},
            )
            session_ids.append(session.id)
        if i + max_concurrent < len(tasks):
            time.sleep(delay_between_batches)
    return session_ids
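Batching controls how fast you create sessions, but individual calls can still be rejected when limits tighten. A common complement is a retry wrapper with exponential backoff and jitter. This is a sketch, not an SDK feature: `with_backoff` and its default rate-limit detection (matching "rate" in the error message) are assumptions, and both the error check and the sleep function are injectable so the policy can be tested without real API calls.

```python
import time
import random

def with_backoff(fn, max_retries: int = 5, base_delay: float = 1.0,
                 is_rate_limited=lambda e: "rate" in str(e).lower(),
                 sleep=time.sleep):
    """Call fn(), retrying with exponential backoff plus jitter when a
    rate-limit error is detected. Non-rate-limit errors are re-raised
    immediately; the final attempt's error propagates to the caller."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception as e:
            if not is_rate_limited(e) or attempt == max_retries - 1:
                raise
            # Delay doubles each attempt; jitter avoids thundering herds.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```

In production you would wrap the session-creation call, e.g. `with_backoff(lambda: client.beta.agents.sessions.create(...))`, with a check tuned to the SDK's actual rate-limit exception type.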
Cost Modeling
Managed Agents pricing has two components: per-session cost plus per-token cost. Both matter for pipeline economics.
Per-session cost applies regardless of session duration or token usage. Short sessions with minimal tokens are more expensive per token than long sessions, because the session overhead is amortized over more tokens in longer runs.
Per-token cost follows the standard model pricing for the model assigned to the agent. Longer sessions and more tool calls mean more tokens and higher token costs.
Modeling a pipeline's cost:
def estimate_session_cost(
    sessions_per_day: int,
    avg_tokens_per_session: int,
    session_cost: float,
    token_cost_per_million: float,
) -> dict:
    daily_session_cost = sessions_per_day * session_cost
    daily_token_cost = (
        sessions_per_day * avg_tokens_per_session / 1_000_000
    ) * token_cost_per_million
    return {
        "daily_session_cost": daily_session_cost,
        "daily_token_cost": daily_token_cost,
        "daily_total": daily_session_cost + daily_token_cost,
        "monthly_total": (daily_session_cost + daily_token_cost) * 30,
    }
For pipelines running at scale, the session cost component can dominate for short-running agents. If your agent completes in under 30 seconds, evaluate whether the task could be handled more cost-effectively by the Messages API with your own orchestration loop.
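To see why short sessions skew expensive, compute the fraction of total cost that is session overhead. The prices below are placeholders chosen for illustration, not actual Managed Agents pricing; substitute current rates from the pricing page.

```python
# Placeholder prices -- NOT real pricing; substitute current rates.
SESSION_COST = 0.05        # hypothetical $ per session
TOKEN_COST_PER_M = 3.00    # hypothetical $ per million tokens

def session_cost_share(avg_tokens_per_session: int) -> float:
    """Fraction of a session's total cost that is flat session overhead."""
    token_cost = avg_tokens_per_session / 1_000_000 * TOKEN_COST_PER_M
    return SESSION_COST / (SESSION_COST + token_cost)

short = session_cost_share(5_000)    # short run: overhead dominates
long_ = session_cost_share(500_000)  # long run: overhead is amortized
```

With these placeholder rates, the flat session fee is most of the bill for a 5k-token session but a small fraction for a 500k-token one, which is the arithmetic behind the advice above about very short-running agents.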
Health Verification
Knowing that your agent was deployed is not the same as knowing it is working correctly. Implement a health check that validates actual agent behavior:
def verify_agent_health(agent_id: str) -> dict:
    """
    Run a canary session with a known input and validate the output.
    Returns health status and any failures detected.
    """
    canary_input = "Generate a one-sentence summary of your capabilities."
    expected_keywords = ["research", "synthesize", "report"]  # Adjust for your agent
    try:
        session = client.beta.agents.sessions.create(
            agent_id=agent_id,
            input={"role": "user", "content": canary_input},
        )
        # Wait for completion (short timeout for health checks).
        # wait_for_completion is a polling helper that returns the
        # session's terminal status and output.
        result = wait_for_completion(session.id, poll_interval=5, timeout=120)
        if result["status"] != "completed":
            return {"healthy": False, "reason": f"Session ended with status: {result['status']}"}
        output_text = result.get("output", "")
        missing_keywords = [kw for kw in expected_keywords if kw not in output_text.lower()]
        if missing_keywords:
            return {
                "healthy": False,
                "reason": f"Output missing expected keywords: {missing_keywords}",
                "output": output_text,
            }
        return {"healthy": True, "output": output_text}
    except Exception as e:
        return {"healthy": False, "reason": str(e)}
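The health check relies on a `wait_for_completion` polling helper. One way to implement it is sketched below; the status fetcher is injected as a parameter because the exact retrieval call (something like `client.beta.agents.sessions.retrieve`, which is an assumption here) depends on the SDK, and injection also makes the loop testable.

```python
import time

def wait_for_completion(session_id: str, poll_interval: float = 5,
                        timeout: float = 120, get_status=None,
                        sleep=time.sleep) -> dict:
    """Poll a session until it reaches a terminal state or times out.

    get_status(session_id) must return a dict with at least a "status"
    key; in production it would wrap the SDK's session-retrieval call.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = get_status(session_id)
        if result["status"] in ("completed", "failed", "cancelled"):
            return result
        sleep(poll_interval)
    return {"status": "timeout"}
```

For scheduled health checks, event streaming would avoid polling entirely, but a bounded poll loop like this is simpler to reason about under a short timeout.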
Run health checks:
- After every agent deployment
- On a schedule (daily for production agents)
- As part of any infrastructure change that could affect agent execution
The Migration Path from Messages API
When migrating an existing Messages API pipeline to Managed Agents:
Step 1: Identify the orchestration code you are replacing. The Python code that loops over documents, maintains state, retries on failures, and accumulates results — this is what you are moving into the agent.
Step 2: Convert your orchestration loop into a system prompt. The logic in your loop becomes instructions in the agent's system prompt. The data your loop maintained becomes the agent's working state within a session.
Step 3: Identify tool requirements. What external APIs or data sources does your loop call? Those become custom tools or built-in tools attached to the agent.
Step 4: Run both pipelines in parallel. Same inputs through the Messages API loop and through the Managed Agent. Compare outputs for accuracy and consistency.
Step 5: Validate cost and latency. Managed Agents may be more expensive per task than an efficient Messages API loop. Validate that the operational simplicity is worth the cost delta for your use case.
Step 6: Cut over with monitoring. Switch production traffic to Managed Agents. Keep the Messages API implementation available for at least 30 days as a fallback.
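Step 4's parallel run benefits from a small comparison harness. The sketch below is pipeline-agnostic: `run_legacy` and `run_managed` are whatever callables wrap your Messages API loop and your Managed Agent session (both hypothetical names here), and `equivalent` lets you plug in fuzzy comparison for free-text outputs where exact match is too strict.

```python
def compare_pipelines(inputs, run_legacy, run_managed, equivalent=None):
    """Run the same inputs through both pipelines and report mismatches.

    run_legacy / run_managed: callables mapping an input to an output.
    equivalent: optional comparison function; exact equality by default.
    """
    equivalent = equivalent or (lambda a, b: a == b)
    mismatches = []
    for item in inputs:
        legacy_out = run_legacy(item)
        managed_out = run_managed(item)
        if not equivalent(legacy_out, managed_out):
            mismatches.append({"input": item, "legacy": legacy_out,
                               "managed": managed_out})
    return {"total": len(inputs),
            "mismatches": mismatches,
            "match_rate": 1 - len(mismatches) / max(len(inputs), 1)}
```

A low match rate is not automatically a failure (the new agent may be better), but every mismatch should be explainable before cutover.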
The Operational Checklist
Before calling a Managed Agents pipeline production-ready:
- Agent versioned with a clear naming convention
- System prompt specifies error handling for expected edge cases
- Tools scoped to minimum necessary with appropriate permission policies
- Rate limit awareness built into session dispatch logic
- Cost model documented with monthly estimate at expected volume
- Health check implemented and scheduled
- Event logging to observability layer from day one
- Fallback path documented (either old agent version or Messages API equivalent)
- Retention policy defined for completed session data