ASK KNOX
beta
LESSON 310

Memory in Managed Agents

Managed Agents does not have automatic memory — the agent must explicitly write and query memories, and understanding what is and is not persisted is critical for long-running pipelines.

8 min read

Every Managed Agents session starts fresh. When the session ends, the context window — everything the agent knew, reasoned through, and produced — does not persist to the next session automatically. This is correct behavior for stateless pipelines. It is a significant design constraint for pipelines that need to accumulate knowledge or avoid repeating work across sessions.

Memory in Managed Agents is not automatic. It is a deliberate, explicit mechanism: the agent writes memories when they should persist, and queries memories when prior context would be useful. Understanding the distinction between what is ephemeral and what requires explicit persistence — and how to design for that boundary — is the difference between a pipeline that compounds value over time and one that starts from zero on every run.

What Is Not Persisted Automatically

Before designing memory architecture, be precise about what does not persist across sessions:

Session conversation history — the messages between the orchestrator and the agent, the chain-of-thought reasoning, the intermediate working — all of this exists only within the session's context window. When the session ends, it is gone.

Tool call history — which tools were called, what inputs were provided, what results were returned — none of this is automatically accessible in a future session.

Intermediate work product — files written to the ephemeral session file system, data parsed from web searches, analysis conducted on input documents — ephemeral unless explicitly saved to external storage or written to the memory tool.

Agent state — there is no "agent state" that persists between sessions unless you explicitly design one. Each session is a fresh instantiation of the agent with its system prompt and any tools and skills configured at agent definition time.

The Memory Tool (Research Preview)

The memory tool is a built-in mechanism for explicit key-value persistence across sessions:

agent = client.beta.agents.create(
    name="research-accumulator",
    model="claude-sonnet-4-6",
    system="""You are a research accumulator. 
For each research task:
1. Query your memory for prior research on this topic
2. Incorporate any relevant prior findings into your current analysis
3. After completing your research, save key findings to memory for future sessions""",
    tools=[
        {"type": "web_search"},
        {"type": "memory"}  # Enables cross-session memory
    ]
)

The memory tool exposes two operations to the agent:

  • write — store a key-value pair with optional namespace and metadata
  • read — retrieve stored memories by key or by semantic query
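As a sketch, the two operations take payloads along these lines. The exact field names mirror the write example later in this lesson, but the full schema here is illustrative, not an official specification:

```python
# Illustrative payload shapes for the memory tool's two operations.
# Field names and values are assumptions for illustration only.

write_call = {
    "tool": "memory",
    "action": "write",
    "key": "findings/anthropic-releases",        # hypothetical key
    "value": "Claude Sonnet released 2026-03.",  # the stored content
    "namespace": "per_agent",                    # optional scope
    "metadata": {"date": "2026-03-15"},          # optional metadata
}

read_call = {
    "tool": "memory",
    "action": "read",
    "query": "Anthropic product releases",  # semantic query, or a key
    "namespace": "per_agent",
}
```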

How the Agent Uses Memory

Memory is not auto-injected. The agent must explicitly query its memory when prior context would be useful:

[Agent reasoning during session]
The task is to research Anthropic's latest releases.
Let me check if I have prior research on this topic.
[Calls memory tool with query: "Anthropic product releases"]
[Receives: stored finding from 2026-03-15 about Claude Sonnet release]
Good — I have context from last month. I'll use this as a baseline and update with newer information.

The agent does not passively receive memories at session start. It actively decides when to consult them. This means your system prompt needs to instruct the agent to query relevant memories as part of its task workflow.

Memory Namespacing

Memories are organized by namespace, which controls scope:

per_session — memories scoped to a single session. Equivalent to notes within a session's context. These do not persist across sessions.

per_user — memories scoped to a specific user ID. Useful for agents that serve multiple users and need to remember preferences or context for each one.

per_agent — memories shared across all sessions and all users of a specific agent. Useful for accumulated research, processed document indexes, and any knowledge that should be globally available to the agent.

# Agent writes an agent-wide (per_agent namespace) memory
# (simplified — in practice this happens within the agent's tool call)
memory_write = {
    "tool": "memory",
    "action": "write",
    "key": "competitor-intel/stripe/q1-2026",
    "value": "Stripe launched Stripe Connect for Platforms in Q1 2026...",
    "namespace": "per_agent",
    "metadata": {
        "source": "official press release",
        "date": "2026-04-01",
        "confidence": "high"
    }
}

When Memory Is Not the Right Tool

The memory tool is useful but not universally appropriate. For structured data that needs reliable retrieval, external storage is often the better fit:

External database — if your agent processes daily records and needs to track which records have been processed, a database with a processed_at timestamp is more reliable and queryable than the memory tool's key-value store.
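A minimal sketch of that pattern with SQLite (table and column names are illustrative): each session selects only the rows with no processed_at timestamp, then stamps them as it finishes.

```python
import sqlite3

# Track processed records with a processed_at timestamp -- more
# reliable and queryable than the memory tool's key-value store.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE records (
        id INTEGER PRIMARY KEY,
        payload TEXT NOT NULL,
        processed_at TEXT  -- NULL until the agent processes the row
    )
""")
conn.executemany(
    "INSERT INTO records (payload) VALUES (?)",
    [("record-a",), ("record-b",), ("record-c",)],
)

# A session picks up only the unprocessed records...
unprocessed = conn.execute(
    "SELECT id, payload FROM records WHERE processed_at IS NULL"
).fetchall()

# ...and marks each one done as it completes.
for record_id, _ in unprocessed:
    conn.execute(
        "UPDATE records SET processed_at = datetime('now') WHERE id = ?",
        (record_id,),
    )
conn.commit()
```

A rerun of the SELECT now returns nothing, so the next session does no duplicate work.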

Object storage (S3, GCS) — if your agent generates reports or produces files that need to persist, write them to object storage during the session. The memory tool is not designed for large binary content.

Vector database — for semantic search over large accumulated knowledge bases, a vector database provides better retrieval precision than the memory tool's built-in query mechanism.

The memory tool is right for: lightweight state that the agent needs to reference conversationally, preferences and context for specific users, and accumulated findings that should inform future sessions without requiring a full database query.

Designing for Stateful Pipelines

For pipelines that genuinely require state across sessions, the design pattern is:

  1. At session start, the agent queries memory (or reads from external storage) for relevant prior state
  2. The agent incorporates prior state into its current reasoning
  3. At session end, the agent writes new findings to memory (or to external storage) before the session completes
  4. The next session starts by repeating step 1
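The cycle above can be sketched against a simple in-memory stand-in for the memory store. The dict and the run_session function here are illustrative scaffolding, not Managed Agents APIs:

```python
# A dict stands in for the persistent memory store; run_session
# stands in for one agent session. Both are illustrative only.
memory_store = {}

def run_session(topic, new_finding):
    # 1. At session start, query memory for relevant prior state.
    prior = memory_store.get(f"recent_findings/{topic}", [])
    # 2. Incorporate prior state into the current reasoning.
    findings = prior + [new_finding]
    # 3. At session end, write updated findings back to memory.
    memory_store[f"recent_findings/{topic}"] = findings
    return findings

# 4. Each subsequent run repeats the query-incorporate-write cycle,
#    so knowledge accumulates instead of resetting to zero.
first = run_session("stripe", "Launched Connect for Platforms")
second = run_session("stripe", "Raised processing fees in EU")
```

After the second run, the store holds both findings, so a third session would start with the accumulated context rather than from scratch.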

This explicit state management is more work to design than implicit persistence, but it is also more controllable. You can inspect the memory store. You can correct wrong memories. You can design the memory schema to match your pipeline's actual needs.

# System prompt that makes the pipeline explicitly stateful
system = """You are a market intelligence accumulator.

At the start of each session:
1. Query memory for: recent_findings/{topic}
2. If prior findings exist, summarize what you know and when it was gathered
3. Identify what has changed or is new since the last session

During research:
4. Gather new information using web_search
5. Cross-reference with prior findings to identify updates and changes

Before ending the session:
6. Write updated findings to memory: recent_findings/{topic}
7. Include: date, key findings, sources, what changed from prior session
8. Write a one-sentence summary to memory: summary/{topic}
"""