ASK KNOX
beta
LESSON 244

Agent Team Architecture

Team skills, parallel vs sequential execution patterns, file territory ownership, model routing from Opus to Sonnet, and the orchestrator pattern that coordinates specialists without creating a single point of failure.

13 min read

A team of specialists is more powerful than a single generalist — but only if the team is organized well. Random parallelism without coordination produces conflicts: two agents writing the same file, one agent depending on output that hasn't been produced yet, expensive models burning tokens on simple tasks.

This lesson covers the architecture of a well-organized agent team: how to define team skills, when to run agents in parallel versus sequentially, how to assign file territory ownership, how to route tasks to the right model tier, and how the orchestrator holds it all together.

Team Skills: The Unit of Coordination

A team skill is a defined workflow that requires multiple specialists to collaborate. It has a name, a list of agents that participate, an execution strategy (parallel or sequential), and a set of rules about who owns what.

# skills/feature-team.yaml
name: feature-team
description: "Implement a feature end-to-end across backend and frontend"
agents:
  - backend-dev
  - frontend-dev
  - qa-engineer
execution: parallel
territories:
  backend-dev:
    owns: ["backend/", "tests/backend/"]
    reads: ["types/", "lib/"]
  frontend-dev:
    owns: ["app/", "components/", "tests/frontend/"]
    reads: ["types/", "lib/"]
  qa-engineer:
    owns: ["e2e/", "playwright/"]
    reads: ["app/", "backend/", "components/"]
handoff:
  backend-dev → qa-engineer: "API contract defined in types/"
  frontend-dev → qa-engineer: "Component API defined in components/"

The skill defines the collaboration pattern declaratively. The orchestrator reads this configuration and knows how to coordinate the session before dispatching a single agent.

Parallel vs Sequential: The Decision

The wrong default is always-parallel. The right default is parallel-where-possible, sequential-where-necessary.

Parallel execution works when:

  • Tasks are independent (no output dependency)
  • File territories are non-overlapping
  • Failure in one task does not invalidate the others

Sequential execution works when:

  • Task B requires Task A's output to start
  • Tasks touch overlapping code (architecture change before implementation)
  • A validation gate must pass before proceeding

In practice, most multi-agent workflows are mixed: some phases are parallel, some are sequential.

Feature Implementation Timeline:

Phase 1 (sequential):
  Architect → defines interfaces, API contracts, data models
  (Must complete before Phase 2 — others need the contracts)

Phase 2 (parallel):
  Backend Dev → implements API
  Frontend Dev → implements UI (can start from contracts)
  Test Writer → writes test stubs (can start from contracts)

Phase 3 (sequential — gated by Phase 2):
  QA Engineer → runs integration tests
  (Needs both backend and frontend complete)

Phase 4 (parallel):
  Reviewer → code review
  Security → security scan
  (Both can run simultaneously on the complete implementation)

Implementing this in a skill definition:

class TeamSkill:
    def __init__(self, config: SkillConfig):
        self.phases = config.phases
        self.agents = config.agents

    async def execute(self, directive: Directive) -> SkillResult:
        results = {}

        for phase in self.phases:
            if phase.execution == "sequential":
                for agent_id in phase.agents:
                    result = await self.dispatch_agent(
                        agent_id,
                        directive,
                        prior_results=results
                    )
                    results[agent_id] = result
                    if not result.success and phase.gate:
                        raise PhaseGateFailure(
                            f"Phase gate failed: {agent_id}"
                        )
            elif phase.execution == "parallel":
                tasks = [
                    self.dispatch_agent(
                        agent_id,
                        directive,
                        prior_results=results
                    )
                    for agent_id in phase.agents
                ]
                phase_results = await asyncio.gather(*tasks)
                for agent_id, result in zip(phase.agents, phase_results):
                    results[agent_id] = result

        return SkillResult(phase_results=results)

File Territory Ownership

File territory ownership is the mechanism that makes parallel development safe. Without it, two agents working on the same codebase simultaneously will produce conflicting changes.

The principle is simple: each agent owns a set of directories or file patterns. It can write to anything in its territory. It can read from anywhere. It cannot write outside its territory without going through the orchestrator.

TERRITORIES = {
    "backend-dev": {
        "owns": [
            "backend/",
            "tests/backend/",
            "migrations/",
        ],
        "reads": [
            "types/",
            "lib/shared/",
        ]
    },
    "frontend-dev": {
        "owns": [
            "app/",
            "components/",
            "hooks/",
            "tests/frontend/",
        ],
        "reads": [
            "types/",
            "lib/shared/",
        ]
    },
    "qa-engineer": {
        "owns": [
            "e2e/",
            "playwright/",
        ],
        "reads": [
            "app/",
            "backend/",
            "components/",
        ]
    }
}

def check_territory(agent_id: str, file_path: str, op: str) -> bool:
    """Returns True if the operation is permitted."""
    territory = TERRITORIES.get(agent_id, {})
    if op == "read":
        return True  # All agents can read anywhere
    if op == "write":
        owns = territory.get("owns", [])
        return any(file_path.startswith(owned) for owned in owns)
    return False

When an agent needs to write outside its territory — say, the backend developer needs to update a shared type — it does not just write the file. It sends a cross-territory write request to the orchestrator, which either queues it, routes it to the territory owner, or grants a temporary exception with a conflict check.

Model Routing: Opus for Thinking, Sonnet for Executing

Running every agent task on the most capable model is expensive and slow. Running everything on the cheapest model produces poor quality on complex work. The right approach is task-type routing.

The framework: match model capability to task complexity.

MODEL_ROUTING = {
    # High reasoning, architecture, planning — Opus
    "task_types": {
        "architecture_decision": "claude-opus-4-5",
        "system_design": "claude-opus-4-5",
        "security_review": "claude-opus-4-5",
        "complex_debugging": "claude-opus-4-5",
        "pr_architecture_review": "claude-opus-4-5",

        # Standard implementation — Sonnet
        "code_implementation": "claude-sonnet-4-5",
        "test_writing": "claude-sonnet-4-5",
        "code_review": "claude-sonnet-4-5",
        "documentation": "claude-sonnet-4-5",
        "refactoring": "claude-sonnet-4-5",
        "bug_fix": "claude-sonnet-4-5",

        # Simple edits and lookups — Haiku
        "file_edit": "claude-haiku-3-5",
        "format_conversion": "claude-haiku-3-5",
        "simple_query": "claude-haiku-3-5",
    }
}

def route_task(task: Task) -> str:
    """Returns the appropriate model ID for a given task."""
    return MODEL_ROUTING["task_types"].get(
        task.type,
        "claude-sonnet-4-5"  # default
    )

The same agent can use different models for different subtasks. An agent handling a complex feature implementation might use Opus for the architecture phase and Sonnet for the implementation phase:

async def implement_feature(directive: Directive) -> Result:
    # Phase 1: Architecture — requires deep reasoning
    architecture = await call_model(
        model=route_task(Task(type="architecture_decision")),
        prompt=f"Design the architecture for: {directive.description}"
    )

    # Phase 2: Implementation — standard execution
    implementation = await call_model(
        model=route_task(Task(type="code_implementation")),
        prompt=f"Implement according to this architecture: {architecture}"
    )

    return Result(architecture=architecture, code=implementation)

Routing by task type is not a minor optimization. On a fleet running hundreds of tasks per day, it is the difference between a manageable cost and a bill that terminates the project.

The Orchestrator Pattern

The orchestrator is the coordination layer that makes a team function as a team. Its responsibilities:

  1. Directive routing — receives incoming work and routes it to the right specialist or team skill
  2. State tracking — knows which agent is working on what, in what state
  3. Conflict resolution — resolves territory conflicts and resource contention
  4. Handoff management — routes work when it crosses domain boundaries
  5. Health monitoring — knows which agents are available, which are busy, which are stuck

What the orchestrator does NOT do:

  • Execute tasks directly
  • Make product or business decisions
  • Override agent authority ceilings
  • Operate as a bottleneck in the task execution path
class Orchestrator:
    def __init__(self):
        self.registry: dict[str, AgentCard] = {}
        self.active_directives: dict[str, DirectiveState] = {}
        self.territory_map: TerritoryMap = TerritoryMap()

    async def dispatch(self, directive: Directive) -> str:
        """Route a directive to the appropriate agent or team skill."""

        # Determine routing
        if directive.requires_team:
            skill = self.load_team_skill(directive.skill_name)
            return await self.execute_team_skill(skill, directive)
        else:
            agent = self.route_to_specialist(directive)
            return await self.dispatch_to_agent(agent, directive)

    def route_to_specialist(self, directive: Directive) -> AgentCard:
        """Deterministic routing — no LLM calls."""
        for agent_id, card in self.registry.items():
            if directive.domain in card.domains:
                if card.status == "ready":
                    return card
        raise NoAvailableAgent(f"No ready agent for domain: {directive.domain}")

    async def resolve_territory_conflict(
        self,
        agent_a: str,
        agent_b: str,
        file_path: str
    ) -> str:
        """Returns the agent that should get the write."""
        territory_owner = self.territory_map.get_owner(file_path)
        if territory_owner:
            return territory_owner
        # No owner — route to the agent that requested first
        return agent_a

The critical design principle: the orchestrator uses deterministic routing rules, not LLM reasoning. Every routing decision should be explainable as a rule: "Agent with domain X gets directives with domain Y." LLM-based routing is expensive, non-deterministic, and creates a dependency on model quality in the critical path.

Putting the Team Together

A fully assembled team architecture looks like this:

Incoming Directive
        ↓
   Orchestrator
   (deterministic routing)
        ↓
┌───────────────────────────────────────┐
│           Phase Execution             │
│                                       │
│  Sequential:   Architect              │
│                    ↓                  │
│  Parallel:   Backend  Frontend  QA    │
│                    ↓                  │
│  Sequential:   Reviewer               │
└───────────────────────────────────────┘
        ↓
   Result Assembly
   (orchestrator collects outputs)
        ↓
   Directive Completed

Each agent in the team:

  • Has its own seed file with domain expertise
  • Has its own Akashic namespace for persistent memory
  • Runs its boot and shutdown protocol
  • Respects territory boundaries
  • Uses model routing appropriate to its task types

The orchestrator knows the workflow but executes none of it. That separation — coordination from execution — is what keeps the system scalable.

Summary

  • Team skills define collaboration patterns declaratively: who participates, execution strategy, territory ownership
  • Parallel execution for independent tasks; sequential for dependent tasks; mixed for most real workflows
  • File territory ownership prevents conflicts: each agent owns directories it can write; all agents can read anywhere
  • Model routing by task type: Opus for complex reasoning, Sonnet for standard implementation, Haiku for simple edits
  • The orchestrator coordinates without executing — deterministic routing, state tracking, conflict resolution only

What's Next

With the team architecture defined, the next lesson wires it into an organization — agent cards as identity documents, the dispatch routing system, and the directive lifecycle from pending to completed.