Swarm Architecture: Parallel Agents with Consensus
A swarm is N agents solving the same problem independently, results compared for consensus. When 3/3 agents agree, you have evidence. When 2/3 agree, you have a signal worth investigating. When 1/3 agree, you have a rejection. Agreement is the confidence metric.
The fundamental problem with using a single AI agent for high-stakes decisions is that you have one data point. One response from one model on one run. The response might be correct. It might be wrong. You have no way to distinguish between them without external verification — and external verification is exactly what you might not have.
A swarm solves this problem by creating multiple independent data points on the same question, then using agreement as a proxy for confidence.
This is not a new idea. It is the operating principle of juries, scientific peer review, prediction markets, and every committee that has ever existed. Independent agents, each working from the same inputs without seeing each other's reasoning, produce answers. Agreement among those answers is evidence. Disagreement is a signal to investigate. Near-unanimity is the closest thing to confidence you can get from probabilistic systems.
What a Swarm Is and Is Not
A swarm, as used in this system, means: N agents receive the same task, work independently without seeing each other's outputs, produce answers, and those answers are compared by a consensus engine.
This is structurally different from multi-agent orchestration where agents work on different sub-tasks. In orchestration, Agent A does research, Agent B writes, Agent C reviews — they are working on different pieces of the same problem. In a swarm, Agent A, B, and C all research the same question, write answers to the same question, and review the same output. They are doing the same work redundantly, for the purpose of consensus.
The difference matters because it changes the threat model. Orchestration is vulnerable to cascade failures — Agent B's error propagates downstream. Swarms are resistant to cascade failures because agents are independent. But swarms are vulnerable to correlated errors — if all agents were trained on the same wrong data, they will all produce the same wrong answer with high apparent consensus.
Achieving true independence requires more than running the same prompt three times. It requires:
- Varying the temperature settings across agents to reduce identical outputs
- Potentially using different models if the stakes are high enough
- Isolating agents so they cannot see each other's reasoning during the task
- Running agents with slightly varied prompt framing to reduce anchoring to a specific approach
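The independence mechanisms above can be sketched in a few lines. This is a minimal illustration, not a production harness: `call_model` is a hypothetical stand-in for your provider's client (stubbed here so the sketch runs), and the specific temperatures and framings are assumptions you would tune per task.

```python
import random

# Hypothetical stand-in for a real model call; swap in your provider's client.
# The stub ignores temperature and just returns a seeded pseudo-answer.
def call_model(prompt: str, temperature: float, seed: int) -> str:
    rng = random.Random(seed)
    return f"answer-{rng.randint(0, 1)}"

# Varied framings reduce anchoring to one approach (illustrative wording).
FRAMINGS = [
    "Answer the question directly: {task}",
    "Reason step by step, then answer: {task}",
    "Consider counterarguments before answering: {task}",
]

def run_swarm(task: str, n: int = 3) -> list[str]:
    """Run n agents independently: each gets its own temperature and
    prompt framing, and no agent ever sees another agent's output."""
    outputs = []
    for i in range(n):
        prompt = FRAMINGS[i % len(FRAMINGS)].format(task=task)
        temp = 0.3 + 0.3 * i  # 0.3, 0.6, 0.9 across three agents
        outputs.append(call_model(prompt, temperature=temp, seed=i))
    return outputs
```

Agent isolation here is structural: each call is a separate invocation with no shared context, which is the property the consensus engine depends on.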
The Quorum Model
The quorum model defines how to act on consensus results.
3/3 consensus — High confidence. All agents reached the same conclusion. The probability of correlated error is low. Proceed with confidence. Log the consensus for the confidence ledger.
2/3 consensus — Medium confidence. Two agents agreed, one diverged. This is a signal worth investigating, not an automatic rejection. The consensus engine should:
- Identify what the dissenting agent got that the others did not
- If the dissent is on a factual point, verify that point independently
- If the dissent is on a judgment call, escalate to human review
- If the dissent appears to be a reasoning error, proceed with the 2/3 consensus result and log the dissent
1/3 consensus — Rejection. Only one agent produced a usable answer; the other two failed or could not be meaningfully compared to it. The task should be rejected and rerun with a revised prompt, additional context, or a different approach. Do not proceed on a 1/3 result.
0/3 consensus — No agreement. Three agents produced three distinct answers. This is the clearest possible signal that the task specification is ambiguous or the problem is too novel for the current agent configuration. Stop. Clarify the specification. Do not rerun with the same inputs.
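The four handling rules reduce to a small dispatch function. A sketch, under one interpretation of the taxonomy: `None` marks an agent that failed to produce a usable answer, and the input is assumed to be already normalized by the consensus engine so that equal strings mean agreement.

```python
from collections import Counter
from typing import Optional

def quorum_verdict(answers: list[Optional[str]]) -> str:
    """Map three independent answers to an action under the quorum model.
    None marks an agent that produced no usable output."""
    produced = [a for a in answers if a is not None]
    if not produced:
        return "clarify"
    top = Counter(produced).most_common(1)[0][1]  # size of largest cluster
    if top == 3:
        return "proceed"       # 3/3: high confidence, log the consensus
    if top == 2:
        return "investigate"   # 2/3: examine the dissent before acting
    if len(produced) == 1:
        return "reject"        # 1/3: rerun with a revised prompt
    return "clarify"           # 0/3: three distinct answers, fix the spec
```

The verdict is an action, not a truth value: even "proceed" only means the evidence cleared the bar you set.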
Use Cases: Where Swarms Are Worth the Cost
Swarms are expensive. Running three agents costs three times the compute and time of running one agent. This is only justified when the value of increased confidence outweighs the cost of redundancy.
Research synthesis. When synthesizing information from multiple sources to produce a factual summary, the consequences of a hallucination are high — the summary may be treated as authoritative. Running three synthesis agents and comparing their outputs for agreement on key claims provides meaningful confidence before the summary is distributed.
Fact-checking high-stakes content. Before publishing claims that could cause reputational or legal harm, running three agents independently to verify each claim, then requiring 2/3 or 3/3 consensus, provides a defensible verification standard.
Decision input generation. When an agent is generating options or recommendations that will feed a human decision, the decision is only as good as the options generated. A swarm producing the recommendation set, with consensus filtering removing options that only one agent surfaced, produces a cleaner, more reliable input to the decision.
Code generation for critical paths. When generating code that will touch authentication, payment processing, or data migration — areas where a logic error has outsized consequences — running three generation agents and comparing outputs for semantic equivalence provides a quality floor before human review.
Where swarms are not worth the cost: formatting tasks, simple transformations, tasks with deterministic correct answers, and any task where the cost of a wrong answer is low and easily corrected. Do not run swarms for routine work.
Designing the Consensus Engine
The consensus engine is the component that receives N outputs and produces a verdict. Its design is the subtle part of swarm architecture.
For structured outputs (JSON, code, specific format): Semantic similarity scoring. Compare outputs for structural equivalence, not character-level equality. Two code samples that produce the same output via different variable names should count as consensus. Two JSON responses with the same values in different field order should count as consensus.
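For the JSON case, structural equivalence is cheap to check. A minimal sketch, assuming outputs that parse as JSON; parsed dictionaries compare by value, so field order and whitespace drop out automatically:

```python
import json

def json_equivalent(a: str, b: str) -> bool:
    """Structural equivalence for two JSON outputs: same values count
    as consensus regardless of field order or formatting."""
    try:
        return json.loads(a) == json.loads(b)
    except ValueError:
        return False  # an unparseable output can never be consensus
```

Semantic equivalence for code is a much harder problem (tests or symbolic comparison, not string comparison), but the principle is the same: compare meaning, not characters.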
For unstructured outputs (analysis, synthesis, recommendation): Claim extraction and comparison. Parse each output into a set of key claims. Compare claim lists across agents for overlap. Claims that appear in 2/3 or 3/3 outputs are consensus claims. Claims that appear in only 1/3 outputs are minority claims requiring investigation.
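Claim comparison can be sketched once the claims are extracted (extraction itself is assumed to happen upstream, for example in a separate model pass that normalizes each output into a set of claim strings):

```python
from collections import Counter

def split_claims(claim_sets: list[set[str]], quorum: int = 2):
    """Partition claims into consensus claims (appear in >= quorum
    outputs) and minority claims (appear in fewer, and need review)."""
    counts = Counter(c for claims in claim_sets for c in claims)
    consensus = {c for c, n in counts.items() if n >= quorum}
    minority = {c for c, n in counts.items() if n < quorum}
    return consensus, minority
```

Note that this assumes claims have been normalized to identical strings; in practice the matching step (paraphrase detection) carries most of the difficulty.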
Disagreement handling: When the consensus engine identifies disagreement, it should not simply pick the majority. It should:
- Identify specifically what the disagreement is about
- Classify whether it is factual (checkable) or judgmental (requires human input)
- Attempt automated resolution for factual disagreements (retrieve the answer from an authoritative source)
- Escalate judgmental disagreements to human review with the specific disagreement clearly framed
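The routing logic above can be sketched as a small dispatcher. The classification into factual versus judgmental is assumed to happen upstream, and `lookup` is a hypothetical callable standing in for retrieval from an authoritative source:

```python
from dataclasses import dataclass

@dataclass
class Disagreement:
    topic: str
    kind: str  # "factual" or "judgmental"; classified upstream

def route_disagreement(d: Disagreement, lookup) -> str:
    """Attempt automated resolution for factual disagreements;
    escalate judgmental ones with the disagreement clearly framed."""
    if d.kind == "factual":
        answer = lookup(d.topic)
        if answer is not None:
            return f"resolved: {answer}"
        return f"escalate: no authoritative source for '{d.topic}'"
    return f"escalate: human judgment needed on '{d.topic}'"
```

The point of the explicit `Disagreement` record is the framing requirement: whatever reaches a human should name the specific point of divergence, not just the raw outputs.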
The consensus engine is your mechanism for managing divergence. Independent agents will produce divergent results through reasoning variation, probabilistic sampling, and differences in contextual interpretation. The consensus engine turns that divergence into a signal rather than noise.
Cost-Trust Tradeoff
Every swarm decision involves this explicit tradeoff:
Cost: N × agent compute cost, plus latency. Running three agents costs three times as much and, even run in parallel, takes at least as long as the slowest agent.
Trust gained: Confidence proportional to the independence of your agents and the clarity of your consensus criteria. High-quality swarms with genuinely independent agents and well-designed consensus engines produce meaningful confidence gains. Low-quality swarms — same model, same temperature, minimal prompt variation — produce minimal confidence gains despite full cost.
The tradeoff is only favorable when:
- The agents are genuinely independent
- The consensus criteria are specific and well-designed
- The cost of an error on this task type exceeds the cost of redundant compute
- The task type does not have a cheaper verification alternative (if a fact is checkable by retrieving a document, retrieve the document rather than running three agents)
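The third criterion is a concrete expected-value comparison. A sketch, where the error rates and costs are assumptions you must estimate per task type; the swarm is only favorable when the expected error cost it removes exceeds the extra compute it burns:

```python
def swarm_is_worth_it(error_cost: float,
                      single_error_rate: float,
                      swarm_error_rate: float,
                      agent_cost: float,
                      n: int = 3) -> bool:
    """Expected-value check: does the error cost avoided by the swarm
    exceed the marginal cost of the extra (n - 1) agents?"""
    expected_saving = (single_error_rate - swarm_error_rate) * error_cost
    extra_compute = (n - 1) * agent_cost
    return expected_saving > extra_compute
```

For example, a 10% single-agent error rate reduced to 2% on a task where an error costs $1,000 saves $80 in expectation per run, which easily covers two extra $5 agent runs; the same reduction on a $1 error does not.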
Lesson 115 Drill
Identify the highest-stakes single-agent task in your current system. Design a swarm for it:
- Define what N (number of agents) you would use and why
- Specify the independence mechanisms: temperature variation, prompt framing variation, model variation if applicable
- Write the consensus criteria: what counts as agreement for this specific task type?
- Define the 3/3, 2/3, 1/3, 0/3 handling rules
- Calculate the cost: what is the per-run cost of the swarm vs. single agent?
- Estimate the error rate reduction: what is the expected improvement in accuracy?
If the cost-benefit calculation does not favor the swarm, do not build it. But if the highest-stakes task in your system does not justify redundant verification, the question worth asking is why you are running that task autonomously at all.
Bottom Line
A single agent run gives you one data point. A swarm gives you N independent data points. Agreement among those points is evidence. The evidence strength is proportional to the independence of your agents and the quality of your consensus engine.
Build swarms for the decisions where wrong is expensive. Use the quorum model to translate consensus into action: 3/3 proceed, 2/3 investigate, 1/3 reject, 0/3 clarify. Keep the compute cost honest and do not apply swarms to tasks that do not justify them.
Consensus is not certainty. It is the best approximation of certainty available to probabilistic systems at scale.