May 3, 2026

Claude Code Source Analysis Series, Chapter 13: Industry Patterns for Agent Collaboration (Optional)

A cross-industry comparison of multi-agent collaboration patterns, and how different systems split work and converge on results.

Industry Patterns for Agent Collaboration

Claude Code is only one example of a broader design problem.

The larger question is:

Outside Claude Code, how do other systems actually design multi-agent collaboration?

This topic is easy to turn into a pile of product names:

Gemini / ADK has SequentialAgent, ParallelAgent, and LoopAgent.
Claude Code has subagents and agent teams.
Codex has subagents, custom agents, sandboxes, and approvals.
OpenAI Agents SDK has handoffs and Agent.as_tool().
Deep Agents has supervisors, sync subagents, async subagents, memory, and sandboxing.

But memorizing names does not actually make the design clearer.

The hard part of multi-agent work is not whether the system can launch several model instances. It is this:

How is the task split?
Who dispatches the work?
How much context does each child receive?
May that child edit files?
How do results come back?
Can the task be stopped if it gets stuck?
What happens when multiple agents write to the same surface?
Who approves high-risk actions?

So instead of walking vendor by vendor, this article follows the problem chain.

Keep using the same running example:

The user asks the system to fix a login-failure bug.

The bug may involve auth code, cookie configuration, database-backed sessions,
the frontend login page, test mocks, legacy API compatibility, and security boundaries.

One agent can do the work, but it will likely be slow and messy.
Multiple agents can do the work too, but without collaboration design,
it just becomes a faster way to create chaos.

The core question is:

How do multi-agent systems turn multiple model-based workers into controlled engineering collaboration instead of a group of agents that constantly contaminate one another’s context?

1. Start by Following the Pressure Chain

Multi-agent systems usually do not appear because the model is “not smart enough.” They appear because engineering work creates four recurring pressures.

First, there is too much information.

When you are fixing a login bug, the main agent may need to read dozens of files, run tests, inspect logs, and trace callers. If all of that intermediate work lives in one context, the main line of reasoning gets buried under search noise.

Second, much of the work is naturally parallel.

Tracing callers, running tests, inspecting frontend state, and reviewing the security surface can often happen at the same time.

Third, the roles are different.

An explorer should mostly read. An implementer should focus on a minimal change. A reviewer should doubt the implementer’s assumptions. A security reviewer should prioritize exploitability. Stuffing all of that into one prompt often makes the roles fight each other.

Fourth, risk still has to converge somewhere.

A child agent cannot delete compatibility logic, change database schema, pull dependencies from the network, or run dangerous commands just because it happens to be a background worker. Those actions still need a unified approval and responsibility path.

That is why industry designs tend to evolve along a similar chain:

A single agent can handle a simple task
-> larger tasks dirty the main context
-> child agents appear for context isolation
-> subtasks can run in parallel
-> supervisor / worker fan-out and fan-in patterns emerge
-> different task types need specialized agents and permission boundaries
-> some tasks require real transfer of control
-> handoff appears
-> some work spans systems or teams
-> protocolized interoperability such as A2A appears
-> some work is long-running and cannot finish in one burst
-> durable execution, memory, sandboxing, and task lifecycle appear

So the core equation is not:

more agents = more intelligence

It is:

more agents = more coordination problems

Every mature design is really answering the same question:

How do you add parallelism without losing context, state, permissions, and accountability?

2. Pattern One: Central Orchestration

The safest multi-agent pattern is central orchestration.

You start with a central coordinator:

User request
-> main coordinator understands the goal
-> breaks it into subtasks
-> dispatches those tasks to specialists
-> waits for the results
-> integrates the outcome and decides what happens next

This is the classic fan-out / fan-in shape: work spreads out from one center and then converges back into that center for synthesis.

In sequence form:

How central orchestration fans work out and then pulls it back in

Applied to the login bug:

The coordinator:
  - sends a code-mapper to trace auth callers
  - sends a test-runner to reproduce the issue and run relevant tests
  - sends a security-reviewer to inspect the risk boundary
  - keeps final fix selection and merge judgment for itself

The main virtue of this pattern is clarity. Each child does local work. One central node remains responsible for the global interpretation. Child agents do not need to know everything, and they should not make the final call on their own.

OpenAI Agents SDK’s Agent.as_tool() is a clean example of this design. The coordinating agent treats a specialist as a tool invocation, so control never leaves the center. Many Claude Code and Codex subagent scenarios feel similar: the parent dispatches, the child returns a result, and the parent decides what to do with it.

This solves the first core problem:

In a multi-agent system, someone still has to be responsible for telling the final story.

Otherwise, each agent returns a local truth:

The testing agent says the suite is red.
The security agent says there is risk.
The implementation agent says the fix is done.
The frontend agent says the UI looks normal.

But the user actually needs:

What is the root cause?
What changed?
Why is this fix safe?
Which tests show old behavior still works?
What risk remains?

That synthesis has to happen somewhere.

The limitation is equally clear. If every interaction must flow through the center, the center can become the bottleneck. Long jobs, many subtasks, or heavy peer-to-peer coordination can leave the controller spending more time relaying information than making decisions.

That is why many systems move to the next pattern: turning work into explicit task objects.

3. Pattern Two: Make Tasks First-Class Objects

Multi-agent systems break down quickly when work is assigned only through vague natural language.

Imagine the main agent says:

Go look into the login problem.

That instruction is too loose. The child still does not know:

whether it is allowed to edit code or should stay read-only
whether it is expected to return evidence or a solution
what to do if it gets stuck
what counts as done
whether other agents are allowed to consume its output

More mature systems turn tasks into explicit objects.

A task should usually have fields such as:

id
description
owner / assignee
status
dependencies
allowed tools
artifacts
logs
result summary
cancel / resume capability

Google’s A2A protocol treats Task as a foundational unit with a lifecycle. Claude Code teams use shared task tables and locking. Codex imposes thread limits, nesting limits, and runtime limits around subagent work. Deep Agents returns task IDs for async children so the supervisor can later check, update, cancel, or list them.

The shared principle is simple:

A child agent is not just a temporary model call. It is an execution unit with a lifecycle.

For the login bug, that might look like:

task: trace-auth-callers
owner: explorer
status: running
mode: read-only
output:
  - key call chain
  - suspicious branches
  - eliminated paths
  - next-step recommendation

Now the coordinator does not need to stare at every message the child produces. It can inspect task state and consume the summary at the right moment.

Turning tasks into first-class objects also solves a hidden but critical problem: recovery.

If a long job dies halfway through, the runtime needs to know:

Which tasks completed?
Which failed?
Which are still running?
Which results were already materialized as artifacts?
Which conclusions were only intermediate guesses?

Without task objects, recovery becomes “read the entire transcript and guess.” Anyone who has dealt with a real production incident at 3 a.m. knows that this is exactly what you do not want.

That is why serious agent runtimes increasingly look like a combination of task system, tool system, and state system rather than a thin prompt wrapper.

4. Pattern Three: Isolate Context, Return Summaries

The earliest multi-agent win is often not parallelism. It is context hygiene.

Claude Code subagents, Codex explorers, and Deep Agents subagents all converge on the same idea:

The child may search deeply and wander through false starts
-> the parent should receive only a compressed result

That is crucial.

An exploration agent debugging a login failure might:

search for session
search for cookie
search for login
read old tests
rule out unrelated modules
inspect a failed log
make one wrong guess
correct it

That process is useful to the child. It is not what the main agent should inherit verbatim.

What the parent really needs is:

Which paths were checked
Which ones can be ruled out
Which two or three locations still look suspicious
Where the evidence lives
What should be validated next

The child should return a summary, not a transcript.

The analogy to a real team is helpful: you do not ask a coworker to recite every search query from the afternoon. You want them to say, “I checked it. The problem is probably in session refresh. I have two pieces of evidence, and the cookie branch can be ruled out.”

That is the value of context isolation.

But isolation has a cost. If the child sees too little context, it may repeat work, misunderstand the task, or diverge from the main line.

So industry systems usually switch between two strategies:

Strategy	Best used for	Main risk
Clean context	Independent subtasks, noisy exploration	The child has to rebuild background understanding
Forked / inherited context	Parallel validation of multiple hypotheses when the parent context is valuable	Higher cost, and inherited context can also inherit bad assumptions

Claude Code’s fork path belongs to the second family: multiple children share the parent’s current prefix and then branch into separate hypotheses.

So context design is not “the more isolated, the better.” The right question is:

Does this child need the parent’s full worksite, or does it benefit more from a clean slate?

5. Pattern Four: Specialize Roles and Bind Tools to Them

Multi-agent systems are not three copies of the same general-purpose agent.

If every agent can read, write, delete, run commands, access the network, and change configuration, you do not have specialization. You have several unstable full-access replicas.

The more mature pattern is role-based capability shaping.

Codex documents built-in roles such as default, worker, and explorer. explorer is explicitly read-heavy. Custom agents can also set model, reasoning effort, sandbox, and MCP configuration. Claude Code lets subagents be defined with Markdown frontmatter such as name, description, and tools, so different subagents have different context and tool boundaries. Deep Agents uses similar fields such as name, description, system_prompt, tools, model, and permissions.

The principle underneath all of this is straightforward:

Explorer: read more, change less, ideally stay read-only
Implementer: change code, but only inside a bounded scope
Reviewer: read the diff and the context, prioritize risk
Docs researcher: access official documentation, not business code
Security reviewer: keep permissions tighter and outputs stricter

For the login bug, that might mean:

Agent	Job	Tool boundary
Explorer	Trace auth callers	Read-only files and search
Worker	Change session refresh logic	Workspace write permission, but within a bounded scope
Test runner	Run login-related tests	Test command execution, but no casual implementation edits
Security reviewer	Inspect cookie, token, and permission risk	Read-only, structured findings
Docs researcher	Validate framework API behavior	Documentation access, no code edits

Role specialization answers “who should do what.”

Tool boundaries answer “who must not do what.”

The second question matters even more.

Models make mistakes. Robust systems cannot rely only on a prompt saying “please do not change too much.” The boundary has to exist at the tool, sandbox, and approval layers as well.

6. Pattern Five: Handoff vs. Specialist-as-Tool

In a centrally orchestrated design, the child often behaves like a tool: it completes a local job and returns control.

But not every case fits that model.

Suppose the user expands the original request:

Also help me design a new SSO integration strategy.

Now the conversation may no longer be a local branch of the original login bug. It may have shifted into a different expert domain entirely. In that case, a real handoff can make more sense:

The current agent recognizes a domain shift
-> transfers the conversation to an SSO specialist
-> that specialist handles the next turns
-> until it finishes or hands control back again

OpenAI Agents SDK expresses this distinction especially clearly:

Pattern	Who keeps control?	Best used for
Agent as tool	The orchestrator keeps control	Local expert questions, reviews, retrieval, supporting analysis
Handoff	Control moves to the downstream agent	The user’s intent has genuinely shifted to another domain and needs continued interaction

One sentence is enough to separate them:

Agent as tool: I ask a specialist to answer a sub-question.
Handoff: from this point on, the specialist owns the conversation.

Engineering systems should be careful with handoff. If every side branch triggers a handoff, the user experiences constant speaker changes and the main line of work becomes hard to track.

So the safer default is usually specialist-as-tool or subagent dispatch. Let the specialist handle a local branch, then bring the result back to the orchestrator. Reserve handoff for cases where the subject of the conversation has truly changed.

7. Pattern Six: Protocolized Interoperability

So far, most of the patterns above live inside one runtime: Claude Code dispatches a subagent, Codex launches an explorer, Deep Agents invokes a supervisor-child pattern, OpenAI Runner performs a handoff.

But industry systems also face a larger question:

What happens when two agents come from different frameworks, vendors, or teams?

That is where A2A enters.

A2A treats a remote agent as something discoverable, invocable, and trackable. The caller does not need to know how the remote side is implemented internally. Both sides only need to speak a shared object model:

Agent Card: who the remote agent is, what it can do, how it authenticates
Message: one unit of communication
Task: a stateful unit of work
Artifact: the output of that task
Streaming / Push: progress and long-task notification

This resembles MCP in spirit, but it solves a different layer:

MCP is mainly about how an agent calls tools and resources.
A2A is mainly about how one agent calls another agent.

In the login-bug example, imagine the company already runs a security-review agent service outside the current Claude Code or Codex process. The orchestrator can discover that service through A2A, submit a review task, and subscribe to its status and artifacts.

This solves a different class of problem: cross-organization and cross-runtime collaboration.

Different teams can own their own specialist agents.
Callers do not need to know the callee’s internal prompt or toolchain.
Long tasks can be tracked through a task state machine.
Results can be returned as standardized artifacts.
Authentication and capability declarations live in the agent card.

But protocolized collaboration also introduces a new layer of complexity:

authentication
authorization
timeouts
retries
version compatibility
task cancellation
output trust
sensitive data boundaries

So A2A is not “let agents chat freely.” It is a way to expose remote agents as services with explicit tasks, state, and artifacts.

That is one reason the Gemini / ADK ecosystem is especially notable here: it does not just provide workflow agents. It also pushes agent-to-agent interoperability up to the protocol layer.

8. Pattern Seven: Long-Running Agent Harnesses

Once a task stretches from minutes to hours or even multiple days, multi-agent collaboration hits another layer of difficulty.

For example:

Fix the login bug
-> discover the auth library must be upgraded
-> migrate old sessions
-> run a full regression suite
-> wait for user confirmation on compatibility policy
-> generate migration docs

That is not something one model call can absorb.

Long tasks need a harness:

short-term state: what the current thread is doing right now
long-term memory: facts and preferences that survive across sessions
task state: what completed, failed, or was cancelled
artifacts: where intermediate outputs live
sandbox: how execution environments stay isolated
approval: how risky actions pause for human review
tracing: how failures can be traced back to the step that caused them

Deep Agents makes this direction especially visible. It describes itself as an agent harness built on LangGraph runtime, combining planning, subagents, filesystem context, long-term memory, sandboxing, and human-in-the-loop control. Its async subagents let a supervisor launch background work, get back a task ID, keep talking to the user, and later check, update, or cancel the task.

Codex moves in a similar direction, though its expression is more coding-runtime-centric:

subagents have concurrency limits
nesting depth is bounded
sandbox and approval policy shape the execution boundary
app, CLI, and cloud share the same execution model
approval review is scoped to actions that actually require approval

The underlying judgment is the same:

Once an agent can run for a long time, it has to behave like a governable workflow, not like a single chat reply.

Without tracing, you cannot tell which agent failed.

Without sandboxing, more capability means more danger.

Without memory boundaries, errors become durable too.

Without approvals, risky actions disappear into the background.

Without task lifecycle, pause and resume exist only in a human’s head.

The challenge of long-running agents is not whether the model can keep going. It is whether the surrounding system can safely support that persistence.

9. Put the Systems on One Map

Now the major design families can be compared on one page:

System / framework	What it feels like	Collaboration core	Strength	Main boundary
Gemini / ADK / A2A	Enterprise orchestration and interoperability platform	Workflow agents + A2A task / agent card model	Protocolized collaboration, task state, cross-system integration	Internal scheduling details remain opaque, and engineering complexity is high
Claude Code	Collaborative terminal workbench	Subagents + context isolation + teams / mailbox	Strong local engineering workflow, context hygiene, team-like coordination	Recovery, conflict convergence, and protocolization are comparatively lighter
Codex	Strong central coding-execution kernel	Parent fan-out / fan-in + custom agents + sandbox / approvals	Approval model, safety boundaries, configurable concurrency, coding workflow	More centered on controlled orchestration than free-form swarm behavior
OpenAI Agents SDK	Programmable orchestration API	Handoffs + agents as tools + runner tracing	Clear control-transfer semantics, good fit for product integration	Developers must design task objects and permission policy themselves
Deep Agents	Long-running agent harness	Supervisor + sync / async subagents + memory / sandbox	Long tasks, persistence, pluggable backend orientation	Preview-style features and distributed governance still require more custom work

This table is not about ranking winners. It is about seeing that different systems are solving different layers of the same problem space.

If you mainly want to split a large coding job into several read-heavy explorations, Claude Code or Codex subagents may be enough.
If you need to orchestrate several specialists inside a product backend, OpenAI Agents SDK may be the more direct fit.
If you care about cross-team, cross-system agent interoperability, A2A-like protocols matter more.
If you need work that lasts hours, pauses, resumes, and accumulates memory and artifacts, a harness-oriented design such as Deep Agents is closer to the target shape.

10. Industry-Wide Design Consensus

Looking across these systems, several practical design rules show up again and again.

1. Split the Task Boundary Before You Split the Agents

Do not start by asking:

How many agents should I create?

Start by asking:

Which work can be completed independently?
Which work needs shared context?
Which work is read-only?
Which work has side effects?
Which results still need final synthesis by a coordinator?

Once the task boundaries are clear, the number of agents usually becomes obvious.

2. Parallelize Reads First, Converge Writes

The best candidates for parallel work are:

code search
documentation lookup
log inspection
independent test execution
candidate-solution analysis
independent review

The worst candidates for free-form parallelism are:

multiple agents editing the same file at once
multiple agents migrating schema simultaneously
multiple agents updating shared memory at the same time
multiple agents deciding release strategy independently

One line captures the principle:

Reads can fan out. Writes should converge.

3. Child Agents Should Return Conclusions, Not Process Noise

The best child-agent deliverables often look like this:

What was done
What was found
Where the evidence lives
What was ruled out
What should happen next
What remains uncertain

Do not dump every tool call, every search result, and every failed path back into the main context.

4. Permissions Should Be Bound to Roles, Not to Model Confidence

A model saying “I’m confident” is not a valid permission system.

The stronger pattern is:

explorer defaults to read-only
reviewer defaults to read-only
worker gets bounded workspace write access
network is off by default
dangerous commands require approval
escaping the sandbox requires approval

That is where Codex sandboxing and approvals, Claude Code subagent tools, Deep Agents permissions, and Gemini-style platform governance all begin to converge.

5. Humans Are Not a Low-Level Fallback

Multi-agent systems should not treat the user as the person to ask only when the model is confused.

The user is better understood as the approver at key risk boundaries:

Should compatibility be broken?
Should the old path be removed?
Should the system access external resources?
Should a migration be executed?
Should a long-term memory item be saved?

The more agents you add, the more important it becomes to define when the human is brought back in.

6. Tracing Is Not Optional

If a single-agent system fails, you can often reconstruct the problem from one transcript.

In a multi-agent system without tracing, failure becomes:

You do not know which agent reasoned incorrectly
You do not know which handoff lost context
You do not know which tool call polluted the state
You do not know which child wrote a bad artifact

That is why mature systems elevate tracing, usage, task status, artifacts, and approval records into first-class runtime objects.

11. A Practical Design Template

If you were designing an engineering-focused multi-agent system yourself, a useful starting template would look like this:

Supervisor
  - understands the user goal
  - maintains the task ledger
  - dispatches subtasks
  - collects results
  - makes final decisions

Task ledger
  - id
  - status
  - owner
  - dependencies
  - artifacts
  - cancel / resume

Specialized agents
  - explorer: read-heavy exploration
  - worker: bounded implementation
  - reviewer: independent review
  - tester: validation and reproduction
  - docs researcher: external-document calibration

Context policy
  - clean context for noisy exploration
  - forked context for parallel hypothesis testing
  - summary-only return to supervisor

Permission policy
  - read-only by default
  - workspace-write for bounded implementation
  - network / external write / destructive commands require approval

Observability
  - trace every tool call
  - record every handoff
  - record task state
  - store artifacts separately from chat history

Applied to the login bug, the flow looks like this:

A practical multi-agent workflow for the running login-bug example

The most important thing in that diagram is not the number of agents. It is the boundaries:

read tasks fan out in parallel
write tasks converge
review stays independent
high-risk actions go back to a human
the supervisor integrates the final result

12. What This Clarifies About Claude Code

Seen through this industry lens, Claude Code’s objects become easier to interpret.

AgentTool is not “a model calling another model.” It is a controlled dispatch point.

Subagents are not “another chat window.” They are context-isolation units.

Fork is not “copy the session and see what happens.” It is a strategy for inheriting parent context and testing several hypotheses in parallel.

The task system is not just a background job list. It is lifecycle management for multiple execution units.

SendMessageTool is not casual chat between agents. It is a protocolized communication surface.

Coordinator and team modes are not multi-agent theatrics. They are upgrades in task organization.

AskUserQuestion and permission bubbling are not awkward interruptions. They are the mechanisms that route risky decisions back to the proper decision-maker.

From that angle, Claude Code is not an isolated oddity. It is converging toward the same general shape as Gemini, Codex, OpenAI Agents SDK, and Deep Agents:

Wrap model capability inside a system that is organizable, traceable, recoverable, approvable, and isolatable

The difference is emphasis.

Claude Code leans harder into local engineering collaboration. Codex leans harder into central execution and approvals. Gemini leans harder into platform protocols. OpenAI Agents SDK leans harder into programmable orchestration. Deep Agents leans harder into the long-running harness.

13. A Final Test

Multi-agent systems are not there to make a product look more “intelligent.”

They exist to solve this problem:

When a task becomes too large, too noisy, too long, or too risky,
how do you split it into execution units that are isolatable, parallelizable,
communicative, stoppable, and reviewable?

So if you want to judge whether a multi-agent design is any good, do not start by counting how many agents it has. Start with six questions:

Who makes the final decision?
Where does task state live?
How is context isolated?
How do permissions converge?
How do results come back?
How does failure recover?

If those questions are fuzzy, multi-agent design only parallelizes confusion.

If those questions are clear, multi-agent design starts to become an engineering collaboration system rather than a collection of chat windows.