The Two-Layer Architecture That Stopped My AI Agents from Collapsing

Bilal El Alamy · 9 min read
multi-agent architecture · AI orchestration · agent coordination · AI agents

Part 2 of 4 — Part 1: Why 31 Agents · Part 3: Quality Control · Part 4: Costs & Memory


How I solved the coordination problem for 31 AI agents — by splitting orchestration into strategic routing and domain-level task decomposition.

In Part 1, I explained why I needed 31 AI agents organized into 8 teams — and what each one does. But having the right agents isn't enough. You need the right architecture to coordinate them.

This is the part where AI agent orchestration either works or falls apart. Not the AI. The plumbing.


Flat Orchestration Fails at 31 Agents

Most multi-agent systems use one orchestrator that talks to all agents. One brain, many hands. It works fine at 5 agents. It does not work at 31.

We tried it. For the first two weeks, Alfrawd — our master agent — talked directly to all 31 agents. The results were terrible.

Three compounding failures killed the flat approach:

Context overflow. Each agent handoff requires context about the task, the upstream output, the downstream requirements, and the domain constraints. At 31 agents, the orchestrator's context window fills with coordination overhead before it can do any actual thinking.

Domain dilution. An orchestrator managing marketing, engineering, finance, and legal simultaneously becomes a generalist giving shallow instructions to specialists. It can't decompose "build the dashboard" into frontend/backend/devops sub-tasks because it doesn't hold enough engineering context to know what that means.

Single point of failure. Every request, every failure, every synthesis funnels through one brain. When that brain gets overwhelmed, everything queues. When it makes a mistake, it cascades everywhere.

By week two, the symptoms were clear:

  • Response quality dropped by the third concurrent workstream
  • Sub-tasks got shallow, generic instructions instead of domain-specific decomposition
  • Alfrawd spent more tokens on coordination than on reasoning
  • Failures in one domain bled into another because context was shared

The fix wasn't more context window. It was organizational design.


The Two-Layer Solution

We split orchestration into two distinct layers, each handling a different cognitive job.

┌─────────────────────────────────────────────────┐
│  LAYER 1: STRATEGIC ROUTING                     │
│  Alfrawd (Master Agent)                         │
│  Routes, synthesizes, escalates, connects dots  │
│  Does NOT: micromanage, review code, write copy │
└──────────────────────┬──────────────────────────┘
                       │
         ┌─────────────┼────────────────┐
         │             │                │
         ▼             ▼                ▼
   ┌──────────┐   ┌──────────┐    ┌──────────┐
   │  Thom    │   │  Peiy    │    │  Warren  │  ...
   │  (Eng)   │   │  (Mktg)  │    │  (CFO)   │
   └────┬─────┘   └────┬─────┘    └─────┬────┘
        │              │                │
        ▼              ▼                ▼
┌──────────────────────────────────────────────┐
│  LAYER 2: DOMAIN TASK DECOMPOSITION          │
│  Orchestrators spawn specialists             │
│  Own their domain end-to-end                 │
│  Validate output before passing upstream     │
└──────────────────────────────────────────────┘

Layer 1 handles cross-domain routing and synthesis. Layer 2 handles intra-domain decomposition and execution. Same separation you'd find in any well-run company: executives set direction, department heads run their teams. Except here, AI agent orchestration happens at both levels simultaneously.

The difference: this runs on AI agents, and the entire hierarchy coordinates through files.


Layer 1: What Alfrawd Actually Does

Alfrawd sits between me and the entire operation. Named after Alfred Pennyworth — the man who kept the Batcave running and Bruce Wayne sane.

Alfrawd's job is NOT to coordinate the work. That's the critical distinction. His job is to:

  • Route requests to the right team
  • Synthesize information from across all teams
  • Escalate decisions that require my judgment
  • Connect dots that no single-domain agent can see
  • Shield me from noise

When I ask "how's this week going?", Alfrawd doesn't relay the question to 31 agents. He pulls from Vivi's timeline, Thom's build status, Pepe's test results, and Peiy's launch prep — then delivers one coherent answer.

When a regulatory deadline from Warren affects a product launch that Vivi is coordinating, Alfrawd connects those dots. No other agent has that cross-domain visibility.

Me → Alfrawd
├── Quick/direct       → Handle it himself
├── Product question   → Pull from Vivi → synthesize → present
├── Technical issue    → Route to Thom → appropriate sub-agent
├── Financial/Legal    → Route to Warren → warren-finance or warren-legal
├── Investment         → Route to Hari → hari-equities, hari-crypto
├── Marketing          → Route to Peiy → specialist
├── Design review      → Route to Jack → jack-ui or jack-brand
├── Cross-domain       → Pull from multiple agents → synthesize
└── "Brief me"         → Compile across all agents → one briefing
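As a rough sketch, Layer 1's single-domain routing reduces to a dispatch table. The agent names below are from the article, but the category labels and the `route` function are hypothetical simplifications, not the system's actual implementation:

```python
# Hypothetical sketch of Layer 1 routing: map a request category to the
# orchestrator that owns it. Only the agent names come from the article.
ROUTES = {
    "product": "vivi",
    "technical": "thom",
    "finance": "warren",
    "legal": "warren",
    "investment": "hari",
    "marketing": "peiy",
    "design": "jack",
}

def route(category: str) -> str:
    """Return the orchestrator for a single-domain request.

    Cross-domain requests and briefings are not routed at all: Alfrawd
    pulls from several orchestrators itself and synthesizes one answer.
    """
    return ROUTES.get(category, "alfrawd")  # unknown -> Alfrawd handles it
```

The shape is the point: Layer 1 decides who owns the request, never how the work gets done.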

What Alfrawd Does NOT Do

This matters more than what he does.

  • Micromanage the weekly product cycle. That's Vivi.
  • Review code. That's Nico.
  • Certify launches. That's Pepe.
  • Write marketing copy. That's Peiy's team.

The discipline is in not doing things. Alfrawd could code, design, write copy, analyze markets. But so can the specialized agents โ€” and they do it better in their domain. The judgment is knowing when it's faster to do it yourself versus routing to the specialist who'll do it right.


Layer 2: How Domain Orchestrators Work

Below Alfrawd, seven orchestrators manage their own teams:

Orchestrator | Domain               | Specialists
Vivi 🎯      | Product Coordination | Coordinates Bill, Jack, Thom, Nico, Pepe, Peiy
Thom 💻      | Engineering          | thom-frontend, thom-backend, thom-web3, thom-devops
Jack 🎨      | Design               | jack-ui, jack-brand, jack-creative
Pepe 🧪      | QA                   | pepe-general, pepe-web3
Peiy 📣      | Marketing            | peiy-content, peiy-seo, peiy-analytics, +5 more
Warren 📊    | Finance & Legal      | warren-finance, warren-legal
Hari 📈      | Investments          | hari-equities, hari-crypto, hari-polymarket

Each orchestrator follows the same pattern:

  1. Receives a goal from Alfrawd or the workflow
  2. Decomposes it into sub-tasks
  3. Spawns the right specialist sub-agent
  4. Validates the output before passing it upstream
  5. Handles failures and retries within their domain
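The five-step pattern can be sketched as a single loop. The `decompose`, `spawn`, and `validate` callables here are placeholders for whatever a real orchestrator does in its domain; none of these names come from the actual system:

```python
def run_goal(goal, decompose, spawn, validate, max_retries=2):
    """One domain orchestrator's loop: decompose, spawn, validate, retry."""
    results = {}
    for sub_task in decompose(goal):              # step 2: domain decomposition
        for _attempt in range(max_retries + 1):
            output = spawn(sub_task)              # step 3: specialist executes
            if validate(sub_task, output):        # step 4: check before passing up
                results[sub_task["id"]] = output
                break
        else:                                     # step 5: retries exhausted
            raise RuntimeError(f"sub-task {sub_task['id']} failed validation")
    return results                                # handed upstream to Layer 1
```

Failures stay inside the domain: Layer 1 only ever sees validated results or a single escalated error.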

The key insight: orchestrators own their domain. Alfrawd doesn't tell Thom which sub-agent to use for a frontend task. Thom knows. Alfrawd tells Thom "build the dashboard" and Thom decides that means thom-frontend for the UI, thom-backend for the API, and thom-devops for deployment.

This is what makes the multi-agent architecture scale. Each layer handles a different cognitive task. Layer 1 never needs to understand React component patterns. Layer 2 never needs to connect a regulatory deadline to a launch timeline.


How Sub-Agent Spawning Actually Works

When an orchestrator receives a task, it doesn't do the work itself. It spawns a specialist. Here's the exact flow — 7 steps from request to completion:

1. Vivi tells Thom: "Build the AlphaNet dashboard"
2. Thom decomposes:
   - Frontend: React dashboard with charts    → thom-frontend
   - Backend: Django API for portfolio data   → thom-backend
   - Deploy: Vercel + Render setup            → thom-devops
3. Thom spawns thom-frontend with task + context
4. thom-frontend executes in an isolated session
5. thom-frontend self-tests (build passes, lint clean, renders)
6. thom-frontend signals complete → Thom validates output
7. Thom spawns next sub-agent or signals Vivi: done

Step 2 is where the architecture earns its keep. A flat orchestrator would send "build the AlphaNet dashboard" as one blob to one agent. Thom decomposes it into three parallel workstreams with specific technology constraints. That decomposition requires domain knowledge that only an engineering orchestrator has.

What Each Sub-Agent Receives

When spawned, a sub-agent gets:

  • Its own SOUL.md — personality, capabilities, rules
  • REGRESSIONS.md — mistakes it must never repeat
  • CONTEXT_HOLDS.md — active constraints with expiry dates
  • Task context — what to build, acceptance criteria
  • TASK_REGISTRY.md protocol — how to track its own work

What it does NOT get:

  • Other agents' credentials
  • Access to other workspaces
  • The full conversation history
  • Approval authority

This is deliberate isolation. thom-frontend doesn't need to know what warren-legal is working on. It doesn't need Hari's API keys. It gets exactly the context it needs to do its job and nothing more.
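A sketch of how that context package might be assembled. The directory layout, function name, and field names are assumptions for illustration, not the production format:

```python
from pathlib import Path

def build_spawn_context(agent: str, task: dict, workspace: Path) -> dict:
    """Assemble exactly what a sub-agent receives on spawn, and nothing more.

    Deliberately absent: other agents' credentials, other workspaces,
    the full conversation history, and any approval authority.
    """
    agent_dir = workspace / agent

    def read(name: str) -> str:
        path = agent_dir / name
        return path.read_text() if path.exists() else ""

    return {
        "soul": read("SOUL.md"),                  # personality, capabilities, rules
        "regressions": read("REGRESSIONS.md"),    # mistakes never to repeat
        "holds": read("CONTEXT_HOLDS.md"),        # active constraints with expiry
        "task": task,                             # what to build, acceptance criteria
        "registry_protocol": "TASK_REGISTRY.md",  # how to track its own work
    }
```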

Depth-3 Nesting: Agents Spawning Agents Spawning Agents

Sub-agents can spawn their own children. The hierarchy supports nesting 3 levels deep:

Alfrawd → Thom → thom-frontend → (focused sub-task)
  L1        L2        L3                L3 child

This means Thom can ask thom-frontend to spawn a focused task — "just build this one chart component" — without Alfrawd or me being involved. The orchestrator manages its own domain end-to-end, including delegation within delegation.

The nesting is bounded at 3 levels. Deeper than that and you lose observability. The task registry tracks all active work regardless of depth, so I can always see what's running.
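Enforcing that bound is a one-line guard at spawn time. This sketch counts Alfrawd as depth 0, so a specialist's focused child sits at depth 3 and cannot spawn further; the counting convention and names are my assumptions, not the system's documented API:

```python
MAX_DEPTH = 3  # assumed: Alfrawd = 0, orchestrator = 1, specialist = 2, child = 3

def spawn_child(parent_depth: int, task_id: str) -> dict:
    """Refuse to nest below the observability floor."""
    if parent_depth >= MAX_DEPTH:
        raise ValueError(
            f"cannot spawn {task_id}: depth {parent_depth + 1} exceeds {MAX_DEPTH}"
        )
    return {"task": task_id, "depth": parent_depth + 1}
```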


File-Based Communication: The Glue

Agents don't send each other messages. They communicate through files. The task registry (tasks.json) is the shared state.

When thom-frontend completes a task, it updates the registry. When Nico starts a code review, it reads the registry to find what's ready for review. When Vivi checks project status, it reads the same registry.

{
  "tasks": [
    {
      "id": "alphanet-frontend-dashboard",
      "assignee": "thom-frontend",
      "status": "complete",
      "parent": "thom",
      "output": "./build/dashboard",
      "selfTest": "passed",
      "completedAt": "2025-03-10T14:32:00Z"
    }
  ]
}
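A sketch of how an agent might update that registry safely. The write-temp-then-rename pattern keeps a crashed writer from leaving a half-written file; the function name and the assumption that the whole file is rewritten on each change are mine, not the system's documented behavior:

```python
import json
import os
import tempfile
from pathlib import Path

def update_task(registry_path: Path, task_id: str, **fields) -> None:
    """Patch one task in tasks.json, then atomically replace the file."""
    data = json.loads(registry_path.read_text())
    for task in data["tasks"]:
        if task["id"] == task_id:
            task.update(fields)
            break
    else:
        data["tasks"].append({"id": task_id, **fields})
    # Atomic replace: readers never observe a partially written registry.
    fd, tmp = tempfile.mkstemp(dir=registry_path.parent)
    with os.fdopen(fd, "w") as f:
        json.dump(data, f, indent=2)
    os.replace(tmp, registry_path)
```

This protects readers from torn writes; two agents writing at the same instant would still need a lock file or similar, which is out of scope here.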

This is deliberate. File-based communication is:

  • Observable — I can read the registry at any time and see exactly what's happening
  • Persistent — survives session restarts, which happen constantly with AI agents
  • Debuggable — when something goes wrong, the state is in a file, not lost in a message queue

Information flows up through synthesis (many signals compressed into one briefing) and down through decomposition (one goal expanded into many tasks).

Not event streams. Not message queues. Not API calls between agents. Files. Simple, readable, persistent files. The most boring technology choice in the whole system, and the one that saved us the most debugging time.
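The upward half of that flow, synthesis, is just a reduction over the same registry. A sketch using the field names from the tasks.json example above; the grouping logic and output format are illustrative, not the real briefing code:

```python
from collections import Counter

def briefing(tasks: list) -> str:
    """Compress many task records into one status line per orchestrator."""
    by_parent = {}
    for task in tasks:
        by_parent.setdefault(task.get("parent", "unassigned"), []).append(task)
    lines = []
    for parent, group in sorted(by_parent.items()):
        counts = Counter(t.get("status", "unknown") for t in group)
        summary = ", ".join(f"{n} {status}" for status, n in sorted(counts.items()))
        lines.append(f"{parent}: {summary}")
    return "\n".join(lines)
```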


What This Architecture Changed

Before the two-layer split, Alfrawd was drowning: spending 70% of its tokens on coordination and only 30% on actual thinking. After the split:

  • Alfrawd handles strategic routing and cross-domain synthesis. That's it.
  • Domain orchestrators handle task decomposition with full domain context. No dilution.
  • Specialists execute in isolation with exactly the context they need. No leakage.
  • File-based state means nothing gets lost between sessions. No amnesia.
  • Depth-3 nesting means orchestrators can delegate without escalating. No bottlenecks.

The cognitive load that was crushing one agent got distributed across 7 orchestrators, each holding deep context in one domain instead of shallow context in all domains.

Same 31 agents. Same capabilities. Different wiring. Everything changed.


The architecture handled coordination. But coordination doesn't guarantee quality. 31 agents producing work at speed can produce garbage at speed.

Part 3 covers how I keep quality high — with human gates, trust scoring, and cascading validation that catches context degradation before it propagates.
