The Two-Layer Architecture That Stopped My AI Agents from Collapsing

Bilal El Alamy · 9 min read
multi-agent architecture · AI orchestration · agent coordination · AI agents

Part 2 of 4 — Part 1: Why 31 Agents · Part 3: Quality Control · Part 4: Costs & Memory


How I solved the coordination problem for 31 AI agents — by splitting orchestration into strategic routing and domain-level task decomposition.

In Part 1, I explained why I needed 31 AI agents organized into 8 teams — and what each one does. But having the right agents isn't enough. You need the right architecture to coordinate them.

This is the part where AI agent orchestration either works or falls apart. Not the AI. The plumbing.


Flat Orchestration Fails at 31 Agents

Most multi-agent systems use one orchestrator that talks to all agents. One brain, many hands. It works fine at 5 agents. It does not work at 31.

We tried it. For the first two weeks, Alfrawd — our master agent — talked directly to all 31 agents. The results were terrible.

Three compounding failures killed the flat approach:

Context overflow. Each agent handoff requires context about the task, the upstream output, the downstream requirements, and the domain constraints. At 31 agents, the orchestrator's context window fills with coordination overhead before it can do any actual thinking.

Domain dilution. An orchestrator managing marketing, engineering, finance, and legal simultaneously becomes a generalist giving shallow instructions to specialists. It can't decompose "build the dashboard" into frontend/backend/devops sub-tasks because it doesn't hold enough engineering context to know what that means.

Single point of failure. Every request, every failure, every synthesis funnels through one brain. When that brain gets overwhelmed, everything queues. When it makes a mistake, it cascades everywhere.

By week two, the symptoms were clear:

  • Response quality dropped by the third concurrent workstream
  • Sub-tasks got shallow, generic instructions instead of domain-specific decomposition
  • Alfrawd spent more tokens on coordination than on reasoning
  • Failures in one domain bled into another because context was shared

The fix wasn't more context window. It was organizational design.


The Two-Layer Solution

We split orchestration into two distinct layers, each handling a different cognitive job.

┌─────────────────────────────────────────────────┐
│  LAYER 1: STRATEGIC ROUTING                     │
│  Alfrawd (Master Agent)                         │
│  Routes, synthesizes, escalates, connects dots  │
│  Does NOT: micromanage, review code, write copy │
└──────────────────────┬──────────────────────────┘
                       │
         ┌─────────────┼────────────────┐
         │             │                │
         ▼             ▼                ▼
   ┌──────────┐   ┌──────────┐    ┌──────────┐
   │  Thom    │   │  Peiy    │    │  Warren  │  ...
   │  (Eng)   │   │  (Mktg)  │    │  (CFO)   │
   └────┬─────┘   └────┬─────┘    └─────┬────┘
        │              │                │
        ▼              ▼                ▼
┌──────────────────────────────────────────────┐
│  LAYER 2: DOMAIN TASK DECOMPOSITION          │
│  Orchestrators spawn specialists             │
│  Own their domain end-to-end                 │
│  Validate output before passing upstream     │
└──────────────────────────────────────────────┘

Layer 1 handles cross-domain routing and synthesis. Layer 2 handles intra-domain decomposition and execution. Same separation you'd find in any well-run company: executives set direction, department heads run their teams. Except here, AI agent orchestration happens at both levels simultaneously.

The difference: this runs on AI agents, and the entire hierarchy coordinates through files.


Layer 1: What Alfrawd Actually Does

Alfrawd sits between me and the entire operation. Named after Alfred Pennyworth — the man who kept the Batcave running and Bruce Wayne sane.

Alfrawd's job is NOT to coordinate the work. That's the critical distinction. His job is to:

  • Route requests to the right team
  • Synthesize information from across all teams
  • Escalate decisions that require my judgment
  • Connect dots that no single-domain agent can see
  • Shield me from noise

When I ask "how's this week going?", Alfrawd doesn't relay the question to 31 agents. He pulls from Vivi's timeline, Thom's build status, Pepe's test results, and Peiy's launch prep — then delivers one coherent answer.

When a regulatory deadline from Warren affects a product launch that Vivi is coordinating, Alfrawd connects those dots. No other agent has that cross-domain visibility.

Me → Alfrawd
├── Quick/direct       → Handle it himself
├── Product question   → Pull from Vivi → synthesize → present
├── Technical issue    → Route to Thom → appropriate sub-agent
├── Financial/Legal    → Route to Warren → warren-finance or warren-legal
├── Investment         → Route to Hari → hari-equities, hari-crypto
├── Marketing          → Route to Peiy → specialist
├── Design review      → Route to Jack → jack-ui or jack-brand
├── Cross-domain       → Pull from multiple agents → synthesize
└── "Brief me"         → Compile across all agents → one briefing
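As a rough sketch, Layer 1's single-domain routing reduces to a dispatch table. The agent names below are from the article, but the category labels and the `route` function are hypothetical simplifications, not the system's actual implementation:

```python
# Hypothetical sketch of Layer 1 routing: map a request category to the
# orchestrator that owns it. Only the agent names come from the article.
ROUTES = {
    "product": "vivi",
    "technical": "thom",
    "finance": "warren",
    "legal": "warren",
    "investment": "hari",
    "marketing": "peiy",
    "design": "jack",
}

def route(category: str) -> str:
    """Return the orchestrator for a single-domain request.

    Cross-domain requests and briefings are not routed at all: Alfrawd
    pulls from several orchestrators itself and synthesizes one answer.
    """
    return ROUTES.get(category, "alfrawd")  # unknown -> Alfrawd handles it
```

The shape is the point: Layer 1 decides who owns the request, never how the work gets done.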

What Alfrawd Does NOT Do

This matters more than what he does.

  • Micromanage the weekly product cycle. That's Vivi.
  • Review code. That's Nico.
  • Certify launches. That's Pepe.
  • Write marketing copy. That's Peiy's team.

The discipline is in not doing things. Alfrawd could code, design, write copy, analyze markets. But so can the specialized agents โ€” and they do it better in their domain. The judgment is knowing when it's faster to do it yourself versus routing to the specialist who'll do it right.


Layer 2: How Domain Orchestrators Work

Below Alfrawd, seven orchestrators manage their own teams:

Orchestrator | Domain               | Specialists
Vivi 🎯      | Product Coordination | Coordinates Bill, Jack, Thom, Nico, Pepe, Peiy
Thom 💻      | Engineering          | thom-frontend, thom-backend, thom-web3, thom-devops
Jack 🎨      | Design               | jack-ui, jack-brand, jack-creative
Pepe 🧪      | QA                   | pepe-general, pepe-web3
Peiy 📣      | Marketing            | peiy-content, peiy-seo, peiy-analytics, +5 more
Warren 📊    | Finance & Legal      | warren-finance, warren-legal
Hari 📈      | Investments          | hari-equities, hari-crypto, hari-polymarket

Each orchestrator follows the same pattern:

  1. Receives a goal from Alfrawd or the workflow
  2. Decomposes it into sub-tasks
  3. Spawns the right specialist sub-agent
  4. Validates the output before passing it upstream
  5. Handles failures and retries within their domain
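The five-step pattern can be sketched as a single loop. The `decompose`, `spawn`, and `validate` callables here are placeholders for whatever a real orchestrator does in its domain; none of these names come from the actual system:

```python
def run_goal(goal, decompose, spawn, validate, max_retries=2):
    """One domain orchestrator's loop: decompose, spawn, validate, retry."""
    results = {}
    for sub_task in decompose(goal):              # step 2: domain decomposition
        for _attempt in range(max_retries + 1):
            output = spawn(sub_task)              # step 3: specialist executes
            if validate(sub_task, output):        # step 4: check before passing up
                results[sub_task["id"]] = output
                break
        else:                                     # step 5: retries exhausted
            raise RuntimeError(f"sub-task {sub_task['id']} failed validation")
    return results                                # handed upstream to Layer 1
```

Failures stay inside the domain: Layer 1 only ever sees validated results or a single escalated error.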

The key insight: orchestrators own their domain. Alfrawd doesn't tell Thom which sub-agent to use for a frontend task. Thom knows. Alfrawd tells Thom "build the dashboard" and Thom decides that means thom-frontend for the UI, thom-backend for the API, and thom-devops for deployment.

This is what makes the multi-agent architecture scale. Each layer handles a different cognitive task. Layer 1 never needs to understand React component patterns. Layer 2 never needs to connect a regulatory deadline to a launch timeline.


How Sub-Agent Spawning Actually Works

When an orchestrator receives a task, it doesn't do the work itself. It spawns a specialist. Here's the exact flow — 7 steps from request to completion:

1. Vivi tells Thom: "Build the AlphaNet dashboard"
2. Thom decomposes:
   - Frontend: React dashboard with charts    → thom-frontend
   - Backend: Django API for portfolio data   → thom-backend
   - Deploy: Vercel + Render setup            → thom-devops
3. Thom spawns thom-frontend with task + context
4. thom-frontend executes in an isolated session
5. thom-frontend self-tests (build passes, lint clean, renders)
6. thom-frontend signals complete → Thom validates output
7. Thom spawns next sub-agent or signals Vivi: done

Step 2 is where the architecture earns its keep. A flat orchestrator would send "build the AlphaNet dashboard" as one blob to one agent. Thom decomposes it into three parallel workstreams with specific technology constraints. That decomposition requires domain knowledge that only an engineering orchestrator has.

What Each Sub-Agent Receives

When spawned, a sub-agent gets:

  • Its own SOUL.md — personality, capabilities, rules
  • REGRESSIONS.md — mistakes it must never repeat
  • CONTEXT_HOLDS.md — active constraints with expiry dates
  • Task context — what to build, acceptance criteria
  • TASK_REGISTRY.md protocol — how to track its own work

What it does NOT get:

  • Other agents' credentials
  • Access to other workspaces
  • The full conversation history
  • Approval authority

This is deliberate isolation. thom-frontend doesn't need to know what warren-legal is working on. It doesn't need Hari's API keys. It gets exactly the context it needs to do its job and nothing more.
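A sketch of how that context package might be assembled. The directory layout, function name, and field names are assumptions for illustration, not the production format:

```python
from pathlib import Path

def build_spawn_context(agent: str, task: dict, workspace: Path) -> dict:
    """Assemble exactly what a sub-agent receives on spawn, and nothing more.

    Deliberately absent: other agents' credentials, other workspaces,
    the full conversation history, and any approval authority.
    """
    agent_dir = workspace / agent

    def read(name: str) -> str:
        path = agent_dir / name
        return path.read_text() if path.exists() else ""

    return {
        "soul": read("SOUL.md"),                  # personality, capabilities, rules
        "regressions": read("REGRESSIONS.md"),    # mistakes never to repeat
        "holds": read("CONTEXT_HOLDS.md"),        # active constraints with expiry
        "task": task,                             # what to build, acceptance criteria
        "registry_protocol": "TASK_REGISTRY.md",  # how to track its own work
    }
```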

Depth-3 Nesting: Agents Spawning Agents Spawning Agents

Sub-agents can spawn their own children. The hierarchy supports nesting 3 levels deep:

Alfrawd → Thom → thom-frontend → (focused sub-task)
  L1        L2        L3                L3 child

This means Thom can ask thom-frontend to spawn a focused task — "just build this one chart component" — without Alfrawd or me being involved. The orchestrator manages its own domain end-to-end, including delegation within delegation.

The nesting is bounded at 3 levels. Deeper than that and you lose observability. The task registry tracks all active work regardless of depth, so I can always see what's running.
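Enforcing that bound is a one-line guard at spawn time. This sketch counts Alfrawd as depth 0, so a specialist's focused child sits at depth 3 and cannot spawn further; the counting convention and names are my assumptions, not the system's documented API:

```python
MAX_DEPTH = 3  # assumed: Alfrawd = 0, orchestrator = 1, specialist = 2, child = 3

def spawn_child(parent_depth: int, task_id: str) -> dict:
    """Refuse to nest below the observability floor."""
    if parent_depth >= MAX_DEPTH:
        raise ValueError(
            f"cannot spawn {task_id}: depth {parent_depth + 1} exceeds {MAX_DEPTH}"
        )
    return {"task": task_id, "depth": parent_depth + 1}
```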


File-Based Communication: The Glue

Agents don't send each other messages. They communicate through files. The task registry (tasks.json) is the shared state.

When thom-frontend completes a task, it updates the registry. When Nico starts a code review, it reads the registry to find what's ready for review. When Vivi checks project status, it reads the same registry.

{
  "tasks": [
    {
      "id": "alphanet-frontend-dashboard",
      "assignee": "thom-frontend",
      "status": "complete",
      "parent": "thom",
      "output": "./build/dashboard",
      "selfTest": "passed",
      "completedAt": "2025-03-10T14:32:00Z"
    }
  ]
}
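A sketch of how an agent might update that registry safely. The write-temp-then-rename pattern keeps a crashed writer from leaving a half-written file; the function name and the assumption that the whole file is rewritten on each change are mine, not the system's documented behavior:

```python
import json
import os
import tempfile
from pathlib import Path

def update_task(registry_path: Path, task_id: str, **fields) -> None:
    """Patch one task in tasks.json, then atomically replace the file."""
    data = json.loads(registry_path.read_text())
    for task in data["tasks"]:
        if task["id"] == task_id:
            task.update(fields)
            break
    else:
        data["tasks"].append({"id": task_id, **fields})
    # Atomic replace: readers never observe a partially written registry.
    fd, tmp = tempfile.mkstemp(dir=registry_path.parent)
    with os.fdopen(fd, "w") as f:
        json.dump(data, f, indent=2)
    os.replace(tmp, registry_path)
```

This protects readers from torn writes; two agents writing at the same instant would still need a lock file or similar, which is out of scope here.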

This is deliberate. File-based communication is:

  • Observable — I can read the registry at any time and see exactly what's happening
  • Persistent — survives session restarts, which happen constantly with AI agents
  • Debuggable — when something goes wrong, the state is in a file, not lost in a message queue

Information flows up through synthesis (many signals compressed into one briefing) and down through decomposition (one goal expanded into many tasks).

Not event streams. Not message queues. Not API calls between agents. Files. Simple, readable, persistent files. The most boring technology choice in the whole system, and the one that saved us the most debugging time.
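The upward half of that flow, synthesis, is just a reduction over the same registry. A sketch using the field names from the tasks.json example above; the grouping logic and output format are illustrative, not the real briefing code:

```python
from collections import Counter

def briefing(tasks: list) -> str:
    """Compress many task records into one status line per orchestrator."""
    by_parent = {}
    for task in tasks:
        by_parent.setdefault(task.get("parent", "unassigned"), []).append(task)
    lines = []
    for parent, group in sorted(by_parent.items()):
        counts = Counter(t.get("status", "unknown") for t in group)
        summary = ", ".join(f"{n} {status}" for status, n in sorted(counts.items()))
        lines.append(f"{parent}: {summary}")
    return "\n".join(lines)
```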


What This Architecture Changed

Before the two-layer split, Alfrawd was drowning: spending 70% of its tokens on coordination and only 30% on actual thinking. After the split:

  • Alfrawd handles strategic routing and cross-domain synthesis. That's it.
  • Domain orchestrators handle task decomposition with full domain context. No dilution.
  • Specialists execute in isolation with exactly the context they need. No leakage.
  • File-based state means nothing gets lost between sessions. No amnesia.
  • Depth-3 nesting means orchestrators can delegate without escalating. No bottlenecks.

The cognitive load that was crushing one agent got distributed across 7 orchestrators, each holding deep context in one domain instead of shallow context in all domains.

Same 31 agents. Same capabilities. Different wiring. Everything changed.


The architecture handled coordination. But coordination doesn't guarantee quality. 31 agents producing work at speed can produce garbage at speed.

Part 3 covers how I keep quality high — with human gates, trust scoring, and cascading validation that catches context degradation before it propagates.
