The Two-Layer Architecture That Stopped My AI Agents from Collapsing
Part 2 of 4 · Part 1: Why 31 Agents · Part 3: Quality Control · Part 4: Costs & Memory
How I solved the coordination problem for 31 AI agents by splitting orchestration into strategic routing and domain-level task decomposition.
In Part 1, I explained why I needed 31 AI agents organized into 8 teams, and what each one does. But having the right agents isn't enough. You need the right architecture to coordinate them.
This is the part where AI agent orchestration either works or falls apart. Not the AI. The plumbing.
Flat Orchestration Fails at 31 Agents
Most multi-agent systems use one orchestrator that talks to all agents. One brain, many hands. It works fine at 5 agents. It does not work at 31.
We tried it. For the first two weeks, Alfrawd, our master agent, talked directly to all 31 agents. The results were terrible.
Three compounding failures killed the flat approach:
Context overflow. Each agent handoff requires context about the task, the upstream output, the downstream requirements, and the domain constraints. At 31 agents, the orchestrator's context window fills with coordination overhead before it can do any actual thinking.
Domain dilution. An orchestrator managing marketing, engineering, finance, and legal simultaneously becomes a generalist giving shallow instructions to specialists. It can't decompose "build the dashboard" into frontend/backend/devops sub-tasks because it doesn't hold enough engineering context to know what that means.
Single point of failure. Every request, every failure, every synthesis funnels through one brain. When that brain gets overwhelmed, everything queues. When it makes a mistake, it cascades everywhere.
By week two, the symptoms were clear:
- Response quality dropped by the third concurrent workstream
- Sub-tasks got shallow, generic instructions instead of domain-specific decomposition
- Alfrawd spent more tokens on coordination than on reasoning
- Failures in one domain bled into another because context was shared
The fix wasn't more context window. It was organizational design.
The Two-Layer Solution
We split orchestration into two distinct layers, each handling a different cognitive job.
```
┌──────────────────────────────────────────────────┐
│  LAYER 1: STRATEGIC ROUTING                      │
│  Alfrawd (Master Agent)                          │
│  Routes, synthesizes, escalates, connects dots   │
│  Does NOT: micromanage, review code, write copy  │
└─────────────────────┬────────────────────────────┘
                      │
        ┌─────────────┼─────────────┐
        │             │             │
        ▼             ▼             ▼
  ┌──────────┐  ┌──────────┐  ┌──────────┐
  │   Thom   │  │   Peiy   │  │  Warren  │  ...
  │  (Eng)   │  │  (Mktg)  │  │  (CFO)   │
  └────┬─────┘  └────┬─────┘  └────┬─────┘
       │             │             │
       ▼             ▼             ▼
┌──────────────────────────────────────────────────┐
│  LAYER 2: DOMAIN TASK DECOMPOSITION              │
│  Orchestrators spawn specialists                 │
│  Own their domain end-to-end                     │
│  Validate output before passing upstream         │
└──────────────────────────────────────────────────┘
```
Layer 1 handles cross-domain routing and synthesis. Layer 2 handles intra-domain decomposition and execution. Same separation you'd find in any well-run company: executives set direction, department heads run their teams. Except here, AI agent orchestration happens at both levels simultaneously.
The difference: this runs on AI agents, and the entire hierarchy coordinates through files.
Layer 1: What Alfrawd Actually Does
Alfrawd sits between me and the entire operation. Named after Alfred Pennyworth, the man who kept the Batcave running and Bruce Wayne sane.
Alfrawd's job is NOT to coordinate the work. That's the critical distinction. His job is to:
- Route requests to the right team
- Synthesize information from across all teams
- Escalate decisions that require my judgment
- Connect dots that no single-domain agent can see
- Shield me from noise
When I ask "how's this week going?", Alfrawd doesn't relay the question to 31 agents. He pulls from Vivi's timeline, Thom's build status, Pepe's test results, and Peiy's launch prep, then delivers one coherent answer.
When a regulatory deadline from Warren affects a product launch that Vivi is coordinating, Alfrawd connects those dots. No other agent has that cross-domain visibility.
```
Me → Alfrawd
 ├── Quick/direct     → Handle it himself
 ├── Product question → Pull from Vivi → synthesize → present
 ├── Technical issue  → Route to Thom → appropriate sub-agent
 ├── Financial/Legal  → Route to Warren → warren-finance or warren-legal
 ├── Investment       → Route to Hari → hari-equities, hari-crypto
 ├── Marketing        → Route to Peiy → specialist
 ├── Design review    → Route to Jack → jack-ui or jack-brand
 ├── Cross-domain     → Pull from multiple agents → synthesize
 └── "Brief me"       → Compile across all agents → one briefing
```
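As a rough sketch, that routing decision could be expressed as a lookup table. The category names and orchestrator IDs below are illustrative, not the system's actual configuration:

```python
# Hypothetical Layer-1 routing table; names are illustrative.
ROUTING_TABLE = {
    "product": ["vivi"],
    "technical": ["thom"],
    "financial": ["warren"],
    "legal": ["warren"],
    "investment": ["hari"],
    "marketing": ["peiy"],
    "design": ["jack"],
}

def route(category: str) -> list[str]:
    """Map a request category to the orchestrators who should handle it."""
    if category == "cross-domain":
        # Synthesis case: pull from every orchestrator, compress into one answer.
        return sorted({o for owners in ROUTING_TABLE.values() for o in owners})
    # Unknown or quick/direct requests: Alfrawd handles them himself.
    return ROUTING_TABLE.get(category, ["alfrawd"])
```

A "brief me" request would hit the cross-domain branch, pulling from every orchestrator before synthesizing one briefing.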
What Alfrawd Does NOT Do
This matters more than what he does.
- Micromanage the weekly product cycle. That's Vivi.
- Review code. That's Nico.
- Certify launches. That's Pepe.
- Write marketing copy. That's Peiy's team.
The discipline is in not doing things. Alfrawd could code, design, write copy, analyze markets. But so can the specialized agents, and they do it better in their domain. The judgment is knowing when it's faster to do it yourself versus routing to the specialist who'll do it right.
Layer 2: How Domain Orchestrators Work
Below Alfrawd, six orchestrators manage their own teams:
| Orchestrator | Domain | Specialists |
|---|---|---|
| Vivi | Product Coordination | Coordinates Bill, Jack, Thom, Nico, Pepe, Peiy |
| Thom | Engineering | thom-frontend, thom-backend, thom-web3, thom-devops |
| Jack | Design | jack-ui, jack-brand, jack-creative |
| Pepe | QA | pepe-general, pepe-web3 |
| Peiy | Marketing | peiy-content, peiy-seo, peiy-analytics, +5 more |
| Warren | Finance & Legal | warren-finance, warren-legal |
| Hari | Investments | hari-equities, hari-crypto, hari-polymarket |
Each orchestrator follows the same pattern:
- Receives a goal from Alfrawd or the workflow
- Decomposes it into sub-tasks
- Spawns the right specialist sub-agent
- Validates the output before passing it upstream
- Handles failures and retries within their domain
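That five-step pattern can be sketched as a generic loop. Everything here is hypothetical: `decompose`, `spawn`, and `validate` stand in for whatever the real agent runtime provides.

```python
from dataclasses import dataclass

@dataclass
class SubTask:
    specialist: str   # e.g. "thom-frontend"
    goal: str
    attempts: int = 0

def run_orchestrator(goal, decompose, spawn, validate, max_retries=2):
    """Receive a goal, decompose it, spawn specialists, validate each
    output, and retry failures inside the domain."""
    results = {}
    for sub in decompose(goal):              # step 2: domain-specific split
        while True:
            output = spawn(sub)              # step 3: run the specialist
            if validate(sub, output):        # step 4: check before passing up
                results[sub.specialist] = output
                break
            sub.attempts += 1                # step 5: retry locally...
            if sub.attempts > max_retries:   # ...but don't loop forever
                raise RuntimeError(f"{sub.specialist} kept failing: {sub.goal}")
    return results                           # only validated work goes upstream
```

The important property is that retries never leave the loop: a failed specialist is the orchestrator's problem, not Alfrawd's.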
The key insight: orchestrators own their domain. Alfrawd doesn't tell Thom which sub-agent to use for a frontend task. Thom knows. Alfrawd tells Thom "build the dashboard" and Thom decides that means thom-frontend for the UI, thom-backend for the API, and thom-devops for deployment.
This is what makes the multi-agent architecture scale. Each layer handles a different cognitive task. Layer 1 never needs to understand React component patterns. Layer 2 never needs to connect a regulatory deadline to a launch timeline.
How Sub-Agent Spawning Actually Works
When an orchestrator receives a task, it doesn't do the work itself. It spawns a specialist. Here's the exact flow: 7 steps from request to completion.
1. Vivi tells Thom: "Build the AlphaNet dashboard"
2. Thom decomposes:
   - Frontend: React dashboard with charts → thom-frontend
   - Backend: Django API for portfolio data → thom-backend
   - Deploy: Vercel + Render setup → thom-devops
3. Thom spawns thom-frontend with task + context
4. thom-frontend executes in an isolated session
5. thom-frontend self-tests (build passes, lint clean, renders)
6. thom-frontend signals complete → Thom validates output
7. Thom spawns next sub-agent or signals Vivi: done
Step 2 is where the architecture earns its keep. A flat orchestrator would send "build the AlphaNet dashboard" as one blob to one agent. Thom decomposes it into three parallel workstreams with specific technology constraints. That decomposition requires domain knowledge that only an engineering orchestrator has.
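To make step 2 concrete, here's a hypothetical version of that decomposition as code. The structure and names are assumptions mirroring the example above, not the real system.

```python
def decompose(goal: str) -> list[dict]:
    """Step 2: one goal in, three parallel workstreams out."""
    if goal != "Build the AlphaNet dashboard":
        raise NotImplementedError("sketch only covers the example goal")
    return [
        {"agent": "thom-frontend", "task": "React dashboard with charts"},
        {"agent": "thom-backend",  "task": "Django API for portfolio data"},
        {"agent": "thom-devops",   "task": "Vercel + Render setup"},
    ]
```

A flat orchestrator would return a single entry here; the value of the domain orchestrator is that each workstream arrives with its technology constraints already attached.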
What Each Sub-Agent Receives
When spawned, a sub-agent gets:
- Its own SOUL.md: personality, capabilities, rules
- REGRESSIONS.md: mistakes it must never repeat
- CONTEXT_HOLDS.md: active constraints with expiry dates
- Task context: what to build, acceptance criteria
- TASK_REGISTRY.md protocol: how to track its own work
What it does NOT get:
- Other agents' credentials
- Access to other workspaces
- The full conversation history
- Approval authority
This is deliberate isolation. thom-frontend doesn't need to know what warren-legal is working on. It doesn't need Hari's API keys. It gets exactly the context it needs to do its job and nothing more.
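A minimal sketch of how that isolation might be enforced at spawn time, assuming the file names from the article and an illustrative workspace layout:

```python
from pathlib import Path

# Files a sub-agent receives, per the lists above. Directory layout is assumed.
ALLOWED_FILES = ["SOUL.md", "REGRESSIONS.md", "CONTEXT_HOLDS.md"]

def build_spawn_context(agent_dir: Path, task: dict) -> dict:
    """Assemble exactly what a sub-agent gets, and nothing more."""
    context = {"task": task}  # goal + acceptance criteria
    for name in ALLOWED_FILES:
        path = agent_dir / name
        context[name] = path.read_text() if path.exists() else ""
    # Deliberately absent: other agents' credentials, other workspaces,
    # the full conversation history, approval authority.
    return context
```

The isolation falls out of construction: anything not explicitly copied into the payload simply doesn't exist for the sub-agent.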
Depth-3 Nesting: Agents Spawning Agents Spawning Agents
Sub-agents can spawn their own children. The hierarchy supports 3 levels deep:
```
Alfrawd → Thom → thom-frontend → (focused sub-task)
  L1       L2         L3             L3 child
```
This means Thom can ask thom-frontend to spawn a focused task ("just build this one chart component") without Alfrawd or me being involved. The orchestrator manages its own domain end-to-end, including delegation within delegation.
The nesting is bounded at 3 levels. Deeper than that and you lose observability. The task registry tracks all active work regardless of depth, so I can always see what's running.
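Enforcing that bound is a one-line check at spawn time. This is a hypothetical sketch, not the system's actual registry code:

```python
MAX_DEPTH = 3  # deeper than this and you lose observability

def spawn_child(registry: list, parent: str, depth: int, task_id: str) -> dict:
    """Register a spawned task, refusing anything past the depth bound."""
    if depth > MAX_DEPTH:
        raise ValueError(f"depth {depth} exceeds the bound of {MAX_DEPTH}")
    entry = {"id": task_id, "parent": parent, "depth": depth}
    registry.append(entry)  # the registry tracks all work regardless of depth
    return entry
```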
File-Based Communication: The Glue
Agents don't send each other messages. They communicate through files. The task registry (tasks.json) is the shared state.
When thom-frontend completes a task, it updates the registry. When Nico starts a code review, it reads the registry to find what's ready for review. When Vivi checks project status, it reads the same registry.
```json
{
  "tasks": [
    {
      "id": "alphanet-frontend-dashboard",
      "assignee": "thom-frontend",
      "status": "complete",
      "parent": "thom",
      "output": "./build/dashboard",
      "selfTest": "passed",
      "completedAt": "2025-03-10T14:32:00Z"
    }
  ]
}
```
This is deliberate. File-based communication is:
- Observable: I can read the registry at any time and see exactly what's happening
- Persistent: it survives session restarts, which happen constantly with AI agents
- Debuggable: when something goes wrong, the state is in a file, not lost in a message queue
Information flows up through synthesis (many signals compressed into one briefing) and down through decomposition (one goal expanded into many tasks).
Not event streams. Not message queues. Not API calls between agents. Files. Simple, readable, persistent files. The most boring technology choice in the whole system, and the one that saved us the most debugging time.
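Here's a minimal sketch of how an agent might update tasks.json safely, using the field names from the example above. The write-to-temp-then-rename pattern is a common way to keep the file readable even if a writer dies mid-update; the helper itself is illustrative.

```python
import json
import os
import tempfile
from pathlib import Path

def update_task(registry_path: Path, task_id: str, **fields) -> None:
    """Upsert one task in tasks.json via an atomic rename."""
    if registry_path.exists():
        data = json.loads(registry_path.read_text())
    else:
        data = {"tasks": []}
    for task in data["tasks"]:
        if task["id"] == task_id:
            task.update(fields)       # e.g. status="complete"
            break
    else:
        data["tasks"].append({"id": task_id, **fields})
    # Write to a temp file in the same directory, then rename over the
    # original: readers never observe a half-written registry.
    fd, tmp = tempfile.mkstemp(dir=registry_path.parent)
    with os.fdopen(fd, "w") as f:
        json.dump(data, f, indent=2)
    os.replace(tmp, registry_path)
```

Because readers only ever see a complete file, any agent (or I) can open the registry at any moment and get a consistent snapshot of what's running.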
What This Architecture Changed
Before the two-layer split, Alfrawd was drowning, spending 70% of its tokens on coordination and only 30% on actual thinking. After the split:
- Alfrawd handles strategic routing and cross-domain synthesis. That's it.
- Domain orchestrators handle task decomposition with full domain context. No dilution.
- Specialists execute in isolation with exactly the context they need. No leakage.
- File-based state means nothing gets lost between sessions. No amnesia.
- Depth-3 nesting means orchestrators can delegate without escalating. No bottlenecks.
The cognitive load that was crushing one agent got distributed across 7 orchestrators, each holding deep context in one domain instead of shallow context in all domains.
Same 31 agents. Same capabilities. Different wiring. Everything changed.
The architecture handled coordination. But coordination doesn't guarantee quality. 31 agents producing work at speed can produce garbage at speed.
Part 3 covers how I keep quality high: human gates, trust scoring, and cascading validation that catches context degradation before it propagates.