You've built one agent. It works. Now you're hitting the wall: a single agent runs out of context, loses focus on complex multi-step work, and can't handle concurrent tasks. The answer isn't a bigger model—it's multiple agents working together.
Multi-agent systems sound elegant in theory. In practice, they're a coordination nightmare. Who runs which task? What happens when agent A's output breaks agent B's assumptions? How do you debug a workflow where five agents are running in parallel and one silently fails? This is where orchestration frameworks become essential.
The Coordination Problem: Why One Agent Isn't Enough
Single agents hit a hard ceiling. You give your agent a task like "automate our entire billing workflow," and here's what breaks:
A single agent's context window fills fast. You add memory management code, tool descriptions, safety guardrails, and your SOUL.md—suddenly you've lost 40% of context to scaffolding, leaving 60% for actual work. Add 50 invoices, 10 templates, and error handling rules, and the agent is effectively blind.
The focus problem is real. An agent that does "generate invoices, send emails, track payments, and handle disputes" is actually doing four jobs poorly. Each task context-pollutes the others. You get worse outputs because the agent is context-thrashing.
Parallel work is impossible. If you need to process 100 invoices and send 50 emails simultaneously, a single agent queues them serially. Your workflow that should take 5 minutes takes an hour.
This is where multi-agent systems shine: split the work into specialized agents, each focused on one job, running in parallel under a coordinator that delegates and handles failures.
What Multi-Agent Orchestration Actually Means
When people say "multi-agent systems," they mean different things. Let's define the real architecture:
Orchestrator agent: The decision-maker. Receives a high-level task ("process this batch of invoices"), decides which agents to call, in what order, and how to handle their outputs.
Specialized agents: Worker agents. Each does one thing well: "generate an invoice from raw data," "send an email," "log a payment." They don't decide; they execute.
State management: The critical piece everyone forgets. When agent A finishes generating an invoice and agent B needs to send it, how does B get the data? Where does it live? Who cleans it up if B fails?
Error handling: What happens when agent B can't send the email? Does the orchestrator retry? Skip to agent C? Roll back agent A's work?
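These four pieces (orchestrator, workers, shared state, error handling) fit in a plain-Python sketch before you reach for any framework. Everything here is illustrative: the `WorkflowState` class, the worker functions, and the stop-on-first-failure policy are assumptions, not any library's API.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class WorkflowState:
    """Shared state passed between agents; the orchestrator owns it."""
    customer_id: int
    invoice: Optional[str] = None
    errors: List[str] = field(default_factory=list)

def generate_invoice(state: WorkflowState) -> WorkflowState:
    # Stand-in for the real invoice-generation agent call
    state.invoice = f"INVOICE-{state.customer_id}"
    return state

def send_email(state: WorkflowState) -> WorkflowState:
    if state.invoice is None:
        # Explicit failure handling: record why the step was skipped
        state.errors.append("send_email skipped: no invoice")
    return state

def orchestrate(customer_id: int) -> WorkflowState:
    state = WorkflowState(customer_id)
    for step in (generate_invoice, send_email):  # orchestrator decides the order
        state = step(state)
        if state.errors:
            break  # simplest policy: stop on first failure
    return state

result = orchestrate(12345)
```

The point of the sketch is the shape, not the code: one owner of state, workers that only read and write that state, and an explicit answer to "what happens when a step fails."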
The two frameworks that solve this best are CrewAI (human-friendly) and LangGraph (more control).
CrewAI: When You Want Agents to Feel Like a Team
CrewAI is purpose-built for multi-agent workflows. You define roles, tools, and a hierarchy. CrewAI handles coordination under the hood.
```python
from crewai import Agent, Task, Crew

# Specialized agents
invoice_generator = Agent(
    role="Invoice Generator",
    goal="Generate accurate invoices from raw transaction data",
    backstory="Expert billing specialist with 10 years of experience",
    tools=[template_tool, calculation_tool],
    verbose=True,
)

email_sender = Agent(
    role="Email Sender",
    goal="Send invoices to customers with professional templates",
    backstory="Customer communication expert",
    tools=[email_tool, template_tool],
    verbose=True,
)

# Tasks define the workflow
generate_invoice = Task(
    description="Generate invoice for customer {customer_id}",
    expected_output="A finalized invoice document",
    agent=invoice_generator,
    output_file="invoice_{customer_id}.pdf",
)

send_invoice = Task(
    description="Send the generated invoice to the customer",
    expected_output="Delivery confirmation",
    agent=email_sender,
    context=[generate_invoice],  # Sequential: email waits for the invoice
)

# Orchestrator
crew = Crew(
    agents=[invoice_generator, email_sender],
    tasks=[generate_invoice, send_invoice],
    verbose=True,
)

result = crew.kickoff(inputs={"customer_id": 12345})
```
What CrewAI handles for you:
- Agent personality is real. The invoice agent knows it's responsible for accuracy; the email agent knows it's responsible for tone. They don't cross roles.
- State passing is automatic. The output of `generate_invoice` feeds into `send_invoice` through the declared task dependency.
- Error recovery is built in. If sending fails, CrewAI retries with backoff before escalating.
- Reasoning is visible. Set `verbose=True` and watch each agent explain its decisions.
The downside: CrewAI is opinionated. You get less control over the exact orchestration logic. If your workflow needs conditional branching ("if invoice total > $50,000, route to approval agent first"), you're fighting the framework.
LangGraph: When You Need Surgical Control
LangGraph is LangChain's state-machine orchestrator. Instead of roles and crews, you define nodes (agents or functions) and edges (transitions).
```python
from typing import Optional, TypedDict

from langgraph.graph import StateGraph

# Define state (the shared data structure)
class InvoiceState(TypedDict):
    customer_id: int
    invoice_data: dict
    generated_invoice: Optional[str]
    email_status: str
    error: Optional[str]

# Create the graph
graph = StateGraph(InvoiceState)

# Define nodes (agents or functions)
def generate_invoice_node(state):
    try:
        invoice = invoice_generator.invoke(state["customer_id"])
        state["generated_invoice"] = invoice
    except Exception as e:
        state["error"] = str(e)
    return state

def send_email_node(state):
    if state.get("error"):  # .get(): "error" may not be set yet
        state["email_status"] = "skipped: generation failed"
        return state
    try:
        send_mail(state["generated_invoice"])
        state["email_status"] = "sent"
    except Exception as e:
        state["error"] = str(e)
        state["email_status"] = "failed"
    return state

# Add nodes
graph.add_node("generate", generate_invoice_node)
graph.add_node("send", send_email_node)

# Define edges (transitions)
graph.add_edge("generate", "send")
graph.set_entry_point("generate")
graph.set_finish_point("send")

# Compile and run
workflow = graph.compile()
result = workflow.invoke({"customer_id": 12345})
```
LangGraph gives you:
- Conditional logic. Branch workflows based on state: `if state["invoice_total"] > 50000: route_to_approval()`.
- Parallel agents. Two agents work on independent subtasks simultaneously, then sync.
- Explicit error handling. You write the error logic; LangGraph doesn't hide it.
- State visibility. Everything that passes between agents is in one dict. Debug it.
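Conditional routing boils down to a router function that inspects shared state and names the next node. In LangGraph you register it with `graph.add_conditional_edges`; the routing logic itself is plain Python. The `50000` threshold and the node names here are illustrative assumptions.

```python
def route_by_total(state: dict) -> str:
    """Return the name of the next node based on shared state."""
    return "approval" if state["invoice_total"] > 50000 else "send"

# In a LangGraph graph this router would be wired in with:
#   graph.add_conditional_edges(
#       "generate", route_by_total,
#       {"approval": "approval", "send": "send"},
#   )

high_value = route_by_total({"invoice_total": 75000})  # routes to "approval"
low_value = route_by_total({"invoice_total": 1200})    # routes to "send"
```

Because the router is just a function of state, it is trivially unit-testable on its own, which is exactly the control CrewAI hides from you.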
The cost: more boilerplate. You're writing the orchestration logic explicitly instead of declaring it like in CrewAI.
The Key Difference: Declarative vs Imperative
CrewAI is declarative. You say "here are my agents, here are my tasks, here are the dependencies." CrewAI figures out the execution order and handles coordination.
LangGraph is imperative. You say "do this, then check the state, then maybe do that." You're in control; you're also responsible for getting it right.
For simple, linear workflows (generate invoice → send email → log transaction), CrewAI wins. For complex conditional logic, parallel execution, or tightly controlled error handling, LangGraph wins.
Common Mistakes in Multi-Agent Systems
- Not scoping agent roles tightly enough. An agent called "General Worker" doing 5 different tasks is just a single agent with extra complexity. Define one job per agent: "Generate invoices," not "Handle billing."
- Forgetting state management. You don't need a database. A dict (LangGraph) or task output files (CrewAI) work fine. But you MUST have a consistent way to pass data between agents.
- No timeout guards. A single hanging agent can freeze the entire orchestration. Set `timeout=30s` on every agent call.
- Silent failures. If agent B fails, don't just skip it. Log it, alert on it, escalate it. Add explicit error handlers.
- Testing each agent in isolation, not the workflow. Unit test individual agents. Integration-test the full orchestration flow with real data.
Security Guardrails for Orchestrated Agents
- Limit inter-agent permissions. Agent A (invoice generator) doesn't need email access. Agent B (email sender) doesn't need database write access. Build the minimum tool set per agent.
- Add approval gates for critical actions. Before sending an email or deleting a record, route to human approval. Use HITL (human-in-the-loop) checks in your orchestrator.
- Audit all inter-agent communication. Log every state transition. If agent A produces output that agent B consumes, log what moved between them. You'll need this for compliance and debugging.
- Isolate agent execution contexts. Run agents in separate processes or containers when possible. One agent crashing shouldn't crash the orchestrator.
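Auditing inter-agent communication can start as a thin wrapper around each handoff. This sketch logs exactly which state keys the upstream agent changed before the downstream agent consumes them; the function name and log format are illustrative, not a standard.

```python
import json
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("orchestrator.audit")

def audit_handoff(from_agent: str, to_agent: str, before: dict, after: dict) -> dict:
    """Log which keys one agent changed before the next agent consumes the state."""
    changed = {k: after[k] for k in after if before.get(k) != after.get(k)}
    log.info(
        "handoff %s -> %s changed=%s",
        from_agent, to_agent, json.dumps(changed, default=str),
    )
    return changed

delta = audit_handoff(
    "invoice_agent", "email_agent",
    {"customer_id": 12345, "invoice": None},
    {"customer_id": 12345, "invoice": "INVOICE-12345"},
)
```

Logging the diff rather than the full state keeps the audit trail readable and avoids duplicating large payloads on every hop.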
Real-World Pattern: The Approval Workflow
Here's a concrete multi-agent pattern you'll see often: work agent → review agent → approval gate → action agent.
Agent 1: Researcher
Goal: Find anomalies in transaction data
Tools: Database query, data analysis functions
Output: List of suspicious transactions with reasons
Agent 2: Reviewer
Goal: Validate researcher findings, explain them
Tools: Same database, plus human context (customer history, fraud patterns)
Output: "Approved," "Rejected," or "Needs escalation" + reason
Gate: Human Approval (if escalated)
Route to human: "Agent 2 found potential fraud but flagged for escalation. Approve or reject?"
Agent 3: Executor
Goal: Take action on approved cases (freeze account, notify customer, etc.)
Tools: Account system, email, alert system
Output: Confirmation of actions taken
This pattern eliminates autonomous errors at scale. The researcher is thorough but might flag false positives. The reviewer catches mistakes. The human catches edge cases. The executor is narrow and safe.
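The gate logic at the heart of this pattern is small. A minimal sketch, where the verdict strings mirror the reviewer outputs above and `ask_human` stands in for whatever HITL channel you actually use (Slack prompt, ticket queue, CLI):

```python
def route_case(reviewer_verdict: str, ask_human) -> str:
    """Decide what the executor is allowed to do with one case."""
    if reviewer_verdict == "Approved":
        return "execute"
    if reviewer_verdict == "Rejected":
        return "drop"
    # "Needs escalation": a human makes the final call
    approved = ask_human("Agent 2 flagged potential fraud. Approve action?")
    return "execute" if approved else "drop"

# Example: escalated case where the human declines
decision = route_case("Needs escalation", ask_human=lambda prompt: False)
```

The executor never sees a case that didn't pass this function, which is what makes it safe to give the executor real, destructive tools.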
Scaling Beyond Two Agents
Once you have 3+ agents, a few patterns emerge:
Fan-out pattern: One orchestrator spawns 10 independent agents. Example: process 10 CSV files in parallel, each file gets one agent.
```
Orchestrator → Agent 1 (file_1.csv)
             → Agent 2 (file_2.csv)
             → ... Agent 10 (file_10.csv)
Synchronize results → Aggregator agent
```
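The fan-out step maps cleanly onto a thread pool: submit every file to its own worker, then the `pool.map` call itself is the synchronization point. Here `process_file` is a stand-in for one worker agent; the row counts are fabricated for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def process_file(path: str) -> dict:
    """Stand-in for one worker agent handling a single file."""
    return {"file": path, "rows": 100}  # pretend each file has 100 rows

files = [f"file_{i}.csv" for i in range(1, 11)]

with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(process_file, files))  # fan out, then synchronize

total_rows = sum(r["rows"] for r in results)       # aggregator step
```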
Pipeline pattern: Agents run in strict sequence. Output of Agent N feeds Agent N+1. Example: raw data → cleaner → validator → transformer → loader.
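The pipeline pattern is function composition: each stage takes the previous stage's output. A toy sketch with string rows standing in for real agents (stage names match the example above; the data is invented):

```python
def clean(rows):      # strip whitespace
    return [row.strip() for row in rows]

def validate(rows):   # drop empty rows
    return [row for row in rows if row]

def transform(rows):  # normalize casing
    return [row.upper() for row in rows]

def run_pipeline(data, stages):
    """Output of stage N feeds stage N+1, in strict sequence."""
    for stage in stages:
        data = stage(data)
    return data

loaded = run_pipeline([" a ", "", "b"], [clean, validate, transform])
```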
DAG (Directed Acyclic Graph) pattern: Complex dependencies. Agent C waits for output from both Agent A and Agent B. LangGraph is built for this.
You don't need a complex framework for fan-out or pipeline patterns. A bash loop and file passing work fine. You DO need LangGraph or CrewAI once your DAG has 4+ nodes or conditional logic.
Building Multi-Agent Systems for OpenClaw
If you're running this on OpenClaw (your self-hosted server), the orchestration logic lives in AGENTS.md and HEARTBEAT.md. Here's how:
AGENTS.md tells the orchestrator agent how to delegate:
```yaml
tools:
  - name: delegator
    description: "Call specialized agents"
    commands:
      - delegate_invoice_generation
      - delegate_email_sending
      - delegate_approval_check
memory:
  - type: multi_agent_state
    path: memory/orchestration_state.json
    shared_with: [invoice_agent, email_agent]
```
HEARTBEAT.md runs periodic coordination checks:
```yaml
heartbeat:
  - task: "Check orchestration queue"
    interval: 5m
    action: "If workflow stuck for > 10m, escalate to human"
  - task: "Sync state between agents"
    interval: 30s
    action: "Write to shared memory/orchestration_state.json"
```
The key insight: multi-agent orchestration on OpenClaw isn't a single-agent problem. You need:
- An orchestrator agent (master AGENTS.md)
- Specialized sub-agents (separate workspaces)
- Shared state (memory/ directory)
- Heartbeat tasks (coordination checks)
When NOT to Use Multi-Agent Systems
Be honest: do you need this?
- Single-threaded workflows (one step at a time, no parallelism): One agent is fine. Don't over-engineer.
- Small context windows (all your data fits in 8K tokens): Splitting into agents adds overhead. Stay monolithic.
- Novel, unpredictable work: Agents are best at repeatable, well-defined tasks. If you're constantly building new workflows, you're prototyping, not deploying.
- Sub-second latency requirements: Orchestration adds milliseconds. Network calls between agents add more.
Multi-agent systems shine when you have clear task boundaries, parallel work, high volume, and need for specialization.
The Bottom Line
Multi-agent coordination frameworks exist on a spectrum:
- No framework (bash + file passing): Works for fan-out, painful for DAGs
- CrewAI: Great for declarative workflows, role-based work
- LangGraph: Best for complex conditional logic and state management
- OpenClaw native (AGENTS.md + HEARTBEAT.md + shared memory): Best for self-hosted, file-based orchestration
Start simple. One agent. Once you hit the context/focus ceiling, add a second agent with clear task separation. Use CrewAI if you want ease-of-use. Use LangGraph if you need precision. Use OpenClaw patterns if you're building for production self-hosted systems.
The real skill isn't picking a framework. It's knowing when your single agent has become three jobs and needs to become three agents.
Build Your Multi-Agent Orchestration Now
Multi-agent workflows are powerful—but only if your base agent config is secure and scoped. Our wizard generates security-hardened AGENTS.md bundles pre-wired for delegation patterns, so your orchestrator agent starts life with proper guardrails built in.