
Orchestrating Multi-Agent AI Systems: When Should You Expand to Using Multiple Agents?

We recently broke down how to build AI agents using plan-and-execute loops. These artificial intelligence agents use the reasoning capabilities of large language models (LLMs) to autonomously make decisions on how to achieve their given task, from ordering pizza to retrieving account information.

But a single AI agent has a ceiling on the complexity of tasks it can reliably automate on its own. Beyond that point, businesses are better served by multiple intelligent agents cooperating. These multi-agent AI systems allow us to solve problems of increasing complexity by delegating specialized tasks to specific agents.

Knowing when to expand from a single AI agent to multiple agents can be difficult, but eventually it becomes necessary. As we add complexity to a single agent, its ever-growing toolbox becomes more of a burden. A single agent also becomes harder to debug, and the chance of it making mistakes rises.

But if we expand our workflows to include multiple specialized agents, we gain performance, clarity, and adaptability in pursuit of our goal. Let’s step through a use case where we’ll highlight:

  • Why single agents start to fail
  • When to move to a multi-agent approach with an example
  • What benefits and challenges come with multi-agent AI systems

We’ll start with what makes most single AI agents start to fail.

When Do Single AI Agents Start to Fail?

A single-agent approach can make sense at first (i.e., one AI agent that can do everything from navigating a browser to handling file operations). Over time, though, as tasks become more complex and the number of tools grows, the single-agent approach starts to strain.

We’ll notice the strain when the agent starts to misbehave, which can result from:

  • Too many tools: The agent gets confused about which tools to use and when.
  • Too much context: The agent’s ever-growing context window fills with tool definitions and history, drowning out what matters for the current step.
  • Too many mistakes: The agent produces suboptimal or incorrect results because its responsibilities are too broad.

When we start automating multiple distinct subtasks, such as data extraction or report generation, it might be time to separate responsibilities. By using multiple AI agents, each focused on its own domain and toolkit, we improve the clarity and quality of our solution. Not only does each agent become more effective, but the agents themselves also become easier to develop.

Example Use Case From Finance

Let’s walk through a common example in finance. Suppose we have an agentic workflow and data pipeline that takes in a user query, interacts with a database, and does analysis to generate a report.

1. Single-agent scenario

With a single agent, that one agent is responsible for:

  • Taking the user query to create a plan and execute it
  • Querying and reading the database with financial data
  • Loading the data into a pandas DataFrame for analysis
  • Generating charts with matplotlib
  • Creating a final report summarizing the insights

Our do-it-all single agent manages file I/O, data analysis, visualization, and reporting. Each of those tasks is complex enough to warrant its own specialized agent.

A single AI agent responsible for a variety of tasks instead of being specialized in one

That’s a lot to track and will inevitably lead to more complex prompts and a higher chance of error, especially when adding more features and use cases.

2. Multi-agent scenario

If we instead use multiple agents, we can break our workflow down into manageable agents, each targeting specific tasks and responsibilities. There are a few approaches to a multi-agent setup. We could choose an orchestrator setup, where one agent decides which other agents to call. Alternatively, we could use a decentralized setup, where each agent hands off to whichever agent is best suited for the next step toward the goal (sketched below).
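For contrast, here is a minimal sketch of that second, decentralized style, where each agent does its work and then names the agent that should run next. The agents dictionary and run_step method here are hypothetical, not part of the demo code later in this post.

# Hypothetical hand-off loop: each agent returns the next agent's name (or None when done).
agents = {
    "database": database_agent,
    "analysis": analysis_agent,
    "graph": graph_agent,
    "report": report_agent,
}

current = "database"           # entry point for this workflow
state = {"query": user_query}  # shared state passed from agent to agent

while current is not None:
    current, state = agents[current].run_step(state)  # each agent decides who goes next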

We’ll take a look at the orchestrator method, which starts with our Orchestrator Agent:

  • Orchestrator Agent: Decides which agent to call and in what order.
  • Database Agent: Queries data from a database.
  • Analysis Agent: Analyzes the returned data.
  • Graph Agent: Generates visualizations based on analysis.
  • Report Agent: Creates a report document to share the insights from analysis.

The Orchestrator Agent works similarly to how a single agent chooses tools, except now it chooses which specialized agent to call, as shown in the graphic below.

Diagram of an orchestrator setup where an Orchestrator Agent assigns tasks to specialized agents, each with their own tools and memory


The separation of responsibilities between these agents leads to clearer logic and a more scalable architecture, since each agent’s scope and tools stay cleanly separated.

Building Multi-Agent AI Systems in Python

To set up the orchestrator framework we explored above, we’ll need to specify several things in our code.

Core multi-agent components

We can expand our core components of agents to now include an orchestrator:

  • Orchestrator Agent: The agent responsible for choosing which agents to use to complete a goal.
  • Agents: Autonomous AI with access to memory and tools of a specified domain to complete a goal.
  • Memory: Each agent can maintain its own memory or state.
  • Tools: Functions that the agent has access to so it can complete its task.

For our simplified demo code, let’s consider an agentic workflow for data analytics that can fetch data, analyze it, generate a plot, and output a report.

Agentic flow

A user will provide a prompt such as, “Generate a quarterly report from the database for Q4 2024.”

The Orchestrator Agent will then decide how best to accomplish this task using the other agents. The plan might look like this:

  • Database Agent: query_database tool to pull the Q4 2024 data.
  • Analysis Agent: load_into_dataframe and analyze_data tools to load and derive insights.
  • Graph Agent: generate_plot tool to create visualizations from the insights.
  • Report Agent: create_report to combine insights and graphs into a final document.

Now let’s pull all of this together in our code.

Example code

Below is a brief example to illustrate multiple agents working together. The structure and patterns mirror our previous post on building single AI agents, but now with multiple agents and an orchestrator.

We’ll start with our imports and Base Agent class.


import os
import json
from dotenv import load_dotenv
import pandas as pd
import matplotlib.pyplot as plt

from openai import OpenAI
load_dotenv()

api_key = os.getenv("OPENAI_API_KEY")
client = OpenAI(api_key=api_key)


# Base Agent
############
class BaseAgent:
    def __init__(self, name):
        self.name = name
        self.memory_tasks = []
        self.memory_responses = []
        self.specialization = ""

    def openai_call(self, query: str, system_prompt: str, json_format: bool = False):
        format_response = {"type": "json_object"} if json_format else {"type": "text"}
        completion = client.chat.completions.create(
            model="gpt-4o",
            temperature=0,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": f"User query: {query}"}
            ],
            response_format=format_response
        )
        print("Agent:", self.name)
        print(completion.choices[0].message.content)
        print("****\n")
        return completion.choices[0].message.content

Above, we create a Base Agent class that all of the other agent classes will extend. Its openai_call method prints which agent is running along with its output.
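Note that memory_tasks and memory_responses aren’t exercised in this minimal demo. As a sketch of how they could be used (an assumption on our part, not part of the demo), each agent could record its exchanges inside openai_call and fold recent history back into later prompts:

# Sketch: recording each exchange in the agent's memory (not in the demo code).
# Inside BaseAgent.openai_call, just before the return, we could add:
#     self.memory_tasks.append(query)
#     self.memory_responses.append(completion.choices[0].message.content)

def build_history(agent, last_n=3):
    """Return the agent's last few task/response pairs as plain text for a prompt."""
    pairs = zip(agent.memory_tasks[-last_n:], agent.memory_responses[-last_n:])
    return "\n".join(f"Task: {task}\nResponse: {response}" for task, response in pairs)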

Now we’ll build our Specialized Agents.

# Specialized Agents
####################

class DatabaseAgent(BaseAgent):
    def __init__(self):
        super().__init__("DatabaseAgent")
        self.specialization = "Database queries for quarterly data."
        self.data = []

        # System prompt to guide database queries
        self.system_prompt = """
        % Role:
        You are DatabaseAgent, specialized in fetching quarterly data from a database.

        % Instructions:
        The user will ask for specific quarterly data by providing quarter and year.
        Create a sql query to fetch the data. Return as a json object with a "query" key.
        The database is a postgres database with values up to the end of 2024.

        The columns are: month, revenue_data, date

        Example:
        "Get the revenue data for Q2 2023"
        {
          "query": "SELECT * FROM revenue WHERE date >= '2023-04-01' AND date <= '2023-06-30';"
        }
        """

    def query_database(self, quarter: str, year: str):
        # Using OpenAI to generate some dummy data as if it fetched from DB
        query = f"Fetch revenue data for {quarter} {year}"
        response = self.openai_call(query, self.system_prompt, json_format=True)

        data = json.loads(response)

        # Hardcoded data for Q4 2024 for example
        if data.get("query") == "SELECT * FROM revenue WHERE date >= '2024-10-01' AND date <= '2024-12-31';":
            self.data = [
                {"month": "Oct", "revenue": 10000},
                {"month": "Nov", "revenue": 15000},
                {"month": "Dec", "revenue": 18000}
            ]
        else:
            # raise an error that the query is not valid
            raise ValueError("The query is not valid for the Q4 2024 example. Please try again.")

        return self.data


class AnalysisAgent(BaseAgent):
    def __init__(self):
        super().__init__("AnalysisAgent")
        self.specialization = "Loading data into pandas DataFrame and generating insights."
        self.df = None
        self.insights = {}

        # System prompt to help generate insights
        self.system_prompt = """
        % Role:
        You are AnalysisAgent, specialized in analyzing data.

        % Instructions:
        Given a list of dictionaries with 'month' and 'revenue' keys, describe the total and average revenue.
        Return a JSON object with 'total_revenue' and 'average_revenue'.
        """

    def load_into_dataframe(self, data):
        self.df = pd.DataFrame(data)
        print("Data:", self.df)
        print("\n****\n")
        return self.df

    def analyze_data(self):
        if self.df is not None:
            query = f"Data: {self.df.to_dict('records')}"
            response = self.openai_call(query, self.system_prompt, json_format=True)
            try:
                insights = json.loads(response)
                self.insights = insights
            except json.JSONDecodeError:
                # fallback if LLM doesn't return expected JSON
                total_revenue = self.df["revenue"].sum()
                avg_revenue = self.df["revenue"].mean()
                self.insights = {
                    "total_revenue": total_revenue,
                    "average_revenue": avg_revenue
                }
        return self.insights


class GraphAgent(BaseAgent):
    def __init__(self):
        super().__init__("GraphAgent")
        self.specialization = "Creating charts and plots from provided data."
        self.graph_image = None

        # System prompt to decide how to plot the data
        self.system_prompt = """
        % Role:
        You are GraphAgent, specialized in creating plots.

        % Instructions:
        The user will provide data in a DataFrame-like structure.
        Suggest a plot type and confirm to use a bar chart of revenue by month.
        Always respond with {"plot_type": "bar"} in JSON.
        """

    def generate_plot(self, df):
        query = f"Given data: {df.to_dict('records')} - Suggest a plot type."
        response = self.openai_call(query, self.system_prompt, json_format=True)

        # We won't rely heavily on LLM output for plotting type; we use a bar chart as intended
        # But this shows the LLM engagement in deciding the plot type
        try:
            plot_info = json.loads(response)
            if plot_info.get("plot_type") == "bar":
                plt.figure()
                df.plot(x="month", y="revenue", kind="bar")
                plt.savefig('graph_image.png', format='png')
                with open('graph_image.png', 'rb') as f:
                    self.graph_image = f.read()
        except json.JSONDecodeError:
            # fallback if LLM doesn't return expected JSON
            plt.figure()
            df.plot(x="month", y="revenue", kind="bar")
            plt.savefig('graph_image.png', format='png')
            with open('graph_image.png', 'rb') as f:
                self.graph_image = f.read()

        return self.graph_image


class ReportAgent(BaseAgent):
    def __init__(self):
        super().__init__("ReportAgent")
        self.specialization = "Compiling insights and images into a formatted report."
        
        # System prompt to create a summary from insights
        self.system_prompt = """
        % Role:
        You are ReportAgent, specialized in summarizing insights.

        % Instructions:
        Given total_revenue, average_revenue, and a note that a chart is generated,
        produce a textual summary in JSON:
        {
          "report_text": "..."
        }
        """

    def create_report(self, insights, graph_image):
        query = (
            f"Insights: total_revenue={insights.get('total_revenue')}, "
            f"average_revenue={insights.get('average_revenue')}, chart_generated=True"
        )
        response = self.openai_call(query, self.system_prompt, json_format=True)
        try:
            report_data = json.loads(response)
            report_text = report_data.get("report_text", "")
        except json.JSONDecodeError:
            # fallback
            report_text = (
                f"Quarterly Report:\n"
                f"Total Revenue: {insights['total_revenue']}\n"
                f"Average Revenue: {insights['average_revenue']}\n"
                "A chart has been generated and embedded.\n"
            )
        return report_text

The database, analysis, graph, and report agents each target their respective tasks. With each specialized agent focused on a specific outcome, we reduce the likelihood of errors as we add new features and update our agents.
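A practical payoff of this separation is that each agent can be exercised in isolation. For example, a quick sanity check of the AnalysisAgent using the classes above (load_into_dataframe makes no LLM call, so it’s cheap to run):

# Quick isolated check of one specialized agent (no orchestrator required).
sample_data = [
    {"month": "Oct", "revenue": 10000},
    {"month": "Nov", "revenue": 15000},
    {"month": "Dec", "revenue": 18000},
]

analysis_agent = AnalysisAgent()
df = analysis_agent.load_into_dataframe(sample_data)
assert df["revenue"].sum() == 43000  # matches the totals we expect in the report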

From here, we’ll introduce the Orchestrator Agent.


# Orchestrator Agent
####################
class OrchestratorAgent(BaseAgent):
    def __init__(self, db_agent, analysis_agent, graph_agent, report_agent):
        super().__init__("OrchestratorAgent")
        self.specialization = "Planning and delegating tasks across multiple specialized agents."
        self.db_agent = db_agent
        self.analysis_agent = analysis_agent
        self.graph_agent = graph_agent
        self.report_agent = report_agent

        # The system prompt to generate a plan and steps.
        self.planning_system_prompt = f"""
        % Role:
        You are an orchestrator agent that receives a user request and must plan steps to achieve it.
        
        % Task:
        The user may ask for a report on quarterly data from a database.
        Your job: 
        1. Decide the sequence of steps (tools = other agents) needed.
        2. Return a plan as a JSON list of steps where each step includes:
           - "agent": one of DatabaseAgent, AnalysisAgent, GraphAgent, ReportAgent
           - "action": the function to call on that agent
           - "args": arguments for that function if any.
        
        Available agents and their capabilities:
        DatabaseAgent:
          - query_database(quarter: str, year: str)
        AnalysisAgent:
          - load_into_dataframe(data)
          - analyze_data()
        GraphAgent:
          - generate_plot(df)
        ReportAgent:
          - create_report(insights, graph_image)
        
        Output ONLY JSON with a "plan" key, example:
        {{
          "plan": [
            {{"agent": "DatabaseAgent", "action": "query_database", "args": {{"quarter": "Q4", "year":"2024"}}}},
            {{...}}
          ]
        }}
        """

    def run(self, user_prompt: str):
        # First, orchestrator creates a plan using the system prompt and user prompt
        plan_str = self.openai_call(user_prompt, self.planning_system_prompt, json_format=True)
        # Attempt to parse the JSON plan
        try:
            plan = json.loads(plan_str).get("plan", [])
        except json.JSONDecodeError:
            print("Error parsing plan. Make sure the model returns valid JSON.")
            return

        # Execute each step in the plan
        data = None
        df = None
        insights = None
        graph_image = None

        for step in plan:
            agent_name = step.get("agent")
            action = step.get("action")
            args = step.get("args", {})

            if agent_name == "DatabaseAgent" and hasattr(self.db_agent, action):
                data = getattr(self.db_agent, action)(**args)

            elif agent_name == "AnalysisAgent" and hasattr(self.analysis_agent, action):
                if action == "load_into_dataframe":
                    df = getattr(self.analysis_agent, action)(data)
                elif action == "analyze_data":
                    insights = getattr(self.analysis_agent, action)()

            elif agent_name == "GraphAgent" and hasattr(self.graph_agent, action):
                if action == "generate_plot":
                    graph_image = getattr(self.graph_agent, action)(df)

            elif agent_name == "ReportAgent" and hasattr(self.report_agent, action):
                if action == "create_report":
                    report = getattr(self.report_agent, action)(insights, graph_image)
                    print(report)

The Orchestrator Agent is the heart of our process. Much like the plan-and-execute loop, it shows at a high level how orchestration can happen: one agent builds a plan, then executes the other agents to solve the task.
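One thing to note: the if/elif dispatch inside run grows with every new agent. A registry that maps agent names to instances keeps that dispatch flat as the system scales. Below is a sketch of that alternative (our own variation, not how the demo above is written); it stashes each step’s result under its action name so later steps can reuse it.

# Sketch: registry-based dispatch as an alternative to the if/elif chain above.
class RegistryOrchestrator(OrchestratorAgent):
    def execute_plan(self, plan):
        registry = {
            "DatabaseAgent": self.db_agent,
            "AnalysisAgent": self.analysis_agent,
            "GraphAgent": self.graph_agent,
            "ReportAgent": self.report_agent,
        }
        results = {}
        for step in plan:
            agent = registry.get(step.get("agent"))
            action = step.get("action")
            if agent is None or action is None or not hasattr(agent, action):
                continue  # skip steps the plan got wrong
            args = dict(step.get("args", {}))
            # Wire earlier results into later steps for this particular workflow.
            if action == "load_into_dataframe":
                args["data"] = results.get("query_database")
            elif action == "generate_plot":
                args["df"] = results.get("load_into_dataframe")
            elif action == "create_report":
                args["insights"] = results.get("analyze_data")
                args["graph_image"] = results.get("generate_plot")
            results[action] = getattr(agent, action)(**args)
        return results

The orchestrator’s planning prompt stays the same; only the execution of the plan changes.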

Last, we arrive at our main execution, “Generate a quarterly report from the database for Q4 2024.”


# Main Execution
################
if __name__ == "__main__":
    db_agent = DatabaseAgent()
    analysis_agent = AnalysisAgent()
    graph_agent = GraphAgent()
    report_agent = ReportAgent()

    orchestrator = OrchestratorAgent(db_agent, analysis_agent, graph_agent, report_agent)

    # The user prompt that triggers a multi-step plan
    user_prompt = "Generate a quarterly report from the database for Q4 2024."
    print("User Prompt:", user_prompt)
    orchestrator.run(user_prompt)

Our main execution above takes in our user_prompt of “Generate a quarterly report from the database for Q4 2024.” and runs our Orchestrator Agent, kicking off our multi-agent workflow.

Now, here’s the output from our example prompt.


User Prompt: Generate a quarterly report from the database for Q4 2024.
Agent: OrchestratorAgent
{
  "plan": [
    {"agent": "DatabaseAgent", "action": "query_database", "args": {"quarter": "Q4", "year": "2024"}},
    {"agent": "AnalysisAgent", "action": "load_into_dataframe", "args": {}},
    {"agent": "AnalysisAgent", "action": "analyze_data", "args": {}},
    {"agent": "GraphAgent", "action": "generate_plot", "args": {}},
    {"agent": "ReportAgent", "action": "create_report", "args": {}}
  ]
}
****

Agent: DatabaseAgent
{
  "query": "SELECT * FROM revenue WHERE date >= '2024-10-01' AND date <= '2024-12-31';"
}
****

Data:   month  revenue
0   Oct    10000
1   Nov    15000
2   Dec    18000

****

Agent: AnalysisAgent
{
  "total_revenue": 43000,
  "average_revenue": 14333.33
}
****

Agent: GraphAgent
{"plot_type": "bar"}
****

Agent: ReportAgent
{
  "report_text": "The total revenue generated is $43,000, with an average revenue of $14,333.33. A chart has been generated to visually represent this data."
}
****

The total revenue generated is $43,000, with an average revenue of $14,333.33. A chart has been generated to visually represent this data.
Sample quarterly financial report from a multi-agent AI system showing total and average revenue

To summarize, this code:

  • Shows how the orchestrator can coordinate processes between agents, making the system easier to scale and modify
  • Gives each agent a narrower set of responsibilities to reduce confusion and increase reliability
  • Illustrates how multiple agents can work together to solve complex problems more effectively than single agents alone

When to Move From a Single Agent to Multiple Agents

Here are three key signs that it’s time to consider expanding from a single agent to multiple agents.

1. Large number of tools with different scopes

If your agent handles file I/O, database queries, data analysis, and visualizations, it may be too broad. Splitting agents by specific domains helps simplify the context and reduce agent and developer confusion.

2. Performance and reliability issues

When a single agent becomes too large or complex, it may start to choose the wrong tools or fail tasks due to overly broad contexts.

3. Complexity

Having distinct agents keeps responsibilities focused and manageable. For example, a “DatabaseAgent” that only queries data is simpler to read and write (as well as more reliable) than a general-purpose single agent doing it all.

The Benefits of Expanding to Multiple Agents

Multi-agent systems offer AI practitioners lots of advantages over single-agent systems. Three benefits that stand out include:

  • Increased capabilities: Dividing workloads between multiple agents lets AI tackle increasingly complex problems.
  • Enhanced specialization: With each agent tuned to a specific domain, tasks become more focused and consistent.
  • Scalability: New agents can be added as needs arise, rather than overloading a single agent.

Of course, some of these benefits bring added responsibilities that practitioners should be aware of.

Challenges of Multi-Agent Systems

As for the challenges of orchestrating multiple agents, the following are worth keeping top of mind:  

  • Cost: More agents mean more LLM calls, which raises the cost of completing the goal.
  • Debugging complexity: With multiple agents involved, it becomes harder to know where things are breaking (see the tracing sketch after this list).
  • Clear boundaries: As more tools are added to more agents, deciding when to expand one agent’s scope versus another’s can become muddied.
  • Overhead: The developer is ultimately responsible for deciding where each agent fits best.
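For the debugging challenge in particular, even lightweight tracing goes a long way: log which agent ran which action, with what arguments, and how long it took. Here’s a minimal sketch (the traced_call helper is ours, not part of the demo code):

import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("multi_agent")

def traced_call(agent, action, **kwargs):
    """Run one agent action and log the agent, action, arguments, duration, and any failure."""
    start = time.perf_counter()
    try:
        result = getattr(agent, action)(**kwargs)
        logger.info("%s.%s(%s) finished in %.2fs",
                    agent.name, action, kwargs, time.perf_counter() - start)
        return result
    except Exception:
        logger.exception("%s.%s failed with args %s", agent.name, action, kwargs)
        raise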

Solutions will likely emerge for these challenges as agentic AI matures. For instance, Anthropic recently announced the Model Context Protocol (MCP), an early attempt at an industry standard for how AI systems integrate with external tools and data. An open standard like that could positively impact each of the issues mentioned above if developers follow the recommended agent patterns.

Build the Multi-Agent Systems Your Business Needs

Moving from a single-agent to multi-agent architecture can improve the results and scalability of your application, especially as it grows in complexity. By separating responsibilities to dedicated agents, you can carry out more sophisticated tasks and enhance the capabilities within your workflows.

As the field evolves and agentic frameworks mature, we’ll see even more advanced patterns to handle large-scale multi-agent systems. With careful planning, multiple agents can empower your systems to take on bigger challenges with greater reliability and efficiency.

If you need help developing your agentic AI systems, we can help out. Our experience spans AI strategy and governance, generative AI experiences, and voice. Learn more about our Data & AI Consulting services.

Christopher Frenchi
Nish Tahir
