Generative AI Series
Multi-Agent System — Crew.AI
Multi-Agent systems are LLM applications that are changing the automation landscape with intelligent bots.
This blog is part of an ongoing series on Generative AI. It introduces multi-agent architecture and frameworks, such as Autogen and Crew.ai, that help build intelligent bots. In this blog, we will explore crew.ai.
Multi Agent System
In the context of language models and AI, a multi-agent system involves multiple independent actors, each powered by language models, collaborating in a specific way. I wrote an earlier blog on multi-agent systems and their general architecture. Please read that blog before you proceed.
In this blog, we will dig deeper into crew.ai, one of the emerging frameworks for building multi-agent applications. Crew.ai provides a framework for building agents that work together seamlessly, tackling complex tasks with a level of sophistication that mirrors the dynamics of a well-oiled team. Crew.AI is designed to enable AI agents to assume roles, share goals, and operate as a cohesive unit.
Architecture
The architecture of CrewAI is modular, comprising several key components working together to achieve a well-orchestrated multi-agent system. The following picture shows the key components that provide a framework to build multi-agent LLM applications.
Let's go through this picture bottom up so that we understand how they come together.
Tool: A tool object is a utility that the agents use to perform a specific task efficiently, for example searching the web, or loading and reading documents. CrewAI is built with LangChain, so we can use any existing LangChain tool or write our own custom tools.
Task: Task, as the name suggests, is the specific task that needs to be executed. An Agent performs a task. We provide various tools that are required to execute a task.
Agents: An agent is like a team member in the crew, with a specific role, background story, goal, and memory. The agent is the core performer of the tasks assigned within the framework. Each CrewAI Agent is a LangChain Agent, but enhanced with a ReActSingleInputOutputParser. This parser is specially modified to better support role-playing, incorporates a binding stop word for contextual focus, and integrates a memory mechanism, using ConversationSummaryMemory to maintain the context.
Crew: Think of the crew as a team of agents working together to achieve a particular goal. These agents have a clearly defined way to collaborate on the set of tasks at hand.
Process: The process object is the workflow or strategy the crew follows to complete tasks. The framework defines 3 strategies (as of the time I was writing this blog; I understand that there are plans to add more).
- Sequential: This strategy executes the given “tasks” in sequence, and in a particular defined order. This strategy is ideal for any pipeline type of work, where each agent performs a specific task and passes it to the next agent. We will use this strategy in our example today to write a blog, for a given topic.
- Hierarchical: This strategy organizes tasks in a hierarchy; tasks are delegated and executed based on a chain of command. This is remarkably close to an orchestrator pattern: it is like a manager assigning work to various agents and validating the results before completing the work. When we use this strategy, we need to configure a manager_llm, which helps make the right decisions. We will be building a solution using this strategy in the next blog.
- Consensual Process (Planned): This is one of the most popular strategies, where agents talk to each other to get the work done. It is based on collaborative decision-making, and decisions are taken democratically. It is not yet released in CrewAI, but other multi-agent frameworks have already implemented this strategy. We will explore this in future blogs.
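To make the difference between the first two strategies concrete, here is a minimal plain-Python sketch. It does not use the CrewAI API at all: the workers are ordinary functions standing in for agents, and the manager function is a hypothetical stand-in for what a manager_llm would decide. All names here (run_sequential, run_hierarchical, research, write, manager) are my own illustrations.

```python
from typing import Callable, Dict, List, Optional

Worker = Callable[[str], str]

def run_sequential(workers: List[Worker], topic: str) -> str:
    """Run workers in a fixed order, feeding each output to the next worker."""
    result = topic
    for worker in workers:
        result = worker(result)
    return result

def run_hierarchical(manager: Callable[[str], Optional[str]],
                     workers: Dict[str, Worker],
                     topic: str,
                     max_steps: int = 5) -> str:
    """Let a manager inspect the current result and pick who works next.

    The manager returns a worker name, or None once it accepts the result.
    """
    result = topic
    for _ in range(max_steps):
        name = manager(result)
        if name is None:
            break
        result = workers[name](result)
    return result

# Stand-in "agents": plain functions instead of LLM-backed agents.
def research(text: str) -> str:
    return f"notes({text})"

def write(text: str) -> str:
    return f"blog({text})"

# Stand-in "manager": hard-coded rules instead of an LLM decision.
def manager(current: str) -> Optional[str]:
    if current.startswith("blog("):
        return None          # work is complete and accepted
    if current.startswith("notes("):
        return "write"       # research is done, delegate the writing
    return "research"        # nothing done yet, start with research

print(run_sequential([research, write], "crewai"))
print(run_hierarchical(manager, {"research": research, "write": write}, "crewai"))
```

Both calls end with the same result here. The point is who decides the order: the sequential run follows the list we wrote, while the hierarchical run asks the manager at every step.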
You can read more about these objects and the API in the CrewAI documentation.
Let's now jump into building our first multi-agent system, which builds a blog in markdown format for a given topic.
Code
Before we start, let's install all the dependencies. Below is my requirements.txt, which lists all the Python dependencies. I run pip install -r requirements.txt (after creating my own virtual environment) to install them.
To build our blogger multi-agent application, we will be implementing 2 agents:
- researcher: The researcher agent will search the web for a given topic and collect all the information. We set the context of the LLM to think like a researcher and gather all the material that we need on the topic.
- blogger: The blogger agent will convert the content collected by the researcher into a blog.
These agents will run the following tasks:
- task_search: This task is about searching the web for all the relevant content for the given topic. We will be providing 2 tools (a duckduckgo search tool, and a web search tool that uses Serper) to support task_search in doing complete research.
- task_post: This task is about writing the blog in markdown format from the provided information.
We will be running our models on Ollama on our laptops. To understand more about Ollama, please read my earlier blogs.
Let's now look at the code. The following code shows all the dependency imports.
In lines 9 and 10, we define the various tools that we will be using. To use the web search tool with Serper, you need to sign up at Serper, generate an API key, and configure it in your environment. The following screenshot shows how to access the API key.
The following is my .env file, which we will be loading using the dotenv library.
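If you are wondering what loading a .env file actually involves, here is a small standard-library sketch of the idea. It is not the dotenv library itself, only an approximation of its basic behaviour, and the variable names SERPER_API_KEY and OPENAI_API_KEY and their values are placeholders I am assuming for illustration (check the documentation of the tools you use for the exact names they expect).

```python
import os
from typing import Dict

def load_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values

# Hypothetical .env contents; replace the placeholders with your own keys.
sample = """
# keys used by the search tool and (optionally) OpenAI
SERPER_API_KEY="your-serper-key"
OPENAI_API_KEY=your-openai-key
"""

settings = load_env_text(sample)
for key, value in settings.items():
    # setdefault keeps any value that is already set in the real environment.
    os.environ.setdefault(key, value)

print(sorted(settings))
```

In the real application I would simply let the dotenv library do this; the sketch is only meant to show that the file is a list of key/value pairs that end up as environment variables.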
Please note that I am not using OpenAI, but I have shown in the following code how we would use OpenAI anyway (in case you don’t want to run your models on Ollama). In my case, I have installed the mistral model on Ollama and am using it locally. If you want, you can use OpenAI/GPT or any other LLM.
In the following code we define the kickoffTheCrew(topic) method, which we will call from the main block. We pass the topic parameter to perform the research and generate a blog.
In this method we instantiate the 2 agents, researcher and blogger, that we defined earlier in the blog. As you can see, for each agent we define a specific role and goal, and set the context using backstory. All of these are really prompts with context, and it is very important to provide the right prompts. I had to tune them several times to get good results, and I am sure we can get better results by tuning these 3 parameters further. We are also passing the llm to be used. Please note that in our example we use the same llm for both agents, but we could use different llms for different agents, depending on what each agent is going to achieve. It is in fact a good practice to use the llm best suited to the specific tasks the agent is going to perform.
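To illustrate why I say role, goal and backstory are really prompts with context, here is a plain-Python sketch of how three such fields could be combined into a single system prompt. The AgentProfile class, its system_prompt method, the prompt wording and the example values are all hypothetical; this is not CrewAI's Agent class or its actual prompt template.

```python
from dataclasses import dataclass

@dataclass
class AgentProfile:
    """Hypothetical stand-in for the three text fields we give each agent."""
    role: str
    goal: str
    backstory: str

    def system_prompt(self) -> str:
        # Assumed template: the real framework composes its prompts differently.
        return (
            f"You are {self.role}. {self.backstory}\n"
            f"Your goal is: {self.goal}"
        )

researcher = AgentProfile(
    role="a senior researcher",
    goal="collect accurate, relevant material on the given topic",
    backstory="You have years of experience finding and comparing sources.",
)

blogger = AgentProfile(
    role="a technical blogger",
    goal="turn the research notes into a clear markdown blog",
    backstory="You write for developers and explain ideas in simple terms.",
)

for profile in (researcher, blogger):
    print(profile.system_prompt())
    print("---")
```

Seen this way, tuning an agent is mostly a matter of rewording these three strings until the model behaves the way the role demands.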
In the following code we define the 2 tasks that I mentioned earlier in the blog, to search and to write the blog. Please note that for task_search, we provide the tools to perform the web searches. Also note that we configure which agent is going to perform each task.
To bring the tasks and agents together, we define the crew object and pass it the agents and tasks that will be working in the crew. We also define our process strategy as sequential. We then call the kickoff method of the crew to get the agents working on the provided tasks.
Now coming to the main block of the code, we receive the command line argument as the topic and pass this topic to the kickoffTheCrew method that we defined above. This will execute the system.
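The main block described above could look roughly like the following sketch. The kickoffTheCrew function here is only a placeholder that returns a dummy string, so that the example runs without any LLM; the helper topic_from_argv, the script name blogger.py in the usage message, and the choice to join several words into one topic are my own assumptions.

```python
import sys
from typing import List, Optional

def kickoffTheCrew(topic: str) -> str:
    # Placeholder: the real method builds the agents, tasks and crew,
    # and returns the generated blog.
    return f"# {topic}\n\n(blog content would be generated here)"

def topic_from_argv(argv: List[str]) -> Optional[str]:
    """Join everything after the script name so multi-word topics work unquoted."""
    words = argv[1:]
    return " ".join(words) if words else None

def main(argv: List[str]) -> int:
    topic = topic_from_argv(argv)
    if topic is None:
        # Hypothetical script name, used only in the usage hint.
        print("Usage: python blogger.py <topic>")
        return 1
    print(kickoffTheCrew(topic))
    return 0

if __name__ == "__main__":
    main(sys.argv)
```

With this in place, running the script with a topic such as multi agent systems passes the whole phrase to kickoffTheCrew, and running it without arguments prints the usage hint instead of failing.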
Results
Let's now run this code. I recorded a complete screencast video for you to see the results (since we have set verbose mode, we see the complete execution of the agents). The screencast video follows.
Here is a screenshot of the final markdown that our multi-agent blogger application has created.
There you go. Isn’t it amazing what we can do with multi-agent systems? I am super excited, and I have been exploring various frameworks. I look forward to hearing from you. Hope this was useful.
I will be coming back soon, with more examples and other frameworks that I am playing around with, until then…code and have fun :-D
You can find the code in my GitHub repository abvijaykumar/crewai-blogs (github.com)