The journey from proof of concept to proof of value: part V

This blog is part V of a series. See parts I, II, III, and IV.

Understanding AI Agents: What They Are, When to Use Them, and How They Work

In our last set of articles, we talked about how GenAI techniques are evolving and some of the best practices for improving your results with these tools and techniques. In this post, we’ll talk about the most intriguing advancement in this field: the concept of AI agents. You’ve probably heard the hype: these agents have the potential to revolutionize our lives by automating complex tasks, enhancing productivity, and providing sophisticated solutions. Let’s talk about what they are, when to use them, and how multiple agents can work together effectively.

What AI Agents Are and Aren't

AI agents, specifically LLM-powered agents, are systems designed to reason through problems, create plans to solve them, and execute these plans using a set of tools. Usually, when they’re brought up as a potential solution to an AI problem, you’ll hear something like “These agents possess complex reasoning capabilities, memory, and execution functionalities, which distinguish them from simpler generative AI models that primarily focus on generating text or performing isolated tasks.” But putting it like this can also gloss over what agents can really do, or make them seem sentient or magical somehow. So before we start pulling them apart to look at the guts, let’s demystify that statement a little. A GenAI Agent is really just a series of LLM prompts, with the following additions:

An Agent Definition. Some added instructions about what the system is supposed to do (e.g. “You are a helpful accountant and you will help a user by…..”)
Memory. Generally, this just means that a segment of the conversation is stored and becomes part of the prompt, though there are more complex systems that can extract important details and make them available as the conversation progresses.
Tool Access. Often these systems exploit the LLM's ability to utilize a particular tool or function defined in the prompt space. For example, an agent could query some search tool on the web for more information before replying to the user.
Flow Control. Instead of returning output directly back to the user (as in the case of a “normal” LLM prompt), an agent can send it elsewhere, allowing for the use of tools or functions, sending the conversation to another agent, or choosing to send it back to the user.

It’s worth breaking these things out to dispel the idea that an agent is somehow an entirely different usage of AI. In the end, the whole system is just an iterated series of LLM calls, but with the elements above giving more contextual input to those calls, and allowing the LLM output to “go other places” including calling other tools, functions or even other agents. Agent systems undeniably have a lot of power and flexibility and are being used for a lot of great applications. But they aren’t so much a new form of AI as they are creative and flexible arrangements of the same base LLM systems.
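To make the "iterated series of LLM calls" idea concrete, here is a minimal sketch of the four pieces above wired into one loop. Everything in it is an assumption made for illustration: `fake_llm` is a hard-coded stand-in for a real model call, `lookup_invoice` is a hypothetical tool, and the dictionary format used for the model's decision is invented here rather than taken from any particular framework.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


def fake_llm(prompt: str) -> dict:
    """Hypothetical stand-in for a real LLM call (assumption, not a real API).

    It returns a decision dict: either call a tool or reply to the user.
    """
    if "TOOL RESULT" in prompt:
        return {"action": "reply", "text": "The invoice total is 42."}
    return {"action": "tool", "tool": "lookup_invoice", "input": "INV-1"}


def lookup_invoice(invoice_id: str) -> str:
    """Hypothetical tool; a real one might query a database or a search API."""
    return f"{invoice_id} total=42"


@dataclass
class Agent:
    definition: str                              # agent definition
    llm: Callable[[str], dict]                   # the underlying LLM call
    tools: Dict[str, Callable[[str], str]]       # tool access
    memory: List[str] = field(default_factory=list)  # stored conversation
    max_steps: int = 5                           # guard against endless loops

    def build_prompt(self) -> str:
        # The definition and the stored conversation both become part of the prompt.
        return "\n".join([self.definition, *self.memory])

    def run(self, user_message: str) -> str:
        self.memory.append(f"USER: {user_message}")
        for _ in range(self.max_steps):
            decision = self.llm(self.build_prompt())
            # Flow control: the output goes to a tool or back to the user.
            if decision["action"] == "tool":
                result = self.tools[decision["tool"]](decision["input"])
                self.memory.append(f"TOOL RESULT ({decision['tool']}): {result}")
            else:
                self.memory.append(f"AGENT: {decision['text']}")
                return decision["text"]
        return "Stopped: step limit reached."


agent = Agent(
    definition="You are a helpful accountant.",
    llm=fake_llm,
    tools={"lookup_invoice": lookup_invoice},
)
answer = agent.run("What is the total on invoice INV-1?")
print(answer)
```

Swapping `fake_llm` for a real model client would leave the loop itself unchanged, which is the point: the "agent" is the arrangement around the calls, not a different kind of model.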

When to Use AI Agents and When to Opt for Other Solutions

AI agents are powerful tools but are not always the best choice for every scenario. These systems can be significantly complex, both intrinsically and in the emergent sense of having many stochastic elements operating cooperatively. As a result, they can require a lot of careful validation and testing. Understanding when to deploy these agents and when to consider simpler alternatives is crucial for developing efficient GenAI systems.

When to deploy AI Agents:

Complex and Manifold Problem Solving: When human-intensive tasks require sophisticated reasoning, multiple steps, and the integration of various tools and data sources.
Automation of Routine but Interactive Tasks: Agents can significantly reduce the burden of repetitive tasks, and some of these (e.g. tech support, customer service) can be both interactive and well-defined in scope.
Open-Ended Analysis: Agents with access to a lot of data can perform iterated observations and refinements to try to arrive at and support conclusions. In this case, a validation stage involving a human expert (i.e. HITL - human-in-the-loop) should be used.

When to use other Techniques:

Tasks Requiring Emotional Nuance or High-risk Roles: Generative models have difficulty responding to nuanced human emotions. Applications that require sensitivity can therefore be high-risk, with health or mental wellness on the line. Understanding and improving a language model's ability to be empathetic is an active area of research. In the meantime, the risks of negative outcomes could be profound. So while I wouldn’t say “absolutely do not attempt,” I would say “absolutely know what you’re doing, and understand how validation will be performed” with this sort of role.
Repeated Data Analysis: In scenarios where large volumes of data need to be analyzed to derive actionable insights, but the basic work is well understood, AI agents are unlikely to be as useful as a mixture of procedural code and targeted LLM-based query generation.
Unclear Tasks: Language models' current reasoning capabilities are still quite limited. If the task involves generating content that requires a high degree of accuracy and unbiased information while being responsive to any human input, extensive human oversight is necessary to mitigate the risk of propagating errors or biases.

Multiple Agents and How They Work Together

The concept of using multiple AI agents, sometimes referred to as a "swarm" or "ecosystem," involves a collection of agents working collaboratively to solve problems. This decentralized approach can be likened to microservices in software engineering, where each agent specializes in specific tasks but contributes to a common goal.

In this setup, multiple agents coexist in a single environment and collaborate on tasks. These agents can work together to simulate environments like digital companies or virtual neighborhoods. For example, in a software development project, different agents might handle coding, design, and testing, while another agent coordinates the others in a structured way and takes on project management and concept development. This collaborative effort can lead to rapid prototyping and cost-effective development, but it can also produce a wide distribution of results.

Multi-agent architectures lend themselves to tried-and-true object-oriented design principles, including encapsulation and specialization, which can make these systems easier to develop, maintain, and extend.

Best Practices - When is it Worth Deploying Agents?

With all the possible GenAI applications, it’s easy to get lost in different agent geometries, and equally easy to produce something complex that never gets quite stable enough for a real deployment. So what to do? In my opinion, you:

Start with a clear picture of what you actually want to accomplish, and turn that vision into a flowchart. The goal could be as simple as “answering user questions about a set of documents” or as complex as “creating a new product and a marketing campaign for this initiative.” Regardless, sit down at a whiteboard or your favorite flowchart software and sketch it all out.
Attempt to accomplish every step in this flowchart, simplest-method first. Try to do every step using the following techniques, in this order: procedural/standard coding or querying techniques, single-prompt data extraction and structuring, and finally, RAG/Prompt chains.
If you find the entire flowchart is possible using one or more of these approaches, consider developing and deploying in that way.
Once you've made a credible attempt to solve your problems with simpler methods, the remaining tasks tend to be the things that agents are necessary for. These include holding open-ended interactive conversations with a user, expanding tasks into strategies and then solving them, and answering complex questions by reaching out to multiple tools as part of the answer. If the flow of activity must change depending on the inputs, a running memory of the interactions is important, or calling external functions is a must, then it’s time to consider agent systems.
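The simplest-method-first process above can be sketched as a small triage routine over the flowchart steps. This is only an illustration under loud assumptions: the three "solver" functions are placeholders whose string-matching rules stand in for the real work of trying each technique and checking whether it holds up, and the step names are invented.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A solver returns a short note if it can handle the step, otherwise None.
Solver = Callable[[str], Optional[str]]


def procedural(step: str) -> Optional[str]:
    """Placeholder: standard code or queries handle only steps with a known rule."""
    rules = {"sum invoice totals": "handled by a plain query"}
    return rules.get(step)


def single_prompt(step: str) -> Optional[str]:
    """Placeholder: single-prompt data extraction and structuring."""
    return "handled by one extraction prompt" if step.startswith("extract") else None


def rag_or_chain(step: str) -> Optional[str]:
    """Placeholder: RAG or a prompt chain."""
    return "handled by RAG or a prompt chain" if step.startswith("answer from docs") else None


# Techniques in the order they should be tried, simplest first.
LADDER: List[Tuple[str, Solver]] = [
    ("procedural", procedural),
    ("single_prompt", single_prompt),
    ("rag_or_chain", rag_or_chain),
]


def triage(steps: List[str]) -> Dict[str, str]:
    """Assign each step the first technique that works; leftovers are agent candidates."""
    assignments: Dict[str, str] = {}
    for step in steps:
        assignments[step] = "agent_candidate"
        for name, solver in LADDER:
            if solver(step) is not None:
                assignments[step] = name
                break
    return assignments


flowchart = [
    "sum invoice totals",
    "extract vendor names from emails",
    "answer from docs: refund policy",
    "hold an open-ended support conversation",
]
plan = triage(flowchart)
for step, technique in plan.items():
    print(f"{step} -> {technique}")
```

Only the steps that fall through every rung end up marked as agent candidates, which keeps the agent portion of the system as small as the problem allows.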

Conclusion

AI agents are still in their infancy. As their reasoning and decision-making capabilities become more sophisticated and reliable, they’ll usher in a significant leap in the capabilities of artificial intelligence to enhance our daily lives. But they’re still currently limited by most of the same things that prompt chains are: complex edge cases, the need for validation, and a lack of sophisticated reasoning capabilities. Combining better reasoning, memory, and execution tools makes them suitable for a wide range of applications, but it’s essential to understand their current limitations and to deploy them judiciously, particularly in scenarios that require emotional intelligence, involve high-stakes decision-making, or could be better handled by a simpler approach.

To learn more about MVL, read our manifesto and let’s build a company together.

CHEAT CODES

By Sean Robinson, with Jay Bartot and Keith Rosema • July 22, 2024
