Deconstructing LLM Agents for software engineers - Beyond the Hype
Introduction
Agents, agents, agents everywhere. To be clear: AI agents. Oh, sorry—to be precise, LLM agents. A lot of buzz surrounds these agents. They were called the most important AI innovation of 2024, and now they’re touted as the hottest topic of 2025. But is this something we truly need? What exactly are they? Are all agents the same? Let’s dive deeper into the subject.
I’ll start by saying I’m a bit skeptical about the whole buzz. “Agents” seems to be a catchy term for yet another type of wrapper for LLMs—a good marketing name for AI based on language models. They come with their strengths and weaknesses, and adopting agents won’t magically solve all human problems. But first things first: let’s break things down to ensure we’re on the same page.
We’ll begin by clarifying what LLM agents are. Then, we’ll explore a few frameworks and compare their implementation approaches. Next, we’ll analyze how to implement agent-like properties using classical software tools. Finally, we’ll discuss whether the term ‘agent’ is really justified.
Agents: From General to AI to LLM
“Agent” is not a new term in engineering or in software engineering, so let’s take a look at already well-established definitions.
Classical Software Agent Definition
According to Techopedia, a software agent is defined as:
A piece of software that functions as an agent for a user or another program, working autonomously and continuously in a particular environment. It is inhibited by other processes and agents, but is also able to learn from its experience in functioning in an environment over a long period of time.
Attributes of autonomous software agents include activation without being invoked, perceiving and reacting to their environment, autonomy in decision-making and goal achievement, communication and collaboration with other agents, persistence and responsiveness.
Examples include early robotic vacuum cleaners, basic chatbots, traffic lights, and thermostats.
AI Agents
AI (or intelligent) agents are nothing new either. Let’s take a look at the Wikipedia definition:
An intelligent agent perceives its environment, takes actions autonomously in order to achieve goals, and may improve its performance with learning or acquiring knowledge.
Types of intelligent agents:
- Simple reflex agents: Respond directly to environmental stimuli without considering the context.
- Model-based reflex agents: Maintain an internal model of the world to respond to stimuli.
- Goal-based agents: Take actions guided by goal achievement.
- Utility-based agents: Choose actions based on a utility function to maximize outcomes.
- Learning agents: Improve performance over time by learning from their experiences.
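To make the first two categories concrete, here is a minimal illustrative sketch (my own, not taken from any framework) contrasting a simple reflex thermostat with a model-based one:

```python
class SimpleReflexThermostat:
    """Simple reflex agent: reacts only to the current stimulus."""

    def act(self, temperature):
        return "heat_on" if temperature < 20.0 else "heat_off"


class ModelBasedThermostat:
    """Model-based reflex agent: keeps an internal model (recent readings)
    to smooth out a noisy sensor before reacting."""

    def __init__(self):
        self.readings = []

    def act(self, temperature):
        self.readings.append(temperature)
        recent = self.readings[-3:]
        average = sum(recent) / len(recent)
        return "heat_on" if average < 20.0 else "heat_off"
```

A single noisy reading of 25°C flips the simple reflex agent off immediately, while the model-based agent keeps heating until the averaged readings actually warm up.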
LLM Agents
If you are familiar with LLM agent frameworks, you’ll notice that agents are defined differently than those presented above. Here, the term ‘agent’ refers to a much simpler responsibility, primarily focused on executing workflows driven by LLMs. Below are definitions of LLM agents provided by well-known tools and frameworks for comparison.
Anthropic
At Anthropic, we categorize all these variations as agentic systems, but draw an important architectural distinction between workflows and agents:
Workflows are systems where LLMs and tools are orchestrated through predefined code paths.
Agents, on the other hand, are systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks.
Google
A generative AI agent attempts to achieve a goal by observing and acting upon the world using tools.
CrewAI
An agent is an autonomous unit that performs tasks, makes decisions, uses tools, communicates and collaborates, maintains memory of interactions, delegates tasks.
Langgraph
An agent uses an LLM to decide the control flow of an application.
AutoGen
An agent communicates via messages, maintains state, and acts based on messages or state changes.
Semantic Kernel
An AI agent autonomously performs tasks, processes input, and takes actions to achieve goals.
So, compared to the original software agent or intelligent agent, do these systems truly deserve the term ‘agents’? We will revisit this question at the end of the article.
Existing Agent Frameworks Comparison
Now that we have a clearer understanding of what an LLM agent is, let’s examine some of the currently popular frameworks that claim to be agentic. We’ll compare their approaches to implementation and analyze their strengths and weaknesses. To better understand what agent collaboration may look like, we will show a practical example application implemented with one of the frameworks, CrewAI. The application will be an agent-based system that updates living system documentation and drafts unit tests when new project tickets arrive.
Introduction to CrewAI
CrewAI executes workflows by organizing agents into coordinated “crews.” These crews collaborate to handle tasks by breaking them down into smaller, role-specific actions. Each agent operates with a predefined role and goal, contributing its specialized expertise to achieve the collective objective.
Base Concepts of CrewAI
Agents:
- The fundamental building blocks in CrewAI.
- Each agent has a defined role, goal, and backstory, which guide its behavior and interactions.
- Agents process tasks.
Tasks:
- Represent units of work assigned to agents.
- Each task includes a description, expected output, and an associated agent.
- Tasks can be contextual, utilizing results from other tasks to enhance their outputs.
Crew:
- A collection of agents working together as a team.
- Crews coordinate the execution of tasks in sequential or parallel workflows.
- They utilize shared memory and communication channels to optimize collaboration and consistency.
Process:
- Defines how tasks are executed within the crew.
- Sequential processes ensure ordered execution, while hierarchical processes allow tasks to execute dynamically.
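As a toy illustration in plain Python (not CrewAI’s actual API), a sequential process is essentially a fold over tasks, where each task receives the accumulated results of earlier tasks as context:

```python
def run_sequential(tasks, inputs):
    """Run (name, task_fn) pairs in order; each result is added to the shared context."""
    context = dict(inputs)
    for name, task_fn in tasks:
        context[name] = task_fn(context)
    return context


# Each "task" reads from the shared context and contributes its own result.
pipeline = [
    ("analysis", lambda ctx: f"analyzed {ctx['ticket']}"),
    ("documentation", lambda ctx: f"docs based on: {ctx['analysis']}"),
]
result = run_sequential(pipeline, {"ticket": "TICKET-42"})
print(result["documentation"])  # docs based on: analyzed TICKET-42
```

A hierarchical process replaces the fixed ordering with a manager that decides at runtime which task to run next.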
Configuration
The configuration files define agents and tasks:
Agents: Each agent has a role, goal, and backstory. For instance:
```yaml
business_analyst:
  role: Business Analyst
  goal: Analyze functional tickets to extract entities and business rules.
  backstory: Experienced in interpreting business requirements.

technical_analyst:
  role: Technical Analyst
  goal: Analyze technical tickets to extract technical details.
  backstory: Skilled in understanding technical specifications.

domain_expert:
  role: Domain Expert
  goal: Extract and describe system domain behaviors.
  backstory: Proficient in identifying system operations and domain processes.

technical_writer:
  role: Technical Writer
  goal: Update system documentation based on analyses.
  backstory: Expert in documenting technical content.

test_engineer:
  role: Test Engineer
  goal: Draft unit test cases based on analyses.
  backstory: Experienced in creating unit test cases for system functionalities.
```
Tasks: Tasks specify the work to be performed, expected output, and the responsible agent. For example:
```yaml
analyze_functional_ticket:
  description: >
    Analyze the functional ticket to extract entities and business rules.
  expected_output: >
    Your task is to analyze functional tickets provided by the user.
    Each ticket includes specific details about requested functionalities,
    such as requirements and constraints.
    Your analysis should identify:
    - Entities involved in the ticket (e.g., users, systems, data).
    - Business rules governing the functionality described in the ticket.
    Based on the provided ticket details:
    Title: {title}
    Description: {description}
    Acceptance criteria: {acceptance_criteria}
    Analyze and provide the following:
    - Entities involved as list
    - Business rules relevant to the functionality as list
  agent: business_analyst

update_documentation:
  description: Update the system documentation using the analysis results.
  expected_output: >
    Incorporate details about entities, business rules, and domain behaviors
    to create comprehensive documentation.
    Based on the analysis results, create updated structured documentation for the system.
  agent: technical_writer
  context: [analyze_functional_ticket, analyze_technical_ticket, extract_domain_behaviors]
```
Code
The implementation lives in a DocumentationUpdaterCrew class that integrates agents and tasks. The class includes methods for creating agents, defining tasks, and orchestrating workflows.
```python
import logging

from crewai import Agent, Crew, Process, Task
from crewai.project import CrewBase, agent, crew, task


@CrewBase
class DocumentationUpdaterCrew:
    agents_config = 'config/agents.yml'
    tasks_config = 'config/tasks.yml'

    @agent
    def business_analyst(self) -> Agent:
        return Agent(config=self.agents_config['business_analyst'], verbose=True)

    @agent
    def technical_analyst(self) -> Agent:
        return Agent(config=self.agents_config['technical_analyst'], verbose=True)

    @agent
    def technical_writer(self) -> Agent:
        return Agent(config=self.agents_config['technical_writer'], verbose=True)

    @task
    def analyze_functional_ticket(self) -> Task:
        return Task(config=self.tasks_config['analyze_functional_ticket'])

    @task
    def update_documentation(self) -> Task:
        return Task(config=self.tasks_config['update_documentation'])

    # Additional agents and tasks can be defined as needed for other roles and workflows.

    @crew
    def crew(self) -> Crew:
        return Crew(
            agents=self.agents,
            tasks=self.tasks,
            process=Process.sequential,
        )

    def handle_new_ticket(self, ticket_details: dict):
        logging.debug(f"Received new ticket: {ticket_details}")
        self.crew().kickoff(inputs=ticket_details)
```
Agents here are invoked sequentially within the DocumentationUpdaterCrew class. Each agent, such as the business_analyst or technical_writer, is designed to handle specific tasks based on its configuration. Tasks are defined with inputs and expected outputs, and the process is managed sequentially. Results from one task, like analyzing functional tickets, are passed as context to subsequent tasks, such as updating documentation. This setup ensures a logical flow of information and collaborative task execution among the agents. However, this is not the only possible configuration. A more dynamic and agentic approach would be a hierarchical process, where an overarching manager agent dynamically delegates tasks to the other agents. This method gives you more flexibility and can adapt better to complex problems, but comes with more indirection while solving a problem.
Due to the limited size of this article, it is not possible to showcase different implementations across all frameworks. That topic is well-suited for a separate, dedicated article. Here, I will focus on describing four selected frameworks, highlighting their unique characteristics and approaches. The selection is somewhat arbitrary; the main rationale is that each of the presented tools has unique properties.
Selected Framework Characteristics
CrewAI: You have already seen CrewAI in action (or rather, in implementation). In addition, the framework can use dynamic tool calling, and its memory management (including short-term memory, entity memory, and user memory) enhances task execution. While the memory system gives agents some individuality, the current implementation is limited to suggestions from previous interactions rather than fully preserving historical context.
LangGraph: LangGraph adopts a graph-based workflow model. Tasks are represented as nodes, and transitions between nodes depend on evaluations made at each node. These evaluations may be deterministic, based on boolean logic, or fuzzy, leveraging LLM capabilities. A shared state mechanism (GraphState) enables nodes to read, update, and share additional data. While LangGraph’s approach offers clarity and structure, it lacks dynamic task recognition and problem-solving capabilities, making it a less agentic solution overall.
AutoGen: AutoGen represents the most dynamic and agentic framework in terms of autonomy. It utilizes specialized agents in a shared conversational environment. These agents collaborate through strategies like group chat, round-robin discussions, or random selection, working together to solve problems. Conversations are summarized and passed between agents as needed. The discussion process ends based on predefined rules, allowing for iterative problem-solving.
Semantic Kernel: Semantic Kernel focuses on grouping functions into skills, which can either be semantic (LLM-powered) or native (traditional code). These skills can be chained to form complex workflows. The kernel functions as a manager, executing flows and sharing context across functions. Semantic Kernel’s planner feature introduces dynamic problem-solving capabilities. Recent updates have incorporated agents in a manner similar to AutoGen.
Comparison of Frameworks
Framework | Role/Specialization | Tool Calling | Memory (STM, LTM, Context Access) | LLM Processing | Dynamic Problem Solving | Agent Communication/Delegation |
---|---|---|---|---|---|---|
CrewAI | Defined roles with goals and backstories | Supported | STM, LTM, entity/user memory | LLM calls | Manager-driven, hierarchical process | Delegation |
LangGraph | Graph-based, no explicit roles | Supported | Shared state (GraphState) | LLM calls | Node-based decisions | Limited by the graph |
AutoGen | Specialized roles in group chat | Supported | Shared conversation memory | LLM calls | Group chat, round-robin or other strategy | Discussion summarization |
Semantic Kernel | Skill-oriented functions, AutoGen like agents | Supported | Request, session, global memory | LLM calls | Dynamic planner features | Agents like in AutoGen |
Implementing Agent Behaviors Without Frameworks
Frameworks might be helpful, but they also come with a price. Each framework implements specific mechanisms—manipulating prompts, extending them, or making additional calls—in its own way. It’s easy to be trapped by the indirection they add, making it difficult to debug or reason about efficiency and performance.
That’s why I advise against jumping into frameworks prematurely unless you have substantial time or a thorough understanding of the underlying mechanisms. Many of the so-called “agentic” patterns are actually straightforward for experienced software engineers to replicate.
Below, you’ll see how to implement each major agent pattern in plain Python with OpenAI’s library, no additional frameworks required. The examples are not production-ready, but they can give you a glimpse of what such an implementation can look like.
Dynamic Function Calling With OpenAI’s Python Library
This snippet uses the function-calling feature of OpenAI’s Chat Completions API (via the pre-v1 openai Python library interface) to either search a database or summarize text, depending on the user input. We then parse the LLM’s response to see which function was called and execute it in Python.
```python
import os
import json

import openai

openai.api_key = os.getenv("OPENAI_API_KEY")


def search_database(query):
    return f"Pretending to search the database for: {query}"


def summarize_text(text):
    return f"Summary of the text: {text}"


def call_function(func_name, func_args):
    if func_name == "search_database":
        return search_database(func_args.get("query", ""))
    elif func_name == "summarize_text":
        return summarize_text(func_args.get("text", ""))
    else:
        raise ValueError(f"Unknown function: {func_name}")


def main():
    user_message = input("User message: ")
    response = openai.ChatCompletion.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": user_message}],
        functions=[
            {
                "name": "search_database",
                "description": "Searches for information in a database.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "query": {
                            "type": "string",
                            "description": "A query string to search for in the database"
                        }
                    },
                    "required": ["query"]
                }
            },
            {
                "name": "summarize_text",
                "description": "Summarizes a given piece of text.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "text": {
                            "type": "string",
                            "description": "The text to summarize"
                        }
                    },
                    "required": ["text"]
                }
            }
        ],
        function_call="auto"
    )

    finish_reason = response["choices"][0]["finish_reason"]
    if finish_reason == "function_call":
        # The function call lives on the message object in this API version.
        func_name = response["choices"][0]["message"]["function_call"]["name"]
        raw_args = response["choices"][0]["message"]["function_call"]["arguments"]
        func_args = json.loads(raw_args)
        result = call_function(func_name, func_args)
        print("Function Call Result:", result)
    else:
        # LLM responded with a direct text answer
        direct_answer = response["choices"][0]["message"]["content"]
        print("LLM Response:", direct_answer)


if __name__ == "__main__":
    main()
```
How It Works
- Function definitions: We define two local functions in Python, search_database and summarize_text. Each returns a simple mock response.
- call_function dispatch: When the LLM chooses which function to call, we route the request through call_function(func_name, func_args), which invokes the correct function based on func_name.
- function_call="auto": By setting function_call="auto", the LLM decides if (and which) function to call. If the user request fits the description of one of our registered functions, the LLM returns a finish_reason of "function_call", along with the chosen function name and JSON-encoded arguments.
- Response handling: If finish_reason == "function_call", we parse function_call["arguments"] from JSON, invoke the corresponding local Python function, and print its result. Otherwise, we print the LLM’s direct response (i.e., the LLM didn’t see the need to call a function).
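The dispatch step itself needs no API key, so you can exercise it in isolation. Here is a small sketch that reuses the functions defined above and simulates the JSON-encoded arguments the LLM would return:

```python
import json


def search_database(query):
    return f"Pretending to search the database for: {query}"


def summarize_text(text):
    return f"Summary of the text: {text}"


def call_function(func_name, func_args):
    if func_name == "search_database":
        return search_database(func_args.get("query", ""))
    elif func_name == "summarize_text":
        return summarize_text(func_args.get("text", ""))
    raise ValueError(f"Unknown function: {func_name}")


# Simulate the JSON-encoded arguments the LLM would hand back.
raw_args = '{"query": "LLM agents"}'
result = call_function("search_database", json.loads(raw_args))
print(result)  # Pretending to search the database for: LLM agents
```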
Object Representation of an Agent With Memory
The following code shows how each Agent can define and use its role, store previous prompts and responses in memory, and then reuse them, tagging them clearly under “START OF MEMORY” and “END OF MEMORY”. You could also integrate tool calling here, in the so-called ReAct pattern.
```python
import openai


class LLM:
    def __init__(self, model="gpt-4-0613"):
        self.model = model

    def prompt(self, prompt_text):
        response = openai.ChatCompletion.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt_text}]
        )
        return response["choices"][0]["message"]["content"]


class Agent:
    def __init__(self, name, description, llm, memory=None):
        self.name = name
        self.description = description
        self.llm = llm
        self.memory = memory if memory is not None else []

    def execute(self, task):
        memory_context = "=== START OF MEMORY ===\n"
        for old_task, old_response in self.memory:
            memory_context += f"[User said]: {old_task}\n[You responded]: {old_response}\n\n"
        memory_context += "=== END OF MEMORY ===\n\n"
        prompt_text = memory_context + f"Now the user says: {task}"
        response = self.llm.prompt(prompt_text)
        self.memory.append((task, response))
        return response
```
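To see the memory mechanism work without an API key, you can substitute a stub for the LLM class. The Agent below has the same shape as the one above, repeated so the snippet is self-contained; the FakeLLM simply reports how many past turns appear in its prompt:

```python
class FakeLLM:
    """Stand-in for the LLM class: counts past turns visible in the prompt."""

    def prompt(self, prompt_text):
        return f"seen {prompt_text.count('[User said]')} past turns"


class Agent:  # same shape as the Agent class above
    def __init__(self, name, description, llm, memory=None):
        self.name = name
        self.description = description
        self.llm = llm
        self.memory = memory if memory is not None else []

    def execute(self, task):
        memory_context = "=== START OF MEMORY ===\n"
        for old_task, old_response in self.memory:
            memory_context += f"[User said]: {old_task}\n[You responded]: {old_response}\n\n"
        memory_context += "=== END OF MEMORY ===\n\n"
        response = self.llm.prompt(memory_context + f"Now the user says: {task}")
        self.memory.append((task, response))
        return response


agent = Agent("Demo", "demo agent", FakeLLM())
print(agent.execute("hello"))  # seen 0 past turns
print(agent.execute("again"))  # seen 1 past turns
```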
Manager Pattern That Analyzes and Delegates Work
```python
import os
import json

import openai

# Reuses the LLM and Agent classes defined in the previous section.


class Manager:
    def __init__(self, llm):
        self.llm = llm
        self.agents = {}

    def register_agent(self, agent):
        self.agents[agent.name] = agent

    def choose_agent_by_description(self, task):
        descriptions = []
        for agent in self.agents.values():
            descriptions.append(
                f"Name: {agent.name}\nDescription: {agent.description}"
            )
        prompt_text = (
            "You are an expert system that selects the best agent for a given user task.\n\n"
            "Agents are described as follows:\n"
            + "\n\n".join(descriptions)
            + "\n\n"
            "Given the user's task, reply with valid JSON in the format:\n"
            "{\n  \"chosen_agent\": \"NameOfTheAgent\"\n}\n\n"
            f"User task: {task}"
        )
        content = self.llm.prompt(prompt_text)
        parsed = json.loads(content)
        return parsed["chosen_agent"]

    def assign_task(self, task):
        chosen_name = self.choose_agent_by_description(task)
        return self.agents[chosen_name].execute(task)


def main():
    openai.api_key = os.getenv("OPENAI_API_KEY")
    llm = LLM()
    manager = Manager(llm)

    manager.register_agent(Agent("SummarizerAgent", "Summarizes text, articles, or documents.", llm))
    manager.register_agent(Agent("TranslatorAgent", "Translates text between languages.", llm))
    manager.register_agent(Agent("DataAnalysisAgent", "Analyzes data, finds insights, and explains trends.", llm))

    user_task = "Please summarize the following text: OpenAI is a leading AI research lab."
    result = manager.assign_task(user_task)
    print("Final Output:", result)


if __name__ == "__main__":
    main()
```
How It Works
- The Manager holds multiple agents and uses the LLM itself to decide which agent is best for a given user task, based on each agent’s description.
- The chosen agent then executes the task.
Planner That Can Be Used by the Manager
```python
import os
import json

import openai


class Planner:
    def __init__(self, llm):
        self.llm = llm

    def create_plan(self, goal):
        prompt_text = (
            "You are a planner system. Given the user's goal, respond in valid JSON with a set of steps.\n"
            "Format:\n"
            "{\n  \"steps\": [\"Step 1\", \"Step 2\"]\n}\n\n"
            f"Goal: {goal}"
        )
        return self.llm.prompt(prompt_text)


class Manager:
    def __init__(self, llm):
        self.llm = llm
        self.agents = {}

    def register_agent(self, agent):
        self.agents[agent.name] = agent

    def choose_agent_by_description(self, task):
        descriptions = []
        for agent in self.agents.values():
            descriptions.append(
                f"Name: {agent.name}\nDescription: {agent.description}"
            )
        prompt_text = (
            "You are an expert system that selects the best agent for a given user task.\n\n"
            "Agents are described as follows:\n"
            + "\n\n".join(descriptions)
            + "\n\n"
            "Given the user's task, reply with valid JSON in the format:\n"
            "{\n  \"chosen_agent\": \"NameOfTheAgent\"\n}\n\n"
            f"User task: {task}"
        )
        content = self.llm.prompt(prompt_text)
        parsed = json.loads(content)
        return parsed["chosen_agent"]

    def assign_task(self, task):
        chosen_name = self.choose_agent_by_description(task)
        return self.agents[chosen_name].execute(task)


def main():
    openai.api_key = os.getenv("OPENAI_API_KEY")
    llm = LLM()
    planner = Planner(llm)
    manager = Manager(llm)

    manager.register_agent(Agent("SummarizerAgent", "Summarizes text, articles, or documents.", llm))
    manager.register_agent(Agent("TranslatorAgent", "Translates text between languages.", llm))
    manager.register_agent(Agent("DataAnalysisAgent", "Analyzes data, finds insights, and explains trends.", llm))

    user_goal = "Plan a small online conference about AI"
    plan_json = planner.create_plan(user_goal)
    plan_dict = json.loads(plan_json)
    steps = plan_dict["steps"]

    for step in steps:
        result = manager.assign_task(step)
        print(result)


if __name__ == "__main__":
    main()
```
How It Works
- The Planner uses the LLM to generate a plan of discrete steps in JSON format.
- The Manager has multiple agents registered, each capable of handling different tasks.
- The plan is iterated over step by step, and each step is automatically assigned to the best agent.
Summary
These examples show how to implement common agent capabilities from scratch:
- Dynamic Function Calling without extra scaffolding
- Agent with memory, roles, and optional “tool” calling
- Manager Pattern for delegation to specialized agents
- Planner to create multi-step tasks and coordinate execution
While frameworks promise convenience, they often hide the fundamental mechanisms behind layers of abstraction. Starting from your own solution can yield a deeper understanding, more transparent debugging, and the freedom to tune each piece to your use case.
Last, but very important: are agents really agents?
Let’s revisit the definition of an agent. The most common definition of LLM agents focuses on dynamic workflow execution, where the specific flow is determined at runtime by the LLM. However, I argue that the term “agent” is overused in this context. It feels more like a catchy marketing phrase rather than a justified technical designation.
What the most popular frameworks offer is better described as an LLM-driven workflow engine supported by semantic function calling, with varying levels of dynamic behavior. In essence, they provide a structured way to execute workflows, but they fall short of embodying the full concept of an agent as traditionally defined in software engineering or artificial intelligence.
What Would Make a Real LLM Agent System?
In my opinion, a true LLM agent system would include the following elements:
- Role and Specialization: Agents should have well-defined roles, particularly useful for tasks like semantic search.
- Tool Calling (Semantic): The ability to invoke tools dynamically and contextually.
- LLM Processing: Integrating LLM calls effectively to process information and execute tasks.
- Memory: Including both short-term (STM) and long-term memory (LTM), with access to request, session, and global contexts, enabling stateful behavior.
- Dynamic Task Decomposition and Problem Solving: Agents should be capable of breaking down tasks dynamically and solving problems through managers, planners, or group chats.
- Agent Communication and Delegation: Effective communication between agents, including task delegation and routing.
… And Missing in Current Solutions:
- Persistence: Agents should operate independently of the caller, possibly running as separate threads or processes. They should be deployable and architected to function autonomously.
- Autonomous Behavior: True agents should have their own lifecycle, allowing them to observe their environment (including other agents) and react proactively. Existing solutions cannot be called agents until they are independently deployed and capable of functioning as standalone processes.
- Identity and Optimization: Agents should develop a unique identity through their memory and optimize their work based on past executions. This level of sophistication is absent in current implementations.
Food for Thought
The term “agent” is widely used but often misapplied in the context of LLM frameworks. To move beyond the current state, we would need to build systems that embody the characteristics of true agents: persistence, autonomy, and the ability to learn and evolve. Until then, what we call “agents” are closer to workflow engines—powerful and useful but far from realizing the full potential of autonomous software agents.