Deconstructing LLM Agents for software engineers - Beyond the Hype
Introduction
Agents, agents, agents everywhere. To be clear: AI agents. Oh, sorry—to be precise, LLM agents. A lot of buzz surrounds these agents. They were called the most important AI innovation of 2024, and now they’re touted as the hottest topic of 2025. But is this something we truly need? What exactly are they? Are all agents the same? Let’s dive deeper into the subject.
I’ll start by saying I’m a bit skeptical about the whole buzz. “Agents” seems to be a catchy term for yet another type of wrapper for LLMs—a good marketing name for AI based on language models. They come with their strengths and weaknesses, and adopting agents won’t magically solve all human problems. But first things first: let’s break things down to ensure we’re on the same page.
We’ll begin by clarifying what LLM agents are. Then, we’ll explore a few frameworks and compare their implementation approaches. Next, we’ll analyze how to implement agent-like properties using classical software tools. Finally, we’ll discuss whether the term ‘agent’ is really justified.
Agents: From General to AI to LLM
“Agent” is not a new term in engineering or in software engineering, so let’s take a look at already well-established definitions.
Classical Software Agent Definition
According to Techopedia, a software agent is defined as:
A piece of software that functions as an agent for a user or another program, working autonomously and continuously in a particular environment. It is inhibited by other processes and agents, but is also able to learn from its experience in functioning in an environment over a long period of time.
Attributes of autonomous software agents include activation without being invoked, perceiving and reacting to their environment, autonomy in decision-making and goal achievement, communication and collaboration with other agents, persistence and responsiveness.
Examples include early robotic vacuum cleaners, basic chatbots, traffic lights, and thermostats.
AI Agents
AI (or intelligent) agents are nothing new either. Let’s take a look at the Wikipedia definition:
An intelligent agent perceives its environment, takes actions autonomously in order to achieve goals, and may improve its performance with learning or acquiring knowledge.
Types of intelligent agents:
- Simple reflex agents: Respond directly to environmental stimuli without considering the context.
- Model-based reflex agents: Maintain an internal model of the world to respond to stimuli.
- Goal-based agents: Take actions guided by goal achievement.
- Utility-based agents: Choose actions based on a utility function to maximize outcomes.
- Learning agents: Improve performance over time by learning from their experiences.
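To make the first two categories concrete, here is a minimal illustrative sketch (my own, not taken from any framework) contrasting a simple reflex thermostat with a model-based one:

```python
class SimpleReflexThermostat:
    """Simple reflex agent: reacts only to the current stimulus."""

    def act(self, temperature):
        return "heat_on" if temperature < 20.0 else "heat_off"


class ModelBasedThermostat:
    """Model-based reflex agent: keeps an internal model (recent readings)
    to smooth out a noisy sensor before reacting."""

    def __init__(self):
        self.readings = []

    def act(self, temperature):
        self.readings.append(temperature)
        recent = self.readings[-3:]
        average = sum(recent) / len(recent)
        return "heat_on" if average < 20.0 else "heat_off"
```

A single noisy reading of 25°C flips the simple reflex agent off immediately, while the model-based agent keeps heating until the averaged readings actually warm up.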
LLM Agents
If you are familiar with LLM agent frameworks, you’ll notice that agents are defined differently than those presented above. Here, the term ‘agent’ refers to a much simpler responsibility, primarily focused on executing workflows driven by LLMs. Below are definitions of LLM agents provided by well-known tools and frameworks for comparison.
Anthropic
At Anthropic, we categorize all these variations as agentic systems, but draw an important architectural distinction between workflows and agents:
Workflows are systems where LLMs and tools are orchestrated through predefined code paths.
Agents, on the other hand, are systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks.
Google
A generative AI agent attempts to achieve a goal by observing and acting upon the world using tools.
CrewAI
An agent is an autonomous unit that performs tasks, makes decisions, uses tools, communicates and collaborates, maintains memory of interactions, delegates tasks.
Langgraph
An agent uses an LLM to decide the control flow of an application.
AutoGen
An agent communicates via messages, maintains state, and acts based on messages or state changes.
Semantic Kernel
An AI agent autonomously performs tasks, processes input, and takes actions to achieve goals.
So, compared to the original software agent or intelligent agent, do these systems truly deserve the term ‘agents’? We will revisit this question at the end of the article.
Existing Agent Frameworks Comparison
Now that we have a clearer understanding of what an LLM agent is, let’s examine some of the currently popular frameworks that claim to be agentic. We’ll compare their approaches to implementation and analyze their strengths and weaknesses. To better understand what agent collaboration may look like, we will show a practical example application implemented with one of the frameworks, CrewAI. The application will be an agent-based system that updates living system documentation and drafts unit tests when new project tickets arrive.
Introduction to CrewAI
CrewAI executes workflows by organizing agents into coordinated “crews.” These crews collaborate to handle tasks by breaking them down into smaller, role-specific actions. Each agent operates with a predefined role and goal, contributing its specialized expertise to achieve the collective objective.
Base Concepts of CrewAI
Agents:
- The fundamental building blocks in CrewAI.
- Each agent has a defined role, goal, and backstory, which guide its behavior and interactions.
- Agents process tasks.
Tasks:
- Represent units of work assigned to agents.
- Each task includes a description, expected output, and an associated agent.
- Tasks can be contextual, utilizing results from other tasks to enhance their outputs.
Crew:
- A collection of agents working together as a team.
- Crews coordinate the execution of tasks in sequential or parallel workflows.
- They utilize shared memory and communication channels to optimize collaboration and consistency.
Process:
- Defines how tasks are executed within the crew.
- Sequential processes ensure ordered execution, while hierarchical processes allow tasks to execute dynamically.
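As a toy illustration in plain Python (not CrewAI’s actual API), a sequential process is essentially a fold over tasks, where each task receives the accumulated results of earlier tasks as context:

```python
def run_sequential(tasks, inputs):
    """Run (name, task_fn) pairs in order; each result is added to the shared context."""
    context = dict(inputs)
    for name, task_fn in tasks:
        context[name] = task_fn(context)
    return context


# Each "task" reads from the shared context and contributes its own result.
pipeline = [
    ("analysis", lambda ctx: f"analyzed {ctx['ticket']}"),
    ("documentation", lambda ctx: f"docs based on: {ctx['analysis']}"),
]
result = run_sequential(pipeline, {"ticket": "TICKET-42"})
print(result["documentation"])  # docs based on: analyzed TICKET-42
```

A hierarchical process replaces the fixed ordering with a manager that decides at runtime which task to run next.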
Configuration
The configuration files define agents and tasks:
Agents: Each agent has a role, goal, and backstory. For instance:
```yaml
business_analyst:
  role: Business Analyst
  goal: Analyze functional tickets to extract entities and business rules.
  backstory: Experienced in interpreting business requirements.

technical_analyst:
  role: Technical Analyst
  goal: Analyze technical tickets to extract technical details.
  backstory: Skilled in understanding technical specifications.

domain_expert:
  role: Domain Expert
  goal: Extract and describe system domain behaviors.
  backstory: Proficient in identifying system operations and domain processes.

technical_writer:
  role: Technical Writer
  goal: Update system documentation based on analyses.
  backstory: Expert in documenting technical content.

test_engineer:
  role: Test Engineer
  goal: Draft unit test cases based on analyses.
  backstory: Experienced in creating unit test cases for system functionalities.
```
Tasks: Tasks specify the work to be performed, expected output, and the responsible agent. For example:
```yaml
analyze_functional_ticket:
  description: >
    Analyze the functional ticket to extract entities and business rules.
  expected_output: >
    Your task is to analyze functional tickets provided by the user.
    Each ticket includes specific details about requested functionalities,
    such as requirements and constraints.
    Your analysis should identify:
    - Entities involved in the ticket (e.g., users, systems, data).
    - Business rules governing the functionality described in the ticket.
    Based on the provided ticket details:
    Title: {title}
    Description: {description}
    Acceptance criteria: {acceptance_criteria}
    Analyze and provide the following:
    - Entities involved as list
    - Business rules relevant to the functionality as list
  agent: business_analyst

update_documentation:
  description: Update the system documentation using the analysis results.
  expected_output: >
    Incorporate details about entities, business rules, and domain behaviors
    to create comprehensive documentation.
    Based on the analysis results, create updated structured documentation for the system.
  agent: technical_writer
  context: [analyze_functional_ticket, analyze_technical_ticket, extract_domain_behaviors]
```
Code
The implementation lives in a DocumentationUpdaterCrew class that integrates agents and tasks. The class includes methods for creating agents, defining tasks, and orchestrating workflows.
```python
import logging

from crewai import Agent, Crew, Process, Task
from crewai.project import CrewBase, agent, crew, task


@CrewBase
class DocumentationUpdaterCrew:
    agents_config = 'config/agents.yml'
    tasks_config = 'config/tasks.yml'

    @agent
    def business_analyst(self) -> Agent:
        return Agent(config=self.agents_config['business_analyst'], verbose=True)

    @agent
    def technical_analyst(self) -> Agent:
        return Agent(config=self.agents_config['technical_analyst'], verbose=True)

    @agent
    def technical_writer(self) -> Agent:
        return Agent(config=self.agents_config['technical_writer'], verbose=True)

    @task
    def analyze_functional_ticket(self) -> Task:
        return Task(config=self.tasks_config['analyze_functional_ticket'])

    @task
    def update_documentation(self) -> Task:
        return Task(config=self.tasks_config['update_documentation'])

    # Additional agents and tasks can be defined as needed for other roles and workflows.

    @crew
    def crew(self) -> Crew:
        return Crew(
            agents=self.agents,
            tasks=self.tasks,
            process=Process.sequential,
        )

    def handle_new_ticket(self, ticket_details: dict):
        logging.debug(f"Received new ticket: {ticket_details}")
        self.crew().kickoff(inputs=ticket_details)
```
Agents here are invoked sequentially within the DocumentationUpdaterCrew class. Each agent, such as the business_analyst or technical_writer, is designed to handle specific tasks based on its configuration. Tasks are defined with inputs and expected outputs, and the process is managed sequentially. Results from one task, like analyzing functional tickets, are passed as context to subsequent tasks, such as updating documentation. This setup ensures a logical flow of information and collaborative task execution among the agents. However, this is not the only possible configuration. A more dynamic and agentic approach would be a hierarchical process, where an overarching manager agent dynamically delegates tasks to the other agents. This method gives you more flexibility and can adapt better to complex problems, but comes with more indirection while solving a problem.
Due to the limited size of this article, it is not possible to showcase different implementations across all frameworks. That topic is well-suited for a separate, dedicated article. Here, I will focus on describing four selected frameworks, highlighting their unique characteristics and approaches. The selection is somewhat arbitrary; the main rationale is that each of the presented tools has unique properties.
Selected Framework Characteristics
CrewAI: You have already seen CrewAI in action (or rather, in implementation). In addition, the framework can use dynamic tool calling, and its memory management (including short-term memory, entity memory, and user memory) enhances task execution. While the memory system gives agents some individuality, the current implementation is limited to suggestions from previous interactions rather than fully preserving historical context.
LangGraph: LangGraph adopts a graph-based workflow model. Tasks are represented as nodes, and transitions between nodes depend on evaluations made at each node. These evaluations may be deterministic, based on boolean logic, or fuzzy, leveraging LLM capabilities. A shared state mechanism (GraphState) enables nodes to read, update, and share additional data. While LangGraph’s approach offers clarity and structure, it lacks dynamic task recognition and problem-solving capabilities, making it a less agentic solution overall.
AutoGen: AutoGen represents the most dynamic and agentic framework in terms of autonomy. It utilizes specialized agents in a shared conversational environment. These agents collaborate through strategies like group chat, round-robin discussions, or random selection, working together to solve problems. Conversations are summarized and passed between agents as needed. The discussion process ends based on predefined rules, allowing for iterative problem-solving.
Semantic Kernel: Semantic Kernel focuses on grouping functions into skills, which can either be semantic (LLM-powered) or native (traditional code). These skills can be chained to form complex workflows. The kernel functions as a manager, executing flows and sharing context across functions. Semantic Kernel’s planner feature introduces dynamic problem-solving capabilities. Recent updates have incorporated agents in a manner similar to AutoGen.
Comparison of Frameworks
Framework | Role/Specialization | Tool Calling | Memory (STM, LTM, Context Access) | LLM Processing | Dynamic Problem Solving | Agent Communication/Delegation |
---|---|---|---|---|---|---|
CrewAI | Defined roles with goals and backstories | Supported | STM, LTM, entity/user memory | LLM calls | Manager-driven, hierarchical process | Delegation |
LangGraph | Graph-based, no explicit roles | Supported | Shared state (GraphState) | LLM calls | Node-based decisions | Limited by the graph |
AutoGen | Specialized roles in group chat | Supported | Shared conversation memory | LLM calls | Group chat, round-robin or other strategy | Discussion summarization |
Semantic Kernel | Skill-oriented functions, AutoGen like agents | Supported | Request, session, global memory | LLM calls | Dynamic planner features | Agents like in AutoGen |
Implementing Agent Behaviors Without Frameworks
Frameworks might be helpful, but they also come with a price. Each framework implements specific mechanisms—manipulating prompts, extending them, or making additional calls—in its own way. It’s easy to be trapped by the indirection they add, making it difficult to debug or reason about efficiency and performance.
That’s why I advise against jumping into frameworks prematurely unless you have substantial time or a thorough understanding of the underlying mechanisms. Many of the so-called “agentic” patterns are actually straightforward for experienced software engineers to replicate.
Below, you’ll see how to implement each major agent pattern in plain Python with OpenAI’s library, no additional frameworks required. The examples are not production-ready, but they can give you a glimpse of what such an implementation can look like.
Dynamic Function Calling With OpenAI’s Python Library
This snippet uses the function-calling feature of OpenAI’s Chat Completions API (via the pre-v1 openai Python library interface) to either search a database or summarize text, depending on the user input. We then parse the LLM’s response to see which function was called and execute it in Python.
```python
import os
import json

import openai

openai.api_key = os.getenv("OPENAI_API_KEY")


def search_database(query):
    return f"Pretending to search the database for: {query}"


def summarize_text(text):
    return f"Summary of the text: {text}"


def call_function(func_name, func_args):
    if func_name == "search_database":
        return search_database(func_args.get("query", ""))
    elif func_name == "summarize_text":
        return summarize_text(func_args.get("text", ""))
    else:
        raise ValueError(f"Unknown function: {func_name}")


def main():
    user_message = input("User message: ")
    response = openai.ChatCompletion.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": user_message}],
        functions=[
            {
                "name": "search_database",
                "description": "Searches for information in a database.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "query": {
                            "type": "string",
                            "description": "A query string to search for in the database"
                        }
                    },
                    "required": ["query"]
                }
            },
            {
                "name": "summarize_text",
                "description": "Summarizes a given piece of text.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "text": {
                            "type": "string",
                            "description": "The text to summarize"
                        }
                    },
                    "required": ["text"]
                }
            }
        ],
        function_call="auto"
    )

    finish_reason = response["choices"][0]["finish_reason"]
    if finish_reason == "function_call":
        # The function call lives on the message object in this API version.
        func_name = response["choices"][0]["message"]["function_call"]["name"]
        raw_args = response["choices"][0]["message"]["function_call"]["arguments"]
        func_args = json.loads(raw_args)
        result = call_function(func_name, func_args)
        print("Function Call Result:", result)
    else:
        # LLM responded with a direct text answer
        direct_answer = response["choices"][0]["message"]["content"]
        print("LLM Response:", direct_answer)


if __name__ == "__main__":
    main()
```
How It Works
- Function definitions: We define two local functions in Python, search_database and summarize_text. Each returns a simple mock response.
- call_function dispatch: When the LLM chooses which function to call, we route the request through call_function(func_name, func_args), which invokes the correct function based on func_name.
- function_call="auto": By setting function_call="auto", the LLM decides if (and which) function to call. If the user request fits the description of one of our registered functions, the LLM returns a finish_reason of "function_call", along with the chosen function name and JSON-encoded arguments.
- Response handling: If finish_reason == "function_call", we parse function_call["arguments"] from JSON, invoke the corresponding local Python function, and print its result. Otherwise, we print the LLM’s direct response (i.e., the LLM didn’t see the need to call a function).
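The dispatch step itself needs no API key, so you can exercise it in isolation. Here is a small sketch that reuses the functions defined above and simulates the JSON-encoded arguments the LLM would return:

```python
import json


def search_database(query):
    return f"Pretending to search the database for: {query}"


def summarize_text(text):
    return f"Summary of the text: {text}"


def call_function(func_name, func_args):
    if func_name == "search_database":
        return search_database(func_args.get("query", ""))
    elif func_name == "summarize_text":
        return summarize_text(func_args.get("text", ""))
    raise ValueError(f"Unknown function: {func_name}")


# Simulate the JSON-encoded arguments the LLM would hand back.
raw_args = '{"query": "LLM agents"}'
result = call_function("search_database", json.loads(raw_args))
print(result)  # Pretending to search the database for: LLM agents
```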
Object Representation of an Agent With Memory
The following code shows how each Agent can define and use its role, store previous prompts and responses in memory, and then reuse them, tagging them clearly under “START OF MEMORY” and “END OF MEMORY”. You could also integrate tool calling here, in the so-called ReAct pattern.
```python
import openai


class LLM:
    def __init__(self, model="gpt-4-0613"):
        self.model = model

    def prompt(self, prompt_text):
        response = openai.ChatCompletion.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt_text}]
        )
        return response["choices"][0]["message"]["content"]


class Agent:
    def __init__(self, name, description, llm, memory=None):
        self.name = name
        self.description = description
        self.llm = llm
        self.memory = memory if memory is not None else []

    def execute(self, task):
        memory_context = "=== START OF MEMORY ===\n"
        for old_task, old_response in self.memory:
            memory_context += f"[User said]: {old_task}\n[You responded]: {old_response}\n\n"
        memory_context += "=== END OF MEMORY ===\n\n"
        prompt_text = memory_context + f"Now the user says: {task}"
        response = self.llm.prompt(prompt_text)
        self.memory.append((task, response))
        return response
```
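To see the memory mechanism work without an API key, you can substitute a stub for the LLM class. The Agent below has the same shape as the one above, repeated so the snippet is self-contained; the FakeLLM simply reports how many past turns appear in its prompt:

```python
class FakeLLM:
    """Stand-in for the LLM class: counts past turns visible in the prompt."""

    def prompt(self, prompt_text):
        return f"seen {prompt_text.count('[User said]')} past turns"


class Agent:  # same shape as the Agent class above
    def __init__(self, name, description, llm, memory=None):
        self.name = name
        self.description = description
        self.llm = llm
        self.memory = memory if memory is not None else []

    def execute(self, task):
        memory_context = "=== START OF MEMORY ===\n"
        for old_task, old_response in self.memory:
            memory_context += f"[User said]: {old_task}\n[You responded]: {old_response}\n\n"
        memory_context += "=== END OF MEMORY ===\n\n"
        response = self.llm.prompt(memory_context + f"Now the user says: {task}")
        self.memory.append((task, response))
        return response


agent = Agent("Demo", "demo agent", FakeLLM())
print(agent.execute("hello"))  # seen 0 past turns
print(agent.execute("again"))  # seen 1 past turns
```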
Manager Pattern That Analyzes and Delegates Work
```python
import os
import json

import openai

# Reuses the LLM and Agent classes defined in the previous section.


class Manager:
    def __init__(self, llm):
        self.llm = llm
        self.agents = {}

    def register_agent(self, agent):
        self.agents[agent.name] = agent

    def choose_agent_by_description(self, task):
        descriptions = []
        for agent in self.agents.values():
            descriptions.append(
                f"Name: {agent.name}\nDescription: {agent.description}"
            )
        prompt_text = (
            "You are an expert system that selects the best agent for a given user task.\n\n"
            "Agents are described as follows:\n"
            + "\n\n".join(descriptions)
            + "\n\n"
            "Given the user's task, reply with valid JSON in the format:\n"
            "{\n  \"chosen_agent\": \"NameOfTheAgent\"\n}\n\n"
            f"User task: {task}"
        )
        content = self.llm.prompt(prompt_text)
        parsed = json.loads(content)
        return parsed["chosen_agent"]

    def assign_task(self, task):
        chosen_name = self.choose_agent_by_description(task)
        return self.agents[chosen_name].execute(task)


def main():
    openai.api_key = os.getenv("OPENAI_API_KEY")
    llm = LLM()
    manager = Manager(llm)

    manager.register_agent(Agent("SummarizerAgent", "Summarizes text, articles, or documents.", llm))
    manager.register_agent(Agent("TranslatorAgent", "Translates text between languages.", llm))
    manager.register_agent(Agent("DataAnalysisAgent", "Analyzes data, finds insights, and explains trends.", llm))

    user_task = "Please summarize the following text: OpenAI is a leading AI research lab."
    result = manager.assign_task(user_task)
    print("Final Output:", result)


if __name__ == "__main__":
    main()
```
How It Works
- The Manager holds multiple agents and uses the LLM itself to decide which agent is best for a given user task, based on each agent’s description.
- The chosen agent then executes the task.
Planner That Can Be Used by the Manager
```python
import os
import json

import openai


class Planner:
    def __init__(self, llm):
        self.llm = llm

    def create_plan(self, goal):
        prompt_text = (
            "You are a planner system. Given the user's goal, respond in valid JSON with a set of steps.\n"
            "Format:\n"
            "{\n  \"steps\": [\"Step 1\", \"Step 2\"]\n}\n\n"
            f"Goal: {goal}"
        )
        return self.llm.prompt(prompt_text)


class Manager:
    def __init__(self, llm):
        self.llm = llm
        self.agents = {}

    def register_agent(self, agent):
        self.agents[agent.name] = agent

    def choose_agent_by_description(self, task):
        descriptions = []
        for agent in self.agents.values():
            descriptions.append(
                f"Name: {agent.name}\nDescription: {agent.description}"
            )
        prompt_text = (
            "You are an expert system that selects the best agent for a given user task.\n\n"
            "Agents are described as follows:\n"
            + "\n\n".join(descriptions)
            + "\n\n"
            "Given the user's task, reply with valid JSON in the format:\n"
            "{\n  \"chosen_agent\": \"NameOfTheAgent\"\n}\n\n"
            f"User task: {task}"
        )
        content = self.llm.prompt(prompt_text)
        parsed = json.loads(content)
        return parsed["chosen_agent"]

    def assign_task(self, task):
        chosen_name = self.choose_agent_by_description(task)
        return self.agents[chosen_name].execute(task)


def main():
    openai.api_key = os.getenv("OPENAI_API_KEY")
    llm = LLM()
    planner = Planner(llm)
    manager = Manager(llm)

    manager.register_agent(Agent("SummarizerAgent", "Summarizes text, articles, or documents.", llm))
    manager.register_agent(Agent("TranslatorAgent", "Translates text between languages.", llm))
    manager.register_agent(Agent("DataAnalysisAgent", "Analyzes data, finds insights, and explains trends.", llm))

    user_goal = "Plan a small online conference about AI"
    plan_json = planner.create_plan(user_goal)
    plan_dict = json.loads(plan_json)
    steps = plan_dict["steps"]

    for step in steps:
        result = manager.assign_task(step)
        print(result)


if __name__ == "__main__":
    main()
```
How It Works
- The Planner uses the LLM to generate a plan of discrete steps in JSON format.
- The Manager has multiple agents registered, each capable of handling different tasks.
- The plan is iterated over step by step, and each step is automatically assigned to the best agent.
Summary
These examples show how to implement common agent capabilities from scratch:
- Dynamic Function Calling without extra scaffolding
- Agent with memory, roles, and optional “tool” calling
- Manager Pattern for delegation to specialized agents
- Planner to create multi-step tasks and coordinate execution
While frameworks promise convenience, they often hide the fundamental mechanisms behind layers of abstraction. Starting from your own solution can yield a deeper understanding, more transparent debugging, and the freedom to tune each piece to your use case.
Last, but very important: are agents really agents?
Let’s revisit the definition of an agent. The most common definition of LLM agents focuses on dynamic workflow execution, where the specific flow is determined at runtime by the LLM. However, I argue that the term “agent” is overused in this context. It feels more like a catchy marketing phrase rather than a justified technical designation.
What the most popular frameworks offer is better described as an LLM-driven workflow engine supported by semantic function calling, with varying levels of dynamic behavior. In essence, they provide a structured way to execute workflows, but they fall short of embodying the full concept of an agent as traditionally defined in software engineering or artificial intelligence.
What Would Make a Real LLM Agent System?
In my opinion, a true LLM agent system would include the following elements:
- Role and Specialization: Agents should have well-defined roles, particularly useful for tasks like semantic search.
- Tool Calling (Semantic): The ability to invoke tools dynamically and contextually.
- LLM Processing: Integrating LLM calls effectively to process information and execute tasks.
- Memory: Including both short-term (STM) and long-term memory (LTM), with access to request, session, and global contexts, enabling stateful behavior.
- Dynamic Task Decomposition and Problem Solving: Agents should be capable of breaking down tasks dynamically and solving problems through managers, planners, or group chats.
- Agent Communication and Delegation: Effective communication between agents, including task delegation and routing.
… And Missing in Current Solutions:
- Persistence: Agents should operate independently of the caller, possibly running as separate threads or processes. They should be deployable and architected to function autonomously.
- Autonomous Behavior: True agents should have their own lifecycle, allowing them to observe their environment (including other agents) and react proactively. Existing solutions cannot be called agents until they are independently deployed and capable of functioning as standalone processes.
- Identity and Optimization: Agents should develop a unique identity through their memory and optimize their work based on past executions. This level of sophistication is absent in current implementations.
Food for Thought
The term “agent” is widely used but often misapplied in the context of LLM frameworks. To move beyond the current state, we would need to build systems that embody the characteristics of true agents: persistence, autonomy, and the ability to learn and evolve. Until then, what we call “agents” are closer to workflow engines—powerful and useful but far from realizing the full potential of autonomous software agents.