Programming with LLMs for programmers - Beyond the Hype
- Mariusz Sieraczkiewicz
- Programming, Artificial intelligence, Software development
- December 15, 2024
Introduction
The field of software development is currently abuzz with discussions about Large Language Models (LLMs), AI agents, and autonomous systems. New libraries and frameworks emerge daily, each promising to revolutionize our coding practices. However, amidst this flurry of buzzwords and complex tools, it’s easy to overlook that many programming tasks involving LLMs can be accomplished with a straightforward approach; it’s not rocket science. While crafting production-ready LLM-based systems is not trivial, using LLMs as part of a system is quite manageable, especially if you avoid over-reliance on the abundance of dedicated frameworks.
We’ll advocate for an evolutionary path: start with basic prompts, then gradually add complexity through parameterization, context, and pipelines, resorting to more intricate agent architectures only when truly necessary. To demonstrate these concepts, we’ll use Python with the OpenAI API to build a customizable code review tool.
Basic Prompt Example
graph LR
  A[System] -->|Prompt| B[LLM]
  B -->|Generates Output| C(Response)
Let’s dive right in. We’ll start by exploring how to use prompts to perform basic code reviews with an LLM. As you will see, it is not difficult at all. Just input and output. Could anything be easier for a seasoned programmer?
We will use the OpenAI API in Python to perform a general code review with a simple prompt. Why Python? I work daily with the JVM and Spring AI, but Python keeps the examples short – and if you are not familiar with it, GenAI can explain it anytime you want! So let’s see the example. First, make sure you have the openai and python-dotenv packages installed and that you have created a .env file with your OpenAI API key:
OPENAI_API_KEY=your_actual_api_key_here
from openai import OpenAI
from dotenv import load_dotenv
import os
load_dotenv()
def send_prompt_to_llm(prompt):
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": prompt}
],
temperature=0.0,
)
return response.choices[0].message.content.strip()
def generate_code_review():
prompt = f"""
Review the following Python code and provide feedback on its style, logic, and potential improvements:
def add_numbers(x, y):
result = x+ y
return result
"""
return send_prompt_to_llm(prompt)
review = generate_code_review()
print(review)
We’ve also added a helper function send_prompt_to_llm that handles sending the prompt to the LLM and retrieving the response – it will be handy later. The generate_code_review function builds the prompt and sends it to the LLM. Easy, isn’t it? Especially since the core and unique part of the implementation is this:
def generate_code_review():
prompt = f"""
Review the following Python code and provide feedback on its style, logic, and potential improvements:
def add_numbers(x, y):
result = x+ y
return result
"""
return send_prompt_to_llm(prompt)
Prompt Engineering
As you can see, we have something working; we even get some results. But the problem is that the review quality is largely out of our control. If we want to have more control over what we get, we need to be more precise. This is where prompt engineering comes into play. There’s a wealth of information about it online, and even the LLM itself can guide you if you ask. However, mastering this skill is crucial for effectively using LLMs in programming, and it will require significant time and effort through trial and error. Without going into too much detail, here are some techniques for crafting more effective code review prompts:
- Be Explicit: Clearly state the desired focus of the review. Do you want feedback on performance, security, adherence to a specific style guide, or something else? Specify it clearly.
- Provide Examples: Including examples of good and bad code within your prompt can help the LLM understand your expectations and provide more relevant feedback (see the short sketch after this list).
- Specify Constraints: If you have specific requirements, such as maximum line length or preferred coding conventions, state them explicitly in your prompt.
- Specify Output Expectations: Clearly define the desired format and structure of the output. For example, you might want the review in the form of a bulleted list, a table, or a JSON object. You can also specify the level of detail you expect in the feedback.
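For instance, the Provide Examples technique could look like the minimal sketch below – the good and bad snippets are invented for illustration, and send_prompt_to_llm is the helper defined earlier:
def generate_code_review_with_examples(code):
    # Few-shot style prompt: show the model what we consider good and bad code
    # so it can calibrate its feedback to our expectations.
    prompt = f"""
Review the following Python code for naming and readability.

Example of code we consider good:
def calculate_total(prices):
    return sum(prices)

Example of code we consider bad (unclear names, no spacing):
def c(p):return sum(p)

<code_to_review>
{code}
</code_to_review>
"""
    return send_prompt_to_llm(prompt)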
Improved Prompt Example - Focusing on Specific Aspects
Let’s refine our code review prompt to focus on specific aspects and define the output expectations:
def generate_code_review():
prompt = f"""
Review the following Python code. Focus on:
1. **Adherence to PEP 8 style guidelines.**
2. **Potential performance bottlenecks.**
3. **Readability and clarity of the code.**
Provide specific suggestions for improvement in a JSON format, with the following structure:
<expected_output>
{{
"overall_assessment": "A brief summary of the code's overall quality.",
"specific_feedback": [
{{
"topic": "The specific aspect being addressed (e.g., 'PEP 8', 'Performance').",
"comment": "Detailed feedback on the topic.",
"suggestion": "Specific suggestion for improvement."
}}
]
}}
</expected_output>
<code_to_review>
def add_numbers(x, y):
result = x+ y
return result
</code_to_review>
"""
return send_prompt_to_llm(prompt)
review = generate_code_review()
print(review)
Here, we’ve refined the prompt by explicitly stating the areas we want the LLM to focus on: PEP 8 compliance, performance, and readability. We also encourage specific suggestions for improvement. Most importantly, we’ve specified that the output should be in JSON format with a specific structure, including an overall assessment and a list of specific feedback items, each with a topic, comment, and suggestion. These details help guide the LLM toward generating a more targeted, structured, and helpful code review. We’ve also added some structuring that helps the model correctly understand what we have written – Markdown can be very useful, and XML-style tags work well for longer parts of text.
Prompt Templates
Static prompts are useful, but real-world code review often requires runtime customization. This is where parameterized prompts come into play. Instead of hardcoding review criteria into your prompt, you can use placeholders that are filled in with data at runtime (e.g., for a specific language, repo, or project). We can use string interpolation if your language supports it, simply concatenate strings, or add some replacement logic.
Example - Enforcing Custom Naming Conventions
Let’s say you want to enforce a specific naming convention, for example, that all function names must be in snake_case. Here’s how you could create a parameterized prompt for that:
def generate_code_review(code):
prompt = f"""
Review the following Python code. Focus on:
1. **Adherence to PEP 8 style guidelines.**
2. **Potential performance bottlenecks.**
3. **Readability and clarity of the code.**
4. **Function names should be in snake_case.**
Provide specific suggestions for improvement in a JSON format, with the following structure:
<expected_output>
{{
"overall_assessment": "A brief summary of the code's overall quality.",
"specific_feedback": [
{{
"topic": "The specific aspect being addressed (e.g., 'PEP 8', 'Performance', 'Naming Convention').",
"comment": "Detailed feedback on the topic.",
"suggestion": "Specific suggestion for improvement."
}}
]
}}
</expected_output>
<code_to_review>
{code}
</code_to_review>
"""
return send_prompt_to_llm(prompt)
code_with_camel_case = """
def addNumbers(x, y):
result = x + y
return result
"""
review = generate_code_review(code_with_camel_case)
print(review)
This relatively simple technique is called a prompt template, and it is the basic method for runtime prompt customization and for enriching prompts with custom data.
A few words about Tool Calling
Tool calling is another foundational tool in the toolbox. Suppose you would like to delegate to the LLM the decision that something additional needs to be done (e.g., retrieving more files, searching the web). This is where function calling comes into play. The LLM will analyze the prompt, identify the need for a tool, specify which tool to use, and provide the necessary arguments. Your application then executes that function and returns the result to the LLM, allowing it to complete the initial task.
Tool calling is not magic, but it is not the most elegant part of LLM interaction using the plain OpenAI API, so we will leave the details for another time.
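Still, to give a flavor of the mechanics, here is a minimal sketch using the tools parameter of the Chat Completions API; the get_file_content tool and its schema are invented for illustration:
import os
import json
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

def review_with_tools(prompt):
    client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
    # Describe a hypothetical tool the model may ask us to call,
    # e.g., fetching another file it needs to complete the review.
    tools = [{
        "type": "function",
        "function": {
            "name": "get_file_content",
            "description": "Return the content of a file from the repository.",
            "parameters": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"],
            },
        },
    }]
    messages = [{"role": "user", "content": prompt}]
    response = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)
    message = response.choices[0].message
    if message.tool_calls:
        messages.append(message)  # keep the assistant's tool request in the history
        for call in message.tool_calls:
            arguments = json.loads(call.function.arguments)
            result = open(arguments["path"]).read()  # our own code executes the tool
            messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
        response = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)
    return response.choices[0].message.content
The key point: the model only asks for the tool – your code decides whether and how to actually run it.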
Adding Context
graph LR
  A[System] -->|Parameters| B(Parameterized Prompt)
  B --> C(Filled Prompt)
  C -->|Sent to LLM| D[LLM]
  D -->|Generates Code Review| E(Result)
  subgraph Context
    F(Coding Standards) -->|Provides Context| C
    G(Other data) -->|Provides Context| C
  end
The previous example is already quite useful, but real-world code review often requires customization based on specific project guidelines, coding standards, or runtime variables that can be defined, for example, on an external wiki page. This is where adding custom context comes into play, and prompt templates are a great means to do it. We will add an additional parameter to our method to dynamically provide rules for code review.
Example - Enforcing Custom Rules and Referencing a Style Guide
def generate_code_review(code, custom_rules):
prompt = f"""
Review the following Python code. Focus on:
{custom_rules}
Provide specific suggestions for improvement in a JSON format, with the following structure:
<expected_output>
{{
"overall_assessment": "A brief summary of the code's overall quality.",
"specific_feedback": [
{{
"topic": "The specific aspect being addressed (e.g., 'PEP 8', 'Performance', 'Custom Rule').",
"comment": "Detailed feedback on the topic.",
"suggestion": "Specific suggestion for improvement."
}}
]
}}
</expected_output>
<code_to_review>
{code}
</code_to_review>
"""
return send_prompt_to_llm(prompt)
code_with_camel_case = """
def addNumbers(x, y):
result = x + y
return result
"""
custom_rules = """
1. **Adherence to PEP 8 style guidelines.**
2. **Potential performance bottlenecks.**
3. **Readability and clarity of the code.**
4. **Function names should be in snake_case and all variables should be descriptive.**
"""
review = generate_code_review(code_with_camel_case, custom_rules)
print(review)
In this example, we’ve added a custom_rules parameter to the generate_code_review function. This allows us to dynamically inject specific rules into the prompt. We can now enforce different naming conventions, coding standards, or any other project-specific requirements.
Adding Memory (Conversation History)
graph LR
  A[System] -->|Parameters| B(Parameterized Prompt)
  B --> C(Filled Prompt)
  C -->|Sent to LLM| D[LLM]
  D -->|Generates Response| E(Result)
  subgraph Context
    G(Conversation Memory) -->|Provides History| C
  end
While the above examples demonstrate how to tailor prompts, LLMs, by default, process each prompt in isolation (in a stateless fashion). To create more sophisticated interactions, especially when building assistants, we need to introduce the concept of memory. This means storing previous interactions and using them to inform future responses. For our code review tool, this could mean remembering previous code review comments. What for? As you know, LLMs are not perfect, especially when dealing with larger prompts, so we could pass the same code for review twice, providing the previous results in the next turn, to generate additional code review comments. Here is how it could look:
from openai import OpenAI
from dotenv import load_dotenv
import os
import json
load_dotenv()
def send_prompt_to_llm(prompt, conversation_history=None):
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
messages = [{"role": "system", "content": "You are a helpful assistant."}]
if conversation_history:
messages.extend(conversation_history)
messages.append({"role": "user", "content": prompt})
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=messages,
temperature=0.0,
)
return response.choices[0].message.content.strip()
def generate_code_review_with_memory(code, custom_rules, conversation_history=None):
prompt = f"""
Review the following Python code. Focus on:
{custom_rules}
Only add new suggestions that haven't been added so far. Don't repeat suggestions.
Provide specific suggestions for improvement in a JSON format, with the following structure:
<expected_output>
{{
"overall_assessment": "A brief summary of the code's overall quality.",
"specific_feedback": [
{{
"topic": "The specific aspect being addressed (e.g., 'PEP 8', 'Performance', 'Custom Rule').",
"comment": "Detailed feedback on the topic.",
"suggestion": "Specific suggestion for improvement."
}}
]
}}
</expected_output>
<code_to_review>
{code}
</code_to_review>
"""
return send_prompt_to_llm(prompt, conversation_history)
code_with_camel_case = """
def addNumbers(x, y):
result = x + y
return result
"""
custom_rules = """
1. **Adherence to PEP 8 style guidelines.**
2. **Potential performance bottlenecks.**
3. **Readability and clarity of the code.**
4. **Function names should be in snake_case and all variables should be descriptive.**
"""
conversation = []
review1 = generate_code_review_with_memory(code_with_camel_case, custom_rules, conversation_history=conversation)
conversation.append({"role": "user", "content": f"""
Review the following Python code.
Focus on:
<custom_rules>
{custom_rules}
</custom_rules>
<code_to_review>
{code_with_camel_case}
</code_to_review>
"""})
conversation.append({"role": "assistant", "content": review1})
print("Review 1:\n", review1)
review2 = generate_code_review_with_memory(code_with_camel_case, custom_rules, conversation_history=conversation)
print("\nReview 2 (with memory):\n", review2)
We have modified the send_prompt_to_llm function to accept an optional conversation_history parameter, which is applied to the core OpenAI chat call, attaching the history of the conversation:
if conversation_history:
messages.extend(conversation_history)
messages.append({"role": "user", "content": prompt})
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=messages,
temperature=0.0,
)
The first review is generated without any history:
conversation = []
review1 = generate_code_review_with_memory(code_with_camel_case, custom_rules, conversation_history=conversation)
Then the prompt and response from the first review are added to the conversation list. The second review is generated with the conversation history, allowing the LLM to be aware of the previous review.
We have also added an instruction in the prompt telling the model how to use the history: Only add new suggestions that haven’t been added so far. Don’t repeat suggestions.
Introducing a second pass with memory makes our code reviewer stronger.
Retrieval Augmented Generation (RAG)
graph LR
  A[System] -->|Parameters| B(Parameterized Prompt)
  B --> C(Filled Prompt)
  C -->|Sent to LLM| D[LLM]
  D -->|Generates Code Review| E(Result)
  subgraph Context
    G(Additional external data) -->|Provides More Context| C
  end
Retrieval Augmented Generation (RAG) is another buzzword that has gained popularity in the GenAI space. In essence, RAG refers to the process of retrieving data from external resources and incorporating it into the prompt. This allows the LLM to access information beyond its training data, making it more accurate and contextually aware. Often, vector databases are used to store and retrieve this external data, but we will not dig into this topic in this article to keep it focused.
In the context of our code review tool, RAG could be used to retrieve archival review comments on similar parts of code. This would allow the LLM to treat these past reviews as a reference, providing more consistent and informed feedback. Let’s see how this could be implemented:
from openai import OpenAI
from dotenv import load_dotenv
import os
load_dotenv()
def send_prompt_to_llm(prompt):
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": prompt}
],
temperature=0.0,
)
return response.choices[0].message.content.strip()
def generate_code_review_with_rag(code, custom_rules, similar_reviews=None):
prompt = f"""
Review the following Python code. Focus on:
{custom_rules}
Provide specific suggestions for improvement in a JSON format, with the following structure:
<expected_output>
{{
"overall_assessment": "A brief summary of the code's overall quality.",
"specific_feedback": [
{{
"topic": "The specific aspect being addressed (e.g., 'PEP 8', 'Performance', 'Custom Rule').",
"comment": "Detailed feedback on the topic.",
"suggestion": "Specific suggestion for improvement."
}}
]
}}
</expected_output>
<code_to_review>
{code}
</code_to_review>
"""
if similar_reviews:
prompt += f"""
Examples of previous code review remarks on similar code:
{similar_reviews}
Use these examples as a reference for adding comments to the current code review.
"""
return send_prompt_to_llm(prompt)
code_to_review = """
def addNumbers(x, y):
result = x + y
return result
"""
custom_rules = """
1. **Adherence to PEP 8 style guidelines.**
2. **Potential performance bottlenecks.**
3. **Readability and clarity of the code.**
4. **Function names should be in snake_case and all variables should be descriptive.**
"""
def retrieve_similar_code_reviews(code):
# Simulated function to retrieve similar code reviews based on the provided code
return """
- Previous review on a similar function:
<code>
# def calculateArea(length, width):
# area = length * width
# return area
</code>
Comment: "The function name should be in snake_case, consider renaming it to 'calculate_area'."
- Another review:
<code>
# def processData(i):
# o = i.strip()
# return o
</code>
Comment: "Avoid using single-letter variable names like 'i' and 'o', use descriptive names instead."
"""
similar_code_reviews = retrieve_similar_code_reviews(code_to_review)
review3 = generate_code_review_with_rag(code_to_review, custom_rules, similar_reviews=similar_code_reviews)
print("\nReview with RAG (Similar Code Review Remarks):\n", review3)
In this example, we’ve modified the generate_code_review_with_rag function to accept an optional similar_reviews parameter. If provided, this parameter is appended to the prompt, simulating the retrieval of previous code review remarks on similar code. The LLM can then use these examples to guide its review, potentially leading to more consistent and informed feedback.
By incorporating memory and RAG, we can add more context and thus create more context-aware LLM-powered applications. The details of building RAG on top of a vector database, fuzzy search, or classical database queries are a story for another time.
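To sketch the core idea anyway: retrieval often boils down to comparing embeddings. The snippet below shows how retrieve_similar_code_reviews could be backed by OpenAI embeddings and a plain in-memory list instead of the simulated function above (assuming numpy is installed; the stored remarks are invented for illustration):
import os
import numpy as np
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

def embed(text):
    # Turn a piece of text into a vector using the embeddings endpoint.
    response = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(response.data[0].embedding)

# A tiny in-memory "knowledge base" of past review remarks (normally kept in a database).
past_reviews = [
    "On calculateArea: function names should be in snake_case, rename it to 'calculate_area'.",
    "On processData: avoid single-letter variable names like 'i' and 'o'.",
]
past_embeddings = [embed(review) for review in past_reviews]

def retrieve_similar_code_reviews(code, top_k=2):
    query = embed(code)
    # Cosine similarity between the code and each stored review remark.
    scores = [float(np.dot(query, e) / (np.linalg.norm(query) * np.linalg.norm(e)))
              for e in past_embeddings]
    best = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    return "\n".join(past_reviews[i] for i in best)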
Pipelines for AI flows
graph LR
  A[System] -->|Parameters| B(Parameterized Prompt)
  B --> C(Filled Prompt)
  C -->|Sent to LLM| D[LLM]
  D -->|Generates Response| E(Review prompt)
  E -->|Sent to LLM| F[LLM]
  F -->|Generates Response| G(Result)
  subgraph Context
    H(Rule one) -->|Provides Context| C
    I(Rule two) -->|Provides Context| C
    J(Rule three) -->|Provides Context| C
  end
Using a well-crafted prompt with additional context may yield results in many cases, but solving more complex problems often requires combining multiple steps to achieve the final outcome. If you have experience with data engineering (such as implementing ETL), this concept might be familiar. Pipes in Unix-like systems are another example: you create multiple steps where the output of one step serves as the input for the next. Additionally, there are options to split or join outputs from multiple steps.
Building Pipelines
So far, we’ve mainly used LLMs in a single-step manner. However, doing a full code review that involves multiple rules and goals – like checking style, logic, and potential errors – is tough to handle with just one prompt. LLMs often find it challenging to manage multiple objectives at the same time. As you saw in the previous section, we tried memory-based approaches that involve reviewing the code more than once. In practice, though, it’s usually better to handle each rule separately. By creating prompts tailored to specific rules and working through them one at a time, you can bring together the results for a more complete and focused review.
Example - Separating PEP, Performance, and Readability Checks
Let’s build a pipeline that separates the review process into distinct checks for PEP adherence, performance issues, and readability.
def parameterized_rule_check(code_to_review, rule):
prompt = f"""
Review the following Python code for {rule}. Provide feedback in JSON format:
<expected_output>
{{
"overall_assessment": "Summary of the code's {rule} aspects.",
"specific_feedback": [
{{
"topic": "{rule}",
"comment": "Details about the code related to {rule}.",
"suggestion": "Recommendations for improvement."
}}
]
}}
</expected_output>
<code>
{code_to_review}
</code>
"""
return send_prompt_to_llm(prompt)
code_snippet = """
def calculate_sum(a, b):
result = a + b
return result
"""
review1 = parameterized_rule_check(code_snippet, "adherence to PEP 8 style guidelines")
review2 = parameterized_rule_check(code_snippet, "performance issues")
review3 = parameterized_rule_check(code_snippet, "readability")
all_reviews = "\n\n".join([review1, review2, review3])
print("All Reviews:\n", all_reviews)
Self-Checks - Improving Result Quality
While LLMs are powerful, they are not infallible. Splitting a task into multiple smaller tasks and utilizing a structured flow gives us endless possibilities. For instance, we can leverage programming concepts such as loops, conditions (especially with structured outputs), and assembling results from multiple steps. Additionally, we can introduce advanced flow patterns, like having one step validate the results of previous steps or combining intermediate results into a final output (Reflection pattern). Examples of other patterns include iterative refinements, branching flows based on specific conditions, or aggregating results from parallel steps.
Our previous solution of splitting reviews into multiple steps has one potential drawback: it can lead to repetitions or contradictions across different prompt runs. To address this, we can add an extra step to consolidate and validate the results, ensuring consistency and eliminating redundancies.
def review_the_review(code_to_review, all_reviews):
consolidated_prompt = f"""
Consolidate and validate the following reviews for the given code.
Check for repetitions, contradictions, and overall consistency.
Remove or modify review comments if necessary.
**Code:**
<code>
{code_to_review}
</code>
**All Reviews:**
{all_reviews}
Provide the final corrected reviews, output in JSON format:
<code>
{{
"overall_assessment": "A brief summary of the consolidated reviews' quality.",
"specific_feedback": [
{{
"topic": "The specific aspect being addressed (e.g., 'Consistency', 'Accuracy').",
"comment": "Detailed feedback on the topic.",
"suggestion": "Specific suggestion for improvement."
}}
]
}}
</code>
"""
return send_prompt_to_llm(consolidated_prompt)
code_snippet = """
def calculate_sum(a, b):
result = a + b
return result
"""
review1 = parameterized_rule_check(code_snippet, "adherence to PEP 8 style guidelines")
review2 = parameterized_rule_check(code_snippet, "performance issues")
review3 = parameterized_rule_check(code_snippet, "readability")
all_reviews = "".join([review1, review2, review3])
consolidated_review = review_the_review(code_snippet, all_reviews)
print("All Reviews:", all_reviews)
print("Consolidated Review:", consolidated_review)
The source code now includes a review_the_review function, which consolidates and validates all the individual reviews into a unified output. This addition enhances the quality of the final response by ensuring consistency, eliminating redundancies, and addressing any contradictions in the reviews.
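Because the pipeline asks for JSON, its output can also drive ordinary program logic (the conditions mentioned earlier). Here is a minimal sketch, assuming the model returns bare JSON – in practice you may need to strip Markdown code fences or request a JSON response format:
import json

def has_blocking_feedback(review_json):
    # Parse the consolidated review and branch on it, e.g., to fail a CI check.
    try:
        data = json.loads(review_json)
    except json.JSONDecodeError:
        return True  # be conservative if the model did not return valid JSON
    return len(data.get("specific_feedback", [])) > 0

if has_blocking_feedback(consolidated_review):
    print("The review found issues - please address them before merging.")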
Demystifying Agents
What are Agents in This Context?
The term “agent” is frequently used in AI to describe systems that can perceive their environment, make decisions, and take actions to achieve specific goals. In our code review scenario, an agent might be a program that monitors a codebase for changes, automatically triggers code reviews based on predefined rules or events, and potentially even integrates the feedback into the development workflow.
Agents can often be broken down into simpler components:
- Perception: Gathering information from the environment and retaining what has been gathered (including memory). In our case, this could involve monitoring code repositories for new commits or pull requests.
- Decision-Making: Based on perceived information, the agent makes decisions. For example, it might decide which code changes require review and which review rules to apply.
- Action: The agent takes action based on its decisions, such as triggering an LLM-powered code review and presenting the results to developers.
Agent-like Behavior with Simple Tools
Importantly, you don’t necessarily need complex agent frameworks to achieve agent-like behavior for code review. You can often build sophisticated systems using the tools we’ve discussed – prompts, loops, and standard programming constructs.
Example - A Simple Git Commit Monitoring Agent
Let’s imagine we want to create a simple agent that monitors a Git repository for new commits and automatically triggers a code review for each commit. We’ll need to make some assumptions here, such as having a way to access commit information. A full implementation would require interacting with a Git API:
import time
def get_new_commits():
# Placeholder implementation - Replace with actual Git interaction
return [
{
"message": "Fix: Handle edge case in calculation",
"diff": "```diff\n--- a/calculator.py\n+++ b/calculator.py\n@@ -1,4 +1,7 @@\n def calculate_sum(a,b):\n- result = a+b\n+ if a is None or b is None:\n+ return 0\n+ else:\n+ result = a + b\n return result\n```"
},
# ... potentially more commits
]
def monitor_commits(repo_path):
    last_processed_commit = None
    while True:
        new_commits = get_new_commits()
        for commit in new_commits:
            if commit != last_processed_commit:
                print(f"New commit detected: {commit['message']}")
                code_diff = commit['diff']
                # custom_rules: the review rules defined earlier in the article
                review = generate_code_review(code_diff, custom_rules)
                # In a real system you might post the review back to the commit or pull request
                print(review)
                last_processed_commit = commit
        time.sleep(60)
This simplified example demonstrates a basic agent-like behavior:
- Perceiving: The get_new_commits() function (which would need to be implemented using a Git API in a real scenario) represents the agent’s ability to perceive changes in the environment (the Git repository).
- Decision-Making: The agent checks if a commit is new (if commit != last_processed_commit). This represents a simple decision-making process.
- Taking Action: If a new commit is detected, the agent triggers a code review using our previously defined generate_code_review function and prints the results.
This is a simplified example, but it illustrates how you can achieve a degree of automation in code review without resorting to complex agent frameworks.
Threads/Actors/Distributed Systems - Scaling Up
For more complex agent interactions or scenarios requiring parallel processing of multiple code reviews, you might eventually need more advanced yet classical programming tools:
- Threads: Threads allow you to run multiple tasks concurrently within a single process. Useful for performing code reviews on multiple commits or files simultaneously (see the sketch below).
- Actors: The actor model is a concurrency paradigm where independent “actors” communicate via messages. This can be more robust and scalable than threads for certain agent systems.
- Distributed Systems: For large-scale, fault-tolerant agent systems, you might need to distribute agents across multiple machines, typically using message queues (like RabbitMQ or Kafka) for inter-agent communication.
These more complex tools should only be employed when the complexity of your agent system truly warrants it. Often, the simpler approach using prompts, loops, and basic programming constructs will suffice for many usage scenarios.
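For illustration, the threads option could be as simple as running the rule checks from the pipeline section in parallel – a minimal sketch using Python’s standard ThreadPoolExecutor, reusing parameterized_rule_check and code_snippet from earlier:
from concurrent.futures import ThreadPoolExecutor

rules = [
    "adherence to PEP 8 style guidelines",
    "performance issues",
    "readability",
]

# Each rule check is an independent API call, so the checks can run concurrently.
with ThreadPoolExecutor(max_workers=3) as executor:
    reviews = list(executor.map(lambda rule: parameterized_rule_check(code_snippet, rule), rules))

all_reviews = "\n\n".join(reviews)
print(all_reviews)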
Evolutionary path for LLMs
flowchart LR
  A(Simple Prompt) --> B(Prompt Template)
  B --> CC(Context)
  CC --> PP(Pipelines)
  PP --> AA(Agents)
  AA --> H(Distributed system)
  subgraph Context
    direction LR
    CC --> M(Memory)
    CC --> T(Tools calling)
    CC --> R(RAG)
  end
The evolutionary path is a concept that emphasizes an evolutionary approach to design and architecture, encouraging a preference for simpler solutions, especially in the early phases of development. In the case of LLMs, most problems can be solved using techniques like Simple Prompts, Prompt Templates, Context, and Pipelines. Agents and distributed agent systems should be considered a last resort due to their complexity, and the trade-off should be carefully evaluated before deciding to use them.
Conclusion
The main point I want to get across is that programming with LLMs is pretty straightforward for any experienced programmer. In most cases, you don’t need a specific framework to handle even complex scenarios—just using the core LLM API is often enough. Frameworks tend to add their own quirks, layers of abstraction, and a bit of “magic,” which can make the technology (in this case, using LLMs for system implementation) feel more complicated than it really is.
That’s not to say frameworks don’t have their place—they can be super useful for things like scaling, monitoring, and deployment. But most of those tools already exist in the software world, and chances are you’re already familiar with them.
I’m not saying you shouldn’t use frameworks (I personally use Spring AI), but it’s important to make that choice with a clear understanding of the basics. Know what the framework gives you and what it takes away.
Most importantly, have fun working with LLMs!