Introduction
There is a lot of buzz around LLMs right now (and rightfully so). LLM capabilities are advancing rapidly, and they get better with each new model release. For some of us, LLMs have become like mobile phones: a technology we use every day for different purposes.
I wanted to write this post because I feel some people may be skeptical about using LLMs, having heard that LLMs can sometimes produce nonsense. This is true; LLMs do produce nonsense at times, but more often than not the positives outweigh the negatives.
The target audience of this post is people who have heard of LLMs but haven’t really used them yet, beyond maybe trying them out a few times in a web browser. In this post, I will try my best to enumerate all the ways I use LLMs in my day-to-day, as well as to point out what LLMs can’t yet do at a sufficient level of quality (at least from my perspective). Most of these use cases involve me interacting with a web browser, but some are specific to programming and relate to Cursor. I won’t go into great technical detail or explain why LLMs are good or bad at something; I’ll leave that for another blog post.
I divided the blog post into three major sections: the good, the bad, and the ugly. They cover, in turn, use cases where LLMs shine, use cases where LLMs still fall short, and use cases that are just plain inconvenient.
I hope that by the end of this blog post you will be convinced to give LLMs a shot (or another shot). Let’s begin!
The good
Proofreading emails and messages
Note: Below I talk about emails, but the same applies to messages. Emails are usually more formal, so I tend to proofread them more often.
Without LLMs, I sent important or lengthy emails roughly like this:
- Write the first draft of the email.
- Do some other task.
- Re-read my draft from step 1 and revise it.
- Send the email.
I found this method very useful when sending important emails with technical details, where I wanted to make sure my point got across well (without bogging down the recipient(s) in unnecessary technical details if there was no need for that).
With LLMs, my workflow looks like this:
- Write the first draft of the email (sometimes LLMs assist me with this as well).
- Paste the email into an LLM and ask it whether I can get my point across more concisely.
- Read what the LLM suggests.
- Send the email.
This way, the LLM acts like a second person who proofreads my email and suggests improvements. I don’t have to re-read my emails myself; instead, I have a “second set of eyes” that reads the email, suggests improvements (if any), and then I can send it right away.
The prompt I use is something along the lines of:
I am composing an email which is trying to get the following points across:
* Point 1
* Point 2
* Point 3
Here is some additional context: <Context>
Is the email below good? Could it be improved in some way?
"""
<Email goes here>
"""
Sometimes, I even ask the LLM to help me draft an email:
I am composing an email which is trying to get the following points across:
* Point 1
* Point 2
* Point 3
Here is some additional context: <Context>
Could you please draft an email which would get these points across?
Inside <Context>, I describe any additional context (if needed).
Double-checking ideas
Sometimes I use LLMs to double-check some ideas I have.
As a concrete example, here is a prompt I used to double-check my website reorganization ideas:
What do you think about this organization? It doesn't seem right to me... Btw. Blog will just be Blog for now; no additional categories.
"""
Main Navigation:
* Home
* Portfolio (your work)
* Blog (your thoughts/writings)
* Resources (learning materials)
* About
* Contact
Under Portfolio:
* Professional Experience
* Side Projects
* Publications (formal research papers)
Under Blog:
* Machine Learning
* LLMs & GenAI
* Computer Vision
* Career Insights
* Technical Deep Dives
Under Resources:
* Tutorials
* Book Exercise Solutions
* Podcasts
* Certifications
Under About:
* About Me
* Education
* CV/Resume Download
"""
As you can see, it’s almost as if I were talking to a human. The LLM then goes on to propose improvements (if any) to my ideas. I noticed that GPT-4o tends to keep proposing improvements even after 5 or 6 iterations, while Claude Opus 4 tends to tell you the result is excellent after a single iteration (provided you implemented its suggestions).
Doing research I’d spend an afternoon (or a few afternoons) on
For this kind of task, I found ChatGPT’s Deep Research very useful. I should note that Anthropic recently introduced Research functionality in their web interface, so it may be as good as or even better than ChatGPT’s Deep Research.
In any case, I usually ask the LLM to look into things I have on my backlog but haven’t attended to yet. An example would be the following:
I am interested in AI Safety research. I do not have a PhD in computer science, but I do have a master's degree and 4-5 years of experience in AI/ML. I am attaching my CV.
Please look into some research camps (or similar) which would allow me to participate in AI Safety research and to test how well I would fare at it. I am particularly interested in camps which I could attend remotely (and ideally part-time).
Usually the LLM follows my original question with several clarifying questions of its own, and after that it gets to work.
I found the quality of the results to vary: sometimes they are genuinely good, and sometimes they are not really relevant to my circumstances. Either way, I end up with a better idea of the topic I’m researching and at least some pointers to start with.
Brainstorming possible solutions
In these examples I’ll use software engineering, but I think this could apply to any field of work (and maybe even to general life problems or situations).
Before LLMs, when I faced a new task I was unfamiliar with, I did the following:
- If available, I would ask someone more senior than me for pointers on how to do what I wanted/needed to do.
- If no senior was available, I’d Google around to find ideas for a solution.
- I’d implement whatever I found through either step 1 or step 2 (usually I combined them).
Now, with LLMs, my workflow goes like this:
- I articulate the task I’m trying to solve and ask an LLM for ideas.
- The LLM gives me some ideas on how to approach it.
- I usually ask follow-up questions and double-check some things to make sure the LLM didn’t hallucinate.
- I pick a solution and proceed to implement it.
One concrete example: for one of my clients, I was tasked with writing a local LLM and ASR server in C++ using a specific NVIDIA SDK. There were multiple things to consider here:
- I hadn’t written C++ in some time (my focus is primarily Python)
- The rest of the team used Python
- The server architecture was something I’d have to consider
In order to brainstorm possible solutions, I asked an LLM (Claude in particular) something along the lines of:
I am tasked with writing a local LLM and ASR inference server with NVIDIA XYZ SDK. The SDK uses C++.
Here are some things on my mind:
* I haven't written C++ in a few years now, so I'd prefer Python if possible
* Other team members use Python
* I am brainstorming possible server architectures; do you have any suggestions?
And then I brainstormed with the LLM. It proved to be really useful and suggested a solution that worked great for everyone on the team.
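To give a flavor of where such a brainstorm can land, here is a minimal sketch of one architecture an LLM might suggest: wrap the C++ SDK in a thin local inference service and let the Python side of the team call it over HTTP. The endpoints, payloads, and port below are hypothetical, not the actual solution we shipped.
import requests

# Hypothetical local service wrapping the C++ SDK; endpoint names are made up for illustration.
INFERENCE_SERVER = "http://localhost:8080"


def transcribe(audio_path: str) -> str:
    """Send an audio file to the (hypothetical) ASR endpoint and return the transcript."""
    with open(audio_path, "rb") as f:
        response = requests.post(f"{INFERENCE_SERVER}/asr", files={"audio": f})
    response.raise_for_status()
    return response.json()["text"]


def generate(prompt: str) -> str:
    """Send a prompt to the (hypothetical) LLM endpoint and return the completion."""
    response = requests.post(f"{INFERENCE_SERVER}/llm", json={"prompt": prompt})
    response.raise_for_status()
    return response.json()["completion"]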
Creating files and folders for a project
Before LLMs, when I started a new project, I spent some time creating the files and folders and writing boilerplate code.
Now, I usually start Cursor, describe my project in Agent mode (or in a markdown file, which I attach to the chat), and it creates the file structure for me. I have to note it’s not perfect and sometimes tends to overcomplicate things, but overall the results are more good than bad.
Writing boilerplate code
This is closely related to the previous point about creating files and folders for a project. Usually, there is a lot of boilerplate code to be written (e.g. every FastAPI server has the same structure), so it’s good that I can automate that part as well.
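To illustrate the kind of boilerplate I mean, here is a minimal FastAPI skeleton; the route and model names are placeholders, not taken from a real project.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Example service")


class Item(BaseModel):
    # Placeholder model; a real project would have its own domain models.
    name: str
    price: float


@app.get("/health")
def health() -> dict:
    """Simple liveness check."""
    return {"status": "ok"}


@app.post("/items")
def create_item(item: Item) -> Item:
    """Echo the created item back (stand-in for real business logic)."""
    return item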
Writing clearly defined bits of code
I find Cursor’s inline edit functionality to be very good for making targeted changes.
For example, say I want to add two more options to command line arguments. What I would do is select the code in question, press Ctrl + K on my keyboard, then write something along the lines of:
Please expand the options list to include two additional options:
* Option 1 - Description of what the option does
* Option 2 - Description of what the option does
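To make this concrete, here is what the result of such an inline edit might look like on an argparse-based CLI; the option names are made up for illustration.
import argparse

parser = argparse.ArgumentParser(description="Example CLI")
parser.add_argument("--input", type=str, required=True, help="Path to the input file")
parser.add_argument("--verbose", action="store_true", help="Enable verbose logging")
# The two options below are what the inline edit would add:
parser.add_argument("--output", type=str, default="out.txt", help="Where to write the results")
parser.add_argument("--retries", type=int, default=3, help="Number of retry attempts")

args = parser.parse_args()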
I also use it sometimes if I’m unsure about some specific syntax or class attributes; it’s usually very good there.
Asking codebase-related questions
I find Cursor very useful to get acquainted with a new codebase. I usually load the codebase inside Cursor, enter Ask mode and then I ask questions about the codebase.
Of course, I double-check the details with the programmers working on the codebase, but this gives me insights I might miss if I were just manually exploring the codebase myself.
The bad
Cleanly adding features to an existing codebase
If you want to add a feature to an existing codebase, doing it with LLMs can lead to deterioration of code quality. In my experience, LLMs tend to overcomplicate the solutions, and even when they take a simple approach, it usually involves some manual debugging.
For adding features to an existing codebase, I prefer consulting with the codebase author(s), then implementing the feature in a way we agree on, using LLMs for specific code blocks as I outlined in the section Writing clearly defined bits of code.
Even if the entire codebase was built with the help of LLMs, they can still introduce some “unclean” code.
For example, have a look at this example from my SWEMentor side project:
Let’s consider two classes:
- CodeIssue – represents a code issue with its solution and context
- CodePattern – represents a code pattern with its description and example
Now consider these two methods’ signatures:
def add_issue(self, issue: CodeIssue) -> None:
So far so good (although the method could return a bool indicating whether it succeeded, rather than None). But now look at this method:
def add_pattern(
    self,
    name: str,
    description: str,
    example: str,
    category: str,
    tags: List[str]
) -> None:
So instead of taking a CodePattern instance as the argument, the method repeats its attributes one by one. This hurts the maintainability of the code and makes it more “unclean”.
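For comparison, here is a minimal sketch of how the method could look if it mirrored add_issue. The CodePattern fields are inferred from the signature above, and the enclosing PatternStore class is hypothetical.
from dataclasses import dataclass, field
from typing import List


@dataclass
class CodePattern:
    # Fields inferred from the add_pattern signature above; the real class may differ.
    name: str
    description: str
    example: str
    category: str
    tags: List[str] = field(default_factory=list)


class PatternStore:  # hypothetical container class, for illustration only
    def __init__(self) -> None:
        self._patterns: List[CodePattern] = []

    def add_pattern(self, pattern: CodePattern) -> bool:
        """Store the pattern and report success, mirroring add_issue."""
        self._patterns.append(pattern)
        return True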
This might look like nitpicking, but it adds up quickly and can lead to a messy codebase.
Hallucinations
I think hallucinations have improved tremendously with the latest LLM versions; they used to be a much bigger issue than they are now.
However, I still encounter more subtle kinds of hallucinations. Not so much the model producing complete gibberish, but rather giving me answers that are way off (like estimates that are 2-3x off from reality yet seem plausible enough). I consider that to be a hallucination too, albeit one that is harder to detect than total nonsense.
Malleability
Try the following experiment: Ask an LLM some question. Then add the following:
Be brutally honest.
I found that with some models (Claude Opus 4 in particular), including the brutal honesty remark can affect not only the answer, but also the tone of the entire conversation. This is definitely something to keep in mind.
The ugly
Having to switch chats when context window limit is reached
Sometimes it happens that I talk to an LLM about something, then think of something else and write it in the same chat. This is how we usually talk as humans: we start with one topic, then switch to another and so on.
However, since LLMs have a context window, they cannot chat with you forever (not in the same chat, at least). On multiple occasions I found myself having to summarize a previous chat and start a new one, either because the current one got too slow or because I simply exceeded the context length and couldn’t go on.
This is annoying because as I said, when talking to humans you don’t really have to mind the context length; with LLMs, you do.
Subtly misunderstanding my question if I don’t provide the entire context
To illustrate this point, let’s say you’re discussing some career-related question with an LLM. The conversation is flowing nicely and you go back and forth a few times. Just as you reach the conclusion, the LLM outputs something like:
...
This applies to you because:
* This is standard for US-based machine learning engineers and consultants
...
Uh-oh, I see an issue here. The LLM mistakenly assumed it was giving advice to a person residing in the US, while I’m from Europe. Realizing that the LLM made assumptions which aren’t true, and which could alter the conclusion, has happened to me more than a few times.
This is somewhat related to hallucinations, as the LLM assumes things it wasn’t explicitly told, but I put it under The Ugly category because it’s just ugly to realize, after spending time going back and forth with the LLM on a topic, that it made assumptions which, if they don’t hold, could fundamentally change the answer.
Conclusion
In this blog post, I tried to outline how I use LLMs in my day-to-day. I tried to be as specific as possible and include prompts wherever it made sense, so you can see how LLMs can be used in concrete examples.
Hopefully you will give LLMs a shot (or another shot) at accelerating your workflow. If you’re wondering which LLMs to use, I mostly use Anthropic’s Claude and ChatGPT, but you could try other models as well. I recommend using the paid versions, as I found they tend to be better (especially for ChatGPT; Claude’s free version is good in terms of quality, but you reach your message limit fast).
As a final note, I went from barely using LLMs at all 1.5 years ago to using them daily. In my opinion, LLMs are great tools for making you more productive, and there’s no harm in trying them. If you give them a shot, you will likely find yourself doing your work faster by removing the tedious parts and focusing on what you actually enjoy doing.