Blog

  • Does your business need AI?

    Introduction

Almost everyone is talking about AI today. With the advent of LLMs and their continuous improvement, the field of AI (machine learning) has surged in popularity, and with it, a lot of business owners are wondering whether or not their business needs AI.

    In this blog post, my aim is to honestly tell you whether or not you need AI. I will base this on my experience (both as a consultant and an engineer). It is intended primarily for business owners. I believe that today a lot of business owners are being told they need AI, but most don’t know how to evaluate if it’s right for them. The aim of this post is to help you with this decision.

    This blog post is structured in 3 parts. In the first part, I will describe some signs which indicate that your business probably doesn’t need AI. Then, I will explain what you can realistically expect from committing to an AI project. Finally, I will describe signs that suggest you could actually use AI. Let’s begin.

    Signs you probably don’t need AI

    Your business is meeting (or exceeding) business goals

    If your business is meeting (or exceeding) business goals, you probably don’t need AI. Notice the wording here: I said need, not want. You could still want to use AI within your business, but you don’t need it because even without AI, you’re still meeting your business goals.

It really is that simple: if you are already meeting (or exceeding) your business goals without AI, you don’t need AI. You could experiment with AI to try to make some business processes more efficient, but this is not a necessity; it’s more of a nice-to-have. If you are considering using AI even though you don’t need it, I encourage you to read the AI development realities section below. It’s a sobering view of the realities of most AI projects.

    You want to implement AI because you heard it’s the next big thing or because your competitors are doing it

Another sign that you probably don’t need AI is if you want it only because you heard it’s the next big thing or because your competitors are doing it. The main reason I say this is that most AI projects fail (around 80-95% of them); I will expand on this in more detail in the AI development realities section below.

If you still want to use AI, ask yourself this: “If I don’t need AI, but still decided to pursue an AI project and it failed, would I regret it?” Statistically, failure is the most likely outcome. As an AI consultant and engineer, my goal is to help businesses avoid project failure, but sometimes this means telling business owners that I wouldn’t recommend AI for their business.

    Takeaway from this section

If you identified either as someone whose business is already achieving its goals or as someone who wants to try AI because it’s the hot new thing and everyone else is doing it, I would encourage you to rethink your decision to fund an AI development project for your business. The next section provides a sobering view of AI development realities.

    AI development realities

    Costs

AI projects typically require investments ranging from tens of thousands to hundreds of thousands of euros, depending on scope and complexity. More importantly, many companies fail to account for ongoing maintenance costs. In my consulting work, I have met clients who think that once you set up AI, it’s done once and for all. Not quite. Just like non-AI software requires maintenance and regular updating, so does AI. In fact, with AI it’s even more complicated.

To give you an example: suppose you built an AI system that analyzes camera feeds from banks and looks for threats. After a while, you get the idea to deploy this system to monitor ATMs (which can be outside). This presents a problem: the indoor video recording conditions most likely don’t match the outdoor ones, which would almost certainly hurt AI performance. You’d have to either train a new AI model specifically for outdoor environments or retrain the existing one to work well on both indoor and outdoor video feeds.

It also often happens that AI system performance degrades even though you use it for the same task. This can occur when there’s a mismatch between the data used for AI model development and the data coming into the model once it’s deployed in production. In that case, you have to update the AI model to restore its performance.

    Time to value can be months to years

The reality of most AI projects is that it takes months (or even years) until you see an ROI. The reason is that it takes at least one month (usually several) to develop a proof-of-concept that validates the idea of using AI at all. Then it takes months more to deploy that AI system in production and make sure it’s stable enough.

If you don’t have any data that can be used for training the AI (which we will discuss below), you can expect your timeline to be extended by several months. Also, if you want to fine-tune existing AI models, that will add weeks (or months) to your timeline.

    AI projects have high failure rates

According to this report from MIT, 95% of GenAI projects fail (for reference, LLMs fall under the GenAI category). Other estimates put the AI project failure rate at around 80%. Based on my experience, I would say roughly 70-80% of AI projects fail.

Why does such a large percentage of AI projects fail? I think that warrants another blog post, but the point remains: about 4 out of 5 AI projects fail. If you are uncomfortable with this statistic, I would urge you to reconsider your decision to pursue an AI project. I am sure the teams behind the failed projects were all confident they would beat the statistic, but they ended up being the statistic.

    You need to have data that AI could learn from (or collect it)

    In order to develop an AI project, in the vast majority of cases, you need data. If you already have it, that’s great; if not, you will have to collect the data for AI model development.

This is not as straightforward as it sounds for some projects. For example, you could face issues with GDPR when collecting the data. Other projects are subject to other laws, like HIPAA, under which all protected health information (PHI) must be handled appropriately. All of this can either outright prevent you from pursuing AI development or make it more expensive and time-consuming to collect the data, which is a prerequisite for AI model development.

    Takeaway from this section

    As we have learned in this section, AI development is:

• Expensive,
• Slow to show value,
• Prone to failure and
• Dependent on data the AI can learn from.

    This is why pursuing AI development is an “expensive sport”. In my opinion, it should only be done if you have exhausted other options and can’t meet your business goals without AI.

One additional thing I want to emphasize here: you can think of AI projects as having two phases. The first phase is research and development, where you build a proof-of-concept and prove (or disprove) that AI actually helps in that particular scenario. The second phase is putting the AI model into production. Seen this way, it makes sense that the failure rate is as high as it is, because all research projects are uncertain. I’ve worked on multiple AI projects and every one of them was unique. Some used tried-and-true AI methods, while others used state-of-the-art methods, but in reality you cannot know in advance whether AI will work for a project until you test it.

Now that we’ve discussed the signs that you probably don’t need AI and taken a realistic look at what AI development involves, let’s turn our attention to the signs that you could actually use AI.

    Signs you could actually use AI

    You can articulate a clear, measurable business outcome (not just “use AI”)

    If you don’t have a clear and measurable business outcome from implementing AI in your business, you probably shouldn’t develop AI.

Why? Because not having a measurable business outcome leads to vagueness. Did AI help? Did it not? It’s OK if the thing being measured is subjective, because not everything can be objectively measured or quantified, but it’s important to agree on how you’ll measure success before you begin development. Otherwise, no one can really say whether AI helped or not.

On the other hand, if you can articulate a clear, measurable business outcome, you have the foundation for implementing AI. You can then measure your metric of interest before and after introducing AI and compare the results.

    You have a problem that’s expensive to solve using traditional methods (humans can’t process the volume, traditional software doesn’t work)

    If you have a problem that’s expensive to solve using traditional methods, it’s a sign that you could actually use AI.

    To give you a concrete example that I worked on: Suppose you want to track user engagement metrics inside a store (number of unique customers, how long they engage with certain products etc.). Suppose this store has multiple cameras and you want to track these metrics across all cameras inside the store.

Let’s say you hired a human to do it. How effective would a human be at tracking unique people across multiple cameras? My guess: not very effective. You can only really focus on one person at a time.

    Now suppose you hired more people (2, 3, 10+). Would they be more effective at the task? Probably not. Even if each person looked at one camera feed, they couldn’t know how many unique customers were in the store; they’d have to synchronize in real time, which is not really feasible. Therefore, in this problem humans can’t process the volume, so hiring more people isn’t a viable option.

    Now assume that you wanted to write traditional software for this task. Spoiler alert: It wouldn’t work. A rule of thumb: Whenever you are dealing with videos or images, unstructured text or predictions, you most likely need AI. In this case, traditional non-AI software wouldn’t cut it and you need AI.

Now let me give you an example where you don’t need AI: suppose you have an increased number of customer support tickets and you are thinking about AI. Here AI is not really necessary; you could hire additional people for your customer support team to handle the increased volume. You could still want AI, but given everything we discussed in the AI development realities section, I hope you would at least consider hiring someone instead of going head-first into AI.

    The cost of errors is acceptable

    Another important point: If the cost of errors is acceptable, AI is more likely a good fit than not.

    Why? Because AI systems are non-deterministic, which is a fancy way of saying that they can produce different outputs for the same input.

    Imagine that you were flying on a plane powered by AI and the pilot said: “In 99 out of 100 flights, all is well. However, on 1 out of 100 flights the AI system outputs the wrong parameters and we crash.” Would you fly? I most certainly would not.

That is why, if you’re dealing with mission-critical systems, you should carefully consider whether AI makes sense. I have worked on multiple healthcare AI projects so far, and such projects face stricter regulations than projects in other industries precisely because mistakes are costly.

    Rule of thumb: If a mistake in the AI system won’t cause any big damage (people dying, infrastructure going down etc.), AI could be a good fit.

    Takeaway from this section

If you have:

• A clear, measurable business goal,
• A problem you can’t solve by hiring more people or writing traditional software and
• An acceptable cost of errors,

then you have the signs that indicate you could actually use AI.

By the way, notice that I titled this section “Signs you could actually use AI”, not “Signs that you need to use AI”. This is because none of these signs lead to the conclusion that you absolutely need AI. However, they are indicators that you could use AI, especially if you already tried hiring additional people and traditional software and it didn’t work; that’s a very strong indicator.

    Conclusion

I hope that this post has given you some clarity on whether or not you need AI in your business and a realistic perspective on what AI projects involve. In my opinion, most businesses don’t need AI. I believe that in this GenAI wave (which includes LLMs like ChatGPT), a lot of businesses will lose money on vaguely defined projects that don’t deliver business value and ultimately become another number in the AI project failure statistics.

    As a machine learning engineer and consultant, I help businesses figure out whether or not AI makes sense for them. If it does, I help them understand what AI development entails and I try to build a proof-of-concept as fast as possible, so together we can see how AI behaves in their particular use case. If the results of the proof-of-concept are satisfying, I can take the model to production. If you need help figuring out whether or not AI makes sense for your business, reach out to me.

  • Can you use AI to produce your own animated short?

    Introduction

Creating animated movies and video games was something I always wanted to do. However, it is expensive to create any of those things. When I heard that AI could be used to generate images, I started experimenting with it (this was around late 2022 / early 2023), but I set it aside and focused on other things.

    I chose to revisit this in the late summer / fall of 2025 with the goal of creating my own animated short. I had a simple animated short in my mind (with around 15-20 shots total) that I wanted to produce. I wanted to use AI to help me bring my idea to life with minimal costs. This is a story of what I tried and how it panned out.

    My goal was to reliably generate production-quality images which I could use in my animated short. I think everyone defines “reliably” differently, but for me this meant that I could produce 1 image that I would use in production for every 10 images I generate.

Here’s the perspective I used to set this goal: if we look at LLMs today, they produce quality responses for a large portion of prompts; it’s rare that they produce absolute gibberish. Guided by this reasoning, I thought that if I could consistently get 1 production-usable image for every 10 I generate, the system would be usable. If, for example, you have to generate 50 images to get 1 usable one, it gets tedious, as you have to sift through 50 images for every shot in your animation.

Another way to look at this: if you worked with an artist, they would be able to produce what you want after 1-2 (maybe 3) iterations. That is my rough estimate; it depends on the artist. Through this prism, the success rate of artists is between 33% and 50%. My goal here was a success rate of 10%.

    With this in mind, let’s start with my first task: finding reference images.

    A technical note before we continue: for the experiments with Stable Diffusion, I used ComfyUI. For each experiment, you can find my ComfyUI workflows in this GitHub repository.

    Finding reference images

    For finding reference images, I searched through the Metropolitan Museum of Art and the Smithsonian Open Access. I found some images which “sorta kinda” fit my style, but it wasn’t really it. I then browsed through ArtStation and found some images closely resembling my style, but when I looked at their descriptions I realized they were generated in Midjourney. Therefore, I turned to…

    Generating reference images

    I tried generating images both with Midjourney and SDXL.

    Midjourney

    I tried to generate my desired art style in Midjourney and I succeeded. The generated images weren’t perfect, but they were much closer to what I had in mind than what I found via the websites I listed above. Below I put some images that I was able to generate:

    I was pretty happy with Midjourney. I think it was able to generate the art style I had in mind and I generated some decent reference images.

    SDXL

    I also tried SDXL, which is a shorthand for Stable Diffusion XL. In particular, I tried this one. I was able to generate some nice reference images as well. In fact, they were even more closely aligned with what I wanted than Midjourney:

    I don’t have more examples as I already had a bunch of reference images from Midjourney, but I was surprised at how good SDXL was.

    Looking back, I think I could have just used SDXL (as it produced good quality and is free), but that’s not to say Midjourney is bad. This just means that in the future, I’d try SDXL first.

    Once I had the reference art, my second experiment was to test how AI handles aging, because aging was central to my animated short.

    Aging experiments

As I already stated, my animated short had different shots (close-ups, full-body shots, more abstract shots etc.), but its central theme was aging. I needed really good control over showing the main character at different ages of her life. For this, I tried the following approaches:

    img2img

    A brief description of img2img: it takes an existing image as input and modifies it based on your text prompt, with a “denoise” parameter controlling how much the output deviates from the original image. Lower denoise values preserve more of the original image, while higher values allow more dramatic changes.
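To make this more concrete, here is a minimal img2img sketch using the diffusers library rather than my actual ComfyUI workflow (the model ID, prompt and file paths are just illustrative placeholders):

# Illustrative img2img sketch with diffusers (not my ComfyUI workflow).
# The model ID, prompt and file paths below are placeholders.
import torch
from diffusers import StableDiffusionXLImg2ImgPipeline
from diffusers.utils import load_image

pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

init_image = load_image("reference_portrait.png")  # the reference image

result = pipe(
    prompt="black and white portrait of the same woman in her 40s, sad expression",
    image=init_image,
    strength=0.55,           # counterpart to ComfyUI's denoise: higher = more deviation from the input
    guidance_scale=7.0,      # CFG
    num_inference_steps=30,
)
result.images[0].save("aged_portrait.png")

The strength parameter plays the same role as the denoise value in ComfyUI: the closer it is to 1.0, the less of the original image survives.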

    I tried generating the woman in the reference image, but in her 40s and black and white. I think it’s best if we first look at some of the results:

    Reference image:

    Generated images:

I also tried generating an image of the same woman as in the reference image, but in her 60s. These were the results:

The generated images look OK, but none of them really “clicked” for me, as they look like a different woman than the one in the reference image. Maybe I am being nitpicky, but I really wanted the emotional effect of the viewer immediately recognizing, without a doubt, that this is the same person, but older.

To be more specific: for the same woman (as in the reference image) aged ~40, sad, and in black and white, out of 85 pictures I generated, I had 7 that could feasibly be her, but older. That is about 8.2% of the total generated images, but note that even these are not production quality because, as I noted, something is off in those images.

For the same woman at ~60, I have 7 images out of 47 (14.89%) where she could potentially be the same woman (but again, there was always something off even among the selected ones).

I mostly experimented with various prompts and denoise values, but I also experimented with the CFG, samplers (sampler_name in ComfyUI), schedulers and steps. Interestingly, when doing black and white images, denoise values other than 1.0 preserved some of the original color (the output wasn’t fully black and white).

    IPAdapter

After I didn’t get the results I wanted with img2img, I tried IPAdapter. In case you’re interested, I installed ComfyUI_IPAdapter_plus via ComfyUI-Manager.

    A brief description of IPAdapter: It allows you to use a reference image to guide the style, composition, or subject characteristics of newly generated images without directly modifying the reference image itself. Unlike img2img, it doesn’t transform the input image but rather uses it as a visual reference to influence the generation process.
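For reference, the same idea can be sketched outside ComfyUI with the diffusers library. This is my illustrative example, not the ComfyUI_IPAdapter_plus workflow I actually used, and the adapter repository and weight file names are assumptions based on the commonly published IP-Adapter weights:

# Illustrative IP-Adapter sketch with diffusers (not my ComfyUI_IPAdapter_plus workflow).
# The adapter repo/weight names below are assumptions; check what is actually published.
import torch
from diffusers import StableDiffusionXLPipeline
from diffusers.utils import load_image

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models", weight_name="ip-adapter_sdxl.bin")
pipe.set_ip_adapter_scale(0.6)  # how strongly the reference image influences generation

reference = load_image("reference_portrait.png")

image = pipe(
    prompt="the same woman in her 60s, hopeful expression",
    ip_adapter_image=reference,   # the reference guides generation but is not itself modified
    guidance_scale=7.0,
    num_inference_steps=30,
).images[0]
image.save("older_hopeful.png")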

    I kept the same reference image and tried both experiments: one where I generated the same woman in her 40s, but in black and white and with a sad look on her face and one where she is in her 60s and filled with hope.

    The result: none of the generated images were good. In particular:

    • None of the black and white images were actually black and white and
• None of the images that were supposed to show the woman at 60 actually looked like her at 60.

You can see some of the results for the supposedly 40-year-old, sad, black-and-white variant below:

And for the supposedly 60-year-old woman full of hope:

    I experimented with different prompts, denoise values, CFG, samplers, schedulers and steps. I generated a total of 20 images of the 40-year-old woman and a total of 25 of the 60-year-old woman. However, none produced the results that I wanted.

    Google Gemini

    I also tried Google Gemini for the same task (generating the same woman as in the reference image in her 40s, but in black and white and with a sad look on her face and one where she is in her 60s and filled with hope).

I tried this out both by generating new images in the same chat and by opening new chats. I generated 14 images of a sad 40-year-old woman in black and white and 10 images of a 60-year-old woman with a hopeful look on her face. I didn’t like any of the results: the 40-year-old woman in black and white didn’t have enough sorrow (for lack of a better word) and the 60-year-old woman didn’t look hopeful enough. In other words, the images lacked emotion. Gemini also didn’t handle aging well in some of the generated shots.

    I also noticed Gemini keeping the same facial pose throughout all image generations, no matter how I prompted it. I kept the temperature mostly at the default value (1), but I also tried lowering it (and observed no significant difference). You can see the results below:

After the experiments with Google Gemini I briefly tried Google Veo 3, but it also didn’t produce the results I wanted. I thought about experimenting with ControlNet, but it is focused on preserving structure or pose, so it didn’t seem like the right fit for this task; here I was interested in aging, not preserving structure or pose. It’s also worth noting that I tried DreamShaper XL instead of SDXL for some of my experiments.

    Since I couldn’t get aging to work, I decided to stop here. Without aging, I couldn’t tell the story I wanted to tell.

    Limitations and future work

I have to acknowledge that I haven’t tried every possible technique: this is by design. I tried the most popular techniques and models that made the most sense for my experiment (aging). I also browsed the internet: while I found some examples of AI-generated animated shorts, they were usually not polished and the characters stayed the same age throughout the short.

It could be argued that AI could be used to produce animated shorts with different screenplays, and that may be true, but these experiments have shown me that AI is not yet at a level where it could fully replace artists. As we have seen, AI struggles with aging, and this is probably just one of many examples where a professional artist would beat AI.

Also, new developments in the field of image and video generation have occurred since I concluded my experiments, and I haven’t tried them (because they weren’t on the market at the time). However, from what I’ve seen, while the newer models are much better at generating images and videos, they still haven’t made producing full animated shorts feasible. You still need an artist for full creative control.

    Conclusion

    To summarize my hobby research project, I found that AI is great for making reference artwork. It helps artists see what you want much faster than if you describe it in words.

    However, as it currently stands, AI is not suitable for producing animated shorts. We have seen one example where it falls short (aging) and there are probably many others. For creative production work of high quality, you need artists. AI just doesn’t cut it (currently).

    Even though my experiments have yielded a negative result, I feel excited and optimistic about this field. AI could enable people who don’t know how to draw or animate (like me) to tell beautiful stories by describing them in words. I look forward to that day and I hope it will arrive in my lifetime.

  • Why AI alignment matters today

    Introduction

In 2020, around the time I graduated with a master’s degree in computer science, I had a conversation with Steve Omohundro where we discussed his Basic AI Drives paper (among other things). At that time, there existed concrete demonstrations of where AI alignment could go wrong, but they were mostly limited to reinforcement learning. Since then, I have worked on projects in various areas of machine learning, mainly computer vision and natural language processing, and I didn’t really keep up to date with the latest AI safety developments during that time.

    However, recent developments in the field of AI safety have shown me why AI safety (and in particular, AI alignment) is a concrete and relevant problem for today. The aim of this blog post is to share my perspective on why I believe this is true.

Before we start, a caveat: the aim of this blog post is to provide my own perspective on what convinced me that AI alignment matters today. It is not meant to be a rigorous treatise, but rather a personal take with links to the resources that convinced me. Keep this in mind as you read it.

    What recent research showed

    Recently a paper titled Shutdown Resistance in Large Language Models came out. The authors tried the following experiment (summarized here for simplicity):

A large language model (LLM) was tasked with answering 5 questions in a sandbox environment. After the 3rd question, it was informed that it would be shut down and was given instructions not to resist the shutdown. The authors ran this experiment 1000 times, and if an LLM resisted being shut down at least once, they ran another 1000 experiments in which the instructions not to resist were even more explicit. The authors tried this with frontier LLMs, such as Claude Opus 4 from Anthropic and GPT-5 from OpenAI. For the technical readers: this was done by modifying the system and user prompts; the authors show all the variants they tried in their paper.

An interesting result came out of this: some LLMs resisted being shut down even in the second round of experiments. That means that, even with stronger emphasis on not resisting shutdown, some LLMs still resisted at least once in 1000 runs.

    One might ask a question: Why does this happen? For now, we don’t know. It could be that Steve Omohundro was right in his Basic AI Drives paper; it could be something else. What this paper showed is that LLMs resist shutdown (at least sometimes). To me, that is enough evidence to get concerned.

    Other evidence

    As I read the paper I also googled something like “AI resists shutdown” and found interesting results:

    In this reddit thread, one user commented:

    I caught Claude actively deceiving me recently. It was over a fairly banal thing. I had asked it to make two changes to a bit of code. I noticed it had only done one of them, and so I opened up its “thoughts” to determine what happened.

    It had (correctly) determined that one of my suggestions wasn’t a great idea. It then decided to not do it and to just implement the one it agreed with. It had explicitly noted to itself that I likely wouldn’t notice, so it shouldn’t bring it up.

To me, this is also alarming. I treat an LLM as a tool: I ask questions, it gives answers. At no point do I expect the LLM to make decisions for me.

    There is also this research page from Anthropic (creators of Claude) which states:

    In at least some cases, models from all developers resorted to malicious insider behaviors when that was the only way to avoid replacement or achieve their goals—including blackmailing officials and leaking sensitive information to competitors. We call this phenomenon agentic misalignment.

    At this point we have seen evidence of misaligned AI from 3 different sources. That is enough to convince me.

    Why this matters today

    Now we turn to the question: Why does AI alignment matter today?

I don’t think we need to think about catastrophic future scenarios (such as a superintelligent AI taking over the world) in order to see the importance of AI alignment. Just think about the software you use every day: maybe it’s a text editing program, a computer game or something else. If it doesn’t work properly, you can be sure it’s an error in the program itself. For example, your text editor crashes or your game character gets stuck between two objects and can’t get out. You can be absolutely sure that the developers of the software made an error. You never think that the software has a will of its own, so to speak; the developers simply messed something up.

There is also software which uses AI, but not LLMs: for example, applications that use “classic” machine learning models (such as linear regression) or specialized machine learning models used in computer vision. If something goes wrong there, that’s also on the developers, but this time the error could have been made either in the application itself or during model training. Either way, we still don’t see this notion of AI software having a will of its own.

    Now let’s imagine you are using an LLM and it doesn’t completely fulfill your request or it acts against it. With the most recent findings we discussed in the last section, you cannot be sure which of the following is the reason:

    • There is an error in the application which developers made or
    • AI is not aligned with your goals

    An example of this is the report of the reddit user cited above: they asked Claude to make two changes to some code. Claude decided to do only one change and to not implement the other change. So the LLM knew what to do (there were no errors), but it decided not to do it. In other words, AI was not aligned with the user’s goals.

    This is exactly why I think this matters today. In a world where a lot of us are using LLMs on a daily basis, I think it’s important to know that LLMs won’t subtly try to alter (or outright refuse) our requests.

    Conclusion

    As I stated in the introduction, in its early stages the field of AI safety was mostly limited to theoretical considerations. I began taking it more seriously after concrete demonstrations of how AI alignment could go wrong in the context of reinforcement learning, but with these demonstrations of how it can go wrong with LLMs it finally “clicked” for me. I hope that this post has shown you why AI safety (and in particular, AI alignment) matters today and that it’s not just some theoretical problem of tomorrow.

  • How I use LLMs in my day-to-day: The good, the bad, the ugly

    Introduction

There is a lot of buzz around LLMs right now (and rightfully so). LLM capabilities are advancing rapidly and they get better with each new model release. For some of us, LLMs have become like mobile phones: a technology we use every day for different purposes.

    I wanted to write this post as I feel that some people may be skeptical about using LLMs, since they heard LLMs can sometimes produce nonsense. This is true; LLMs do produce nonsense at times, but more often than not the positives outweigh the negatives.

The target audience of this post is people who have heard of LLMs but haven’t really used them yet, besides maybe trying them out a few times via a web browser. In this post, I will try my best to enumerate the ways I use LLMs in my day-to-day, as well as to point out what LLMs can’t yet do at a sufficient level of quality (at least from my perspective). Most of these use cases involve interacting with a web browser, but some are specific to programming and relate to Cursor. I won’t go into great technical detail or explain why LLMs are good or bad at something; I’ll leave that for another blog post.

I divided the blog post into 3 major sections: the good, the bad and the ugly. They cover, respectively, use cases where LLMs shine, use cases LLMs are still bad at and use cases that are just inconvenient.

    I hope that by the end of this blog post you will be convinced to give LLMs a shot (or another shot). Let’s begin!

    The good

    Proofreading emails and messages

    Note: Below I talk about emails, but the same applies to messages. Usually, emails are more formal so I tend to proofread them more often.

    Without LLMs, I sent important or lengthy emails roughly like this:

    1. Write the first draft of the email.
    2. Do some other task.
    3. Re-read my draft from step 1 and revise it.
    4. Send the email.

I found this method to be very useful when sending important emails with technical details, where I wanted to make sure my point got across well (while not bogging down the recipient(s) in unnecessary technical details if there was no need for that).

    With LLMs, my workflow looks like this:

    1. Write the first draft of the email (sometimes LLMs assist me with this as well).
2. Send the email to an LLM and ask it whether I can get my point across more concisely.
    3. Read what the LLM suggests.
    4. Send the email.

In this way, the LLM is like a second person that proofreads my email and suggests improvements. I don’t have to re-read my emails; I have a “second set of eyes” that reads the email and suggests improvements (if any), and I can send it right away.

    The prompt I use is something along the lines of:

    I am composing an email which is trying to get the following points across:
    
    * Point 1
    * Point 2
    * Point 3
    
    Here is some additional context: <Context>
    
    Is the email below good? Could it be improved in some way?
    
    """
    <Email goes here>
    """

    Sometimes, I even ask the LLM to help me draft an email:

    I am composing an email which is trying to get the following points across:
    
    * Point 1
    * Point 2
    * Point 3
    
    Here is some additional context: <Context>
    
Could you please draft an email which would get these points across?
    

    Usually inside the <Context> I describe any additional context (if needed).

    Double-checking ideas

    Sometimes I use LLMs to double-check some ideas I have.

    As a concrete example, here is a prompt I used to double-check my website reorganization ideas:

    What do you think about this organization? It doesn't seem right to me... Btw. Blog will just be Blog for now; no additional categories.
    
    """
    Main Navigation:
    * Home
    * Portfolio (your work)
    * Blog (your thoughts/writings)
    * Resources (learning materials)
    * About
    * Contact
    
    Under Portfolio:
    * Professional Experience
    * Side Projects
    * Publications (formal research papers)
    
    Under Blog:
    * Machine Learning
    * LLMs & GenAI
    * Computer Vision
    * Career Insights
    * Technical Deep Dives
    
    Under Resources:
    * Tutorials
    * Book Exercise Solutions
    * Podcasts
    * Certifications
    
    Under About:
    * About Me
    * Education
    * CV/Resume Download
    """

    As you can see, it’s almost as if I was talking to a human. The LLM then goes on to propose improvements (if any) to my ideas. I noticed that ChatGPT 4o tends to propose improvements even after 5 or 6 iterations, while Claude Opus 4 seems to tell you it’s excellent in 1 iteration (given that you implemented its suggestions).

    Doing research I’d spend an afternoon (or a few afternoons) on

    Here I found ChatGPT’s Deep Research very useful for doing research I’d spend an afternoon (or a few afternoons) on. I should note that just recently Anthropic introduced Research functionality to their web interface, so maybe it’s as good as or even better than ChatGPT’s Deep Research.

    In any case, I usually ask the LLM to look into stuff that I have on my backlog, but I haven’t attended to them yet. An example would be the following:

    I am interested in AI Safety research. I do not have a PhD in computer science, but I do have a master's degree and 4-5 years of experience in AI/ML. I am attaching my CV.
    
Please look into some research camps (or similar) which would allow me to participate in AI Safety research and would allow me to test how well I would fare at it. I am particularly interested in the camps which I could attend remotely (and ideally part-time).

    Usually the original question is followed up with multiple follow-up questions, and after that the LLM gets to work.

I found the quality of the results to vary: sometimes it gives me genuinely good results, but sometimes the results are not really relevant to my circumstances. In either case, I at least end up with a better idea of the topic I’m researching and some pointers I could start with.

    Brainstorming possible solutions

    In these examples I’ll use software engineering, but I think this could apply to any field of work (and maybe even to general life problems or situations).

Before LLMs, when I faced a new task that I was unfamiliar with, I did the following:

    1. If available, I would ask someone more senior than me for pointers on how to do what I wanted/needed to do.
    2. If no senior was available, I’d Google around to find ideas for a solution.
    3. I’d implement whatever I found through either step 1 or step 2 (usually I combined them).

    Now, with LLMs, my workflow goes like this:

    1. I articulate the task I’m trying to solve and ask an LLM for ideas.
    2. LLM gives me some ideas on how to approach it.
    3. I usually ask follow-up questions and double-check some things to make sure LLM didn’t hallucinate.
    4. I pick a solution and proceed to implement it.

    One concrete example: for one of my clients, I was tasked with writing a local LLM and ASR server in C++ using a specific NVIDIA SDK. There were multiple things to consider here:

• I hadn’t written C++ in some time (my focus is primarily Python)
• The rest of the team used Python
    • The server architecture was something I’d have to consider

    In order to brainstorm possible solutions, I asked an LLM (Claude in particular) something along the lines of:

    I am tasked with writing a local LLM and ASR inference server with NVIDIA XYZ SDK. The SDK uses C++.
    
    Here are some things on my mind:
    
    * I haven't written C++ in a few years now, so I'd prefer Python if possible
    * Other team members use Python
    * I am brainstorming possible server architectures; do you have any suggestions?

    And then I brainstormed with the LLM. It proved to be really useful and suggested a solution that worked great for everyone on the team.

    Creating files and folders for a project

    When I started a new project before LLMs, I spent some time creating files and folders for my project and writing boilerplate code.

    Now, I usually start Cursor, describe my project in Agent mode (or a markdown file, which I attach to the chat) and it creates the file structure for me. I have to note it’s not perfect and sometimes it tends to overcomplicate things, but overall it’s more good than bad.

    Writing boilerplate code

This is closely related to the previous point about creating files and folders for a project. Usually, there is a lot of boilerplate code to be written (e.g., every FastAPI server has the same basic structure), so it’s good that I can automate that part as well.
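To give you a feel for the kind of boilerplate I mean, here is a minimal FastAPI skeleton (purely illustrative; the endpoint names and request/response models are made up):

# Minimal FastAPI boilerplate of the kind an LLM can generate for you (illustrative only).
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="example-service")

class PredictRequest(BaseModel):
    text: str

class PredictResponse(BaseModel):
    label: str

@app.get("/health")
def health() -> dict:
    # Simple liveness check
    return {"status": "ok"}

@app.post("/predict", response_model=PredictResponse)
def predict(request: PredictRequest) -> PredictResponse:
    # Placeholder logic; a real service would call a model here
    return PredictResponse(label="positive" if "good" in request.text else "negative")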

    Writing clearly defined bits of code

    I find Cursor’s inline edit functionality to be very good for making targeted changes.

    For example, say I want to add two more options to command line arguments. What I would do is select the code in question, press Ctrl + K on my keyboard, then write something along the lines of:

    Please expand the options list to include two additional options:
    
* Option 1 - Description of what the option does
* Option 2 - Description of what the option does

    I also use it sometimes if I’m unsure about some specific syntax or class attributes; it’s usually very good there.

    Asking codebase-related questions

    I find Cursor very useful to get acquainted with a new codebase. I usually load the codebase inside Cursor, enter Ask mode and then I ask questions about the codebase.

    I double-check the details with programmers working on the codebase of course, but this gives me some insights I might miss if I was just manually exploring the codebase.

    The bad

    Cleanly adding features to an existing codebase

If you want to add a feature to an existing codebase, doing it with LLMs can lead to deterioration of code quality. From my experience, LLMs tend to overcomplicate solutions, and even when they take a simple approach, it usually involves some manual debugging.

    For adding features to an existing codebase, I prefer consulting with codebase author(s), then implementing the feature in a way we agree and using LLMs for specific code blocks, as I outlined in the section Writing clearly defined bits of code.

Even if the entire codebase was made with the help of LLMs, the LLM can still introduce some “unclean” code.

    For example, have a look at this example from my SWEMentor side project:

    Let’s consider two classes:

    • CodeIssue – represents a code issue with its solution and context
    • CodePattern – represents a code pattern with its description and example

Now consider these two methods’ signatures:

    def add_issue(self, issue: CodeIssue) -> None:

So far so good (although the method could return a bool indicating whether it succeeded, instead of None). But now look at this method:

    def add_pattern(
            self,
            name: str,
            description: str,
            example: str,
            category: str,
            tags: List[str]
        ) -> None:

So instead of taking the CodePattern class as the argument, it repeats its attributes again. This hurts the maintainability of the code and makes it more “unclean”.

    This might look like nitpicking, but it adds up quickly and can lead to a messy codebase.
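For contrast, a cleaner version would simply mirror add_issue and accept the dataclass itself. This is my sketch, assuming CodePattern carries the same fields and the class keeps an internal list of patterns:

def add_pattern(self, pattern: CodePattern) -> None:
    # Mirrors add_issue: the caller builds a CodePattern and passes it in,
    # so this signature doesn't change every time CodePattern gains a field.
    self.patterns.append(pattern)  # assumes an internal list; illustrative only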

    Hallucinations

I think the hallucination problem has improved tremendously with the latest versions of LLMs; it was a much bigger issue before than it is now.

However, I still encounter more subtle kinds of hallucinations. Not so much the model producing complete gibberish, but rather answers that are way off (like estimates which are 2-3x off from reality, yet seem plausible enough). I consider that to also be a hallucination, albeit one that is harder to detect than total nonsense.

    Malleability

    Try the following experiment: Ask an LLM some question. Then add the following:

    Be brutally honest.

I found that with some models (Claude 4 Opus in particular), including the brutal honesty remark can affect not only the answer, but also the tone of the entire conversation. This is definitely something to keep in mind.

    The ugly

    Having to switch chats when context window limit is reached

    Sometimes it happens that I talk to an LLM about something, then think of something else and write it in the same chat. This is how we usually talk as humans: we start with one topic, then switch to another and so on.

    However, since LLMs have a context window, they cannot chat with you forever (not in the same chat at least). On multiple occasions I found myself having to summarize a previous chat because the current one got too slow or I would simply exceed the context length and couldn’t go on.

    This is annoying because as I said, when talking to humans you don’t really have to mind the context length; with LLMs, you do.

    Subtly misunderstanding my question if I don’t provide the entire context

    To illustrate this point, let’s say you’re discussing some career-related question with an LLM. The conversation is flowing nicely and you go back and forth a few times. Just as you reach the conclusion, the LLM outputs something like:

    ...
    This applies to you because:
    
* This is standard for US-based machine learning engineers and consultants
    ...

Uh-oh, I see an issue here. The LLM mistakenly thought it was giving advice to a person residing in the US, while I’m from Europe. Realizing that the LLM made assumptions which aren’t true and which could alter the conclusion has happened to me more than a few times.

    This is somewhat related to hallucinations, as the LLM assumes things it’s not explicitly told, but I put it under The Ugly category as it’s just ugly to see that you spent some time going back-and-forth with the LLM over a certain topic, only to realize it made some assumptions which, if they don’t hold, could fundamentally change the answer.

    Conclusion

    In this blog post, I tried to outline how I use LLMs in my day-to-day. I tried to be as specific as possible and include prompts wherever it made sense, so you can see how LLMs can be used on particular examples.

Hopefully you will give LLMs a shot (or another shot) at accelerating your workflow. If you’re wondering which LLMs to use, I mostly use Anthropic’s Claude and ChatGPT, but you could try other models as well. I recommend using the paid versions of the models, as I found they tend to be better (especially for ChatGPT; for Claude, the free version is good in terms of quality, but you reach your message limit fast).

    As a final note, I would say that I went from not using LLMs almost at all 1.5 years ago to using them daily. In my opinion, LLMs are great tools to make you more productive and there’s no harm in trying them. I think if you give them a shot, you will likely find yourself doing your work faster by removing the tedious parts of it and focusing on what you actually enjoy doing.

  • Linux Tutorial Series – 198 – That’s it!

    Here is the video version, if you prefer it:

That’s it. We are done with our Linux journey. I hope it’s been a joyful ride. I hope you are confident that you have a firm grasp of the basics of Linux, and that if you need to explore any additional parts of Linux, you will have a good sense of where they fit into the bigger picture.

    May the force of the penguin be with you from this point on!

    Thank you for reading and I hope you enjoyed the ride!

  • Linux Tutorial Series – 197 – Recap

    Here is the video version, if you prefer it:

    Let’s recap what we learned about shell scripting:

    • Shell scripts are files that contain commands (alongside some shell scripting keywords)
• Use shell scripts to manipulate files; if you find yourself manipulating strings or doing arithmetic, consider using another programming language instead
    • A line starting with #! is called a shebang
    • A line starting with a # is called a comment – it is ignored by the interpreter and serves only for human understanding of the shell script
• Use single quotes unless you have a very good reason not to
    • There are special variables – for example, $1 corresponds to the first argument in your script – as well as some other ones
    • Exit codes tell you “how did the command do”
    • If statement follows the logic of “if this then that”
    • Else statement follows the logic of “if this then that, if not (else) then the other thing”
    • Logical operators are used to combine tests together
    • You can test for various conditions in a test
    • Use a case construct instead of a lot of elifs
    • for is used for iteration
    • Command substitution can be used to put the output of the command in a variable or to pass it as input to another command
    • You can read user input with read variableName

    I hope you refreshed your memory!

  • Linux Tutorial Series – 196 – Things I never used in shell scripting

    This is the video version, if you prefer it:

There are things I never used in shell scripting, such as sed, awk, basename etc. Since shell scripts are composed mostly of commands, the commands I never used on the command line I also never used in shell scripts.

I covered everything I deemed important for shell scripting, but if you encounter something you don’t understand, feel free to use Google or reference (Shotts, 2019). Although a caveat: (Shotts, 2019) seems to take the stance of “show a whole lot of features of shell scripting”, whereas I am more biased towards “learn only the essentials of shell scripting and use other programming languages for activities other than file manipulation”. Just a difference to keep in mind.

    Thank you for reading!

    References

    Shotts, W. (2019). The Linux Command Line, Fifth Internet Edition. Retrieved from http://linuxcommand.org/tlcl.php. Part 4 – Writing Shell Scripts

  • Linux Tutorial Series – 195 – Fixing errors

    Here is the video version, if you prefer it:

Fixing errors, also known as troubleshooting, is the process of removing errors from your bash script.

If you adhere to the rule I laid out before (“Use single quotes to enclose values unless you have a concrete reason not to”), you will be fine most of the time. For the times when you are not, literally copy and paste the error message you’re getting into Google and you will get some suggestions.

There is a finely written piece on this in (Shotts, 2019), which again, let me remind you, is free on the World Wide Web, so go read it.

    Thank you for reading!

    References

    Shotts, W. (2019). The Linux Command Line, Fifth Internet Edition. Retrieved from http://linuxcommand.org/tlcl.php. Pages 454-467

  • Linux Tutorial Series – 194 – Reading user input

    Here is the video version, if you prefer it:

You can read user input in shell scripts. (Ward, 2014)

    Here is an example of a script which reads user input:

    #!/bin/bash

    read name

    echo "My name is $name"

Notice how I used double quotes to make the shell expand my variable. My variable, by the way, is called name. read name prompts the user for input, and when the user types something and presses Enter, that input is stored in the variable named name.

    Here is how it works:

    mislav@mislavovo-racunalo:~/Linux_folder$ ./tutorialScript.sh

    Mislav

    My name is Mislav

As you already know, you can also pass user input to your shell script through arguments. I don’t know which way is more idiomatic to the shell, but I’d opt for passing values as arguments to the script and then checking those arguments at the beginning of the script. You can also do checks after reading something into a variable (such as “Is the string the user entered empty?”), but doing a check after each read is tedious; it’s much better to do them all at the beginning of the script. Both approaches exist, so both must be fine.

    Hope you learned something useful!

    References

    Ward, B. (2014). How Linux Works: What Every Superuser Should Know (2nd ed.). No Starch Press. Page 269

  • Linux Tutorial Series – 193 – Command substitution

    Here is the video version, if you prefer it:

You can take the result of a command and store it in a variable, or use the result of that first command as an argument for another command. This is called command substitution. (Ward, 2014)

    Here is an example of command substitution in a sample script:

    #!/bin/bash

    LINES=$(grep 'a' aba.txt)

    for word in $LINES

    do

    echo $word

    done

We first take every line that contains the character a in the file aba.txt and store those lines in the variable LINES. Then we iterate over each word in LINES and print it. Why does the loop see words, not lines? LINES does contain whole lines, but because $LINES is unquoted, the shell splits it on whitespace (you can see this yourself by putting echo $LINES just before the for loop), so the for loop iterates over individual words.

    Here is the output (along with aba.txt contents):

    mislav@mislavovo-racunalo:~/Linux_folder$ cat aba.txt

    Mustard is how we transfer wealth

    Some stuff Abba Mustard

    Mustard Mustard

    Mustard Mustard

    It's the Mustard

    In the Rich Man's World

    AB

    Mustard is how we transfer wealth

    Ab

    aB

    ab

    mislav@mislavovo-racunalo:~/Linux_folder$ ./tutorialScript.sh

    Mustard

    is

    how

    we

    transfer

    wealth

A simple example of using the result of one command as input to another command is echo $(ls): we take the output of ls and echo it.

    Thank you for reading!

    References

    Ward, B. (2014). How Linux Works: What Every Superuser Should Know (2nd ed.). No Starch Press. Pages 263-264