I intend this blogpost to be some kind of public rant about how I have experienced LLM-based Gen-AI so far and how I am involved. I don’t claim to make any predictions at the end. My objective is not to cover everything, just to write down my thoughts so I can make space for new ones.

It is October of 2024. We have OpenAI leading the way with ChatGPT. In my opinion, Anthropic is the runner-up with Claude. GitHub Copilot and Cursor IDE are leading the way for LLM-assisted code development. Almost every SaaS product I use daily has an LLM integration, whether it is asking questions about existing information or autocompleting something while writing. Most of the business-related e-mails I receive (and sometimes send) are LLM-generated. There are many tools out there working as OpenAI wrappers.

NotebookLM was recently released. It looks practical for generating podcasts, audio summaries as they call them, and listening to dumbed-down versions of complex texts in podcast format.

LLM-based Gen-AI is not just hype anymore. It has proven its usefulness. The speed of change was amazing up until GPT-4o, but right now it has plateaued, in my practical experience.


what it is

Again, I’m definitely not an expert in this area. I imagine LLMs as neural networks trained on huge data sets - hence the name, large. I don’t really know how neural networks work, but I am aware that large language models predict the next word after a word sequence with great success. So LLMs are not smart and not self-aware; they predict what should come next with good accuracy.
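As I understand it, the core mechanic is exactly that: given a sequence, pick a likely continuation. A toy sketch of the idea, with the probabilities hard-coded instead of computed by an actual network:

```python
import random

# Toy illustration of next-word prediction. A real LLM computes these
# probabilities with a neural network over a huge vocabulary; they are
# hard-coded here just to show the mechanic.
next_word_probs = {
    "mat": 0.60,   # likely continuations of "the cat sat on the"
    "sofa": 0.25,
    "moon": 0.15,
}

def sample_next_word(probs: dict[str, float]) -> str:
    words = list(probs)
    weights = [probs[w] for w in words]
    return random.choices(words, weights=weights, k=1)[0]

print(sample_next_word(next_word_probs))  # usually prints "mat"
```

No smartness involved: the model is just a very good guesser of what comes next.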

The prompt is important. We did some prompt engineering in the beginning, and influencers boasted that we would all be prompt engineers in the very near future. Right now we have purpose-built tools, which abstract the hard-core prompt engineering away to the tool provider. There are still tricks to pull to get the most out of an LLM-based tool.

To use LLMs with more accuracy, we need to adopt iterative prompting. The first response is rarely the one we need, so we have to refine it over a few rounds to get what we want.
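In API terms, iterating means carrying the whole conversation forward and appending corrections to it. A minimal sketch with the official openai Python SDK; the model name and follow-up prompts are just examples:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
messages = [{"role": "user", "content": "Summarize what RAG is in two sentences."}]

# The first answer is rarely the final one: append it to the history
# together with a correction, then ask again.
for follow_up in [
    "Shorter, and drop the marketing tone.",
    "Now add one concrete example.",
]:
    reply = client.chat.completions.create(model="gpt-4o", messages=messages)
    messages.append({"role": "assistant", "content": reply.choices[0].message.content})
    messages.append({"role": "user", "content": follow_up})

final = client.chat.completions.create(model="gpt-4o", messages=messages)
print(final.choices[0].message.content)
```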

I trust the response 80% of the time. I am never sure if it is missing some data, because LLMs are very bad at knowing what they don’t know. Again, they don’t know anything; they find highly probable consecutive words. So for every response, I feel the need to go to the source documentation and confirm it. To be frank, I treat Wikipedia the same way. perplexity.ai covers some of this need, but again, it cannot tell me whether what it says is ultimately the right answer.

OpenAI released GPT-4o, which is the most commonly used model right now, and it is a proprietary model. Meta’s Llama, on the other hand, is an open-weight model that can be self-hosted; Claude, like GPT-4o, is proprietary. It takes considerable resources to provision open-weight models for an enterprise use case, but I feel safer overall.


personal use cases

I use GitHub Copilot daily and extensively. It is a very good auto-completer. It is especially valuable since I am a vim user and LSPs work only semantically, not logically (as far as I could configure them). I also let it write some dummy code through quick inline prompts - the precious comments. I write the comment, or the method name, and Copilot generates the code. To be frank, it has caused me issues, but again, I trust it 80%.
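To show what I mean by comment-driven generation, here is a hypothetical example; the comment and the method name are what I would type, and the body is representative of the kind of suggestion Copilot makes, not a captured one:

```python
# parse comma-separated "key=value" pairs into a dict, skipping empty
# segments - this comment is the prompt I actually write...
def parse_pairs(raw: str) -> dict[str, str]:
    # ...and a body roughly like this is what Copilot suggests
    result: dict[str, str] = {}
    for segment in raw.split(","):
        segment = segment.strip()
        if not segment:
            continue
        key, _, value = segment.partition("=")
        result[key.strip()] = value.strip()
    return result
```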

I use ChatGPT and Claude 3.5 to learn new concepts. They are very good at explaining things over and over again. I have used them to understand how LLMs work and how to build some toy projects from scratch.

I make them write some business e-mails, especially the ones that follow some kind of template.

I played a little with Cursor IDE, which uses Claude 3.5 Sonnet by Anthropic, to generate a simple frontend application with React.

I read quick summaries of some business-related documents.

I am slowly replacing Google with perplexity.ai, which gives more accurate answers with references. But as I mentioned, I do another search on Stack Overflow or in the documentation to satisfy my curiosity.

I try not to use an LLM tool to write down my own thoughts, because the result doesn’t sound like me. The default text they generate, unless explicitly told otherwise, is bloated, with lots of extra explanatory information. Sometimes I get impatient reading LLM-generated content because it hides the main idea somewhere in the middle. They sound like politicians. It would be easier to share the prompt and let me read that instead.

I generate funny images for presentations and blogposts.


business use cases

At Bayzat, we are using LLM integrations with OpenAI. We enrich the response quality with RAG - retrieval-augmented generation. We index our data in a vector database, Weaviate to be more specific, and fetch text chunks similar to the user’s query from the vector DB to use as context.
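Stripped of the infrastructure, the flow looks roughly like this. In production Weaviate does the indexing and similarity search; in this sketch a plain in-process cosine similarity stands in for it, and the chunks, model names, and question are made up for illustration:

```python
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> list[float]:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return resp.data[0].embedding

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

# Indexing step: embed every chunk once (Weaviate's job in production).
chunks = ["Employees get 25 days of annual leave.", "Payroll runs on the 28th."]
index = [(chunk, embed(chunk)) for chunk in chunks]

# Query step: fetch the most similar chunk and use it as context.
question = "How many vacation days do I have?"
q_vec = embed(question)
context = max(index, key=lambda item: cosine(item[1], q_vec))[0]

prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
answer = client.chat.completions.create(
    model="gpt-4o", messages=[{"role": "user", "content": prompt}]
)
print(answer.choices[0].message.content)
```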

We are also experimenting with AWS Bedrock, which provides a range of models with token-based, on-demand pricing, including Claude 3.5 Sonnet and Meta’s open-weight Llama. Meta released Llama 3.2, but this specific model is not available to our account yet, while Bedrock itself is not available in our region at all.
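Calling a Bedrock-hosted model comes down to an invoke_model call through boto3. A rough sketch; the region and model ID are assumptions and depend on what your account actually has access to:

```python
import json

import boto3

# Bedrock exposes hosted models through the "bedrock-runtime" service;
# region and model ID below are placeholders, not what we run.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "What is RAG, in one sentence?"}],
})
response = bedrock.invoke_model(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
    body=body,
)
payload = json.loads(response["body"].read())
print(payload["content"][0]["text"])
```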

We have tested some AI-based reporting solutions, such as QuickSight Q. The promise is that it can turn user queries into charts, and that users can then play with the results through interactive components, changing the measure type, dimension, chart type, and so on. Right now it works well with a single-table structure; however, if we want more than one table defined for a domain, we need to reduce it to a single-table structure again. We also need to create topics about the dataset, which are essentially metadata describing the source tables and their columns. It takes time to build this topic index, and it is definitely not a quickstart action; it requires investing significant time. I didn’t love the querying either; I feel like I need to know how data querying works to ask the right questions.

As a startup, we are not trying to build an LLM from scratch but rather to enrich responses with RAG. We are consumers of the models provided by AI companies such as OpenAI and Anthropic.

For me, the main roadblock is, again, trusting the response and guiding our clients to trust it only 80%, not more. There might be missing data in the response; we cannot guarantee that there won’t be.

final thoughts

There is value in LLM-based Gen-AI. The images in this blogpost were generated with text-to-image tools, with all their prowess and lack of correctness. Is there enough value to cover all the global investment so far? I’m not sure. Will there be a leap towards AGI? I can’t know that, but I expect AGI to be aware of what it knows and what it doesn’t, and LLMs are not there yet. Does that mean they cannot be harmful? They can be, because when configured to do so, they can take actions using APIs and run commands on hardware.

Is it going to take my job away? I am certain it won’t, but that depends on me a lot. If I only produce work that an LLM can do in a heartbeat with a simple prompt, then yeah, it will make me obsolete fast. I expect the workforce to shrink, or roles to transform, with product-oriented people taking more active roles in development. But in the end, there will be some weird combination of unfortunate complexities somewhere, and our LLMs will not be enough to consume all the context and provide the answers - yet.

They might or might not reach that point; we will see. But expectations are being shaped, business owners have probably made up their minds already, and my assumption is that they will operate on the belief that LLMs have changed everything and the workforce needs to adapt. I need to adapt to this mindset too.