
Reimagining Logs: Building an AI-Powered Conversational Observability System

It is mid-2025 and the cogs of AI are at full speed. So we (Mobin and I) decided to do our own AI project. We called it "IntelliLogs".

IntelliLogs at a glance:




In this post I will describe why we did it, what we built and how we built it, along with my personal experience. I am hoping this will, at the very least, be an interesting read.

Table of contents:

Why IntelliLogs

What is IntelliLogs

How IntelliLogs was developed

Future of IntelliLogs

Conclusion

References

Why IntelliLogs:

My personal motivations 💪 for this were:

  • Explore and experience what an AI app looks like from an architectural and engineering perspective
  • Explore the realm of huge LLMs (eg: GPT-4.1-170B, Gemini Pro etc) vs small LLMs (eg: granite-7b, gemma-4b)
  • Explore the possibilities of tuning or building a model without being a data scientist: how easy or hard it is, what tools are available etc.

We also wanted to tackle a "not too far from practical" problem. So the problems I think 🤔 we are solving here are:

  • Cost of logs: Log ingestion by enterprise-level aggregators is costly enough, and getting quality logs ingested is a pain point worth solving. Maybe a low-cost, small AI model can help here.
  • Out-of-control logs: Many product owners onboard COTS applications and have little control over log quality (eg: whether the logs are labeled correctly). On top of that, out of the box most runtimes (eg: K8s) provide limited labels (eg: info, warn, error, critical). Mixed with COTS, this often leads to weird filtering rules for ALERTs or threat issues. We are experimenting to see whether this can be solved by having AI apply additional labels.
  • When "sh*t hits the fan": POs start chasing SREs, SREs chase Devs, and everybody ends up combing through rows of logs. There may already be elegant solutions to this, but I thought I would try to reinvent the wheel here, because why not? If NASA literally did it for moon rovers, I can at least try to experiment.
  • Shortfall of traditional observability tools: I have always found that traditional observability tools (tools without AI) usually require some degree of up-skilling, and as a result observability gets pushed down to being an SRE responsibility, or even worse a developer responsibility, despite its original intention. Perhaps a conversational, AI-powered solution like "IntelliLogs" can close the gap here.

So yeah, this is not an orphan project.


What is IntelliLogs:

The IntelliLogs platform is a modular, AI-driven conversational observability solution aimed at making log analysis more efficient and accessible to everyone (tech users as well as non-tech users).

It wasn't created out of necessity; rather, our main motivation was to explore the possibilities.

 
Below is how the user story was formed (it was super organic 😉):



Below are the Functional and Non-Functional requirements we set for this application.





Finally, here's a system context diagram describing what the system does at a high level:



Here is a short demo video of IntelliLogs in action:



How IntelliLogs was developed:


It was an interesting journey from ideation to deployment, where we saw it functioning. TBH, despite the scary conspiracies about AI, I really enjoyed the process.

So when we first thought of it, I did not have a clue how to approach it. The open community on the internet was the main source of my research on how to go about implementing such a system. After a few weeks of try-outs, overcoming learning curves and a fair amount of filtering out information, I gained some clues.
I will reference the resources that were actually helpful at the end of the post.

So, my journey went something like this:




A few interesting things to highlight here:

  • No GPU for development: There was no GPU requirement to develop this application because of the LLM models chosen for the project. They are small enough (of course with smaller context windows) to run on CPU.
  • The classification model: Using Scikit-learn is pretty much the only thing here that is sort of data science-ish. But it was simple enough; basically multiclass classification using LogisticRegression. I followed this tutorial to understand it and tweaked the implementation to fit my needs (a sketch follows this list).
  • The command extraction model: At first I did not plan for this. I tried to use LLMs (eg: granite-2b-instruct, granite-7b-instruct, gemma-3-4b etc; I tried all of them and then some) that are claimed to have text extraction capability. It worked to some degree, with a moderate percentage of errors and hallucination. Then I decided to train the base model myself.
  • InstructLab: It is not really a lab; LAB stands for Large-scale Alignment for chatBots. It was a game changer in my case. I went down the path of tearing my hair out trying to train with LoRA, then pivoted to InstructLab, and it was a super simple process. I read the tutorial, installed ilab in my Docker dev environment, created qna.yaml (I was so lazy that I had ChatGPT create the qna.yaml for me) and samples.md (again, ChatGPT), then executed the ilab commands to first generate synthetic data and then train the granite-7b-base model on the new skill. Voila: the trained model was able to extract my commands from natural language with significantly improved accuracy and almost no hallucination. It was a bit painful because on CPU this process took 2 days and I wasn't sure whether it was going to work or not. This is where a GPU would probably have helped a lot; in the absence of a local GPU I could have used the InstructLab available with AI Addons on OpenShift.
  • The rest of the discoveries (eg: local model vs model server, which model size to choose, which model outputs most accurately for given prompts etc) were from a software engineering perspective, and brute-force trial and error was easy enough.
  • The most painful (not hard) part was figuring out which model server to use for local development. Finding and settling on Ollama took some time, as there are a few options available (eg: vllm, llama.cpp) and trying them out took a while.
  • Local ONNX: The "Backend" component is the core of the IntelliLogs application. During deployment it has an init container that downloads the ONNX ML model file (for custom log classification) from my Hugging Face repo.
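
For the curious, here is a minimal sketch of what the classification training and ONNX export could look like. The file name, column names and labels are illustrative assumptions, not the actual IntelliLogs code:

```python
# Minimal sketch of a multiclass log classifier plus ONNX export.
# File/column names and labels are illustrative assumptions, not the
# actual IntelliLogs code. Requires: pandas, scikit-learn, skl2onnx.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

df = pd.read_csv("labeled_logs.csv")  # assumed columns: "message", "label"
X_train, X_test, y_train, y_test = train_test_split(
    df["message"], df["label"], test_size=0.2, random_state=42
)

# TF-IDF features feeding a multiclass LogisticRegression.
clf = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), max_features=50_000)),
    ("lr", LogisticRegression(max_iter=1000)),
])
clf.fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))

# Export to ONNX so the backend can run it without scikit-learn installed.
# Note: text-pipeline conversion has caveats (tokenizer differences);
# check the skl2onnx docs if the converted model misbehaves.
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import StringTensorType

onx = convert_sklearn(
    clf, initial_types=[("message", StringTensorType([None, 1]))]
)
with open("log_classification.onnx", "wb") as f:
    f.write(onx.SerializeToString())
```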

Although it wasn’t exactly a "journey to the center of the Earth", it was certainly an adventurous one.

Here's an architecture overview diagram depicting the components of the IntelliLogs system:


 

A few interesting things to highlight here:

  • Independent layers: At each layer (except the application layer) the underlying technologies are swappable. For example, the application has no hard dependency on AI Addons; in the absence of this layer the application will still be functional, just with slower response times from the AI models.
  • IntelliLogs components: The components in the application layer are also swappable (except the one core component, the Backend) without changing the core component. For example:
    • Ollama can be replaced with KServe to optimise resource utilization (a sketch of the current Ollama call follows this list)
    • The small LLMs used could be replaced with huge LLMs if the need arises
    • The proxy is currently implemented using python-flask but could be replaced with a native K8s ingress or a proxy server like nginx
    • Kafka could be replaced with AMQ, S3 or anything else that can provide logs over an API call
    • The web socket, currently implemented using python sockets, could be replaced with a high-performing queue/streaming solution or a dedicated web socket server
    • Even the frontend could be replaced with other UIs, as long as handling the backend API is implemented
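
To make the swappability concrete, here is a minimal sketch of how a backend component might call a local Ollama server over its REST API. The model tag and prompt wording are placeholder assumptions; keeping the call behind a small function is what makes swapping Ollama for KServe later a contained change:

```python
# Minimal sketch: calling a local Ollama server's /api/generate endpoint.
# The model tag and prompt wording are placeholder assumptions.
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default port

def extract_command(user_query: str) -> str:
    resp = requests.post(
        OLLAMA_URL,
        json={
            "model": "granite-7b-lab",  # hypothetical local model tag
            "prompt": f"Extract the log query command from: {user_query}",
            "stream": False,  # return one JSON object instead of a stream
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]

print(extract_command("show me all error logs from the payment service"))
```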
Here's a sequence diagram of the Log Analysis Report process, showing how the IntelliLogs components work together to deliver the outcome to the user:


 
The classification process follows a similar sequence, except we don't need to call a model server/endpoint, as the SKLearn ML model's portable format, ONNX, is loaded from local disk in the backend component. In future I will explore the possibility of creating an inference endpoint with this log_classification.onnx, but for now it works without any issues.
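
For reference, loading and running the exported model in the backend could look roughly like this; the input shape and tensor names depend on how the model was exported, so treat them as assumptions:

```python
# Minimal sketch: classifying a log line with the local ONNX model via
# onnxruntime. Input shape/names depend on the export and are assumptions.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("log_classification.onnx")
input_name = session.get_inputs()[0].name  # eg: "message"

logs = np.array([["connection refused while reaching payments-db"]])
predicted = session.run(None, {input_name: logs})[0]
print(predicted)  # first output of an sklearn classifier export: the labels
```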

Future of IntelliLogs:


Agentic AI: As you may already be thinking, this could have simply been built with an AI agent, and yes, you are correct, it could have. My initial discovery took a bit longer because I was time-poor and this was a side project; within that time, MCP concepts and tooling accelerated and the era of Agentic AI started. I need to do more discovery on this subject and see if I can re-use the "Backend" component as a model server and train the command extraction model to make function calls.

User Feedback Loop: Enhance the Q&A part with the context of the classified logs or the log analysis report (eg: instead of reading a table or display, ask the model more questions). This seems possible, but the challenge here is token size. I need to do further discovery to understand whether Agentic AI will solve this too, or how I can provide more context to the LLM to answer users' follow-up queries while still maintaining the run-on-CPU NFR.

Remediation Automation: This is probably the low-hanging fruit. Along with text-based remediation suggestions, train the model to point to existing automation that can be executed to remediate.


Conclusion:

As organisations and humans adopt AI, I think there probably exists a conflation between ChatGPT, Gemini or CoPilot and general LLMs (especially the small ones), which may pose a risk of AI initiatives within an organisation getting lost in translation.

This IntelliLogs project is a starting point for understanding systems at a higher level of abstraction, taking us closer to intelligent, self-explaining infrastructure or an ecosystem of systems. There's also the potential to achieve true self-healing. Not sure how far is too far, or what the latest definition of "too far" is, though.
Jargon aside, this project was indeed exciting.

That's it. Happy AI-ing.





References:









