ALSATE, an acronym for "Analysis of Linux System Logs for Active Threat detection and Evaluation", is an automated log monitoring system that continuously analyzes Linux syslogs at severity levels 4 and below, leveraging a fine-tuned Large Language Model (LLM) to detect issues, explain them, and recommend actionable remediation measures that enhance system security and efficiency.
ALSATE prioritizes the following five levels of system logs:
- L0 - EMERGENCY
- L1 - ALERT
- L2 - CRITICAL
- L3 - ERROR
- L4 - WARNING
Logs at levels 5 and above (such as Notice or Info) are not analyzed, as they typically indicate no significant threat.
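As a sketch of this cutoff, the numeric priorities map to names as defined in RFC 5424; the levels beyond WARNING are listed here only for context, and the helper `is_analyzed` is ours, not part of the project:

```python
# Syslog severity levels (RFC 5424); ALSATE analyzes levels 0-4 only.
SEVERITY = {
    0: "EMERGENCY",
    1: "ALERT",
    2: "CRITICAL",
    3: "ERROR",
    4: "WARNING",
    5: "NOTICE",
    6: "INFO",
    7: "DEBUG",
}

def is_analyzed(level: int) -> bool:
    """Return True when a log at this severity level falls within ALSATE's scope."""
    return level <= 4
```

For example, `is_analyzed(3)` is True for an ERROR entry, while `is_analyzed(6)` is False for INFO.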
- Oracle VirtualBox 7.1.4 (Recommended) (any virtualization engine with the latest version can also be used)
- Ubuntu 24.04.1 (Recommended) (any other Linux distribution or a newer Ubuntu version can also be used)
- python3-pip 24.0 (Recommended)
To install pip, open a bash terminal and execute:
sudo apt-get install python3-pip
- cURL 8.11.0 (Recommended)
To install cURL, open a bash terminal and execute:
sudo apt-get install curl
- git 2.43.0 (Recommended)
To install git, open a bash terminal and execute:
sudo apt install git-all
- Homebrew 4.4.8 (Recommended)
To install Homebrew, open a bash terminal and execute the following:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
test -d ~/.linuxbrew && eval "$(~/.linuxbrew/bin/brew shellenv)"
test -d /home/linuxbrew/.linuxbrew && eval "$(/home/linuxbrew/.linuxbrew/bin/brew shellenv)"
echo "eval \"\$($(brew --prefix)/bin/brew shellenv)\"" >> ~/.bashrc
All the modules/libraries used in the project can be installed using requirements.txt. Note that re, time, json, and subprocess ship with the Python standard library; only requests and streamlit need to be installed:
re
time
json
requests
streamlit
subprocess
streamlit.components.v1
Since our focus is exclusively on syslogs of severity level 4 and below (logs at higher numeric levels, i.e. lower severity, do not pose significant threats), we need to access only these critical logs while ignoring less severe entries. This can be achieved by modifying the journald configuration on Linux systems.
- To configure the system to store only logs at level 4 and below, navigate to:
/etc/systemd/journald.conf
- Open journald.conf with root privileges in a text editor and scroll down in the file.
- In the configuration you will notice two fields:
- MaxLevelStore and
- MaxLevelSyslog
- Now set both of them to warning and uncomment them (remove the leading #):
MaxLevelStore=warning
MaxLevelSyslog=warning
- Save the file and restart the operating system (or restart the journal service with sudo systemctl restart systemd-journald).
- Fine-Tuned and Quantized Model: k-arthik-r/llama-3.2-3b-sys-log-analysis-alsate-Q4_K_M-GGUF
- Download the fine-tuned and quantized model by contacting the developers at [email protected].
- To know more about quantized models, click here.
- Inside your Linux distribution, choose an easily accessible location for the project (such as the Desktop).
- Open a bash terminal in the selected location and clone the repository:
git clone https://github.com/k-arthik-r/ALSATE.git
- A folder named "ALSATE" will be created.
- Move the downloaded model (from Step 2) into the ALSATE folder.
- Navigate to the ALSATE folder and open a bash terminal inside it.
- Install virtualenv:
pip3 install virtualenv
- Create a Python virtual environment named env:
python3 -m venv env
- Activate the virtual environment:
source env/bin/activate
- Install the requirements from requirements.txt:
pip3 install -r requirements.txt
The execution of this project involves 3 Steps:
To activate the LLM server, we need a package called llama.cpp.
llama.cpp makes it simple to download and run a GGUF model: you just provide the path to the Hugging Face repository and the model file name, and it automatically downloads the model checkpoint and saves it in a cache. The cache location is controlled by the LLAMA_CACHE environment variable; read more about it here.
You can install llama.cpp through brew (works on Mac and Linux), or you can build it from source. There are also pre-built binaries and Docker images, which you can find in the official documentation.
- Within the current directory, open a new bash terminal.
- Install llama.cpp using brew:
brew install llama.cpp
- Start the LLM server using:
llama-server -m llama-3.2-3b-sys-log-analysis-alsate-q4_k_m.gguf
Note: After running the command, you may need to wait 10-15 minutes for the model to fully load and become operational.
- You can verify that the model is active by visiting the address where it is running, usually on port 8080 or 8081. Open http://127.0.0.1:8080 or http://127.0.0.1:8081 in your browser, and you will be redirected to a chat interface similar to the snapshot shown below.
- Or by sending a curl request:
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer no-key" \
  -d '{
    "messages": [
      {
        "role": "system",
        "content": "Your name is ALSATE, you are an advanced syslog parsing and analysis tool. Your task is to analyze provided system logs, identify potential causes of their generation, and detect any security threats or anomalies. If threats are found, suggest precise remediation steps. Respond only when the input is a valid system log; otherwise, reply with: Input does not appear to be a valid system log. Unable to assist."
      },
      {
        "role": "user",
        "content": "Nov 05 22:28:18 ubuntu kernel: workqueue: blk_mq_requeue_work hogged CPU for >10000us 16 times, consider switching to WQ_UNBOUND"
      }
    ]
  }'
Note: If you use the curl request method, you may have to wait 2 to 3 minutes for a response, depending on your current system status. You will get a response similar to the snapshot provided below.
Disclaimer: This applies only if the LLM is activated at http://localhost:8080. If your LLM is running on a different address or port, update the link accordingly.
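The same request can also be sent from Python using the requests library already listed in requirements.txt. This is a minimal sketch, assuming the server is reachable at http://localhost:8080; the helper names build_payload and analyze are ours, not part of the project:

```python
import requests

URL = "http://localhost:8080/v1/chat/completions"  # adjust if your server differs

SYSTEM_PROMPT = (
    "Your name is ALSATE, you are an advanced syslog parsing and analysis tool. "
    "Respond only when the input is a valid system log."
)

def build_payload(log_line: str) -> dict:
    """Assemble the OpenAI-style chat payload that llama-server expects."""
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": log_line},
        ]
    }

def analyze(log_line: str) -> str:
    """Send one log line to the running llama-server and return the model's reply."""
    resp = requests.post(
        URL,
        json=build_payload(log_line),
        headers={"Authorization": "Bearer no-key"},
        timeout=300,  # the model can take minutes on modest hardware
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```

Since build_payload only assembles the JSON body, it can be reused or tested without a running server.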
- After successfully activating the LLM server, add the address where it is active to the .env file located at the project root:
URL = http://localhost:8080
- Within the current directory, open a new bash terminal.
- Activate the Python virtual environment:
source env/bin/activate
- Execute read.py:
python3 read.py
- After execution, you will see a new text file named live_logs.txt being created, with syslogs being appended to it.
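read.py itself is not reproduced here; as a rough sketch of what such a collector can look like, the snippet below follows journalctl at priority 4 (warning) and more severe, appending each line to live_logs.txt. The helper names are ours, and the details may differ from the actual script:

```python
import subprocess

LIVE_LOG_FILE = "live_logs.txt"

def append_log(path: str, line: str) -> None:
    """Append a single log line to the live log file."""
    with open(path, "a") as sink:
        sink.write(line.rstrip("\n") + "\n")

def stream_logs() -> None:
    """Follow the journal for warning-and-worse entries, feeding the live log file."""
    proc = subprocess.Popen(
        ["journalctl", "-f", "-p", "4", "--no-pager"],  # -p 4 = priority 4 and more severe
        stdout=subprocess.PIPE,
        text=True,
    )
    for line in proc.stdout:
        append_log(LIVE_LOG_FILE, line)

if __name__ == "__main__":
    stream_logs()
```

Opening and closing the file per line keeps live_logs.txt flushed, so the analyzer always sees complete entries.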
Step 3: Fetch syslogs from the dynamic text file, analyze them using the fine-tuned LLM, and display them in the Streamlit interface along with their cause and remediation.
- Within the current directory, open a new bash terminal.
- Activate the Python virtual environment:
source env/bin/activate
- Execute main.py:
streamlit run main.py
- After execution, you will see a Streamlit application running on localhost.
Use journalctl, a command-line tool for querying and displaying logs from the systemd journal, to filter logs based on a specific threat level. In this case, you're looking for logs with threat levels 4 and below. This helps in narrowing down the logs to a particular set of interest based on severity.
Once you have filtered the logs, redirect or save them into a "live logs" file. This file serves as a real-time repository of logs that can be monitored, analyzed, and processed further.
After the live logs file is populated, extract the first log entry. This log entry will be the starting point for further analysis.
Pass the fetched log to a fine-tuned machine learning model (Llama-3.2-3B in this case) for analysis. The model is expected to process the log and generate a response that contains key insights, such as the heading, log content, possible cause, and recommended remediation.
After receiving the model's response, use predefined regular expressions to parse the response. The goal here is to extract specific details such as:
- Heading: A brief title or summary of the log's content.
- Log: The main content or description of the log entry.
- Cause: The potential reason behind the issue or event recorded in the log.
- Remediation: Suggested actions to resolve or mitigate the problem identified in the log.
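The exact patterns live in the project's source; the sketch below assumes the model labels its sections with Heading:, Log:, Cause:, and Remediation: prefixes, which is our assumption about the response format:

```python
import re

# Assumed section labels; the real model output format may differ.
FIELDS = ("Heading", "Log", "Cause", "Remediation")

def parse_response(text: str) -> dict:
    """Extract the four labeled sections from a model response into a dict."""
    result = {}
    for field in FIELDS:
        # Capture everything after "Field:" up to the next label or end of text.
        pattern = rf"{field}:\s*(.*?)(?=\n(?:{'|'.join(FIELDS)}):|\Z)"
        match = re.search(pattern, text, re.DOTALL)
        result[field.lower()] = match.group(1).strip() if match else ""
    return result
```

Missing sections come back as empty strings rather than raising, so a malformed model reply does not crash the pipeline.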
After parsing the response, append the extracted details (heading, log, cause, and remediation) to the live logs list. This allows you to continuously build and maintain a collection of structured log information for further analysis.
Once the first log entry has been processed and its details have been extracted and stored, delete it from the live logs file. This ensures that only unprocessed logs remain in the file for future querying and analysis.
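That remove-after-processing step can be sketched as a small helper; the function name pop_first_log is ours:

```python
from typing import Optional

def pop_first_log(path: str) -> Optional[str]:
    """Return the first line of the file and rewrite the file without it.

    Returns None when the file is empty, so the caller can poll again later.
    """
    with open(path, "r") as f:
        lines = f.readlines()
    if not lines:
        return None
    with open(path, "w") as f:
        f.writelines(lines[1:])
    return lines[0].rstrip("\n")
```

Because the collector may be appending concurrently, a production version would need file locking; this sketch ignores that race.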
Use Streamlit, a Python library for creating interactive web applications, to visualize and display the records. The live logs list, which now contains structured and parsed information, can be shown in a user-friendly interface, allowing users to easily view and interact with the log records, including their heading, content, cause, and remediation.
- Automated Syslog Monitoring
  - Real-time tracking of Linux sys-logs, focusing on high-severity threats (levels 4 and below).
  - Continuous analysis without requiring manual intervention.
- AI-Driven Analysis
  - Utilizes a fine-tuned Large Language Model (LLM) for log analysis.
  - Identifies vulnerabilities, determines root causes, and suggests actionable remediation steps.
- Proactive Threat Management
  - Provides early detection of potential threats to prevent escalation.
  - Generates actionable insights to prioritize responses effectively.
- Efficient Log Management
  - Reduces operational overhead with automated log processing.
  - Enhances security and operational efficiency by minimizing manual efforts.
- Low Infrastructure Requirements
  - Optimized to run efficiently on resource-constrained hardware, such as devices with minimal RAM, storage, and processing power.
  - Eliminates the need for expensive components or high-end systems, making it accessible for organizations with limited budgets.
- User-Friendly Interface
  - Presents processed log data and insights through an intuitive dashboard for easy decision-making.
- Scalability and Adaptability
  - Capable of handling logs from multiple Linux systems.
  - Designed to adapt to various enterprise environments and applications.
- Optimization Features
  - Fine-tuned using the Llama 3.2 3B Instruct base model with Low-Rank Adaptation (LoRA) for resource efficiency.
  - Quantization techniques enable deployment on devices with constrained resources.
- Expected Outcomes
  - Enhanced security posture and operational efficiency.
  - Automated log management with proactive issue resolution.
  - Data-driven insights for informed decision-making.
demo.webm
If you have any feedback, please reach out to us at [email protected]. You are also welcome to add new features by creating pull requests.