ALSATE, an acronym for "Analysis of Linux System Logs for Active Threat detection and Evaluation", is an automated log monitoring system that continuously analyzes Linux syslogs at severity levels 4 and below, leveraging a fine-tuned Large Language Model (LLM) to detect issues, explain them, and recommend actionable remediation measures that enhance system security and efficiency.
ALSATE prioritizes the following five levels of system logs:
- L0 - EMERGENCY
- L1 - ALERT
- L2 - CRITICAL
- L3 - ERROR
- L4 - WARNING
Logs at levels 5 and above (such as Notice or Info) are not analyzed, as they typically indicate no significant threat.
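As a sketch of this cutoff, the numeric priorities map to names as defined in RFC 5424; the levels beyond WARNING are listed here only for context, and the helper `is_analyzed` is ours, not part of the project:

```python
# Syslog severity levels (RFC 5424); ALSATE analyzes levels 0-4 only.
SEVERITY = {
    0: "EMERGENCY",
    1: "ALERT",
    2: "CRITICAL",
    3: "ERROR",
    4: "WARNING",
    5: "NOTICE",
    6: "INFO",
    7: "DEBUG",
}

def is_analyzed(level: int) -> bool:
    """Return True when a log at this severity level falls within ALSATE's scope."""
    return level <= 4
```

For example, `is_analyzed(3)` is True for an ERROR entry, while `is_analyzed(6)` is False for INFO.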
- Oracle VirtualBox 7.1.4 (Recommended) (any virtualization engine with the latest version can also be used)
- Ubuntu 24.04.1 (Recommended) (any other Linux distribution or a newer Ubuntu version can also be used)
- python3-pip 24.0 (Recommended)
To install pip, open a bash terminal and execute:
sudo apt-get install python3-pip
- cURL 8.11.0 (Recommended)
To install cURL, open a bash terminal and execute:
sudo apt-get install curl
- git 2.43.0 (Recommended)
To install git, open a bash terminal and execute:
sudo apt install git-all
- Homebrew 4.4.8 (Recommended)
To install Homebrew, open a bash terminal and execute the following:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
test -d ~/.linuxbrew && eval "$(~/.linuxbrew/bin/brew shellenv)"
test -d /home/linuxbrew/.linuxbrew && eval "$(/home/linuxbrew/.linuxbrew/bin/brew shellenv)"
echo "eval \"\$($(brew --prefix)/bin/brew shellenv)\"" >> ~/.bashrc
All the modules/libraries used in the project can be installed using requirements.txt. Note that re, time, json, and subprocess ship with the Python standard library; only requests and streamlit need to be installed:
re
time
json
requests
streamlit
subprocess
streamlit.components.v1
Since our focus is exclusively on syslogs of severity level 4 and below (logs at higher numeric levels, i.e. lower severity, do not pose significant threats), we need to access only these critical logs while ignoring less severe entries. This can be achieved by modifying the journald configuration on Linux systems.
- To configure the system to store only logs at level 4 and below, navigate to:
/etc/systemd/journald.conf
- Open journald.conf with root privileges in a text editor and scroll down in the file.
- In the configuration you will notice two fields:
- MaxLevelStore and
- MaxLevelSyslog
- Now set both of them to warning and uncomment them (remove the leading #):
MaxLevelStore=warning
MaxLevelSyslog=warning
- Save the file and restart the operating system (or restart the journal service with sudo systemctl restart systemd-journald).
- Fine-Tuned and Quantized Model: k-arthik-r/llama-3.2-3b-sys-log-analysis-alsate-Q4_K_M-GGUF
- Download the fine-tuned and quantized model by contacting the developers at [email protected].
- To know more about quantized models, click here.
- Inside your Linux distribution, choose an easily accessible location for the project (such as the Desktop).
- Open a bash terminal in the selected location and clone the repository:
git clone https://github.com/k-arthik-r/ALSATE.git
- A folder named "ALSATE" will be created.
- Move the downloaded model (from Step 2) into the ALSATE folder.
- Navigate to the ALSATE folder and open a bash terminal inside it.
- Install virtualenv:
pip3 install virtualenv
- Create a Python virtual environment named env:
python3 -m venv env
- Activate the virtual environment:
source env/bin/activate
- Install the requirements from requirements.txt:
pip3 install -r requirements.txt
The execution of this project involves 3 Steps:
To activate the LLM server, we need a package called llama.cpp.
llama.cpp makes it simple to download and run a GGUF model: you just provide the path to the Hugging Face repository and the model file name, and it automatically downloads the model checkpoint and saves it in a cache. The cache location is controlled by the LLAMA_CACHE environment variable; read more about it here.
You can install llama.cpp through brew (works on Mac and Linux), or you can build it from source. There are also pre-built binaries and Docker images, which you can find in the official documentation.
- Within the current directory, open a new bash terminal.
- Install llama.cpp using brew:
brew install llama.cpp
- Start the LLM server using:
llama-server -m llama-3.2-3b-sys-log-analysis-alsate-q4_k_m.gguf
Note: After running the command, you may need to wait 10-15 minutes for the model to fully load and become operational.
- You can verify that the model is active by visiting the address where it is running, usually on port 8080 or 8081. Open http://127.0.0.1:8080 or http://127.0.0.1:8081 in your browser, and you will be redirected to a chat interface similar to the snapshot shown below.
- Or by sending a curl request:
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer no-key" \
  -d '{
    "messages": [
      {
        "role": "system",
        "content": "Your name is ALSATE, you are an advanced syslog parsing and analysis tool. Your task is to analyze provided system logs, identify potential causes of their generation, and detect any security threats or anomalies. If threats are found, suggest precise remediation steps. Respond only when the input is a valid system log; otherwise, reply with: Input does not appear to be a valid system log. Unable to assist."
      },
      {
        "role": "user",
        "content": "Nov 05 22:28:18 ubuntu kernel: workqueue: blk_mq_requeue_work hogged CPU for >10000us 16 times, consider switching to WQ_UNBOUND"
      }
    ]
  }'
Note: If you use the curl request method, you may have to wait 2 to 3 minutes for a response, depending on your current system status. You will get a response similar to the snapshot provided below.
Disclaimer: This applies only if the LLM is activated at http://localhost:8080. If your LLM is running on a different address or port, update the link accordingly.
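The same request can also be sent from Python using the requests library already listed in requirements.txt. This is a minimal sketch, assuming the server is reachable at http://localhost:8080; the helper names build_payload and analyze are ours, not part of the project:

```python
import requests

URL = "http://localhost:8080/v1/chat/completions"  # adjust if your server differs

SYSTEM_PROMPT = (
    "Your name is ALSATE, you are an advanced syslog parsing and analysis tool. "
    "Respond only when the input is a valid system log."
)

def build_payload(log_line: str) -> dict:
    """Assemble the OpenAI-style chat payload that llama-server expects."""
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": log_line},
        ]
    }

def analyze(log_line: str) -> str:
    """Send one log line to the running llama-server and return the model's reply."""
    resp = requests.post(
        URL,
        json=build_payload(log_line),
        headers={"Authorization": "Bearer no-key"},
        timeout=300,  # the model can take minutes on modest hardware
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```

Since build_payload only assembles the JSON body, it can be reused or tested without a running server.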
- After successfully activating the LLM server, add the address where it is active to the .env file located at the project root:
URL = http://localhost:8080
- Within the current directory, open a new bash terminal.
- Activate the Python virtual environment:
source env/bin/activate
- Execute read.py:
python3 read.py
- After execution, you will see a new text file named live_logs.txt being created, with syslogs being appended to it.
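read.py itself is not reproduced here; as a rough sketch of what such a collector can look like, the snippet below follows journalctl at priority 4 (warning) and more severe, appending each line to live_logs.txt. The helper names are ours, and the details may differ from the actual script:

```python
import subprocess

LIVE_LOG_FILE = "live_logs.txt"

def append_log(path: str, line: str) -> None:
    """Append a single log line to the live log file."""
    with open(path, "a") as sink:
        sink.write(line.rstrip("\n") + "\n")

def stream_logs() -> None:
    """Follow the journal for warning-and-worse entries, feeding the live log file."""
    proc = subprocess.Popen(
        ["journalctl", "-f", "-p", "4", "--no-pager"],  # -p 4 = priority 4 and more severe
        stdout=subprocess.PIPE,
        text=True,
    )
    for line in proc.stdout:
        append_log(LIVE_LOG_FILE, line)

if __name__ == "__main__":
    stream_logs()
```

Opening and closing the file per line keeps live_logs.txt flushed, so the analyzer always sees complete entries.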
Step 3: Fetch syslogs from the dynamic text file, analyze them using the fine-tuned LLM, and display them in the Streamlit interface along with their cause and remediation.
- Within the current directory, open a new bash terminal.
- Activate the Python virtual environment:
source env/bin/activate
- Execute main.py:
streamlit run main.py
- After execution, you will see a Streamlit application running on localhost.
Use journalctl, a command-line tool for querying and displaying logs from the systemd journal, to filter logs based on a specific threat level. In this case, you're looking for logs with threat levels 4 and below. This helps in narrowing down the logs to a particular set of interest based on severity.
Once you have filtered the logs, redirect or save them into a "live logs" file. This file serves as a real-time repository of logs that can be monitored, analyzed, and processed further.
After the live logs file is populated, extract the first log entry. This log entry will be the starting point for further analysis.
Pass the fetched log to a fine-tuned machine learning model (Llama-3.2-3B in this case) for analysis. The model is expected to process the log and generate a response that contains key insights, such as the heading, log content, possible cause, and recommended remediation.
After receiving the model's response, use predefined regular expressions to parse the response. The goal here is to extract specific details such as:
- Heading: A brief title or summary of the log's content.
- Log: The main content or description of the log entry.
- Cause: The potential reason behind the issue or event recorded in the log.
- Remediation: Suggested actions to resolve or mitigate the problem identified in the log.
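The exact patterns live in the project's source; the sketch below assumes the model labels its sections with Heading:, Log:, Cause:, and Remediation: prefixes, which is our assumption about the response format:

```python
import re

# Assumed section labels; the real model output format may differ.
FIELDS = ("Heading", "Log", "Cause", "Remediation")

def parse_response(text: str) -> dict:
    """Extract the four labeled sections from a model response into a dict."""
    result = {}
    for field in FIELDS:
        # Capture everything after "Field:" up to the next label or end of text.
        pattern = rf"{field}:\s*(.*?)(?=\n(?:{'|'.join(FIELDS)}):|\Z)"
        match = re.search(pattern, text, re.DOTALL)
        result[field.lower()] = match.group(1).strip() if match else ""
    return result
```

Missing sections come back as empty strings rather than raising, so a malformed model reply does not crash the pipeline.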
After parsing the response, append the extracted details (heading, log, cause, and remediation) to the live logs list. This allows you to continuously build and maintain a collection of structured log information for further analysis.
Once the first log entry has been processed and its details have been extracted and stored, delete it from the live logs file. This ensures that only unprocessed logs remain in the file for future querying and analysis.
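That remove-after-processing step can be sketched as a small helper; the function name pop_first_log is ours:

```python
from typing import Optional

def pop_first_log(path: str) -> Optional[str]:
    """Return the first line of the file and rewrite the file without it.

    Returns None when the file is empty, so the caller can poll again later.
    """
    with open(path, "r") as f:
        lines = f.readlines()
    if not lines:
        return None
    with open(path, "w") as f:
        f.writelines(lines[1:])
    return lines[0].rstrip("\n")
```

Because the collector may be appending concurrently, a production version would need file locking; this sketch ignores that race.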
Use Streamlit, a Python library for creating interactive web applications, to visualize and display the records. The live logs list, which now contains structured and parsed information, can be shown in a user-friendly interface, allowing users to easily view and interact with the log records, including their heading, content, cause, and remediation.
- Automated Syslog Monitoring
  - Real-time tracking of Linux sys-logs, focusing on high-severity threats (levels 4 and below).
  - Continuous analysis without requiring manual intervention.
- AI-Driven Analysis
  - Utilizes a fine-tuned Large Language Model (LLM) for log analysis.
  - Identifies vulnerabilities, determines root causes, and suggests actionable remediation steps.
- Proactive Threat Management
  - Provides early detection of potential threats to prevent escalation.
  - Generates actionable insights to prioritize responses effectively.
- Efficient Log Management
  - Reduces operational overhead with automated log processing.
  - Enhances security and operational efficiency by minimizing manual efforts.
- Low Infrastructure Requirements
  - Optimized to run efficiently on resource-constrained hardware, such as devices with minimal RAM, storage, and processing power.
  - Eliminates the need for expensive components or high-end systems, making it accessible for organizations with limited budgets.
- User-Friendly Interface
  - Presents processed log data and insights through an intuitive dashboard for easy decision-making.
- Scalability and Adaptability
  - Capable of handling logs from multiple Linux systems.
  - Designed to adapt to various enterprise environments and applications.
- Optimization Features
  - Fine-tuned using the Llama 3.2 3B Instruct base model with Low-Rank Adaptation (LoRA) for resource efficiency.
  - Quantization techniques enable deployment on devices with constrained resources.
- Expected Outcomes
  - Enhanced security posture and operational efficiency.
  - Automated log management with proactive issue resolution.
  - Data-driven insights for informed decision-making.
demo.webm
If you have any feedback, please reach out to us at [email protected]. You are also welcome to add new features by creating pull requests.