k-arthik-r/ALSATE

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

40 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation


                                               

ALSATE, an acronym for "Analysis of Linux System Logs for Active Threat detection and Evaluation", is an automated log monitoring system that continuously analyzes Linux sys-logs for threat levels 4 (Warning) and below, leveraging a fine-tuned Large Language Model (LLM) to detect issues, provide explanations, and recommend actionable remediation measures to enhance system security and efficiency.

ALSATE analyzes the following five syslog severity levels (lower numbers are more severe):

  • L0 - EMERGENCY
  • L1 - ALERT
  • L2 - CRITICAL
  • L3 - ERROR
  • L4 - WARNING

Logs at levels 5 and above (Notice, Info, and Debug) are not analyzed, as they typically indicate no significant threat.
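As a minimal sketch of this cutoff (not code from the ALSATE repository), the snippet below maps the standard syslog severity numbers from RFC 5424 to their names and keeps only levels 0 to 4. `SEVERITY_NAMES`, `MAX_ANALYZED_LEVEL`, and `should_analyze` are hypothetical names.

```python
# Standard syslog severity levels (RFC 5424): a lower number means more severe.
SEVERITY_NAMES = {
    0: "EMERGENCY",
    1: "ALERT",
    2: "CRITICAL",
    3: "ERROR",
    4: "WARNING",
    5: "NOTICE",
    6: "INFO",
    7: "DEBUG",
}

# Hypothetical cutoff mirroring this README: analyze levels 0-4 only.
MAX_ANALYZED_LEVEL = 4


def should_analyze(priority: int) -> bool:
    """Return True if a log with this syslog priority should be sent for analysis."""
    if priority not in SEVERITY_NAMES:
        raise ValueError(f"invalid syslog priority: {priority}")
    return priority <= MAX_ANALYZED_LEVEL
```

For example, `should_analyze(4)` (Warning) is True, while `should_analyze(6)` (Info) is False.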


Requirements

Web Level Requirements

  • Hugging Face account


System Level Requirements (Only for Windows and macOS)

  • Oracle VirtualBox 7.1.4 (Recommended) (Any other virtualization software with a recent version can also be used)


Virtual Machine Level Requirements

  • Ubuntu 24.04.1 (Recommended) (Any other systemd-based Linux distribution or a newer version of Ubuntu can also be used)


Linux Distribution Level Requirements

  • Python 3.12.3 (Recommended)


  • pip 24.0 (Recommended)

    To install pip, open a bash terminal and run,

    sudo apt-get install python3-pip

  • cURL 8.11.0 (Recommended)

    To install cURL, open a bash terminal and run,

    sudo apt-get install curl

  • git 2.43.0 (Recommended)

    To install git, open a bash terminal and run,

    sudo apt install git-all

  • Homebrew 4.4.8 (Recommended)

    To install Homebrew, open a bash terminal and run the following,

    /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
    test -d ~/.linuxbrew && eval "$(~/.linuxbrew/bin/brew shellenv)"
    test -d /home/linuxbrew/.linuxbrew && eval "$(/home/linuxbrew/.linuxbrew/bin/brew shellenv)"
    echo "eval \"\$($(brew --prefix)/bin/brew shellenv)\"" >> ~/.bashrc

Modules/Libraries Used

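The third-party libraries (requests and streamlit, which provides streamlit.components.v1) can be installed using requirements.txt; re, time, json, and subprocess are part of the Python standard library.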

  • re
  • time
  • json
  • requests
  • streamlit
  • subprocess
  • streamlit.components.v1

Setup

Step 1: Restrict the System to Store Only Level 4 and Below Logs

Since ALSATE focuses exclusively on sys-logs of severity level 4 and below (logs with higher level numbers, such as Notice or Info, are less severe and typically do not pose significant threats), the system only needs to keep these entries and can ignore less severe ones. This can be achieved by modifying the systemd journal (journald) configuration.

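  • To set the system to store only level 4 and below logs, open the following file,

    /etc/systemd/journald.conf
  • Open journald.conf in editable mode with root privileges (for example, sudo nano /etc/systemd/journald.conf) and scroll down to the [Journal] section.

  • In the configuration list you will find two fields:

    • MaxLevelStore and
    • MaxLevelSyslog
  • Set both of them to warning, and remove the leading # from each line, since commented-out lines have no effect.

    MaxLevelStore=warning
    MaxLevelSyslog=warning
  • Save the file and restart the operating system (alternatively, sudo systemctl restart systemd-journald applies the change without a reboot).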
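The change can also be checked programmatically. The sketch below is a hypothetical helper, not part of ALSATE; it assumes journald.conf parses as an INI file with a [Journal] section, which Python's configparser handles while treating lines starting with # as comments. It only accepts the literal value warning, although journald also accepts numeric levels.

```python
import configparser

# Settings this README asks for in /etc/systemd/journald.conf.
REQUIRED = {"MaxLevelStore": "warning", "MaxLevelSyslog": "warning"}


def check_journald_levels(text: str) -> dict:
    """Return {key: actual_value_or_None} for every setting not set as required."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case as written (systemd keys are case-sensitive)
    parser.read_string(text)
    journal = parser["Journal"] if parser.has_section("Journal") else {}
    problems = {}
    for key, expected in REQUIRED.items():
        # Commented-out lines such as "#MaxLevelStore=warning" are skipped by configparser,
        # so they show up here as None.
        actual = journal.get(key)
        if actual is None or actual.strip().lower() != expected:
            problems[key] = actual
    return problems
```

Pass it the contents of /etc/systemd/journald.conf: an empty dictionary means both settings are active, while None for a key means it is missing or still commented out.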

Step 2: Download the Quantized Model

  • Fine-tuned and quantized model: k-arthik-r/llama-3.2-3b-sys-log-analysis-alsate-Q4_K_M-GGUF
  • To download the fine-tuned and quantized model, contact the developers at [email protected].
  • To know more about the quantized model, click here.

Step 3: Set Up the Repository

  • Inside your Linux distribution, choose an easily accessible location for the project (such as the Desktop).

  • Open a bash terminal in the selected location and clone the repository using,

    git clone https://github.com/k-arthik-r/ALSATE.git
  • A folder named "ALSATE" will be created.

  • Move the downloaded model (from Step 2) into the ALSATE folder.

  • Navigate to the ALSATE folder,

    cd ALSATE

Step 4: Create a Python Virtual Environment and Install the Requirements

  • Inside the ALSATE folder, open a bash terminal.

  • Install the venv module (on Ubuntu, python3 -m venv needs the python3-venv package),

    sudo apt-get install python3-venv
  • Create a Python virtual environment named env,

    python3 -m venv env
  • Activate the virtual environment,

    source env/bin/activate
  • Install the requirements from requirements.txt,

    pip3 install -r requirements.txt

How to Run?

Running the project involves three steps:

Step 1: Start the LLM Server

Starting the LLM server requires a package called llama.cpp.

About llama.cpp

llama.cpp makes it simple to download and run a GGUF model: you only need to provide the Hugging Face repository path and the model file name, and it automatically downloads the model checkpoint and saves it in a cache. The cache location is controlled by the LLAMA_CACHE environment variable; read more about it here.

You can install llama.cpp through brew (works on Mac and Linux), or you can build it from source. There are also pre-built binaries and Docker images that you can check in the official documentation.

Follow the steps below:

  • In the ALSATE directory, open a new bash terminal.

  • Install llama.cpp using brew,

    brew install llama.cpp
  • Start the LLM server using,

    llama-server -m llama-3.2-3b-sys-log-analysis-alsate-q4_k_m.gguf

    Note: After running the command, you may need to wait 10 to 15 minutes for the model to fully load and become operational.

  • You can verify that the model is active by opening the address it is served on, usually port 8080 or 8081. Open http://127.0.0.1:8080 or http://127.0.0.1:8081 in your browser; you should see a chat interface similar to the screenshot shown below.

    (Screenshot: llama.cpp server chat interface)

  • Alternatively, verify it by sending a request with cURL,

    curl http://localhost:8080/v1/chat/completions \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer no-key" \
      -d '{
        "messages": [
          {
            "role": "system",
            "content": "Your name is ALSATE,you are an advanced syslog parsing and analysis tool. Your task is to analyze provided system logs, identify potential causes of their generation, and detect any security threats or anomalies. If threats are found, suggest precise remediation steps. Respond only when the input is a valid system log; otherwise, reply with: Input does not appear to be a valid system log. Unable to assist."
          },
          {
            "role": "user",
            "content": "Nov 05 22:28:18 ubuntu kernel: workqueue: blk_mq_requeue_work hogged CPU for >10000us 16 times, consider switching to WQ_UNBOUND"
          }
        ]
      }'

    Note: If you use the cURL method, you may have to wait 2 to 3 minutes for a response, depending on the current load on your system. The response will look similar to the screenshot shown below.

    (Screenshot: sample ALSATE response to the cURL request)

    Note: The commands above assume the LLM server is running at http://localhost:8080. If it is running on a different address or port, update the URL accordingly.

  • After the LLM server is running, add the address it is served on to the .env file located in the repository root,

    URL = http://localhost:8080
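How main.py reads .env is not documented here, so the following is only a sketch assuming simple KEY = value lines like the one above. `parse_env` and `chat_endpoint` are hypothetical helpers; the /v1/chat/completions path is the OpenAI-compatible endpoint used in the cURL example.

```python
def parse_env(text: str) -> dict:
    """Parse simple KEY = value lines; blank lines and # comments are skipped."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        key, value = line.split("=", 1)
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def chat_endpoint(base_url: str) -> str:
    """Build the OpenAI-compatible chat completions URL served by llama-server."""
    return base_url.rstrip("/") + "/v1/chat/completions"
```

With the .env line above, `chat_endpoint(parse_env(text)["URL"])` yields the same address used in the cURL request.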
    

Step 2: Read Sys-Logs and Save Them in a Live Text File

  • In the ALSATE directory, open a new bash terminal.

  • Activate the Python virtual environment,

    source env/bin/activate
  • Run read.py,

    python3 read.py
  • Once it is running, a new text file named live_logs.txt is created and sys-logs are appended to it as they are generated.

Step 3: Fetch Sys-Logs from the Live Text File, Analyze Them with the Fine-Tuned LLM, and Display Them in Streamlit Along with Their Cause and Remediation

  • In the ALSATE directory, open a new bash terminal.

  • Activate the Python virtual environment,

    source env/bin/activate
  • Run main.py with Streamlit,

    streamlit run main.py
  • A Streamlit application will then be served on localhost (by default at http://localhost:8501).


Working

1. Extract Logs with Specific Threat Level:

ALSATE uses journalctl, a command-line tool for querying and displaying logs from the systemd journal, to filter logs by severity, keeping only entries at threat level 4 (Warning) and below. This narrows the logs down to the entries most likely to indicate a problem.
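A minimal sketch of this step, assuming it is done by running journalctl as a subprocess (the README lists subprocess among the modules used, but the exact command in read.py is not shown). `journalctl -p 4` shows Warning and every more severe level, and `-f` keeps following new entries. The function names are hypothetical. Reading the full system journal typically requires root or membership in the systemd-journal group.

```python
import subprocess
from typing import Iterator, List


def build_journalctl_cmd(max_priority: int = 4, follow: bool = True) -> List[str]:
    """Build a journalctl command that shows priorities 0..max_priority only."""
    if not 0 <= max_priority <= 7:
        raise ValueError("syslog priority must be between 0 and 7")
    # "-p N" shows level N plus every more severe level (lower numbers).
    cmd = ["journalctl", "-p", str(max_priority), "--no-pager"]
    if follow:
        cmd.append("-f")  # keep streaming new entries as they are written
    return cmd


def stream_journal(max_priority: int = 4) -> Iterator[str]:
    """Yield journal lines as they arrive; needs a systemd-based system."""
    proc = subprocess.Popen(
        build_journalctl_cmd(max_priority),
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        for line in proc.stdout:
            yield line.rstrip("\n")
    finally:
        # Stop the long-running "journalctl -f" once the consumer is done.
        proc.terminate()
        proc.wait()
```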

2. Dump Extracted Logs into Live Logs File:

The filtered logs are then written to a "live logs" file (live_logs.txt). This file serves as a real-time buffer of logs waiting to be analyzed and processed further.
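Under the same assumptions, a hypothetical `dump_logs` helper could append each filtered line to live_logs.txt; in practice `lines` would be the journalctl output from step 1. This is a sketch, not the code in read.py.

```python
from typing import Iterable

LIVE_LOGS_PATH = "live_logs.txt"  # file name mentioned in this README


def dump_logs(lines: Iterable[str], path: str = LIVE_LOGS_PATH) -> int:
    """Append each log line to the live logs file and return how many were written."""
    written = 0
    with open(path, "a", encoding="utf-8") as fh:
        for line in lines:
            fh.write(line.rstrip("\n") + "\n")
            fh.flush()  # make each entry visible to the analyzer right away
            written += 1
    return written
```

Opening the file in append mode means every write goes to the current end of the file, even if the analyzer shortens the file in the meantime.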

3. Fetch the First Log from the Live Logs File:

After the live logs file is populated, extract the first log entry. This log entry will be the starting point for further analysis.
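A hypothetical `read_first_log` helper for this step might look like the following sketch; it skips blank lines and returns None when the file does not exist yet or contains no entries.

```python
from typing import Optional


def read_first_log(path: str = "live_logs.txt") -> Optional[str]:
    """Return the first non-empty line of the live logs file, or None if there is none."""
    try:
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                if line.strip():
                    return line.rstrip("\n")
    except FileNotFoundError:
        return None  # the reader has not created the file yet
    return None
```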

4. Query the Log with Llama-3.2-3B Fine-Tuned Model:

Pass the fetched log to a fine-tuned machine learning model (Llama-3.2-3B in this case) for analysis. The model is expected to process the log and generate a response that contains key insights, such as the heading, log content, possible cause, and recommended remediation.
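The README lists requests among the project's modules; to stay dependency-free, the sketch below sends the same request as the cURL example using the standard library's urllib instead (with requests, `requests.post(endpoint, json=payload)` is the equivalent call). The endpoint default, the 300-second timeout, and the function names are assumptions; the system prompt is copied from the cURL example.

```python
import json
import urllib.request

# System prompt copied from the cURL example in "How to Run?".
SYSTEM_PROMPT = (
    "Your name is ALSATE,you are an advanced syslog parsing and analysis tool. "
    "Your task is to analyze provided system logs, identify potential causes of "
    "their generation, and detect any security threats or anomalies. If threats "
    "are found, suggest precise remediation steps. Respond only when the input is "
    "a valid system log; otherwise, reply with: Input does not appear to be a "
    "valid system log. Unable to assist."
)


def build_payload(log_line: str) -> dict:
    """Build an OpenAI-style chat request containing one sys-log line."""
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": log_line},
        ]
    }


def query_llm(
    log_line: str,
    endpoint: str = "http://localhost:8080/v1/chat/completions",
    timeout: float = 300.0,  # responses can take minutes on slow machines
) -> str:
    """Send one log line to llama-server and return the model's reply text."""
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(build_payload(log_line)).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": "Bearer no-key"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        reply = json.load(response)
    # OpenAI-compatible responses carry the text in choices[0].message.content.
    return reply["choices"][0]["message"]["content"]
```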

5. Receive and Parse the Response:

After receiving the model's response, use predefined regular expressions to parse the response. The goal here is to extract specific details such as:

  • Heading: A brief title or summary of the log's content.
  • Log: The main content or description of the log entry.
  • Cause: The potential reason behind the issue or event recorded in the log.
  • Remediation: Suggested actions to resolve or mitigate the problem identified in the log.
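The exact regular expressions ALSATE uses, and the exact layout of the model's reply, are not given here. The sketch below assumes, hypothetically, that the reply labels each field at the start of its own line as `Heading:`, `Log:`, `Cause:` and `Remediation:`; the pattern would need adapting to the real output format.

```python
import re
from typing import Dict

FIELDS = ("Heading", "Log", "Cause", "Remediation")

# Hypothetical format: "<Field>: text", where text runs until the next field label
# at the start of a line, or until the end of the reply.
_FIELD_RE = re.compile(
    r"^(Heading|Log|Cause|Remediation):\s*(.*?)(?=^(?:Heading|Log|Cause|Remediation):|\Z)",
    re.MULTILINE | re.DOTALL,
)


def parse_response(text: str) -> Dict[str, str]:
    """Extract the four fields; any field not found maps to an empty string."""
    record = {name.lower(): "" for name in FIELDS}
    for match in _FIELD_RE.finditer(text):
        record[match.group(1).lower()] = match.group(2).strip()
    return record
```

Each dictionary returned by `parse_response` can then be appended to the live logs list described in step 6.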

6. Append the Parsed Details to the Live Logs List:

After parsing the response, append the extracted details (heading, log, cause, and remediation) to the live logs list. This allows you to continuously build and maintain a collection of structured log information for further analysis.

7. Delete (Update) the Fetched Log from the Live Logs File:

Once the first log entry has been processed and its details have been extracted and stored, delete it from the live logs file. This ensures that only unprocessed logs remain in the file for future querying and analysis.
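A hypothetical `drop_first_log` helper for this step is sketched below. It removes any leading blank lines plus the first log line by rewriting the file in place (rather than replacing it), so a reader that keeps the file open in append mode continues writing to the same file. Lines appended between the read and the truncate can still be lost; a production version would need file locking or an offset-based reader.

```python
def drop_first_log(path: str = "live_logs.txt") -> bool:
    """Remove the first non-empty line of the file; return True if one was removed."""
    with open(path, "r+", encoding="utf-8") as fh:
        lines = fh.readlines()
        idx = 0
        while idx < len(lines) and not lines[idx].strip():
            idx += 1  # skip leading blank lines
        if idx == len(lines):
            return False  # nothing left to delete
        fh.seek(0)
        fh.writelines(lines[idx + 1:])
        fh.truncate()  # cut off leftover bytes from the old, longer content
    return True
```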

8. Represent the Records in Streamlit Interface:

Use Streamlit, a Python library for creating interactive web applications, to visualize and display the records. The live logs list, which now contains structured and parsed information, can be shown in a user-friendly interface, allowing users to easily view and interact with the log records, including their heading, content, cause, and remediation.

(Diagram: ALSATE working pipeline)


Key Features

  • Automated Syslog Monitoring

    • Real-time tracking of Linux sys-logs, focusing on high-severity threats (levels 4 and below).
    • Continuous analysis without requiring manual intervention.
  • AI-Driven Analysis

    • Utilizes a fine-tuned Large Language Model (LLM) for log analysis.
    • Identifies vulnerabilities, determines root causes, and suggests actionable remediation steps.
  • Proactive Threat Management

    • Provides early detection of potential threats to prevent escalation.
    • Generates actionable insights to prioritize responses effectively.
  • Efficient Log Management

    • Reduces operational overhead with automated log processing.
    • Enhances security and operational efficiency by minimizing manual efforts.
  • Low Infrastructure Requirements

    • Optimized to run efficiently on resource-constrained hardware, such as devices with minimal RAM, storage, and processing power.
    • Eliminates the need for expensive components or high-end systems, making it accessible for organizations with limited budgets.
  • User-Friendly Interface

    • Presents processed log data and insights through an intuitive dashboard for easy decision-making.
  • Scalability and Adaptability

    • Capable of handling logs from multiple Linux systems.
    • Designed to adapt to various enterprise environments and applications.
  • Optimization Features

    • Fine-tuned using Llama 3.2 3B Instruct Base Model with Low Rank Adaptation (LoRA) for resource efficiency.
    • Quantization techniques enable deployment on devices with constrained resources.
  • Expected Outcomes

    • Enhanced security posture and operational efficiency.
    • Automated log management with proactive issue resolution.
    • Data-driven insights for informed decision-making.

Recording

demo.webm

License

  • For the code and streamlit application,

    Licence

  • For the Model and Dataset,

    Licence


Feedback

If you have any feedback, please reach out to us at [email protected]. You are also welcome to contribute new features by opening pull requests.
