Add windowsevent stage loki process #2545

Open
wants to merge 9 commits into
base: main

Conversation

wildum
Contributor

@wildum wildum commented Jan 27, 2025

PR Description

The existing eventlogmessage stage has a few parsing flaws that cannot be addressed without breaking changes (see the issue linked).

This is why I decided to create a new stage "windowsevent" which covers the same functionality and has the same arguments as the existing eventlogmessage, except that it parses the message differently.

New parsing logic:

  • The windowsevent stage expects the message to be structured in sections that are split by empty lines.

  • The first section of the input is treated as a whole block and stored in the extracted map with the key Description.

  • Sections following the Description are expected to contain key-value pairs in the format key: value.

  • If the first line of a section has no value (e.g., "Subject:"), the key will act as a prefix for subsequent keys in the same section.

  • If a line within a section does not include the : symbol, it is considered part of the previous entry's value. The line is appended to the previous value, separated by a comma.

  • Lines in a section without a preceding valid entry (key-value pair) are ignored and discarded.
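As a rough illustration of the rules above, here is a minimal Go sketch. It is not the code from this PR: `parseWindowsEvent` is a hypothetical helper, joining the prefix and key with `_` and joining continuation lines with a bare `,` are assumptions, and key sanitization and the overwrite option are left out.

```go
package main

import (
	"fmt"
	"strings"
)

// parseWindowsEvent is a hypothetical helper that follows the rules listed
// above. It is a sketch, not the implementation from this PR.
func parseWindowsEvent(msg string) map[string]string {
	out := map[string]string{}
	// Windows messages usually use CRLF; normalize so sections split on "\n\n".
	msg = strings.ReplaceAll(msg, "\r\n", "\n")
	first := true
	for _, sec := range strings.Split(msg, "\n\n") {
		sec = strings.TrimSpace(sec)
		if sec == "" {
			continue
		}
		if first {
			// The first section is stored as one block.
			out["Description"] = sec
			first = false
			continue
		}
		prefix, lastKey := "", ""
		for i, line := range strings.Split(sec, "\n") {
			line = strings.TrimSpace(line)
			if line == "" {
				continue
			}
			key, value, found := strings.Cut(line, ":")
			if !found {
				// No ":" in the line: append to the previous value, or drop
				// the line if the section has no valid entry yet.
				if lastKey != "" {
					out[lastKey] += "," + line // separator is an assumption
				}
				continue
			}
			key, value = strings.TrimSpace(key), strings.TrimSpace(value)
			if i == 0 && value == "" {
				// e.g. "Subject:" becomes a prefix for the rest of the section.
				prefix = key + "_" // joiner is an assumption
				continue
			}
			lastKey = prefix + key
			out[lastKey] = value
		}
	}
	return out
}

func main() {
	// Made-up sample message, not taken from a real event.
	msg := "An account was logged on.\n\nSubject:\n\tLoginID:\t0x3e7\n\n" +
		"Privileges:\tSeTcbPrivilege\n\t\t\tSeDebugPrivilege"
	got := parseWindowsEvent(msg)
	fmt.Println(got["Description"])
	fmt.Println(got["Subject_LoginID"])
	fmt.Println(got["Privileges"])
}
```

The value is cut at the first `:` only, so values that themselves contain colons (for example, times) stay intact in this sketch.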

I scrolled through Windows events on my personal computer to get some examples. You can check the example in the doc and in the tests to see the results.

Which issue(s) this PR fixes

Fixes #2337

Notes to the Reviewer

PR Checklist

  • CHANGELOG.md updated
  • Documentation added
  • Tests updated
  • [NA] Config converters updated

@wildum wildum requested review from clayton-cornell and a team as code owners January 27, 2025 15:21
Contributor

github-actions bot commented Jan 27, 2025

@wildum wildum force-pushed the add-windowsevent-stage-loki-process branch from 7c88f59 to 24b6f7b Compare January 27, 2025 16:16
continue
}

ek := parts[0]
Collaborator

Does it make sense to sanitize here? Had to follow the logic to make sure the key prefix also ultimately gets sanitized.

Contributor Author

I think it's better not to sanitize the prefix by itself, because the sanitization process checks for duplicates and, in case of a collision, would append an unnecessary "_extracted" suffix to the prefix.

For example:

  • the key "Subject" already exists in the map
  • the prefix "Subject" is parsed.
  • if we sanitize the prefix by itself and we have overwrite turned off, it will become "Subject_extracted"
  • the next keys in the section will then look like "Subject_extracted_LoginID", which is not so nice: it could simply have been "Subject_LoginID", since that full key does not collide with anything

If you're OK with this logic, I can add a comment to document it more clearly.
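To make the difference concrete, here is a small Go sketch of the two orderings. `resolveKey` is a hypothetical stand-in for the collision handling only (the real sanitization also does other things, which are omitted here), and `prefixFirst` / `fullKeyOnly` are names invented for this example.

```go
package main

import "fmt"

// resolveKey is a hypothetical stand-in for the duplicate check: if the key
// already exists and overwrite is off, "_extracted" is appended.
func resolveKey(extracted map[string]string, key string, overwrite bool) string {
	if _, exists := extracted[key]; exists && !overwrite {
		return key + "_extracted"
	}
	return key
}

// prefixFirst resolves the prefix on its own, then builds the full key.
func prefixFirst(extracted map[string]string, prefix, key string, overwrite bool) string {
	return resolveKey(extracted, prefix, overwrite) + "_" + key
}

// fullKeyOnly builds the full key first and resolves only that.
func fullKeyOnly(extracted map[string]string, prefix, key string, overwrite bool) string {
	return resolveKey(extracted, prefix+"_"+key, overwrite)
}

func main() {
	// "Subject" already exists in the extracted map; overwrite is off.
	extracted := map[string]string{"Subject": "already set"}
	fmt.Println(prefixFirst(extracted, "Subject", "LoginID", false)) // Subject_extracted_LoginID
	fmt.Println(fullKeyOnly(extracted, "Subject", "LoginID", false)) // Subject_LoginID
}
```

With the second ordering, a suffix would still be added if the full key itself (for example "Subject_LoginID") already existed in the map.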

Collaborator

@mattdurham mattdurham left a comment

Few comments but overall looks solid.

Collaborator

@mattdurham mattdurham left a comment

lgtm

@Nachtfalkeaw

Hello,

This is my Windows event log processing pipeline:

  • I separate the event log channels with "service_name" because this fits the "Explore Logs" app.
  • I try to map the existing "level" and "levelText" values onto a single "level" value. Sometimes logs have these fields, sometimes not; sometimes they are in English, sometimes in German. In the end, all logs should have the "level" field.

Not sure if this helps or not.

loki.source.windowsevent "application"  {
    eventlog_name = "Application"
    use_incoming_timestamp = true

    labels = {
      "service_name"  = "windows_eventlog",
      "channel"       = "Application",
    }

    forward_to = [loki.relabel.windows_event_level.receiver]
}


//
//=======================================================================================
//


loki.source.windowsevent "security"  {
    eventlog_name = "Security"
    use_incoming_timestamp = true

    labels = {
      "service_name"  = "windows_eventlog",
      "channel"       = "Security",
    }

    forward_to = [loki.relabel.windows_event_level.receiver]
}


//
//=======================================================================================
//


loki.source.windowsevent "setup"  {
    eventlog_name = "Setup"
    use_incoming_timestamp = true

    labels = {
      "service_name"  = "windows_eventlog",
      "channel"       = "Setup",
    }

    forward_to = [loki.relabel.windows_event_level.receiver]
}

//
//=======================================================================================
//


loki.source.windowsevent "system"  {
    eventlog_name = "System"
    use_incoming_timestamp = true

    labels = {
      "service_name"  = "windows_eventlog",
      "channel"       = "System",
    }

    forward_to = [loki.relabel.windows_event_level.receiver]
}


//
//=======================================================================================
//


loki.relabel "windows_event_level" {
// if the "level" label is empty or does not exist, create it with the value "tmp_level"
  rule {
    action        = "replace"
    source_labels = ["level"]
    regex         = "^$"
    replacement   = "tmp_level"
    target_label  = "level"
  }

  forward_to = [loki.process.windows_eventlog.receiver]

}


//
//=======================================================================================
//


loki.process "windows_eventlog" {

  stage.json {
      expressions = {
        source            = "",
        channel           = "",
        computer          = "",
        event_id          = "",
        levelText         = "",
        level             = "",
        opCodeText        = "",
        keywords          = "",
        timeCreated       = "",
        eventRecordID     = "",
        event_data        = "",
        user_data         = "",
        message           = "",
        task              = "",
        taskText          = "",
        version           = "",
        opCode            = "",
        execution         = "",
        processId         = "execution.\"processId\"",
        threadId          = "execution.\"threadId\"",
        processName       = "execution.\"processName\"",
        security          = "",
        userId            = "security.\"userId\"",
        userName          = "security.\"userName\"",

      }
  }



// Sometimes Windows level values are numbers; we convert them to strings.
// Sometimes messages do not have "level" at all. For those, loki.relabel.windows_event_level has already created a "level" label with the value "tmp_level".
// If "tmp_level" is set, we use the level information from "levelText". In my case it is German, and I translate it to the English names that Loki knows.
  stage.template {
      source   = "level"
      template = `{{- $level := .Value -}}
                  {{- if eq $level "0" -}}debug
                  {{- else if eq $level "1" -}}critical
                  {{- else if eq $level "2" -}}error
                  {{- else if eq $level "3" -}}warn
                  {{- else if eq $level "4" -}}info
                  {{- else if eq $level "5" -}}trace
                  {{- else if eq $level "tmp_level" -}}{{- .levelText -}}
                  {{- else if eq .levelText "Information" -}}info
                  {{- else if eq .levelText "Informationen" -}}info
                  {{- else if eq .levelText "Warning" -}}warn
                  {{- else if eq .levelText "Warnung" -}}warn
                  {{- else if eq .levelText "Fehler" -}}error
                  {{- else if eq .levelText "Kritisch" -}}critical
                  {{- else if eq .levelText nil -}}unknown
                  {{- else -}}{{- .levelText -}}{{- end -}}`
  }

  stage.labels {
    values = {
      level       = "",
      channel     = "",
    }
  }



            // everything we do not need as a label goes into structured_metadata
  stage.structured_metadata {
    values = {
        source            = "",
        //channel         = "",
        computer          = "",
        event_id          = "",
        //level           = "",
        levelText         = "",
        opCodeText        = "",
        keywords          = "",
        timeCreated       = "",
        eventRecordID     = "",
        event_data        = "",
        user_data         = "",
        // message        = "",
        task              = "",
        taskText          = "",
        // execution      = "",
        // security       = "",
        processId         = "",
        threadId          = "",
        processName       = "",
        userId            = "",
        userName          = "",
        version           = "",
        opCode            = "",
    }
  }



            // drop all Alloy messages from the event log because they are too noisy (parsing error messages and so on)
  stage.drop {
      source = "source"
      value  = "Alloy"
      drop_counter_reason = "windows_eventlog_alloy"
  }


// to parse the original "message" field
  stage.eventlogmessage {
      source = "message"
      overwrite_existing = true
  }


            // only message field as output. rest is in structured_metadata
  stage.output {
      source = "message"
  }


// to parse the timestamp correctly.
  stage.timestamp {
      source      = "timeCreated"
      format      = "2006-01-02T15:04:05.0000000Z"
//      location    = "Europe/Berlin"                 // DO NOT SET if there is any time zone in the timestamp itself or it will not process any logs at all anymore.
  }


forward_to = [loki.relabel.hostname.receiver]

}


loki.relabel "hostname" {

            // use the hostname as "instance" because "instance" is used in prometheus metrics and so hostnames have equal labels
  rule {
    action        = "replace"
    replacement   = constants.hostname
    target_label  = "instance"
  }

            // all hostnames to lowercase
  rule {
    action        = "lowercase"
    source_labels = ["instance"]
    target_label  = "instance"
  }

            // only hostname, no domainname
  rule {
    action        = "replace"
    source_labels = ["instance"]
    regex         = "^([^.]+)\\..*$"
    replacement   = "$1"
    target_label  = "instance"
  }

            // label to identify if this is a windows client (enduser) or a windows server (datacenter)
  rule {
    action        = "replace"
    replacement   = "server"
    target_label  = "system_type"
  }

            // if previous stages (whether Windows event log or other logs) did not set a "level", we set it to "unknown", which matches the Loki Explore naming scheme
  rule {
    action        = "replace"
    source_labels = ["level"]
    regex         = "^$"
    replacement   = "unknown"
    target_label  = "level"
  }

            // we add a service_name to match Explore Logs app
  rule {
    action        = "replace"
    source_labels = ["service_name"]
    regex         = "^$"
    replacement   = "unknown"
    target_label  = "service_name"
  }


  forward_to = [loki.write.loki.receiver]
}

docs/sources/reference/components/loki/loki.process.md

The first section of the input is treated as a whole block and stored in the extracted map with the key `Description`.

Sections following the Description are expected to contain key-value pairs in the format key:value.
Contributor

What happens if they don't have key:value pairs? The "are expected" suggests that it may not happen this way.

Is this a "must"?

Contributor Author

Maybe we should rephrase, because it's not a must: lines that don't have a key:value pair are ignored, unless there was already a key:value pair earlier in the section, in which case the line is treated as part of a multi-line value.

Contributor

@clayton-cornell clayton-cornell Jan 30, 2025

Maybe just adding the bit about what happens if a key:value pair isn't found will help clarify what happens.

Contributor Author

It's documented a bit below with "Lines in a section without a preceding valid entry (key-value pair) are ignored and discarded."

Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve eventlogmessage stage in loki.process
4 participants