feat(host_metrics source): Implement process collection for host metrics #21791

Merged
merged 12 commits into from
Nov 20, 2024

Conversation

LeeTeng2001
Contributor

@LeeTeng2001 LeeTeng2001 commented Nov 14, 2024

Summary

Implement process metric collection for the host metrics source.

Change Type

  • Bug fix
  • New feature
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

How did you test this PR?

I tested with the minimal Vector config below. The PR also includes unit tests written in the same style as those for the other host metrics collectors.

[sources.host]
type = "host_metrics"
collectors = ["process"]

[sinks.my_sink_id]
type = "console"
encoding.codec = "json"
inputs = ["host"]
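A variant of the config above, as a rough sketch only: it runs the new `process` collector alongside two existing collectors and sets an explicit scrape interval. The choice of `cpu` and `memory` and the 15-second interval are illustrative assumptions, not something this PR tested.

```toml
# Sketch: enable the new "process" collector next to existing ones.
[sources.host]
type = "host_metrics"
# "cpu" and "memory" are picked for illustration; any supported collectors work.
collectors = ["cpu", "memory", "process"]
# Assumed value for illustration; adjust to how often you want samples.
scrape_interval_secs = 15

# Print every collected metric as JSON so the process metrics can be inspected.
[sinks.my_sink_id]
type = "console"
encoding.codec = "json"
inputs = ["host"]
```

Because the sink is `console`, the new metrics can be checked by eye in the terminal output.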

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the "no-changelog" label to this PR.

Checklist

  • Please read our Vector contributor resources.
  • If this PR introduces changes to Vector dependencies (modifies Cargo.lock), please
    run dd-rust-license-tool write to regenerate the license inventory and commit the changes (if any). More details here.

References

#9626

@LeeTeng2001 LeeTeng2001 requested a review from a team as a code owner November 14, 2024 05:23
@bits-bot

bits-bot commented Nov 14, 2024

CLA assistant check
All committers have signed the CLA.

@github-actions github-actions bot added the domain: sources Anything related to the Vector's sources label Nov 14, 2024
@LeeTeng2001 LeeTeng2001 changed the title feat(source): Implement process collection for host metrics feat(host_metrics source): Implement process collection for host metrics Nov 14, 2024
@pront
Member

pront commented Nov 15, 2024

Thanks @LeeTeng2001! There are a few failing CI checks. cargo fmt should fix them.

See how to run common checks locally here: https://github.com/vectordotdev/vector/blob/master/CONTRIBUTING.md#running-other-checks

@LeeTeng2001
Contributor Author

> Thanks @LeeTeng2001! There are a few failing CI checks. cargo fmt should fix them.
>
> See how to run common checks locally here: https://github.com/vectordotdev/vector/blob/master/CONTRIBUTING.md#running-other-checks

Hi, I've fixed the formatting.

@LeeTeng2001 LeeTeng2001 requested review from a team as code owners November 18, 2024 10:34
@github-actions github-actions bot added the domain: external docs Anything related to Vector's external, public documentation label Nov 18, 2024
Member

@pront pront left a comment

Thank you @LeeTeng2001, this looks good!

src/sources/host_metrics/process.rs (outdated, resolved)
changelog.d/process_host_metrics.feature.md (outdated, resolved)
src/sources/host_metrics/process.rs (resolved)
@@ -72,7 +72,7 @@ base: components: sources: host_metrics: configuration: {
"""
required: false
type: array: {
default: ["cpu", "disk", "filesystem", "load", "host", "memory", "network", "cgroups"]
Member

I think you need to document the newly published metrics here:

output: metrics: {
	_host_metrics_tags: {
		collector: {
			description: "Which collector this metric comes from."
			required:    true
		}
		host: {
			description: "The hostname of the originating system."
			required:    true
			examples: [_values.local_host]
		}
	}

	// Host CPU
	host_cpu_seconds_total: _host & {
		description: "The number of CPU seconds accumulated in different operating modes."
		type:        "counter"
		tags: _host_metrics_tags & {
			collector: examples: ["cpu"]
			cpu: {
				description: "The index of the CPU core or socket."
				required:    true
				examples: ["1"]
			}
			mode: {
				description: "Which mode the CPU was running in during the given time."
				required:    true
				examples: ["idle", "system", "user", "nice", "io_wait"]
			}
		}
	}
	host_logical_cpus: _host & {
		description: "The number of logical CPUs."
		type:        "gauge"
	}
	host_physical_cpus: _host & {
		description: "The number of physical CPUs."
		type:        "gauge"
	}

	// Host cgroups
	cgroup_cpu_usage_seconds_total: _host & _cgroup_cpu & {description: "The total amount CPU time used by this cgroup and its descendants, in seconds."}
	cgroup_cpu_user_seconds_total: _host & _cgroup_cpu & {description: "The total amount of CPU time spent by this cgroup in user space, in seconds."}
	cgroup_cpu_system_seconds_total: _host & _cgroup_cpu & {description: "The total amount of CPU time spent by this cgroup in system tasks, in seconds."}
	cgroup_memory_current_bytes: _host & _cgroup_memory & {description: "The total amount of memory currently being used by this cgroup and its descendants, in bytes."}
	cgroup_memory_anon_bytes: _host & _cgroup_memory & {description: "The total amount of memory used by this cgroup in anonymous mappings (normal program allocation), in bytes."}
	cgroup_memory_file_bytes: _host & _cgroup_memory & {description: "The total amount of memory used by this cgroup to cache filesystem data, including tmpfs and shared memory, in bytes."}

	// Host disk
	disk_read_bytes_total: _host & _disk_counter & {description: "The accumulated number of bytes read in."}
	disk_reads_completed_total: _host & _disk_counter & {description: "The accumulated number of read operations completed."}
	disk_written_bytes_total: _host & _disk_counter & {description: "The accumulated number of bytes written out."}
	disk_writes_completed_total: _host & _disk_counter & {description: "The accumulated number of write operations completed."}

	// Host filesystem
	filesystem_free_bytes: _host & _filesystem_bytes & {description: "The number of bytes free on the named filesystem."}
	filesystem_total_bytes: _host & _filesystem_bytes & {description: "The total number of bytes in the named filesystem."}
	filesystem_used_bytes: _host & _filesystem_bytes & {description: "The number of bytes used on the named filesystem."}
	filesystem_used_ratio: _host & _filesystem_bytes & {description: "The ratio between used and total bytes on the named filesystem."}

	// Host load
	load1: _host & _loadavg & {description: "System load averaged over the last 1 minute."}
	load5: _host & _loadavg & {description: "System load averaged over the last 5 minutes."}
	load15: _host & _loadavg & {description: "System load averaged over the last 15 minutes."}

	// Host time
	uptime: _host & _host_metric & {description: "The number of seconds since the last boot."}
	boot_time: _host & _host_metric & {description: "The UNIX timestamp of the last boot."}

	// Host memory
	memory_active_bytes: _host & _memory_gauge & _memory_nowin & {description: "The number of bytes of active main memory."}
	memory_available_bytes: _host & _memory_gauge & {description: "The number of bytes of main memory available."}
	memory_buffers_bytes: _host & _memory_linux & {description: "The number of bytes of main memory used by buffers."}
	memory_cached_bytes: _host & _memory_linux & {description: "The number of bytes of main memory used by cached blocks."}
	memory_free_bytes: _host & _memory_gauge & {description: "The number of bytes of main memory not used."}
	memory_inactive_bytes: _host & _memory_macos & {description: "The number of bytes of main memory that is not active."}
	memory_shared_bytes: _host & _memory_linux & {description: "The number of bytes of main memory shared between processes."}
	memory_swap_free_bytes: _host & _memory_gauge & {description: "The number of free bytes of swap space."}
	memory_swapped_in_bytes_total: _host & _memory_counter & _memory_nowin & {
		description: "The number of bytes that have been swapped into main memory."
	}
	memory_swapped_out_bytes_total: _host & _memory_counter & _memory_nowin & {
		description: "The number of bytes that have been swapped out from main memory."
	}
	memory_swap_total_bytes: _host & _memory_gauge & {description: "The total number of bytes of swap space."}
	memory_swap_used_bytes: _host & _memory_gauge & {description: "The number of used bytes of swap space."}
	memory_total_bytes: _host & _memory_gauge & {description: "The total number of bytes of main memory."}
	memory_used_bytes: _host & _memory_linux & {description: "The number of bytes of main memory used by programs or caches."}
	memory_wired_bytes: _host & _memory_macos & {description: "The number of wired bytes of main memory."}

	// Host network
	network_receive_bytes_total: _host & _network_gauge & {description: "The number of bytes received on this interface."}
	network_receive_errs_total: _host & _network_gauge & {description: "The number of errors encountered during receives on this interface."}
	network_receive_packets_total: _host & _network_gauge & {description: "The number of packets received on this interface."}
	network_transmit_bytes_total: _host & _network_gauge & {description: "The number of bytes transmitted on this interface."}
	network_transmit_errs_total: _host & _network_gauge & {description: "The number of errors encountered during transmits on this interface."}
	network_transmit_packets_drop_total: _host & _network_nomac & {description: "The number of packets dropped during transmits on this interface."}
	network_transmit_packets_total: _host & _network_nomac & {description: "The number of packets transmitted on this interface."}

	// Helpers
	_host: {
		default_namespace: "host"
	}
	_cgroup_cpu: {
		type: "counter"
		tags: _host_metrics_tags & {
			collector: examples: ["cgroups"]
			cgroup: _cgroup_name
		}
	}
	_cgroup_memory: {
		type: "gauge"
		tags: _host_metrics_tags & {
			collector: examples: ["cgroups"]
			cgroup: _cgroup_name
		}
	}
	_cgroup_name: {
		description: "The control group name."
		required:    true
		examples: ["/", "user.slice", "system.slice/snapd.service"]
	}
	_disk_device: {
		description: "The disk device name."
		required:    true
		examples: ["sda", "sda1", "dm-1"]
	}
	_disk_counter: {
		type: "counter"
		tags: _host_metrics_tags & {
			collector: examples: ["disk"]
			device: _disk_device
		}
	}
	_filesystem_bytes: {
		type: "gauge"
		tags: _host_metrics_tags & {
			collector: examples: ["filesystem"]
			device: _disk_device
			filesystem: {
				description: "The name of the filesystem type."
				required:    true
				examples: ["ext4", "ntfs"]
			}
		}
	}
	_loadavg: {
		type: "gauge"
		tags: _host_metrics_tags & {
			collector: examples: ["loadavg"]
		}
		relevant_when: "OS is not Windows"
	}
	_host_metric: {
		type: "gauge"
		tags: _host_metrics_tags & {
			collector: examples: ["host"]
		}
	}
	_memory_counter: {
		type: "counter"
		tags: _host_metrics_tags & {
			collector: examples: ["memory"]
		}
	}
	_memory_gauge: {
		type: "gauge"
		tags: _host_metrics_tags & {
			collector: examples: ["memory"]
		}
	}
	_memory_linux: _memory_gauge & {relevant_when: "OS is Linux"}
	_memory_macos: _memory_gauge & {relevant_when: "OS is macOS X"}
	_memory_nowin: {relevant_when: "OS is not Windows"}
	_network_gauge: {
		type: "gauge"
		tags: _host_metrics_tags & {
			collector: examples: ["network"]
			device: {
				description: "The network interface device name."
				required:    true
				examples: ["eth0", "enp5s3"]
			}
		}
	}
	_network_nomac: _network_gauge & {relevant_when: "OS is not macOS"}
}

Contributor Author

Oh, I completely missed that, thanks!

Member

@pront pront left a comment

Thanks @LeeTeng2001

@pront pront enabled auto-merge November 20, 2024 15:57
@pront pront added this pull request to the merge queue Nov 20, 2024
Merged via the queue into vectordotdev:master with commit 798f300 Nov 20, 2024
54 checks passed
Labels
domain: external docs (Anything related to Vector's external, public documentation)
domain: sources (Anything related to the Vector's sources)
5 participants