This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
Note: The agent versions below include dates (ciprod), which indicate the agent build dates (not release dates)
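As an illustration of the note above, the build date can be read back out of an image tag. This is a minimal sketch, assuming the tags follow the pattern seen in this file: an optional `win-` prefix, `ciprod`, an MMDDYYYY date, and an optional suffix such as `-hotfix` or `-1`. The function name `parse_build_date` is hypothetical and not part of the agent.

```python
import re
from datetime import date, datetime

# Assumed tag layout (inferred from the tags listed in these notes):
# optional "win-" prefix, "ciprod", an MMDDYYYY build date, optional suffix.
TAG_PATTERN = re.compile(r"^(?:win-)?ciprod(?P<date>\d{8})(?:-[\w.]+)?$")


def parse_build_date(tag: str) -> date:
    """Return the build date encoded in an agent image tag (hypothetical helper)."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"unrecognized agent tag: {tag!r}")
    return datetime.strptime(match.group("date"), "%m%d%Y").date()


if __name__ == "__main__":
    for tag in ("ciprod06272022-hotfix", "win-ciprod06142022", "ciprod08052021-1"):
        print(tag, "->", parse_build_date(tag).isoformat())
```

Sorting tags by the parsed date rather than by the raw string gives the true build order, since MMDDYYYY strings do not sort chronologically across years.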
Version microsoft/oms:ciprod06272022-hotfix Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod06272022-hotfix (linux)
- Fixes for sending the proper node allocatable cpu and memory values for containers that do not specify limits.
Version microsoft/oms:ciprod06272022 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod06272022 (linux)
- Fixes for the following bugs in ciprod06142022, which were caught in the AKS Canary region deployment
- Fix the exceptions related to file write & read access of the MDM inventory state file
- Fix for missing Node GPU allocatable & capacity metrics for the clusters which are whitelisted for AKS LargeCluster Private Preview feature
Version microsoft/oms:ciprod06142022 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod06142022 (linux)
Version microsoft/oms:win-ciprod06142022 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-ciprod06142022 (windows)
- Linux Agent
- Prometheus sidecar memory optimization
- Fix for issue of Telegraf connecting to FluentD Port 25228 during container startup
- Add integration for collecting Subnets IP usage metrics for Azure CNI (turned OFF by default)
- Replicaset Agent improvements to support 5K Node cluster scale
- Common (Linux & Windows Agent)
- Make custom metrics endpoint configurable to support edge environments
- Misc
- Moved Trivy image scan to Azure Pipeline
Version microsoft/oms:ciprod05192022 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod05192022 (linux)
Version microsoft/oms:win-ciprod05192022 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-ciprod05192022 (windows)
- Linux Agent
- PodReadyPercentage metric bug fix
- Add cifs & fuse file systems to ignore list
- CA Cert Fix for Mariner Hosts in Air Gap
- Disk usage metrics will no longer be collected for the paths "/mnt/containers" and "/mnt/docker"
- Windows Agent
- Ruby version upgrade from 2.6.5.1 to 2.7.5.1
- Added Support for Windows Server 2022
- Multi-Arch Image to support both Windows 2019 and Windows 2022
- Common (Linux & Windows Agent)
- Telegraf version update from 1.20.3 to 1.22.2 to fix the vulnerabilities
- Removal of Health feature as part of deprecation plan
- AAD Auth MSI feature support for Arc K8s (not usable externally yet)
- MSI onboarding ARM template updates for both AKS & Arc K8s
- Fixed the bug related to windows metrics in MSI mode for AKS
- Configmap updates for log collection settings for v2 schema
- Misc
- Improvements related to CI/CD Multi-arch image
- Do trivy rootfs checks
- Disable push to ACR for PR and PR updates
- Enable batch builds
- Scope Dev/Prod pipelines to respective branches
- Shorten datetime component of image tag
- Troubleshooting script updates for MSI onboarding
- Instructions for testing of agent in MSI auth mode
- Add CI Windows Build to MultiArch Dev pipeline
- Updates related to building of Multi-arch image for windows in Build Pipeline and local dev builds
- Test yamls to test container logs and prometheus scraping on both WS2019 & WS2022
- Arc K8s conformance test updates
- Script to collect the Agent logs for troubleshooting
- Force run trivy stage for Linux
- Fix docker msi download link in windows install-build-pre-requisites.ps1 script
- Added Onboarding templates for legacy auth for internal testing
- Update the Build pipelines to have separate phase for Windows
- Improvements related to CI/CD Multi-arch image
Version microsoft/oms:ciprod03172022 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod03172022 (linux)
Version microsoft/oms:win-ciprod03172022 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-ciprod03172022 (windows)
- Linux Agent
- Multi-Arch Image to support both AMD64 and ARM64
- Ruby upgraded to version 2.7 from 2.6
- Fix Telegraf Permissions
- Fix ADX bug with database name
- Vulnerability fixes
- MDSD updated to 1.17.0
- HTTP Proxy support
- Retries for Log Analytics Ingestion
- ARM64 support
- Memory leak fixes for network failure scenario
- Windows Agent
- Bug fix for FluentBit stdout and stderr log filtering
- Common
- Upgrade Go lang version from 1.14.1 to 1.15.14
- MSI onboarding ARM template update
- AKS HTTP Proxy support
- Go packages upgrade to address vulnerabilities
Version microsoft/oms:ciprod01312022 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod01312022 (linux)
Version microsoft/oms:win-ciprod01312022 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-ciprod01312022 (windows)
- Linux Agent
- Configurable DB name via configmap for ADX (default DB name:containerinsights)
- Default to cAdvisor port to 10250 and container runtime to Containerd
- Update AgentVersion annotation in yamls (omsagent and chart) with released MDSD agent version
- Increasing windows agent CPU limits from 200m to 500m
- Ignore new disk path that comes from containerd starting with k8s version >= 1.19.x, which was adding unnecessary InsightsMetrics logs and increasing cost
- Route the AI SDK logs to log file instead of stdout
- Telemetry to collect ContainerLog Records with empty Timestamp
- FluentBit version upgrade from 1.6.8 to 1.7.8
- Windows Agent
- Update to use FluentBit for container log collection and removed FluentD dependency for container log collection
- Telemetry to track if any of the variable fields of windows container inventory records has field size >= 64KB
- Add windows os check in in_cadvisor_perf plugin to avoid making call in MDSD in MSI auth mode
- Bug fix for placeholder_hostname in telegraf metrics
- FluentBit version upgrade from 1.4.0 to 1.7.8
- Common
- Upgrade FluentD gem version from 1.12.2 to 1.14.2
- Upgrade Telegraf version from 1.18.0 to 1.20.3
- Fix for exception in node allocatable
- Telemetry to track nodeCount & containerCount
- Other changes
- Updates to Arc K8s Extension ARM Onboarding templates with GA API version
- Added ARM Templates for MSI Based Onboarding for AKS
- Conformance test updates related to sidecar container
- Troubleshooting script to detect issues related to Arc K8s Extension onboarding
- Remove the dependency SP for CDPX since configured to use MSI
- Linux Agent Image build improvements
- Update msys2 version to fix windows agent build
- Add explicit exit code 1 across all the PS scripts
Version microsoft/oms:ciprod10132021 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod10132021 (linux)
Version microsoft/oms:win-ciprod10132021 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-ciprod10132021 (windows)
- Linux Agent
- MDSD Proxy support for non-AKS
- log rotation for mdsd log files {err,warn, info & qos}
- Onboarding status
- AAD Auth MSI changes (not usable externally yet)
- Upgrade k8s and adx go packages to fix vulnerabilities
- Fix missing telegraf metrics (TelegrafMetricsSentCount & TelegrafMetricsSendErrorCount) in mdsd route
- Improve fluentd liveness probe checks to handle both supervisor and worker process
- Fix telegraf startup issue when endpoint is unreachable
- Windows Agent
- Windows liveness probe optimization
- Common
- Add new metrics to MDM for allocatable % calculation of cpu and memory usage
- Other changes
- Helm chart updates for removal of rbac api version and replacement of the deprecated .Capabilities.KubeVersion.GitVersion with .Capabilities.KubeVersion.Version
- Updates to build and release ev2
- Scripts to collect troubleshooting logs
- Unit test tooling
- Yaml updates in parity with aks rp yaml
- Upgrade golang version for windows in pipelines
- Conformance test updates
Version microsoft/oms:ciprod08052021-1 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod08052021-1 (linux)
- Bumping image tag for some tooling (no code changes except the IMAGE_TAG environment variable)
Version microsoft/oms:ciprod08052021 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod08052021 (linux)
- Linux Agent
- Fix for CPU spike which occurs at around 6:30 AM UTC every day because of unattended package upgrades
- Update MDSD build which has fixes for the following issues
- Nondeterministic core dump issue caused by non-200 status codes and runtime exception stack unwinding
- Reduce the verbosity of the error logs for OMS & ODS code paths.
- Increase Timeout for OMS Homing service API calls from 30s to 60s
- Fix for Azure/AKS#2457
- In replicaset, tailing of the mdsd.err log file to agent telemetry
Version microsoft/oms:win-ciprod06112021-2 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-ciprod06112021-2 (windows)
- Hotfix for fixing NODE_IP environment variable not set issue for non sidecar mode
Version microsoft/oms:ciprod06112021-1 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod06112021-1 (linux)
Version microsoft/oms:win-ciprod06112021 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-ciprod06112021 (windows)
- Hotfix for crash in clean_cache in in_kube_node_inventory plugin
- We didn't rebuild windows container, so the image version for windows container stays the same as last release (ciprod:win-ciprod06112021) before this hotfix
Version microsoft/oms:ciprod06112021 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod06112021 (linux)
Version microsoft/oms:win-ciprod06112021 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-ciprod06112021 (windows)
- Linux Agent
- Removal of base omsagent dependency
- Using MDSD version 1.10.1 as base agent for all the supported LA data types
- Ruby version upgrade to 2.6 i.e. same version as windows agent
- Upgrade FluentD gem version to 1.12.2
- All the Ruby Fluentd Plugins upgraded to v1 as per Fluentd guidance
- Fluent-bit tail plugin Mem_Buf_limit is configurable via ConfigMap
- Windows Agent
- CA cert changes for airgapped clouds
- Send perf metrics to MDM from windows daemonset
- FluentD gem version upgrade from 1.10.2 to 1.12.2 to make same version as Linux Agent
- Doc updates
- README updates related to OSM preview release for Arc K8s
- README updates related to recommended alerts
Version microsoft/oms:ciprod05202021 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod05202021 (linux)
- Telegraf now waits 30 seconds on startup for network connections to complete (Linux only)
- Change adding telegraf to the liveness probe reverted (Linux only)
Version microsoft/oms:ciprod05122021 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod05122021 (linux)
- Upgrading oneagent to version 1.8 (only for Linux)
- Enabling oneagent for container logs for East US 2
Version microsoft/oms:ciprod04222021 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod04222021 (linux)
Version microsoft/oms:win-ciprod04222021 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-ciprod04222021 (windows)
- Bug fixes for metrics cpuUsagePercentage and memoryWorkingSetPercentage for windows nodes
- Added metrics for threshold violation
- Made Job completion metric configurable
- Updated default buffer sizes in fluent-bit
- Updated recommended alerts
- Fixed bug where logs written before agent starts up were not collected
- Fixed bug which kept agent logs from being rotated
- Bug fix for Windows Containerd container log collection
- Bug fixes
- Doc updates
- Minor telemetry changes
Version microsoft/oms:ciprod03262021 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod03262021 (linux)
Version microsoft/oms:win-ciprod03262021 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-ciprod03262021 (windows)
- Started collecting new metric - kubelet running pods count
- Onboarding script fixes to add explicit json output
- Proxy and token updates for ARC
- Doc updates for Microsoft charts repo release
- Bug fixes for trailing whitespaces in enable-monitoring.sh script
- Support for higher volume of prometheus metrics by scraping metrics from sidecar
- Update to get new version of telegraf - 1.18
- Add label and field selectors for prometheus scraping using annotations
- Support for OSM integration
- Removed wireserver calls to get CA certs since access is removed
- Added liveness timeout for exec for linux containers
Version microsoft/oms:ciprod02232021 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod02232021 (linux)
Version microsoft/oms:win-ciprod02232021 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-ciprod02232021 (windows)
- ContainerLogV2 schema support for LogAnalytics & ADX (not usable externally yet)
- Fix nodemetrics (cpuusagepercentage & memoryusagepercentage) metrics not flowing. This is fixed upstream for k8s versions >= 1.19.7 and >= 1.20.2.
- Fix cpu & memory usage exceeded threshold container metrics not flowing when requests and/or limits were not set
- Mute some unused exceptions from going to telemetry
- Collect containerimage (repository, image & imagetag) from spec (instead of runtime)
- Add support for extension MSI for k8s arc
- Use cloud specific instrumentation keys for telemetry
- Picked up newer version for apt
- Add priority class to daemonset (in our chart only)
Version microsoft/oms:ciprod01112021 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod01112021 (linux)
Version microsoft/oms:win-ciprod01112021 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-ciprod01112021 (windows)
- Fixes for Linux Agent Replicaset Pod OOMing issue
- Update fluentbit (1.4.2 to 1.6.8) for the Linux Daemonset
- Make Fluentbit settings: log_flush_interval_secs, tail_buf_chunksize_megabytes and tail_buf_maxsize_megabytes configurable via configmap
- Support for PV inventory collection
- Removal of Custom metric region check for Public cloud regions and update to use cloud environment variable to determine the custom metric support
- For daemonset pods, add the dnsconfig to use ndots:3 instead of ndots:5 to reduce the number of DNS API calls made
- Fix for inconsistency in the collection of container environment variables for pods which have a high number of containers
- Fix for disabling of std{out;err} log_collection_settings via configmap issue in windows daemonset
- Update to use workspace key from mount file rather than environment variable for windows daemonset agent
- Remove per container info logs in the container inventory
- Enable ADX route for windows container logs
- Remove logging to termination log in windows agent liveness probe
- Fix for duplicate windows metrics
Version microsoft/oms:ciprod10272020 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod10272020 (linux)
Version microsoft/oms:win-ciprod10272020 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-ciprod10272020 (windows)
- Activate oneagent in a few AKS regions (koreacentral,norwayeast)
- Disable syslog
- Fix timeout for Windows daemonset liveness probe
- Make request == limit for Windows daemonset resources (cpu & memory)
- Schema v2 for container log (ADX only - applicable only for select customers for piloting)
Version microsoft/oms:ciprod10052020 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod10052020 (linux)
Version microsoft/oms:win-ciprod10052020 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-ciprod10052020 (windows)
- Health CRD to version v1 (from v1beta1) for k8s versions >= 1.19.0
- Collection of PV usage metrics for PVs mounted by pods (kube-system pods excluded by default)(doc-link-needed)
- Zero fill few custom metrics under a timer, also add zero filling for new PV usage metrics
- Collection of additional Kubelet metrics ('kubelet_running_pod_count','volume_manager_total_volumes','kubelet_node_config_error','process_resident_memory_bytes','process_cpu_seconds_total','kubelet_runtime_operations_total','kubelet_runtime_operations_errors_total'). This also includes updates to 'kubelet' workbook to include these new metrics
- Collection of Azure NPM (Network Policy Manager) metrics (basic & advanced. By default, NPM metrics collection is turned OFF)(doc-link-needed)
- Support log collection when docker root is changed with knode. Tracked by this issue
- Support for Pods in 'Terminating' state for nodelost scenarios
- Fix for reduction in telemetry for custom metrics ingestion failures
- Fix CPU capacity/limits metrics being 0 for Virtual nodes (VK)
- Add new custom metric regions (eastus2,westus,australiasoutheast,brazilsouth,germanywestcentral,northcentralus,switzerlandnorth)
- Enable strict SSL validation for AppInsights Ruby SDK
- Turn off custom metrics upload for unsupported cluster types
- Install CA certs from wire server for windows (in certain clouds)
Note: This agent release is targeted ONLY for non-AKS clusters via Azure Monitor for containers HELM chart update
Version microsoft/oms:ciprod09162020 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod09162020 (linux)
Version microsoft/oms:win-ciprod09162020 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-ciprod09162020 (windows)
- Collection of Azure Network Policy Manager Basic and Advanced metrics
- Add support in Windows Agent for Container log collection of CRI runtimes such as ContainerD
- Alertable metrics support for Arc K8s clusters, at parity with AKS
- Support for multiple container log mount paths when docker is updated through knode
- Bug fix related to MDM telemetry
Version microsoft/oms:ciprod08072020 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod08072020 (linux)
Version microsoft/oms:win-ciprod08072020 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-ciprod08072020 (windows)
- Collection of KubeState metrics for deployments and HPA
- Add the Proxy support for Windows agent
- Fix for ContainerState in ContainerInventory to handle Failed state and collection of environment variables for terminated and failed containers
- Change /spec to /metrics/cadvisor endpoint to collect node capacity metrics
- Disable Health Plugin by default; it can be enabled via configmap
- Pin version of jq to 1.5+dfsg-2
- Bug fix for showing node as 'not ready' when there is disk pressure
- oneagent integration (disabled by default)
- Add region check before sending alertable metrics to MDM
- Telemetry fix for agent telemetry for sov. clouds
Version microsoft/oms:ciprod07152020 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod07152020 (linux)
Version microsoft/oms:win-ciprod05262020-2 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-ciprod05262020-2 (windows)
- Following hotfixes which are applicable only for Linux agent
- Fix the issue related to collection of multi-containers in pod for the ContainerInventory table
- Fix the containerhostname field value to have podname rather than nodename in ContainerInventory table
- Fix OOM issue during container startup if there are high number of pods or containers on the node
- Fix the ContainerName field value same as before in ContainerInventory table
- We didn't rebuild windows container, so the image version for windows container stays the same as last release (ciprod:win-ciprod05262020-2) before this hotfix
Version microsoft/oms:ciprod06302020 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod06302020 (linux)
Version microsoft/oms:win-ciprod05262020-2 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-ciprod05262020-2 (windows)
- Hotfix for nested JSON log parsing bug (applicable only to Linux Daemonset)
- We didn't rebuild windows container, so the image version for windows container stays the same as last release (ciprod:win-ciprod05262020-2) before this hotfix
Version microsoft/oms:win-ciprod05262020 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-ciprod05262020-2 (windows)
- Update application insights instrumentation key for windows image to point to the production instance
Version microsoft/oms:ciprod05222020 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod05222020 (linux)
Version microsoft/oms:win-ciprod05222020 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-ciprod05222020 (windows)
- Windows Daemonset - Collection of Windows std/stderr logs
- More Alertable Metrics (going to Metrics Store/custom metrics - see Customer Impact section below for metrics list)
- Fix OOM-ing at high prometheus scrape volume
- Update fluentbit (0.14.4 to 1.4.2)
- Drop non-numeric metrics thru Telegraf
- Reduce Health exception (when API server response is nil)
- Add 'Computer' dimension to all telemetry (internal use)
- Support for specifying HTTP & HTTPS Proxy for outbound/egress (applicable only for non-AKS clusters)
- Move to rbac.authorization.k8s.io/v1 for ClusterRole & ClusterRoleBinding
- Move to apiextensions.k8s.io/v1 for Health CRD
- Windows Logs - Customers will see the agent automatically start collecting windows container STDOUT/STDERR logs and sending them to the same loganalytics workspace (containerlogs table)
- Alertable metrics - Customers will see the below metrics & namespaces in 'Metrics' TOC for AKS clusters
- Metrics
- diskUsagePercentage
- completedJobsCount
- oomKilledContainerCount
- podReadyPercentage
- restartingContainerCount
- cpuExceededPercentage
- memoryRssExceededPercentage
- memoryWorkingSetExceededPercentage
- Metric Namespaces
- insights.container/containers
- Metrics
- HTTP/S Proxy support - For non-AKS clusters, proxy can be configured when installing thru HELM. Please see documentation for more details
Note: This agent release is targeted ONLY for non-AKS clusters via Azure Monitor for containers HELM chart update
Version microsoft/oms:ciprod04162020 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod04162020
- Add support for rate limiting
- Add support for Container Runtime Interface compatible container runtime(s) like CRI-O and ContainerD
- cAdvisor APIs are used to collect the container inventory for Docker/Moby and CRI runtime K8s environments
- Based on the container runtime, corresponding container log FluentBit parser(docker/cri) selected
- Ingestion will throttle the workspace if the agent on the cluster sends data beyond the Log Analytics Workspace throttling limits i.e. 500 MB/s
- On Docker runtime environments, the inventory of the containers was previously obtained via the Docker REST API. Agent now uses the cAdvisor APIs to get the inventory of the containers for Docker and non-Docker container runtime environments.
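The runtime-based parser selection described in this release can be sketched as follows. This is an illustrative sketch only: the function name `select_log_parser` and the set of runtime strings are assumptions, and the agent's actual selection logic is not shown in these notes. It relies on the fact that Docker/Moby writes JSON log lines while CRI runtimes such as ContainerD and CRI-O write the CRI text format.

```python
def select_log_parser(container_runtime: str) -> str:
    """Pick the container log parser name (docker or cri) for a runtime.

    Hypothetical helper: the runtime names accepted here are assumptions.
    Docker/Moby emits JSON log lines; CRI runtimes emit the CRI text format.
    """
    runtime = container_runtime.strip().lower()
    if runtime in ("docker", "moby"):
        return "docker"
    # Anything else is treated as a CRI-compatible runtime.
    return "cri"


if __name__ == "__main__":
    for runtime in ("docker", "containerd", "cri-o"):
        print(runtime, "->", select_log_parser(runtime))
```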
Version microsoft/oms:ciprod03022020 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod03022020
- Collection of GPU metrics as InsightsMetrics
- Enable config map settings to enable collection of 'Normal' kube events
- Fix kubehealth exceptions to handle empty/nil kube api responses
- Get resource limits for health and MDM from kubelet instead of kube api
- Bug fix for windows node image collection where image name contains multiple slashes
- Exclude ARO master node for data collection
- Telemetry for kube events flushed
- Changes to support msi for mdm if service principal doesn't exist
- Changes for AKS telemetry to ping ods endpoint first and then network check
- KubeEvents bug fix for KubeEvent type
- Providing capability for customers to collect 'Normal' kube events using config map
- Metrics for GPU are collected and ingested to customers workspace if they have GPU enabled nodes
- Bug fix for windows container image collection allows customers to get the right data in the ContainerInventory table for windows containers.
Version microsoft/oms:ciprod01072020 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod01072020
- Switch between 10255(old) and 10250(new) ports for cadvisor for older and newer versions of kubernetes
- Node cpu, node memory, container cpu and container memory metrics were obtained earlier by querying kubelet readonly port(http://$NODE_IP:10255). Agent now supports getting these metrics from kubelet port(https://$NODE_IP:10250) as well. During the agent startup, it checks for connectivity to kubelet port(https://$NODE_IP:10250), and if it fails the metrics source is defaulted to readonly port(http://$NODE_IP:10255).
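The startup fallback described above can be sketched roughly as below. This is a minimal sketch under stated assumptions: the notes do not say how the agent checks connectivity, so a plain TCP connect is used here as a stand-in, and the names `can_connect` and `kubelet_metrics_base_url` are hypothetical.

```python
import socket

SECURE_PORT = 10250     # kubelet port (HTTPS)
READ_ONLY_PORT = 10255  # kubelet read-only port (HTTP)


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds (assumed check)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def kubelet_metrics_base_url(node_ip: str, probe=can_connect) -> str:
    """Prefer the kubelet port; default to the read-only port if the probe fails."""
    if probe(node_ip, SECURE_PORT):
        return f"https://{node_ip}:{SECURE_PORT}"
    return f"http://{node_ip}:{READ_ONLY_PORT}"


if __name__ == "__main__":
    # The probe is injectable so the choice can be exercised without a cluster.
    print(kubelet_metrics_base_url("10.0.0.4", probe=lambda host, port: False))
```

Passing the probe as a parameter keeps the decision logic testable without a reachable node.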
Version microsoft/oms:ciprod12042019 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod12042019
- Fix scheduler for all input plugins
- Fix liveness probe
- Reduce chunk sizes for all fluentD buffers to support larger clusters (nodes & pods)
- Chunk Kubernetes API calls (pods,nodes,events)
- Use HTTP.start instead of HTTP.new
- Merge KubePerf into KubePods & KubeNodes
- Merge KubeServices into KubePod
- Use stream based yajl for JSON parsing
- Health - Query only kube-system pods
- Health - Use keep_if instead of select
- Container log enrichment (turned OFF by default for ContainerName & ContainerImage)
- Application Insights Telemetry - Async
- Fix metricTime to be batch time for all metric input plugins
- Close socket connections properly for DockerAPIClient
- Fix top unhandled exceptions in Kubernetes API Client and pod inventory
- Fix retries, wait between retries, chunk size, thread counts to be consistent for all FluentD workflows
- Back-off for containerlog enrichment K8S API calls
- Add new regions (3) for Azure Monitor Custom metrics
- Increase the cpu(1 core) & memory(750Mi) limits for replica-set to support larger clusters (nodes & pods)
- Move to Ubuntu 18.04 LTS
- Support for Kubernetes 1.16
- Use ifconfig for detecting network connectivity issues
- Collect eventType != Normal
Version microsoft/oms:ciprod10112019 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod10112019
- Update prometheus config scraping capability to restrict collecting metrics from pods in specific namespaces.
- Feature to send custom configuration/prometheus scrape errors to KubeMonAgentEvents table in customer's workspace.
- Bug fix to collect data for init containers for Container Logs, KubePodInventory and Perf.
- Bug fix for empty array being a valid setting in custom config in configmap.
- Restrict kubelet_docker_operations and kubelet_docker_operations_errors to create_containers, remove_containers and pull_image operations.
- Fix top exceptions in telemetry
Version microsoft/oms:ciprod08222019 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod08222019
- Cluster Health Private Preview based on config map setting
- Update resource requests for replicaset to 110m and 250Mi
- Update custom metrics supported regions
- Fix for prometheus config map telemetry
- Telemetry for controller kind
- Update url to use one of the whitelisted urls for cp monitor telemetry
- Configmap with clusterid for AKS to be used by Application Insights
Version microsoft/oms:ciprod07092019 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod07092019
- Prometheus custom metric collection using config map allowing omsagent to
- Scrape metrics from user defined urls
- Scrape kubernetes pods with prometheus annotations
- Scrape metrics from kubernetes services
- Exception fixes in daemonset and replicaset
- Container Inventory plugin changes to get image id from the repo digest and populate repository for image with only image digest
- Remove telegraf errors from being sent to ApplicationInsights and instead log it to stderr to provide visibility for customers
- Bug fixes for region names with spaces being processed incorrectly while sending mdm metrics
- Add log size in telemetry
- Remove buffer chunk size and buffer max size from fluentbit configuration
Version microsoft/oms:ciprod06142019 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod06142019
- MDM pod metrics bug fixes - MDM rejecting pod metrics due to nodename or controllername dimensions being empty
- Prometheus metrics collection by default in every node for kubelet docker operations and kubelet docker operation errors
- Telegraf metric collection for diskio and networkio metrics
- Agent Configuration/ Settings for data collection
- Cluster level log collection enable/disable option
- Ability to enable/disable stdout and/or stderr logs collection per namespace
- Cluster level environment variable collection enable/disable option
- Config file version & config schema version
- Pod annotation for supported config schema version(s)
- Log collection optimization/tuning for better performance
- Derive k8s namespaces from log file name (instead of making call to k8s api service)
- Do not tail log files for containers in the excluded namespace list (if excluded both in stdout & stderr)
- Limit buffer size to 1M and flush logs more frequently [every 10 secs (instead of 30 secs)]
- Tuning of several other fluent bit settings
- Increase requests
- Replica set memory request by 75M (100M to 175M)
- Daemonset CPU request by 25m (50m to 75m)
- Will be pushing image only to MCR ( no more Docker) starting this release. AKS-engine will also start to pull our agent image from MCR
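The log collection optimization in this release derives the Kubernetes namespace from the log file name rather than calling the API server. A minimal sketch of that idea follows, assuming the kubelet's `/var/log/containers` naming convention of `<pod>_<namespace>_<container>-<container id>.log`. The helper name and the sample path are hypothetical, and the agent's own implementation is not shown in these notes.

```python
import os
from typing import NamedTuple


class LogFileInfo(NamedTuple):
    pod: str
    namespace: str
    container: str
    container_id: str


def parse_container_log_path(path: str) -> LogFileInfo:
    """Split a container log file name into pod, namespace, container and ID.

    Assumes the layout <pod>_<namespace>_<container>-<container id>.log.
    Pod and namespace names cannot contain underscores, so "_" is a safe separator.
    """
    name = os.path.basename(path)
    if not name.endswith(".log"):
        raise ValueError(f"not a container log file: {path!r}")
    parts = name[: -len(".log")].split("_")
    if len(parts) != 3:
        raise ValueError(f"unexpected log file name: {name!r}")
    pod, namespace, rest = parts
    # Container names may contain hyphens, so split on the last one only.
    container, sep, container_id = rest.rpartition("-")
    if not sep:
        raise ValueError(f"missing container id in: {name!r}")
    return LogFileInfo(pod, namespace, container, container_id)


if __name__ == "__main__":
    sample = "/var/log/containers/omsagent-rs-abc12_kube-system_omsagent-0123abcd.log"
    print(parse_container_log_path(sample).namespace)
```

With the namespace available from the file name alone, files belonging to excluded namespaces can be skipped before they are ever tailed.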
Version microsoft/oms:ciprod04232019 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod04232019
- Windows node monitoring (metrics & inventory)
- Telegraf integration (Telegraf metrics to LogAnalytics)
- Node Disk usage metrics (used, free, used%) as InsightsMetrics
- Resource stamping for all types (inventory, metrics (perf), metrics (InsightsMetrics), logs) [Applicable only for AKS clusters]
- Upped daemonset memory request (not limit) from 150Mi to 225 Mi
- Added liveness probe for fluentbit
- Fix for MDM filter plugin when kubeapi returns non-200 response
- Fix for closing response.Body in outoms
- Update Mem_Buf_Limit to 5m for fluentbit
- Tail only files that were modified in the last 5 minutes
- Remove some unwanted logs that are chatty in outoms
- Fix for MDM disablement for AKS-Engine
- Fix for Pod count metric (same as container count) in MDM
- Container logs enrichment optimization
- Get container meta data only for containers in current node (vs cluster before)
- Update fluent bit 0.13.7 => 0.14.4
- This fixes the escaping issue in the container logs
- Mooncake cloud support for agent (AKS only)
- Ability to disable agent telemetry
- Ability to onboard and ingest to mooncake cloud
- Add & populate 'ContainerStatusReason' column to KubePodInventory
- Alertable (custom) metrics (to AzureMonitor - only for AKS clusters)
- Cpuusagenanocores & % metric
- MemoryWorkingsetBytes & % metric
- MemoryRssBytes & % metric
- Podcount by node, phase & namespace metric
- Nodecount metric
- ContainerNodeInventory_CL to fixed type
- Omsagent - 1.8.1.256 (nov 2018 release)
- Persist fluentbit state between container restarts
- Populate 'TimeOfCommand' for agent ingest time for container logs
- Get node cpu usage from cpuusagenanoseconds (and convert to cpuusagenanocores)
- Container Node Inventory - move to fluentD from OMI
- Mount docker.sock (Daemon set) as /var/run/host
- Add omsagent user to docker group
- Move to fixed type for kubeevents & kubeservices
- Disable collecting ENV for our oms agent container (daemonset & replicaset)
- Disable container inventory collection for 'sandbox' containers & non kubernetes managed containers
- Agent telemetry - ContainerLogsAgentSideLatencyMs
- Agent telemetry - PodCount
- Agent telemetry - ControllerCount
- Agent telemetry - K8S Version
- Agent telemetry - NodeCoreCapacity
- Agent telemetry - NodeMemoryCapacity
- Agent telemetry - KubeEvents (exceptions)
- Agent telemetry - Kubenodes (exceptions)
- Agent telemetry - kubepods (exceptions)
- Agent telemetry - kubeservices (exceptions)
- Agent telemetry - Daemonset , Replicaset as dimensions (bug fix)
- Disable Container Image inventory workflow
- Kube_Events memory leak fix for replica-set
- Timeout (30 secs) for outOMS
- Reduce critical lock duration for quicker log processing (for log enrichment)
- Replace OMI based Container Inventory workflow with fluentD based Container Inventory
- Moby support for the new Container Inventory workflow
- Ability to disable environment variables collection by individual container
- Bugfix - No inventory data due to container status(es) not available
- Agent telemetry cpu usage & memory usage (for DaemonSet and ReplicaSet)
- Agent telemetry - log generation rate
- Agent telemetry - container count per node
- Agent telemetry - collect container logs from agent (DaemonSet and ReplicaSet) as AI trace
- Agent telemetry - errors/exceptions for Container Inventory workflow
- Agent telemetry - Container Inventory Heartbeat
- Fix for containerID being 00000-00000-00000
- Move from fluentD to fluentbit for container log collection
- Seg fault fixes in json parsing for container inventory & container image inventory
- Telemetry enablement
- Remove ContainerPerf, ContainerServiceLog, ContainerProcess fluentd-->OMI workflows
- Update log level for all fluentD based workflows
- Changes for node lost scenario (roll-up pod & container statuses as Unknown)
- Discover unscheduled pods
- KubeNodeInventory - delimit multiple true node conditions for node status
- UTF Encoding support for container logs
- Container environment variable truncated to 200K
- Handle json parsing errors for OMI provider for docker
- Test mode enablement for ACS-engine testing
- Latest OMS agent (1.6.0-163)
- Latest OMI (1.4.2.5)
- Remove node-0 dependency
- Remove passing WSID & Key as environment variables and pass them as kubernetes secret (for non-AKS; we already pass them as secret for AKS)
- Please note that if you are manually deploying thru yaml you need to -
- Provide workspaceid & key as base64 encoded strings with in double quotes (.yaml has comments to do so as well)
- Provide cluster name twice (for each container – daemonset & replicaset)
- Kubernetes RBAC enablement
- Latest released omsagent (1.6.0-42)
- Bug fix so that we do not collect kube-system namespace container logs when kube api calls fail occasionally (Bug #215107)
- .yaml changes (for RBAC)