[Doc] Replace Head Node ServiceMonitor with PodMonitor (#49476)
Since KubeRay has changed the collection of head node metrics from
`ServiceMonitor` to `PodMonitor`, this PR updates the Ray docs to
reflect the current usage.

Ref: ray-project/kuberay#2689

---------

Signed-off-by: win5923 <[email protected]>
Signed-off-by: Blocka <[email protected]>
Co-authored-by: Kai-Hsun Chen <[email protected]>
Co-authored-by: angelinalg <[email protected]>
3 people authored Jan 7, 2025
1 parent abb11c3 commit af0e2df
Showing 2 changed files with 56 additions and 36 deletions.
Binary file modified doc/source/cluster/kubernetes/images/prometheus_web_ui.png
92 changes: 56 additions & 36 deletions doc/source/cluster/kubernetes/k8s-ecosystem/prometheus-grafana.md
@@ -33,7 +33,7 @@ kubectl get all -n prometheus-system
# deployment.apps/prometheus-kube-state-metrics   1/1     1            1           46s
```

* KubeRay provides an [install.sh script](https://github.com/ray-project/kuberay/blob/master/install/prometheus/install.sh) to install the [kube-prometheus-stack v48.2.1](https://github.com/prometheus-community/helm-charts/tree/kube-prometheus-stack-48.2.1/charts/kube-prometheus-stack) chart and related custom resources, including **ServiceMonitor**, **PodMonitor** and **PrometheusRule**, in the namespace `prometheus-system` automatically.
* KubeRay provides an [install.sh script](https://github.com/ray-project/kuberay/blob/master/install/prometheus/install.sh) to install the [kube-prometheus-stack v48.2.1](https://github.com/prometheus-community/helm-charts/tree/kube-prometheus-stack-48.2.1/charts/kube-prometheus-stack) chart and related custom resources, including **PodMonitor** and **PrometheusRule**, in the namespace `prometheus-system` automatically.

* We made some modifications to the original `values.yaml` in the kube-prometheus-stack chart to allow embedding Grafana panels in Ray Dashboard. See [overrides.yaml](https://github.com/ray-project/kuberay/tree/master/install/prometheus/overrides.yaml) for more details.
```yaml
@@ -62,8 +62,8 @@ kubectl apply -f ray-cluster.embed-grafana.yaml
kubectl get pod -l ray.io/node-type=head

# Example output:
# NAME                                  READY   STATUS    RESTARTS   AGE
# raycluster-kuberay-head-btwc2         1/1     Running   0          63s
# NAME                                  READY   STATUS    RESTARTS   AGE
# raycluster-embed-grafana-head-98fqt   1/1     Running   0          11m

# Wait until all Ray Pods are running and forward the port of the Prometheus metrics endpoint in a new terminal.
kubectl port-forward ${RAYCLUSTER_HEAD_POD} 8080:8080
@@ -72,13 +72,15 @@ curl localhost:8080
# Example output (Prometheus metrics format):
# # HELP ray_spill_manager_request_total Number of {spill, restore} requests.
# # TYPE ray_spill_manager_request_total gauge
# ray_spill_manager_request_total{Component="raylet",NodeAddress="10.244.0.13",Type="Restored",Version="2.0.0"} 0.0
# ray_spill_manager_request_total{Component="raylet", NodeAddress="10.244.0.13", SessionName="session_2025-01-02_07-58-21_419367_11", Type="FailedDeletion", Version="2.9.0", container="ray-head", endpoint="metrics", instance="10.244.0.13:8080", job="prometheus-system/ray-head-monitor", namespace="default", pod="raycluster-embed-grafana-head-98fqt", ray_io_cluster="raycluster-embed-grafana"} 0

# Ensure that the port (8080) for the metrics endpoint is also defined in the head's Kubernetes service.
kubectl get service

# NAME                                TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                                                     AGE
# raycluster-kuberay-head-svc         ClusterIP   10.96.201.142   <none>        6379/TCP,8265/TCP,8080/TCP,8000/TCP,10001/TCP               106m
# NAME                                TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                                                     AGE
# kuberay-operator                    ClusterIP   10.96.137.190   <none>        8080/TCP                                                    13m
# kubernetes                          ClusterIP   10.96.0.1       <none>        443/TCP                                                     14m
# raycluster-embed-grafana-head-svc   ClusterIP   None            <none>        44217/TCP,10001/TCP,44227/TCP,8265/TCP,6379/TCP,8080/TCP    13m
```

* KubeRay exposes a Prometheus metrics endpoint on port **8080** via a built-in exporter by default. Hence, we do not need to install any external exporter.
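
As a rough sketch (not the full manifest from this tutorial; the image tag is an assumption), the RayCluster head group exposes the metrics endpoint as a named container port, and the PodMonitor in Step 5 scrapes that port by its `metrics` name:

```yaml
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: raycluster-embed-grafana
spec:
  headGroupSpec:
    rayStartParams: {}
    template:
      spec:
        containers:
        - name: ray-head
          image: rayproject/ray:2.9.0  # assumption: any recent Ray image works
          ports:
          - containerPort: 8080
            name: metrics  # the PodMonitor in Step 5 selects this named port
```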
@@ -103,37 +105,54 @@ kubectl get service
Because we forward the port of Grafana to `127.0.0.1:3000` in this example, we set `RAY_GRAFANA_IFRAME_HOST` to `http://127.0.0.1:3000`.
* `http://` is required.
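
As a sketch only (the Grafana and Prometheus service addresses below are assumptions for a default kube-prometheus-stack release named `prometheus` in the `prometheus-system` namespace, not necessarily the exact values in `ray-cluster.embed-grafana.yaml`), these environment variables typically sit on the head container of the RayCluster sketched above:

```yaml
headGroupSpec:
  template:
    spec:
      containers:
      - name: ray-head
        env:
        # Address the browser uses to load embedded Grafana panels.
        - name: RAY_GRAFANA_IFRAME_HOST
          value: http://127.0.0.1:3000
        # Address the head Pod uses to reach Grafana inside the cluster (assumed service name).
        - name: RAY_GRAFANA_HOST
          value: http://prometheus-grafana.prometheus-system.svc:80
        # Address the head Pod uses to reach Prometheus inside the cluster (assumed service name).
        - name: RAY_PROMETHEUS_HOST
          value: http://prometheus-kube-prometheus-prometheus.prometheus-system.svc:9090
```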

## Step 5: Collect Head Node metrics with a ServiceMonitor
## Step 5: Collect Head Node metrics with a PodMonitor

RayService creates two Kubernetes services for the head Pod: one managed by the RayService and the other by the underlying RayCluster. Therefore, it's recommended to use a PodMonitor to monitor head Pod metrics, to avoid misconfigurations that could double-count the same metrics when using a ServiceMonitor.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
kind: PodMonitor
metadata:
  name: ray-head-monitor
  namespace: prometheus-system
  labels:
    # `release: $HELM_RELEASE`: Prometheus can only detect ServiceMonitor with this label.
    # `release: $HELM_RELEASE`: Prometheus can only detect PodMonitor with this label.
    release: prometheus
  name: ray-head-monitor
  namespace: prometheus-system
spec:
  jobLabel: ray-head
  # Only select Kubernetes Services in the "default" namespace.
  # Only select Kubernetes Pods in the "default" namespace.
  namespaceSelector:
    matchNames:
    - default
  # Only select Kubernetes Services with "matchLabels".
  # Only select Kubernetes Pods with "matchLabels".
  selector:
    matchLabels:
      ray.io/node-type: head
  # A list of endpoints allowed as part of this ServiceMonitor.
  endpoints:
  # A list of endpoints allowed as part of this PodMonitor.
  podMetricsEndpoints:
  - port: metrics
  targetLabels:
  - ray.io/cluster
    relabelings:
    - action: replace
      sourceLabels:
      - __meta_kubernetes_pod_label_ray_io_cluster
      targetLabel: ray_io_cluster
  - port: as-metrics # autoscaler metrics
    relabelings:
    - action: replace
      sourceLabels:
      - __meta_kubernetes_pod_label_ray_io_cluster
      targetLabel: ray_io_cluster
  - port: dash-metrics # dashboard metrics
    relabelings:
    - action: replace
      sourceLabels:
      - __meta_kubernetes_pod_label_ray_io_cluster
      targetLabel: ray_io_cluster
```
* The YAML example above is [serviceMonitor.yaml](https://github.com/ray-project/kuberay/blob/master/config/prometheus/serviceMonitor.yaml), and it is created by **install.sh**. Hence, no need to create anything here.
* See [ServiceMonitor official document](https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#servicemonitor) for more details about the configurations.
* `release: $HELM_RELEASE`: Prometheus can only detect ServiceMonitor with this label.
* The **install.sh** script creates the above YAML example, [podMonitor.yaml](https://github.com/ray-project/kuberay/blob/master/config/prometheus/podMonitor.yaml#L26-L63), so you don't need to create anything yourself.
* See the [PodMonitor official documentation](https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#podmonitor) for more details about the configuration.
* `release: $HELM_RELEASE`: Prometheus can only detect PodMonitor with this label. See [here](#prometheus-can-only-detect-this-label) for more details.

(prometheus-can-only-detect-this-label)=
```sh
@@ -143,9 +162,6 @@ spec:
# prometheus prometheus-system 1 2023-02-06 06:27:05.530950815 +0000 UTC deployed kube-prometheus-stack-44.3.1 v0.62.0
kubectl get prometheuses.monitoring.coreos.com -n prometheus-system -oyaml
#   serviceMonitorSelector:
#     matchLabels:
#       release: prometheus
#   podMonitorSelector:
#     matchLabels:
#       release: prometheus
@@ -154,20 +170,26 @@
#       release: prometheus
```

* `namespaceSelector` and `seletor` are used to select exporter's Kubernetes service. Because Ray uses a built-in exporter, the **ServiceMonitor** selects Ray's head service which exposes the metrics endpoint (i.e. port 8080 here).
* Prometheus uses `namespaceSelector` and `selector` to select Kubernetes Pods.
```sh
kubectl get service -n default -l ray.io/node-type=head
# NAME                          TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                                         AGE
# raycluster-kuberay-head-svc   ClusterIP   10.96.201.142   <none>        6379/TCP,8265/TCP,8080/TCP,8000/TCP,10001/TCP   153m
kubectl get pod -n default -l ray.io/node-type=head
# NAME                                  READY   STATUS    RESTARTS   AGE
# raycluster-embed-grafana-head-khfs4   1/1     Running   0          4m38s
```

* `relabelings`: This configuration renames the label `__meta_kubernetes_pod_label_ray_io_cluster` to `ray_io_cluster` in the scraped metrics. It ensures that each metric includes the name of the RayCluster to which the Pod belongs. This configuration is especially useful for distinguishing metrics when deploying multiple RayClusters. For example, a metric with the `ray_io_cluster` label might look like this:

```
ray_node_cpu_count{SessionName="session_2025-01-02_07-58-21_419367_11", container="ray-head", endpoint="metrics", instance="10.244.0.13:8080", ip="10.244.0.13", job="raycluster-embed-grafana-head-svc", namespace="default", pod="raycluster-embed-grafana-head-98fqt", ray_io_cluster="raycluster-embed-grafana", service="raycluster-embed-grafana-head-svc"}
```
* `targetLabels`: We added `spec.targetLabels[0].ray.io/cluster` because we want to include the name of the RayCluster in the metrics that will be generated by this ServiceMonitor. The `ray.io/cluster` label is part of the Ray head node service and it will be transformed into a `ray_io_cluster` metric label. That is, any metric that will be imported, will also contain the following label `ray_io_cluster=<ray-cluster-name>`. This may seem optional but it becomes mandatory if you deploy multiple RayClusters.
In this example, `raycluster-embed-grafana` is the name of the RayCluster.
## Step 6: Collect Worker Node metrics with PodMonitors
KubeRay operator does not create a Kubernetes service for the Ray worker Pods, therefore we cannot use a Prometheus ServiceMonitor to scrape the metrics from the worker Pods. To collect worker metrics, we can use `Prometheus PodMonitors CRD` instead.
Similar to the head Pod, this tutorial also uses a PodMonitor to collect metrics from worker Pods. The reason for using separate PodMonitors for head Pods and worker Pods is that the head Pod exposes multiple metric endpoints, whereas a worker Pod exposes only one.
**Note**: We could create a Kubernetes service with selectors a common label subset from our worker pods, however, this is not ideal because our workers are independent from each other, that is, they are not a collection of replicas spawned by replicaset controller. Due to that, we should avoid using a Kubernetes service for grouping them together.
**Note**: You could create a Kubernetes service with selectors that match a common label subset of your worker Pods; however, this configuration isn't ideal because the workers are independent of each other, that is, they aren't a collection of replicas spawned by a ReplicaSet controller. Due to this behavior, avoid using a Kubernetes service to group them together.
```yaml
apiVersion: monitoring.coreos.com/v1
@@ -178,7 +200,6 @@ metadata:
  labels:
    # `release: $HELM_RELEASE`: Prometheus can only detect PodMonitor with this label.
    release: prometheus
    ray.io/cluster: raycluster-kuberay # $RAY_CLUSTER_NAME: "kubectl get rayclusters.ray.io"
spec:
  jobLabel: ray-workers
  # Only select Kubernetes Pods in the "default" namespace.
@@ -192,22 +213,21 @@ spec:
  # A list of endpoints allowed as part of this PodMonitor.
  podMetricsEndpoints:
  - port: metrics
    relabelings:
    - sourceLabels: [__meta_kubernetes_pod_label_ray_io_cluster]
      targetLabel: ray_io_cluster
```

* `release: $HELM_RELEASE`: Prometheus can only detect PodMonitor with this label. See [here](#prometheus-can-only-detect-this-label) for more details.

* The **PodMonitor** uses `namespaceSelector` and `selector` to select Kubernetes Pods.
```sh
kubectl get pod -n default -l ray.io/node-type=worker
# NAME                                          READY   STATUS    RESTARTS   AGE
# raycluster-kuberay-worker-workergroup-5stpm   1/1     Running   0          3h16m
```

* `ray.io/cluster: $RAY_CLUSTER_NAME`: We also define `metadata.labels` by manually adding `ray.io/cluster: <ray-cluster-name>` and then instructing the PodMonitors resource to add that label in the scraped metrics via `spec.podTargetLabels[0].ray.io/cluster`.

## Step 7: Collect custom metrics with Recording Rules

[Recording Rules](https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/) allow us to precompute frequently needed or computationally expensive [PromQL](https://prometheus.io/docs/prometheus/latest/querying/basics/) expressions and save their result as custom metrics. Note this is different from [Custom Application-level Metrics](application-level-metrics) which aim for the visibility of ray applications.
[Recording Rules](https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/) allow Prometheus to precompute frequently needed or computationally expensive [PromQL](https://prometheus.io/docs/prometheus/latest/querying/basics/) expressions and save their results as custom metrics. Note that this behavior is different from [Custom application-level metrics](application-level-metrics), which provide visibility into Ray applications.

```yaml
apiVersion: monitoring.coreos.com/v1
```
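
As an illustrative sketch only (the group name, rule name, and PromQL expression below are assumptions, not the tutorial's actual rule), a PrometheusRule that precomputes the average CPU utilization per RayCluster might look like this:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: ray-cluster-recording-rules  # assumption: any name works
  namespace: prometheus-system
  labels:
    # `release: $HELM_RELEASE`: Prometheus can only detect PrometheusRule with this label.
    release: prometheus
spec:
  groups:
  - name: ray.rules  # assumed group name
    rules:
    # Precompute the mean CPU utilization across the nodes of each RayCluster.
    - record: ray_cluster:node_cpu_utilization:avg
      expr: avg(ray_node_cpu_utilization) by (ray_io_cluster)
```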
