With Dataproc image version 2.2, we recommend installing the Google Cloud Ops Agent to collect system metrics.
This initialization action installs the Ops Agent on a Google Cloud Dataproc cluster and provides metrics similar to those of the --metric-sources=monitoring-agent-defaults setting, which was supported through Dataproc 2.1.
This page highlights differences in metric collection between the Ops Agent and the legacy monitoring agent.
We provide two variants of this initialization action:
opsagent.sh installs the Ops Agent. By default, it collects syslogs and system (node) metrics.
opsagent_nosyslog.sh installs the Ops Agent and also specifies a user configuration that skips syslog collection from your cluster nodes. Without this user configuration, the Ops Agent collects syslogs in addition to system (node) metrics. You can further customize this configuration to collect logs and metrics from other third-party applications.
To match the behavior of Dataproc image versions up to 2.1 with --metric-sources=monitoring-agent-defaults, which did not ingest syslogs from Dataproc cluster nodes, use opsagent_nosyslog.sh.
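The exact user configuration written by opsagent_nosyslog.sh is not reproduced here. As a rough sketch of the idea, the Ops Agent documentation describes disabling syslog ingestion by overriding the default logging pipeline with an empty receiver list. The helper name write_nosyslog_config below is hypothetical and used only for illustration; the script in this repository may differ.

```shell
# Hypothetical helper (not part of this repository): writes an Ops Agent user
# configuration to the given path. The configuration overrides the built-in
# logging pipeline with an empty receiver list, so syslog is not collected.
# System (node) metrics are unaffected because no metrics section is overridden.
write_nosyslog_config() {
  local config_path="$1"
  cat > "${config_path}" <<'EOF'
logging:
  service:
    pipelines:
      default_pipeline:
        receivers: []
EOF
}
```

On a cluster node, the Ops Agent reads its user configuration from /etc/google-cloud-ops-agent/config.yaml, so one would call write_nosyslog_config with that path (as root) and then restart the agent with systemctl restart google-cloud-ops-agent. Additional receivers for third-party applications would be added to the same file.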
# Create a cluster with the Ops Agent and no syslog collection (opsagent_nosyslog.sh).
REGION=<region>
CLUSTER_NAME=<cluster_name>
gcloud dataproc clusters create "${CLUSTER_NAME}" \
    --image-version=2.2 \
    --region="${REGION}" \
    --initialization-actions="gs://goog-dataproc-initialization-actions-${REGION}/opsagent/opsagent_nosyslog.sh"
# Create a cluster with the Ops Agent, collecting syslogs and system metrics (opsagent.sh).
REGION=<region>
CLUSTER_NAME=<cluster_name>
gcloud dataproc clusters create "${CLUSTER_NAME}" \
    --image-version=2.2 \
    --region="${REGION}" \
    --initialization-actions="gs://goog-dataproc-initialization-actions-${REGION}/opsagent/opsagent.sh"
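The two commands differ only in the script name at the end of the regional bucket path. If you script cluster creation, a small helper along these lines can build the URI and reject anything other than the two variants described on this page. The function name opsagent_init_action_uri is an assumption made for this sketch; it is not part of gcloud or of this repository.

```shell
# Hypothetical helper: prints the initialization action URI for a region and
# variant, following the regional bucket pattern used in the commands above.
# Returns a non-zero status for an unknown variant or an empty region.
opsagent_init_action_uri() {
  local region="$1"
  local variant="$2"
  if [ -z "${region}" ]; then
    echo "region must not be empty" >&2
    return 1
  fi
  case "${variant}" in
    opsagent.sh|opsagent_nosyslog.sh) ;;
    *)
      echo "unknown variant: ${variant}" >&2
      return 1
      ;;
  esac
  echo "gs://goog-dataproc-initialization-actions-${region}/opsagent/${variant}"
}
```

The result can be passed directly to the --initialization-actions flag, for example --initialization-actions="$(opsagent_init_action_uri "${REGION}" opsagent_nosyslog.sh)".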