Apache Livy Initialization Action

This initialization action installs Apache Livy on the master node of a Google Cloud Dataproc cluster.

Using this initialization action

⚠️ NOTICE: See the best practices for using initialization actions in production.

You can use this initialization action to create a new Dataproc cluster with Livy installed:

  1. Use the gcloud command to create a new cluster with this initialization action.

    REGION=<region>
    CLUSTER_NAME=<cluster_name>
    gcloud dataproc clusters create ${CLUSTER_NAME} \
        --region ${REGION} \
        --initialization-actions gs://goog-dataproc-initialization-actions-${REGION}/livy/livy.sh
  2. To change the installed Livy version, set the livy-version metadata value:

    REGION=<region>
    CLUSTER_NAME=<cluster_name>
    gcloud dataproc clusters create ${CLUSTER_NAME} \
        --region ${REGION} \
        --initialization-actions gs://goog-dataproc-initialization-actions-${REGION}/livy/livy.sh \
        --metadata livy-version=0.7.0
  3. To change the version of Scala that Livy is built against, set the scala-version metadata value:

    REGION=<region>
    CLUSTER_NAME=<cluster_name>
    gcloud dataproc clusters create ${CLUSTER_NAME} \
        --region ${REGION} \
        --initialization-actions gs://goog-dataproc-initialization-actions-${REGION}/livy/livy.sh \
        --metadata scala-version=2.12
  4. To change the Livy session timeout, set the livy-timeout-session metadata value:

    REGION=<region>
    CLUSTER_NAME=<cluster_name>
    gcloud dataproc clusters create ${CLUSTER_NAME} \
        --region ${REGION} \
        --initialization-actions gs://goog-dataproc-initialization-actions-${REGION}/livy/livy.sh \
        --metadata livy-timeout-session='3h'
  5. Once the cluster has been created, Livy listens on port 8998 on the cluster's master node.

  6. To learn how to use Livy, read the documentation for the REST API.
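
As a quick check that Livy is reachable on port 8998, the following is a minimal sketch of a few shell helpers for Livy's REST API. It assumes you run it on the master node (after connecting over SSH) and that `curl` is available there. `LIVY_URL` and the `livy_*` function names are illustrative and are not provided by the initialization action. `GET /sessions` and `POST /sessions` are endpoints of the Livy REST API.

```shell
#!/usr/bin/env bash
# Sketch only: LIVY_URL and the function names below are hypothetical helpers.
# Assumes execution on the master node, where Livy listens on port 8998.
LIVY_URL="${LIVY_URL:-http://localhost:8998}"

# Build the JSON body for POST /sessions; "kind" selects the interpreter
# (for example spark, pyspark, or sparkr). Defaults to pyspark.
livy_session_payload() {
  printf '{"kind": "%s"}' "${1:-pyspark}"
}

# Build the URL of a Livy endpoint, e.g. `livy_endpoint sessions`.
livy_endpoint() {
  printf '%s/%s' "${LIVY_URL}" "$1"
}

# List existing sessions (GET /sessions).
livy_list_sessions() {
  curl --silent --fail "$(livy_endpoint sessions)"
}

# Start a new interactive session (POST /sessions).
livy_create_session() {
  curl --silent --fail -X POST \
    -H 'Content-Type: application/json' \
    -d "$(livy_session_payload "${1:-}")" \
    "$(livy_endpoint sessions)"
}
```

After sourcing this file on the master node, `livy_list_sessions` should return a JSON document describing the current sessions, and `livy_create_session pyspark` requests a new PySpark session. If your Livy configuration enables CSRF protection, requests may need an additional header; consult the REST API documentation for details.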