MNMG cluster only
skirui-source committed Nov 1, 2023
1 parent c8ebb45 commit 2369ee8
Showing 1 changed file with 5 additions and 5 deletions.
10 changes: 5 additions & 5 deletions source/platforms/databricks-dask.md
@@ -2,7 +2,7 @@

## Dask RAPIDS in a Databricks MNMG Cluster

- You can launch a Dask RAPIDS cluster on a single-node or multi-node GPU Databricks cluster.
+ You can launch a Dask RAPIDS cluster on a multi-node GPU Databricks cluster.

```{warning}
It is also possible to use [Spark RAPIDS](https://docs.nvidia.com/spark-rapids/user-guide/latest/getting-started/databricks.html) with Dask on the same Databricks cluster. To do this, the user
@@ -13,7 +13,7 @@
must provide an init script that downloads the `rapids-4-spark-xxxx.jar` plugin
```

Before creating the cluster, we will need to create an [initialization script](https://docs.databricks.com/en/init-scripts/index.html) to install Dask and the RAPIDS Accelerator for Apache Spark.

- Databricks recommends storing all cluster-scoped init scripts using workspace files. Each user has a Home directory configured under the `/Users` directory in the workspace. Navigate to your home directory in the UI, select **Create** > **File** from the menu, and create an `init.sh` scripts with contents:
+ Databricks recommends storing all cluster-scoped init scripts using workspace files. Each user has a Home directory configured under the `/Users` directory in the workspace. Navigate to your home directory in the UI, select **Create** > **File** from the menu, and create an `init.sh` script with the following contents:

```bash
#!/bin/bash
```

@@ -45,7 +45,7 @@
NOTE: The above script will be packaged as a library to be imported instead.

### Launch a Databricks cluster

- Navigate to the **All Purpose Compute** tab of the **Compute** section in Databricks and select **Create Compute**. Name your cluster and choose "Multi node" or "Single node".
+ Navigate to the **All Purpose Compute** tab of the **Compute** section in Databricks and select **Create Compute**. Name your cluster and choose "Multi node".

![Screenshot of the Databricks compute page](../images/databricks-create-compute.png)

@@ -55,7 +55,7 @@
Optionally, enable autoscaling for worker nodes based on load.

![Screenshot of selecting a g4dn.xlarge node type](../images/databricks-ML-runtime.png)

- Expand the **Advanced Options** section, open the **Init Scripts** tab, and add the file path to the init script starting with `/Users/`.
+ Expand the **Advanced Options** section, open the **Init Scripts** tab, and add the file path to the init script starting with `/Users`.

You can also configure cluster log delivery, which will write the init script logs to DBFS under `dbfs:/cluster-logs/<cluster-id>/init_scripts/`. Refer to the [docs](https://docs.databricks.com/en/init-scripts/logs.html) for more information.
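
For example, assuming log delivery is enabled for the cluster, you can list the delivered init script logs from a notebook attached to it (a sketch using the standard Databricks notebook globals `spark` and `dbutils`; the `clusterUsageTags` conf key is assumed to be set, as it normally is on Databricks clusters):

```python
# Run in a Databricks notebook: list the init script logs delivered for this cluster
cluster_id = spark.conf.get("spark.databricks.clusterUsageTags.clusterId")
display(dbutils.fs.ls(f"dbfs:/cluster-logs/{cluster_id}/init_scripts/"))
```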

@@ -72,4 +72,4 @@

```python
import os

from dask.distributed import Client

# The Dask scheduler runs on the Spark driver node, listening on port 8786
client = Client(f'{os.environ["SPARK_LOCAL_IP"]}:8786')
```
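
Once connected, you can check that the scheduler sees your GPU workers and run a small computation as a smoke test. A minimal sketch, assuming the init script installed Dask on every node (the array dimensions are illustrative):

```python
import dask.array as da

# Show the workers currently registered with the scheduler
print(client.scheduler_info()["workers"].keys())

# Run a trivial computation distributed across the cluster
x = da.random.random((10_000, 10_000), chunks=(1_000, 1_000))
print(x.mean().compute())
```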

- You can also run this workflow example (link) to get started using Dask with Spark RAPIDS. Refer to the blog (link) for more information.
+ You can also run this HPO workflow example (link) to get started using Dask with Spark RAPIDS. Refer to the blog (link) for more information.
