simplify and update XGBoost benchmarking example
jameslamb committed Jan 24, 2025
1 parent fb2bf07 commit 63516c1
Showing 3 changed files with 50 additions and 51 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -3,11 +3,14 @@
*.csv
*.csv.gz
*.csv.zip
data/
*.env
*.parquet
*.pem
*.pub
.ruff_cache
*.tar.gz
xgboost_hpo_logs.txt
*.zip
*.7z

3 changes: 0 additions & 3 deletions source/examples/xgboost-rf-gpu-cpu-benchmark/Dockerfile

This file was deleted.

95 changes: 47 additions & 48 deletions source/examples/xgboost-rf-gpu-cpu-benchmark/notebook.ipynb
@@ -206,44 +206,23 @@
"id": "a89edfea-ca14-4d26-94c6-0ef8eaf02d77",
"metadata": {},
"source": [
"<span style=\"display: block; color:#8735fb; font-size:22pt\"> **Build RAPIDS Container** </span>\n",
"<span style=\"display: block; color:#8735fb; font-size:22pt\"> **Run HPO** </span>\n",
"\n",
"Now that we have a fundamental understanding of our workflow process, we can test the code. First make sure you have the correct CUDAtoolkit version with```nvidia smi``` command.\n",
"Let's try this out!\n",
"\n",
"The example file `hpo.py` included here implements the patterns described above.\n",
"\n",
"Then starting with latest rapids docker image, we only need to install `optuna` as the container comes with most necessary packages."
"First make sure you have the correct CUDAtoolkit version by running `nvidia-smi`. See the RAPIDS installation docs ([link](https://docs.rapids.ai/install/#system-req)) for details on the supported range of GPUs and drivers."
]
},
{
"cell_type": "markdown",
"id": "0edd2279-7d14-4300-8617-da6abd0fea81",
"cell_type": "code",
"execution_count": null,
"id": "5e4152ac-2712-4568-a807-8ef1267d8a2d",
"metadata": {},
"outputs": [],
"source": [
"```dockerfile\n",
"FROM rapidsai/rapidsai:23.06-cuda11.8-runtime-ubuntu22.04-py3.10\n",
"\n",
"RUN mamba install -y -n rapids optuna\n",
"\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "a825e0de-a1a1-4c2b-82bb-b33dfc494fd1",
"metadata": {},
"source": [
"The build step will be dominated by the download of the RAPIDS image (base layer). If it's already been downloaded the build will take less than 1 minute."
]
},
{
"cell_type": "markdown",
"id": "b8fca11b-bca6-4e0a-bb91-1694ecaf12e1",
"metadata": {},
"source": [
"```\n",
"docker build -t rapids-tco-benchmark:v23.06 .\n",
"\n",
"```"
"!nvidia-smi"
]
},
{
@@ -269,45 +248,65 @@
"id": "77df8ce3-39b8-41d9-a538-ae404be25b45",
"metadata": {},
"source": [
"When starting the container, be sure to expose the `--gpus all` flag to make all available GPUs on the host machine accessible within the Docker environment. Use `-v` (or`--volume`) option to mount working dir from the host machine into the docker container. This enables data or directories on the host machine to be accessible within the container, and any changes made to the mounted files or directories will be reflected in both the host and container environments.\n",
"Run the following command to launch hyper-parameter optimization in a Docker container.\n",
"\n",
"If you don't yet have that image locally, the first time this runs it might take a few minutes to pull it.\n",
"After that, startup should be very fast.\n",
"\n",
"Here's what the arguments in that command below are doing:\n",
"\n",
"Optional to expose jupyter via ports `8786-8888`."
"* `--gpus all` = make all GPUs on the system available to processes in the container\n",
"* `--env EXTRA_CONDA_PACKAGES` = install `optuna` and `optuna-integration` conda packages\n",
" - *the image already comes with all of the RAPIDS libraries and their dependencies installed*\n",
"* `-p 8787:8787` = forward port 8787 on the host to port 8787 on the container\n",
"  - *navigate to `{public IP of box}:8787` to see the Dask dashboard!*\n",
"* `-v / -w` = mount the current directory from the host machine into the container\n",
" - *this allows processes in the container to read the data you downloaded to the `./data` directory earlier*\n",
" - *it also means that any changes made to these files from inside the container will be reflected back on the host*\n",
"\n",
"Piping to a file called `xgboost_hpo_logs.txt` is helpful, as it preserves all the logs for later inspection."
]
},
{
"cell_type": "markdown",
"id": "49617e76-c35a-4caa-a07a-6bfaefd22c4b",
"cell_type": "code",
"execution_count": null,
"id": "4fcf8b8f-63a9-4981-b714-9fcc77a7b6c8",
"metadata": {},
"outputs": [],
"source": [
"```\n",
"docker run -it --gpus all -p 8888:8888 -p 8787:8787 -p 8786:8786 -v \\\n",
" /home/ec2-user/tco_hpo_gpu_cpu_perf_benchmark:/rapids/notebooks/host \\\n",
" rapids-tco-benchmark:v23.06 \n",
"```"
"!docker run \\\n",
" --gpus all \\\n",
" --env EXTRA_CONDA_PACKAGES=\"optuna optuna-integration\" \\\n",
" -p 8787:8787 \\\n",
" -v $(pwd):/home/rapids/xgboost-hpo-example \\\n",
" -w /home/rapids/xgboost-hpo-example \\\n",
" -it {{ rapids_container }} \\\n",
" /bin/bash -c \"python ./hpo.py --model-type 'XGBoost' --target 'gpu'\" \\\n",
"> ./xgboost_hpo_logs.txt 2>&1"
]
},
{
"cell_type": "markdown",
"id": "55f881ee-ce7f-4810-a6b7-fa8aa72d91f3",
"metadata": {},
"source": [
"<span style=\"display: block; color:#8735fb; font-size:22pt\"> **Run HPO** </span>\n",
"<span style=\"display: block; color:#8735fb; font-size:22pt\"> **Try Some Modifications** </span>\n",
"\n",
"Navigate to the host directory inside the container and run the python training script with the following command : \n",
"Now that you've run this example, try some modifications!\n",
"\n",
"```\n",
"python ./hpo.py --model-type \"XGBoost\" --mode \"gpu\" > xgboost_gpu.txt 2>&1\n",
"```\n",
"For example:\n",
"\n",
"The code above will run XGBoost HPO jobs on the gpu and output the benchmark results to a text file. You can run the same for RandomForest by changing `--model type` and `--mode` args to `RandomForest` and `cpu` respectively."
"* use `--model-type \"RandomForest\"` to see how a random forest model compares to XGBoost\n",
"* use `--target \"cpu\"` to estimate the speedup from GPU-accelerated training\n",
"* modify the pipeline in `hpo.py` with other customizations"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "rapids-23.06",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "rapids-23.06"
"name": "python3"
},
"language_info": {
"codemirror_mode": {
@@ -319,7 +318,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
"version": "3.12.8"
}
},
"nbformat": 4,
