From 919a1fac1283a437739ab6d60f785f2bfda222fd Mon Sep 17 00:00:00 2001
From: Funtowicz Morgan
Date: Tue, 12 Dec 2023 22:23:30 +0100
Subject: [PATCH] Add initial set of documentation to build the `optimum-nvidia` container (#39)

* Add initial set of documentation

* Update docs/source/installation.mdx
Co-authored-by: Laikh Tewari

* Update docs/source/installation.mdx
Co-authored-by: Laikh Tewari

* Update docs/source/installation.mdx
Co-authored-by: Laikh Tewari

* Update docs/source/installation.mdx
Co-authored-by: Laikh Tewari

* Update docs/source/installation.mdx
Co-authored-by: Laikh Tewari

* Update docs/source/installation.mdx
Co-authored-by: Laikh Tewari

* Update docs/source/installation.mdx
Co-authored-by: Laikh Tewari

* Update docs/source/installation.mdx
Co-authored-by: Laikh Tewari

---------

Co-authored-by: Laikh Tewari
---
 docs/source/index.md         | 25 ++++++++++++++++
 docs/source/installation.mdx | 58 ++++++++++++++++++++++++++++++++++++
 2 files changed, 83 insertions(+)
 create mode 100644 docs/source/index.md
 create mode 100644 docs/source/installation.mdx

diff --git a/docs/source/index.md b/docs/source/index.md
new file mode 100644
index 00000000..e75c47bf
--- /dev/null
+++ b/docs/source/index.md
@@ -0,0 +1,25 @@

# 🤗 Optimum Nvidia

🤗 Optimum Nvidia provides a seamless integration of [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM) into the Hugging Face ecosystem.

While TensorRT-LLM provides the foundational blocks to achieve the best performance on NVIDIA GPUs, `optimum-nvidia` lets you leverage the 🤗 Hub to retrieve and load model weights directly into TensorRT-LLM, while keeping an API similar or identical to `transformers` and the other 🤗 libraries.

For NVIDIA Tensor Core GPUs with `float8` hardware acceleration, `optimum-nvidia` runs all the preprocessing steps required to target this data type and deploys the technical blocks needed to keep the developer experience fast and smooth on these architectures.
\ No newline at end of file

diff --git a/docs/source/installation.mdx b/docs/source/installation.mdx
new file mode 100644
index 00000000..c55ca249
--- /dev/null
+++ b/docs/source/installation.mdx
@@ -0,0 +1,58 @@

# Installation

There is currently no `pip` support for `optimum-nvidia`, as some of its transitive dependencies are missing from PyPI.
To get started with Optimum-NVIDIA, you can:
- Pull the prebuilt Docker container `huggingface/optimum-nvidia`
- Build the Docker container locally

## Pulling prebuilt Docker Container

Hugging Face pushes and hosts versions of the container matching each release of `optimum-nvidia` on Docker Hub.
This container comes with all the required dependencies to run `float32`, `float16`, `int8` (compressed or quantized), and `float8` models.

To get started, simply pull the container with the following command:
```bash
docker run -it --gpus all --ipc host huggingface/optimum-nvidia
```
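Once the container is running, you can quickly check that everything is wired up. The snippet below is a minimal sketch assuming the `transformers`-style API described on the index page; the model ID, the `use_fp8` flag, and the generation arguments are illustrative assumptions rather than part of these docs:

```python
# Minimal smoke test, assuming optimum-nvidia mirrors the transformers API.
# The model ID, use_fp8 flag, and generation arguments are illustrative.
from transformers import AutoTokenizer
from optimum.nvidia import AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    use_fp8=True,  # requires float8-capable hardware such as H100
)

inputs = tokenizer("What is TensorRT-LLM?", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```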
## Building Docker Container locally

If you want to build your own image and/or customize it, you can do so with the three-step process described below:

1. Clone the `optimum-nvidia` repository:
```bash
git clone --recursive --depth=1 https://github.com/huggingface/optimum-nvidia && cd optimum-nvidia
```

2. Build the `tensorrt_llm:latest` image from the NVIDIA TensorRT-LLM repository.
If you cloned `optimum-nvidia` in the step above, you can use the following command (assuming you're at the root of the `optimum-nvidia` repository):

```bash
cd third-party/tensorrt-llm && make -C docker release_build CUDA_ARCHS=""
```

Here `CUDA_ARCHS` is a comma-separated list of the CUDA architectures you'd like to support.
For instance, here are a few example `CUDA_ARCHS` values:
- `90-real`: H100/H200
- `89-real`: L4/L40/L40S/RTX Ada/RTX 4090
- `86-real`: A10/A40/RTX Ax000
- `80-real`: A100/A30
- `75-real`: T4/RTX Quadro
- `70-real`: V100

To target only A100 GPUs, for example, you would pass `CUDA_ARCHS="80-real"`.

3. Finally, build the `huggingface/optimum-nvidia` Docker image on top of the `tensorrt_llm` layer:

```bash
cd ../.. && docker build -t huggingface/optimum-nvidia -f docker/Dockerfile .
```
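The freshly built image can then be started the same way as the prebuilt one. The cache mount below is an optional convenience for persisting downloaded weights across container runs; the host path is an assumption and not part of the build steps:

```bash
# Run the locally built image. Mounting the Hugging Face cache is optional;
# the host path shown here is an assumption.
docker run -it --gpus all --ipc host \
  -v $HOME/.cache/huggingface:/root/.cache/huggingface \
  huggingface/optimum-nvidia
```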