Commit

Add initial set of documentation to build the optimum-nvidia container (#39)

* Add initial set of documentation

* Update docs/source/installation.mdx

Co-authored-by: Laikh Tewari <[email protected]>

---------

Co-authored-by: Laikh Tewari <[email protected]>
mfuntowicz and laikhtewari authored Dec 12, 2023
1 parent 9606a1d commit 919a1fa
Showing 2 changed files with 83 additions and 0 deletions.
25 changes: 25 additions & 0 deletions docs/source/index.md
@@ -0,0 +1,25 @@
<!---
Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# 🤗 Optimum Nvidia

🤗 Optimum Nvidia provides seamless integration of [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM) with the Hugging Face ecosystem.

While TensorRT-LLM provides the foundational blocks for achieving the best performance on NVIDIA GPUs, `optimum-nvidia` lets you
use the 🤗 Hub to retrieve and load weights directly into TensorRT-LLM while keeping an API similar or identical to `transformers` and other 🤗 libraries.

On NVIDIA Tensor Core GPUs with `float8` hardware acceleration, `optimum-nvidia` runs all the preprocessing steps required to target this data type and
deploys the technical blocks needed to keep the developer experience fast and smooth on these architectures.
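As a sketch of what that drop-in experience looks like, the snippet below loads a model through `optimum.nvidia`'s `AutoModelForCausalLM` from inside the container; the model id and the `use_fp8` flag are illustrative assumptions, not taken from this page:

```bash
# Inside the container: a minimal sketch of the transformers-like API.
# The model id is illustrative; gated models require a Hugging Face token.
python - <<'EOF'
from optimum.nvidia import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",  # any supported causal LM from the Hub
    use_fp8=True,                     # only on float8-capable GPUs
)
EOF
```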
58 changes: 58 additions & 0 deletions docs/source/installation.mdx
@@ -0,0 +1,58 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Installation

There is currently no `pip` support for `optimum-nvidia` because some of its transitive dependencies are not available on PyPI.
To get started with Optimum-NVIDIA, you can either:
- Pull the pre-built Docker container `huggingface/optimum-nvidia`
- Build the Docker container locally


## Pulling the Pre-built Docker Container

Hugging Face builds and hosts versions of the container matching each release of `optimum-nvidia` on Docker Hub.
The container ships with all the dependencies required to run `float32`, `float16`, `int8` (compressed or quantized), and `float8` models.

To get started, pull and run the container with the following command:
```bash
docker run -it --gpus all --ipc host huggingface/optimum-nvidia
```
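If you plan to download large checkpoints, a variation like the one below keeps them on the host between runs; the mounted paths and the token environment variable are assumptions on top of the base command, not part of the documented invocation:

```bash
# Persist the Hugging Face cache on the host and forward a token for gated models.
docker run -it --gpus all --ipc host \
  -e HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN \
  -v $HOME/.cache/huggingface:/root/.cache/huggingface \
  huggingface/optimum-nvidia
```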

## Building the Docker Container Locally

If you want to build your own image and/or customize it, you can do so using the three-step process described below:

1. Clone the `optimum-nvidia` repository:
```bash
git clone --recursive --depth=1 https://github.com/huggingface/optimum-nvidia && cd optimum-nvidia
```
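If you happened to clone without `--recursive`, the TensorRT-LLM submodule used in the next step can still be fetched afterwards:

```bash
# Fetch the third-party submodules (e.g. TensorRT-LLM) after a plain clone.
git submodule update --init --recursive
```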
2. Build the `tensorrt_llm:latest` image from the NVIDIA TensorRT-LLM repository. If you cloned `optimum-nvidia` in
the step above, you can use the following command (assuming you're at the root of the `optimum-nvidia` repository):

```bash
cd third-party/tensorrt-llm && make -C docker release_build CUDA_ARCHS="<TARGET_SMs>"
```

Where `CUDA_ARCHS` is a comma-separated list of the CUDA architectures you'd like to support.
For instance, here are a few example `TARGET_SMs` values (if you're unsure which one applies to your GPU, see the sketch after this list):
- `90-real` : H100/H200
- `89-real` : L4/L40/L40s/RTX Ada/RTX 4090
- `86-real` : A10/A40/RTX Ax000
- `80-real` : A100/A30
- `75-real` : T4/RTX Quadro
- `70-real` : V100
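On reasonably recent NVIDIA drivers, which is an assumption here, `nvidia-smi` can report the compute capability directly; `8.6`, for example, maps to `86-real`:

```bash
# Print the compute capability of each visible GPU (e.g. "8.6" -> 86-real).
nvidia-smi --query-gpu=compute_cap --format=csv,noheader
```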

3. Finally, build the `huggingface/optimum-nvidia` Docker image on top of the `tensorrt_llm` layer:

```bash
cd ../.. && docker build -t huggingface/optimum-nvidia -f docker/Dockerfile .
```
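Once the build finishes, a quick smoke test such as the one below confirms the image starts and the package imports; it assumes the image lets you override the default command, which this page does not specify:

```bash
# Run the freshly built image and check that optimum-nvidia imports cleanly.
docker run --rm --gpus all --ipc host huggingface/optimum-nvidia \
  python -c "import optimum.nvidia"
```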
