From 919a1fac1283a437739ab6d60f785f2bfda222fd Mon Sep 17 00:00:00 2001
From: Funtowicz Morgan
Date: Tue, 12 Dec 2023 22:23:30 +0100
Subject: [PATCH] Add initial set of documentation to build the `optimum-nvidia` container (#39)

* Add initial set of documentation

* Update docs/source/installation.mdx
Co-authored-by: Laikh Tewari

* Update docs/source/installation.mdx
Co-authored-by: Laikh Tewari

* Update docs/source/installation.mdx
Co-authored-by: Laikh Tewari

* Update docs/source/installation.mdx
Co-authored-by: Laikh Tewari

* Update docs/source/installation.mdx
Co-authored-by: Laikh Tewari

* Update docs/source/installation.mdx
Co-authored-by: Laikh Tewari

* Update docs/source/installation.mdx
Co-authored-by: Laikh Tewari

* Update docs/source/installation.mdx
Co-authored-by: Laikh Tewari

---------

Co-authored-by: Laikh Tewari
---
 docs/source/index.md         | 25 ++++++++++++++++
 docs/source/installation.mdx | 58 ++++++++++++++++++++++++++++++++++++
 2 files changed, 83 insertions(+)
 create mode 100644 docs/source/index.md
 create mode 100644 docs/source/installation.mdx

diff --git a/docs/source/index.md b/docs/source/index.md
new file mode 100644
index 00000000..e75c47bf
--- /dev/null
+++ b/docs/source/index.md
@@ -0,0 +1,25 @@

# 🤗 Optimum Nvidia

🤗 Optimum Nvidia provides a seamless integration of [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM) into the Hugging Face ecosystem.

While TensorRT-LLM provides the foundational blocks to achieve the best performance on NVIDIA GPUs, `optimum-nvidia` lets you leverage the 🤗 Hub to retrieve and load model weights directly into TensorRT-LLM, while keeping an API similar or identical to `transformers` and the other 🤗 libraries.

For NVIDIA Tensor Core GPUs with `float8` hardware acceleration, `optimum-nvidia` runs all the preprocessing steps required to target this data type and deploys the technical blocks needed to keep the developer experience fast and smooth on these architectures.
\ No newline at end of file

diff --git a/docs/source/installation.mdx b/docs/source/installation.mdx
new file mode 100644
index 00000000..c55ca249
--- /dev/null
+++ b/docs/source/installation.mdx
@@ -0,0 +1,58 @@

# Installation

There is currently no `pip` support for `optimum-nvidia`, as some of its transitive dependencies are missing from PyPI.
To get started with Optimum-NVIDIA, you can:
- Pull the prebuilt Docker container `huggingface/optimum-nvidia`
- Build the Docker container locally

## Pulling prebuilt Docker Container

Hugging Face pushes and hosts versions of the container matching each release of `optimum-nvidia` on Docker Hub.
This container comes with all the required dependencies to run `float32`, `float16`, `int8` (compressed or quantized), and `float8` models.

To get started, simply pull the container with the following command:
```bash
docker run -it --gpus all --ipc host huggingface/optimum-nvidia
```
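Once the container is running, you can quickly check that everything is wired up. The snippet below is a minimal sketch assuming the `transformers`-style API described on the index page; the model ID, the `use_fp8` flag, and the generation arguments are illustrative assumptions rather than part of these docs:

```python
# Minimal smoke test, assuming optimum-nvidia mirrors the transformers API.
# The model ID, use_fp8 flag, and generation arguments are illustrative.
from transformers import AutoTokenizer
from optimum.nvidia import AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    use_fp8=True,  # requires float8-capable hardware such as H100
)

inputs = tokenizer("What is TensorRT-LLM?", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```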
## Building Docker Container locally

If you want to build your own image and/or customize it, you can do so with the three-step process described below:

1. Clone the `optimum-nvidia` repository:
```bash
git clone --recursive --depth=1 https://github.com/huggingface/optimum-nvidia && cd optimum-nvidia
```

2. Build the `tensorrt_llm:latest` image from the NVIDIA TensorRT-LLM repository.
If you cloned `optimum-nvidia` in the step above, you can use the following command (assuming you're at the root of the `optimum-nvidia` repository):

```bash
cd third-party/tensorrt-llm && make -C docker release_build CUDA_ARCHS=""
```

Here `CUDA_ARCHS` is a comma-separated list of the CUDA architectures you'd like to support.
For instance, here are a few example `CUDA_ARCHS` values:
- `90-real`: H100/H200
- `89-real`: L4/L40/L40S/RTX Ada/RTX 4090
- `86-real`: A10/A40/RTX Ax000
- `80-real`: A100/A30
- `75-real`: T4/RTX Quadro
- `70-real`: V100

To target only A100 GPUs, for example, you would pass `CUDA_ARCHS="80-real"`.

3. Finally, build the `huggingface/optimum-nvidia` Docker image on top of the `tensorrt_llm` layer:

```bash
cd ../.. && docker build -t huggingface/optimum-nvidia -f docker/Dockerfile .
```
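The freshly built image can then be started the same way as the prebuilt one. The cache mount below is an optional convenience for persisting downloaded weights across container runs; the host path is an assumption and not part of the build steps:

```bash
# Run the locally built image. Mounting the Hugging Face cache is optional;
# the host path shown here is an assumption.
docker run -it --gpus all --ipc host \
  -v $HOME/.cache/huggingface:/root/.cache/huggingface \
  huggingface/optimum-nvidia
```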