Update documentation for 0.15.0 release
kevalmorabia97 committed Jul 26, 2024
1 parent 822d7c6 commit 6de9560
Showing 277 changed files with 11,577 additions and 2,729 deletions.
2 changes: 1 addition & 1 deletion .buildinfo
@@ -1,4 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: 89ada319c94fcb1610b7f80d777e8b12
config: 0ea2334c76c1e774d577e20446a79224
tags: 645f666f9bcd5a90fca523b33c5a78b7
Binary file modified .doctrees/deployment/1_tensorrt_llm_deployment.doctree
Binary file modified .doctrees/environment.pickle
Binary file modified .doctrees/examples/0_all_examples.doctree
Binary file modified .doctrees/getting_started/1_overview.doctree
Binary file modified .doctrees/getting_started/2_installation.doctree
Binary file modified .doctrees/getting_started/3_quantization.doctree
Binary file added .doctrees/getting_started/5_distillation.doctree
Binary file modified .doctrees/getting_started/6_sparsity.doctree
Binary file modified .doctrees/guides/1_quantization.doctree
Binary file added .doctrees/guides/4_distillation.doctree
Binary file modified .doctrees/guides/5_sparsity.doctree
Binary file modified .doctrees/guides/_basic_quantization.doctree
Binary file modified .doctrees/guides/_onnx_quantization.doctree
Binary file modified .doctrees/guides/_pytorch_quantization.doctree
Binary file modified .doctrees/index.doctree
Binary file modified .doctrees/reference/0_versions.doctree
Binary file modified .doctrees/reference/generated/modelopt.deploy.doctree
Binary file modified .doctrees/reference/generated/modelopt.deploy.llm.doctree
Binary file modified .doctrees/reference/generated/modelopt.onnx.op_types.doctree
Binary file modified .doctrees/reference/generated/modelopt.onnx.quantization.doctree
Binary file modified .doctrees/reference/generated/modelopt.onnx.utils.doctree
Binary file modified .doctrees/reference/generated/modelopt.torch.doctree
Binary file modified .doctrees/reference/generated/modelopt.torch.export.doctree
Binary file modified .doctrees/reference/generated/modelopt.torch.opt.hparam.doctree
Binary file modified .doctrees/reference/generated/modelopt.torch.opt.utils.doctree
Binary file modified .doctrees/support/1_contact.doctree
Binary file modified .doctrees/support/2_faqs.doctree
Binary file modified (additional .doctrees/reference/generated/* files; filenames not shown in this view)
1 change: 1 addition & 0 deletions .gitattributes
@@ -0,0 +1 @@
.doctrees/environment.pickle filter=lfs diff=lfs merge=lfs -text
27 changes: 16 additions & 11 deletions _sources/deployment/1_tensorrt_llm_deployment.rst.txt
@@ -90,50 +90,55 @@ If the :meth:`export_tensorrt_llm_checkpoint <modelopt.torch.export.model_config
- Yes
- Yes
- No
* - Falcon RW 1B, 7B
* - MPT 7B, 30B
- Yes
- Yes
- Yes
- Yes
* - MPT 7B, 30B
* - Baichuan 1, 2
- Yes
- Yes
- Yes
- Yes
* - Baichuan 1, 2
* - ChatGLM2, 3 6B
- Yes
- No
- No
- Yes
* - Bloom
- Yes
- Yes
* - Qwen 7B, 14B
- Yes
- Yes
* - Phi-1, 2, 3
- Yes
- Yes
- Yes
* - ChatGLM2, 3 6B
- Yes
* - Nemotron 8
- Yes
- Yes
- No
- Yes
* - Bloom
* - Gemma 2B, 7B
- Yes
- Yes
- No
- Yes
* - Recurrent Gemma
- Yes
* - Phi-1, 2, 3
- Yes
- Yes
- Yes
* - StarCoder 2
- Yes
* - Nemotron 8
- Yes
- Yes
- No
- Yes
* - Gemma 2B, 7B
* - Qwen-1, 1.5
- Yes
- Yes
- Yes
- No
- Yes

Convert to TensorRT-LLM
8 changes: 4 additions & 4 deletions _sources/examples/0_all_examples.rst.txt
@@ -1,5 +1,5 @@
All ModelOpt Examples
=====================
GitHub Examples
===============

Please visit the `TensorRT-Model-Optimizer GitHub repository <https://github.com/NVIDIA/TensorRT-Model-Optimizer>`_
for all ModelOpt examples.
All examples can be accessed from the ModelOpt GitHub repository at
`github.com/NVIDIA/TensorRT-Model-Optimizer <https://github.com/NVIDIA/TensorRT-Model-Optimizer/>`_.
19 changes: 11 additions & 8 deletions _sources/getting_started/1_overview.rst.txt
@@ -7,16 +7,16 @@ Overview
Minimizing inference costs presents a significant challenge as generative AI models continue to grow in complexity and size.
The `NVIDIA TensorRT Model Optimizer <https://github.com/NVIDIA/TensorRT-Model-Optimizer>`_ (referred to as Model Optimizer, or ModelOpt)
is a library comprising state-of-the-art model optimization techniques including quantization and sparsity to compress models.
It accepts a torch or ONNX model as inputs and provides Python APIs for users to easily stack different model optimization
techniques to produce quantized checkpoint. Seamlessly integrated within the NVIDIA AI software ecosystem, the quantized
It accepts a torch or ONNX model as input and provides Python APIs for users to easily stack different model optimization
techniques to produce optimized & quantized checkpoints. Seamlessly integrated within the NVIDIA AI software ecosystem, the quantized
checkpoint generated from Model Optimizer is ready for deployment in downstream inference frameworks like
`TensorRT-LLM <https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/quantization>`_ or `TensorRT <https://github.com/NVIDIA/TensorRT>`_.
Further integrations are planned for `NVIDIA NeMo <https://github.com/NVIDIA/NeMo>`_ and `Megatron-LM <https://github.com/NVIDIA/Megatron-LM>`_
for training-in-the-loop optimization techniques. For enterprise users, 8-bit quantization with Stable Diffusion is also available on
`NVIDIA NIM <https://developer.nvidia.com/blog/nvidia-nim-offers-optimized-inference-microservices-for-deploying-ai-models-at-scale/>`_.

Model Optimizer is available for free for all developers on `NVIDIA PyPI <https://pypi.org/project/nvidia-modelopt/>`_.
Visit `/NVIDIA/TensorRT-Model-Optimizer repository <https://github.com/NVIDIA/TensorRT-Model-Optimizer>`_ for end-to-end
Visit the `TensorRT Model Optimizer GitHub repository <https://github.com/NVIDIA/TensorRT-Model-Optimizer>`_ for end-to-end
example scripts and recipes optimized for NVIDIA GPUs.

Techniques
@@ -34,8 +34,11 @@ for list of formats supported.
Sparsity
^^^^^^^^
Sparsity is a technique to further reduce the memory footprint of deep learning models and accelerate inference.
Model Optimizer provides Python API :meth:`mts.sparsify() <modelopt.torch.sparsity.sparsification.sparsify>` to apply
weight sparsity to a given model. The ``mts.sparsify()`` API supports `NVIDIA 2:4 <https://arxiv.org/pdf/2104.0837>`_
sparsity pattern and various sparsification methods, such as NVIDIA `ASP <https://github.com/NVIDIA/apex/tree/master/apex/contrib/sparsity>`_
and `SparseGPT <https://arxiv.org/abs/2301.00774>`_. It supports both post-training sparsity and sparsity with fine-tuning.
The latter workflow is recommended to minimize accuracy degradation.
Model Optimizer provides the Python API :meth:`mts.sparsify() <modelopt.torch.sparsity.sparsification.sparsify>` to
automatically apply weight sparsity to a given model. The
:meth:`mts.sparsify() <modelopt.torch.sparsity.sparsification.sparsify>` API supports the
`NVIDIA 2:4 <https://arxiv.org/pdf/2104.08378>`_ sparsity pattern and various sparsification methods,
such as `NVIDIA ASP <https://github.com/NVIDIA/apex/tree/master/apex/contrib/sparsity>`_ and
`SparseGPT <https://arxiv.org/abs/2301.00774>`_. It supports both post-training sparsity (PTS) and
sparsity-aware training (SAT). The latter workflow is recommended to minimize accuracy
degradation.
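
For example, a minimal post-training sparsification sketch (the ``sparse_magnitude`` mode name
is an assumption here; data-driven methods like SparseGPT additionally require calibration data,
as shown in the quick-start guide):

.. code-block:: python

    import modelopt.torch.sparsity as mts

    # Data-free post-training sparsity via magnitude-based pruning
    # (mode name assumed; see the sparsity guide for the supported modes)
    model = mts.sparsify(model, mode="sparse_magnitude")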
35 changes: 20 additions & 15 deletions _sources/getting_started/2_installation.rst.txt
@@ -7,17 +7,19 @@ System requirements

Model Optimizer (``nvidia-modelopt``) currently has the following system requirements:

+----------------------+-----------------------------+
| OS | Linux, Windows |
+----------------------+-----------------------------+
| Architecture | x86_64, aarch64, win_amd64 |
+----------------------+-----------------------------+
| Python | >=3.8,<3.12 |
+----------------------+-----------------------------+
| PyTorch | >=1.11 |
+----------------------+-----------------------------+
| CUDA | >=11.8 (Recommended) |
+----------------------+-----------------------------+
+-------------------------+-----------------------------+
| OS | Linux |
+-------------------------+-----------------------------+
| Architecture | x86_64 |
+-------------------------+-----------------------------+
| Python | >=3.8,<3.13 |
+-------------------------+-----------------------------+
| CUDA | >=11.8 (Recommended) |
+-------------------------+-----------------------------+
| PyTorch (Optional) | >=1.11 |
+-------------------------+-----------------------------+
| TensorRT-LLM (Optional) | 0.11 |
+-------------------------+-----------------------------+

Install Model Optimizer
=======================
@@ -34,11 +36,11 @@ license terms of ModelOpt and any dependencies before use.
**Setting up a virtual environment**

We recommend setting up a virtual environment if you don't have one already. Run the following
command to set up and activate a ``conda`` virtual environment named ``modelopt`` with Python 3.11:
command to set up and activate a ``conda`` virtual environment named ``modelopt`` with Python 3.12:

.. code-block:: bash

    conda create -n modelopt python=3.11 pip
    conda create -n modelopt python=3.12 pip

.. code-block:: bash
@@ -89,11 +91,14 @@ license terms of ModelOpt and any dependencies before use.
* - ``transformers`` (Huggingface)
- ``[hf]``

If you want to install only a subset of the dependencies, replace ``[all]`` with the desired
optional dependencies in the ``pip`` install command below.

**Install Model Optimizer** (``nvidia-modelopt``)

.. code-block:: bash

    pip install "nvidia-modelopt[all]" --no-cache-dir --extra-index-url https://pypi.nvidia.com
    pip install "nvidia-modelopt[all]" --extra-index-url https://pypi.nvidia.com
Check installation
==================
@@ -103,7 +108,7 @@ Check installation
When you use ModelOpt's PyTorch quantization APIs for the first time, it will compile the fast quantization kernels
using your installed torch and CUDA if available.
This may take a few minutes but subsequent quantization calls will be much faster.
To invoke the compilation now and check if it is successful, run the following command:
To invoke the compilation and check that it succeeds, or to pre-compile the kernels for Docker builds, run the following command:

.. code-block:: bash
10 changes: 5 additions & 5 deletions _sources/getting_started/3_quantization.rst.txt
@@ -9,8 +9,8 @@ Quantization is an effective technique to reduce the memory footprint of deep le
accelerate the inference speed.

ModelOpt's :meth:`mtq.quantize() <modelopt.torch.quantization.model_quant.quantize>` API enables
users to quantize a model with advanced algorithms like SmoothQuant, AWQ etc. ModelOpt supports both
Post Training Quantization (PTQ) and Quantization Aware Training (QAT).
users to quantize a model with advanced algorithms like SmoothQuant, AWQ, and more. ModelOpt
supports both Post Training Quantization (PTQ) and Quantization Aware Training (QAT).

.. tip::

@@ -21,7 +21,7 @@ PTQ for PyTorch models
-----------------------------

:meth:`mtq.quantize <modelopt.torch.quantization.model_quant.quantize>` requires the model,
the appropriate quantization configuration and a forward loop as inputs. Here is a quick example of
the appropriate quantization configuration, and a forward loop as inputs. Here is a quick example of
quantizing a model with int8 SmoothQuant using
:meth:`mtq.quantize <modelopt.torch.quantization.model_quant.quantize>`:

@@ -55,8 +55,8 @@ Deployment
The quantized model is just like a regular PyTorch model and is ready for evaluation or deployment.

Hugging Face or NeMo LLM models can be exported to TensorRT-LLM using ModelOpt.
Please see :doc:`TensorRT-LLM Deployment <../deployment/1_tensorrt_llm_deployment>` guide for more
details.
Please see the :doc:`TensorRT-LLM Deployment <../deployment/1_tensorrt_llm_deployment>` guide for
more details.

The model can also be exported to ONNX using
`torch.onnx.export <https://pytorch.org/docs/stable/onnx_torchscript.html#torch.onnx.export>`_.
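For instance, a minimal export sketch (the input shape and file name are illustrative):

.. code-block:: python

    import torch

    dummy_input = torch.randn(1, 3, 224, 224)  # example input; match your model's expected shape
    torch.onnx.export(model, dummy_input, "quantized_model.onnx")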
115 changes: 115 additions & 0 deletions _sources/getting_started/5_distillation.rst.txt
@@ -0,0 +1,115 @@

=========================
Quick Start: Distillation
=========================

ModelOpt's :doc:`Distillation <../guides/4_distillation>` is a set of wrappers and utilities
to easily perform Knowledge Distillation between teacher and student models.
Given a pretrained teacher model, Distillation has the potential to train a smaller student model
faster and/or with higher accuracy than the student model could achieve on its own.

This quick-start guide shows the necessary steps to integrate Distillation into your
training pipeline.

Set up your base models
-----------------------

First obtain both a pretrained model to act as the teacher and a (usually smaller) model to serve
as the student.

.. code-block:: python

    from torchvision.models import resnet50, resnet18

    # Define student
    student_model = resnet18()

    # Define callable which returns teacher
    def teacher_factory():
        teacher_model = resnet50()
        teacher_model.load_state_dict(pretrained_weights)  # pretrained_weights assumed to be available
        return teacher_model
Set up the meta model
---------------------

As Knowledge Distillation involves (at least) two models, ModelOpt simplifies the integration
process by wrapping both student and teacher into one meta model.

Please see an example Distillation setup below. This example assumes the outputs
of ``teacher_model`` and ``student_model`` are logits.

.. code-block:: python

    import modelopt.torch.distill as mtd

    distillation_config = {
        "teacher_model": teacher_factory,  # model initializer
        "criterion": mtd.LogitsDistillationLoss(),  # callable receiving student and teacher outputs, in order
        "loss_balancer": mtd.StaticLossBalancer(),  # combines multiple losses; omit if only one distillation loss is used
    }

    distillation_model = mtd.convert(student_model, mode=[("kd_loss", distillation_config)])
The ``teacher_model`` can be either a callable which returns an ``nn.Module`` or a tuple of ``(model_cls, args, kwargs)``.
The ``criterion`` is the distillation loss used between student and teacher tensors.
The ``loss_balancer`` determines how the original and distillation losses are combined (if needed).
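
For instance, a hypothetical tuple-style teacher specification could look like this (the
constructor arguments are illustrative):

.. code-block:: python

    # Equivalent tuple form: (model_cls, args, kwargs)
    distillation_config["teacher_model"] = (resnet50, (), {"num_classes": 1000})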

See :doc:`Distillation <../guides/4_distillation>` for more info.
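
As background, a common logits-distillation criterion (not necessarily the exact form used by
``LogitsDistillationLoss``) is the temperature-softened KL divergence between teacher logits
:math:`z_t` and student logits :math:`z_s`:

.. math::

    \mathcal{L}_{KD} = T^2 \, \mathrm{KL}\!\left(\mathrm{softmax}(z_t / T) \,\|\, \mathrm{softmax}(z_s / T)\right)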


Distill during training
-----------------------

To distill from teacher to student, use the meta model in the usual training loop, calling the
meta model's ``.compute_kd_loss()`` method to combine the distillation loss with the original
user loss.

An example of Distillation training is given below:

.. code-block:: python
    :emphasize-lines: 14

    # Set up the data loaders. As an example:
    train_loader = get_train_loader()

    # Define user loss function. As an example:
    loss_fn = get_user_loss_fn()

    for input, labels in train_loader:
        distillation_model.zero_grad()
        # Forward through the wrapped models
        out = distillation_model(input)
        # Same loss as originally present
        loss = loss_fn(out, labels)
        # Combine distillation and user losses
        loss_total = distillation_model.compute_kd_loss(student_loss=loss)
        loss_total.backward()
.. note::

    `DataParallel <https://pytorch.org/docs/stable/generated/torch.nn.DataParallel.html>`_ may
    break ModelOpt's Distillation feature.
    Note that `HuggingFace Trainer <https://huggingface.co/docs/transformers/en/main_classes/trainer>`_
    uses DataParallel by default.


Export trained model
--------------------

The model can easily be reverted to its original class for further use (i.e., deployment)
without any ModelOpt modifications attached.

.. code-block:: python

    model = mtd.export(distillation_model)
--------------------------------

**Next steps**
* Learn more about :doc:`Distillation <../guides/4_distillation>`.
* See ModelOpt's :doc:`API documentation <../reference/1_modelopt_api>` for detailed
functionality and usage information.
21 changes: 15 additions & 6 deletions _sources/getting_started/6_sparsity.rst.txt
@@ -6,13 +6,13 @@ Sparsity
--------

ModelOpt's :doc:`sparsity<../guides/5_sparsity>` feature is an effective technique to reduce the
memory footprint of deep learning models and accelerate the inference speed. ModelOpt provides an
memory footprint of deep learning models and accelerate the inference speed. ModelOpt provides the
easy-to-use API :meth:`mts.sparsify() <modelopt.torch.sparsity.sparsification.sparsify>` to apply
weight sparsity to a given model.
:meth:`mts.sparsify() <modelopt.torch.sparsity.sparsification.sparsify>` supports
`NVIDIA 2:4 Sparsity <https://arxiv.org/abs/2104.08378>`_ sparsity pattern and various sparsification
methods, such as (`NVIDIA ASP <https://github.com/NVIDIA/apex/tree/master/apex/contrib/sparsity>`_)
and (`SparseGPT <https://arxiv.org/abs/2301.00774>`_).
methods, such as `NVIDIA ASP <https://github.com/NVIDIA/apex/tree/master/apex/contrib/sparsity>`_
and `SparseGPT <https://arxiv.org/abs/2301.00774>`_.

This guide provides a quick start to apply weight sparsity to a PyTorch model using ModelOpt.

@@ -38,7 +38,7 @@ Here is a quick example of sparsifying a model to 2:4 sparsity pattern with Spar
sparsity_config = {"data_loader": data_loader, "collect_func": lambda x: x}
# Sparsify the model and perform calibration (PTS)
model = mts.sparsity(model, mode="sparsegpt", config=sparsity_config)
model = mts.sparsify(model, mode="sparsegpt", config=sparsity_config)
.. note::
`data_loader` is only required in case of data-driven sparsity, e.g., SparseGPT for calibration.
@@ -48,10 +48,19 @@ Here is a quick example of sparsifying a model to 2:4 sparsity pattern with Spar
`data_loader` and `collect_func` can be substituted with a `forward_loop` that iterates the model through the
calibration dataset, as sketched below.
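
A sketch of that alternative (the ``forward_loop`` config key and the calibration dataloader
are assumptions based on the note above):

.. code-block:: python

    # Alternative: drive calibration with an explicit forward loop
    def forward_loop(model):
        for batch in calib_dataloader:  # calib_dataloader is assumed to be defined
            model(batch)

    model = mts.sparsify(model, mode="sparsegpt", config={"forward_loop": forward_loop})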

Sparsity-aware Training (SAT) for PyTorch models
------------------------------------------------

After sparsifying the model, you can save its checkpoint and use it to fine-tune the
sparsified model. Check out the
`GitHub end-to-end example <https://github.com/NVIDIA/TensorRT-Model-Optimizer/tree/main/llm_sparsity>`_
to learn more about SAT.


--------------------------------

**Next Steps**
* Learn more about sparsity and advanced usage of ModelOpt sparsity in
:doc:`Sparsity guide <../guides/5_sparsity>`.
* Check out the end-to-end examples on GitHub for PTQ and QAT
`here <https://github.com/NVIDIA/TensorRT-Model-Optimizer?tab=readme-ov-file#examples>`_.
* Check out the `end-to-end example on GitHub <https://github.com/NVIDIA/TensorRT-Model-Optimizer/tree/main/llm_sparsity>`_
for PTS and SAT.