NVIDIA TensorFlow 2.x Quantization

This TensorFlow 2.x Quantization toolkit quantizes (inserts Q/DQ nodes) TensorFlow 2.x Keras models for Quantization-Aware Training (QAT). We follow NVIDIA's QAT recipe, which leads to optimal model acceleration with TensorRT on NVIDIA GPUs and hardware accelerators.

Features

  • Implements NVIDIA's quantization recipe.
  • Supports fully automated or manual insertion of Quantization and DeQuantization (QDQ) nodes in the TensorFlow 2.x model with minimal code.
  • Easily extensible with support for new layers.
  • Quantization behavior can be set programmatically.
  • Implements automatic tests for popular architecture blocks such as residual and inception.
  • Offers utilities for TensorFlow 2.x to TensorRT conversion via ONNX.
  • Includes example workflows.

Dependencies

Python >= 3.8
TensorFlow >= 2.8
tf2onnx >= 1.10.1
onnx-graphsurgeon
pytest
pytest-html
TensorRT (optional) >= 8.4 GA

Installation

Docker

The latest TensorFlow 2.x Docker image from NGC is recommended.

$ cd ~/
$ git clone https://github.com/NVIDIA/TensorRT.git
$ docker pull nvcr.io/nvidia/tensorflow:22.03-tf2-py3
$ docker run -it --runtime=nvidia --gpus all --net host -v ~/TensorRT/tools/tensorflow-quantization:/home/tensorflow-quantization nvcr.io/nvidia/tensorflow:22.03-tf2-py3 /bin/bash

After the last command, you will be placed in the /workspace directory inside the running Docker container, while the tensorflow-quantization repository is mounted under /home.

$ cd /home/tensorflow-quantization
$ ./install.sh
$ cd tests
$ python3 -m pytest quantize_test.py -rP

If all tests pass, installation is successful.

Local

$ cd ~/
$ git clone https://github.com/NVIDIA/TensorRT.git
$ cd TensorRT/tools/tensorflow-quantization
$ ./install.sh
$ cd tests
$ python3 -m pytest quantize_test.py -rP

If all tests pass, installation is successful.

Documentation

TensorFlow 2.x Quantization toolkit user guide.

Known limitations

  1. Only Quantization Aware Training (QAT) is supported as a quantization method.
  2. Only Functional and Sequential Keras models are supported. Original Keras layers are wrapped into quantized layers using TensorFlow's clone_model method, which doesn't support subclassed models.
  3. Saving the quantized version of a few layers may not be supported in TensorFlow < 2.8:
    • DepthwiseConv2D support was added in TF 2.8.
    • Conv2DTranspose is not yet supported by TF (see the open bug here). However, there is a workaround if you need only the ONNX file rather than the TF2 SavedModel file:
      1. Implement Conv2DTransposeQuantizeWrapper. See our user guide for more information on how to do that.
      2. Convert the quantized Keras model to ONNX using our provided utility function convert_keras_model_to_onnx.

Resources