This folder contains the Quantization-Aware Training (QAT) workflow for standard networks.
The QAT end-to-end workflow (TF2-to-ONNX) consists of the following steps:
- Model quantization using the `quantize_model` function with the NVIDIA quantization scheme (see the sketch after this list).
- QAT model fine-tuning (saves checkpoints).
- Accuracy comparison between the Baseline and QAT models.
- QAT model conversion to SavedModel format.
- Conversion of the SavedModel to ONNX.
- TensorRT engine building from the ONNX file, followed by inference.
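A minimal sketch of the quantize, fine-tune, and export steps, assuming a Keras baseline model and the toolkit's `quantize_model` entry point (`train_ds`/`val_ds` stand in for your ImageNet `tf.data` pipelines, and exact keyword arguments may vary across toolkit versions):

```python
import tensorflow as tf
from tensorflow_quantization import quantize_model

# Baseline Keras model (any supported classification model works here).
baseline_model = tf.keras.applications.ResNet50(weights="imagenet")

# Insert Q/DQ nodes following the NVIDIA quantization scheme.
q_model = quantize_model(baseline_model)

# Fine-tune briefly at a low learning rate to recover accuracy.
q_model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=1e-4),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
q_model.fit(train_ds, validation_data=val_ds, epochs=2)

# Export to SavedModel format for the TF2-to-ONNX conversion step, e.g.:
#   python -m tf2onnx.convert --saved-model saved_model_qat --output model_qat.onnx
q_model.save("saved_model_qat")
```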
Requirements:
- Install the `tensorflow-quantization` toolkit.
- Install the additional requirements: `pip install -r requirements.txt`.
- (Optional) Install TensorRT for full workflow support (needed for `infer_engine.py`).
Note: For CLI runs, go to the cloned repository's root directory and run `export PYTHONPATH=$PWD` so that the `examples` folder is available for import.
We use the ImageNet 2012 dataset (task 1, image classification), which requires a manual download due to its terms-of-access agreement. Please log in or sign up on the ImageNet website and download the "train/validation data". This data is needed for QAT model fine-tuning, and it is also used to evaluate the Baseline and QAT models.
Our workflow supports the `tfrecord` format, so please follow the instructions below (modified from TensorFlow's instructions) to convert the downloaded `.tar` ImageNet files to the required format:
- Set `IMAGENET_HOME=/path/to/imagenet/tar/files` in `data/imagenet_data_setup.sh`.
- Download `imagenet_to_gcs.py` to `$IMAGENET_HOME`.
- Run `./data/imagenet_data_setup.sh`. A sketch for reading the converted records back with `tf.data` follows below.
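Once converted, the validation records can be read back with `tf.data`. A minimal sketch, assuming the feature keys written by `imagenet_to_gcs.py` (the tfrecord path, image size, and batch size are placeholders; each model additionally expects its own input pre-processing):

```python
import tensorflow as tf

def _parse(record):
    # Feature keys follow the imagenet_to_gcs.py layout; adjust if yours differ.
    features = tf.io.parse_single_example(
        record,
        {
            "image/encoded": tf.io.FixedLenFeature([], tf.string),
            "image/class/label": tf.io.FixedLenFeature([], tf.int64),
        },
    )
    image = tf.io.decode_jpeg(features["image/encoded"], channels=3)
    image = tf.image.resize(image, (224, 224))
    return image, features["image/class/label"]

val_ds = (
    tf.data.TFRecordDataset(tf.io.gfile.glob("/path/to/tfrecords/validation-*"))
    .map(_parse, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(32)
    .prefetch(tf.data.AUTOTUNE)
)
```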
Model quantization, fine-tuning, and conversion to ONNX.
Example models:

| Model | Task | Script (QAT workflow) |
|---|---|---|
| ResNet | Classification | resnet |
| EfficientNet | Classification | efficientnet |
| MobileNet | Classification | mobilenet |
| Inception | Classification | inception |
For each model's performance results, please refer to the toolkit's User Guide ("Model Zoo").
Build the TensorRT engine and evaluate its latency and accuracy.
Convert the ONNX model into a TensorRT engine (also obtains latency measurements):
```bash
trtexec --onnx=model_qat.onnx --int8 --saveEngine=model_qat.engine --verbose
```
Arguments:
- `--onnx`: Path to the QAT ONNX graph.
- `--int8`: Enable INT8 precision mode.
- `--saveEngine`: Output filename for the TensorRT engine.
- `--verbose`: Enable verbose logging.
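The same build can also be done from Python. A minimal sketch, assuming TensorRT 8.x Python bindings and the file names from the `trtexec` command above:

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)

# Parse the QAT ONNX graph into the TensorRT network definition.
with open("model_qat.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("Failed to parse the ONNX file")

# Enable INT8 so the Q/DQ nodes inserted during QAT are honored.
config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)

engine_bytes = builder.build_serialized_network(network, config)
with open("model_qat.engine", "wb") as f:
    f.write(engine_bytes)
```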
Obtain accuracy results on the validation dataset:
```bash
python infer_engine.py --engine=<path_to_trt_engine> --data_dir=<path_to_tfrecord_val_data> -b=<batch_size>
```
Arguments:
- `-e, --engine`: TensorRT engine filename (to load).
- `-m, --model_name`: Name of the model, needed to choose the appropriate input pre-processing. Options: `resnet_v1` (default), `resnet_v2`, `efficientnet_b0`, `efficientnet_b3`, `mobilenet_v1`, `mobilenet_v2`.
- `-d, --data_dir`: Path to the directory of input images in `tfrecord` format (`data["validation"]`).
- `-k, --top_k_value` (default=1): Value of `K` for the top-K predictions used in the accuracy calculation (sketched below).
- `-b, --batch_size` (default=1): Number of inputs to send in parallel (up to the engine's max batch size).
- `--log_file`: Filename for saving logs.
Outputs:
- `.log` file: contains the engine's accuracy results.
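For clarity, top-K accuracy counts a prediction as correct when the true label appears among the `K` highest-scoring classes. A minimal sketch of that computation (a hypothetical helper, not part of `infer_engine.py`):

```python
import numpy as np

def top_k_accuracy(logits: np.ndarray, labels: np.ndarray, k: int = 1) -> float:
    """Fraction of samples whose true label is among the top-K scores.

    logits: (batch, num_classes) raw scores from the engine.
    labels: (batch,) integer class labels.
    """
    # Indices of the K highest-scoring classes per sample.
    top_k = np.argsort(logits, axis=1)[:, -k:]
    hits = [label in row for label, row in zip(labels, top_k)]
    return float(np.mean(hits))

# Example: 2 samples, 4 classes, K=2.
scores = np.array([[0.1, 0.5, 0.3, 0.1],
                   [0.7, 0.1, 0.1, 0.1]])
print(top_k_accuracy(scores, np.array([2, 0]), k=2))  # 1.0
```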
The following resources provide a deeper understanding of Quantization-Aware Training, TF2ONNX, and importing a model into TensorRT using Python.
Quantization Aware Training
- Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
- Quantization Aware Training guide
- Deep Residual Learning for Image Recognition
Parsers
- Documentation