From 915feb29c7623a485f337578ad628056bbcdadbc Mon Sep 17 00:00:00 2001
From: idanb
Date: Tue, 19 Mar 2024 14:37:42 +0200
Subject: [PATCH 1/7] Remove unneeded *py tutorials and edit the tutorials readm

---
 tutorials/notebooks/IMX500_notebooks.md | 12 +
 tutorials/notebooks/MCT_Features.md | 137 +++++++++
 tutorials/notebooks/README.md | 150 ++--------
 .../gptq/example_keras_mobilenet_gptq.py | 129 ---------
 .../keras/ptq/example_keras_mobilenet.py | 120 --------
 ...example_keras_mobilenet_mixed_precision.py | 150 ----------
 .../notebooks/keras/qat/example_keras_qat.py | 262 ------------------
 ...ch_mobilenetv2_cifar100_mixed_precision.py | 260 -----------------
 8 files changed, 167 insertions(+), 1053 deletions(-)
 create mode 100644 tutorials/notebooks/IMX500_notebooks.md
 create mode 100644 tutorials/notebooks/MCT_Features.md
 delete mode 100644 tutorials/notebooks/keras/gptq/example_keras_mobilenet_gptq.py
 delete mode 100644 tutorials/notebooks/keras/ptq/example_keras_mobilenet.py
 delete mode 100644 tutorials/notebooks/keras/ptq/example_keras_mobilenet_mixed_precision.py
 delete mode 100644 tutorials/notebooks/keras/qat/example_keras_qat.py
 delete mode 100644 tutorials/notebooks/pytorch/ptq/example_pytorch_mobilenetv2_cifar100_mixed_precision.py

diff --git a/tutorials/notebooks/IMX500_notebooks.md b/tutorials/notebooks/IMX500_notebooks.md
new file mode 100644
index 000000000..2495497d3
--- /dev/null
+++ b/tutorials/notebooks/IMX500_notebooks.md
@@ -0,0 +1,12 @@
+# Sony-IMX500 Notebooks
+
+Here we provide examples on quantizing pre-trained models for deployment on Sony-IMX500 processing platform.
+We will cover various tasks and demonstrate the necessary steps to achieve efficient quantization for optimal
+deployment performance.
+
+ | Task | Model | Source Repository | Notebook |
+ |-----------------------------------------------------------------|----------------|---------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------|
+ | Classification | MobileNetV2 | [Keras Applications](https://keras.io/api/applications/) | [Keras notebook](model_optimization/tutorials/notebooks/keras/ptq/example_keras_imagenet.ipynb) |
+ | Object Detection | YOLOv8n | [Ultralytics](https://github.com/ultralytics/ultralytics) | [Keras notebook](model_optimization/tutorials/notebooks/keras/ptq/keras_yolov8n_for_imx500.ipynb) |
+ | Semantic Segmentation | DeepLabV3-Plus | [bonlime's repo](https://github.com/bonlime/keras-deeplab-v3-plus) | [Keras notebook](model_optimization/tutorials/notebooks/keras/ptq/keras_deeplabv3plus_for_imx500.ipynb) |
+
diff --git a/tutorials/notebooks/MCT_Features.md b/tutorials/notebooks/MCT_Features.md
new file mode 100644
index 000000000..b2028ff7c
--- /dev/null
+++ b/tutorials/notebooks/MCT_Features.md
@@ -0,0 +1,137 @@
+# MCT Features
+
+### Keras Tutorials
+
+
+ Post-Training Quantization (PTQ) + + | Tutorial | Included Features | + |------------------------------|-----------------------------------------------------------------------------------------------------| + | [MobileNetV2](keras/ptq/example_keras_imagenet.ipynb) | ✅ PTQ | + | [Mixed-Precision MobileNetV2](keras/ptq/example_keras_mobilenet_mixed_precision.ipynb) | ✅ PTQ
✅ Mixed-Precision | + | [Nanodet-Plus](keras/ptq/example_keras_nanodet_plus.ipynb) | ✅ PTQ | + | [YoloV8-nano](keras/ptq/example_keras_yolov8n.ipynb) | ✅ PTQ | + | [EfficientDetLite0](keras/ptq/example_keras_effdet_lite0.ipynb) | ✅ PTQ
✅ [sony-custom-layers](https://github.com/sony/custom_layers) integration | + +
+ +
+ Gradient-Based Post-Training Quantization (GPTQ) + + | Tutorial | Included Features | + |------------------------------|---------------| + | [MobileNetV2](keras/gptq/example_keras_mobilenet_gptq.ipynb) | ✅ GPTQ | + +
+ +
+ Quantization-Aware Training (QAT) + + | Tutorial | Included Features | + |---------------------------------------------------|--------------| + | [QAT on MNIST](keras/qat/example_keras_qat.ipynb) | ✅ QAT | + +
+ + +
+ Structured Pruning + + | Tutorial | Included Features | + |---------------------------------------------------------------------|------------------| + | [Fully-Connected Model Pruning](keras/pruning/example_keras_pruning_mnist.ipynb) | ✅ Pruning | + +
+ +
+ Export Quantized Models + + | Tutorial | Included Features | + |---------------------------------------------------------------------------------------|-----------------| + | [Exporter Usage](keras/export/example_keras_export.ipynb) | ✅ Export | + +
+ +
+ Debug Tools + + | Tutorial | Included Features | + |-------------------------------------------------------------------------------------|-------------------------| + | [Network Editor Usage](keras/debug_tools/example_keras_network_editor.ipynb) | ✅ Network Editor | + +
+ +### Pytorch Tutorials + + +
+ Quick-Start with Torchvision + + | Tutorial | + |-----------------------------------------------------------------------------------------------------------------| + | [Quick Start - Torchvision Pretrained Model](pytorch/example_quick_start_torchvision.ipynb) | + +
+ + +
+ Post-Training Quantization (PTQ) + + | Tutorial | Included Features | + |---------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------| + | [Training & Quantizing Model on MNIST](pytorch/ptq/example_pytorch_quantization_mnist.ipynb) | ✅ PTQ | + | [Mixed-Precision MobileNetV2 on Cifar100](pytorch/ptq/example_pytorch_mobilenetv2_cifar100_mixed_precision.ipynb) | ✅ PTQ
✅ Mixed-Precision | + | [SSDLite MobileNetV3 Quantization](pytorch/ptq/example_pytorch_ssdlite_mobilenetv3.ipynb) | ✅ PTQ | + +
+ + + +
+ Quantization-Aware Training (QAT) + + | Tutorial | Included Features | + |-----------------------------------------------------------------------------------|--------------| + | [QAT on MNIST](pytorch/qat/example_pytorch_qat.py) | ✅ QAT | +
+ +
+ Structured Pruning + + | Tutorial | Included Features | + |--------------------------------------------------------------------------------------|------------------| + | [Fully-Connected Model Pruning](pytorch/pruning/example_pytorch_pruning_mnist.ipynb) | ✅ Pruning | + + +
+ +
+ Data Generation + + | Tutorial | Included Features | + |-----------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------| + | [Data-Free Quantization using Data Generation](pytorch/data_generation/example_pytorch_data_generation.ipynb) | ✅ PTQ
✅ Data-Free Quantization
✅ Data Generation | + +
+ + +
+ Export Quantized Models + + | Tutorial | Included Features | + |---------------------------------------------------------------------------------------|-----------------| + | [Exporter Usage](pytorch/export/example_pytorch_export.ipynb) | ✅ Export | + +
+ + + + + + + + + + + + diff --git a/tutorials/notebooks/README.md b/tutorials/notebooks/README.md index 86a411722..3d3f8723b 100644 --- a/tutorials/notebooks/README.md +++ b/tutorials/notebooks/README.md @@ -1,149 +1,35 @@ -# Tutorials +# MCT Tutorials -## Table of Contents -- [Introduction](#introduction) -- [Keras Tutorials](#keras-tutorials) - - [Post-Training Quantization](#keras-ptq) - - [Gradient-Based Post-Training Quantization](#keras-gptq) - - [Quantization-Aware Training](#keras-qat) - - [Structured Pruning](#keras-pruning) - - [Export Quantized Models](#keras-export) - - [Debug Tools](#keras-debug) -- [Pytorch Tutorials](#pytorch-tutorials) - - [Quick-Start with Torchvision](#pytorch-quickstart-torchvision) - - [Post-Training Quantization](#pytorch-ptq) - - [Quantization-Aware Training](#pytorch-qat) - - [Structured Pruning](#pytorch-pruning) - - [Data Generation](#pytorch-data-generation) - - [Export Quantized Models](#pytorch-export) - -## Introduction Dive into the Model-Compression-Toolkit (MCT) with our collection of tutorials, covering a wide -range of compression techniques for Keras and Pytorch models. We provide -both Python scripts and interactive Jupyter notebooks for an +range of compression techniques for Keras and Pytorch models. We provide interactive Jupyter notebooks for an engaging and hands-on experience. +## Getting started +This "hello world" notebook shows how to quickly quantize a pre-trained model using MCT post training quantization technique both for Keras models and Pytorch models. +- [Keras MobileNetV2 post training quantization](keras/ptq/example_keras_imagenet.ipynb) +- [Pytorch MobileNetV2 post training quantization](pytorch/ptq/example_pytorch_mobilenet_v2.py) -## Keras Tutorials - -
- Post-Training Quantization (PTQ) - - | Tutorial | Included Features | - |------------------------------|-----------------------------------------------------------------------------------------------------| - | [MobileNetV2](keras/ptq/example_keras_imagenet.ipynb) | ✅ PTQ | - | [Mixed-Precision MobileNetV2](keras/ptq/example_keras_mobilenet_mixed_precision.ipynb) | ✅ PTQ
✅ Mixed-Precision | - | [Nanodet-Plus](keras/ptq/example_keras_nanodet_plus.ipynb) | ✅ PTQ | - | [YoloV8-nano](keras/ptq/example_keras_yolov8n.ipynb) | ✅ PTQ | - | [EfficientDetLite0](keras/ptq/example_keras_effdet_lite0.ipynb) | ✅ PTQ
✅ [sony-custom-layers](https://github.com/sony/custom_layers) integration | - -
- -
- Gradient-Based Post-Training Quantization (GPTQ) - - | Tutorial | Included Features | - |------------------------------|---------------| - | [MobileNetV2](keras/gptq/example_keras_mobilenet_gptq.ipynb) | ✅ GPTQ | - -
- -
- Quantization-Aware Training (QAT) - - | Tutorial | Included Features | - |---------------------------------------------------|--------------| - | [QAT on MNIST](keras/qat/example_keras_qat.ipynb) | ✅ QAT | - -
- - -
- Structured Pruning - - | Tutorial | Included Features | - |---------------------------------------------------------------------|------------------| - | [Fully-Connected Model Pruning](keras/pruning/example_keras_pruning_mnist.ipynb) | ✅ Pruning | - -
- -
- Export Quantized Models - - | Tutorial | Included Features | - |---------------------------------------------------------------------------------------|-----------------| - | [Exporter Usage](keras/export/example_keras_export.ipynb) | ✅ Export | - -
- -
- Debug Tools - - | Tutorial | Included Features | - |-------------------------------------------------------------------------------------|-------------------------| - | [Network Editor Usage](keras/debug_tools/example_keras_network_editor.ipynb) | ✅ Network Editor | - -
- -## Pytorch Tutorials - - -
- Quick-Start with Torchvision - - | Tutorial | - |-----------------------------------------------------------------------------------------------------------------| - | [Quick Start - Torchvision Pretrained Model](pytorch/example_quick_start_torchvision.ipynb) | - -
- - -
- Post-Training Quantization (PTQ) - - | Tutorial | Included Features | - |---------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------| - | [Training & Quantizing Model on MNIST](pytorch/ptq/example_pytorch_quantization_mnist.ipynb) | ✅ PTQ | - | [Mixed-Precision MobileNetV2 on Cifar100](pytorch/ptq/example_pytorch_mobilenetv2_cifar100_mixed_precision.ipynb) | ✅ PTQ
✅ Mixed-Precision | - | [SSDLite MobileNetV3 Quantization](pytorch/ptq/example_pytorch_ssdlite_mobilenetv3.ipynb) | ✅ PTQ | - -
- +## MCT Features +In these examples, we will cover more advanced topics related to quantization. +This includes fine-tuning PTQ (Post-Training Quantization) configurations, exporting models, +and exploring advanced compression techniques. +These techniques are crucial for optimizing models further and achieving better performance in deployment scenarios. +- [MCT Features](MCT_Features.md) +## Quantization for Sony-IMX500 deployment +This section provides a guide on quantizing pre-trained models to meet specific constraints for deployment on the +processing platform. Our focus will be on quantizing models for deployment on Sony-IMX500 processing platform. +We will cover various tasks and demonstrate the necessary steps to achieve efficient quantization for optimal +deployment performance. +- [IMX500 notebooks](IMX500_notebooks.md) -
- Quantization-Aware Training (QAT) - - | Tutorial | Included Features | - |-----------------------------------------------------------------------------------|--------------| - | [QAT on MNIST](pytorch/qat/example_pytorch_qat.py) | ✅ QAT | -
- Structured Pruning - | Tutorial | Included Features | - |--------------------------------------------------------------------------------------|------------------| - | [Fully-Connected Model Pruning](pytorch/pruning/example_pytorch_pruning_mnist.ipynb) | ✅ Pruning | -
-
- Data Generation - - | Tutorial | Included Features | - |-----------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------| - | [Data-Free Quantization using Data Generation](pytorch/data_generation/example_pytorch_data_generation.ipynb) | ✅ PTQ
✅ Data-Free Quantization
✅ Data Generation | -
-
- Export Quantized Models - | Tutorial | Included Features | - |---------------------------------------------------------------------------------------|-----------------| - | [Exporter Usage](pytorch/export/example_pytorch_export.ipynb) | ✅ Export | - -
diff --git a/tutorials/notebooks/keras/gptq/example_keras_mobilenet_gptq.py b/tutorials/notebooks/keras/gptq/example_keras_mobilenet_gptq.py deleted file mode 100644 index cf8ea576b..000000000 --- a/tutorials/notebooks/keras/gptq/example_keras_mobilenet_gptq.py +++ /dev/null @@ -1,129 +0,0 @@ -# Copyright 2022 Sony Semiconductor Israel, Inc. All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# ============================================================================== - -import argparse - -from keras.applications.mobilenet_v2 import MobileNetV2 - -import model_compression_toolkit as mct -import tempfile - -""" -This tutorial demonstrates how a model (more specifically, MobileNetV2) can be -quantized and optimized using the Model Compression Toolkit (MCT) with GPTQ (gradient-based PTQ). 
-""" - -#################################### -# Preprocessing images -#################################### -import cv2 -import numpy as np - -MEAN = 127.5 -STD = 127.5 -RESIZE_SCALE = 256 / 224 -SIZE = 224 - - -def resize(x): - resize_side = max(RESIZE_SCALE * SIZE / x.shape[0], RESIZE_SCALE * SIZE / x.shape[1]) - height_tag = int(np.round(resize_side * x.shape[0])) - width_tag = int(np.round(resize_side * x.shape[1])) - resized_img = cv2.resize(x, (width_tag, height_tag)) - offset_height = int((height_tag - SIZE) / 2) - offset_width = int((width_tag - SIZE) / 2) - cropped_img = resized_img[offset_height:offset_height + SIZE, offset_width:offset_width + SIZE] - return cropped_img - - -def normalization(x): - return (x - MEAN) / STD - - -def argument_handler(): - parser = argparse.ArgumentParser() - parser.add_argument('--representative_dataset_dir', type=str, required=True, default=None, - help='folder path for the representative dataset.') - parser.add_argument('--batch_size', type=int, default=50, - help='batch size for the representative data.') - parser.add_argument('--num_calibration_iterations', type=int, default=10, - help='number of iterations for calibration.') - parser.add_argument('--num_gptq_training_iterations', type=int, default=5000, - help='number of iterations for gptq training.') - return parser.parse_args() - - -if __name__ == '__main__': - - # Parse arguments - args = argument_handler() - - # Set the batch size of the images at each calibration iteration. - batch_size = args.batch_size - - # Set the path to the folder of images to load and use for the representative dataset. - # Notice that the folder have to contain at least one image. - folder = args.representative_dataset_dir - - # Create a representative data generator, which returns a list of images. - # The images can be preprocessed using a list of preprocessing functions. 
- image_data_loader = mct.core.FolderImageLoader(folder, - preprocessing=[resize, normalization], - batch_size=batch_size) - - # Create a Callable representative dataset for calibration purposes. - # The function should be called without any arguments, and should return a list numpy arrays (array for each - # model's input). - # For example: A model has two input tensors - one with input shape of [32 X 32 X 3] and the second with - # an input shape of [224 X 224 X 3]. We calibrate the model using batches of 20 images. - # Calling representative_data_gen() should return a list - # of two numpy.ndarray objects where the arrays' shapes are [(20, 3, 32, 32), (20, 3, 224, 224)]. - def representative_data_gen() -> list: - for _ in range(args.num_calibration_iterations): - yield [image_data_loader.sample()] - - - # Get a TargetPlatformModel object that models the hardware for the quantized model inference. - # The model determines the quantization methods to use during the MCT optimization process. - # Here, for example, we use the default target platform model that is attached to a Tensorflow - # layers representation. - target_platform_cap = mct.get_target_platform_capabilities('tensorflow', 'default') - - # Create a model - model = MobileNetV2() - - # Create a core quantization configuration and set the number of calibration iterations. - config = mct.core.CoreConfig() - - # Create a GPTQ quantization configuration and set the number of training iterations. - gptq_config = mct.gptq.get_keras_gptq_config(n_epochs=args.num_gptq_training_iterations) - - quantized_model, quantization_info = mct.gptq.keras_gradient_post_training_quantization(model, - representative_data_gen, - gptq_config=gptq_config, - core_config=config, - target_platform_capabilities=target_platform_cap) - - # Export quantized model to TFLite and Keras. 
- # For more details please see: https://github.com/sony/model_optimization/blob/main/model_compression_toolkit/exporter/README.md - _, tflite_file_path = tempfile.mkstemp('.tflite') # Path of exported model - mct.exporter.keras_export_model(model=quantized_model, - save_model_path=tflite_file_path, - serialization_format=mct.exporter.KerasExportSerializationFormat.TFLITE, - quantization_format=mct.exporter.QuantizationFormat.FAKELY_QUANT) - - _, keras_file_path = tempfile.mkstemp('.h5') # Path of exported model - mct.exporter.keras_export_model(model=quantized_model, - save_model_path=keras_file_path) diff --git a/tutorials/notebooks/keras/ptq/example_keras_mobilenet.py b/tutorials/notebooks/keras/ptq/example_keras_mobilenet.py deleted file mode 100644 index 84e5db681..000000000 --- a/tutorials/notebooks/keras/ptq/example_keras_mobilenet.py +++ /dev/null @@ -1,120 +0,0 @@ -# Copyright 2021 Sony Semiconductor Israel, Inc. All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# ============================================================================== - -import argparse - -from tensorflow.keras.applications.mobilenet import MobileNet - -import model_compression_toolkit as mct -import tempfile - -""" -This tutorial demonstrates how a model (more specifically, MobileNetV1) can be -quantized and optimized using the Model Compression Toolkit (MCT). 
-""" - -#################################### -# Preprocessing images -#################################### -import cv2 -import numpy as np - -MEAN = 127.5 -STD = 127.5 -RESIZE_SCALE = 256 / 224 -SIZE = 224 - - -def resize(x): - resize_side = max(RESIZE_SCALE * SIZE / x.shape[0], RESIZE_SCALE * SIZE / x.shape[1]) - height_tag = int(np.round(resize_side * x.shape[0])) - width_tag = int(np.round(resize_side * x.shape[1])) - resized_img = cv2.resize(x, (width_tag, height_tag)) - offset_height = int((height_tag - SIZE) / 2) - offset_width = int((width_tag - SIZE) / 2) - cropped_img = resized_img[offset_height:offset_height + SIZE, offset_width:offset_width + SIZE] - return cropped_img - - -def normalization(x): - return (x - MEAN) / STD - - -def argument_handler(): - parser = argparse.ArgumentParser() - parser.add_argument('--representative_dataset_dir', type=str, required=True, default=None, - help='folder path for the representative dataset.') - parser.add_argument('--batch_size', type=int, default=50, - help='batch size for the representative data.') - parser.add_argument('--num_calibration_iterations', type=int, default=10, - help='number of iterations for calibration.') - return parser.parse_args() - - -if __name__ == '__main__': - - # Parse arguments - args = argument_handler() - - # Set the batch size of the images at each calibration iteration. - batch_size = args.batch_size - - # Set the path to the folder of images to load and use for the representative dataset. - # Notice that the folder have to contain at least one image. - folder = args.representative_dataset_dir - - # Create a representative data generator, which returns a list of images. - # The images can be preprocessed using a list of preprocessing functions. - image_data_loader = mct.core.FolderImageLoader(folder, - preprocessing=[resize, normalization], - batch_size=batch_size) - - # Create a Callable representative dataset for calibration purposes. 
- # The function should be called without any arguments, and should return a list numpy arrays (array for each - # model's input). - # For example: A model has two input tensors - one with input shape of [32 X 32 X 3] and the second with - # an input shape of [224 X 224 X 3]. We calibrate the model using batches of 20 images. - # Calling representative_data_gen() should return a list - # of two numpy.ndarray objects where the arrays' shapes are [(20, 3, 32, 32), (20, 3, 224, 224)]. - def representative_data_gen() -> list: - for _ in range(args.num_calibration_iterations): - yield [image_data_loader.sample()] - - # Get a TargetPlatformModel object that models the hardware for the quantized model inference. - # The model determines the quantization methods to use during the MCT optimization process. - # Here, for example, we use the default target platform model that is attached to a Tensorflow - # layers representation. - target_platform_cap = mct.get_target_platform_capabilities('tensorflow', 'default') - - # Create a model and quantize it using the representative_data_gen as the calibration images. - # Set the number of calibration iterations. - model = MobileNet() - - quantized_model, quantization_info = mct.ptq.keras_post_training_quantization(model, - representative_data_gen, - target_platform_capabilities=target_platform_cap) - - - # Export quantized model to TFLite and Keras. 
- # For more details please see: https://github.com/sony/model_optimization/blob/main/model_compression_toolkit/exporter/README.md - _, tflite_file_path = tempfile.mkstemp('.tflite') # Path of exported model - mct.exporter.keras_export_model(model=quantized_model, - save_model_path=tflite_file_path, - serialization_format=mct.exporter.KerasExportSerializationFormat.TFLITE, - quantization_format=mct.exporter.QuantizationFormat.FAKELY_QUANT) - - _, keras_file_path = tempfile.mkstemp('.h5') # Path of exported model - mct.exporter.keras_export_model(model=quantized_model, - save_model_path=keras_file_path) diff --git a/tutorials/notebooks/keras/ptq/example_keras_mobilenet_mixed_precision.py b/tutorials/notebooks/keras/ptq/example_keras_mobilenet_mixed_precision.py deleted file mode 100644 index 934cfb9ce..000000000 --- a/tutorials/notebooks/keras/ptq/example_keras_mobilenet_mixed_precision.py +++ /dev/null @@ -1,150 +0,0 @@ -# Copyright 2021 Sony Semiconductor Israel, Inc. All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# ============================================================================== - -import argparse - -from tensorflow.keras.applications.mobilenet_v2 import MobileNetV2 - -import model_compression_toolkit as mct -import tempfile - -""" -Mixed precision is a method for quantizing a model using different bit widths -for different layers of the model. -This tutorial demonstrates how to use mixed-precision in MCT to -quantize MobileNetV2. 
-MCT supports mixed-precision for both weights and activation. -""" - -#################################### -# Preprocessing images -#################################### -import cv2 -import numpy as np - -MEAN = 127.5 -STD = 127.5 -RESIZE_SCALE = 256 / 224 -SIZE = 224 - - -def resize(x): - resize_side = max(RESIZE_SCALE * SIZE / x.shape[0], RESIZE_SCALE * SIZE / x.shape[1]) - height_tag = int(np.round(resize_side * x.shape[0])) - width_tag = int(np.round(resize_side * x.shape[1])) - resized_img = cv2.resize(x, (width_tag, height_tag)) - offset_height = int((height_tag - SIZE) / 2) - offset_width = int((width_tag - SIZE) / 2) - cropped_img = resized_img[offset_height:offset_height + SIZE, offset_width:offset_width + SIZE] - return cropped_img - - -def normalization(x): - return (x - MEAN) / STD - - -def argument_handler(): - parser = argparse.ArgumentParser() - parser.add_argument('--representative_dataset_dir', type=str, required=True, default=None, - help='folder path for the representative dataset.') - parser.add_argument('--batch_size', type=int, default=50, - help='batch size for the representative data.') - parser.add_argument('--num_calibration_iterations', type=int, default=10, - help='number of iterations for calibration.') - parser.add_argument('--mixed_precision_num_of_images', type=int, default=32, - help='number of images to use for mixed-precision configuration search.') - parser.add_argument('--enable_mixed_precision_gradients_weighting', action='store_true', default=False, - help='Whether to use gradients during mixed-precision configuration search or not.') - parser.add_argument('--weights_compression_ratio', type=float, default=0.75, - help='weights compression ratio.') - return parser.parse_args() - - -if __name__ == '__main__': - - # Parse arguments - args = argument_handler() - - # Set the batch size of the images at each calibration iteration. 
- batch_size = args.batch_size - - # Set the path to the folder of images to load and use for the representative dataset. - # Notice that the folder have to contain at least one image. - folder = args.representative_dataset_dir - - # Create a representative data generator, which returns a list of images. - # The images can be preprocessed using a list of preprocessing functions. - image_data_loader = mct.core.FolderImageLoader(folder, - preprocessing=[resize, normalization], - batch_size=batch_size) - - # Create a Callable representative dataset for calibration purposes. - # The function should be called without any arguments, and should return a list numpy arrays (array for each - # model's input). - # For example: if the model has two input tensors - one with input shape of 32X32X3 and the second with input - # shape of 224X224X3, and we calibrate the model using batches of 20 images, - # calling representative_data_gen() should return a list - # of two numpy.ndarray objects where the arrays' shapes are [(20, 32, 32, 3), (20, 224, 224, 3)]. - def representative_data_gen() -> list: - for _ in range(args.num_calibration_iterations): - yield [image_data_loader.sample()] - - # Create a model to quantize. - model = MobileNetV2() - - # Create a mixed-precision quantization configuration with possible mixed-precision search options. - # MCT will search a mixed-precision configuration (namely, bit-width for each layer) - # and quantize the model according to this configuration. - # The candidates bit-width for quantization should be defined in the target platform model: - configuration = mct.core.CoreConfig(mixed_precision_config=mct.core.MixedPrecisionQuantizationConfig(num_of_images=args.mixed_precision_num_of_images, - use_hessian_based_scores=args.enable_mixed_precision_gradients_weighting)) - - # Get a TargetPlatformCapabilities object that models the hardware for the quantized model inference. 
- # Here, for example, we use the default platform that is attached to a Tensorflow layers representation. - target_platform_cap = mct.get_target_platform_capabilities('tensorflow', 'default') - - # Get KPI information to constraint your model's memory size. - # Retrieve a KPI object with helpful information of each KPI metric, - # to constraint the quantized model to the desired memory size. - kpi_data = mct.core.keras_kpi_data(model, representative_data_gen, configuration, - target_platform_capabilities=target_platform_cap) - - # Set a constraint for each of the KPI metrics. - # Create a KPI object to limit our returned model's size. Note that this values affects only layers and attributes - # that should be quantized (for example, the kernel of Conv2D in Keras will be affected by this value, - # while the bias will not) - # examples: - # weights_compression_ratio = 0.75 - About 0.75 of the model's weights memory size when quantized with 8 bits. - kpi = mct.core.KPI(kpi_data.weights_memory * args.weights_compression_ratio) - - # It is also possible to constraint only part of the KPI metric, e.g., by providing only weights_memory target - # in the past KPI object, e.g., kpi = mct.core.KPI(kpi_data.weights_memory * 0.75) - quantized_model, quantization_info = mct.ptq.keras_post_training_quantization(model, - representative_data_gen, - target_kpi=kpi, - core_config=configuration, - target_platform_capabilities=target_platform_cap) - - # Export quantized model to TFLite and Keras. 
- # For more details please see: https://github.com/sony/model_optimization/blob/main/model_compression_toolkit/exporter/README.md - _, tflite_file_path = tempfile.mkstemp('.tflite') # Path of exported model - mct.exporter.keras_export_model(model=quantized_model, - save_model_path=tflite_file_path, - serialization_format=mct.exporter.KerasExportSerializationFormat.TFLITE, - quantization_format=mct.exporter.QuantizationFormat.FAKELY_QUANT) - - _, keras_file_path = tempfile.mkstemp('.h5') # Path of exported model - mct.exporter.keras_export_model(model=quantized_model, - save_model_path=keras_file_path) diff --git a/tutorials/notebooks/keras/qat/example_keras_qat.py b/tutorials/notebooks/keras/qat/example_keras_qat.py deleted file mode 100644 index 7f4b54c7a..000000000 --- a/tutorials/notebooks/keras/qat/example_keras_qat.py +++ /dev/null @@ -1,262 +0,0 @@ -# Copyright 2022 Sony Semiconductor Israel, Inc. All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# ============================================================================== - -""" -This tutorial demonstrates how the Model Compression Toolkit (MCT) prepares a model for Quantization Aware -Training. A model is trained on the MNIST dataset and then quantized and being QAT-ready by the MCT and -returned to the user. A QAT-ready model is a model with certain layers wrapped by a QuantizeWrapper with -the requested quantizers. -The user can now Fine-Tune the QAT-ready model. 
Finally, the model is finalized by the MCT which means the -MCT replaces the QuantizeWrappers with their native layers and quantized weights. -""" - -import argparse - -import numpy as np -import tensorflow as tf -from keras import Model, layers, datasets - -from model_compression_toolkit import DefaultDict -from model_compression_toolkit.target_platform_capabilities.target_platform.op_quantization_config import AttributeQuantizationConfig -from model_compression_toolkit.constants import FLOAT_BITWIDTH -from model_compression_toolkit.target_platform_capabilities.constants import KERNEL_ATTR, KERAS_KERNEL, BIAS_ATTR, BIAS - -import model_compression_toolkit as mct -import tempfile - -tp = mct.target_platform - - -def get_tpc(): - """ - Assuming a target hardware that uses a power-of-2 threshold for activations and - a symmetric threshold for the weights. The activations are quantized to 3 bits, and the kernel weights - are quantized to 2 bits. Our assumed hardware does not require quantization of some layers - (e.g. Flatten & Droupout). - This function generates a TargetPlatformCapabilities with the above specification. - - Returns: - TargetPlatformCapabilities object - """ - - # define a default quantization config for all non-specified weights attributes. - default_weight_attr_config = AttributeQuantizationConfig( - weights_quantization_method=tp.QuantizationMethod.POWER_OF_TWO, - weights_n_bits=8, - weights_per_channel_threshold=False, - enable_weights_quantization=False, - lut_values_bitwidth=None) - - # define a quantization config to quantize the kernel (for layers where there is a kernel attribute). - kernel_base_config = AttributeQuantizationConfig( - weights_quantization_method=tp.QuantizationMethod.SYMMETRIC, - weights_n_bits=2, - weights_per_channel_threshold=True, - enable_weights_quantization=True, - lut_values_bitwidth=None) - - # define a quantization config to quantize the bias (for layers where there is a bias attribute). 
- bias_config = AttributeQuantizationConfig( - weights_quantization_method=tp.QuantizationMethod.POWER_OF_TWO, - weights_n_bits=FLOAT_BITWIDTH, - weights_per_channel_threshold=False, - enable_weights_quantization=False, - lut_values_bitwidth=None) - - # Create a default OpQuantizationConfig where we use default_weight_attr_config as the default - # AttributeQuantizationConfig for weights with no specific AttributeQuantizationConfig. - # MCT will compress a layer's kernel and bias according to the configurations that are - # set in KERNEL_ATTR and BIAS_ATTR that are passed in attr_weights_configs_mapping. - default_config = tp.OpQuantizationConfig( - default_weight_attr_config=default_weight_attr_config, - attr_weights_configs_mapping={KERNEL_ATTR: kernel_base_config, - BIAS_ATTR: bias_config}, - activation_quantization_method=tp.QuantizationMethod.POWER_OF_TWO, - activation_n_bits=3, - enable_activation_quantization=True, - quantization_preserving=False, - fixed_scale=None, - fixed_zero_point=None, - simd_size=None) - - # Set default QuantizationConfigOptions in new TargetPlatformModel to be used when no other - # QuantizationConfigOptions is set for an OperatorsSet. - default_configuration_options = tp.QuantizationConfigOptions([default_config]) - tp_model = tp.TargetPlatformModel(default_configuration_options) - with tp_model: - default_qco = tp.get_default_quantization_config_options() - # Group of OperatorsSets that should not be quantized. - tp.OperatorsSet("NoQuantization", - default_qco.clone_and_edit(enable_activation_quantization=False) - .clone_and_edit_weight_attribute(enable_weights_quantization=False)) - # Group of linear OperatorsSets such as convolution and matmul. 
- tp.OperatorsSet("LinearOp") - - tpc = tp.TargetPlatformCapabilities(tp_model) - with tpc: - # No need to quantize Flatten and Dropout layers - tp.OperationsSetToLayers("NoQuantization", [layers.Flatten, layers.Dropout]) - # Assign the framework layers' attributes to KERNEL_ATTR and BIAS_ATTR that were used during creation - # of the default OpQuantizationConfig. - tp.OperationsSetToLayers("LinearOp", [layers.Dense, layers.Conv2D], - attr_mapping={KERNEL_ATTR: DefaultDict(default_value=KERAS_KERNEL), - BIAS_ATTR: DefaultDict(default_value=BIAS)}) - return tpc - - -def get_model(_num_classes, _input_shape): - """ - Generate example keras model - Args: - _num_classes: Number of classes (10 for MNIST) - _input_shape: input image shape (28x28x1 for MNIST) - - Returns: - Keras model - - """ - _input = layers.Input(shape=_input_shape) - x = layers.Conv2D(16, 3, strides=2, padding='same', activation='relu')(_input) - x = layers.Conv2D(32, 3, strides=2, padding='same', activation='relu')(x) - x = layers.Flatten()(x) - x = layers.Dropout(0.5)(x) - x = layers.Dense(128, activation='relu')(x) - x = layers.Dropout(0.5)(x) - x = layers.Dense(_num_classes, activation='softmax')(x) - return Model(inputs=_input, outputs=x) - - -def get_dataset(_num_classes): - """ - This function returns the MNIST dataset - - Args: - _num_classes: Number of classes (10 for MNIST) - - Returns: - x_train: A tuple of numpy array of training images - y_train: A tuple of numpy array of training labels - x_test: A tuple of numpy array of test images - y_test: A tuple of numpy array of test labels - """ - - # Load the data and split it between train and test sets - (x_train, y_train), (x_test, y_test) = datasets.mnist.load_data() - - # Normalize images - x_train = x_train.astype("float32") / 255 - x_test = x_test.astype("float32") / 255 - - # Add Channels axis to data - x_train = np.expand_dims(x_train, -1) - x_test = np.expand_dims(x_test, -1) - - # convert class vectors to binary class matrices - 
y_train = tf.keras.utils.to_categorical(y_train, _num_classes) - y_test = tf.keras.utils.to_categorical(y_test, _num_classes) - return x_train, y_train, x_test, y_test - - -def gen_representative_dataset(_images, num_calibration_iterations): - # Return a Callable representative dataset for calibration purposes. - # The function should be called without any arguments, and should return a list numpy arrays (array - # for each model's input). - # In this tutorial, each time the representative dataset is called it returns a list containing a single - # MNIST image of shape (1, 28, 28, 1). - def _generator(): - for _ind in range(num_calibration_iterations): - yield [_images[_ind][np.newaxis, ...]] - return _generator - - -def argument_handler(): - parser = argparse.ArgumentParser() - parser.add_argument('--batch_size', type=int, default=128, - help='batch size for model training.') - parser.add_argument('--num_epochs', type=int, default=15, - help='number of epochs for model training.') - parser.add_argument('--num_calibration_iterations', type=int, default=10, - help='number of iterations for calibration - model quantization before fine-tuning.') - return parser.parse_args() - - -if __name__ == "__main__": - """ - The code below is an example code of a user for fine tuning a float model with the MCT Quantization - Aware Training API. 
- """ - - # Parse arguments - args = argument_handler() - - # init parameters - num_classes = 10 - input_shape = (28, 28, 1) - batch_size = args.batch_size - epochs = args.num_epochs - - # init model - model = get_model(num_classes, input_shape) - model.summary() - - # init dataset - x_train, y_train, x_test, y_test = get_dataset(num_classes) - - # train float model - model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"]) - model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=0.2) - - # evaluate float model - score = model.evaluate(x_test, y_test, verbose=0) - print(f"Float model test accuracy: {score[1]:02.4f}") - - # prepare a representative dataset callable from the MNIST training images for calibrating the initial - # quantization parameters by the MCT. - representative_dataset = gen_representative_dataset(x_train, args.num_calibration_iterations) - - # prepare model for QAT with MCT and return to user for fine-tuning. Due to the relatively easy - # task of quantizing model trained on MNIST, a custom TPC is used in this example to demonstrate - # the degradation caused by post training quantization. - qat_model, _, _ = mct.qat.keras_quantization_aware_training_init_experimental(model, representative_dataset, - core_config=mct.core.CoreConfig(), - target_platform_capabilities=get_tpc()) - - # Evaluate QAT-ready model accuracy from MCT. This model is fully quantized with QuantizeWrappers - # for weights and tf.quantization.fake_quant_with_min_max_vars for activations - qat_model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"], run_eagerly=True) - score = qat_model.evaluate(x_test, y_test, verbose=0) - print(f"PTQ model test accuracy: {score[1]:02.4f}") - - # fine-tune QAT model from MCT to recover the lost accuracy. - qat_model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=0.2) - - # Evaluate accuracy after fine-tuning. 
- score = qat_model.evaluate(x_test, y_test, verbose=0) - print(f"QAT model test accuracy: {score[1]:02.4f}") - - # Finalize QAT model: remove QuantizeWrappers and keep weights quantized as fake-quant values - quantized_model = mct.qat.keras_quantization_aware_training_finalize_experimental(qat_model) - - # Re-evaluate accuracy after finalizing the model (should have the same accuracy as QAT model - # after fine-tuning. Accuracy should be the same as before calling the finalize function. - quantized_model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"]) - score = quantized_model.evaluate(x_test, y_test, verbose=0) - print(f"Quantized model test accuracy: {score[1]:02.4f}") - - # Export quantized model to Keras. - # For more details please see: https://github.com/sony/model_optimization/blob/main/model_compression_toolkit/exporter/README.md - _, keras_file_path = tempfile.mkstemp('.keras') # Path of exported model - mct.exporter.keras_export_model(model=quantized_model, - save_model_path=keras_file_path) diff --git a/tutorials/notebooks/pytorch/ptq/example_pytorch_mobilenetv2_cifar100_mixed_precision.py b/tutorials/notebooks/pytorch/ptq/example_pytorch_mobilenetv2_cifar100_mixed_precision.py deleted file mode 100644 index 57c0ae9a1..000000000 --- a/tutorials/notebooks/pytorch/ptq/example_pytorch_mobilenetv2_cifar100_mixed_precision.py +++ /dev/null @@ -1,260 +0,0 @@ -# Copyright 2023 Sony Semiconductor Israel, Inc. All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-# See the License for the specific language governing permissions and -# limitations under the License. -# ============================================================================== - -# This tutorial demonstrates how to retrain and quantize MobileNetV2 on CIFAR100 using -# mixed-precision quantization. First we fine-tune a pretrained MobileNetV2 on ImageNet, then -# we use MCT post-training-quantization to compress the weights to 0.75 compression ratio. - -import argparse -import copy -import tempfile -import random - -import torch -import torchvision -from torch import nn, optim -from torchvision import transforms -from tqdm import tqdm - -import model_compression_toolkit as mct -import numpy as np - -def seed_everything(seed_value): - random.seed(seed_value) - np.random.seed(seed_value) - torch.manual_seed(seed_value) - torch.cuda.manual_seed_all(seed_value) - torch.backends.cudnn.deterministic = True - torch.backends.cudnn.benchmark = False - -def argument_handler(): - parser = argparse.ArgumentParser() - parser.add_argument('--retrain_num_epochs', type=int, default=20, - help='Number of epochs for the retraining phase.') - parser.add_argument('--eval_batch_size', type=int, default=32, - help='Batch size for evaluation.') - parser.add_argument('--retrain_batch_size', type=int, default=32, - help='Batch size for retraining.') - parser.add_argument('--retrain_lr', type=float, default=0.001, - help='Learning rate to use during retraining.') - parser.add_argument('--retrain_momentum', type=float, default=0.9, - help='SGD momentum to use during retraining.') - - parser.add_argument('--representative_dataset_dir', type=str, default='./data', - help='Folder path to save the representative dataset.') - parser.add_argument('--ptq_batch_size', type=int, default=50, - help='Batch size for the representative data during PTQ calibration.') - parser.add_argument('--num_calibration_iterations', type=int, default=10, - help='Number of iterations for calibration.') - 
parser.add_argument('--weights_compression_ratio', type=float, default=0.75, - help='Weights compression ratio to use for mixed-precision quantization.') - parser.add_argument('--mixed_precision_num_of_images', type=int, default=32, - help='Number of images to use for mixed-precision configuration search.') - parser.add_argument('--enable_mixed_precision_gradients_weighting', action='store_true', default=False, - help='Whether to use gradients during mixed-precision configuration search or not.') - parser.add_argument('--seed', type=int, default=0, - help='Seed to set for randomness.') - - return parser.parse_args() - - - -def get_cifar100_trainloader(dataset_folder, transform, train_batch_size): - """ - Get CIFAR100 train loader. - """ - trainset = torchvision.datasets.CIFAR100(root=dataset_folder, train=True, download=True, transform=transform) - trainloader = torch.utils.data.DataLoader(trainset, batch_size=train_batch_size, shuffle=True) - return trainloader - - -def get_cifar100_testloader(dataset_folder, transform, eval_batch_size): - """ - Get CIFAR100 test loader. - """ - testset = torchvision.datasets.CIFAR100(root=dataset_folder, train=False, download=True, transform=transform) - testloader = torch.utils.data.DataLoader(testset, batch_size=eval_batch_size, shuffle=False) - return testloader - - -def evaluate(model, testloader, device): - """ - Evaluate a model using testloader. - - Args: - model: Model to evaluate. - testloader: Test loader to use for evaluation. - device: Device to run evaluation on. - - Returns: - Valuation accuracy. 
- - """ - model.to(device) - model.eval() # Set the model to evaluation mode - correct = 0 - total = 0 - with torch.no_grad(): - for data in testloader: - images, labels = data - images, labels = images.to(device), labels.to(device) - outputs = model(images) - _, predicted = torch.max(outputs.data, 1) - total += labels.size(0) - correct += (predicted == labels).sum().item() - val_acc = (100 * correct / total) - print('Accuracy: %.2f%%' % val_acc) - return val_acc - - -def retrain(model, transform, device, args): - trainloader = get_cifar100_trainloader(args.representative_dataset_dir, - transform, - args.retrain_batch_size) - - testloader = get_cifar100_testloader(args.representative_dataset_dir, - transform, - args.eval_batch_size) - - model.to(device) - - # Define loss function and optimizer - criterion = nn.CrossEntropyLoss() - optimizer = optim.SGD(model.parameters(), - lr=args.retrain_lr, - momentum=args.retrain_momentum) - - best_acc = 0.0 - # Training loop - for epoch in range(args.retrain_num_epochs): - prog_bar = tqdm(enumerate(trainloader), - total=len(trainloader), - leave=True) - - print(f'Retrain epoch: {epoch}') - for i, data in prog_bar: - inputs, labels = data - inputs, labels = inputs.to(device), labels.to(device) - - # Zero the parameter gradients - optimizer.zero_grad() - - # Forward, backward, and update parameters - outputs = model(inputs) - loss = criterion(outputs, labels) - loss.backward() - optimizer.step() - - val_acc = evaluate(model, testloader, device) - - # Check if this model has the best accuracy, and if so, save it - if val_acc > best_acc: - print(f'Best accuracy so far {val_acc}') - best_acc = val_acc - best_state_dict = copy.deepcopy(model.state_dict()) - - model.load_state_dict(best_state_dict) - return model - - -if __name__ == '__main__': - - # Parse arguments - args = argument_handler() - - seed_everything(args.seed) - - # Load pretrained MobileNetV2 model on ImageNet - model = torchvision.models.mobilenet_v2(pretrained=True) 
- - # Modify last layer to match CIFAR-100 classes - model.classifier[1] = nn.Linear(model.last_channel, 100) - - # Create preprocessing pipeline for training and evaluation - transform = transforms.Compose([ - transforms.Resize((224, 224)), # Resize images to fit MobileNetV2 input - transforms.ToTensor(), - transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]) # Normalize inputs to range [-1, 1] - - # If GPU available, move the model to GPU - device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu") - - # Fine-tune the model to adapt to CIFAR100 - model = retrain(model, - transform, - device, - args) - - # Evaluate the retrained model - testloader = get_cifar100_testloader(args.representative_dataset_dir, - transform, - args.eval_batch_size) - evaluate(model, testloader, device) - - # Create representative_data_gen function from the train dataset - trainloader = get_cifar100_trainloader(args.representative_dataset_dir, - transform, - args.retrain_batch_size) - - - def representative_data_gen() -> list: - for _ in range(args.num_calibration_iterations): - yield [next(iter(trainloader))[0]] - - - # Get a TargetPlatformCapabilities object that models the hardware for the quantized model inference. - # Here, for example, we use the default platform that is attached to a Pytorch layers representation. - target_platform_cap = mct.get_target_platform_capabilities('pytorch', 'default') - - # Create a mixed-precision quantization configuration with possible mixed-precision search options. - # MCT will search a mixed-precision configuration (namely, bit-width for each layer) - # and quantize the model according to this configuration. 
- # The candidates bit-width for quantization should be defined in the target platform model: - configuration = mct.core.CoreConfig(mixed_precision_config=mct.core.MixedPrecisionQuantizationConfig( - num_of_images=args.mixed_precision_num_of_images, - use_hessian_based_scores=args.enable_mixed_precision_gradients_weighting)) - - # Get KPI information to constraint your model's memory size. - # Retrieve a KPI object with helpful information of each KPI metric, - # to constraint the quantized model to the desired memory size. - kpi_data = mct.core.pytorch_kpi_data(model, representative_data_gen, configuration, - target_platform_capabilities=target_platform_cap) - - # Set a constraint for each of the KPI metrics. - # Create a KPI object to limit our returned model's size. Note that this values affects only layers and attributes - # that should be quantized (for example, the kernel of Conv2D in Pytorch will be affected by this value, - # while the bias will not) - # examples: - # weights_compression_ratio = 0.75 - About 0.75 of the model's weights memory size when quantized with 8 bits. 
- kpi = mct.core.KPI(kpi_data.weights_memory * args.weights_compression_ratio) - - # It is also possible to constraint only part of the KPI metric, e.g., by providing only weights_memory target - # in the past KPI object, e.g., kpi = mct.core.KPI(kpi_data.weights_memory * 0.75) - quantized_model, quantization_info = mct.ptq.pytorch_post_training_quantization(model, - representative_data_gen, - target_kpi=kpi, - core_config=configuration, - target_platform_capabilities=target_platform_cap) - # Finally, we evaluate the quantized model: - print(f'Evaluating quantized model') - evaluate(quantized_model, - testloader, - device) - - # Export quantized model to ONNX - _, onnx_file_path = tempfile.mkstemp('.onnx') # Path of exported model - mct.exporter.pytorch_export_model(model=quantized_model, - save_model_path=onnx_file_path, - repr_dataset=representative_data_gen) From baba97f055cf0249d20289d2a3feed71ead67156 Mon Sep 17 00:00:00 2001 From: idanb Date: Tue, 19 Mar 2024 16:30:43 +0200 Subject: [PATCH 2/7] Remove uneeded *py tutorials and edit the tutorials readm --- ...le_keras_mobilenet_gptq_mixed_precision.py | 160 ------------ .../keras/pruning/example_keras_pruning.py | 98 -------- ...ple_keras_mobilenet_mixed_precision_lut.py | 154 ------------ .../pruning/example_pytorch_pruning.py | 107 -------- ...ample_pytorch_mobilenet_mixed_precision.py | 138 ----------- ...e_pytorch_mobilenet_mixed_precision_lut.py | 135 ---------- .../ptq/example_pytorch_mobilenet_v2.py | 128 ---------- .../pytorch/qat/example_pytorch_qat.py | 233 ------------------ 8 files changed, 1153 deletions(-) delete mode 100644 tutorials/notebooks/keras/gptq/example_keras_mobilenet_gptq_mixed_precision.py delete mode 100644 tutorials/notebooks/keras/pruning/example_keras_pruning.py delete mode 100644 tutorials/notebooks/keras/ptq/example_keras_mobilenet_mixed_precision_lut.py delete mode 100644 tutorials/notebooks/pytorch/pruning/example_pytorch_pruning.py delete mode 100644 
tutorials/notebooks/pytorch/ptq/example_pytorch_mobilenet_mixed_precision.py delete mode 100644 tutorials/notebooks/pytorch/ptq/example_pytorch_mobilenet_mixed_precision_lut.py delete mode 100644 tutorials/notebooks/pytorch/ptq/example_pytorch_mobilenet_v2.py delete mode 100644 tutorials/notebooks/pytorch/qat/example_pytorch_qat.py diff --git a/tutorials/notebooks/keras/gptq/example_keras_mobilenet_gptq_mixed_precision.py b/tutorials/notebooks/keras/gptq/example_keras_mobilenet_gptq_mixed_precision.py deleted file mode 100644 index cf0521225..000000000 --- a/tutorials/notebooks/keras/gptq/example_keras_mobilenet_gptq_mixed_precision.py +++ /dev/null @@ -1,160 +0,0 @@ -# Copyright 2022 Sony Semiconductor Israel, Inc. All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# ============================================================================== - -import argparse - -from keras.applications.mobilenet_v2 import MobileNetV2 - -import model_compression_toolkit as mct -import tempfile - -""" -This tutorial demonstrates how a model (more specifically, MobileNetV2) can be -quantized and optimized using the Model Compression Toolkit (MCT) with -mixed-precision quantization and GPTQ (gradient-based PTQ). 
-""" - -#################################### -# Preprocessing images -#################################### -import cv2 -import numpy as np - -MEAN = 127.5 -STD = 127.5 -RESIZE_SCALE = 256 / 224 -SIZE = 224 - - -def resize(x): - resize_side = max(RESIZE_SCALE * SIZE / x.shape[0], RESIZE_SCALE * SIZE / x.shape[1]) - height_tag = int(np.round(resize_side * x.shape[0])) - width_tag = int(np.round(resize_side * x.shape[1])) - resized_img = cv2.resize(x, (width_tag, height_tag)) - offset_height = int((height_tag - SIZE) / 2) - offset_width = int((width_tag - SIZE) / 2) - cropped_img = resized_img[offset_height:offset_height + SIZE, offset_width:offset_width + SIZE] - return cropped_img - - -def normalization(x): - return (x - MEAN) / STD - - -def argument_handler(): - parser = argparse.ArgumentParser() - parser.add_argument('--representative_dataset_dir', type=str, required=True, default=None, - help='folder path for the representative dataset.') - parser.add_argument('--batch_size', type=int, default=50, - help='batch size for the representative data.') - parser.add_argument('--num_calibration_iterations', type=int, default=10, - help='number of iterations for calibration.') - parser.add_argument('--num_gptq_training_iterations', type=int, default=5000, - help='number of iterations for gptq training.') - parser.add_argument('--mixed_precision_num_of_images', type=int, default=32, - help='number of images to use for mixed-precision configuration search.') - parser.add_argument('--enable_mixed_precision_gradients_weighting', action='store_true', default=False, - help='Whether to use gradients during mixed-precision configuration search or not.') - parser.add_argument('--enable_gptq_hessian_based_weights', action='store_true', default=False, - help='Whether to use Hessian-based weights for weighted average loss during GPTQ.') - parser.add_argument('--weights_compression_ratio', type=float, default=0.75, - help='weights compression ratio for model memory size reduction.') - 
return parser.parse_args() - - -if __name__ == '__main__': - - # Parse arguments - args = argument_handler() - - # Set the batch size of the images at each calibration iteration. - batch_size = args.batch_size - - # Set the path to the folder of images to load and use for the representative dataset. - # Notice that the folder have to contain at least one image. - folder = args.representative_dataset_dir - - # Create a representative data generator, which returns a list of images. - # The images can be preprocessed using a list of preprocessing functions. - image_data_loader = mct.core.FolderImageLoader(folder, - preprocessing=[resize, normalization], - batch_size=batch_size) - - - # Create a Callable representative dataset for calibration purposes. - # The function should be called without any arguments, and should return a list numpy arrays (array for each - # model's input). - # For example: A model has two input tensors - one with input shape of [32 X 32 X 3] and the second with - # an input shape of [224 X 224 X 3]. We calibrate the model using batches of 20 images. - # Calling representative_data_gen() should return a list - # of two numpy.ndarray objects where the arrays' shapes are [(20, 3, 32, 32), (20, 3, 224, 224)]. - def representative_data_gen() -> list: - for _ in range(args.num_calibration_iterations): - yield [image_data_loader.sample()] - - - # Get a TargetPlatformModel object that models the hardware for the quantized model inference. - # The model determines the quantization methods to use during the MCT optimization process. - # Here, for example, we use the default target platform model that is attached to a Tensorflow - # layers representation. - target_platform_cap = mct.get_target_platform_capabilities('tensorflow', 'default') - - # Create a model - model = MobileNetV2() - - # Create a mixed-precision quantization configuration. 
- mixed_precision_config = mct.core.MixedPrecisionQuantizationConfig(num_of_images=args.mixed_precision_num_of_images, - use_hessian_based_scores=args.enable_mixed_precision_gradients_weighting) - - # Create a core quantization configuration, set the mixed-precision configuration, - # and set the number of calibration iterations. - config = mct.core.CoreConfig(mixed_precision_config=mixed_precision_config) - - # Get KPI information to constraint your model's memory size. - # Retrieve a KPI object with helpful information of each KPI metric, - # to constraint the quantized model to the desired memory size. - kpi_data = mct.core.keras_kpi_data(model, representative_data_gen, config, - target_platform_capabilities=target_platform_cap) - - # Set a constraint for each of the KPI metrics. - # Create a KPI object to limit our returned model's size. Note that this values affects only layers and attributes - # that should be quantized (for example, the kernel of Conv2D in Keras will be affected by this value, - # while the bias will not) - # examples: - # weights_compression_ratio = 0.75 - About 0.75 of the model's weights memory size when quantized with 8 bits. - kpi = mct.core.KPI(kpi_data.weights_memory * args.weights_compression_ratio) - - # Create a GPTQ quantization configuration and set the number of training iterations. - gptq_config = mct.gptq.get_keras_gptq_config(n_epochs=args.num_gptq_training_iterations, - use_hessian_based_weights=args.enable_gptq_hessian_based_weights) - - quantized_model, quantization_info = mct.gptq.keras_gradient_post_training_quantization(model, - representative_data_gen, - gptq_config=gptq_config, - core_config=config, - target_platform_capabilities=target_platform_cap, - target_kpi=kpi) - - # Export quantized model to TFLite and Keras. 
- # For more details please see: https://github.com/sony/model_optimization/blob/main/model_compression_toolkit/exporter/README.md - _, tflite_file_path = tempfile.mkstemp('.tflite') # Path of exported model - mct.exporter.keras_export_model(model=quantized_model, - save_model_path=tflite_file_path, - serialization_format=mct.exporter.KerasExportSerializationFormat.TFLITE, - quantization_format=mct.exporter.QuantizationFormat.FAKELY_QUANT) - - _, keras_file_path = tempfile.mkstemp('.h5') # Path of exported model - mct.exporter.keras_export_model(model=quantized_model, - save_model_path=keras_file_path) diff --git a/tutorials/notebooks/keras/pruning/example_keras_pruning.py b/tutorials/notebooks/keras/pruning/example_keras_pruning.py deleted file mode 100644 index a4ad6a00f..000000000 --- a/tutorials/notebooks/keras/pruning/example_keras_pruning.py +++ /dev/null @@ -1,98 +0,0 @@ -# Copyright 2023 Sony Semiconductor Israel, Inc. All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
-# ============================================================================== - -import argparse -import keras.models - -from keras.applications.resnet50 import ResNet50 -import tensorflow as tf - -import model_compression_toolkit as mct -import tempfile -import numpy as np -import cv2 - - -RESIZE_SCALE = 256 / 224 -SIZE = 224 - -def resize(x): - resize_side = max(RESIZE_SCALE * SIZE / x.shape[0], RESIZE_SCALE * SIZE / x.shape[1]) - height_tag = int(np.round(resize_side * x.shape[0])) - width_tag = int(np.round(resize_side * x.shape[1])) - resized_img = cv2.resize(x, (width_tag, height_tag)) - offset_height = int((height_tag - SIZE) / 2) - offset_width = int((width_tag - SIZE) / 2) - cropped_img = resized_img[offset_height:offset_height + SIZE, offset_width:offset_width + SIZE] - return cropped_img - - -def count_model_params(model: keras.models.Model) -> int: - # Function to count the total number of parameters in a given Keras model. - return sum([l.count_params() for l in model.layers]) - -def argument_handler(): - parser = argparse.ArgumentParser() - parser.add_argument('--representative_dataset_dir', type=str, help='Folder path for the representative dataset.') - parser.add_argument('--batch_size', type=int, default=50, help='Batch size for the representative data.') - parser.add_argument('--num_score_approximations', type=int, default=32, - help='Number of scores to estimate the importance of each channel.') - parser.add_argument('--compression_rate', type=float, help='Compression rate to remove from the dense model.') - - return parser.parse_args() - - -if __name__ == '__main__': - args = argument_handler() - - # Create a function to generate representative data used for channels importance approximation. 
- image_data_loader = mct.core.FolderImageLoader(args.representative_dataset_dir, - preprocessing=[resize, - tf.keras.applications.resnet50.preprocess_input], - batch_size=args.batch_size) - - def representative_data_gen() -> list: - yield [image_data_loader.sample()] - - - # Retrieve the target platform capabilities which include the SIMD size configuration for each layer. - target_platform_cap = mct.get_target_platform_capabilities('tensorflow', - 'default') - - # Load a dense ResNet50 model for pruning. Compute the number of params to - # initialize the KPI to constraint the memory footprint of the pruned model's weights. - dense_model = ResNet50() - dense_nparams = count_model_params(dense_model) - print(f"Model has {dense_nparams} parameters.") - kpi = mct.core.KPI(weights_memory=dense_nparams * 4 * args.compression_rate) - - # Create PruningConfig with the number of approximations MCT will compute as importance metric - # for each channel when using LFH metric to set scores for each output channel that can be removed. - pruning_config = mct.pruning.PruningConfig(num_score_approximations=args.num_score_approximations) - - # Prune the model. - pruned_model, pruning_info = mct.pruning.keras_pruning_experimental(model=dense_model, - target_kpi=kpi, - representative_data_gen=representative_data_gen, - target_platform_capabilities=target_platform_cap, - pruning_config=pruning_config) - - # Count number of params in the pruned model and save it. 
- pruned_nparams = count_model_params(pruned_model) - print(f"Pruned model has {pruned_nparams} parameters.") - _, keras_file_path = tempfile.mkstemp('.keras') - print(f"Saving pruned model: {keras_file_path}") - keras.models.save_model(pruned_model, keras_file_path) - diff --git a/tutorials/notebooks/keras/ptq/example_keras_mobilenet_mixed_precision_lut.py b/tutorials/notebooks/keras/ptq/example_keras_mobilenet_mixed_precision_lut.py deleted file mode 100644 index efcf0f41f..000000000 --- a/tutorials/notebooks/keras/ptq/example_keras_mobilenet_mixed_precision_lut.py +++ /dev/null @@ -1,154 +0,0 @@ -# Copyright 2022 Sony Semiconductor Israel, Inc. All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# ============================================================================== - -import argparse - -from tensorflow.keras.applications.mobilenet_v2 import MobileNetV2 - -import model_compression_toolkit as mct -import tempfile - -""" -Mixed precision is a method for quantizing a model using different bit widths -for different layers of the model. -This tutorial demonstrates how to use mixed-precision in MCT to -quantize MobileNetV2 weights, using non-uniform, -lookup table-based quantizer for low precision quantization (2 and 4 bits) -MCT supports non-uniform mixed-precision for weights quantization only. -In this example, activations are quantized with fixed 8-bit precision. 
-""" - -#################################### -# Preprocessing images -#################################### -import cv2 -import numpy as np - -MEAN = 127.5 -STD = 127.5 -RESIZE_SCALE = 256 / 224 -SIZE = 224 - - -def resize(x): - resize_side = max(RESIZE_SCALE * SIZE / x.shape[0], RESIZE_SCALE * SIZE / x.shape[1]) - height_tag = int(np.round(resize_side * x.shape[0])) - width_tag = int(np.round(resize_side * x.shape[1])) - resized_img = cv2.resize(x, (width_tag, height_tag)) - offset_height = int((height_tag - SIZE) / 2) - offset_width = int((width_tag - SIZE) / 2) - cropped_img = resized_img[offset_height:offset_height + SIZE, offset_width:offset_width + SIZE] - return cropped_img - - -def normalization(x): - return (x - MEAN) / STD - - -def argument_handler(): - parser = argparse.ArgumentParser() - parser.add_argument('--representative_dataset_dir', type=str, required=True, default=None, - help='folder path for the representative dataset.') - parser.add_argument('--batch_size', type=int, default=50, - help='batch size for the representative data.') - parser.add_argument('--num_calibration_iterations', type=int, default=10, - help='number of iterations for calibration.') - parser.add_argument('--weights_compression_ratio', type=float, default=0.4, - help='weights compression ratio.') - parser.add_argument('--mixed_precision_num_of_images', type=int, default=32, - help='number of images to use for mixed-precision configuration search.') - parser.add_argument('--enable_mixed_precision_gradients_weighting', action='store_true', default=False, - help='Whether to use gradients during mixed-precision configuration search or not.') - - return parser.parse_args() - - -if __name__ == '__main__': - - # Parse arguments - args = argument_handler() - - # Set the batch size of the images at each calibration iteration. - batch_size = args.batch_size - - # Set the path to the folder of images to load and use for the representative dataset. 
- # Notice that the folder have to contain at least one image. - folder = args.representative_dataset_dir - - # Create a representative data generator, which returns a list of images. - # The images can be preprocessed using a list of preprocessing functions. - image_data_loader = mct.core.FolderImageLoader(folder, - preprocessing=[resize, normalization], - batch_size=batch_size) - - # Create a Callable representative dataset for calibration purposes. - # The function should be called without any arguments, and should return a list numpy arrays (array for each - # model's input). - # For example: if the model has two input tensors - one with input shape of 32X32X3 and the second with input - # shape of 224X224X3, and we calibrate the model using batches of 20 images, - # calling representative_data_gen() should return a list - # of two numpy.ndarray objects where the arrays' shapes are [(20, 32, 32, 3), (20, 224, 224, 3)]. - def representative_data_gen() -> list: - for _ in range(args.num_calibration_iterations): - yield [image_data_loader.sample()] - - # Create a model to quantize. - model = MobileNetV2() - - # Create a mixed-precision quantization configuration with possible mixed-precision search options. - # MCT will search a mixed-precision configuration (namely, bit-width for each layer) - # and quantize the model according to this configuration. - # The candidates bit-width for quantization should be defined in the target platform model: - configuration = mct.core.CoreConfig(mixed_precision_config=mct.core.MixedPrecisionQuantizationConfig(num_of_images=args.mixed_precision_num_of_images, - use_hessian_based_scores=args.enable_mixed_precision_gradients_weighting)) - - # Get a TargetPlatformCapabilities object that models the hardware for the quantized model inference. - # In this example, we use a pre-defined platform that allows us to set a non-uniform (LUT) quantizer - # for low precision weights candidates. 
- # The used platform is attached to a Tensorflow layers representation. - target_platform_cap = mct.get_target_platform_capabilities('tensorflow', 'default', 'v1_lut') - - # Get KPI information to constraint your model's memory size. - # Retrieve a KPI object with helpful information of each KPI metric, - # to constraint the quantized model to the desired memory size. - kpi_data = mct.core.keras_kpi_data(model, representative_data_gen, configuration, - target_platform_capabilities=target_platform_cap) - - # Set a constraint for each of the KPI metrics. - # Create a KPI object to limit our returned model's size. Note that this values affects only layers and attributes - # that should be quantized (for example, the kernel of Conv2D in Keras will be affected by this value, - # while the bias will not): - # examples: - # weights_compression_ratio = 0.4 - About 0.4 of the model's weights memory size when quantized with 8 bits. - kpi = mct.core.KPI(kpi_data.weights_memory * args.weights_compression_ratio) - # Note that in this example, activations are quantized with fixed bit-width (non mixed-precision) of 8-bit. - - quantized_model, quantization_info = mct.ptq.keras_post_training_quantization(model, - representative_data_gen, - target_kpi=kpi, - core_config=configuration, - target_platform_capabilities=target_platform_cap) - - # Export quantized model to TFLite and Keras. 
- # For more details please see: https://github.com/sony/model_optimization/blob/main/model_compression_toolkit/exporter/README.md - _, tflite_file_path = tempfile.mkstemp('.tflite') # Path of exported model - mct.exporter.keras_export_model(model=quantized_model, - save_model_path=tflite_file_path, - serialization_format=mct.exporter.KerasExportSerializationFormat.TFLITE, - quantization_format=mct.exporter.QuantizationFormat.FAKELY_QUANT) - - _, keras_file_path = tempfile.mkstemp('.h5') # Path of exported model - mct.exporter.keras_export_model(model=quantized_model, - save_model_path=keras_file_path) \ No newline at end of file diff --git a/tutorials/notebooks/pytorch/pruning/example_pytorch_pruning.py b/tutorials/notebooks/pytorch/pruning/example_pytorch_pruning.py deleted file mode 100644 index b26968982..000000000 --- a/tutorials/notebooks/pytorch/pruning/example_pytorch_pruning.py +++ /dev/null @@ -1,107 +0,0 @@ -# Copyright 2024 Sony Semiconductor Israel, Inc. All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# ============================================================================== - -""" -This tutorial demonstrates how a model (more specifically, MobileNetV2) can be -quantized and optimized using the Model Compression Toolkit (MCT). 
-""" - -import argparse - -import torch.nn -from torchvision.models import resnet50, ResNet50_Weights -import model_compression_toolkit as mct - -from PIL import Image -from torchvision import transforms -import tempfile - -def np_to_pil(img): - return Image.fromarray(img) - -def count_model_params(model: torch.nn.Module) -> int: - # Function to count the total number of parameters in a given Pytorch model. - return sum(p.numel() for p in model.state_dict().values()) - -def argument_handler(): - parser = argparse.ArgumentParser() - parser.add_argument("--representative_dataset_dir", type=str, required=True, default=None, - help="folder path for the representative dataset.", ) - parser.add_argument("--batch_size", type=int, default=50, help="batch size for the representative data.", ) - parser.add_argument("--num_calibration_iterations", type=int, default=10, - help="number of iterations for calibration.", ) - parser.add_argument('--num_score_approximations', type=int, default=32, - help='Number of scores to estimate the importance of each channel.') - parser.add_argument('--compression_rate', type=float, help='Compression rate to remove from the dense model.') - - return parser.parse_args() - - -if __name__ == "__main__": - - # Parse arguments - args = argument_handler() - - # Create a function to generate representative data used for channels importance approximation. - image_data_loader = mct.core.FolderImageLoader(args.representative_dataset_dir, - preprocessing=[np_to_pil, - transforms.Compose( - [transforms.Resize(256), - transforms.CenterCrop(224), - transforms.ToTensor(), - transforms.Normalize(mean=[0.485, 0.456, 0.406], - std=[0.229, 0.224, 0.225])])], - batch_size=args.batch_size, - ) - - - def representative_data_gen() -> list: - for _ in range(args.num_calibration_iterations): - yield [image_data_loader.sample()] - - - # Retrieve the target platform capabilities which include the SIMD size configuration for each layer. 
- target_platform_cap = mct.get_target_platform_capabilities("pytorch", "default") - - # Load a dense ResNet50 model for pruning. Compute the number of params to - # initialize the KPI to constraint the memory footprint of the pruned model's weights. - dense_model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V1.DEFAULT) - dense_nparams = count_model_params(dense_model) - print(f"The model has {dense_nparams} parameters.") - - # Define KPI for pruning. Each float32 parameter requires 4 bytes, - # hence we multiply the total parameter count by 4 to calculate the memory footprint. - kpi = mct.core.KPI(weights_memory=dense_nparams * 4 * args.compression_rate) - - # Create PruningConfig with the number of approximations MCT will compute as importance metric - # for each channel when using LFH metric to set scores for each output channel that can be removed. - pruning_config = mct.pruning.PruningConfig(num_score_approximations=args.num_score_approximations) - - # Prune the model. - pruned_model, pruning_info = mct.pruning.pytorch_pruning_experimental(model=dense_model, - target_kpi=kpi, - representative_data_gen=representative_data_gen, - target_platform_capabilities=target_platform_cap, - pruning_config=pruning_config) - - # Count number of params in the pruned model and save it. 
- pruned_nparams = count_model_params(pruned_model) - print(f"The pruned model has {pruned_nparams} parameters.") - - # Export quantized model to ONNX - _, onnx_file_path = tempfile.mkstemp('.onnx') # Path of exported model - mct.exporter.pytorch_export_model(model=pruned_model, - save_model_path=onnx_file_path, - repr_dataset=representative_data_gen) diff --git a/tutorials/notebooks/pytorch/ptq/example_pytorch_mobilenet_mixed_precision.py b/tutorials/notebooks/pytorch/ptq/example_pytorch_mobilenet_mixed_precision.py deleted file mode 100644 index a5bf0ff02..000000000 --- a/tutorials/notebooks/pytorch/ptq/example_pytorch_mobilenet_mixed_precision.py +++ /dev/null @@ -1,138 +0,0 @@ -# Copyright 2022 Sony Semiconductor Israel, Inc. All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# ============================================================================== - -import argparse - -from PIL import Image -from torchvision import transforms -from torchvision.models import mobilenet_v2 - -import model_compression_toolkit as mct -import tempfile - -""" -Mixed precision is a method for quantizing a model using different bit widths -for different layers of the model. -This tutorial demonstrates how to use mixed-precision in MCT to -quantize MobileNetV2. -For now, MCT supports mixed-precision for both weights and activation. 
-""" - -#################################### -# Preprocessing images -#################################### - - -def np_to_pil(img): - return Image.fromarray(img) - - -def argument_handler(): - parser = argparse.ArgumentParser() - parser.add_argument('--representative_dataset_dir', type=str, required=True, default=None, - help='folder path for the representative dataset.') - parser.add_argument('--batch_size', type=int, default=50, - help='batch size for the representative data.') - parser.add_argument('--num_calibration_iterations', type=int, default=10, - help='number of iterations for calibration.') - parser.add_argument('--weights_compression_ratio', type=float, default=0.75, - help='weights compression ratio.') - parser.add_argument('--mixed_precision_num_of_images', type=int, default=32, - help='number of images to use for mixed-precision configuration search.') - parser.add_argument('--enable_mixed_precision_gradients_weighting', action='store_true', default=False, - help='Whether to use gradients during mixed-precision configuration search or not.') - - return parser.parse_args() - - -if __name__ == '__main__': - - # Parse arguments - args = argument_handler() - - # Set the batch size of the images at each calibration iteration. - batch_size = args.batch_size - - # Set the path to the folder of images to load and use for the representative dataset. - # Notice that the folder have to contain at least one image. - folder = args.representative_dataset_dir - - # Create a representative data generator, which returns a list of images. - # The images can be preprocessed using a list of preprocessing functions. 
- image_data_loader = mct.core.FolderImageLoader(folder, - preprocessing=[np_to_pil, - transforms.Compose([ - transforms.Resize(256), - transforms.CenterCrop(224), - transforms.ToTensor(), - transforms.Normalize(mean=[0.485, 0.456, 0.406], - std=[0.229, 0.224, 0.225]), - ]) - ], - batch_size=batch_size) - - # Create a Callable representative dataset for calibration purposes. - # The function should be called without any arguments, and should return a list numpy arrays (array for each - # model's input). - # For example: if the model has two input tensors - one with input shape of 32X32X3 and the second with input - # shape of 224X224X3, and we calibrate the model using batches of 20 images, - # calling representative_data_gen() should return a list - # of two numpy.ndarray objects where the arrays' shapes are [(20, 32, 32, 3), (20, 224, 224, 3)]. - def representative_data_gen() -> list: - for _ in range(args.num_calibration_iterations): - yield [image_data_loader.sample()] - - # Create a model to quantize. - model = mobilenet_v2() - - # Create a mixed-precision quantization configuration with possible mixed-precision search options. - # MCT will search a mixed-precision configuration (namely, bit-width for each layer) - # and quantize the model according to this configuration. - # The candidates bit-width for quantization should be defined in the target platform model: - configuration = mct.core.CoreConfig(mixed_precision_config=mct.core.MixedPrecisionQuantizationConfig(num_of_images=args.mixed_precision_num_of_images, - use_hessian_based_scores=args.enable_mixed_precision_gradients_weighting)) - - # Get a TargetPlatformCapabilities object that models the hardware for the quantized model inference. - # Here, for example, we use the default platform that is attached to a Pytorch layers representation. - target_platform_cap = mct.get_target_platform_capabilities('pytorch', 'default') - - # Get KPI information to constraint your model's memory size. 
- # Retrieve a KPI object with helpful information of each KPI metric, - # to constraint the quantized model to the desired memory size. - kpi_data = mct.core.pytorch_kpi_data(model, representative_data_gen, configuration, - target_platform_capabilities=target_platform_cap) - - # Set a constraint for each of the KPI metrics. - # Create a KPI object to limit our returned model's size. Note that this values affects only layers and attributes - # that should be quantized (for example, the kernel of Conv2D in Pytorch will be affected by this value, - # while the bias will not) - # examples: - # weights_compression_ratio = 0.75 - About 0.75 of the model's weights memory size when quantized with 8 bits. - kpi = mct.core.KPI(kpi_data.weights_memory * args.weights_compression_ratio) - - # It is also possible to constraint only part of the KPI metric, e.g., by providing only weights_memory target - # in the past KPI object, e.g., kpi = mct.core.KPI(kpi_data.weights_memory * 0.75) - - quantized_model, quantization_info = mct.ptq.pytorch_post_training_quantization(model, - representative_data_gen, - target_kpi=kpi, - core_config=configuration, - target_platform_capabilities=target_platform_cap) - - # Export quantized model to ONNX - _, onnx_file_path = tempfile.mkstemp('.onnx') # Path of exported model - mct.exporter.pytorch_export_model(model=quantized_model, - save_model_path=onnx_file_path, - repr_dataset=representative_data_gen) \ No newline at end of file diff --git a/tutorials/notebooks/pytorch/ptq/example_pytorch_mobilenet_mixed_precision_lut.py b/tutorials/notebooks/pytorch/ptq/example_pytorch_mobilenet_mixed_precision_lut.py deleted file mode 100644 index 1acff8dc2..000000000 --- a/tutorials/notebooks/pytorch/ptq/example_pytorch_mobilenet_mixed_precision_lut.py +++ /dev/null @@ -1,135 +0,0 @@ -# Copyright 2022 Sony Semiconductor Israel, Inc. All rights reserved. 
-# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# ============================================================================== - -import argparse - -from PIL import Image -from torchvision import transforms -from torchvision.models import mobilenet_v2 - -import model_compression_toolkit as mct - - -""" -Mixed precision is a method for quantizing a model using different bit widths -for different layers of the model. -This tutorial demonstrates how to use mixed-precision in MCT to -quantize MobileNetV2 weights, using non-uniform, -lookup table-based quantizer for low precision quantization (2 and 4 bits) -MCT supports non-uniform mixed-precision for weights quantization only. -In this example, activations are quantized with fixed 8-bit precision. 
-""" - -#################################### -# Preprocessing images -#################################### - - -def np_to_pil(img): - return Image.fromarray(img) - - -def argument_handler(): - parser = argparse.ArgumentParser() - parser.add_argument('--representative_dataset_dir', type=str, required=True, default=None, - help='folder path for the representative dataset.') - parser.add_argument('--batch_size', type=int, default=50, - help='batch size for the representative data.') - parser.add_argument('--num_calibration_iterations', type=int, default=10, - help='number of iterations for calibration.') - parser.add_argument('--weights_compression_ratio', type=float, default=0.4, - help='weights compression ratio.') - parser.add_argument('--mixed_precision_num_of_images', type=int, default=32, - help='number of images to use for mixed-precision configuration search.') - parser.add_argument('--enable_mixed_precision_gradients_weighting', action='store_true', default=False, - help='Whether to use gradients during mixed-precision configuration search or not.') - - return parser.parse_args() - - -if __name__ == '__main__': - - # Parse arguments - args = argument_handler() - - # Set the batch size of the images at each calibration iteration. - batch_size = args.batch_size - - # Set the path to the folder of images to load and use for the representative dataset. - # Notice that the folder have to contain at least one image. - folder = args.representative_dataset_dir - - # Create a representative data generator, which returns a list of images. - # The images can be preprocessed using a list of preprocessing functions. 
- image_data_loader = mct.core.FolderImageLoader(folder, - preprocessing=[np_to_pil, - transforms.Compose([ - transforms.Resize(256), - transforms.CenterCrop(224), - transforms.ToTensor(), - transforms.Normalize(mean=[0.485, 0.456, 0.406], - std=[0.229, 0.224, 0.225]), - ]) - ], - batch_size=batch_size) - - # Create a Callable representative dataset for calibration purposes. - # The function should be called without any arguments, and should return a list numpy arrays (array for each - # model's input). - # For example: if the model has two input tensors - one with input shape of 32X32X3 and the second with input - # shape of 224X224X3, and we calibrate the model using batches of 20 images, - # calling representative_data_gen() should return a list - # of two numpy.ndarray objects where the arrays' shapes are [(20, 32, 32, 3), (20, 224, 224, 3)]. - def representative_data_gen() -> list: - for _ in range(args.num_calibration_iterations): - yield [image_data_loader.sample()] - - # Create a model to quantize. - model = mobilenet_v2() - - # Create a mixed-precision quantization configuration with possible mixed-precision search options. - # MCTwill search a mixed-precision configuration (namely, bit-width for each layer) - # and quantize the model according to this configuration. - # The candidates bit-width for quantization should be defined in the target platform model: - configuration = mct.core.CoreConfig(mixed_precision_config=mct.core.MixedPrecisionQuantizationConfig(num_of_images=args.mixed_precision_num_of_images, - use_hessian_based_scores=args.enable_mixed_precision_gradients_weighting)) - - # Get a TargetPlatformCapabilities object that models the hardware for the quantized model inference. - # In this example, we use a pre-defined platform that allows us to set a non-uniform (LUT) quantizer - # for low precision weights candidates. - # The used platform is attached to a Pytorch layers representation. 
- target_platform_cap = mct.get_target_platform_capabilities('pytorch', 'default', 'v1_lut') - - # Get KPI information to constraint your model's memory size. - # Retrieve a KPI object with helpful information of each KPI metric, - # to constraint the quantized model to the desired memory size. - kpi_data = mct.core.pytorch_kpi_data(model, representative_data_gen, configuration, - target_platform_capabilities=target_platform_cap) - - # Set a constraint for each of the KPI metrics. - # Create a KPI object to limit our returned model's size. Note that this values affects only layers and attributes - # that should be quantized (for example, the kernel of Conv2D in Pytorch will be affected by this value, - # while the bias will not) - # examples: - # weights_compression_ratio = 0.4 - About 0.4 of the model's weights memory size when quantized with 8 bits. - kpi = mct.core.KPI(kpi_data.weights_memory * args.weights_compression_ratio) - # Note that in this example, activations are quantized with fixed bit-width (non mixed-precision) of 8-bit. - - quantized_model, quantization_info = mct.ptq.pytorch_post_training_quantization(model, - representative_data_gen, - target_kpi=kpi, - core_config=configuration, - target_platform_capabilities=target_platform_cap) - diff --git a/tutorials/notebooks/pytorch/ptq/example_pytorch_mobilenet_v2.py b/tutorials/notebooks/pytorch/ptq/example_pytorch_mobilenet_v2.py deleted file mode 100644 index 77fd840f2..000000000 --- a/tutorials/notebooks/pytorch/ptq/example_pytorch_mobilenet_v2.py +++ /dev/null @@ -1,128 +0,0 @@ -# Copyright 2022 Sony Semiconductor Israel, Inc. All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# ============================================================================== - -""" -This tutorial demonstrates how a model (more specifically, MobileNetV2) can be -quantized and optimized using the Model Compression Toolkit (MCT). -""" - -import argparse -from torchvision.models import mobilenet_v2 -import model_compression_toolkit as mct - -from PIL import Image -from torchvision import transforms -import tempfile - -def np_to_pil(img): - return Image.fromarray(img) - - -def argument_handler(): - parser = argparse.ArgumentParser() - parser.add_argument( - "--representative_dataset_dir", - type=str, - required=True, - default=None, - help="folder path for the representative dataset.", - ) - parser.add_argument( - "--batch_size", - type=int, - default=50, - help="batch size for the representative data.", - ) - parser.add_argument( - "--num_calibration_iterations", - type=int, - default=10, - help="number of iterations for calibration.", - ) - parser.add_argument( - "--z_threshold", - type=int, - default=16, - help="set z threshold for outlier removal algorithm.", - ) - return parser.parse_args() - - -if __name__ == "__main__": - - # Parse arguments - args = argument_handler() - - # Set the batch size of the images at each calibration iteration. - batch_size = args.batch_size - - # Set the path to the folder of images to load and use for the representative dataset. - # Notice that the folder have to contain at least one image. - folder = args.representative_dataset_dir - - # Create a representative data generator, which returns a list of images. 
- # The images can be preprocessed using a list of preprocessing functions. - image_data_loader = mct.core.FolderImageLoader( - folder, - preprocessing=[ - np_to_pil, - transforms.Compose( - [ - transforms.Resize(256), - transforms.CenterCrop(224), - transforms.ToTensor(), - transforms.Normalize( - mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225] - ), - ] - ), - ], - batch_size=batch_size, - ) - - # Create a Callable representative dataset for calibration purposes. - # The function should be called without any arguments, and should return a list numpy arrays (array for each - # model's input). - # For example: A model has two input tensors - one with input shape of [3 X 32 X 32] and the second with - # an input shape of [3 X 224 X 224]. We calibrate the model using batches of 20 images. - # Calling representative_data_gen() should return a list - # of two numpy.ndarray objects where the arrays' shapes are [(20, 3, 32, 32), (20, 3, 224, 224)]. - def representative_data_gen() -> list: - for _ in range(args.num_calibration_iterations): - yield [image_data_loader.sample()] - - # Get a TargetPlatformModel object that models the hardware for the quantized model inference. - # The model determines the quantization methods to use during the MCT optimization process. - # Here, for example, we use the default model that is attached to a Pytorch - # layers representation. - target_platform_cap = mct.get_target_platform_capabilities("pytorch", "default") - - # Create a model and quantize it using the representative_data_gen as the calibration images. - model = mobilenet_v2(pretrained=True) - # set configuration and Configure z threshold algorithm for outlier removal. 
- core_config = mct.core.CoreConfig(quantization_config=mct.core.QuantizationConfig(z_threshold=args.z_threshold)) - # run post training quantization on the model to get the quantized model output - quantized_model, quantization_info = mct.ptq.pytorch_post_training_quantization( - in_module=model, - representative_data_gen=representative_data_gen, - core_config=core_config, - target_platform_capabilities=target_platform_cap - ) - - # Export quantized model to ONNX - _, onnx_file_path = tempfile.mkstemp('.onnx') # Path of exported model - mct.exporter.pytorch_export_model(model=quantized_model, - save_model_path=onnx_file_path, - repr_dataset=representative_data_gen) \ No newline at end of file diff --git a/tutorials/notebooks/pytorch/qat/example_pytorch_qat.py b/tutorials/notebooks/pytorch/qat/example_pytorch_qat.py deleted file mode 100644 index 286e1ca61..000000000 --- a/tutorials/notebooks/pytorch/qat/example_pytorch_qat.py +++ /dev/null @@ -1,233 +0,0 @@ -# Copyright 2023 Sony Semiconductor Israel, Inc. All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# ============================================================================== - -""" -This tutorial demonstrates how to perform Quantization Aware Training (QAT) using the Model Compression Toolkit (MCT). -We first train a simple model on MNIST dataset, then quantize the model and transform it to a QAT-ready model. 
-A QAT-ready model is a model in which certain layers are wrapped by "quantization wrappers" with requested quantizers.
-The user can now Fine-Tune the QAT-ready model. Finally, the model is finalized by the MCT which means the
-MCT replaces the "quantization wrappers" with their native layers and quantized weights.
-"""
-
-
-from __future__ import print_function
-import argparse
-import torch
-import torch.nn as nn
-import torch.nn.functional as F
-import torch.optim as optim
-from torchvision import datasets, transforms
-from torch.optim.lr_scheduler import StepLR
-import model_compression_toolkit as mct
-import tempfile
-
-
-def argument_handler():
-    parser = argparse.ArgumentParser()
-    parser.add_argument('--representative_dataset_dir', type=str, required=True, default=None,
-                        help='folder path for the representative dataset.')
-    parser.add_argument('--batch_size', type=int, default=64,
-                        help='batch size for the representative data.')
-    parser.add_argument('--num_calibration_iterations', type=int, default=10,
-                        help='number of iterations for calibration.')
-    return parser.parse_args()
-
-
-# Let us define the network and some helper functions to train and evaluate the model.
-# These are taken from the official Pytorch examples https://github.com/pytorch/examples/blob/main/mnist/main.py
-class Net(nn.Module):
-    def __init__(self):
-        super(Net, self).__init__()
-        self.conv1 = nn.Conv2d(1, 32, 3, 1)
-        self.conv2 = nn.Conv2d(32, 64, 3, 1)
-        self.dropout1 = nn.Dropout(0.25)
-        self.dropout2 = nn.Dropout(0.5)
-        self.fc1 = nn.Linear(9216, 128)
-        self.fc2 = nn.Linear(128, 10)
-
-    def forward(self, x):
-        x = self.conv1(x)
-        x = F.relu(x)
-        x = self.conv2(x)
-        x = F.relu(x)
-        x = F.max_pool2d(x, 2)
-        x = self.dropout1(x)
-        x = torch.flatten(x, 1)
-        x = self.fc1(x)
-        x = F.relu(x)
-        x = self.dropout2(x)
-        x = self.fc2(x)
-        output = F.log_softmax(x, dim=1)
-        return output
-
-
-# Training function
-def train(model, device, train_loader, optimizer, epoch):
-    model.train()
-    for batch_idx, (data, target) in enumerate(train_loader):
-        data, target = data.to(device), target.to(device)
-        optimizer.zero_grad()
-        output = model(data)
-        loss = F.nll_loss(output, target)
-        loss.backward()
-        optimizer.step()
-        if batch_idx % 100 == 0:
-            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
-                epoch, batch_idx * len(data), len(train_loader.dataset),
-                100. * batch_idx / len(train_loader), loss.item()))
-
-
-# Evaluation function
-def test(model, device, test_loader):
-    model.eval()
-    test_loss = 0
-    correct = 0
-    with torch.no_grad():
-        for data, target in test_loader:
-            data, target = data.to(device), target.to(device)
-            output = model(data)
-            test_loss += F.nll_loss(output, target, reduction='sum').item() # sum up batch loss
-            pred = output.argmax(dim=1, keepdim=True) # get the index of the max log-probability
-            correct += pred.eq(target.view_as(pred)).sum().item()
-
-    test_loss /= len(test_loader.dataset)
-
-    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
-        test_loss, correct, len(test_loader.dataset),
-        100. * correct / len(test_loader.dataset)))
-
-
-if __name__ == '__main__':
-
-    # Parse arguments
-    args = argument_handler()
-
-    # Set some training parameters
-    batch_size = args.batch_size
-    test_batch_size = 1000
-    random_seed = 1
-    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
-    torch.backends.cudnn.enabled = False
-    torch.manual_seed(random_seed)
-    dataset_folder = args.representative_dataset_dir
-    epochs = 1
-    gamma = 0.1
-    lr = 1.0
-
-    # Train a Pytorch model on MNIST
-    # Let us define the dataset loaders, and optimizer and train the model for 2 epochs.
-    transform = transforms.Compose([
-        transforms.ToTensor(),
-        transforms.Normalize((0.1307,), (0.3081,))
-    ])
-    dataset1 = datasets.MNIST(dataset_folder, train=True, download=True,
-                              transform=transform)
-    dataset2 = datasets.MNIST(dataset_folder, train=False,
-                              transform=transform)
-    train_loader = torch.utils.data.DataLoader(dataset1, num_workers=0, pin_memory=True, batch_size=batch_size,
-                                               shuffle=True)
-    test_loader = torch.utils.data.DataLoader(dataset2, num_workers=0, pin_memory=True, batch_size=test_batch_size,
-                                              shuffle=False)
-
-    model = Net().to(device)
-    optimizer = optim.Adadelta(model.parameters(), lr=lr)
-    scheduler = StepLR(optimizer, step_size=1, gamma=gamma)
-
-    for epoch in range(1, epochs + 1):
-        train(model, device, train_loader, optimizer, epoch)
-        test(model, device, test_loader)
-        scheduler.step()
-
-    # Hardware-friendly quantization using MCT
-    def get_tpc():
-        """
-        Assuming a target hardware that uses power-of-2 thresholds and quantizes weights and activations
-        to 2 and 3 bits, accordingly. Our assumed hardware does not require quantization of some layers
-        (e.g. Flatten & Droupout).
-        This function generates a TargetPlatformCapabilities with the above specification.
-
-        Returns:
-            TargetPlatformCapabilities object
-        """
-        tp = mct.target_platform
-        default_config = tp.OpQuantizationConfig(
-            activation_quantization_method=tp.QuantizationMethod.SYMMETRIC,
-            weights_quantization_method=tp.QuantizationMethod.SYMMETRIC,
-            activation_n_bits=3,
-            weights_n_bits=2,
-            weights_per_channel_threshold=True,
-            enable_weights_quantization=True,
-            enable_activation_quantization=True,
-            quantization_preserving=False,
-            fixed_scale=1.0,
-            fixed_zero_point=0,
-            weights_multiplier_nbits=0,
-            simd_size=None)
-
-        default_configuration_options = tp.QuantizationConfigOptions([default_config])
-        tp_model = tp.TargetPlatformModel(default_configuration_options)
-        with tp_model:
-            tp.OperatorsSet("NoQuantization",
-                            tp.get_default_quantization_config_options().clone_and_edit(
-                                enable_weights_quantization=False,
-                                enable_activation_quantization=False))
-
-        tpc = tp.TargetPlatformCapabilities(tp_model)
-        with tpc:
-            # No need to quantize Flatten and Dropout layers
-            tp.OperationsSetToLayers("NoQuantization", [nn.Dropout,
-                                                        torch.flatten])
-
-        return tpc
-
-    # Prepare a representative dataset callable from the MNIST training images for calibrating the initial
-    # quantization parameters by the MCT.
-    image_data_loader = iter(train_loader)
-    def representative_data_gen():
-        for _ in range(args.num_calibration_iterations):
-            yield [next(image_data_loader)[0]]
-
-    # Prepare model for QAT with MCT and return to user for fine-tuning. Due to the relatively easy
-    # task of quantizing model trained on MNIST, we use a custom TPC in this example to demonstrate the degradation
-    # caused by post training quantization
-    qat_model, quantization_info = mct.qat.pytorch_quantization_aware_training_init_experimental(model,
-                                                                                                 representative_data_gen,
-                                                                                                 target_platform_capabilities=get_tpc())
-
-    # Evaluate QAT-ready model accuracy from MCT. This model is fully quantized with "quantize wrappers"
-    # accuracy is expected to be significantly lower
-    test(qat_model, device, test_loader)
-
-    # Fine-tune QAT model from MCT to recover the accuracy.
-    optimizer = optim.Adam(qat_model.parameters(), lr=lr/10000)
-    scheduler = StepLR(optimizer, step_size=1, gamma=gamma)
-
-    for epoch in range(1, epochs + 1):
-        train(qat_model, device, train_loader, optimizer, epoch)
-        test(qat_model, device, test_loader)
-        scheduler.step()
-
-    # Finalize QAT model: remove "quantize wrappers" and keep weights quantized as fake-quant values
-    quantized_model = mct.qat.pytorch_quantization_aware_training_finalize_experimental(qat_model)
-
-    # Re-evaluate accuracy after finalizing the model (should have a better accuracy than QAT model, since now the
-    # activations are not quantized)
-    test(quantized_model, device, test_loader)
-
-    # Export quantized model to ONNX
-    _, onnx_file_path = tempfile.mkstemp('.onnx') # Path of exported model
-    mct.exporter.pytorch_export_model(model=quantized_model,
-                                      save_model_path=onnx_file_path,
-                                      repr_dataset=representative_data_gen)
\ No newline at end of file
From 3d472c9e40eea54b51389cba48b8352743d02dc4 Mon Sep 17 00:00:00 2001
From: idanb
Date: Tue, 19 Mar 2024 16:33:37 +0200
Subject: [PATCH 3/7] Remove uneeded *py tutorials and edit the tutorials readm

---
 tutorials/notebooks/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tutorials/notebooks/README.md b/tutorials/notebooks/README.md
index 3d3f8723b..af10975d6 100644
--- a/tutorials/notebooks/README.md
+++ b/tutorials/notebooks/README.md
@@ -18,7 +18,7 @@ These techniques are crucial for optimizing models further and achieving better
 ## Quantization for Sony-IMX500 deployment
 This section provides a guide on quantizing pre-trained models to meet specific constraints for deployment on the
-processing platform. Our focus will be on quantizing models for deployment on Sony-IMX500 processing platform.
+processing platform. Our focus will be on quantizing models for deployment on [Sony-IMX500](https://developer.sony.com/imx500/) processing platform.
 We will cover various tasks and demonstrate the necessary steps to achieve efficient quantization for optimal deployment performance.
 - [IMX500 notebooks](IMX500_notebooks.md)
From e413aac19954523ac5c37d3fb76adf3fc22a9d39 Mon Sep 17 00:00:00 2001
From: idanb
Date: Tue, 19 Mar 2024 17:14:53 +0200
Subject: [PATCH 4/7] Remove uneeded *py tutorials and edit the tutorials readm

---
 tutorials/{notebooks => }/README.md | 13 ++-----------
 .../notebooks/{MCT_Features.md => MCT_notebooks.md} | 0
 2 files changed, 2 insertions(+), 11 deletions(-)
 rename tutorials/{notebooks => }/README.md (93%)
 rename tutorials/notebooks/{MCT_Features.md => MCT_notebooks.md} (100%)

diff --git a/tutorials/notebooks/README.md b/tutorials/README.md
similarity index 93%
rename from tutorials/notebooks/README.md
rename to tutorials/README.md
index af10975d6..562037ecf 100644
--- a/tutorials/notebooks/README.md
+++ b/tutorials/README.md
@@ -14,22 +14,13 @@ In these examples, we will cover more advanced topics related to quantization.
 This includes fine-tuning PTQ (Post-Training Quantization) configurations, exporting models, and exploring advanced compression techniques.
 These techniques are crucial for optimizing models further and achieving better performance in deployment scenarios.
-- [MCT Features](MCT_Features.md)
+- [MCT notebooks](notebooks/MCT_notebooks.md)
 ## Quantization for Sony-IMX500 deployment
 This section provides a guide on quantizing pre-trained models to meet specific constraints for deployment on the
 processing platform. Our focus will be on quantizing models for deployment on [Sony-IMX500](https://developer.sony.com/imx500/) processing platform.
 We will cover various tasks and demonstrate the necessary steps to achieve efficient quantization for optimal deployment performance.
-- [IMX500 notebooks](IMX500_notebooks.md)
-
-
-
-
-
-
-
-
-
+- [IMX500 notebooks](notebooks/IMX500_notebooks.md)
diff --git a/tutorials/notebooks/MCT_Features.md b/tutorials/notebooks/MCT_notebooks.md
similarity index 100%
rename from tutorials/notebooks/MCT_Features.md
rename to tutorials/notebooks/MCT_notebooks.md
From e448d8c5f2a6fb7f08d587de9352c5287ae12ef2 Mon Sep 17 00:00:00 2001
From: idanb
Date: Tue, 19 Mar 2024 18:16:17 +0200
Subject: [PATCH 5/7] Remove uneeded *py tutorials and edit the tutorials readm

---
 tutorials/README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tutorials/README.md b/tutorials/README.md
index 562037ecf..7bc1c50c5 100644
--- a/tutorials/README.md
+++ b/tutorials/README.md
@@ -6,11 +6,11 @@ engaging and hands-on experience.
 ## Getting started
 This "hello world" notebook shows how to quickly quantize a pre-trained model using MCT post training quantization technique both for Keras models and Pytorch models.
-- [Keras MobileNetV2 post training quantization](keras/ptq/example_keras_imagenet.ipynb)
-- [Pytorch MobileNetV2 post training quantization](pytorch/ptq/example_pytorch_mobilenet_v2.py)
+- [Keras MobileNetV2 post training quantization](notebooks/keras/ptq/example_keras_imagenet.ipynb)
+- [Pytorch MobileNetV2 post training quantization](notebooks/pytorch/ptq/example_pytorch_mobilenet_v2.py)
 ## MCT Features
-In these examples, we will cover more advanced topics related to quantization.
+In this section, we will cover more advanced topics related to quantization.
 This includes fine-tuning PTQ (Post-Training Quantization) configurations, exporting models, and exploring advanced compression techniques.
 These techniques are crucial for optimizing models further and achieving better performance in deployment scenarios.
From bac699b5918a0e9e2e38181feadcafba674b7f1c Mon Sep 17 00:00:00 2001
From: idanb
Date: Thu, 21 Mar 2024 11:37:35 +0200
Subject: [PATCH 6/7] fixes after PR

---
 tutorials/README.md | 20 ++++++++++----------
 tutorials/notebooks/MCT_notebooks.md | 5 +++++
 2 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/tutorials/README.md b/tutorials/README.md
index 7bc1c50c5..8274e60ac 100644
--- a/tutorials/README.md
+++ b/tutorials/README.md
@@ -1,26 +1,26 @@
 # MCT Tutorials
+Explore the Model Compression Toolkit (MCT) through our tutorials,
+covering compression techniques for Keras and PyTorch models.
+Access interactive Jupyter notebooks for hands-on learning.
-Dive into the Model-Compression-Toolkit (MCT) with our collection of tutorials, covering a wide
-range of compression techniques for Keras and Pytorch models. We provide interactive Jupyter notebooks for an
-engaging and hands-on experience.
 ## Getting started
-This "hello world" notebook shows how to quickly quantize a pre-trained model using MCT post training quantization technique both for Keras models and Pytorch models.
+Learn how to quickly quantize pre-trained models using MCT's post-training quantization technique for both Keras and PyTorch models.
 - [Keras MobileNetV2 post training quantization](notebooks/keras/ptq/example_keras_imagenet.ipynb)
-- [Pytorch MobileNetV2 post training quantization](notebooks/pytorch/ptq/example_pytorch_mobilenet_v2.py)
+- [Pytorch MobileNetV2 post training quantization](notebooks/pytorch/ptq/example_pytorch_quantization_mnist.ipynb)
 ## MCT Features
-In this section, we will cover more advanced topics related to quantization.
+This set of tutorials covers all the quantization tools provided by MCT.
+The notebooks in this section demonstrate how to configure and run simple and advanced post-training quantization methods.
 This includes fine-tuning PTQ (Post-Training Quantization) configurations, exporting models, and exploring advanced compression techniques.
-These techniques are crucial for optimizing models further and achieving better performance in deployment scenarios.
+These techniques are essential for further optimizing models and achieving superior performance in deployment scenarios.
 - [MCT notebooks](notebooks/MCT_notebooks.md)
 ## Quantization for Sony-IMX500 deployment
+
 This section provides a guide on quantizing pre-trained models to meet specific constraints for deployment on the
-processing platform. Our focus will be on quantizing models for deployment on [Sony-IMX500](https://developer.sony.com/imx500/) processing platform.
+[Sony-IMX500](https://developer.sony.com/imx500/) processing platform.
 We will cover various tasks and demonstrate the necessary steps to achieve efficient quantization for optimal deployment performance.
 - [IMX500 notebooks](notebooks/IMX500_notebooks.md)
-
-
diff --git a/tutorials/notebooks/MCT_notebooks.md b/tutorials/notebooks/MCT_notebooks.md
index b2028ff7c..f1245a2e7 100644
--- a/tutorials/notebooks/MCT_notebooks.md
+++ b/tutorials/notebooks/MCT_notebooks.md
@@ -1,4 +1,9 @@
 # MCT Features
+This tutorial set introduces the various quantization tools offered by MCT.
+The notebooks included here illustrate the setup and usage of both basic and advanced post-training quantization methods.
+You'll learn how to refine PTQ (Post-Training Quantization) settings, export models, and explore advanced compression
+techniques such as GPTQ (Gradient-Based Post-Training Quantization), Mixed precision quantization and more.
+These techniques are essential for further optimizing models and achieving superior performance in deployment scenarios.
 ### Keras Tutorials
From ade35df7088e6088816a4b9fcce7f86b3d849066 Mon Sep 17 00:00:00 2001
From: idanb
Date: Thu, 21 Mar 2024 12:20:11 +0200
Subject: [PATCH 7/7] fixes after PR

---
 tutorials/README.md | 2 +-
 tutorials/notebooks/MCT_notebooks.md | 12 ------------
 2 files changed, 1 insertion(+), 13 deletions(-)

diff --git a/tutorials/README.md b/tutorials/README.md
index 8274e60ac..42decc234 100644
--- a/tutorials/README.md
+++ b/tutorials/README.md
@@ -19,7 +19,7 @@ These techniques are essential for further optimizing models and achieving super
 ## Quantization for Sony-IMX500 deployment
-This section provides a guide on quantizing pre-trained models to meet specific constraints for deployment on the
+This section provides several guides on quantizing pre-trained models to meet specific constraints for deployment on the
 [Sony-IMX500](https://developer.sony.com/imx500/) processing platform.
 We will cover various tasks and demonstrate the necessary steps to achieve efficient quantization for optimal deployment performance.
diff --git a/tutorials/notebooks/MCT_notebooks.md b/tutorials/notebooks/MCT_notebooks.md
index f1245a2e7..250afc923 100644
--- a/tutorials/notebooks/MCT_notebooks.md
+++ b/tutorials/notebooks/MCT_notebooks.md
@@ -128,15 +128,3 @@ These techniques are essential for further optimizing models and achieving super
 | [Exporter Usage](pytorch/export/example_pytorch_export.ipynb) | ✅ Export |
-
-
-
-
-
-
-
-
-
-
-
-