This module contains Eager mode quantization APIs.
This module contains FX graph mode quantization APIs (prototype).
This module contains QConfigMapping for configuring FX graph mode quantization.
This module contains BackendConfig, a config object that defines how quantization is supported in a backend. Currently only used by FX Graph Mode Quantization, but we may extend Eager Mode Quantization to work with this as well.
This module contains a few CustomConfig classes that are used in both eager mode and FX graph mode quantization.
This describes the quantization-related functions of the torch namespace.
Quantized Tensors support a limited subset of data manipulation methods of the regular full-precision tensor.
This module contains observers which are used to collect statistics about the values observed during calibration (PTQ) or training (QAT).
This module implements modules which are used to perform fake quantization during QAT.
This module defines QConfig objects which are used to configure quantization settings for individual ops.
This module implements the combined (fused) modules conv + relu which can then be quantized.
This module implements the versions of those fused operations needed for quantization aware training.
This module implements the quantized implementations of fused operations like conv + relu. There are no BatchNorm variants because BatchNorm is usually folded into the convolution for inference.
This module implements the quantized dynamic implementations of fused operations like linear + relu.
This module implements versions of the key nn modules Conv2d() and Linear() which run in FP32 but with rounding applied to simulate the effect of INT8 quantization.
This module implements versions of the key nn modules such as Linear() which run in FP32 but with rounding applied to simulate the effect of INT8 quantization and will be dynamically quantized during inference.
This module implements the quantized versions of the nn layers such as :class:`~torch.nn.Conv2d` and :class:`~torch.nn.ReLU`.
This module implements the quantized versions of the functional layers such as :func:`~torch.nn.functional.conv2d` and :func:`~torch.nn.functional.relu`. Note: :meth:`~torch.nn.functional.relu` supports quantized inputs.
This module implements the quantizable versions of some of the nn layers.
These modules can be used in conjunction with the custom module mechanism, by providing the ``custom_module_config`` argument to both prepare and convert.
Dynamically quantized :class:`~torch.nn.Linear`, :class:`~torch.nn.LSTM`, :class:`~torch.nn.LSTMCell`, :class:`~torch.nn.GRUCell`, and :class:`~torch.nn.RNNCell`.
Note that operator implementations currently only support per channel quantization for weights of the conv and linear operators. Furthermore, the input data is mapped linearly to the quantized data and vice versa as follows:

.. math::

    \begin{aligned}
        \text{Quantization:}&\\
        &Q_\text{out} = \text{clamp}(x_\text{input}/s+z, Q_\text{min}, Q_\text{max})\\
        \text{Dequantization:}&\\
        &x_\text{out} = (Q_\text{input}-z)*s
    \end{aligned}

where :math:`\text{clamp}(.)` is the same as :func:`~torch.clamp` while the scale :math:`s` and zero point :math:`z` are then computed as described in :class:``, specifically:

.. math::

    \begin{aligned}
        \text{if Symmetric:}&\\
        &s = 2 \max(|x_\text{min}|, x_\text{max}) /
            \left( Q_\text{max} - Q_\text{min} \right) \\
        &z = \begin{cases}
            0 & \text{if dtype is qint8} \\
            128 & \text{otherwise}
        \end{cases}\\
        \text{Otherwise:}&\\
        &s = \left( x_\text{max} - x_\text{min} \right ) /
            \left( Q_\text{max} - Q_\text{min} \right ) \\
        &z = Q_\text{min} - \text{round}(x_\text{min} / s)
    \end{aligned}

where :math:`[x_\text{min}, x_\text{max}]` denotes the range of the input data while :math:`Q_\text{min}` and :math:`Q_\text{max}` are respectively the minimum and maximum values of the quantized dtype.
Note that the choice of :math:`s` and :math:`z` implies that zero is represented with no quantization error whenever zero is within the range of the input data or symmetric quantization is being used.
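
The mapping above can be sketched in plain Python to see why zero survives a round trip exactly. This is a minimal illustration, not the library's implementation: the helper names (``compute_qparams``, ``quantize``, ``dequantize``) and the ``signed`` flag are hypothetical, and rounding to the nearest integer before clamping is an assumption, since the formula above writes only the clamp.

```python
def compute_qparams(x_min, x_max, q_min, q_max, symmetric=False, signed=True):
    """Return (scale, zero_point) following the formulas above.

    `signed` is a hypothetical stand-in for "dtype is qint8" in the
    symmetric case.
    """
    if symmetric:
        scale = 2 * max(abs(x_min), x_max) / (q_max - q_min)
        zero_point = 0 if signed else 128
    else:
        scale = (x_max - x_min) / (q_max - q_min)
        zero_point = q_min - round(x_min / scale)
    return scale, zero_point


def quantize(x, scale, zero_point, q_min, q_max):
    # Assumption: round to the nearest integer, then clamp to [q_min, q_max].
    q = round(x / scale) + zero_point
    return max(q_min, min(q_max, q))


def dequantize(q, scale, zero_point):
    return (q - zero_point) * scale


# Asymmetric case: unsigned 8-bit range [0, 255], input data in [-1.0, 3.0].
scale, zero_point = compute_qparams(-1.0, 3.0, 0, 255)

# Zero lies inside the input range, so it maps onto the zero point exactly.
q_zero = quantize(0.0, scale, zero_point, 0, 255)
x_zero = dequantize(q_zero, scale, zero_point)

# Other values come back with an error of at most half a quantization step.
q_one = quantize(1.0, scale, zero_point, 0, 255)
x_one = dequantize(q_one, scale, zero_point)

print(scale, zero_point, q_zero, x_zero, q_one, x_one)
```

In this sketch the zero point works out to 64, so the input 0.0 is stored as the integer 64 and dequantizes back to 0.0 without error, while 1.0 is recovered only to within half of one scale step.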
Additional data types and quantization schemes can be implemented through the custom operator mechanism.
- :attr:`torch.qscheme`: Type to describe the quantization scheme of a tensor. Supported types:

  - :attr:`torch.per_tensor_affine`: per tensor, asymmetric
  - :attr:`torch.per_channel_affine`: per channel, asymmetric
  - :attr:`torch.per_tensor_symmetric`: per tensor, symmetric
  - :attr:`torch.per_channel_symmetric`: per channel, symmetric

- Type to describe the data. Supported types:

  - :attr:`torch.quint8`: 8-bit unsigned integer
  - :attr:`torch.qint8`: 8-bit signed integer
  - :attr:`torch.qint32`: 32-bit signed integer
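
As a short illustration of how the schemes and data types listed above show up on actual tensors, the sketch below quantizes one tensor per tensor and another per channel. The scales and zero points are arbitrary values chosen for the example (powers of two, so the arithmetic is exact); they are assumptions, not recommended settings.

```python
import torch

# Per-tensor affine quantization to an unsigned 8-bit dtype.
x = torch.tensor([-1.0, 0.0, 0.5, 2.0])
q = torch.quantize_per_tensor(x, scale=0.5, zero_point=10, dtype=torch.quint8)

print(q.dtype)        # data type of the quantized tensor
print(q.qscheme())    # quantization scheme of the quantized tensor
print(q.int_repr())   # underlying integer values: x / scale + zero_point
print(q.dequantize()) # back to floating point: (q - zero_point) * scale

# Per-channel affine quantization to a signed 8-bit dtype, with one
# scale and zero point per row (axis=0), as is typical for weights.
w = torch.tensor([[-1.0, 1.0], [-4.0, 4.0]])
scales = torch.tensor([0.5, 2.0])
zero_points = torch.tensor([0, 0])
qw = torch.quantize_per_channel(w, scales, zero_points, axis=0, dtype=torch.qint8)

print(qw.qscheme())
print(qw.int_repr())
```

Because each row of ``w`` has its own scale, both rows map onto the same integer values even though their floating-point ranges differ by a factor of four.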
