diff --git a/README.md b/README.md
index 1ac27bcf..e2087545 100644
--- a/README.md
+++ b/README.md
@@ -101,6 +101,10 @@ Inference cost for CNV_2W2A.onnx
 
 You can read more about the BOPS metric in [this paper](https://www.frontiersin.org/articles/10.3389/frai.2021.676564/full), Section 4.2 Bit Operations.
 
+### QKeras to QONNX Converter
+
+For details about the QKeras converter and its potential limitations, see this [document](docs/qkeras-converter/qkeras_to_qonnx.md).
+
 ### Development
 
 Install in editable mode in a venv:
diff --git a/docs/qkeras-converter/qkeras_to_qonnx.md b/docs/qkeras-converter/qkeras_to_qonnx.md
new file mode 100644
index 00000000..8a049620
--- /dev/null
+++ b/docs/qkeras-converter/qkeras_to_qonnx.md
@@ -0,0 +1,23 @@
+### **QKeras to QONNX**
+
+The converter works in three steps: (1) strip the QKeras model of its quantization attributes and store them in a dictionary; (2) convert the stripped model with tf2onnx, as if it were a plain Keras model; (3) insert "Quant" nodes at the appropriate locations based on the stored quantization attributes.
+
+The current version has a few issues stemming from how tf2onnx inserts the Quant nodes. These problems have suitable workarounds, detailed below.
+
+### Quantized-Relu
+A redundant quantization node is inserted when quantized-relu is used as the output activation of a Dense/Conv2D layer.
+
+Workaround: Use quantized-relu activation only in a separate QActivation layer.
+
+
+
+### Quantized-Bits
+The quantized-bits quantization node is not added to the model when the quantizer is used in a QActivation layer.
+
+Workaround: Use quantized-bits only at the output of a Dense/Conv2D layer.
+
+
+
+### Ternary Quantization
+A threshold of 0.5 must be used with ternary quantization.
+(Conversion is sometimes unstable even with threshold = 0.5.)