This report describes DSP packing for int8. I would like to extend it to quantization with fewer bits, further increasing the speedup and reducing DSP/LUT utilization.
Things to take into consideration:
Since the optimization only concerns the HLS code, each layer in Python should have an attribute indicating whether the packing implementation is used
The `product` function should be extended. Depending on the cases below, the structure of the weight matrix might need to change
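As a rough sketch of the arithmetic behind the int8 packing (assuming the common scheme of concatenating two 8-bit weights with an 18-bit shift, so one wide DSP multiply produces both products; the function name and shift value are illustrative, not part of hls4ml):

```python
def packed_mul(w0, w1, a, shift=18):
    """Multiply one activation by two int8 weights with a single
    wide multiply, by placing the weights in separate bit fields."""
    packed = (w1 << shift) + w0          # two weights in one operand
    prod = packed * a                    # single hardware multiply

    mask = (1 << shift) - 1
    lo = prod & mask                     # low field holds w0 * a
    if lo >= 1 << (shift - 1):          # sign-extend (int8 product fits in 16 bits)
        lo -= 1 << shift
    # a negative low product borrows from the high field; add the borrow back
    hi = (prod >> shift) + (1 if lo < 0 else 0)
    return lo, hi                        # (w0 * a, w1 * a)
```

The borrow correction is the part that makes signed packing non-trivial; with fewer bits the same structure applies, just with narrower fields.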
This would be very cool to have. I wonder: does the linked implementation actually work, and does the compiler do the right thing? Also, how are overflows generally handled?
Do you envision this as an option for every multiplication in every strategy, or as a strategy in itself? The latter is simpler; the former is more useful.
As for overflows: either they are detectable on the output, so one can recover from them at some cost, or, with 4-bit quantization (not possible with 8-bit), they can be avoided entirely if the input packing is done correctly
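To illustrate the 4-bit case: each partial product of two 4-bit values needs at most 8 bits, so a wider packed field leaves guard bits, and many products can be accumulated in packed form before anything spills into the neighbouring field. A minimal sketch (the 13-bit field width and the function name are my own choices, not taken from the linked implementation):

```python
def packed_mac(weight_pairs, acts, shift=13):
    """Accumulate two dot products at once in one packed accumulator.
    With 4-bit operands each product fits in 8 bits, so a 13-bit field
    leaves headroom for dozens of accumulations before the low field
    can overflow into the high one."""
    acc = 0
    for (w0, w1), a in zip(weight_pairs, acts):
        acc += ((w1 << shift) + w0) * a     # one multiply per weight pair

    mask = (1 << shift) - 1
    lo = acc & mask                         # sum of w0_i * a_i
    if lo >= 1 << (shift - 1):
        lo -= 1 << shift                    # sign-extend the low field
    hi = (acc >> shift) + (1 if lo < 0 else 0)  # sum of w1_i * a_i
    return lo, hi
```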
Do you envision this to be an option for every multiplication in every strategy, or a strategy itself?
From what I have experimented with so far, it seems more feasible to make it a strategy: for most layers, except maybe the dense layer, the implementation has to change to make packing more efficient. For example, on the convolutional layers one can pack both inputs of the DSP (as described here) to make the computation more efficient, one input packing image values and the second packing weights, which can't be directly integrated into the current code
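A toy version of that dual-input packing, using unsigned 4-bit values so the field extraction stays trivial (signed inputs would need borrow corrections between fields; all names and the 10-bit field width here are illustrative):

```python
def dual_packed_mul(a0, a1, w0, w1, shift=10):
    """Pack two unsigned 4-bit activations into one DSP input and two
    unsigned 4-bit weights into the other. A single multiply yields
    three fields: low = a0*w0, middle = a0*w1 + a1*w0 (a partial
    convolution sum), high = a1*w1. With 4-bit values every field
    stays below 2**10, so no field overflows into its neighbour."""
    prod = ((a1 << shift) + a0) * ((w1 << shift) + w0)
    mask = (1 << shift) - 1
    low = prod & mask
    mid = (prod >> shift) & mask
    high = prod >> (2 * shift)
    return low, mid, high
```

The middle field is the interesting one for convolutions: it computes a two-term dot product for free, which is why this only pays off with a restructured convolution loop rather than the current per-multiplication code path.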
There is an implementation for 8-bits from @violatingcp [1] [2]