
DSP packing Vivado backend #1096

Open
steltze opened this issue Oct 28, 2024 · 3 comments

steltze commented Oct 28, 2024

This report describes DSP packing for int8. I would like to extend it to quantization with fewer bits, increasing the speedup and reducing the DSP/LUT utilization even more.

Things to take into consideration:

  • Since the optimization only concerns the HLS code, each layer should carry an attribute on the Python side indicating whether the packed implementation is used
  • The product function should be extended; depending on the cases below, the structure of the weight matrix might also need to change
    • weight sharing vs input sharing
    • sequential vs cascaded operation

There is an existing implementation for 8 bits from @violatingcp [1] [2]
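As a sketch of the underlying trick (hypothetical plain C++ standing in for the HLS types, not the linked code): two signed 8-bit weights can share one wide multiplier by shifting one weight past the other's product bits, with a small borrow correction on unpacking.

```cpp
#include <cstdint>

struct PackedProducts { int32_t p0; int32_t p1; };

// Weight-sharing sketch: pack two signed 8-bit weights into one wide
// operand so a single DSP-sized multiply yields both w0*x and w1*x.
// w1 is shifted by 18 bits so the two partial products land in disjoint
// lanes (|wi*x| <= 2^14, well inside an 18-bit lane).
inline PackedProducts dsp_pack_mul(int8_t w0, int8_t w1, int8_t x) {
    int64_t packed = (static_cast<int64_t>(w1) << 18) + w0;
    int64_t p = packed * x;                          // one multiply, two products
    int32_t p0 = static_cast<int32_t>(p & 0x3FFFF);  // low 18-bit lane
    if (p0 >= (1 << 17)) p0 -= (1 << 18);            // sign-extend the low lane
    int32_t p1 = static_cast<int32_t>(p >> 18);      // high lane
    if (p0 < 0) p1 += 1;                             // undo the borrow when w0*x < 0
    return {p0, p1};
}
```

The 18-bit lane spacing is chosen to match the 27x18 multiplier ports of a DSP48E2; in actual HLS code the same arithmetic would use `ap_int` types so the tool can infer a single DSP.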

steltze changed the title from "DSP packing" to "DSP packing Vivado backend" on Oct 28, 2024

steltze commented Oct 28, 2024

Hey @jmitrevs @vloncar, if you have any thoughts let me know.


vloncar commented Oct 30, 2024

This would be very cool to have. I wonder, does the linked implementation actually work, and does the compiler do the right thing? Also, how are overflows generally handled?

Do you envision this as an option for every multiplication in every strategy, or as a strategy in itself? The latter is simpler, the former more useful.


steltze commented Dec 4, 2024

had to do some

> how are overflows generally handled?

Either they are detectable at the output, so one can recover from them at some cost, or, with 4-bit quantization (not possible with 8-bit), they can be avoided entirely if the input packing is done correctly.
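To make the 4-bit case concrete, a minimal sketch (hypothetical plain C++, unsigned operands): each 4-bit product fits in 8 bits, so two products packed into 8-bit lanes can never interact, and no overflow detection or correction logic is needed.

```cpp
#include <cstdint>

// 4-bit weight-sharing sketch: with unsigned 4-bit operands each product
// w_i * x <= 15 * 15 = 225 fits in 8 bits, so 8-bit lanes are overflow-free
// by construction and unpacking is a plain bit slice.
inline void dsp_pack_mul_u4(uint8_t w0, uint8_t w1, uint8_t x,
                            uint16_t& p0, uint16_t& p1) {
    uint32_t packed = (static_cast<uint32_t>(w1) << 8) | w0;  // two 8-bit lanes
    uint32_t p = packed * x;          // one multiply, two products
    p0 = p & 0xFF;                    // low lane: w0 * x, exact
    p1 = (p >> 8) & 0xFF;             // high lane: w1 * x, exact, no borrow
}
```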

> Do you envision this to be an option for every multiplication in every strategy, or a strategy itself?

From what I have experimented with so far, it seems more feasible to make it a strategy: for most layers, except perhaps the dense layer, the implementation has to change to make packing more efficient. For example, in the convolutional layers one can pack both inputs of the DSP (as described here) to make the computation more efficient, one input packing image values and the other packing weights, which can't be directly integrated into the current code.
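As an illustration of packing both DSP inputs (a hypothetical unsigned 4-bit sketch, not the hls4ml code): placing two activations on one port and two weights on the other yields four exact products per wide multiply, provided the lanes are spaced so the partial products cannot overlap.

```cpp
#include <cstdint>

// Double-packing sketch: (a1<<16 + a0) * (w1<<8 + w0) expands to
// a1*w1*2^24 + a1*w0*2^16 + a0*w1*2^8 + a0*w0, and since each 4-bit
// product is <= 225 < 256, the four results sit in disjoint 8-bit lanes.
inline void dsp_pack_mul_2x2(uint8_t a0, uint8_t a1,
                             uint8_t w0, uint8_t w1,
                             uint16_t out[4]) {
    uint64_t A = (static_cast<uint64_t>(a1) << 16) | a0;  // two activations
    uint64_t W = (static_cast<uint64_t>(w1) << 8)  | w0;  // two weights
    uint64_t p = A * W;               // one multiply, four products
    out[0] = (p)       & 0xFF;        // a0 * w0
    out[1] = (p >> 8)  & 0xFF;        // a0 * w1
    out[2] = (p >> 16) & 0xFF;        // a1 * w0
    out[3] = (p >> 24) & 0xFF;        // a1 * w1
}
```

The lane spacing here is chosen for clarity; in a real mapping the shifts would be constrained by the 27x18 DSP port widths and by how many guard bits the accumulation needs.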
