2 potential bugs related to FloatQuant
#1126
Comments
Thank you for sharing this issue - I'm currently investigating (unfortunately, while juggling some other tasks). Early days my side, but since you said you're writing similar quantization functions, I'll give you some initial feedback anyway. Please take these comments with "a grain of salt".
Thanks, I am able to reproduce this and agree this is unexpected behaviour. What I don't understand is why our other tests (public and internal) haven't caught this. I plan to start addressing this by first creating a more extensive test suite.
I am not convinced of this yet, but maybe I'll change my mind as my investigation continues. ;) |
I'm not following your logic here. The goal of the mask |
Hi Nick,
I agree, in the meantime I realised that the minimum value that exponent can represent is |
I'm going to contradict myself and say that I believe our minifloat quantizers are behaving as expected.
I do believe I have generated quite an extensive test to "kick the tyres" of our minifloat formats in #1136. @nghielme, if you are still unsure, could you check the logic here and tell me if I'm mistaken? If you think there are values that can be represented by a format but not by Brevitas, can you specify the values of the mantissa and exponent in an unsigned format? I believe an extra mantissa bit, or an exponent bias of 1, would be required to represent that value.
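For reference, the representable values of a small format can be enumerated directly. The sketch below assumes an IEEE-style convention (exponent field 0 encodes subnormals with effective exponent `1 - bias`, and no codes are reserved for inf/NaN), which may not match every minifloat definition. `representable_values` and `neighbours` are hypothetical helpers written for this illustration, not Brevitas APIs.

```python
from fractions import Fraction


def representable_values(exponent_bits, mantissa_bits, exponent_bias):
    """Non-negative values of a hypothetical IEEE-style minifloat (assumed convention)."""
    values = set()
    for e in range(2 ** exponent_bits):
        for m in range(2 ** mantissa_bits):
            frac = Fraction(m, 2 ** mantissa_bits)
            if e == 0:
                # Subnormal: no implicit leading one, effective exponent is 1 - bias.
                value = frac * Fraction(2) ** (1 - exponent_bias)
            else:
                # Normal: implicit leading one, effective exponent is e - bias.
                value = (1 + frac) * Fraction(2) ** (e - exponent_bias)
            values.add(value)
    return sorted(values)


def neighbours(target, values):
    """Return the closest representable values at or below and at or above target."""
    below = max(v for v in values if v <= target)
    above = min(v for v in values if v >= target)
    return below, above


target = Fraction(5, 16)  # 0.3125

# 4 exponent bits, 4 mantissa bits, bias 0: target falls between two code points.
e4m4_bias0 = representable_values(4, 4, 0)
print(neighbours(target, e4m4_bias0))

# Either an exponent bias of 1 or one extra mantissa bit makes it representable.
print(target in representable_values(4, 4, 1))
print(target in representable_values(4, 5, 0))
```

Under these assumptions, 0.3125 sits exactly halfway between 0.25 and 0.375 in the 4/4/0 format, so a round-to-nearest-even quantizer would return 0.25.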
The minimum effective value of the exponent is |
Closing - please reopen or create a new issue if you still think this is an issue. |
The issue is to share with you 2 potential issues related to the `FloatQuant` implementation:

One is related to the definition of the minimum internal scale, used to quantize the mantissa, here. I think the minimum should be
`1. - self.exponent_bit_width() - self.exponent_bias() - self.mantissa_bit_width()`
or a different expression, but I think that `self.exponent_bit_width()` should be part of the expression. This can also be deduced from the way the actual internal scale is computed, here. The first part, `floor_ste(torch.log2(torch.abs(x) + eps))`, represents in fact the exponent of the float `x`. I noticed, for example, that `0.3125` is quantized to `0.25` using a quantizer with 4 exponent bits, 4 mantissa bits, 0 exponent bias.

The other is related to the following expression, here. I think the correct one should be `-x < -max_value` in order to identify the values that exceed `max_value` from the negative side.

I am writing a similar quantizer, so any comment on the 2 listed points would also be useful for me to write it properly.
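Both points can be checked numerically with a small pure-Python sketch. It is not the library code: `internal_scale` and `quantize` are hypothetical helpers that only mimic the structure described above (exponent of `x` minus the mantissa bit width, clamped from below), and the bound without the exponent bit-width term is an assumption about the existing expression. The last part compares the suggested mask with the two plain comparisons on a few sample values.

```python
import math


def internal_scale(x, mantissa_bits, min_scale_exp, eps=1e-12):
    # floor(log2(|x| + eps)) plays the role of the exponent of x.
    exponent_of_x = math.floor(math.log2(abs(x) + eps))
    # Clamp from below by the minimum internal scale exponent under test.
    return 2.0 ** max(exponent_of_x - mantissa_bits, min_scale_exp)


def quantize(x, mantissa_bits, min_scale_exp):
    scale = internal_scale(x, mantissa_bits, min_scale_exp)
    # Python's round() rounds ties to even, like torch.round.
    return round(x / scale) * scale


E, M, BIAS = 4, 4, 0
min_without_ebw = 1 - BIAS - M    # assumed existing bound: -3
min_with_ebw = 1 - E - BIAS - M   # bound proposed in this issue: -7

q_without = quantize(0.3125, M, min_without_ebw)
q_with = quantize(0.3125, M, min_with_ebw)
print(q_without, q_with)

# Mask comparison on sample values with max_value = 2.0.
xs = [-3.0, -1.0, 0.5, 3.0]
max_value = 2.0
suggested = [-x < -max_value for x in xs]      # expression suggested above
positive_side = [x > max_value for x in xs]    # values above +max_value
negative_side = [x < -max_value for x in xs]   # values below -max_value
print(suggested, positive_side, negative_side)
```

With the assumed existing bound the sketch reproduces the reported result (0.3125 becomes 0.25), while the proposed bound leaves 0.3125 unchanged; whether 0.3125 is a valid code point of that format is a separate question. On the sample values, `-x < -max_value` selects the same elements as `x > max_value`, whereas `x < -max_value` selects the elements that overflow on the negative side.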