A first sample version of FloatQuant #159

Open: nghielme wants to merge 4 commits into base: feature/float_quant

@nghielme nghielme commented Dec 9, 2024

Please note that the function still needs more extensive testing.
A sample `FloatQuant` function is implemented; a sample use can be found in the `Examples`. `±inf` is clipped to `±max_val`, and `±NaN` is mapped to NaN. Zero is always representable. I tested with subnormals (meaning subnormals of the output representation), and the quantizer represented them with no loss, although I haven't tested this part extensively. I also tested the function against the Brevitas `FloatQuant` implementation, and they do not always match. For example, I think `0.3125` should be representable (`x == xq`) by a float quantizer with 4 mantissa bits, 4 exponent bits, an exponent bias of 0 and 1 sign bit, but the Brevitas `FloatQuant` implementation quantizes it to `0.25`. I'm not sure which result I should consider correct in this case.
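
To make the `0.3125` question concrete, here is a minimal scalar sketch in Python. It is neither the code in this PR nor Brevitas' implementation; `float_quant_sketch` and its conventions are assumptions: an IEEE-style layout where stored exponent 0 is reserved for subnormals, no exponent codes reserved for inf/NaN, and round-half-to-even. Under those assumptions, with `exponent_bias=0` the smallest normal exponent is 1, so the spacing near `0.3125` is `2**(1 - 4) = 0.125` and the input sits exactly halfway between `0.25` and `0.375`; a bias of 7 makes it exactly representable. Other conventions would give different answers, so this does not settle which result is correct.

```python
import math


def float_quant_sketch(x, exponent_bits=4, mantissa_bits=4,
                       exponent_bias=0, max_val=None):
    """Hypothetical scalar minifloat quantizer (illustration only)."""
    if math.isnan(x):
        return math.nan
    # Assumption: stored exponent 0 encodes subnormals, so the smallest
    # normal exponent is 1 - bias.
    min_exp = 1 - exponent_bias
    # Assumption: every exponent code is usable (none reserved for inf/NaN).
    max_exp = (2 ** exponent_bits - 1) - exponent_bias
    if max_val is None:
        max_val = (2 - 2.0 ** -mantissa_bits) * 2.0 ** max_exp
    if math.isinf(x):
        return math.copysign(max_val, x)  # +-inf is clipped to +-max_val
    # abs(x) = m * 2**e with 0.5 <= m < 1, so floor(log2(abs(x))) = e - 1.
    _, e = math.frexp(abs(x))
    exp = max(e - 1, min_exp)          # below min_exp we are in subnormals
    step = 2.0 ** (exp - mantissa_bits)  # spacing of representable values
    q = round(x / step) * step         # Python's round() is half-to-even
    return max(-max_val, min(max_val, q))


print(float_quant_sketch(0.3125))                   # 0.25 under these assumptions
print(float_quant_sketch(0.3125, exponent_bias=7))  # 0.3125
```

In this sketch the mismatch comes entirely from how the bias shifts the subnormal range, which may be worth checking against both implementations before deciding which one is off.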
@nghielme nghielme requested a review from maltanar December 9, 2024 20:51
@nickfraser

Brevitas developer here, thanks for this concrete example - I will look into it ASAP!

@nghielme

Hi @nickfraser
I spotted some potential bugs in Brevitas `FloatQuant`. Should I open an issue in the corresponding GitHub repo to describe them and propose my fix?

@nickfraser

Yes, please do. If you can provide minimal examples too, that will make it much easier for us 🙏. Note that `dev` is our staging branch, so please make sure any issues still exist in `dev`.

If you have proposed solutions, feel free to make PRs as well (pointing to dev), so long as you follow our contributor guidelines.

exponent_bias=None,
max_val=None,
rounding_mode="ROUND",
lt_subnorm_to_zero=False,

@maltanar, this name (`lt_subnorm_to_zero`) is terrible! Please help me find a better one 🤦‍♂️
