Q dq layout #1642
Conversation
torchao/experimental/quant_api.py
Outdated
if not any(
    isinstance(layout, layout_class)
    for layout_class in [
        QDQLayout,
@jerryzh168 the QDQLayout here fuses the activation quantization ops with the weight ops rather than using to_affine_quantized_intx.
The reason is that using to_affine_quantized_intx means the "torch.ops.quant.choose_qparams_affine.default" op will show up in export instead of "torch.ops.quantized_decomposed.choose_qparams_per_token_asymmetric.default" (which ExecuTorch expects). By fusing, I can control the ops used in register_aqt_quantized_linear_dispatch.
Is there a better way of doing this dispatch? Something like register_aqt_quantized_linear_dispatch, but for the activations.
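For context, a rough sketch of how the existing weight-side hook is used; the condition and impl names below are hypothetical, not from this PR, and the import paths may differ across torchao versions:

from torchao.dtypes.affine_quantized_tensor_ops import (
    register_aqt_quantized_linear_dispatch,
)
from torchao.experimental.q_dq_layout import QDQLayout  # layout added in this PR

def _is_qdq_layout(input_tensor, weight_tensor, bias):
    # dispatch condition: does this weight use the QDQLayout added here?
    return isinstance(getattr(weight_tensor, "_layout", None), QDQLayout)

def _qdq_linear_impl(input_tensor, weight_tensor, bias):
    # emit the desired activation quant/dequant ops here, then the weight matmul
    raise NotImplementedError("sketch only")

register_aqt_quantized_linear_dispatch(_is_qdq_layout, _qdq_linear_impl)

An analogous hook for activations, if it existed, would let the exported activation ops be chosen without fusing them into the weight dispatch.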
I think torch.ops.quant.choose_qparams_affine.default covers torch.ops.quantized_decomposed.choose_qparams_per_token_asymmetric.default already? I think we should move the executorch delegation to use torch.ops.quant.choose_qparams_affine.default; @mcr229 and @digantdesai have worked on this before.
I was trying to make the exported graph equivalent to what Int8DynActInt4WeightQuantizer produces. If the XNNPACK partitioner can handle torch.ops.quant.choose_qparams_affine.default, then that also works. cc @digantdesai @mcr229 to confirm.
> I think torch.ops.quant.choose_qparams_affine.default covers torch.ops.quantized_decomposed.choose_qparams_per_token_asymmetric.default already? I think we should move the executorch delegation to use torch.ops.quant.choose_qparams_affine.default; @mcr229 and @digantdesai have worked on this before.
@jerryzh168 if we move the executorch delegates and flow to use ops from torch.ops.quant, we need to make the PT2E flow do the same thing. I think we have circled on this in the past, but it's not clear whether we converged on whether the quantized_decomposed ops and PT2E workflow should move to AO, or torch.ops.quant should move to PyTorch core. Can you say what the plan is?
Yeah, choose_qparams_affine is a supported node for the XNNPACK partitioner:
https://github.com/pytorch/executorch/blob/073397357118feef0fca91326ed612ce5c60d53b/backends/xnnpack/partition/config/quant_affine_configs.py#L57
@kimishpatel we want to move the PT2E workflow to AO; I just sent you the plan in internal chat.
Sounds good.
Sounds good! I'll change QDQLayout to use ops in torch.ops.quant like choose_qparams_affine and dequantize_affine. This will make the exported graph equivalent to what PlainLayout produces today.
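As a reference for what that pattern might look like, here is a minimal per-token dynamic activation Q/DQ sketch using torchao's affine quantization primitives (an illustration, not this PR's code; exact signatures may differ between torchao versions):

import torch
from torchao.quantization.quant_primitives import (
    MappingType,
    choose_qparams_affine,
    dequantize_affine,
    quantize_affine,
)

def qdq_activations_per_token(x: torch.Tensor) -> torch.Tensor:
    # one (scale, zero_point) pair per token: the quantization block spans the last dim
    block_size = (1,) * (x.dim() - 1) + (x.shape[-1],)
    scale, zero_point = choose_qparams_affine(
        x,
        MappingType.ASYMMETRIC,
        block_size,
        target_dtype=torch.int8,
        quant_min=-128,
        quant_max=127,
        scale_dtype=x.dtype,
        zero_point_dtype=torch.int32,
    )
    q = quantize_affine(x, block_size, scale, zero_point, torch.int8, -128, 127)
    return dequantize_affine(
        q, block_size, scale, zero_point, torch.int8, -128, 127, output_dtype=x.dtype
    )

In export, these primitives should surface as the torch.ops.quant.choose_qparams_affine / quantize_affine / dequantize_affine ops discussed above.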
torchao/experimental/q_dq_layout.py
Outdated
w_int_data = weight_tensor.tensor_impl.int_data
w_scale = weight_tensor.tensor_impl.scale
w_zero_point = weight_tensor.tensor_impl.zero_point
assert len(weight_tensor.block_size) == 2
Can you add a meaningful message?
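For example, something along these lines in the snippet above (the message wording is just a suggestion, not from the PR):

assert len(weight_tensor.block_size) == 2, (
    f"Expected weight_tensor.block_size to have 2 elements, "
    f"got {weight_tensor.block_size}"
)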
weight_dtype=weight_dtype,
granularity=granularity,
has_weight_zeros=has_weight_zeros,
layout=PlainLayout(),
I can't figure out what this one dispatches to.
code pointer?
This took me a while to figure out, but I think it dispatches to the fallback path here: https://github.com/pytorch/ao/blob/main/torchao/dtypes/affine_quantized_tensor_ops.py#L171-L201
We also have some feedback that we should not have a fallback path like this; maybe we can create a dispatch for it as well, to make it clearer, and remove the fallback path.
> We also have some feedback that we should not have a fallback path like this; maybe we can create a dispatch for it as well, to make it clearer, and remove the fallback path.

Yeah, that would be cleaner.
to_export_with_old_api = copy.deepcopy(model)

print("Quantizing model")
remove
unwrap_tensor_subclass(model)

print("Exporting quantized model")
exported = torch.export.export(model, (activations,), strict=True)
export_for_training?
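For reference, a hedged sketch of what the suggested change might look like (availability of torch.export.export_for_training depends on the PyTorch version in use):

# hypothetical alternative to torch.export.export, not from this PR
exported = torch.export.export_for_training(model, (activations,))
graph_module = exported.module()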
if __name__ == "__main__":
    unittest.main()
Why this? You can just run this with pytest, right?
I could, but pytest doesn't show me messages like "Testing weight_dtype={weight_dtype}, has_weight_zeros={has_weight_zeros}", which was helpful for debugging.
I can remove the print statements and use pytest, though.
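If it helps, pytest.mark.parametrize bakes the parameter combination into the test id, so a failing case still reports which configuration it ran; the dtype list below is a placeholder for this sketch, not the PR's actual test matrix:

import pytest
import torch

@pytest.mark.parametrize("weight_dtype", [torch.int8])  # placeholder dtype list
@pytest.mark.parametrize("has_weight_zeros", [True, False])
def test_export(weight_dtype, has_weight_zeros):
    ...

Alternatively, running pytest with -s disables output capturing, so the existing print statements still show up.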
Looks good to me. I presume you will address the pending comments and nothing critical is blocking; approving.