
Q dq layout #1642

Merged: 8 commits merged into main on Feb 5, 2025
Conversation

metascroy (Contributor, Author)

No description provided.


pytorch-bot bot commented Jan 29, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1642

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot added the "CLA Signed" label (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Jan 29, 2025
if not any(
    isinstance(layout, layout_class)
    for layout_class in [
        QDQLayout,
metascroy (Contributor, Author):

@jerryzh168 the QDQLayout here fuses the activation quantization ops with the weight ops rather than using to_affine_quantized_intx.

The reason is that using to_affine_quantized_intx makes the "torch.ops.quant.choose_qparams_affine.default" op show up in export instead of "torch.ops.quantized_decomposed.choose_qparams_per_token_asymmetric.default" (which ExecuTorch wants). By fusing, I can control the ops used in register_aqt_quantized_linear_dispatch.

Is there a better way of doing this dispatch? Something like register_aqt_quantized_linear_dispatch, but for the activations.
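For reference, this is roughly the shape of the dispatch registration I mean (a hedged sketch only; the check/impl bodies below are illustrative, not the exact code in this PR):

```python
# Hedged sketch of registering a layout-specific linear dispatch.
# Assumes register_aqt_quantized_linear_dispatch(condition, impl) is importable
# from torchao's affine_quantized_tensor_ops module; bodies are illustrative.
import torch.nn.functional as F
from torchao.dtypes.affine_quantized_tensor_ops import (
    register_aqt_quantized_linear_dispatch,
)


def _qdq_linear_check(input_tensor, weight_tensor, bias):
    # Dispatch only when the weight's layout is the one we registered for.
    layout = getattr(weight_tensor, "_layout", None)
    return type(layout).__name__ == "QDQLayout"


def _qdq_linear_impl(input_tensor, weight_tensor, bias):
    # Minimal illustrative impl: the activation quant/dequant ops we want to
    # see in the exported graph would go here; fall back to fp linear for now.
    return F.linear(input_tensor, weight_tensor.dequantize(), bias)


register_aqt_quantized_linear_dispatch(_qdq_linear_check, _qdq_linear_impl)
```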

jerryzh168 (Contributor) commented Jan 29, 2025:

I think torch.ops.quant.choose_qparams_affine.default covers torch.ops.quantized_decomposed.choose_qparams_per_token_asymmetric.default already? I think we should move the ExecuTorch delegation to use torch.ops.quant.choose_qparams_affine.default; @mcr229 and @digantdesai have worked on this before.

metascroy (Contributor, Author):

I was trying to make the exported graph equivalent to what Int8DynActInt4WeightQuantizer produces. If the XNNPACK partitioner can handle torch.ops.quant.choose_qparams_affine.default, then that also works. cc @digantdesai @mcr229 to confirm.

kimishpatel (Contributor):

> I think torch.ops.quant.choose_qparams_affine.default covers torch.ops.quantized_decomposed.choose_qparams_per_token_asymmetric.default already? I think we should move the ExecuTorch delegation to use torch.ops.quant.choose_qparams_affine.default; @mcr229 and @digantdesai have worked on this before.

@jerryzh168 to move the ExecuTorch delegates and flow to the ops from torch.ops.quant, we need to make the PT2E flow do the same thing. I think we have circled on this in the past, but it's not clear whether we converged on whether the quantized_decomposed ops and PT2E workflow should move to AO, or torch.ops.quant should move to PyTorch core. Can you say what the plan is?


jerryzh168 (Contributor):

@kimishpatel we want to move pt2e workflow to ao, just sent you the plan in internal chat

kimishpatel (Contributor):

Sounds good.

metascroy (Contributor, Author):

Sounds good! I'll change QDQLayout to use the ops in torch.ops.quant like choose_qparams_affine and dequantize_affine. This will make it equivalent to what PlainLayout produces today.
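Concretely, something like this for the per-token activation path (a hedged sketch using the affine primitives; exact argument names and defaults may differ across torchao versions):

```python
# Hedged sketch: per-token asymmetric int8 activation quant/dequant via the
# affine primitives, which lower to torch.ops.quant.* ops in the exported graph.
import torch
from torchao.quantization.quant_primitives import (
    MappingType,
    choose_qparams_affine,
    dequantize_affine,
    quantize_affine,
)

x = torch.randn(4, 64)           # (tokens, hidden) activation
block_size = (1, x.shape[-1])    # one scale/zero_point per token

scale, zero_point = choose_qparams_affine(
    x, MappingType.ASYMMETRIC, block_size, torch.int8, quant_min=-128, quant_max=127
)
xq = quantize_affine(
    x, block_size, scale, zero_point, torch.int8, quant_min=-128, quant_max=127
)
x_dq = dequantize_affine(
    xq, block_size, scale, zero_point, torch.int8,
    quant_min=-128, quant_max=127, output_dtype=x.dtype,
)
```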

w_int_data = weight_tensor.tensor_impl.int_data
w_scale = weight_tensor.tensor_impl.scale
w_zero_point = weight_tensor.tensor_impl.zero_point
assert len(weight_tensor.block_size) == 2
Contributor:

Can you add some meaningful message?
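For example (an illustrative message only, not the exact wording that landed):

```python
assert len(weight_tensor.block_size) == 2, (
    f"Expected block_size to have 2 elements for a 2D weight, "
    f"got {weight_tensor.block_size}"
)
```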

weight_dtype=weight_dtype,
granularity=granularity,
has_weight_zeros=has_weight_zeros,
layout=PlainLayout(),
Contributor:

I can't figure out what this one dispatches to.

Contributor:

code pointer?

metascroy (Contributor, Author):

This took me a while to figure out, but I think it dispatches to the fallback path here: https://github.com/pytorch/ao/blob/main/torchao/dtypes/affine_quantized_tensor_ops.py#L171-L201

Contributor:

We've also gotten some feedback that we should not have a fallback path like this; maybe we can create a dispatch for it as well to make it clearer and remove the fallback path.

Contributor:

> We've also gotten some feedback that we should not have a fallback path like this; maybe we can create a dispatch for it as well to make it clearer and remove the fallback path.

Yeah, that would be cleaner.


to_export_with_old_api = copy.deepcopy(model)

print("Quantizing model")
Contributor:

remove

unwrap_tensor_subclass(model)

print("Exporting quantized model")
exported = torch.export.export(model, (activations,), strict=True)
Contributor:

export_for_training?
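i.e., something along these lines, replacing the torch.export.export call above (illustrative only; availability and exact signature depend on the PyTorch version):

```python
# export_for_training produces the pre-dispatch ATen graph that PT2E-style
# quantization flows consume; arguments here mirror the torch.export.export call.
exported = torch.export.export_for_training(model, (activations,), strict=True)
```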

Comment on lines 172 to 173
if __name__ == "__main__":
unittest.main()
Contributor:

Why this? You can just run this with pytest, right?

metascroy (Contributor, Author):

I could, but pytest doesn't show me messages like "Testing weight_dtype={weight_dtype}, has_weight_zeros={has_weight_zeros}", which were helpful for debugging.

I can remove the print statements and use pytest, though.
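(For what it's worth, parametrizing gives similar visibility without prints; a hedged sketch with placeholder parameter values:)

```python
# Placeholder values below; substitute whatever dtypes/flags the test uses.
# `pytest -v` then shows IDs like test_export[int4-True] for each combination,
# which replaces the "Testing weight_dtype=..., has_weight_zeros=..." prints.
import itertools

import pytest


WEIGHT_DTYPES = ["int4", "int8"]
HAS_WEIGHT_ZEROS = [True, False]


@pytest.mark.parametrize(
    "weight_dtype, has_weight_zeros",
    list(itertools.product(WEIGHT_DTYPES, HAS_WEIGHT_ZEROS)),
)
def test_export(weight_dtype, has_weight_zeros):
    # Test body elided; each combination runs as its own named test case.
    ...
```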

kimishpatel (Contributor) left a review:

Looks good to me. I presume you will address the pending comments and nothing critical is blocking, so approving.

metascroy merged commit bc1530b into main on Feb 5, 2025
15 of 16 checks passed