Q dq layout #1642
Conversation
torchao/experimental/quant_api.py
Outdated
if not any(
    isinstance(layout, layout_class)
    for layout_class in [
        QDQLayout,
@jerryzh168 the QDQLayout here fuses the activation quantization ops with the weight ops rather than using to_affine_quantized_intx.
The reason is that using to_affine_quantized_intx means the "torch.ops.quant.choose_qparams_affine.default" op will show up in export instead of "torch.ops.quantized_decomposed.choose_qparams_per_token_asymmetric.default" (which ExecuTorch expects). By fusing, I can control the ops used in register_aqt_quantized_linear_dispatch.
Is there a better way of doing this dispatch? Something like register_aqt_quantized_linear_dispatch, but for the activations.
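For context, a rough sketch of how the existing weight-side hook is used; the condition and impl names below are hypothetical, not from this PR, and the import paths may differ across torchao versions:

from torchao.dtypes.affine_quantized_tensor_ops import (
    register_aqt_quantized_linear_dispatch,
)
from torchao.experimental.q_dq_layout import QDQLayout  # layout added in this PR

def _is_qdq_layout(input_tensor, weight_tensor, bias):
    # dispatch condition: does this weight use the QDQLayout added here?
    return isinstance(getattr(weight_tensor, "_layout", None), QDQLayout)

def _qdq_linear_impl(input_tensor, weight_tensor, bias):
    # emit the desired activation quant/dequant ops here, then the weight matmul
    raise NotImplementedError("sketch only")

register_aqt_quantized_linear_dispatch(_is_qdq_layout, _qdq_linear_impl)

An analogous hook for activations, if it existed, would let the exported activation ops be chosen without fusing them into the weight dispatch.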
I think torch.ops.quant.choose_qparams_affine.default covers torch.ops.quantized_decomposed.choose_qparams_per_token_asymmetric.default already? I think we should move the executorch delegation to use torch.ops.quant.choose_qparams_affine.default; @mcr229 and @digantdesai have worked on this before.
I was trying to make the exported graph equivalent to what Int8DynActInt4WeightQuantizer produces. If the XNNPACK partitioner can handle torch.ops.quant.choose_qparams_affine.default, then that also works. cc @digantdesai @mcr229 to confirm.
> I think torch.ops.quant.choose_qparams_affine.default covers torch.ops.quantized_decomposed.choose_qparams_per_token_asymmetric.default already? I think we should move the executorch delegation to use torch.ops.quant.choose_qparams_affine.default; @mcr229 and @digantdesai have worked on this before.
@jerryzh168 if we move the executorch delegates and flow to use ops from torch.ops.quant, we need to make the PT2E flow do the same thing. I think we have circled on this in the past, but it's not clear whether we converged on whether the quantized_decomposed ops and PT2E workflow should move to AO, or torch.ops.quant should move to PyTorch core. Can you say what the plan is?
Yeah, choose_qparams_affine is a supported node for the XNNPACK partitioner:
https://github.com/pytorch/executorch/blob/073397357118feef0fca91326ed612ce5c60d53b/backends/xnnpack/partition/config/quant_affine_configs.py#L57
@kimishpatel we want to move the PT2E workflow to AO; I just sent you the plan in internal chat.
Sounds good.
Sounds good! I'll change QDQLayout to use ops in torch.ops.quant like choose_qparams_affine and dequantize_affine. This will make the exported graph equivalent to what PlainLayout produces today.
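As a reference for what that pattern might look like, here is a minimal per-token dynamic activation Q/DQ sketch using torchao's affine quantization primitives (an illustration, not this PR's code; exact signatures may differ between torchao versions):

import torch
from torchao.quantization.quant_primitives import (
    MappingType,
    choose_qparams_affine,
    dequantize_affine,
    quantize_affine,
)

def qdq_activations_per_token(x: torch.Tensor) -> torch.Tensor:
    # one (scale, zero_point) pair per token: the quantization block spans the last dim
    block_size = (1,) * (x.dim() - 1) + (x.shape[-1],)
    scale, zero_point = choose_qparams_affine(
        x,
        MappingType.ASYMMETRIC,
        block_size,
        target_dtype=torch.int8,
        quant_min=-128,
        quant_max=127,
        scale_dtype=x.dtype,
        zero_point_dtype=torch.int32,
    )
    q = quantize_affine(x, block_size, scale, zero_point, torch.int8, -128, 127)
    return dequantize_affine(
        q, block_size, scale, zero_point, torch.int8, -128, 127, output_dtype=x.dtype
    )

In export, these primitives should surface as the torch.ops.quant.choose_qparams_affine / quantize_affine / dequantize_affine ops discussed above.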
torchao/experimental/q_dq_layout.py
Outdated
w_int_data = weight_tensor.tensor_impl.int_data
w_scale = weight_tensor.tensor_impl.scale
w_zero_point = weight_tensor.tensor_impl.zero_point
assert len(weight_tensor.block_size) == 2
Can you add a meaningful message?
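For example, something along these lines in the snippet above (the message wording is just a suggestion, not from the PR):

assert len(weight_tensor.block_size) == 2, (
    f"Expected weight_tensor.block_size to have 2 elements, "
    f"got {weight_tensor.block_size}"
)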
weight_dtype=weight_dtype,
granularity=granularity,
has_weight_zeros=has_weight_zeros,
layout=PlainLayout(),
I can't figure out what this one dispatches to.
code pointer?
This took me a while to figure out, but I think it dispatches to the fallback path here: https://github.com/pytorch/ao/blob/main/torchao/dtypes/affine_quantized_tensor_ops.py#L171-L201
We also have some feedback that we should not have a fallback path like this; maybe we can create a dispatch for it as well, to make it clearer, and remove the fallback path.
> We also have some feedback that we should not have a fallback path like this; maybe we can create a dispatch for it as well, to make it clearer, and remove the fallback path.

Yeah, that would be cleaner.
to_export_with_old_api = copy.deepcopy(model)

print("Quantizing model")
remove
unwrap_tensor_subclass(model)

print("Exporting quantized model")
exported = torch.export.export(model, (activations,), strict=True)
export_for_training?
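For reference, a hedged sketch of what the suggested change might look like (availability of torch.export.export_for_training depends on the PyTorch version in use):

# hypothetical alternative to torch.export.export, not from this PR
exported = torch.export.export_for_training(model, (activations,))
graph_module = exported.module()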
if __name__ == "__main__":
    unittest.main()
Why this? You can just run this with pytest, right?
I could, but pytest doesn't show me messages like "Testing weight_dtype={weight_dtype}, has_weight_zeros={has_weight_zeros}", which was helpful for debugging.
I can remove the print statements and use pytest, though.
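If it helps, pytest.mark.parametrize bakes the parameter combination into the test id, so a failing case still reports which configuration it ran; the dtype list below is a placeholder for this sketch, not the PR's actual test matrix:

import pytest
import torch

@pytest.mark.parametrize("weight_dtype", [torch.int8])  # placeholder dtype list
@pytest.mark.parametrize("has_weight_zeros", [True, False])
def test_export(weight_dtype, has_weight_zeros):
    ...

Alternatively, running pytest with -s disables output capturing, so the existing print statements still show up.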
Looks good to me. I presume you will address the pending comments and nothing critical is blocking; approving.