[Fix] in Xnnpack EP, the conversion for fused activation param isn't correct #23115

Open
wants to merge 18 commits into
base: main
from

Conversation

mszhanyi
Contributor

@mszhanyi mszhanyi commented Dec 16, 2024

Description

In the XNNPACK EP, the conversion of activation_param isn't correct for FP16 models.
In some cases this causes an exception: "lower bound must be below upper bound".
Because the CPU EP doesn't support FP16 activation fusion yet, the newly added test skips the comparison of the test results.
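
A minimal sketch of how this kind of failure can arise, not the actual EP code. It assumes Clip bounds of min = -1.0 and max = 1.0 stored as FP16, and that the bytes following the 2-byte payload happen to be zero when the value is read as an FP32. `HalfBitsToFloat` and `MisreadHalfBitsAsFloat` are hypothetical helper names used only for illustration.

```cpp
#include <cstdint>
#include <cstring>

// Converts IEEE 754 binary16 bits to a float (zero, subnormal, normal, inf/NaN).
inline float HalfBitsToFloat(uint16_t h) {
  const uint32_t sign = static_cast<uint32_t>(h & 0x8000u) << 16;
  const uint32_t exp = (h >> 10) & 0x1Fu;
  uint32_t mant = h & 0x3FFu;
  uint32_t bits = 0;
  if (exp == 0) {
    if (mant == 0) {
      bits = sign;  // signed zero
    } else {
      // Subnormal half: shift the mantissa until the implicit bit appears.
      uint32_t shift = 0;
      while ((mant & 0x400u) == 0) {
        mant <<= 1;
        ++shift;
      }
      mant &= 0x3FFu;
      bits = sign | ((113u - shift) << 23) | (mant << 13);
    }
  } else if (exp == 0x1Fu) {
    bits = sign | 0x7F800000u | (mant << 13);  // inf or NaN
  } else {
    bits = sign | ((exp + 112u) << 23) | (mant << 13);  // rebias 15 -> 127
  }
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}

// Hypothetical illustration of the incorrect path: the half bits are placed in
// the low 16 bits of a 32-bit word (upper bits assumed zero) and read as fp32.
inline float MisreadHalfBitsAsFloat(uint16_t h) {
  const uint32_t bits = h;
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}
```

Under these assumptions, -1.0 (FP16 bits 0xBC00) and 1.0 (FP16 bits 0x3C00) are both misread as tiny positive denormals, and the misread min ends up larger than the misread max, which is the shape of input that a "lower bound must be below upper bound" check rejects.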

Motivation and Context

Test Cases

2024-12-23T09:17:39.5038317Z [ RUN      ] XnnpackEP.TestNhwcConvReluFusion_FP16
2024-12-23T09:17:39.5079188Z 2024-12-23 09:17:39.505334389 [W:onnxruntime:TestNhwcConvReluFusion_FP16, session_state.cc:1263 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-12-23T09:17:39.5080635Z 2024-12-23 09:17:39.505405629 [W:onnxruntime:TestNhwcConvReluFusion_FP16, session_state.cc:1265 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2024-12-23T09:17:39.5099292Z [       OK ] XnnpackEP.TestNhwcConvReluFusion_FP16 (5 ms)
2024-12-23T09:17:39.5100453Z [ RUN      ] XnnpackEP.TestNhwcConvReluClipFusion_FP16
2024-12-23T09:17:39.5145494Z [       OK ] XnnpackEP.TestNhwcConvReluClipFusion_FP16 (5 ms)

Linux XnnPack on ARM64
https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1576767&view=logs&j=317a4805-d006-5aac-2a82-238793a22a22&t=5481769e-95ca-5a84-acab-996d49e28b59

@mszhanyi mszhanyi marked this pull request as draft December 16, 2024 13:06
@mszhanyi mszhanyi marked this pull request as ready for review December 17, 2024 02:13
? *reinterpret_cast<const float*>(value.raw_data().data())
: value.float_data()[0];
int32_t arg_type;
if (GetType(arg, arg_type) && arg_type == ONNX_NAMESPACE::TensorProto_DataType_FLOAT16) {
Contributor

What if GetType(arg, arg_type) failed here?

Contributor

Generally type info is always available, so I think this is ok. Shape info may be missing depending on the model.

The Conv op looks to be set up to allow fp32, u8, s8 and optionally fp16. Should this also handle u8 and s8, or should ClipReluChecker limit fusion to fp32 and fp16?

Contributor Author

@mszhanyi mszhanyi Dec 23, 2024

So far, core runtime Clip fusion only supports float too.

if (initializer) {
  Initializer i(*initializer, graph.ModelPath());
  switch (initializer->data_type()) {
    case ONNX_NAMESPACE::TensorProto_DataType_FLOAT:
      value = *i.data<float>();
      break;
    // double isn't currently supported
    // case ONNX_NAMESPACE::TensorProto_DataType_DOUBLE:
    //   value = static_cast<float>(*i.data<double>());
    //   break;
    case ONNX_NAMESPACE::TensorProto_DataType_FLOAT16:
      value = math::halfToFloat(i.data<MLFloat16>()->val);
      break;
    default:
      ORT_THROW("Unexpected data type for Clip input of ", initializer->data_type());
Shall we update them together?

Contributor Author

cc @snnn

Contributor

I'd leave the core Clip fusion as-is for now. Can be a separate PR if we think there's a use-case that would benefit.

Are you planning on updating ClipReluChecker to limit the types?

Contributor

I think ClipQuantFusion is a separate topic as that's about ignoring a Clip or Relu when the Q zp and scale make it redundant.

I was asking if the XNNPACK EP ClipReluChecker needs to be updated to either limit the types it allows, or whether FuseActivation needs to handle u8 or s8 input for the Clip min/max.

This has no checks on types:

const NodeUnit* ClipReluChecker(const NodeUnit& node_unit,
                                const GraphViewer& graph,
                                const std::unordered_map<const Node*, const NodeUnit*>& supported_node_unit_map) {

But FuseActivation always uses a float in the activation params and with this PR is explicitly only checking for fp32 and fp16.

e.g. if there's a Conv node with u8 or s8 input it looks like ClipReluChecker will allow the activation, but FuseActivation won't do the right thing as the Clip min/max would be u8 or s8.

Contributor Author

@mszhanyi mszhanyi Jan 1, 2025

I checked https://onnx.ai/onnx/operators/onnx__Conv.html#type-constraints: the ONNX Conv node shouldn't have u8 or s8 inputs. @skottmckay

Contributor

XNNPack EP's Conv implementation also handles QLinearConv, doesn't it?

Contributor Author

@mszhanyi mszhanyi Jan 6, 2025

But QLinearConv isn't in the node_to_be_fuse list yet. Could we add it in the next PR?

Contributor

Should be good in that case.

To be safer it would be good to add an else that returns an error so that if we get a datatype other than fp32 or fp16 it isn't silently ignored. If we add QLinearConv to the nodes that can fuse (not sure why we don't allow that - maybe xnnpack doesn't support it) the else will make it much easier for a developer to discover they need to update this code.
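
A rough sketch of the "else that returns an error" idea, under stated assumptions: `ElemType`, `Status`, `TypeName` and `CheckClipBoundType` are hypothetical stand-ins written for this comment, not the ONNX Runtime types or the code in this PR. The point is only that an unexpected Clip min/max type produces an explicit error instead of being silently ignored.

```cpp
#include <string>

// Hypothetical stand-in for the element types discussed in this thread.
enum class ElemType { kFloat, kFloat16, kUInt8, kInt8 };

// Hypothetical minimal status type: ok flag plus an error message.
struct Status {
  bool ok;
  std::string message;
};

inline const char* TypeName(ElemType type) {
  switch (type) {
    case ElemType::kFloat:
      return "float";
    case ElemType::kFloat16:
      return "float16";
    case ElemType::kUInt8:
      return "uint8";
    case ElemType::kInt8:
      return "int8";
  }
  return "unknown";
}

// Accepts only the types whose Clip min/max can be turned into float
// activation params; every other type is reported as an error.
inline Status CheckClipBoundType(ElemType type) {
  if (type == ElemType::kFloat) {
    return {true, ""};  // value would be read directly as fp32
  } else if (type == ElemType::kFloat16) {
    return {true, ""};  // value would be converted from fp16 to fp32
  } else {
    return {false, std::string("Unsupported data type for fused Clip min/max: ") + TypeName(type)};
  }
}
```

With a check like this, adding QLinearConv to the fusable nodes later would surface a clear message for u8 or s8 bounds rather than producing wrong activation params.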

Comment on lines 134 to 135
// So far, CPU EP doensn't support Fp16 Conv fusion, so verify_outputs is skipped.
RunAndVerifyOutputsWithEP(ort_model_path, "TestNhwcConvReluClipFusion_FP16", std::move(ep), feeds, params, {}, false);
Contributor

Not quite following. There should still be valid output from the CPU EP even if it doesn't fuse, so why can't we use verify_outputs?

Suggested change
- // So far, CPU EP doensn't support Fp16 Conv fusion, so verify_outputs is skipped.
- RunAndVerifyOutputsWithEP(ort_model_path, "TestNhwcConvReluClipFusion_FP16", std::move(ep), feeds, params, {}, false);
+ // So far, CPU EP doesn't support Fp16 Conv fusion, so verify_outputs is skipped.
+ RunAndVerifyOutputsWithEP(ort_model_path, "TestNhwcConvReluClipFusion_FP16", std::move(ep), feeds, params, {}, false);

Contributor Author

thx, fixed

Contributor Author

@mszhanyi mszhanyi Dec 23, 2024

So far, the CPU EP doesn't implement FP16 Clip fusion. The output verification fails because it looks like the CPU EP falls back to FP32 Clip.

// TODO Add the following activations:
// MlasTanhActivation,
// MlasLogisticActivation,
// MlasClipActivation,

To verify the correctness of the XNNPACK FP16 Conv fusion, I added a new test with a new FP16 model (with only Conv+Relu).
The current test (Conv+Clip+Relu) is kept because I want to make sure that Conv+Clip fusion can run, that is, that the activation parameters are added correctly.
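
For context, a generic sketch of the kind of tolerance-based comparison that output verification between two EPs typically relies on. This is not the implementation behind RunAndVerifyOutputsWithEP; `AllClose` and its tolerance parameters are hypothetical names chosen for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Returns true when both outputs have the same length and every element
// satisfies |actual - expected| <= abs_tol + rel_tol * |expected|.
inline bool AllClose(const std::vector<float>& actual,
                     const std::vector<float>& expected,
                     float abs_tol, float rel_tol) {
  if (actual.size() != expected.size()) {
    return false;
  }
  for (std::size_t i = 0; i < actual.size(); ++i) {
    const float diff = std::fabs(actual[i] - expected[i]);
    // Written as a negated <= so that a NaN difference is also rejected.
    if (!(diff <= abs_tol + rel_tol * std::fabs(expected[i]))) {
      return false;
    }
  }
  return true;
}
```

A check of this shape passes for the Conv+Relu model only if both EPs compute comparable results, which is why that model is the one used for the correctness comparison, while the Conv+Clip+Relu model is only run.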

@mszhanyi mszhanyi marked this pull request as draft December 20, 2024 02:53
Contributor

@github-actions github-actions bot left a comment

You can commit the suggested changes from lintrunner.

onnxruntime/core/providers/cpu/fp16/fp16_activations.h Outdated
@mszhanyi mszhanyi marked this pull request as ready for review December 23, 2024 10:02
Contributor

@github-actions github-actions bot left a comment

You can commit the suggested changes from lintrunner.

onnxruntime/core/providers/xnnpack/detail/utils.cc Outdated
3 participants