-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Fix] in Xnnpack EP, the conversion for fused activation param isn't correct #23115
base: main
Are you sure you want to change the base?
Conversation
? *reinterpret_cast<const float*>(value.raw_data().data()) | ||
: value.float_data()[0]; | ||
int32_t arg_type; | ||
if (GetType(arg, arg_type) && arg_type == ONNX_NAMESPACE::TensorProto_DataType_FLOAT16) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if GetType(arg, arg_type)
failed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally type info is always available, so I think this is ok. Shape info may be missing depending on the model.
The Conv op looks to be setup to allow fp32, u8, s8 and optionally fp16. Should this also handle u8 and s8 or should ClipReluChecker limit fusion to fp32 and fp16?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So far, core runtime Clip fusion only supports float too.
onnxruntime/onnxruntime/core/optimizer/utils.cc
Lines 335 to 349 in c6ba7ed
if (initializer) { | |
Initializer i(*initializer, graph.ModelPath()); | |
switch (initializer->data_type()) { | |
case ONNX_NAMESPACE::TensorProto_DataType_FLOAT: | |
value = *i.data<float>(); | |
break; | |
// double isn't currently supported | |
// case ONNX_NAMESPACE::TensorProto_DataType_DOUBLE: | |
// value = static_cast<float>(*i.data<double>()); | |
// break; | |
case ONNX_NAMESPACE::TensorProto_DataType_FLOAT16: | |
value = math::halfToFloat(i.data<MLFloat16>()->val); | |
break; | |
default: | |
ORT_THROW("Unexpected data type for Clip input of ", initializer->data_type()); |
Shall we update them together?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @snnn
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd leave the core Clip fusion as-is for now. Can be a separate PR if we think there's a use-case that would benefit.
Are you planning on updating ClipReluChecker to limit the types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think ClipQuantFusion is a separate topic as that's about ignoring a Clip or Relu when the Q zp and scale make it redundant.
I was asking if the XNNPACK EP ClipReluChecker needs to be updated to either limit the types it allows, or whether FuseActivation needs to handle u8 or s8 input for the Clip min/max.
This has no checks on types:
onnxruntime/onnxruntime/core/providers/xnnpack/detail/node_support_checker.cc
Lines 42 to 44 in 2d05c4b
const NodeUnit* ClipReluChecker(const NodeUnit& node_unit, | |
const GraphViewer& graph, | |
const std::unordered_map<const Node*, const NodeUnit*>& supported_node_unit_map) { |
But FuseActivation always uses a float in the activation params and with this PR is explicitly only checking for fp32 and fp16.
e.g. if there's a Conv node with u8 or s8 input it looks like ClipReluChecker will allow the activation, but FuseActivation won't do the right thing as the Clip min/max would be u8 or s8.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked https://onnx.ai/onnx/operators/onnx__Conv.html#type-constraints, Onnx Conv node shouldn't have u8 or s8 inputs. @skottmckay
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
XNNPack EP's Conv implementation also handles QLinearConv doesn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But QLinearConv isn't in node_to_be_fuse list yet. Could we add it in the next PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be good in that case.
To be safer it would be good to add an else
that returns an error so that if we get a datatype other than fp32 or fp16 it isn't silently ignored. If we add QLinearConv to the nodes that can fuse (not sure why we don't allow that - maybe xnnpack doesn't support it) the else
will make it much easier for a developer to discover they need to update this code.
// So far, CPU EP doensn't support Fp16 Conv fusion, so verify_outputs is skipped. | ||
RunAndVerifyOutputsWithEP(ort_model_path, "TestNhwcConvReluClipFusion_FP16", std::move(ep), feeds, params, {}, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not quite following. There should still be valid output from the CPU EP even if it doesn't fuse, so why can't we use verify_outputs?
// So far, CPU EP doensn't support Fp16 Conv fusion, so verify_outputs is skipped. | |
RunAndVerifyOutputsWithEP(ort_model_path, "TestNhwcConvReluClipFusion_FP16", std::move(ep), feeds, params, {}, false); | |
// So far, CPU EP doesn't support Fp16 Conv fusion, so verify_outputs is skipped. | |
RunAndVerifyOutputsWithEP(ort_model_path, "TestNhwcConvReluClipFusion_FP16", std::move(ep), feeds, params, {}, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thx, fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So far, CPU EP doesn't implement FP16 Clip fusion. The output verification fails because it looks CPU EP falls back to FP32 Clip.
onnxruntime/onnxruntime/core/providers/cpu/fp16/fp16_activations.h
Lines 74 to 77 in e0e8304
// TODO Add the following activations: | |
// MlasTanhActivation, | |
// MlasLogisticActivation, | |
// MlasClipActivation, |
To verify the Xnnpack FP16 conv fusion correctness, I add a new test with a new FP16 model ( with only Conv+Relu).
Current test (Conv+Clip+Relu) is kept because I want to make sure that Conv+Clip fusion can run, that is, the activition parameters are added correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can commit the suggested changes from lintrunner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can commit the suggested changes from lintrunner.
Description
In Xnnpack EP, the activation_param's conversion isn't correct for Fp16 model
Sometimes, it may cause an exception that "lower bound must be below upper bound"
Because CPU EP doesn't support FP16 activation fusion now, so the newly added test skips the comparison of the test result.
Motivation and Context
Test Cases
Linux XnnPack on ARM64
https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1576767&view=logs&j=317a4805-d006-5aac-2a82-238793a22a22&t=5481769e-95ca-5a84-acab-996d49e28b59