Inefficient x64 codegen for splat #191
Since splat is a high-use instruction, are there different semantics that would cover most of its uses and also allow better codegen? Or would simplifying the codegen for splat just push proportionally more complexity into user code to regain the current functionality?
This is very specifically an Intel ISA quirk because … Is there something specific you would like to propose to mitigate this, apart from getting rid of a high-value operation? If not, and this is more about highlighting a code generation issue, I'm not sure anything can actually be done about it given the different semantics for different bit widths on x64.
I don't think the key issue is actually the different semantics; it's that these instructions can't address scalar registers directly and are forced to …
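To make the register-file point concrete, here is a minimal sketch using SSE2 intrinsics (baseline on x86-64) of how an i32x4.splat of a value held in a general-purpose register is typically lowered: a movd to get the scalar into an XMM register, then a pshufd to broadcast it. For contrast, an f32x4.splat whose operand normally already lives in an XMM register needs only the shuffle. The function names splat_i32x4 and splat_f32x4 are hypothetical, and a compiler is free to emit a different sequence (for example when AVX2 or AVX-512 is enabled).

```c
#include <emmintrin.h> /* SSE2 intrinsics (also pulls in SSE's xmmintrin.h) */
#include <stdint.h>

/* Hypothetical helper: i32x4.splat from a general-purpose register.
 * Two steps are needed because the SSE shuffles only read XMM registers. */
static inline __m128i splat_i32x4(int32_t x) {
    __m128i v = _mm_cvtsi32_si128(x);  /* movd: GPR -> lane 0 of an XMM register */
    return _mm_shuffle_epi32(v, 0);    /* pshufd imm=0: copy lane 0 into all 4 lanes */
}

/* Hypothetical helper: f32x4.splat. On x86-64 a float value normally
 * already sits in an XMM register, so only the broadcast shuffle remains. */
static inline __m128 splat_f32x4(float f) {
    __m128 v = _mm_set_ss(f);          /* f in lane 0; the other lanes are ignored below */
    return _mm_shuffle_ps(v, v, 0);    /* shufps imm=0: copy lane 0 into all 4 lanes */
}
```

The extra cost in the integer case is the GPR-to-XMM transfer, not the broadcast itself.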
The different semantics are an issue for the specific i16x8.splat code you linked to, but I agree that the additional …
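The width dependence can be sketched the same way. SSE2 has no single instruction that broadcasts a 16-bit lane, so one common SSE2-only lowering for i16x8.splat (assumed here; V8 and Cranelift may choose differently, and SSSE3 or AVX2 allow other sequences) is movd, then pshuflw to replicate the low word across the low four words, then pshufd to copy that low dword to every dword. The name splat_i16x8 is hypothetical.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: i16x8.splat using only SSE2 (three instructions). */
static inline __m128i splat_i16x8(int16_t x) {
    __m128i v = _mm_cvtsi32_si128(x);  /* movd: x in the low 16 bits of lane 0 */
    v = _mm_shufflelo_epi16(v, 0);     /* pshuflw imm=0: words 0..3 all equal x */
    return _mm_shuffle_epi32(v, 0);    /* pshufd imm=0: copy dword 0 (x, x) to all dwords */
}
```

Only word 0 is read by pshuflw, so the sign-extended upper bits that movd brings along do not matter.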
There doesn't seem to be anything actionable here, so I'm closing this issue. Please reopen if you have suggestions for what more we can do.
Can I get permissions to reopen this? I think the actionable part is to document, in the "implementor's guide" document (do we have one yet?), the possible lowerings that improve the situation. Specifically on x86, this high-use instruction can be:
Not sure whether you need permissions to reopen as the original author of the issue, but reopening. This was previously discussed at a meeting (03/06), and there was an action item (AI) for the Intel folks discussing it at the meeting to follow up with PRs/issues deciding where this document should live and what form it should take.
splat has 2- to 3-instruction lowerings in Cranelift and V8. I believe the "splat all ones" and "splat all zeroes" cases are single-instruction lowerings on both platforms, but it is unfortunate that other values of splat will incur a multi-instruction overhead, especially since splat would seem to be a high-use instruction.
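The special cases exist because all-zeros and all-ones vectors can be produced without touching a general-purpose register: pxor of a register with itself gives zero, and pcmpeqd of a register with itself gives all ones. Below is a minimal sketch, with hypothetical function names, of a constant-aware i8x16.splat that falls back to an SSE2-only four-instruction sequence (movd, punpcklbw, pshuflw, pshufd) for other values. A real code generator would make this choice at compile time on a known constant rather than branch at run time; with SSSE3, movd plus a zeroed index register and pshufb would be a three-instruction alternative.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* All-zeros constant: compilers typically emit a single pxor xmm, xmm. */
static inline __m128i splat_zero(void) {
    return _mm_setzero_si128();
}

/* All-ones constant: comparing a register with itself for equality sets
 * every bit; compilers typically emit a single pcmpeqd xmm, xmm. */
static inline __m128i splat_all_ones(void) {
    __m128i z = _mm_setzero_si128();
    return _mm_cmpeq_epi32(z, z);
}

/* General i8x16.splat with SSE2 only: four instructions. */
static inline __m128i splat_i8x16_general(int8_t x) {
    __m128i v = _mm_cvtsi32_si128(x);  /* movd: x in byte 0 */
    v = _mm_unpacklo_epi8(v, v);       /* punpcklbw: word 0 = (x, x) */
    v = _mm_shufflelo_epi16(v, 0);     /* pshuflw imm=0: words 0..3 = (x, x) */
    return _mm_shuffle_epi32(v, 0);    /* pshufd imm=0: all 16 bytes = x */
}

/* Hypothetical dispatcher: 0 and -1 (0xFF) take the one-instruction forms. */
static inline __m128i splat_i8x16(int8_t x) {
    if (x == 0) return splat_zero();
    if (x == -1) return splat_all_ones();
    return splat_i8x16_general(x);
}
```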