This repository was archived by the owner on Dec 22, 2021. It is now read-only.

Inefficient x64 codegen for integer comparisons #188

Open
abrown opened this issue Feb 6, 2020 · 2 comments

Comments

@abrown
Contributor

abrown commented Feb 6, 2020

In both cranelift and v8, unsigned integer comparisons are lowered to more than one instruction:

  • unsigned greater/less-than takes 4 instructions; e.g. cranelift and v8
  • both unsigned and signed greater/less-than-or-equal take 2 instructions; e.g. cranelift and v8
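
To make the four-instruction case concrete, here is a minimal lane-by-lane model in Python of one plausible lowering for unsigned greater-than on i32x4. The sequence (pmaxud, pcmpeqd, a second pcmpeqd to build all ones, pxor) is an assumption based on the instructions named in this thread, not a dump of what either compiler emits, and the helper functions only imitate the x64 instructions they are named after.

```python
MASK = 0xFFFFFFFF  # each lane holds a 32-bit value


def pmaxud(a, b):
    """Model of pmaxud: lane-wise unsigned maximum."""
    return [max(x, y) for x, y in zip(a, b)]


def pcmpeqd(a, b):
    """Model of pcmpeqd: all ones where lanes are equal, zero otherwise."""
    return [MASK if x == y else 0 for x, y in zip(a, b)]


def pxor(a, b):
    """Model of pxor: lane-wise exclusive or."""
    return [x ^ y for x, y in zip(a, b)]


def gt_u(a, b):
    """Unsigned a > b, computed as NOT(max(a, b) == b)."""
    t = pmaxud(a, b)      # 1: t = max(a, b)
    t = pcmpeqd(t, b)     # 2: all ones where a <= b
    ones = pcmpeqd(b, b)  # 3: synthesize an all-ones vector
    return pxor(t, ones)  # 4: invert, leaving all ones where a > b


a = [0, 5, 0x80000000, MASK]
b = [0, 3, 1, MASK]
result = gt_u(a, b)
```

The third lane is the interesting one: 0x80000000 is negative when read as signed, so a plain signed compare would give the wrong answer there.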

These seem like high-use instructions and I wonder if there is any good way to get around this inefficiency.

@dtig
Member

dtig commented Feb 18, 2020

For unsigned greater/less-than in V8, we emit an extra pcmpeqd to synthesize all ones; a future optimization could remove it, because pxor can take a memory operand. For the greater/less-than-or-equal cases, given that no single instruction exists, I think the two-instruction sequence is probably the best option. I doubt this is actionable, but I will leave it open for now to see if others have opinions about this.
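
A small sketch of why two instructions suffice for greater-than-or-equal: a >= b holds exactly when min(a, b) == b, so a min followed by an equality compare gives the mask directly, with no inversion. The Python below models this per lane for i32x4; the choice of pminud/pminsd followed by pcmpeqd is an assumption about the sequence being discussed, and the helpers only imitate those instructions.

```python
MASK = 0xFFFFFFFF  # each lane holds a 32-bit value


def to_signed(x):
    """Reinterpret a 32-bit lane as a signed integer."""
    return x - (1 << 32) if x & 0x80000000 else x


def pminud(a, b):
    """Model of pminud: lane-wise unsigned minimum."""
    return [min(x, y) for x, y in zip(a, b)]


def pminsd(a, b):
    """Model of pminsd: lane-wise signed minimum."""
    return [min(x, y, key=to_signed) for x, y in zip(a, b)]


def pcmpeqd(a, b):
    """Model of pcmpeqd: all ones where lanes are equal, zero otherwise."""
    return [MASK if x == y else 0 for x, y in zip(a, b)]


def ge_u(a, b):
    """Unsigned a >= b: min (1 instruction), then equality (1 instruction)."""
    return pcmpeqd(pminud(a, b), b)


def ge_s(a, b):
    """Signed a >= b: same shape, using the signed minimum."""
    return pcmpeqd(pminsd(a, b), b)


a = [0, 5, 0x80000000, 1]
b = [0, 7, 1, 0x80000000]
unsigned_mask = ge_u(a, b)
signed_mask = ge_s(a, b)
```

The last two lanes differ between the two results because 0x80000000 is the largest of the values under the unsigned reading and the smallest under the signed one.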

@zeux
Contributor

zeux commented Feb 23, 2020

To add to the list, compare for inequality takes 3 instructions, but compare for equality takes 1.

I've run into this in the context of LLVM strength-reducing one comparison to the other: it will replace i32x4.gt(value, 0) with i32x4.ne(value, 0) if it knows value is non-negative, which carries a slight codegen penalty.
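
As a rough illustration of that penalty, the Python sketch below models both forms per lane. It assumes the three-instruction inequality is pcmpeqd, a second pcmpeqd to build all ones, and pxor, while signed greater-than is a single pcmpgtd; the helper names mirror those instructions but are only models of them.

```python
MASK = 0xFFFFFFFF  # each lane holds a 32-bit value


def to_signed(x):
    """Reinterpret a 32-bit lane as a signed integer."""
    return x - (1 << 32) if x & 0x80000000 else x


def pcmpeqd(a, b):
    """Model of pcmpeqd: all ones where lanes are equal, zero otherwise."""
    return [MASK if x == y else 0 for x, y in zip(a, b)]


def pcmpgtd(a, b):
    """Model of pcmpgtd: all ones where a > b as signed integers."""
    return [MASK if to_signed(x) > to_signed(y) else 0 for x, y in zip(a, b)]


def pxor(a, b):
    """Model of pxor: lane-wise exclusive or."""
    return [x ^ y for x, y in zip(a, b)]


def ne(a, b):
    """a != b as NOT(a == b): three modeled instructions."""
    t = pcmpeqd(a, b)     # 1: all ones where equal
    ones = pcmpeqd(b, b)  # 2: synthesize an all-ones vector
    return pxor(t, ones)  # 3: invert


value = [0, 1, 7, 0x7FFFFFFF]  # every lane is non-negative
zero = [0, 0, 0, 0]

via_gt = pcmpgtd(value, zero)  # one modeled instruction
via_ne = ne(value, zero)       # three modeled instructions
```

Both forms produce the same mask for non-negative inputs, so the rewrite is valid, but under these assumptions it trades one instruction for three.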

I agree that short of a slightly more efficient "not" by using a memory operand (which removes 1 instruction in both cases) there doesn't seem to be anything else that could be done here.
