Skip to content
This repository was archived by the owner on Dec 22, 2021. It is now read-only.

Inefficient x64 codegen for fabs/fneg #187

Open
abrown opened this issue Feb 6, 2020 · 5 comments
Open

Inefficient x64 codegen for fabs/fneg #187

abrown opened this issue Feb 6, 2020 · 5 comments

Comments

@abrown
Copy link
Contributor

abrown commented Feb 6, 2020

In both cranelift (ignore the bitcasts) and v8, floating-point absolute value and floating-point negation are 3-instruction lowerings. I don't believe there is any better lowering than these (is there?) for this type of instruction, but I opened this issue to discuss the inefficiency: I would guess @dtig would say that these are in the class of useful instructions to have (and I agree) but predicting performance across all platforms might be more transparent if Emscripten, e.g., generated Wasm without these instructions. In other words, if Emscripten generated a sequence like const + shift + and instead of fabs it would be more obvious what the cost would be and Wasm compilers could still use a peephole optimization to convert const + shift + and to a single instruction on platforms that support this.

@tlively
Copy link
Member

tlively commented Feb 14, 2020

I'm not sure the benefit of having wasm instruction counts match native instruction counts is worth all the the extra work of making engines pattern match instruction sequences. In general, the higher-level the operation, the more information is available to the engine to enable it to make the best codegen choices possible. It is better to have an operation that takes three instructions to implement than to replace that operation everywhere with three operations if that would make codegen harder for some platform.

@jan-wassenberg
Copy link

FYI AVX-512 and NEON have fabs() instructions.

@zeux
Copy link
Contributor

zeux commented Feb 22, 2020

(sorry, I know you're all tired of me and my "constants are awesome" comments)

fabsf/fneg are single instruction (two uops) with a memory load. You only need a single constant for all of these, which has a high bit set and low bits unset (sign mask).

fneg(v) = xor(v, -0.f)
fabs(v) = andnot(v, -0.f)

So this adds at most one 16b constant to the function that needs to perform this math. -0.f is amazingly useful in other cases, often it can be present in the user code as well to do operations like copysign.

@zeux
Copy link
Contributor

zeux commented Feb 22, 2020

If you don’t like constants, alternative lowering options:

fneg: xorps (to produce zero) + subps; 2 insn if you can’t find a ready to use zero register, 1 if you can
fabs: fneg + maxps = 3 insn (2 if you already have 0 reg).

I am not sure of the exact NaN and negative zero semantics that the spec mandates for these so this may or may not be compliant.

@ngzhian
Copy link
Member

ngzhian commented Oct 8, 2020

Unfortunately, the alternative lowering options won't work since fneg and fabs are specced as purely binary operations, see https://webassembly.github.io/spec/core/exec/numerics.html#op-fneg so the nans need to be preserved.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants