Inefficient x64 codegen for fabs/fneg #187

abrown · 2020-02-06T21:12:37Z

In both cranelift (ignore the bitcasts) and v8, floating-point absolute value and floating-point negation are 3-instruction lowerings. I don't believe there is any better lowering than these (is there?) for this type of instruction, but I opened this issue to discuss the inefficiency: I would guess @dtig would say that these are in the class of useful instructions to have (and I agree) but predicting performance across all platforms might be more transparent if Emscripten, e.g., generated Wasm without these instructions. In other words, if Emscripten generated a sequence like const + shift + and instead of fabs it would be more obvious what the cost would be and Wasm compilers could still use a peephole optimization to convert const + shift + and to a single instruction on platforms that support this.

The text was updated successfully, but these errors were encountered:

tlively · 2020-02-14T01:42:59Z

I'm not sure the benefit of having wasm instruction counts match native instruction counts is worth all the the extra work of making engines pattern match instruction sequences. In general, the higher-level the operation, the more information is available to the engine to enable it to make the best codegen choices possible. It is better to have an operation that takes three instructions to implement than to replace that operation everywhere with three operations if that would make codegen harder for some platform.

jan-wassenberg · 2020-02-14T06:32:35Z

FYI AVX-512 and NEON have fabs() instructions.

zeux · 2020-02-22T19:42:22Z

(sorry, I know you're all tired of me and my "constants are awesome" comments)

fabsf/fneg are single instruction (two uops) with a memory load. You only need a single constant for all of these, which has a high bit set and low bits unset (sign mask).

fneg(v) = xor(v, -0.f)
fabs(v) = andnot(v, -0.f)

So this adds at most one 16b constant to the function that needs to perform this math. -0.f is amazingly useful in other cases, often it can be present in the user code as well to do operations like copysign.

zeux · 2020-02-22T20:48:58Z

If you don’t like constants, alternative lowering options:

fneg: xorps (to produce zero) + subps; 2 insn if you can’t find a ready to use zero register, 1 if you can
fabs: fneg + maxps = 3 insn (2 if you already have 0 reg).

I am not sure of the exact NaN and negative zero semantics that the spec mandates for these so this may or may not be compliant.

ngzhian · 2020-10-08T01:40:04Z

Unfortunately, the alternative lowering options won't work since fneg and fabs are specced as purely binary operations, see https://webassembly.github.io/spec/core/exec/numerics.html#op-fneg so the nans need to be preserved.

ngzhian added the perf documentation label Mar 17, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Inefficient x64 codegen for fabs/fneg #187

Inefficient x64 codegen for fabs/fneg #187

abrown commented Feb 6, 2020

tlively commented Feb 14, 2020

jan-wassenberg commented Feb 14, 2020

zeux commented Feb 22, 2020

zeux commented Feb 22, 2020 •

edited

Loading

ngzhian commented Oct 8, 2020

Inefficient x64 codegen for fabs/fneg #187

Inefficient x64 codegen for fabs/fneg #187

Comments

abrown commented Feb 6, 2020

tlively commented Feb 14, 2020

jan-wassenberg commented Feb 14, 2020

zeux commented Feb 22, 2020

zeux commented Feb 22, 2020 • edited Loading

ngzhian commented Oct 8, 2020

zeux commented Feb 22, 2020 •

edited

Loading