relaxed i32x4.trunc_sat_f32x4_{s,u} i32x4.trunc_sat_f64x2_{s,u}_zero #21

ngzhian · 2021-04-16T22:43:33Z

What are the instructions being proposed?

Relaxed versions of:

i32x4.trunc_sat_f32x4_s
i32x4.trunc_sat_f32x4_u
i32x4.trunc_sat_f64x2_s_zero
i32x4.trunc_sat_f64x2_u_zero

from Simd128. (Names undecided)

What are the semantics of these instructions?

Convert f32x4/f64x2 to i32x4 with truncation (signed/unsigned). If the inputs are out of range or NaNs, the result is implementation-defined.

How will these instructions be implemented? Give examples for at least
x86-64 and ARM64. Also provide reference implementation in terms of 128-bit
Wasm SIMD.

x86/64

relaxed i32x4.trunc_sat_f32x4_s = CVTTPS2DQ
relaxed i32x4.trunc_sat_f32x4_u = VCVTTPS2UDQ (AVX512), Simd128 i32x4.trunc_sat_f32x4_u otherwise (can be slightly optimized to ignore NaNs)
relaxed i32x4.trunc_sat_f64x2_s_zero = CVTTPD2DQ
relaxed i32x4.trunc_sat_f64x2_u_zero = VCVTTPD2UDQ (AVX512), Simd128 i32x4.trunc_sat_f64x2_u_zero

ARM64

relaxed i32x4.trunc_sat_f32x4_s = FCVTZS
relaxed i32x4.trunc_sat_f32x4_u = FCVTZU
relaxed i32x4.trunc_sat_f64x2_s_zero = FCVTZS + SQXTN
relaxed i32x4.trunc_sat_f64x2_u_zero = FCVTZU + UQXTN

ARM NEON

relaxed i32x4.trunc_sat_f32x4_s = vcvt.S32.F32
relaxed i32x4.trunc_sat_f32x4_u = vcvt.U32.F32
relaxed i32x4.trunc_sat_f64x2_s_zero = vcvt.S32.F64 + vcvt.S32.F64 + vmov
relaxed i32x4.trunc_sat_f64x2_u_zero = vcvt.U32.F64 + vcvt.U32.F64 + vmov

Note: On ARM MVE, double precision conversions require the Armv8-M Floating-point Extension (FPv5); MVE can be implemented with or without that extension.

simd128

respective non-relaxed versions i32x4.trunc_sat_f32x4_s, i32x4.trunc_sat_f32x4_u, i32x4.trunc_sat_f64x2_s_zero, i32x4.trunc_sat_f64x2_u_zero.

How does behavior differ across processors? What new fingerprinting surfaces will be exposed?

For i32x4.trunc_sat_f32x4_s:

x86/64 will return 0x80000000 in lanes for out of range or NaNs
ARM/ARM64 will return 0 for NaNs and saturated results for out-of-range inputs

For i32x4.trunc_sat_f32x4_u:

x86/64 will return 0xFFFFFFFF in lanes for out of range or NaNs if AVX512 is available, 0 otherwise (but requires more instructions)
ARM/ARM64 will return 0 for NaNs and saturated results for out-of-range inputs

For i32x4.trunc_sat_f64x2_s_zero:

x86/64, 0x80000000 for out of range or NaNs
ARM/ARM64 will return 0 for NaNs and saturated results for out-of-range inputs

For i32x4.trunc_sat_f64x2_u_zero:

x86/64, 0xFFFFFFFF for out of range or NaNs if AVX512 is available, 0 otherwise
ARM/ARM64 will return 0 for NaNs and saturated results for out-of-range inputs
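
As a rough illustration of the divergence listed above, here is a scalar sketch of one lane of the signed f32x4 conversion on each platform. The function names are hypothetical, and the models only encode the behavior described in this issue (0x80000000 on x86/64 for out of range or NaNs; 0 for NaNs and saturation on ARM/ARM64). They are not taken from any engine.

```python
import math

MASK32 = 0xFFFFFFFF


def x86_trunc_s_lane(x: float) -> int:
    """Hypothetical model of one CVTTPS2DQ lane, result as raw 32-bit pattern."""
    # Out-of-range inputs and NaNs all produce the same value, 0x80000000.
    if math.isnan(x) or not (-2.0**31 <= x < 2.0**31):
        return 0x80000000
    return int(x) & MASK32  # int() truncates toward zero


def arm_trunc_s_lane(x: float) -> int:
    """Hypothetical model of one FCVTZS lane, result as raw 32-bit pattern."""
    if math.isnan(x):
        return 0                # NaN becomes 0
    if x >= 2.0**31:
        return 0x7FFFFFFF       # saturate to INT32_MAX
    if x < -2.0**31:
        return 0x80000000       # saturate to INT32_MIN
    return int(x) & MASK32


for value in (-3.9, 1e10, float("nan")):
    print(value, hex(x86_trunc_s_lane(value)), hex(arm_trunc_s_lane(value)))
```

In-range inputs agree on both models; only NaNs and out-of-range inputs expose the platform.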

What use cases are there?

Conversion instructions are common. If the application can guarantee the input range, we can get good performance on all architectures.

Maratyszcza · 2021-04-18T15:25:17Z

IIRC @zeux had a use-case for these instructions.

It would be useful to consider f64x2 variants in the same proposal.

Maratyszcza · 2021-04-18T15:26:31Z

For i32x4.trunc_sat_f32x4_u, it will depend on implementation choice on x86/64:

if AVX512 is available, same as above, x86/64 will return 0x8000000 in lanes for out of range or NaNs, ARM/ARM64 will return 0

AVX512 version returns 0xFFFFFFFF

ngzhian · 2021-04-19T18:00:34Z

AVX512 version returns 0xFFFFFFFF

Corrected, thanks!

ngzhian · 2021-04-19T18:38:28Z

It would be useful to consider f64x2 variants in the same proposal.

relaxed i64x2.trunc_sat_f64x2_{s,u}? We don't have these instructions in Simd128, so I think it is neater to separate them out.

Maratyszcza · 2021-04-19T21:21:04Z

relaxed i64x2.trunc_sat_f64x2_{s,u}? We don't have these instructions in Simd128, so I think it is neater to separate them out.

The WebAssembly/simd#383 instructions

ngzhian · 2021-04-19T21:49:38Z

i32x4.trunc_sat_f64x2_u_zero and i32x4.trunc_sat_f64x2_s_zero?

Maratyszcza · 2021-04-19T22:16:28Z

Yes

zeux · 2021-04-19T22:29:59Z

Yeah this one is pretty fundamental for many workflows, e.g. in rendering domains it's common to store data as fixed-point integers for GPU consumption but to prepare this data you do some math in floating point and then convert to integer via smth like int(v * 65535.0f + 0.5f) (assuming the value is known to be positive); the float->int truncation can be pretty hot based on the amount of other computation.

It would be nice to also include the rounding variants (on x64 assuming default rounding mode setup you can use cvtps2dq for rounding conversion and cvttps2dq for truncating; unsure what floating point environment is typically used in browser context, if it's undefined then rounding would require vroundps before cvttps).
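
A minimal scalar sketch of the fixed-point pattern mentioned in this comment, assuming the input is known to lie in [0, 1]. The function name is hypothetical, and Python floats are double precision, whereas the original expression uses single precision.

```python
def quantize_unorm16(v: float) -> int:
    """Map v in [0.0, 1.0] to a 16-bit fixed-point value.

    Adding 0.5 before truncating rounds to nearest for non-negative
    inputs, which is why the input must be known to be positive.
    """
    assert 0.0 <= v <= 1.0, "input range must be guaranteed by the caller"
    return int(v * 65535.0 + 0.5)  # int() truncates toward zero


print(quantize_unorm16(0.0), quantize_unorm16(0.5), quantize_unorm16(1.0))
```

Because the scaled value is always in range and never NaN, this is the kind of conversion where the implementation-defined cases of the relaxed instructions cannot be reached.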


yurydelendik · 2021-09-28T15:59:35Z

What will be the exact recipe for relaxed i32x4.trunc_sat_f32x4_u for x86/64 without AVX512? The comment at #247 suggests a somewhat long version.

Is the following acceptable, or does a shorter version exist?

y = relaxed i32x4.trunc_sat_f32x4_u(x) is lowered to:
- MOVAPD xmm_y, xmm_x
- MOVAPD xmm_tmp, [wasm_i32x4_splat(0x4f000000)]
- CMPLTPS xmm_tmp, xmm_x
- PAND xmm_tmp, xmm_x
- PXOR xmm_y, xmm_tmp
- CVTTPS2DQ xmm_y, xmm_y
- PSLLD xmm_tmp, 7
- PADDD xmm_y, xmm_tmp

ngzhian · 2021-09-28T17:08:53Z

It will be CVTTPS2DQ. The relaxed version only guarantees output when inputs are < INT32_MAX and not NaN, which is exactly what CVTTPS2DQ provides, and CVTTPS2DQ has been available since SSE2.

Maratyszcza · 2021-09-28T17:15:39Z

@ngzhian The question was about the unsigned version, and IIUC we don't expect unsigned version to use just CVTTPS2DQ alone.

ngzhian · 2021-09-28T17:16:58Z

Oh oops, sorry I missed that. Hm, then we should reconsider if we want the unsigned version in this. AVX512 is not supported by V8 yet.

Maratyszcza · 2021-09-29T00:19:05Z

IMO it is worth having the unsigned version, both for symmetry and because it is still faster on SSE4.1 than the non-relaxed unsigned version.

zeux · 2021-10-01T02:44:32Z

Should these instructions have _sat in the name? In the SIMD MVP _sat stands for saturating, but these instructions don't specify exact behavior for out of range inputs.

ngzhian · 2021-10-14T23:58:24Z

What is PSLLD xmm_tmp, 7 for? I think it doesn't work for all cases. Consider the input 2147483904.0: this is larger than MAX_INT32 but fits in UINT32, so the result should be 2147483904, or 0x80000100.
The hex representation of 2147483904.0 is https://float.exposed/0x4f000001 and if we shift left by 7 it becomes 0x80000080, which is wrong.

yurydelendik · 2021-10-15T14:14:32Z

Agree, there was a mistake 😞 One more operation is needed to make PSLLD work: ADDPS xmm_tmp, xmm_tmp ; PSLLD xmm_tmp, 8.

ngzhian · 2021-10-15T16:33:57Z

Agree, there was a mistake 😞 One more operation is needed to make PSLLD work: ADDPS xmm_tmp, xmm_tmp ; PSLLD xmm_tmp, 8.

Perfect, what a neat trick :) thanks!
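
To check the trick, here is a scalar sketch that simulates one lane of the sequence proposed above, with and without the ADDPS fix. The helper names are hypothetical, the instruction models are simplified, and the sketch assumes inputs in [0, 2^32) that are not NaN, which is the range the relaxed instruction is meant to cover.

```python
import math
import struct

MASK32 = 0xFFFFFFFF


def f32_bits(x: float) -> int:
    """Bit pattern of x rounded to single precision."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(b: int) -> float:
    """Single-precision value of a 32-bit pattern."""
    return struct.unpack("<f", struct.pack("<I", b & MASK32))[0]


def cvttps2dq_lane(bits: int) -> int:
    """Simplified CVTTPS2DQ: 0x80000000 for NaN or out-of-range inputs."""
    x = bits_f32(bits)
    if math.isnan(x) or not (-2.0**31 <= x < 2.0**31):
        return 0x80000000
    return int(x) & MASK32


def relaxed_trunc_u_lane(x: float, fixed: bool = True) -> int:
    x_bits = f32_bits(x)
    # CMPLTPS against splat(0x4f000000), i.e. 2^31 as a float
    mask = MASK32 if bits_f32(0x4F000000) < bits_f32(x_bits) else 0
    tmp = mask & x_bits              # PAND: bits of x where x > 2^31, else 0
    y = x_bits ^ tmp                 # PXOR: zero the lanes handled via tmp
    y = cvttps2dq_lane(y)            # CVTTPS2DQ
    if fixed:
        tmp = f32_bits(bits_f32(tmp) * 2.0)  # ADDPS tmp, tmp (exponent + 1)
        tmp = (tmp << 8) & MASK32            # PSLLD tmp, 8
    else:
        tmp = (tmp << 7) & MASK32            # original PSLLD tmp, 7
    return (y + tmp) & MASK32        # PADDD


print(hex(relaxed_trunc_u_lane(2147483904.0, fixed=False)))
print(hex(relaxed_trunc_u_lane(2147483904.0, fixed=True)))
```

Doubling the float sets the lowest exponent bit, so the shift by 8 places it at bit 31 and aligns the 23 mantissa bits with bits 30..8, which matches the integer value of any float in (2^31, 2^32).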

ngzhian · 2021-11-01T19:45:47Z

Note: RISC-V V saturates for same width conversions. For f64x2->i32x4 it changes the vector type, and I think there's no guarantee that the top are zeroed.

ngzhian · 2021-11-01T23:24:34Z

On PowerPC VSX xscvdpsxws and xscvdpuxds perform trunc sat

ngzhian · 2022-03-14T20:56:46Z

I think I got the out of range results wrong in this description, ARM/ARM64 doesn't return 0, it saturates.

Codegen details detailed in the relevant github issue. WebAssembly/relaxed-simd#21 Bug: v8:12284 Change-Id: I06c8859035abae775269bdf949ff0f1c2e262859 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3508560 Reviewed-by: Adam Klein <[email protected]> Commit-Queue: Deepti Gandluri <[email protected]> Cr-Commit-Position: refs/heads/main@{#79410}

ngzhian added the instruction-proposal label Apr 16, 2021

ngzhian changed the title ~~relaxed i32x4.trunc_sat_f32x4_{s,u}~~ relaxed i32x4.trunc_sat_f32x4_{s,u} i32x4.trunc_sat_f64x2_{s,u}_zero Apr 19, 2021

ngzhian added a commit to ngzhian/relaxed-simd that referenced this issue Jun 10, 2021

Add relaxed float to int conversions

d3b3ff1

For WebAssembly#21.

ngzhian mentioned this issue Jun 10, 2021

Add relaxed float to int conversions #25

Merged

ngzhian added a commit that referenced this issue Jun 24, 2021

Add relaxed float to int conversions (#25)

4fced21

For #21.

moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this issue Sep 29, 2021

Bug 1731853 - Prototype relaxed-SIMD i32x4.trunc_sat_fXXX instruction…

548b6d3

…s. r=lth See WebAssembly/relaxed-simd#21 Differential Revision: https://phabricator.services.mozilla.com/D126513

ngzhian mentioned this issue Sep 29, 2021

SIMD subgroup meeting on 2021-10-01 #37

Closed

aosmond pushed a commit to aosmond/gecko that referenced this issue Sep 30, 2021

Bug 1731853 - Prototype relaxed-SIMD i32x4.trunc_sat_fXXX instruction…

6b32520

…s. r=lth See WebAssembly/relaxed-simd#21 Differential Revision: https://phabricator.services.mozilla.com/D126513

ngzhian added the in-overview Instruction has been added to Overview.md label Feb 18, 2022

tomrittervg mentioned this issue Jun 16, 2022

WebAssembly Relaxed SIMD mozilla/standards-positions#651

Open

alexcrichton mentioned this issue Feb 22, 2023

Float-to-signed-integer conversion in Overview.md vs spec #126

Closed

alexcrichton mentioned this issue Mar 2, 2023

x64: Improve codegen of the relaxed f32-to-u32 instructions bytecodealliance/wasmtime#5913

Open

ngzhian mentioned this issue Mar 23, 2023

Alternative output for i32x4_relaxed_trunc.wast tests #140

Draft
