relaxed i32x4.trunc_sat_f32x4_{s,u} i32x4.trunc_sat_f64x2_{s,u}_zero #21
Comments
IIRC @zeux had a use-case for these instructions. It would be useful to consider
AVX512 version returns 0xFFFFFFFF
Corrected, thanks!
relaxed i64x2.trunc_sat_f64x2_{s,u}? We don't have these instructions in Simd128, so I think it is neater to separate them out. |
The WebAssembly/simd#383 instructions
i32x4.trunc_sat_f64x2_u_zero and i32x4.trunc_sat_f64x2_s_zero? |
Yes
Yeah this one is pretty fundamental for many workflows, e.g. in rendering domains it's common to store data as fixed-point integers for GPU consumption but to prepare this data you do some math in floating point and then convert to integer via smth like It would be nice to also include the rounding variants (on x64 assuming default rounding mode setup you can use cvtps2dq for rounding conversion and cvttps2dq for truncating; unsure what floating point environment is typically used in browser context, if it's undefined then rounding would require vroundps before cvttps).
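As an illustration of the workflow described in the comment above (this is not the commenter's snippet; the helper name `quantize_unorm16` and the 16-bit unsigned normalized target format are assumptions made for the sketch), a float prepared on the CPU can be clamped and then converted to a fixed-point integer. Because the input is clamped first, the conversion never sees an out-of-range value, which is the situation where a relaxed conversion would be safe to use.

```rust
/// Hypothetical helper: quantize a float to a 16-bit unsigned normalized value.
/// The input is clamped to [0, 1] first, so the float-to-int conversion only
/// ever sees in-range values (NaN stays NaN through `clamp` and casts to 0).
pub fn quantize_unorm16(x: f32) -> u16 {
    let clamped = x.clamp(0.0, 1.0);
    // Adding 0.5 before a truncating conversion rounds half up for
    // non-negative inputs, so no separate rounding instruction is needed.
    let scaled = clamped * 65535.0 + 0.5;
    // `as i32` truncates toward zero; the value is known to fit in 16 bits.
    (scaled as i32) as u16
}
```

The add-0.5-then-truncate step is one common way to get rounding out of a truncating conversion; a rounding conversion instruction would make it unnecessary.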
What will be the exact recipe for relaxed Is the following acceptable or the shorter version exists?
it will be
@ngzhian The question was about the unsigned version, and IIUC we don't expect unsigned version to use just
Oh oops, sorry I missed that. Hm, then we should reconsider if we want the unsigned version in this. AVX512 is not supported by V8 yet.
IMO it is worth having the unsigned version, both for symmetry and because it is still faster on SSE4.1 than the non-relaxed unsigned version.
…s. r=lth See WebAssembly/relaxed-simd#21 Differential Revision: https://phabricator.services.mozilla.com/D126513
Should these instructions have
What is
Agree, there was a mistake 😞 One more operation is needed to make PSLLD work:
Perfect, what a neat trick :) thanks!
Note: RISC-V V saturates for same width conversions. For f64x2->i32x4 it changes the vector type, and I think there's no guarantee that the top are zeroed.
On PowerPC VSX xscvdpsxws and xscvdpuxds perform trunc sat
I think I got the out of range results wrong in this description, ARM/ARM64 doesn't return 0, it saturates.
Codegen details detailed in the relevant github issue. WebAssembly/relaxed-simd#21 Bug: v8:12284 Change-Id: I06c8859035abae775269bdf949ff0f1c2e262859 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3508560 Reviewed-by: Adam Klein <[email protected]> Commit-Queue: Deepti Gandluri <[email protected]> Cr-Commit-Position: refs/heads/main@{#79410}
Relaxed versions of:
i32x4.trunc_sat_f32x4_s
i32x4.trunc_sat_f32x4_u
i32x4.trunc_sat_f64x2_s_zero
i32x4.trunc_sat_f64x2_u_zero
from Simd128. (Names undecided)
Convert f32x4/f64x2 to i32x4 with truncation (signed/unsigned). If the inputs are out of range or NaNs, the result is implementation-defined.
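For in-range inputs the relaxed instructions are expected to agree with the existing Simd128 conversions; only out-of-range and NaN lanes are left implementation-defined. As a baseline for comparison, here is a minimal scalar sketch of the non-relaxed semantics. The function names and the array-based lane representation are assumptions for illustration. It relies on Rust's `as` cast from float to integer, which truncates toward zero, saturates at the integer bounds, and maps NaN to 0.

```rust
/// Scalar model of Simd128 i32x4.trunc_sat_f32x4_s (illustrative name).
pub fn trunc_sat_f32x4_s(v: [f32; 4]) -> [i32; 4] {
    // `as` truncates toward zero, saturates, and turns NaN into 0.
    v.map(|x| x as i32)
}

/// Scalar model of Simd128 i32x4.trunc_sat_f32x4_u (illustrative name).
pub fn trunc_sat_f32x4_u(v: [f32; 4]) -> [u32; 4] {
    v.map(|x| x as u32)
}

/// Scalar model of i32x4.trunc_sat_f64x2_s_zero: two converted lanes,
/// with the two upper lanes set to zero.
pub fn trunc_sat_f64x2_s_zero(v: [f64; 2]) -> [i32; 4] {
    [v[0] as i32, v[1] as i32, 0, 0]
}

/// Scalar model of i32x4.trunc_sat_f64x2_u_zero.
pub fn trunc_sat_f64x2_u_zero(v: [f64; 2]) -> [u32; 4] {
    [v[0] as u32, v[1] as u32, 0, 0]
}
```

A relaxed implementation may return something else in exactly those lanes where this model saturates or returns 0 for NaN.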
x86-64 and ARM64. Also provide reference implementation in terms of 128-bit
Wasm SIMD.
x86/64
relaxed i32x4.trunc_sat_f32x4_s = CVTTPS2DQ
relaxed i32x4.trunc_sat_f32x4_u = VCVTTPS2UDQ (AVX512), Simd128 i32x4.trunc_sat_f32x4_u otherwise (can be slightly optimized to ignore NaNs)
relaxed i32x4.trunc_sat_f64x2_s_zero = CVTTPD2DQ
relaxed i32x4.trunc_sat_f64x2_u_zero = VCVTTPD2UDQ (AVX512), Simd128 i32x4.trunc_sat_f64x2_u_zero otherwise
ARM64
relaxed i32x4.trunc_sat_f32x4_s = FCVTZS
relaxed i32x4.trunc_sat_f32x4_u = FCVTZU
relaxed i32x4.trunc_sat_f64x2_s_zero = FCVTZS + SQXTN
relaxed i32x4.trunc_sat_f64x2_u_zero = FCVTZU + UQXTN
ARM NEON
relaxed i32x4.trunc_sat_f32x4_s = vcvt.S32.F32
relaxed i32x4.trunc_sat_f32x4_u = vcvt.U32.F32
relaxed i32x4.trunc_sat_f64x2_s_zero = vcvt.S32.F64 + vcvt.S32.F64 + vmov
relaxed i32x4.trunc_sat_f64x2_u_zero = vcvt.U32.F64 + vcvt.U32.F64 + vmov
Note: On ARM MVE, double-precision conversions require the Armv8-M Floating-point Extension (FPv5); MVE can be implemented with or without such an extension.
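The two-instruction ARM64 lowering for the f64x2 variants listed above (a 64-bit convert followed by a saturating narrow) can be modeled in scalar code. This is a sketch under assumptions: the helper names are made up for illustration, and the model takes FCVTZS/FCVTZU to saturate and return 0 for NaN, and SQXTN/UQXTN to saturate while leaving the upper half of the destination zeroed.

```rust
/// Model of FCVTZS on one f64 lane: truncate to i64, saturating, NaN -> 0.
fn fcvtzs_lane(x: f64) -> i64 {
    x as i64
}

/// Model of SQXTN on one lane: signed saturating narrow from i64 to i32.
fn sqxtn_lane(x: i64) -> i32 {
    x.clamp(i32::MIN as i64, i32::MAX as i64) as i32
}

/// Model of FCVTZU on one f64 lane: truncate to u64, saturating, NaN -> 0.
fn fcvtzu_lane(x: f64) -> u64 {
    x as u64
}

/// Model of UQXTN on one lane: unsigned saturating narrow from u64 to u32.
fn uqxtn_lane(x: u64) -> u32 {
    x.min(u32::MAX as u64) as u32
}

/// FCVTZS + SQXTN: two narrowed lanes, upper two lanes zero.
pub fn arm64_trunc_f64x2_s_zero(v: [f64; 2]) -> [i32; 4] {
    [sqxtn_lane(fcvtzs_lane(v[0])), sqxtn_lane(fcvtzs_lane(v[1])), 0, 0]
}

/// FCVTZU + UQXTN: two narrowed lanes, upper two lanes zero.
pub fn arm64_trunc_f64x2_u_zero(v: [f64; 2]) -> [u32; 4] {
    [uqxtn_lane(fcvtzu_lane(v[0])), uqxtn_lane(fcvtzu_lane(v[1])), 0, 0]
}
```

Under these assumptions the two-step sequence produces the same lanes as a direct saturating f64 to 32-bit conversion, since saturating to 64 bits first cannot change which side a value clamps to.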
simd128
respective non-relaxed versions i32x4.trunc_sat_f32x4_s, i32x4.trunc_sat_f32x4_u, i32x4.trunc_sat_f64x2_s_zero, i32x4.trunc_sat_f64x2_u_zero.
For i32x4.trunc_sat_f32x4_s: 0x80000000 in lanes for out of range or NaNs
For i32x4.trunc_sat_f32x4_u: 0xFFFFFFFF in lanes for out of range or NaNs if AVX512 is available, 0 otherwise (but requires more instructions)
For i32x4.trunc_sat_f64x2_s_zero: 0x80000000 for out of range or NaNs
For i32x4.trunc_sat_f64x2_u_zero: 0xFFFFFFFF for out of range or NaNs if AVX512 is available, 0 otherwise
Conversion instructions are common; if the application can guarantee the input range, we can get good performance on all architectures.
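To make the signed out-of-range result concrete, here is a small scalar sketch of one lane of the x86 CVTTPS2DQ lowering, which returns 0x80000000 for out-of-range and NaN inputs instead of saturating. The function name is an assumption for illustration, and the model only covers the signed f32 case.

```rust
/// Scalar model of one CVTTPS2DQ lane (illustrative name).
/// In-range inputs truncate toward zero; everything else, including NaN,
/// yields 0x80000000.
pub fn cvttps2dq_lane(x: f32) -> i32 {
    // Both bounds are exactly representable in f32. The comparisons are
    // false for NaN, so NaN falls through to the else branch.
    if x >= -2147483648.0 && x < 2147483648.0 {
        x as i32
    } else {
        i32::MIN // bit pattern 0x80000000
    }
}
```

For an input such as 3.0e9 this model returns 0x80000000, whereas the non-relaxed i32x4.trunc_sat_f32x4_s would return 0x7FFFFFFF; for in-range inputs the two agree.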