Include vectorized bit count instructions #6
Comments
The MIPS SIMD implementation (MSA instructions) defines the following bit count instructions: NLZC.df - Vector element count of leading bits set to 0. There are no CTZ or RBIT instructions, but CTZ could perhaps be emulated with 4-5 different instructions (including PCNT.df).
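As a rough sketch of how a population count can stand in for a missing CTZ, one common identity is `ctz(x) = popcnt(~x & (x - 1))`. The code below models it on 8-bit lanes in portable C rather than with MSA intrinsics. The helper names (`lane_popcnt8`, `lane_ctz8`, `ctz_i8x16_via_popcnt`) are made up for illustration, and the exact instruction count on MSA is not verified here.

```c
#include <stdint.h>

/* Population count of one 8-bit lane; stands in for a per-lane PCNT. */
static uint8_t lane_popcnt8(uint8_t x) {
    x = (uint8_t)(x - ((x >> 1) & 0x55));
    x = (uint8_t)((x & 0x33) + ((x >> 2) & 0x33));
    return (uint8_t)((x + (x >> 4)) & 0x0F);
}

/* ctz(x) = popcnt(~x & (x - 1)): the mask keeps exactly the zero bits
   below the lowest set bit. For x == 0 the mask is all ones, giving 8. */
static uint8_t lane_ctz8(uint8_t x) {
    uint8_t below = (uint8_t)(~x & (uint8_t)(x - 1));
    return lane_popcnt8(below);
}

/* Apply the lane formula to all 16 byte lanes of a 128-bit vector. */
static void ctz_i8x16_via_popcnt(const uint8_t in[16], uint8_t out[16]) {
    for (int i = 0; i < 16; ++i)
        out[i] = lane_ctz8(in[i]);
}
```

On a real vector unit each step (subtract one, and-not, popcount) would be a single lane-wise instruction, so no per-lane branching is needed.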
ARM v7/v8 have: |
Intel has |
Those who added these (e.g. We have a vectorized ambient occlusion example and benchmark that heavily uses a fairly poor vectorized PRNG, and going from a scalar to a vectorized PRNG had a huge performance impact. I don't recall exactly how big the effect was for this benchmark (I think it was in the ballpark of 1.5-2x for that example; it was one of the last optimizations we did to catch up with ISPC on performance), but it shouldn't be too hard to switch the PRNG back to a scalar one and get some numbers.
SSE2+: CLZ/CTZ for 32/64-bit lanes could use a floating-point hack. SSE2 8-bit examples:
Count Leading (Redundant) Sign Bits could be emulated as:
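One possible emulation, given as a sketch rather than the commenter's original code: fold negative lanes with a bitwise NOT so the sign bit is always 0, then take `clz` and subtract one. The example models 8-bit lanes in portable C. The names `lane_clz8_loop`, `lane_cls8`, and `cls_i8x16` are hypothetical, and the comparison against zero stands in for whatever signed compare the target provides.

```c
#include <stdint.h>

/* Reference count of leading zero bits in one 8-bit lane (8 for x == 0). */
static uint8_t lane_clz8_loop(uint8_t x) {
    uint8_t n = 0;
    for (uint8_t probe = 0x80; probe != 0 && (x & probe) == 0; probe >>= 1)
        ++n;
    return n;
}

/* Count of bits after the sign bit that are equal to the sign bit. */
static uint8_t lane_cls8(int8_t x) {
    uint8_t bits = (uint8_t)x;
    /* All ones for negative lanes, all zeros otherwise; a vector
       "less than zero" compare would produce the same mask. */
    uint8_t sign_mask = (x < 0) ? 0xFF : 0x00;
    uint8_t folded = (uint8_t)(bits ^ sign_mask); /* top bit is now 0 */
    return (uint8_t)(lane_clz8_loop(folded) - 1); /* drop the sign bit itself */
}

/* Apply the lane formula to all 16 byte lanes of a 128-bit vector. */
static void cls_i8x16(const int8_t in[16], uint8_t out[16]) {
    for (int i = 0; i < 16; ++i)
        out[i] = lane_cls8(in[i]);
}
```

Because the folded value always has its top bit clear, `clz` is at least 1 and the subtraction never underflows; both 0 and -1 yield 7.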
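The floating-point hack mentioned above for 32-bit lanes can be illustrated with a scalar model: convert the lane to a floating-point value and read the leading-bit position out of the exponent field. This sketch assumes IEEE 754 binary64 `double` and uses hypothetical helper names; it is not SSE2 intrinsic code, only the per-lane arithmetic such code would perform.

```c
#include <stdint.h>
#include <string.h>

/* clz of a 32-bit lane via the exponent of its double representation. */
static uint32_t lane_clz32_via_double(uint32_t x) {
    double d = (double)x;            /* exact: 32 bits fit in a 53-bit significand */
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);  /* reinterpret the bit pattern safely */
    uint32_t biased_exp = (uint32_t)(bits >> 52) & 0x7FF;
    uint32_t n = 1054u - biased_exp; /* 31 - (biased_exp - 1023) */
    return n > 32u ? 32u : n;        /* x == 0 has exponent field 0, clamp to 32 */
}

/* ctz by isolating the lowest set bit first, then locating it with clz. */
static uint32_t lane_ctz32_via_double(uint32_t x) {
    if (x == 0)
        return 32u;                  /* a vector version would use a select here */
    return 31u - lane_clz32_via_double(x & (0u - x));
}

/* Apply the clz lane formula to the four 32-bit lanes of a 128-bit vector. */
static void clz_i32x4_via_double(const uint32_t in[4], uint32_t out[4]) {
    for (int i = 0; i < 4; ++i)
        out[i] = lane_clz32_via_double(in[i]);
}
```

Going through `double` avoids the rounding problem a single-precision conversion would have for values such as 0xFFFFFFFF. SSE2's packed integer-to-double conversion treats lanes as signed, so a real vector version would need to handle the top bit separately; that part is not shown.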
@AndrewScheidecker mentioned in his review of #1 the possibility of including vectorized bit counting instructions to match the existing scalar instructions. They would have these signatures:
i8x16.clz(x: v128) -> v128
i16x8.clz(x: v128) -> v128
i32x4.clz(x: v128) -> v128
i64x2.clz(x: v128) -> v128
i8x16.ctz(x: v128) -> v128
i16x8.ctz(x: v128) -> v128
i32x4.ctz(x: v128) -> v128
i64x2.ctz(x: v128) -> v128
At least AArch64 has vectorized CLZ and RBIT instructions that could be used to implement this. But the proposed instructions could be quite impractical to emulate on other platforms.
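To make the CLZ-plus-RBIT idea concrete, here is a minimal reference model of the 8-bit variants in portable C, where `ctz` is computed as `clz` of the bit-reversed lane. The `v128_bytes` type and the function names are placeholders mirroring the proposed signatures, not part of any specification, and zero lanes are assumed to return the lane width as the scalar instructions do.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit vector viewed as 16 byte lanes. */
typedef struct { uint8_t lane[16]; } v128_bytes;

/* Reverse the bit order within one byte. */
static uint8_t lane_rbit8(uint8_t x) {
    x = (uint8_t)(((x & 0xF0) >> 4) | ((x & 0x0F) << 4));
    x = (uint8_t)(((x & 0xCC) >> 2) | ((x & 0x33) << 2));
    x = (uint8_t)(((x & 0xAA) >> 1) | ((x & 0x55) << 1));
    return x;
}

/* Count leading zero bits in one byte (8 for x == 0). */
static uint8_t lane_clz8_ref(uint8_t x) {
    uint8_t n = 0;
    for (uint8_t probe = 0x80; probe != 0 && (x & probe) == 0; probe >>= 1)
        ++n;
    return n;
}

/* Model of i8x16.clz: lane-wise count of leading zeros. */
static v128_bytes i8x16_clz(v128_bytes x) {
    v128_bytes r;
    for (int i = 0; i < 16; ++i)
        r.lane[i] = lane_clz8_ref(x.lane[i]);
    return r;
}

/* Model of i8x16.ctz: reverse the bits of each lane, then count leading zeros. */
static v128_bytes i8x16_ctz(v128_bytes x) {
    v128_bytes reversed;
    for (int i = 0; i < 16; ++i)
        reversed.lane[i] = lane_rbit8(x.lane[i]);
    return i8x16_clz(reversed);
}
```

On a target with both instructions this maps to two lane-wise operations; on targets without them, each helper would need its own emulation sequence, which is where the cost concern comes from.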