This repository has been archived by the owner on Dec 22, 2021. It is now read-only.
Suppose you have two vectors u and v, and you want to multiply all elements of the vector u by a single lane of the vector v, e.g. v[0]. This is a very common thing to do, particularly in float matrix multiplication kernels.
This should be available for all multiplication instructions, both float and integer, including any multiply-add instructions if those are added to the spec. It will map directly to the corresponding instructions on ARM and will be implemented on x86 by using a broadcast instruction into a temporary vector.
Rationale for this programming model in WebAssembly SIMD:
It's more expressive w.r.t. what many applications need to do.
The fallback is efficient provided the generated code orders instructions well. By contrast, without this instruction the WebAssembly source is forced to use separate broadcast instructions, which makes it essentially impossible for the generated code to be efficient.
See ARM benchmarks in this spreadsheet.
Row 30, NEON_64bit_GEMM_Float32_WithVectorDuplicatingScalar, is the float kernel that one can write without such instructions.
Row 31, NEON_64bit_GEMM_Float32_WithScalar, is the faster float kernel that one can write with such instructions.