v1.2.23
What's Changed
- Merged PR 3131: Set masked load/store inbounds flag to true. [Mason Remy]

  The mask we generate, as well as the rest of our infrastructure, prevents out-of-bounds accesses when used properly. Therefore, for performance reasons, we don't want MLIR to generate runtime bounds checking.
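  Why the mask makes runtime bounds checks redundant can be sketched in plain Python (NumPy used for illustration; `masked_load` is a hypothetical scalar model, not part of the codebase):

  ```python
  import numpy as np

  def masked_load(buf, mask):
      # Scalar model of a masked vector load: a lane is only dereferenced
      # when its mask bit is set, so a mask derived from the valid length
      # can never read out of bounds -- no separate runtime check needed.
      out = np.zeros(len(mask), dtype=buf.dtype)
      for lane in range(len(mask)):
          if mask[lane]:
              out[lane] = buf[lane]  # only in-bounds lanes are touched
      return out

  data = np.arange(5, dtype=np.int32)   # only 5 valid elements
  mask = np.arange(8) < len(data)       # 8-lane vector, 5 active lanes
  print(masked_load(data, mask))        # inactive lanes stay zero
  ```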
- Merged PR 3130: Recognize and simplify always true EQ and NE CmpOps. [Mason Remy]

  These would already get simplified after converting to the builtin dialects, but this change makes the simplification happen earlier in the lowering.
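  The idea behind the rewrite can be sketched in Python (a simplified model of the fold, not the actual MLIR pattern; `simplify_cmp` is hypothetical):

  ```python
  def simplify_cmp(pred, lhs, rhs):
      # Fold an integer comparison whose result is statically known: when
      # both operands are the same value, "eq" is always true and "ne" is
      # always false, so the CmpOp can be replaced by a constant before
      # later lowering stages run.
      if lhs is rhs:
          if pred == "eq":
              return True
          if pred == "ne":
              return False
      return None  # result not statically known; keep the CmpOp

  x = object()  # stands in for an SSA value
  print(simplify_cmp("eq", x, x))         # always true -> fold to constant
  print(simplify_cmp("ne", x, x))         # always false -> fold to constant
  print(simplify_cmp("eq", x, object()))  # unknown -> cannot fold
  ```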
- Merged PR 3129: Optimize 1-row horizontal i16->i32 sum reduction. [Mason Remy]
- Merged PR 3118: Vectorize accumulation of results of two masked load ops. [JUBI TANEJA]

  This PR vectorizes a pattern that occurs in MMIF where there are two conditional loads, followed by an accumulation operation, and a conditional store. It vectorizes the following DSL:

  ```python
  N_input = 8
  N_output = 5
  Input = Array(role=Role.INPUT, element_type=ScalarType.int32, shape=(N_input,))
  Output = Array(role=Role.INPUT_OUTPUT, element_type=ScalarType.int32, shape=(N_output,))
  nest = Nest(shape=(N_input,))
  i, = nest.get_indices()

  @nest.iteration_logic
  def _nest():
      def store_value():
          Output[i] += Input[i]

      _If(i < N_output, store_value)
  ```

  This produces the following assembly. We are looking for `vpmaskmovd` instructions, which correspond to vector.transfer_read/vector.transfer_write ops in MLIR:

  ```
  0000000000000030 <test_vectorized_masked_accumulate_3e5de44f3dcca64e>:
    30: c5 fd 6f 05 00 00 00  vmovdqa 0x0(%rip),%ymm0  # 38 <test_vectorized_masked_accumulate_3e5de44f3dcca64e+0x8>
    37: 00
    38: c4 e2 7d 8c 0e        vpmaskmovd (%rsi),%ymm0,%ymm1
    3d: c4 e2 7d 8c 17        vpmaskmovd (%rdi),%ymm0,%ymm2
    42: c5 ed fe c9           vpaddd %ymm1,%ymm2,%ymm1
    46: c4 e2 7d 8e 0e        vpmaskmovd %ymm1,%ymm0,(%rsi)
    4b: c5 f8 77              vzeroupper
    4e: c3                    retq
  ```
- Merged PR 3126: [test] Adds more tests for vectorized transpose. [Kern Handa]
- Merged PR 3121: [nfc] Separate bounds checking into separate pass file. [Mason Remy]

  This removes the bounds checking code from ExecutionPlanToAffineLoweringPass and creates a separate pass file for it. There is no change in when and where the checking occurs (currently it only happens for caching-generated loads and stores). In a future change we will further separate the pass, run it at a different phase of the lowering, and plumb controls for enabling/disabling it through to the DSL.
- Merged PR 3122: Fix reinterpret_cast output memref shape. [Mason Remy]
- Merged PR 3115: Normalize AffineForOps to have unit stride and begin at 0. [Mason Remy]
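  A loop with arbitrary begin and stride is equivalent to a unit-stride loop starting at 0 via an affine remapping of the induction variable; a minimal Python sketch of that normalization (assuming a positive step):

  ```python
  def normalized(begin, end, step):
      # Rewrite "for i in [begin, end) step 'step'" as a unit-stride loop
      # from 0: run trip_count iterations j = 0, 1, ... and recover the
      # original induction variable as i = begin + j * step.
      trip_count = (end - begin + step - 1) // step  # ceil division, step > 0
      for j in range(trip_count):
          yield begin + j * step

  # The normalized form visits exactly the same indices as the original loop.
  print(list(normalized(3, 17, 4)))  # [3, 7, 11, 15]
  ```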
- Merged PR 3117: Vectorize horizontal multi-dim sum reductions. [Mason Remy]

  Recognizes and vectorizes these sum reductions:
  - 4x16xi16 -> 4x1xi32
  - 4x8xi32 -> 4x1xi32
  - 4x8xf32 -> 4x1xf32
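  The shapes above describe per-row (horizontal) sums, with a widened i32 accumulator for the i16 case; in NumPy terms (illustrative only, not the generated code):

  ```python
  import numpy as np

  # A 4x16 i16 block reduced along each row into a 4x1 i32 column,
  # matching the 4x16xi16 -> 4x1xi32 case above. Accumulating in i32
  # matters because a row sum can exceed the i16 range.
  a = np.full((4, 16), 3000, dtype=np.int16)
  row_sums = a.sum(axis=1, dtype=np.int32, keepdims=True)  # shape (4, 1)
  print(row_sums.ravel())  # each row sums to 48000, which overflows i16
  ```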
- Merged PR 3099: Adds pattern rewriting for AVX2 vectorized transpose. [Kern Handa]
Full Changelog: v1.2.22...v1.2.23