Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
commit 11e8fdae41e596d6102e46c37a22a26c94d7fe85
Author: Mason Remy <[email protected]>
Date:   Thu Mar 2 05:53:10 2023 +0000

Merged PR 3131: Set masked load/store inbounds flag to true

Set masked load/store inbounds flag to true

The mask we generate, as well as the rest of our infrastructure, will prevent out-of-bounds accesses when used properly. Therefore for performance reasons we don't want MLIR to generate runtime bounds checking

commit 14a04925721ed575befc65e93e4670e27e4d1063
Author: Mason Remy <[email protected]>
Date:   Thu Mar 2 00:28:38 2023 +0000

Merged PR 3130: Recognize and simplify always true EQ and NE CmpOps

Recognize and simplify always true EQ and NE CmpOps

These would already get simplified after converting to the builtin dialects, but this makes them happen earlier in the lowering

commit 91b76428c61a52d454ac5ae8fa6485edd9bdfbe5
Author: Mason Remy <[email protected]>
Date:   Wed Mar 1 23:46:29 2023 +0000

Merged PR 3129: Optimize 1-row horizontal i16->i32 sum reduction

Optimize 1-row horizontal i16->i32 sum reduction

commit be987bcf641c09dd43d959cc7e8a1b37d33ba591
Author: JUBI TANEJA <[email protected]>
Date:   Wed Mar 1 19:59:34 2023 +0000

Merged PR 3118: vectorize accumulation of results of two masked load ops

This PR vectorizes a pattern that occurs in MMIF where there are two conditional loads, followed by an accumulation operation, and a conditional store.

On vectorizing the following DSL:

```
N_input = 8
N_output = 5
Input = Array(role=Role.INPUT, element_type=ScalarType.int32, shape=(N_input, ))
Output = Array(role=Role.INPUT_OUTPUT, element_type=ScalarType.int32, shape=(N_output, ))
nest = Nest(shape=(N_input, ))
i, = nest.get_indices()

@nest.iteration_logic
def _nest():
    def store_value():
        Output[i] += Input[i]
    _If(i < N_output, store_value)
```

It produces the following assembly. We are looking for `vpmaskmovd` instructions that correspond to vector.transfer_read/vector.transfer_write ops in MLIR.

```
0000000000000030 <test_vectorized_masked_accumulate_3e5de44f3dcca64e>:
  30:   c5 fd 6f 05 00 00 00    vmovdqa 0x0(%rip),%ymm0        # 38 <test_vectorized_masked_accumulate_3e5de44f3dcca64e+0x8>
  37:   00
  38:   c4 e2 7d 8c 0e          vpmaskmovd (%rsi),%ymm0,%ymm1
  3d:   c4 e2 7d 8c 17          vpmaskmovd (%rdi),%ymm0,%ymm2
  42:   c5 ed fe c9             vpaddd %ymm1,%ymm2,%ymm1
  46:   c4 e2 7d 8e 0e          vpmaskmovd %ymm1,%ymm0,(%rsi)
  4b:   c5 f8 77                vzeroupper
  4e:   c3                      retq
```

commit 69b87522136cae60b0f5b4d62919a2ebd5577933
Author: Kern Handa <[email protected]>
Date:   Wed Mar 1 17:47:14 2023 +0000

Merged PR 3126: [test] Adds more tests for vectorized transpose

[test] Adds more tests for vectorized transpose

commit c4d81701faf3351218cd69726c487f642e4bfca0
Author: Mason Remy <[email protected]>
Date:   Wed Mar 1 06:48:35 2023 +0000

Merged PR 3121: [nfc] Separate bounds checking into separate pass file

[nfc] Separate bounds checking into separate pass file

This removes the bounds checking code from ExecutionPlanToAffineLoweringPass and creates a separate pass file for it. There is no change in when and where the checking occurs (currently it only happens for caching-generated loads and stores).

In a future change we will further separate the pass and run it at a different phase of the lowering and plumb controls for enabling/disabling it to the DSL

commit b221544937f8776d48a8f9daddf601378534705b
Author: Mason Remy <[email protected]>
Date:   Wed Mar 1 01:18:59 2023 +0000

Merged PR 3122: Fix reinterpret_cast output memref shape

Fix reinterpret_cast output memref shape

commit eb3582ba07cb4118f73bb630589f07de27ba9c50
Author: Mason Remy <[email protected]>
Date:   Fri Feb 24 23:51:30 2023 +0000

Merged PR 3115: Normalize AffineForOps to have unit stride and begin at 0

Normalize AffineForOps to have unit stride and begin at 0

commit 3ec2bd7f5353a4119294095eb5084a1e7a298051
Author: Mason Remy <[email protected]>
Date:   Fri Feb 24 22:26:13 2023 +0000

Merged PR 3117: Vectorize horizontal multi-dim sum reductions

Vectorize horizontal multi-dim sum reductions

Recognizes and vectorizes these sum reductions:
4x16xi16 -> 4x1xi32
4x8xi32 -> 4x1xi32
4x8xf32 -> 4x1xf32

commit 6f46df5ba99eeb237dcbbdda28a0975964af1186
Author: Kern Handa <[email protected]>
Date:   Fri Feb 24 11:13:45 2023 +0000

Merged PR 3099: Adds pattern rewriting for AVX2 vectorized transpose
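
For readers unfamiliar with the masked pattern described in PR 3118, the following plain-Python sketch models what the two `vpmaskmovd` loads, the `vpaddd`, and the `vpmaskmovd` store compute together. It is an illustration only, not code from the repository: the helper names (`scalar_reference`, `masked_load`, `masked_store`, `masked_accumulate`) are hypothetical, the mask is assumed to be `lane < N_output` across eight 32-bit lanes, and int32 wraparound is ignored.

```python
N_INPUT = 8
N_OUTPUT = 5
LANES = 8  # assumption: one 256-bit ymm register viewed as eight 32-bit lanes


def scalar_reference(inp, out):
    """Scalar form of the DSL loop: Output[i] += Input[i] when i < N_output."""
    result = list(out)
    for i in range(N_INPUT):
        if i < N_OUTPUT:
            result[i] += inp[i]
    return result


def masked_load(memory, mask):
    """Model of a masked load: disabled lanes yield 0 and memory is not read."""
    return [memory[lane] if enabled else 0 for lane, enabled in enumerate(mask)]


def masked_store(memory, mask, values):
    """Model of a masked store: only enabled lanes are written back."""
    for lane, enabled in enumerate(mask):
        if enabled:
            memory[lane] = values[lane]


def masked_accumulate(inp, out):
    """Two masked loads, one lane-wise add, one masked store."""
    mask = [lane < N_OUTPUT for lane in range(LANES)]  # hypothetical mask
    loaded_out = masked_load(out, mask)
    loaded_in = masked_load(inp, mask)
    summed = [a + b for a, b in zip(loaded_out, loaded_in)]
    result = list(out)
    masked_store(result, mask, summed)
    return result


if __name__ == "__main__":
    inp = list(range(1, N_INPUT + 1))
    out = [10, 20, 30, 40, 50]
    print(masked_accumulate(inp, out))
```

Because disabled lanes are never indexed, the five-element `Output` buffer is not accessed past its end even though the vector is eight lanes wide, which is the property PR 3131 relies on when it marks the masked accesses as in-bounds.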