Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
commit 18db5586d1a45f65fd98ea1a21d5fb87db5d2dbf Author: Lisa Ong <[email protected]> Date: Thu Mar 16 03:46:54 2023 +0000 Merged PR 3160: [security] bump onnx to 1.13.0 This resolves a high severity dependabot alert commit 07d16bf787bffa3be93dd7902a402e7e5e660596 Author: Mason Remy <[email protected]> Date: Thu Mar 16 02:17:51 2023 +0000 Merged PR 3157: Dynamic split dim tests Dynamic split dim tests commit 7c5b9a18adbba2ec10461118fb061365e34f5ed0 Author: Denny Sun <[email protected]> Date: Wed Mar 15 01:47:45 2023 +0000 Merged PR 3158: Do not unroll the profiling ops when vectorization enabled when vectorization is enabled, the ops in kernel get unrolled, for example, without this fix the timer added to inner kernel will have 8 copies, which is definitely wrong. commit df217f2e731c2609674da57662eaf1ed6b4a40b0 Author: Denny Sun <[email protected]> Date: Mon Mar 13 06:18:41 2023 +0000 Merged PR 3153: Fix the lowering issue of the profiling ops With this fix the kernel level profiling support can work end to end. Here is some example about how to use it: ``` @tile_nest.iteration_logic def _tile_logic(): EnterProfileRegion("pack_b_fn_outer") pack_b_fn(B, B_temp, j, k) ExitProfileRegion("pack_b_fn_outer") EnterProfileRegion("matmul_fn_outer") matmul_fn(A, B, C, B_temp, i, j, k) ExitProfileRegion("matmul_fn_outer") PrintProfileResults() ``` The timings printed out look like: ``` matmul_fn_outer 1 0.000100 ms pack_b_fn_outer 1 0.000400 ms matmul_fn_outer 2 0.000400 ms pack_b_fn_outer 2 0.001200 ms matmul_fn_outer 3 0.000600 ms pack_b_fn_outer 3 0.001700 ms matmul_fn_outer 4 0.000800 ms pack_b_fn_outer 4 0.002300 ms matmul_fn_outer 5 0.000900 ms pack_b_fn_outer 5 0.002700 ms matmul_fn_outer 6 0.001200 ms pack_b_fn_outer 6 0.003200 ms matmul_fn_outer 7 0.001500 ms pack_b_fn_outer 7 0.003700 ms matmul_fn_outer 8 0.001700 ms pack_b_fn_outer 8 0.004000 ms matmul_fn_outer 9 0.002000 ms pack_b_fn_outer 9 0.004500 ms matmul_fn_outer 10 0.002200 ms pack_b_fn_outer 10 0.004800 ms matmul_fn_outer 11 0.002400 ms pack_b_fn_outer 11 0.005300 ms matmul_fn_outer 12 0.002700 ms pack_b_fn_outer 12 0.006500 ms matmul_fn_outer 13 0.003100 ms pack_b_fn_outer 13 0.007400 ms matmul_fn_outer 14 0.003400 ms pack_b_fn_outer 14 0.007800 ms matmul_fn_outer 15 0.003700 ms pack_b_fn_outer 15 0.008300 ms matmul_fn_outer 16 0.004000 ms pack_b_fn_outer 16 0.008800 ms matmul_fn_outer 17 0.004400 ms pack_b_fn_outer 17 0.009199 ms matmul_fn_outer 18 0.004800 ms pack_b_fn_outer 18 0.009599 ms matmul_fn_outer 19 0.005100 ms pack_b_fn_outer 19 0.010099 ms matmul_fn_outer 20 0.005400 ms pack_b_fn_outer 20 0.010599 ms matmul_fn_outer 21 0.006000 ms pack_b_fn_outer 21 0.011299 ms matmul_fn_outer 22 0.006300 ms pack_b_fn_outer 22 0.011899 ms matmul_fn_outer 23 0.006500 ms pack_b_fn_outer 23 0.012299 ms matmul_fn_outer 24 0.006701 ms pack_b_fn_outer 24 0.012699 ms matmul_fn_outer 25 0.006901 ms pack_b_fn_outer 25 0.013099 ms matmul_fn_outer 26 0.007101 ms pack_b_fn_outer 26 0.013399 ms matmul_fn_outer 27 0.007300 ms pack_b_fn_outer 27 0.013799 ms matmul_fn_outer 28 0.007401 ms pack_b_fn_outer 28 0.014100 ms matmul_fn_outer 29 0.007601 ms pack_b_fn_outer 29 0.014600 ms matmul_fn_outer 30 0.007801 ms pack_b_fn_outer 30 0.015000 ms matmul_fn_outer 31 0.007901 ms pack_b_fn_outer 31 0.015399 ms matmul_fn_outer 32 0.008101 ms pack_b_fn_outer 32 0.015699 ms matmul_fn_outer 33 0.008301 ms pack_b_fn_outer 33 0.015999 ms matmul_fn_outer 34 0.008601 ms pack_b_fn_outer 34 0.016... commit 3572c2b081198e1631f2df208c07490c6d4b4bf5 Author: Lisa Ong <[email protected]> Date: Fri Mar 10 10:57:39 2023 +0000 Merged PR 3152: [nfc] [test] Skip fast_exp mlas tests on unsupported Aarch64 These tests generate `llvm.x86.avx.max.ps.256` which is not supported on non-intel processors like Apple M1 ``` %28 = load <8 x float>, <8 x float>* %27, align 4, !dbg !19 %29 = call <8 x float> @llvm.x86.avx.max.ps.256(<8 x float> %28, <8 x float> <float 0xC0561814A0000000, float 0xC0561814A0000000, float 0xC0561814A0000000, float 0xC0561814A0000000, float 0xC0561814A0000000, float 0xC0561814A0000000, float 0xC0561814A0000000, float 0xC0561814A0000000>), !dbg !20 %30 = call <8 x float> @llvm.fmuladd.v8f32(<8 x float> %29, <8 x float> <float 0x3FF7154760000000, float 0x3FF7154760000000, float 0x3FF7154760000000, float 0x3FF7154760000000, float 0x3FF7154760000000, float 0x3FF7154760000000, float 0x3FF7154760000000, float 0x3FF7154760000000>, <8 x float> <float 0x4168000000000000, float 0x4168000000000000, float 0x4168000000000000, float 0x4168000000000000, float 0x4168000000000000, float 0x4168000000000000, float 0x4168000000000000, float 0x4168000000000000>), !dbg !21 %31 = fsub <8 x float> %30, <float 0x4168000000000000, float 0x4168000000000000, float 0x4168000000000000, float 0x4168000000000000, float 0x4168000000000000, float 0x4168000000000000, float 0x4168000000000000, float 0x4168000000000000>, !dbg !22 ```
- Loading branch information