v1.2.25
What's Changed
-
Merged PR 3160: [security] bump onnx to 1.13.0. [Lisa Ong]
This resolves a high-severity Dependabot alert.
-
Merged PR 3157: Dynamic split dim tests. [Mason Remy]
-
Merged PR 3158: Do not unroll the profiling ops when vectorization enabled. [Denny Sun]
When vectorization is enabled, the ops in the kernel get unrolled. Without this fix, for example, a timer added to the inner kernel ends up with 8 copies, which is incorrect.
-
Merged PR 3153: Fix the lowering issue of the profiling ops. [Denny Sun]
With this fix, kernel-level profiling support works end to end. Here is an example of how to use it:
    @tile_nest.iteration_logic
    def _tile_logic():
        EnterProfileRegion("pack_b_fn_outer")
        pack_b_fn(B, B_temp, j, k)
        ExitProfileRegion("pack_b_fn_outer")

        EnterProfileRegion("matmul_fn_outer")
        matmul_fn(A, B, C, B_temp, i, j, k)
        ExitProfileRegion("matmul_fn_outer")

        PrintProfileResults()
The timings printed out look like:
    matmul_fn_outer 1 0.000100 ms
    pack_b_fn_outer 1 0.000400 ms
    matmul_fn_outer 2 0.000400 ms
    pack_b_fn_outer 2 0.001200 ms
    matmul_fn_outer 3 0.000600 ms
    pack_b_fn_outer 3 0.001700 ms
    matmul_fn_outer 4 0.000800 ms
    pack_b_fn_outer 4 0.002300 ms
    matmul_fn_outer 5 0.000900 ms
    pack_b_fn_outer 5 0.002700 ms
    matmul_fn_outer 6 0.001200 ms
    pack_b_fn_outer 6 0.003200 ms
    matmul_fn_outer 7 0.001500 ms
    pack_b_fn_outer 7 0.003700 ms
    matmul_fn_outer 8 0.001700 ms
    pack_b_fn_outer 8 0.004000 ms
    matmul_fn_outer 9 0.002000 ms
    pack_b_fn_outer 9 0.004500 ms
    matmul_fn_outer 10 0.002200 ms
    pack_b_fn_outer 10 0.004800 ms
    matmul_fn_outer 11 0.002400 ms
    pack_b_fn_outer 11 0.005300 ms
    matmul_fn_outer 12 0.002700 ms
    pack_b_fn_outer 12 0.006500 ms
    matmul_fn_outer 13 0.003100 ms
    pack_b_fn_outer 13 0.007400 ms
    matmul_fn_outer 14 0.003400 ms
    pack_b_fn_outer 14 0.007800 ms
    matmul_fn_outer 15 0.003700 ms
    pack_b_fn_outer 15 0.008300 ms
    matmul_fn_outer 16 0.004000 ms
    pack_b_fn_outer 16 0.008800 ms
    matmul_fn_outer 17 0.004400 ms
    pack_b_fn_outer 17 0.009199 ms
    matmul_fn_outer 18 0.004800 ms
    pack_b_fn_outer 18 0.009599 ms
    matmul_fn_outer 19 0.005100 ms
    pack_b_fn_outer 19 0.010099 ms
    matmul_fn_outer 20 0.005400 ms
    pack_b_fn_outer 20 0.010599 ms
    matmul_fn_outer 21 0.006000 ms
    pack_b_fn_outer 21 0.011299 ms
    matmul_fn_outer 22 0.006300 ms
    pack_b_fn_outer 22 0.011899 ms
    matmul_fn_outer 23 0.006500 ms
    pack_b_fn_outer 23 0.012299 ms
    matmul_fn_outer 24 0.006701 ms
    pack_b_fn_outer 24 0.012699 ms
    matmul_fn_outer 25 0.006901 ms
    pack_b_fn_outer 25 0.013099 ms
    matmul_fn_outer 26 0.007101 ms
    pack_b_fn_outer 26 0.013399 ms
    matmul_fn_outer 27 0.007300 ms
    pack_b_fn_outer 27 0.013799 ms
    matmul_fn_outer 28 0.007401 ms
    pack_b_fn_outer 28 0.014100 ms
    matmul_fn_outer 29 0.007601 ms
    pack_b_fn_outer 29 0.014600 ms
    matmul_fn_outer 30 0.007801 ms
    pack_b_fn_outer 30 0.015000 ms
    matmul_fn_outer 31 0.007901 ms
    pack_b_fn_outer 31 0.015399 ms
    matmul_fn_outer 32 0.008101 ms
    pack_b_fn_outer 32 0.015699 ms
    matmul_fn_outer 33 0.008301 ms
    pack_b_fn_outer 33 0.015999 ms
    matmul_fn_outer 34 0.008601 ms
    pack_b_fn_outer 34 0.016...
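The printed timings above can be post-processed with a few lines of Python. The following is a minimal sketch, not part of the PR: it assumes each record has the shape `<region> <count> <time> ms` (inferred from the sample output only), and the helper names `parse_profile_output`, `latest_per_region`, and `mean_ms_per_call` are hypothetical. The per-call average further assumes the reported time is a running total over `count` calls, which the monotonically increasing values suggest but the notes do not state.

```python
import re

# Assumed record shape, inferred from the sample output: "<region> <count> <time> ms".
_RECORD = re.compile(r"(\w+)\s+(\d+)\s+(\d+\.\d+)\s+ms")


def parse_profile_output(text):
    """Return (region, count, time_ms) tuples in the order they were printed.

    Incomplete records (for example a truncated final line) are skipped.
    """
    return [(name, int(count), float(ms)) for name, count, ms in _RECORD.findall(text)]


def latest_per_region(records):
    """Keep the last (count, time_ms) pair seen for each region."""
    latest = {}
    for name, count, ms in records:
        latest[name] = (count, ms)
    return latest


def mean_ms_per_call(latest):
    """Average time per call, assuming time_ms is a running total over `count` calls."""
    return {name: ms / count for name, (count, ms) in latest.items() if count > 0}
```

Feeding the captured output through `parse_profile_output` and then `latest_per_region` gives one summary row per profile region, which is easier to compare across runs than the raw stream.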
-
Merged PR 3152: [nfc] [test] Skip fast_exp mlas tests on unsupported Aarch64. [Lisa Ong]
These tests generate llvm.x86.avx.max.ps.256, which is not supported on non-x86 processors such as the Apple M1:
    %28 = load <8 x float>, <8 x float>* %27, align 4, !dbg !19
    %29 = call <8 x float> @llvm.x86.avx.max.ps.256(<8 x float> %28, <8 x float> <float 0xC0561814A0000000, float 0xC0561814A0000000, float 0xC0561814A0000000, float 0xC0561814A0000000, float 0xC0561814A0000000, float 0xC0561814A0000000, float 0xC0561814A0000000, float 0xC0561814A0000000>), !dbg !20
    %30 = call <8 x float> @llvm.fmuladd.v8f32(<8 x float> %29, <8 x float> <float 0x3FF7154760000000, float 0x3FF7154760000000, float 0x3FF7154760000000, float 0x3FF7154760000000, float 0x3FF7154760000000, float 0x3FF7154760000000, float 0x3FF7154760000000, float 0x3FF7154760000000>, <8 x float> <float 0x4168000000000000, float 0x4168000000000000, float 0x4168000000000000, float 0x4168000000000000, float 0x4168000000000000, float 0x4168000000000000, float 0x4168000000000000, float 0x4168000000000000>), !dbg !21
    %31 = fsub <8 x float> %30, <float 0x4168000000000000, float 0x4168000000000000, float 0x4168000000000000, float 0x4168000000000000, float 0x4168000000000000, float 0x4168000000000000, float 0x4168000000000000, float 0x4168000000000000>, !dbg !22
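One common way to skip x86-only tests on other hosts is a `unittest` skip decorator keyed on the host architecture. The sketch below is illustrative only and is not the code from this PR: the helper `is_x86_host`, the test class, and the test name are hypothetical, and the set of architecture strings is an assumption about what `platform.machine()` typically reports.

```python
import platform
import unittest

# Assumed spellings of x86 hosts as reported by platform.machine().
_X86_MACHINES = {"x86_64", "amd64", "i386", "i686", "x86"}


def is_x86_host(machine=None):
    """Return True if the (given or detected) machine string names an x86 host."""
    machine = (machine or platform.machine()).lower()
    return machine in _X86_MACHINES


class FastExpMlasTests(unittest.TestCase):  # hypothetical test class
    @unittest.skipUnless(is_x86_host(), "requires x86 AVX intrinsics")
    def test_fast_exp_mlas(self):
        # Placeholder body; the real test would build and run the kernel.
        self.assertTrue(is_x86_host())
```

On an Apple M1, `platform.machine()` usually reports `arm64`, so the decorated test is reported as skipped rather than failing during code generation.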
Full Changelog: v1.2.24...v1.2.25