v1.2.25

@lisaong released this 16 Mar 06:50

What's Changed

  • Merged PR 3160: [security] bump onnx to 1.13.0. [Lisa Ong]

    This resolves a high-severity Dependabot alert.

  • Merged PR 3157: Dynamic split dim tests. [Mason Remy]

  • Merged PR 3158: Do not unroll the profiling ops when vectorization
    enabled. [Denny Sun]

    When vectorization is enabled, the ops in the kernel get unrolled. Without this fix, for example, the timer added to the inner kernel would be duplicated into 8 copies, which is incorrect.

  • Merged PR 3153: Fix the lowering issue of the profiling ops. [Denny
    Sun]

    With this fix, kernel-level profiling works end to end. Here is an example of how to use it:

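            # tile_nest, pack_b_fn, matmul_fn, A, B, C, B_temp, i, j and k
            # are defined earlier in the schedule (not shown here)
            @tile_nest.iteration_logic
            def _tile_logic():
                # Time the packing of B into B_temp
                EnterProfileRegion("pack_b_fn_outer")
                pack_b_fn(B, B_temp, j, k)
                ExitProfileRegion("pack_b_fn_outer")

                # Time the matmul kernel that reads the packed B_temp
                EnterProfileRegion("matmul_fn_outer")
                matmul_fn(A, B, C, B_temp, i, j, k)
                ExitProfileRegion("matmul_fn_outer")

                # Print the timings collected so far for each profile region
                PrintProfileResults()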
    

    The printed timings look like this:

    matmul_fn_outer 1       0.000100 ms
    pack_b_fn_outer 1       0.000400 ms
    matmul_fn_outer 2       0.000400 ms
    pack_b_fn_outer 2       0.001200 ms
    matmul_fn_outer 3       0.000600 ms
    pack_b_fn_outer 3       0.001700 ms
    matmul_fn_outer 4       0.000800 ms
    pack_b_fn_outer 4       0.002300 ms
    matmul_fn_outer 5       0.000900 ms
    pack_b_fn_outer 5       0.002700 ms
    matmul_fn_outer 6       0.001200 ms
    pack_b_fn_outer 6       0.003200 ms
    matmul_fn_outer 7       0.001500 ms
    pack_b_fn_outer 7       0.003700 ms
    matmul_fn_outer 8       0.001700 ms
    pack_b_fn_outer 8       0.004000 ms
    matmul_fn_outer 9       0.002000 ms
    pack_b_fn_outer 9       0.004500 ms
    matmul_fn_outer 10      0.002200 ms
    pack_b_fn_outer 10      0.004800 ms
    matmul_fn_outer 11      0.002400 ms
    pack_b_fn_outer 11      0.005300 ms
    matmul_fn_outer 12      0.002700 ms
    pack_b_fn_outer 12      0.006500 ms
    matmul_fn_outer 13      0.003100 ms
    pack_b_fn_outer 13      0.007400 ms
    matmul_fn_outer 14      0.003400 ms
    pack_b_fn_outer 14      0.007800 ms
    matmul_fn_outer 15      0.003700 ms
    pack_b_fn_outer 15      0.008300 ms
    matmul_fn_outer 16      0.004000 ms
    pack_b_fn_outer 16      0.008800 ms
    matmul_fn_outer 17      0.004400 ms
    pack_b_fn_outer 17      0.009199 ms
    matmul_fn_outer 18      0.004800 ms
    pack_b_fn_outer 18      0.009599 ms
    matmul_fn_outer 19      0.005100 ms
    pack_b_fn_outer 19      0.010099 ms
    matmul_fn_outer 20      0.005400 ms
    pack_b_fn_outer 20      0.010599 ms
    matmul_fn_outer 21      0.006000 ms
    pack_b_fn_outer 21      0.011299 ms
    matmul_fn_outer 22      0.006300 ms
    pack_b_fn_outer 22      0.011899 ms
    matmul_fn_outer 23      0.006500 ms
    pack_b_fn_outer 23      0.012299 ms
    matmul_fn_outer 24      0.006701 ms
    pack_b_fn_outer 24      0.012699 ms
    matmul_fn_outer 25      0.006901 ms
    pack_b_fn_outer 25      0.013099 ms
    matmul_fn_outer 26      0.007101 ms
    pack_b_fn_outer 26      0.013399 ms
    matmul_fn_outer 27      0.007300 ms
    pack_b_fn_outer 27      0.013799 ms
    matmul_fn_outer 28      0.007401 ms
    pack_b_fn_outer 28      0.014100 ms
    matmul_fn_outer 29      0.007601 ms
    pack_b_fn_outer 29      0.014600 ms
    matmul_fn_outer 30      0.007801 ms
    pack_b_fn_outer 30      0.015000 ms
    matmul_fn_outer 31      0.007901 ms
    pack_b_fn_outer 31      0.015399 ms
    matmul_fn_outer 32      0.008101 ms
    pack_b_fn_outer 32      0.015699 ms
    matmul_fn_outer 33      0.008301 ms
    pack_b_fn_outer 33      0.015999 ms
    matmul_fn_outer 34      0.008601 ms
    pack_b_fn_outer 34      0.016...
    
  • Merged PR 3152: [nfc] [test] Skip fast_exp mlas tests on unsupported
    Aarch64. [Lisa Ong]

    These tests generate the llvm.x86.avx.max.ps.256 intrinsic, which is not supported on non-x86 processors such as the Apple M1:

      %28 = load <8 x float>, <8 x float>* %27, align 4, !dbg !19
      %29 = call <8 x float> @llvm.x86.avx.max.ps.256(<8 x float> %28, <8 x float> <float 0xC0561814A0000000, float 0xC0561814A0000000, float 0xC0561814A0000000, float 0xC0561814A0000000, float 0xC0561814A0000000, float 0xC0561814A0000000, float 0xC0561814A0000000, float 0xC0561814A0000000>), !dbg !20
      %30 = call <8 x float> @llvm.fmuladd.v8f32(<8 x float> %29, <8 x float> <float 0x3FF7154760000000, float 0x3FF7154760000000, float 0x3FF7154760000000, float 0x3FF7154760000000, float 0x3FF7154760000000, float 0x3FF7154760000000, float 0x3FF7154760000000, float 0x3FF7154760000000>, <8 x float> <float 0x4168000000000000, float 0x4168000000000000, float 0x4168000000000000, float 0x4168000000000000, float 0x4168000000000000, float 0x4168000000000000, float 0x4168000000000000, float 0x4168000000000000>), !dbg !21
      %31 = fsub <8 x float> %30, <float 0x4168000000000000, float 0x4168000000000000, float 0x4168000000000000, float 0x4168000000000000, float 0x4168000000000000, float 0x4168000000000000, float 0x4168000000000000, float 0x4168000000000000>, !dbg !22
    
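The unrolling problem fixed in PR 3158 can be illustrated with a toy model. This is a minimal Python sketch, not Accera's lowering code: the Op class, the is_profiling flag and the unroll helper are hypothetical, and the sketch only assumes, as the PR describes, that unrolling copies every op in the kernel body unless profiling ops are kept out of the unrolled region.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Op:
    """Toy stand-in for an IR operation (hypothetical, not Accera's IR)."""

    name: str
    is_profiling: bool = False


def unroll(body: List[Op], factor: int, skip_profiling: bool = True) -> List[Op]:
    """Replicate a loop body `factor` times, as an unroll pass would."""
    if not skip_profiling:
        # Naive unrolling: every op, timers included, is copied `factor` times.
        return [op for _ in range(factor) for op in body]

    # Keep leading and trailing profiling ops outside the unrolled region so
    # the timer wraps the whole unrolled kernel exactly once. Profiling ops
    # sitting between compute ops are still replicated in this simple sketch.
    compute_idx = [i for i, op in enumerate(body) if not op.is_profiling]
    if not compute_idx:
        return list(body)  # nothing to unroll
    first, last = compute_idx[0], compute_idx[-1]
    prologue, kernel, epilogue = body[:first], body[first:last + 1], body[last + 1:]
    return prologue + kernel * factor + epilogue


# Usage: an inner kernel wrapped by a timer, unrolled by a factor of 8.
body = [
    Op("EnterProfileRegion", is_profiling=True),
    Op("fma"),  # hypothetical compute op
    Op("ExitProfileRegion", is_profiling=True),
]
naive = unroll(body, 8, skip_profiling=False)
fixed = unroll(body, 8)
print(sum(op.is_profiling for op in naive))  # 16: an enter/exit pair per copy
print(sum(op.is_profiling for op in fixed))  # 2: a single enter/exit pair
```

With naive unrolling the timer is replicated alongside the kernel, which matches the "8 copies" symptom; hoisting the profiling ops out of the unrolled region leaves one timer around all 8 copies of the compute op.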
    
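The profiling calls from PR 3153 can be modeled on the host to show what each one is expected to do. The sketch below is a hypothetical pure-Python analogue, not Accera's implementation: the ProfileRegions class and its enter/exit/results methods are invented for illustration, and the output layout (region name, number of completed entries, cumulative time in ms, sorted by name) is inferred from the sample output rather than documented.

```python
import time
from typing import Callable, Dict


class ProfileRegions:
    """Hypothetical analogue of EnterProfileRegion / ExitProfileRegion / PrintProfileResults."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter):
        self._clock = clock                 # returns a timestamp in seconds
        self._start: Dict[str, float] = {}  # open regions: name -> start time
        self._count: Dict[str, int] = {}    # name -> number of completed entries
        self._total: Dict[str, float] = {}  # name -> cumulative seconds

    def enter(self, name: str) -> None:
        if name in self._start:
            raise RuntimeError(f"region {name!r} entered again before exit")
        self._start[name] = self._clock()

    def exit(self, name: str) -> None:
        start = self._start.pop(name)  # raises KeyError if never entered
        self._count[name] = self._count.get(name, 0) + 1
        self._total[name] = self._total.get(name, 0.0) + (self._clock() - start)

    def results(self) -> str:
        # One line per region, sorted by name: name, count, cumulative ms.
        return "\n".join(
            f"{name} {self._count[name]}\t{self._total[name] * 1e3:f} ms"
            for name in sorted(self._count)
        )


# Usage with a fake clock so the numbers are deterministic.
ticks = iter([0.0, 0.002, 0.002, 0.003])
prof = ProfileRegions(clock=lambda: next(ticks))
prof.enter("pack_b_fn_outer")
prof.exit("pack_b_fn_outer")    # 2 ms
prof.enter("matmul_fn_outer")
prof.exit("matmul_fn_outer")    # 1 ms
print(prof.results())
```

Injecting the clock keeps the example deterministic; by default the sketch falls back to time.perf_counter. Because the totals are cumulative, calling results() on every iteration (as the kernel example does with PrintProfileResults) would show monotonically growing times per region.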

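The skip described in PR 3152 can be expressed as a platform check on the test. This is a sketch using the standard unittest module; the test class and method names, and the use of platform.machine() as the check, are assumptions rather than the code from the PR.

```python
import platform
import unittest

# platform.machine() reports "arm64" on macOS (e.g. Apple M1), "aarch64" on
# Linux and "ARM64" on Windows for 64-bit Arm hosts.
_ARM64_MACHINES = {"arm64", "aarch64"}


def is_arm64(machine: str = "") -> bool:
    """True if `machine` (default: the current host) names a 64-bit Arm CPU."""
    return (machine or platform.machine()).lower() in _ARM64_MACHINES


class FastExpMlasTests(unittest.TestCase):  # hypothetical test class
    @unittest.skipIf(
        is_arm64(),
        "fast_exp generates llvm.x86.avx.max.ps.256, which Aarch64 cannot run",
    )
    def test_fast_exp_mlas(self) -> None:
        # Placeholder: the real test would build and run the fast_exp kernel.
        pass
```

On an Apple M1, python -m unittest would then report the test as skipped instead of running it, while x86-64 hosts run it as before.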
Full Changelog: v1.2.24...v1.2.25