v1.2.12

@lisaong lisaong released this 21 Nov 03:02
· 20 commits to main since this release

What's Changed

  • Merged PR 2953: Workaround debug mode failures with dimension argument
    ordering. [Lisa Ong]

    • Order dimension arguments after Array args to avoid the following lowering issue in Debug mode (until Debug mode is fixed):
    test_all_dynamic_sizes_static_unroll_matmul_llvm.mlir:236:28: error: use of value '%7' expects different type than prior uses: 'i64' vs '!llvm.struct<(ptr<f32>, ptr<f32>, i64, array<2 x i64>, array<2 x i64>)>'
        %42 = llvm.insertvalue %7, %41[3, 0] : !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<2 x i64>, array<2 x i64>)>
                               ^
    /Users/lisaong/work/staging/Accera/build/lib.macosx-11.1-arm64-3.10/test_acccgen/test_all_dynamic_sizes_static_unroll_matmul/_tmp/test_all_dynamic_sizes_static_unroll_matmul/test_all_dynamic_sizes_static_unroll_matmul_llvm.mlir:201:5: note: prior use here
        %7 = llvm.insertvalue %arg6, %6[4, 1] : !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<2 x i64>, array<2 x i64>)>
        ^
    
    • Enable DEV_MODE tests in one CI pipeline so that we can catch these in the future
  • Merged PR 2950: [Release] Rev docs to v1.2.12. [Lisa Ong]

    In preparation for the 1.2.12 release at the end of the week.

  • Merged PR 2946: Fix hierarchical partial fusing. [Mason Remy]

    Fix hierarchical partial fusing

    Index attributes in fragment predicate ops weren't getting updated as
    part of fusion mapping old indices to new fused indices. This fix is a
    quick change to recursively walk predicates and update their index
    attributes manually.
    In the future we could use SymbolicIndexOps and rely on
    BlockAndValueMapping replacements in clone; however, this would also
    require that we don't create as many duplicate SymbolicIndexOps for the
    same Index.
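
    The recursive walk described above can be sketched in a few lines.
    This is only an illustration: the nested-tuple predicate encoding and
    the name remap_indices are hypothetical and do not reflect Accera's
    actual predicate ops or attributes.

```python
def remap_indices(pred, index_map):
    """Recursively rewrite index names in a nested predicate (hypothetical encoding).

    A predicate is a tuple: ("and", p, q, ...), ("or", p, q, ...), ("not", p),
    or a leaf such as ("first", index_name) that refers to a single index.
    """
    kind = pred[0]
    if kind in ("and", "or"):
        # Compound predicate: remap every child predicate.
        return (kind,) + tuple(remap_indices(p, index_map) for p in pred[1:])
    if kind == "not":
        return ("not", remap_indices(pred[1], index_map))
    # Leaf predicate: swap the old index for its fused replacement, if any.
    return (kind, index_map.get(pred[1], pred[1])) + tuple(pred[2:])


# Old indices i0 and j0 were both fused into the new index f.
fused = remap_indices(("and", ("first", "i0"), ("not", ("last", "j0"))),
                      {"i0": "f", "j0": "f"})
```

    Indices that are absent from the map are left unchanged, so the walk
    is safe to apply to predicates that mix fused and unfused indices.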

  • Merged PR 2942: Hold onto intermediate split indices when fusing.
    [Mason Remy]

    Hold onto intermediate split indices when fusing

    When we split a loop multiple times, the outer index references the
    inner intermediate split indices in affine expressions, even if those
    indices get further split and are no longer loop indices. We have been
    discarding them because they aren't loop indices or dimension indices,
    but they wound up getting re-added to the transformed domain by
    serialization and this led to fusion bugs.
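
    A small sketch of why the intermediate indices matter, assuming a
    simplified model in which each split index is the sum of the two
    indices it was split into. The helper names and index names are
    hypothetical and are not Accera's internal representation.

```python
def split(defs, index, parts):
    """Record that `index` is now expressed as the sum of the indices in `parts`."""
    defs[index] = parts


def evaluate(defs, index, loop_values):
    """Evaluate `index` from loop index values, walking through intermediates."""
    if index in loop_values:
        return loop_values[index]
    return sum(evaluate(defs, part, loop_values) for part in defs[index])


defs = {}
split(defs, "i", ("i_outer", "i_mid"))            # first split
split(defs, "i_mid", ("i_mid_outer", "i_inner"))  # second split: i_mid stops being a loop index

# Only these three remain as loop indices after both splits.
loop_values = {"i_outer": 16, "i_mid_outer": 4, "i_inner": 1}
value_of_i = evaluate(defs, "i", loop_values)
```

    In this model, dropping the entry for i_mid would make i impossible
    to evaluate, which is the kind of dangling reference that holding
    onto intermediate indices avoids.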

  • Merged PR 2834: match and rewrite a pattern to vectorize int16 matmul.
    [JUBI TANEJA]

    This rewrite rule matches the jj and kk loops in int16 matmul, where
    the outer loop jj {0..8} is followed by an inner loop kk {0..2}. It
    vectorizes the jj and kk loops and replaces each affine op with a
    vectorized op. At the end, it generates a vpmaddwd instruction for
    MatMul.
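
    For readers unfamiliar with vpmaddwd, the pure-Python model below
    shows what the instruction computes: adjacent pairs of signed 16-bit
    values are multiplied and each pair of products is summed into one
    32-bit lane. The function names and the way the 8x2 tile is laid out
    are assumptions made for illustration, not the rewrite pattern itself.

```python
def wrap_int32(x):
    """Wrap an integer to signed 32-bit two's complement."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def pmaddwd(a, b):
    """Model of vpmaddwd: multiply int16 pairs, add adjacent products per int32 lane."""
    assert len(a) == len(b) and len(a) % 2 == 0
    return [wrap_int32(a[2 * i] * b[2 * i] + a[2 * i + 1] * b[2 * i + 1])
            for i in range(len(a) // 2)]


def matmul_tile(a_row, b_tile):
    """Compute C[jj] = sum over kk of A[kk] * B[kk][jj] for jj in 0..8, kk in 0..2."""
    # Lay the operands out so the two kk values for each jj sit side by side.
    a_vec = [a_row[kk] for jj in range(8) for kk in range(2)]
    b_vec = [b_tile[kk][jj] for jj in range(8) for kk in range(2)]
    return pmaddwd(a_vec, b_vec)


a_row = [2, 3]
b_tile = [[1, 2, 3, 4, 5, 6, 7, 8],
          [10, 20, 30, 40, 50, 60, 70, 80]]
c_tile = matmul_tile(a_row, b_tile)
```

    Each of the 8 output lanes consumes exactly the 2 products of one kk
    loop, which is why a jj extent of 8 and a kk extent of 2 map onto a
    single multiply-add over 16 int16 values.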

  • Merged PR 2918: Support vectorization and static size caching for
    split dynamic range. [Mason Remy]

    Support vectorization and static size caching for split dynamic range
    loops

  • Merged PR 2914: Support static loop splits of dynamic sized ranges.
    [Mason Remy]

    Support static loop splits of dynamic sized ranges

    This change creates a specialization of the AffineConstraintsHelper that
    works with Loopnest concepts and uses it in LoopNestBuilder to update
    loop split generation.
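
    The general idea of a static split over a dynamically sized range
    can be sketched as follows: the extent n is known only at run time,
    the split size is a compile-time constant, and the range is covered
    by full blocks plus a boundary loop. The helper name
    split_dynamic_range is hypothetical; this is not how LoopNestBuilder
    is implemented.

```python
def split_dynamic_range(n, split_size):
    """Visit range(n) as full blocks of `split_size` followed by a remainder loop."""
    visited = []
    main_end = (n // split_size) * split_size  # largest multiple of split_size <= n
    # Main region: the inner trip count is the static split_size,
    # so a compiler could unroll or vectorize it.
    for i_outer in range(0, main_end, split_size):
        for i_inner in range(split_size):
            visited.append(i_outer + i_inner)
    # Boundary region: fewer than split_size iterations, dependent on n.
    for i in range(main_end, n):
        visited.append(i)
    return visited
```

    For example, with n = 10 and a split size of 4, the main region
    covers indices 0 through 7 in two full blocks and the boundary loop
    covers 8 and 9.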

  • Merged PR 2911: Support dynamic ranges in ScheduledLoopOp. [Mason
    Remy]

    Support dynamic ranges in ScheduledLoopOp

  • Merged PR 2907: Implement initial affine constraint helper for dynamic
    size loop. [Mason Remy]

    Implement initial affine constraint helper for dynamic size loop
    handling

    Implements a wrapper around mlir::FlatAffineValueConstraints that
    enables static-sized splitting of dynamic loop ranges, along with a
    set of low-level tests that use it.

  • Merged PR 2935: Remove thread coarsening factor > 4 from GPU
    benchmarks. [Captain Jack Sparrow]

    Remove thread coarsening factor > 4 from GPU benchmarks

  • Merged PR 2932: Upgrade to CUDA 11.8. [Captain Jack Sparrow]

    Upgrade to CUDA 11.8

  • Merged PR 2931: Update to ROCm 5.3. [Captain Jack Sparrow]

    Update to ROCm 5.3

  • Merged PR 2926: Plumb parameter usages to emitted HAT files. [Lisa
    Ong]

  • Merged PR 2927: Reduce benchmark configs using thread coarsening.
    [Captain Jack Sparrow]

    Reduce benchmark configs using thread coarsening

  • Merged PR 2925: Add optional optimization hint for number of thread
    blocks per SM. [Captain Jack Sparrow]

    Add optional optimization hint for number of thread blocks per SM

    Related work items: #3736