v1.2.13

@lisaong lisaong released this 14 Dec 10:10
· 19 commits to main since this release
  • Merged PR 2987: Add support for max/min/round ops and vectorizing
    those ops. [Mason Remy]

    Add support for max/min/round ops and vectorizing those ops

  • Merged PR 2963: Control TEMP array allocation location. [Mason Remy]

    Control TEMP array allocation location

  • Merged PR 2962: Expand vpmaddwd matching and add intrinsic call.
    [Mason Remy]

    Expand vpmaddwd matching and add intrinsic call

    Matches more vpmaddwd cases and creates a pathway to invoking the LLVM
    intrinsic directly.
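    For readers unfamiliar with the instruction, the following is a minimal scalar sketch of what a vpmaddwd-style operation computes: adjacent pairs of signed 16-bit products are summed into one signed 32-bit lane. The function name `madd_i16_pairs` is hypothetical and is not part of Accera; it only illustrates the arithmetic the matcher targets.

```python
def madd_i16_pairs(a, b):
    """Scalar reference for a vpmaddwd-style multiply-add (hypothetical helper).

    a and b hold signed 16-bit values. Each output lane is
    a[2k] * b[2k] + a[2k+1] * b[2k+1], wrapped to signed 32 bits.
    """
    if len(a) != len(b) or len(a) % 2 != 0:
        raise ValueError("inputs must have the same, even length")
    for value in list(a) + list(b):
        if not -32768 <= value <= 32767:
            raise ValueError("inputs must fit in a signed 16-bit integer")

    out = []
    for k in range(0, len(a), 2):
        total = a[k] * b[k] + a[k + 1] * b[k + 1]
        # Wrap to the signed 32-bit range, as the hardware lane would.
        total = (total + 2**31) % 2**32 - 2**31
        out.append(total)
    return out


result = madd_i16_pairs([1, 2, 3, 4], [5, 6, 7, 8])
print(result)
```

    The wrap only matters when all four inputs of a pair are -32768; every other input combination fits in 32 bits without overflow.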

  • Merged PR 2961: Match more vectorization patterns and support
    vectorized cast. [Mason Remy]

    Match more vectorization patterns and support vectorized cast

    Tries to match and rewrite vectorization patterns:

    • 2-loop interleaving store -> vector shuffle and store
    • simple horizontal reductions (not always efficient currently)
    • vectorized casts

    Makes vectorization of non-innermost loops do a per-op "inplace" unroll and
    vectorize the innermost loop.
    TODO: update documentation to describe this behavior better.
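    As an illustration of the first pattern above, here is a sketch of one plausible reading of "2-loop interleaving store -> vector shuffle and store". It is an assumption about the shape of the pattern, not a description of Accera's matcher, and both function names are hypothetical. The two-loop scalar form and the single-shuffle form produce the same output.

```python
def interleave_two_loops(a, b):
    """Scalar form: two nested loops that store elements in interleaved order."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    sources = (a, b)
    out = [0] * (2 * len(a))
    for i in range(len(a)):      # loop over element positions
        for j in range(2):       # loop over the two sources
            out[2 * i + j] = sources[j][i]
    return out


def interleave_shuffle(a, b):
    """Vector form: concatenate both inputs, then apply one shuffle mask."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    n = len(a)
    combined = list(a) + list(b)
    # The mask picks lane i from a, then lane i from b, for each i.
    mask = [lane for i in range(n) for lane in (i, n + i)]
    return [combined[m] for m in mask]


a = [0, 1, 2, 3]
b = [10, 11, 12, 13]
print(interleave_two_loops(a, b))
print(interleave_shuffle(a, b))
```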

  • Merged PR 2960: Enable marking functions as no-inline-into. [Mason
    Remy]

    Enable marking functions as no-inline-into

    Functions marked no-inline-into
    won't inline calls to other functions within their body. This
    improves compiler performance (not emitted code performance)
    when there are many nested function calls.

  • Merged PR 2986: [output array] Emit range function with input_output
    type arguments. [Denny Sun]

    Instead of output type arguments, we now use input_output type arguments to generate the two functions for the Range function.
    With this change, Accera can successfully generate code for the Range function.

    # Generate functions like:
    # get_size(float start, float limit, float delta, int64_t* output_dim);
    # get_array(int64_t input_dim, float* output, float start, float delta);
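    The two-function pattern can be pictured as "ask for the size, allocate, then fill". The sketch below mimics the two signatures above in plain Python, using a one-element list to stand in for the `int64_t*` out-parameter. The size formula (ceiling of `(limit - start) / delta`, clamped at zero) is an assumption based on common Range semantics, not something stated in this release.

```python
import math


def get_size(start, limit, delta, output_dim):
    """Write the number of range elements into output_dim[0].

    output_dim is a one-element list standing in for an int64_t* argument.
    The formula is an assumed Range semantic, not taken from Accera.
    """
    if delta == 0:
        raise ValueError("delta must be non-zero")
    output_dim[0] = max(0, math.ceil((limit - start) / delta))


def get_array(input_dim, output, start, delta):
    """Fill the caller-allocated output buffer with the range values."""
    for i in range(input_dim):
        output[i] = start + i * delta


# Caller side: query the size first, allocate, then fill.
dim = [0]
get_size(0.0, 5.0, 2.0, dim)
values = [0.0] * dim[0]
get_array(dim[0], values, 0.0, 2.0)
print(dim[0], values)
```

    Splitting the work this way lets the caller own the allocation, which is why the size is returned through an argument rather than as a freshly allocated output.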
    
  • Merged PR 2959: Improved affine for op range simplification. [Mason
    Remy]

    Improved affine for op range simplification

    Add range value / constant-cmp-result patterns and affine for op range
    simplifications to the affine simplification pass and run it after
    inlining functions.
    When inlining a dynamically-sized function into a statically-sized
    function, this change is useful for resolving the dynamic ranges to
    constants and pruning dynamic-range loops that are not needed given the
    specific constant value being used.
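    A toy model of the idea, under stated assumptions: loops are represented as `(name, lower, upper, step)` tuples whose bounds are either integers or symbol names, and `constants` maps symbols to the values known after inlining. None of these names come from Accera, and the real pass works on affine for ops rather than tuples. The sketch only shows how resolving bounds to constants lets zero-trip loops be pruned.

```python
def resolve(bound, constants):
    """Replace a symbolic bound with its constant value when one is known."""
    if isinstance(bound, str):
        return constants.get(bound, bound)
    return bound


def trip_count(lower, upper, step):
    """Number of iterations of a loop with a positive step."""
    return max(0, -(-(upper - lower) // step))


def simplify_loops(loops, constants):
    """Resolve symbolic bounds and drop loops that can never execute."""
    simplified = []
    for name, lower, upper, step in loops:
        lower = resolve(lower, constants)
        upper = resolve(upper, constants)
        both_known = isinstance(lower, int) and isinstance(upper, int)
        if both_known and trip_count(lower, upper, step) == 0:
            continue  # the range is provably empty, so prune the loop
        simplified.append((name, lower, upper, step))
    return simplified


# A dynamically-sized function with a main loop and a cleanup loop.
loops = [("main", 0, "main_end", 16), ("cleanup", "main_end", "N", 1)]

# Inlined into a caller where N is 64: the cleanup loop has nothing to do.
print(simplify_loops(loops, {"main_end": 64, "N": 64}))
```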

  • Merged PR 2958: Hack to erase loops in a nest to support nest-of-nest
    or overfused. [Mason Remy]

    Hack to erase loops in a nest to support nest-of-nest or overfused
    scenarios

    This change enables an action plan to erase loops. Typically this would
    be used when an outer nest traverses tiles and invokes an inner nest (or
    multiple nests) which operate within each tile. The outer nest still
    needs to cover the full iteration space. However, after splitting by the
    tile sizes, a user will not want the outer nest to perform the inner
    loops.
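    The intended loop structure can be sketched in plain Python. This is only a picture of the resulting control flow, with hypothetical function names, and does not use the Accera DSL: the outer nest keeps just the tile-stepping loops, while the within-tile loops that the split would otherwise produce are left to the inner nest.

```python
def inner_nest(data, i0, j0, tile_i, tile_j):
    """Operate within a single tile; boundary tiles are clamped to the array."""
    rows, cols = len(data), len(data[0])
    for i in range(i0, min(i0 + tile_i, rows)):
        for j in range(j0, min(j0 + tile_j, cols)):
            data[i][j] += 1


def outer_nest(data, tile_i, tile_j):
    """Traverse tiles only; the within-tile loops have been 'erased' here."""
    rows, cols = len(data), len(data[0])
    for i0 in range(0, rows, tile_i):
        for j0 in range(0, cols, tile_j):
            inner_nest(data, i0, j0, tile_i, tile_j)


grid = [[0] * 6 for _ in range(4)]
outer_nest(grid, tile_i=2, tile_j=4)
print(grid)
```

    Every element is visited exactly once, so the outer nest still covers the full iteration space even though it no longer contains the inner loops.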

  • Merged PR 2985: [release] Rev docs to 1.2.13. [Lisa Ong]

  • Merged PR 2983: Increase timeouts of GPU benchmarks. [Captain Jack
    Sparrow]

    Increase timeouts of GPU benchmarks

  • Merged PR 2982: Work around bug with redundant splits of dynamic
    dimensions. [Mason Remy]

    Work around bug with redundant splits of dynamic dimensions

  • Merged PR 2972: Build both static and dynamic binaries by default, put
    both in aux dependencies. [Kern Handa]

  • Merged PR 2975: Updates llc/opt build flags to enable more
    optimizations by default. [Kern Handa]

    Updates llc/opt build flags to enable more optimizations by default

  • Merged PR 2977: Updates CMake to do FindPython before pybind11 config.
    [Kern Handa]

    Updates CMake to do FindPython before pybind11 config

  • Merged PR 2955: Reduce Linux PR runtime to under 60mins. [Lisa Ong]

    Filter DEV_MODE reruns to dsl_tests.py. This is a best-effort filter and is not comprehensive.

Full Changelog: v1.2.12...v1.2.13