v1.2.24
What's Changed
- Merged PR 3150: Change high precision fp to not perform contraction. [Mason Remy]
  Also change the value library FMA to use the math dialect FmaOp and vectorize to the vector dialect FMAOp.
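To show why contraction is a precision setting at all, here is a minimal sketch of generic IEEE 754 double behavior, not of Accera's lowering. Contracting `a*b + c` into a fused multiply-add rounds once instead of twice, so the two forms can disagree. The sketch emulates FMA with exact `fractions.Fraction` arithmetic (an assumption made for portability, since `math.fma` only exists in Python 3.13+). The helper names are hypothetical.

```python
from fractions import Fraction

def unfused_mul_add(a: float, b: float, c: float) -> float:
    # Two roundings: the product is rounded to double, then the sum is rounded again.
    return a * b + c

def fused_mul_add(a: float, b: float, c: float) -> float:
    # One rounding: compute a*b + c exactly as a rational, then round once to double.
    return float(Fraction(a) * Fraction(b) + Fraction(c))

a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

# The exact product is 1 - 2**-60, which rounds to 1.0 as a double,
# so the unfused form cancels to 0.0.
print(unfused_mul_add(a, b, c))
# The fused form keeps the exact result, -2**-60 (about -8.67e-19).
print(fused_mul_add(a, b, c))
```

In this case the fused result happens to be the exact answer. Disabling contraction gives up that benefit in exchange for results that match operation-by-operation evaluation of the source expression.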
- Merged PR 3147: Fix vector cast with same bitwidth. [Mason Remy]
  accv.cast vector<16xi8> to vector<16xui8> was erroneously lowering to cast vector<16xi8> to ui8.
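The bug in PR 3147 turned a vector-to-vector cast into a vector-to-scalar cast. The following minimal sketch shows the intended shape-preserving behavior for a same-bitwidth i8 to ui8 cast. It assumes two's-complement reinterpretation semantics, where each lane keeps its bit pattern and only the signedness changes. Python lists stand in for vectors, and the function name is hypothetical.

```python
def cast_i8_vector_to_ui8(values: list[int]) -> list[int]:
    """Element-wise i8 -> ui8 cast that keeps the vector shape.

    Assumes two's-complement reinterpretation: same bits, new signedness.
    """
    for v in values:
        if not -128 <= v <= 127:
            raise ValueError(f"{v} is not a valid i8 value")
    # Masking to the low 8 bits reads each lane's bit pattern as unsigned.
    return [v & 0xFF for v in values]

src = [-128, -1, 0, 1, 127] + [0] * 11  # stands in for a vector<16xi8>
dst = cast_i8_vector_to_ui8(src)
assert len(dst) == 16                   # still 16 lanes, not a single scalar ui8
print(dst[:5])                          # [128, 255, 0, 1, 127]
```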
- Merged PR 3149: Improve 1-D horizontal sum reductions for 8xf32 and 8xi32. [Mason Remy]
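PR 3149 does not say how the improved reductions are lowered. As background, one common approach for an 8-lane horizontal sum is a tree (pairwise) reduction that finishes in log2(8) = 3 steps instead of 7 sequential adds. The following sketch shows that generic technique with a Python list standing in for the vector. It is not Accera's actual code generation, and the function name is hypothetical.

```python
def horizontal_sum_8(lanes):
    """Pairwise (tree) horizontal sum of an 8-lane vector in 3 steps."""
    if len(lanes) != 8:
        raise ValueError("expected exactly 8 lanes")
    v = list(lanes)
    width = len(v)
    while width > 1:
        half = width // 2
        # Add the upper half onto the lower half, like one shuffle + vector add.
        v = [v[i] + v[i + half] for i in range(half)]
        width = half
    return v[0]

print(horizontal_sum_8([1, 2, 3, 4, 5, 6, 7, 8]))  # 36
```

For 8xi32 the tree order gives the same result as a sequential sum. For 8xf32 the different association order can change rounding slightly.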
- Merged PR 3148: Adds Package-level FP precision override. [Kern Handa]
- Merged PR 3144: Removes fp precision as an option for Package.build. [Kern Handa]
  The fp-contract option used in accc.py was overriding the recently added fp precision specification at the function level. Since each function now has an equivalent default, the option no longer needs to be passed to llc and opt at build time.
- Merged PR 3143: Add DSL test for profiling op. [Denny Sun]
  - Add profiling enable flag to Package.build()
  - Add a DSL test
- Merged PR 3022: Assert the arg order in debug mode. [Denny Sun]
  In debug mode, dimension args should precede array args in the arg list.
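The ordering rule in PR 3022 can be illustrated with a small check. The following minimal sketch assumes the simplest reading of the rule, which is that every dimension arg comes before every array arg. It uses a hypothetical representation in which each arg is a `(kind, name)` pair. It does not reproduce Accera's internal data structures or its actual assertion.

```python
from typing import Sequence, Tuple

def assert_dims_before_arrays(args: Sequence[Tuple[str, str]]) -> None:
    """Raise AssertionError if any dimension arg follows an array arg.

    Each arg is a hypothetical (kind, name) pair, where kind is "dim" or "array".
    """
    first_array = None
    for kind, name in args:
        if kind == "array":
            first_array = first_array or name
        elif kind == "dim":
            # Like a debug-mode check, Python `assert` is skipped under `python -O`.
            assert first_array is None, (
                f"dimension arg {name!r} must precede array arg {first_array!r}"
            )
        else:
            raise ValueError(f"unknown arg kind {kind!r}")

assert_dims_before_arrays([("dim", "M"), ("dim", "N"), ("array", "A")])  # passes
```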
- Merged PR 3137: Expose profiling function to DSL. [Denny Sun]
- Merged PR 3142: [Release] Tie accera-llvm versioning to LLVM version. [Lisa Ong]
  This change introduces a new versioning schema for accera-llvm that follows LLVM's versioning while allowing for Accera-versioned forks:
  <llvm_major>.<llvm_minor>.<llvm_micro><accera_micro> = (N+).(N+).(N+)(N{2})
  This overloads the micro version field due to constraints on Python versioning: https://peps.python.org/pep-0440/
  Examples:
  - Current LLVM fork is 14.0.6-2: accera_llvm.14.0.602, which means LLVM 14.0.6 + Accera fork v2
  - If/when upgrading to LLVM 15.0.7: accera_llvm.15.0.700
  - Then, when we rev the Accera fork to LLVM 15.0.7-1: accera_llvm.15.0.701
  Limitations:
  - We don't expect Accera's fork versions to go beyond 2 digits.
  Alternatives:
  - Omit the 0 delimiters, if we think it is unlikely that Accera forks will rev micro versions beyond a single digit. However, Accera forks may rev more often if we don't update LLVM.
  - Use a dev version, e.g. accera_llvm.14.0.6.dev4. The downside is that this looks unofficial: devN is intended for developmental releases rather than official PyPI releases. That said, the whole Accera project is developmental :)
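The PR 3142 schema can be expressed as a pair of encode/decode helpers. The following minimal sketch follows the stated `<llvm_major>.<llvm_minor>.<llvm_micro><accera_micro>` format with a two-digit fork revision. The function names are hypothetical, and the `accera_llvm.` prefix from the examples is treated as a package name, not part of the version string.

```python
import re

_VERSION = re.compile(r"(\d+)\.(\d+)\.(\d+)")

def accera_llvm_version(llvm_version: str, fork_rev: int) -> str:
    """Encode an LLVM MAJOR.MINOR.MICRO plus an Accera fork revision."""
    m = _VERSION.fullmatch(llvm_version)
    if m is None:
        raise ValueError(f"expected MAJOR.MINOR.MICRO, got {llvm_version!r}")
    if not 0 <= fork_rev <= 99:
        # The schema reserves exactly two digits for the fork revision.
        raise ValueError("fork revision must fit in two digits (0-99)")
    major, minor, micro = m.groups()
    return f"{major}.{minor}.{micro}{fork_rev:02d}"

def split_accera_llvm_version(version: str) -> tuple[str, int]:
    """Inverse of accera_llvm_version: recover (LLVM version, fork revision)."""
    m = _VERSION.fullmatch(version)
    if m is None:
        raise ValueError(f"unexpected version {version!r}")
    major, minor, combined = m.groups()
    if len(combined) < 3:
        raise ValueError("micro field must end with a two-digit fork revision")
    # The last two digits are the fork revision; the rest is LLVM's micro version.
    return f"{major}.{minor}.{int(combined[:-2])}", int(combined[-2:])

print(accera_llvm_version("14.0.6", 2))       # 14.0.602
print(accera_llvm_version("15.0.7", 0))       # 15.0.700
print(split_accera_llvm_version("15.0.701"))  # ('15.0.7', 1)
```

Because the fork revision is always zero-padded to two digits, the combined micro field equals `llvm_micro * 100 + fork_rev`. PEP 440 ordering of the encoded versions therefore matches ordering by (LLVM version, fork revision). For example, 14.0.602 < 15.0.700 < 15.0.701.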
- Merged PR 3139: Allows setting precision of fp ops per function. [Kern Handa]
- Merged PR 3140: Fix bug with reinterpret casts of unrealized conversion casts. [Mason Remy]
  This happens when we do a heap alloc followed by a reinterpret cast, but it can come up in other scenarios too.
- Merged PR 3135: [nfc] Add XeonE5 benchmark machine to targets, bump hatlib dependency. [Lisa Ong]
  Cache sizes and cache lines are best guesses from: https://en.wikichip.org/wiki/intel/xeon_e5/e5-2673_v4
Full Changelog: v1.2.23...v1.2.24