Releases: ROCm/Tensile
v2.2.3 - SplitU and WorkGroupMapping
SplitU
If you have a large summation but a small C tensor, you can create extra parallelism by splitting up the summation. This allows smaller C tensors to fill up larger GPUs.
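The idea can be illustrated with a minimal pure-Python sketch. It is not Tensile's kernel code: the function name matmul_split_u and its split_u argument are hypothetical, chosen only for illustration. Each chunk of the summation index produces an independent partial C, so the chunks could run in parallel before a final reduction.

```python
def matmul_split_u(a, b, split_u):
    """Compute C = A * B with the summation index split into split_u chunks.

    Hypothetical illustration only; a and b are lists of rows.
    """
    m, k, n = len(a), len(b), len(b[0])
    chunk = (k + split_u - 1) // split_u  # ceiling division

    partials = []
    for s in range(split_u):
        k_lo = s * chunk
        k_hi = min((s + 1) * chunk, k)
        # Each partial result depends only on its own slice of the summation,
        # so the split_u partials are independent units of work.
        partials.append([
            [sum(a[i][kk] * b[kk][j] for kk in range(k_lo, k_hi))
             for j in range(n)]
            for i in range(m)
        ])

    # Final reduction: add the partial C tensors element by element.
    return [
        [sum(p[i][j] for p in partials) for j in range(n)]
        for i in range(m)
    ]
```

With a 1x4 A and a 4x1 B, C has a single element, so an unsplit summation offers only one unit of work. Calling matmul_split_u(a, b, 2) yields two independent partial sums that are then added together.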
WorkGroupMapping
Changes which work-groups operate on which tiles of tensor C. This can help performance by improving caching.
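One possible mapping is sketched below, assuming work-groups are numbered linearly and the tiles of C form a tiles_m by tiles_n grid. The function map_work_group and its group_width parameter are hypothetical and do not reproduce Tensile's actual mapping. Consecutive work-groups are assigned to narrow bands of tile columns, so work-groups that run close together in time tend to touch neighbouring tiles.

```python
def map_work_group(wg_id, tiles_m, tiles_n, group_width):
    """Map a linear work-group id to a (tile_row, tile_col) of tensor C.

    Hypothetical illustration only. Tiles are visited in bands that are
    group_width columns wide; within a band the order is row by row.
    """
    band_size = group_width * tiles_m       # tiles in one full band
    band = wg_id // band_size
    first_col = band * group_width
    # The last band may be narrower than group_width.
    width = min(group_width, tiles_n - first_col)
    local = wg_id - band * band_size        # position inside the band
    return local // width, first_col + local % width
```

In this sketch, group_width = 1 gives a column-by-column order, and group_width = tiles_n gives a plain row-by-row order. Values in between trade one access pattern against the other.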
v2.2.0 - Recursive Solution Selection Logic
Rather than choosing solutions based on size=M*N, the recursive solution selection logic (SSL) now chooses solutions based on M, N and K, by recursively partitioning the dimensions.
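A minimal sketch of recursive partitioning over M, N and K follows. The rule layout, the thresholds and the solution names are all hypothetical and are not Tensile's logic file format. Each node either names a solution or splits one dimension into ranges and defers to a subtree.

```python
INF = float("inf")

# Hypothetical rule tree: a node is either a solution name (leaf) or a
# (dimension, [(upper_bound, subtree), ...]) pair with bounds in ascending order.
RULES = ("M", [
    (128, ("N", [
        (128, ("K", [
            (1024, "small_tile"),
            (INF, "small_tile_split_u"),
        ])),
        (INF, "wide_tile"),
    ])),
    (INF, "large_tile"),
])


def select_solution(node, sizes):
    """Walk the rule tree recursively, picking the first range that fits."""
    if isinstance(node, str):
        return node                      # leaf: a solution name
    dim, branches = node
    for upper_bound, subtree in branches:
        if sizes[dim] <= upper_bound:
            return select_solution(subtree, sizes)
    raise ValueError(f"no rule covers {dim}={sizes[dim]}")
```

Two problems with the same M*N can now receive different solutions: select_solution(RULES, {"M": 64, "N": 64, "K": 512}) and select_solution(RULES, {"M": 64, "N": 64, "K": 4096}) take different branches at the K node.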
v2.0.0 - Benchmarking Overhaul: Faster, Simpler, Programmable
The benchmarking protocol has been completely redesigned to use config.yaml files rather than requiring applications to generate problem.xml files.
Tensile is now an installable python module.
Please read the wiki to understand all the new features.
v1.1.0 - Bug Fixes
Several bug fixes for rocBLAS.
v0.1 - Preview Release
Full support for tensor contractions for BLAS and DNN.