0.17.2

github-actions released this 19 Sep 10:14

7b52161

Added

Obscure loop optimisation (#1110).
Faster matrix transposition in C backend.
Library code generated with CUDA backend can now be called from
multiple threads.
Better optimisation of concatenations of array literals and
replicates.
Array creation C API functions now accept const pointers.
Arrays can now be indexed (but not sliced) with any signed integer
type (#1122).
Added --list-devices option to OpenCL binaries (#1131).
Added --help option to C, CUDA and OpenCL binaries (#1131).

Removed

The integer modules no longer contain iota and replicate
functions. The top-level ones still exist.
The size module type has been removed from the prelude.

Changed

Range literals may no longer be produced from unsigned integers.

Fixed

Entry points with names that are not valid C (or Python)
identifiers are now pointed out as problematic, rather than
generating invalid C code.
Exotic tiling bug (#1112).
Missing synchronisation for in-place updates at group level.
Fixed (in a hacky way) an issue where reduce_by_index would use
too much local memory on AMD GPUs when using the OpenCL backend.
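
As an illustration of the entry point naming fix listed above, here is a minimal sketch of the kind of check involved. It is not the compiler's actual implementation: the helper name is_plain_c_identifier is hypothetical, and the rule is simplified to ASCII letters, digits and underscores with no leading digit. It does not reject C or Python keywords.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only.
   Returns true if name matches [A-Za-z_][A-Za-z0-9_]*,
   which is the common subset of C and Python identifier syntax.
   Keywords such as "int" or "def" are NOT rejected here. */
static bool is_plain_c_identifier(const char *name) {
    if (name == NULL || name[0] == '\0') {
        return false; /* empty names are never valid */
    }
    for (size_t i = 0; name[i] != '\0'; i++) {
        char c = name[i];
        bool is_letter = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
        bool is_digit = (c >= '0' && c <= '9');
        if (c == '_' || is_letter) {
            continue; /* allowed in any position */
        }
        if (is_digit && i > 0) {
            continue; /* digits are allowed except as the first character */
        }
        return false; /* e.g. an apostrophe or a leading digit */
    }
    return true;
}
```

Under this simplified rule, a name such as dot_product passes, while a name containing an apostrophe (for instance my'entry) or starting with a digit is flagged.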
