0.17.2
Added
- Obscure loop optimisation (#1110).
- Faster matrix transposition in C backend.
- Library code generated with CUDA backend can now be called from
  multiple threads.
- Better optimisation of concatenations of array literals and
  replicates.
- Array creation C API functions now accept `const` pointers.
- Arrays can now be indexed (but not sliced) with any signed integer
  type (#1122).
- Added --list-devices command to OpenCL binaries (#1131)
- Added --help command to C, CUDA and OpenCL binaries (#1131)
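
As a rough illustration of what `const` pointers in array creation functions mean for callers, here is a minimal, self-contained C sketch. The names `i32_1d`, `new_i32_1d`, `free_i32_1d`, `sum_i32_1d` and `lookup` are hypothetical stand-ins invented for this example, not the generated API; the sketch only mimics the shape of a creation function that copies from caller-owned data.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an opaque array handle. */
struct i32_1d {
  int32_t *data;
  int64_t dim0;
};

/* Hypothetical array creation function. Because `data` points to const,
   callers can pass read-only buffers without a cast. The input is copied,
   so the caller keeps ownership of its buffer. Assumes dim0 > 0. */
struct i32_1d *new_i32_1d(const int32_t *data, int64_t dim0) {
  assert(data != NULL && dim0 > 0);
  struct i32_1d *arr = malloc(sizeof *arr);
  if (arr == NULL) return NULL;
  arr->data = malloc((size_t)dim0 * sizeof *arr->data);
  if (arr->data == NULL) {
    free(arr);
    return NULL;
  }
  memcpy(arr->data, data, (size_t)dim0 * sizeof *arr->data);
  arr->dim0 = dim0;
  return arr;
}

/* Releases an array made by new_i32_1d; NULL is accepted. */
void free_i32_1d(struct i32_1d *arr) {
  if (arr != NULL) {
    free(arr->data);
    free(arr);
  }
}

/* Sums the elements, to show the copied data is usable. */
int64_t sum_i32_1d(const struct i32_1d *arr) {
  int64_t total = 0;
  for (int64_t i = 0; i < arr->dim0; i++) total += arr->data[i];
  return total;
}

/* A read-only table: passing it to a function taking a non-const
   `int32_t *` would need a cast or draw a compiler diagnostic. */
static const int32_t lookup[4] = {1, 2, 3, 4};
```

With a non-`const` parameter, passing `lookup` would discard the qualifier; with the `const` parameter, `new_i32_1d(lookup, 4)` compiles cleanly.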
Removed
- The integer modules no longer contain `iota` and `replicate`
  functions. The top-level ones still exist.
- The `size` module type has been removed from the prelude.
Changed
- Range literals may no longer be produced from unsigned integers.
Fixed
- Entry points with names that are not valid C (or Python)
  identifiers are now pointed out as problematic, rather than
  generating invalid C code.
- Exotic tiling bug (#1112).
- Missing synchronisation for in-place updates at group level.
- Fixed (in a hacky way) an issue where `reduce_by_index` would use
  too much local memory on AMD GPUs when using the OpenCL backend.
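
The entry point name check mentioned in the first fix above can be pictured with a small C sketch. This is not the compiler's actual check; `is_plain_c_identifier` is a hypothetical helper that assumes a simplified rule: ASCII letters, digits, and underscores only, not starting with a digit. Reserved words are deliberately ignored.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if `name` is non-empty, starts with
   an ASCII letter or underscore, and otherwise contains only ASCII
   letters, digits, and underscores. Keywords such as "int" are NOT
   rejected by this simplified rule. */
bool is_plain_c_identifier(const char *name) {
  if (name == NULL || name[0] == '\0') return false;
  for (size_t i = 0; name[i] != '\0'; i++) {
    char c = name[i];
    bool letter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
    bool digit = c >= '0' && c <= '9';
    /* Digits are allowed everywhere except the first position. */
    if (!(letter || (digit && i > 0))) return false;
  }
  return true;
}
```

Under this rule a name such as `main` passes, while a name containing an apostrophe or starting with a digit is rejected and could be reported before any C code is generated.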