Releases: diku-dk/futhark
0.18.4
Added
- When compiling to binaries in the C-based backends, the compiler
  now respects the CFLAGS and CC environment variables.
- GPU backends: avoid some bounds-checks for parallel sections
  inside intra-kernel loops.
- The cuda backend now uses a much faster single-pass scan
  implementation, although only for nonsegmented scans where the
  operator operates on scalars.
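
For reference, what a nonsegmented inclusive scan with a scalar operator computes can be sketched sequentially in Python. This only illustrates the semantics of scan. It is not the single-pass GPU algorithm used by the cuda backend, and the helper name inclusive_scan is hypothetical.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def inclusive_scan(op: Callable[[T, T], T], ne: T, xs: List[T]) -> List[T]:
    """Sequential reference for an inclusive prefix scan.

    op is assumed to be associative with neutral element ne.
    (Hypothetical helper for illustration; not part of Futhark.)
    """
    out: List[T] = []
    acc = ne
    for x in xs:
        acc = op(acc, x)   # combine the running prefix with the next element
        out.append(acc)    # element i holds the combination of xs[0..i]
    return out
```

With addition as the operator and 0 as the neutral element, inclusive_scan yields running sums. An empty input yields an empty output.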
Fixed
- futhark dataset now correctly detects trailing commas in textual
  input (#1189).
- Fixed local memory capacity check for intra-group-parallel GPU kernels.
- Fixed compiler bug on segmented rotates where the rotation amount
  is variant to the nest (#1192).
- futhark repl no longer crashes on type errors in the given file (#1193).
- Fixed a simplification error for certain arithmetic expressions
  (#1194).
- Fixed a small uniqueness-related bug in the compilation of
  operator sections.
- Sizes of opaque entry point arguments are now properly checked
  (related to #1198).
0.18.3
0.18.2
Added
- The GPU loop tiler can now handle loops where only a subset of the
  input arrays are tiled. Matrix-vector multiplication is one
  important program where this helps (#1145).
- The number of threads used by the multicore backend is now
  configurable (--num-threads and
  futhark_context_config_set_num_threads()) (#1162).
Fixed
- PyOpenCL backend would mistakenly still treat entry point
  argument sizes as 32 bit.
- Warnings are now reported even for programs with type errors.
- Multicore backend now works properly for very large iteration
  spaces.
- A few internal generated functions (init_constants(),
  free_constants()) were mistakenly declared non-static.
- Process exit code is now nonzero when compiler bugs and
  limitations are encountered.
- Multicore backend crashed on reduce_by_index with nonempty target
  and empty input.
- Fixed a flattening issue for certain complex map nestings
  (#1168).
- Made API function futhark_context_clear_caches() thread-safe
  (#1169).
- API functions for freeing opaque objects are now thread-safe
  (#1169).
- Tools such as futhark dataset no longer crash with an internal
  error if writing to a broken pipe (but they will return a nonzero
  exit code).
- Defunctionalisation had a name shadowing issue that would crop up
  for programs making very advanced use of functional
  representations (#1174).
- Type checker erroneously permitted pattern-matching on string
  literals (this would fail later in the compiler).
- New coverage checker for pattern matching, which is more correct.
  However, it may not provide quite as nice counter-examples
  (#1134).
- Fix rare internalisation error (#1177).
0.16.5
0.18.1
0.17.3
Added
- Improved parallelisation of futhark bench compilation.
Fixed
- Dataset generation for test programs now uses the right futhark
  executable (#1133).
- Really fix NaN comparisons in interpreter (#1070, again).
- Fix entry points with a parameter that is a sum type where
  multiple constructors contain arrays of the same statically known
  size.
- Fix in monomorphisation of types with constant sizes.
- Fix in in-place lowering (#1142).
- Fix tiling inside multiple nested loops (#1143).
0.17.2
Added
- Obscure loop optimisation (#1110).
- Faster matrix transposition in C backend.
- Library code generated with CUDA backend can now be called from
  multiple threads.
- Better optimisation of concatenations of array literals and
  replicates.
- Array creation C API functions now accept const pointers.
- Arrays can now be indexed (but not sliced) with any signed integer
  type (#1122).
- Added --list-devices command to OpenCL binaries (#1131).
- Added --help command to C, CUDA and OpenCL binaries (#1131).
Removed
- The integer modules no longer contain iota and replicate
  functions. The top-level ones still exist.
- The size module type has been removed from the prelude.
Changed
- Range literals may no longer be produced from unsigned integers.
Fixed
- Entry points with names that are not valid C (or Python)
  identifiers are now pointed out as problematic, rather than
  generating invalid C code.
- Exotic tiling bug (#1112).
- Missing synchronisation for in-place updates at group level.
- Fixed (in a hacky way) an issue where reduce_by_index would use
  too much local memory on AMD GPUs when using the OpenCL backend.
0.16.4
Added
- #[unroll] attribute.
- Better error message when writing a[i][j] (#1095).
- Better error message when missing "in" (#1091).
Fixed
- Fixed compiler crash on certain patterns of nested parallelism
  (#1068, #1069).
- NaN comparisons are now done properly in interpreter (#1070).
- Fix incorrect movement of array indexing into if branches
  (#1073).
- Fix defunctorisation bug (#1088).
- Fix issue where loop tiling might generate out-of-bounds reads
  (#1094).
- Scans of empty arrays no longer result in out-of-bounds memory
  reads.
- Fix yet another defunctionalisation bug due to missing
  eta-expansion (#1100).
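
As background for the NaN comparison fix above (#1070): under IEEE 754, every ordered comparison and every equality test involving NaN is false, and only the inequality test is true. The Python sketch below shows those rules, assuming this is the behaviour the interpreter is meant to match. The helper name nan_comparisons is hypothetical and this is not the interpreter's code.

```python
def nan_comparisons(x: float) -> dict:
    """Compare NaN against x with each comparison operator.

    Python floats follow IEEE 754 for these comparisons.
    (Hypothetical helper for illustration only.)
    """
    nan = float("nan")
    return {
        "lt": nan < x,
        "le": nan <= x,
        "gt": nan > x,
        "ge": nan >= x,
        "eq": nan == x,
        "ne": nan != x,  # the only comparison that holds for NaN
    }
```

The result is the same whether x is an ordinary number or NaN itself.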
0.16.3
Added
- random input blocks for futhark test and futhark bench now
  support floating-point literals, which must always have either an
  f32 or f64 suffix.
- The cuda backend now supports the -d option for executables.
- The integer modules now contain a ctz function for counting
  trailing zeroes.
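
A minimal Python sketch of what counting trailing zeroes means for a fixed-width integer. The helper name, the bits parameter, and the choice to return the bit width for a zero input are assumptions made for illustration, not a description of the Futhark implementation.

```python
def count_trailing_zeros(x: int, bits: int = 32) -> int:
    """Number of consecutive zero bits starting at the least significant bit.

    x is viewed as an unsigned integer of the given width.
    Returning `bits` for x == 0 is an assumption of this sketch.
    """
    x &= (1 << bits) - 1       # truncate to the fixed width (two's complement view)
    if x == 0:
        return bits
    lowest_set = x & -x        # isolates the lowest set bit
    return lowest_set.bit_length() - 1
```

For example, 12 is 1100 in binary, so it has two trailing zeroes.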
Fixed
- The pyopencl backend now works with OpenCL devices that have
  multiple types (most importantly, oclgrind).
- Fix barrier divergence when generating code for group-level
  collective copies in GPU backend.
- Intra-group flattening now looks properly inside of branches.
- Intra-group flattened code versions are no longer used when the
  resulting workgroups would have fewer than 32 threads (with default
  thresholds anyway) (#1064).
0.16.2
Added
- futhark autotune: added --pass-option.
Fixed
- futhark bench: progress bar now correct when number of runs is
  less than 10 (#1050).
- Aliases of arguments passed for consuming parameters are now
  properly checked (#1053).
- When using a GPU backend, errors are now properly cleared.
  Previously, once e.g. an out-of-bounds error had occurred, all
  future operations would fail with the same error.
- Size-coercing a transposed array no longer leads to invalid code
  generation (#1054).