Releases: diku-dk/futhark

0.18.4

18 Dec 08:35

Added

  • When compiling to binaries in the C-based backends, the compiler
    now respects the CFLAGS and CC environment variables.

  • GPU backends: avoid some bounds-checks for parallel sections
    inside intra-kernel loops.

  • The cuda backend now uses a much faster single-pass scan
    implementation, although only for nonsegmented scans where the
    operator operates on scalars.

Fixed

  • futhark dataset now correctly detects trailing commas in textual
    input (#1189).

  • Fixed local memory capacity check for intra-group-parallel GPU kernels.

  • Fixed compiler bug on segmented rotates where the rotation amount
    is variant to the nest (#1192).

  • futhark repl no longer crashes on type errors in given file (#1193).

  • Fixed a simplification error for certain arithmetic expressions
    (#1194).

  • Fixed a small uniqueness-related bug in the compilation of
    operator sections.

  • Sizes of opaque entry point arguments are now properly checked
    (related to #1198).

0.18.3

12 Nov 08:24

Fixed

  • Python backend now disables spurious NumPy overflow warnings for
    both library and binary code (#1180).

  • Undid deadlocking over-synchronisation for freeing opaque objects.

  • futhark datacmp now handles bad input files better (#1181).

0.18.2

07 Nov 22:40

Added

  • The GPU loop tiler can now handle loops where only a subset of the
    input arrays are tiled. Matrix-vector multiplication is one
    important program where this helps (#1145).

  • The number of threads used by the multicore backend is now
    configurable (--num-threads and
    futhark_context_config_set_num_threads()) (#1162).

Fixed

  • PyOpenCL backend mistakenly still treated entry point argument
    sizes as 32-bit.

  • Warnings are now reported even for programs with type errors.

  • Multicore backend now works properly for very large iteration
    spaces.

  • A few internal generated functions (init_constants(),
    free_constants()) were mistakenly declared non-static.

  • Process exit code is now nonzero when compiler bugs and
    limitations are encountered.

  • Multicore backend crashed on reduce_by_index with nonempty target
    and empty input.

  • Fixed a flattening issue for certain complex map nestings
    (#1168).

  • Made API function futhark_context_clear_caches() thread safe
    (#1169).

  • API functions for freeing opaque objects are now thread-safe
    (#1169).

  • Tools such as futhark dataset no longer crash with an internal
    error if writing to a broken pipe (but they will return a nonzero
    exit code).

  • Defunctionalisation had a name shadowing issue that would crop up
    for programs making very advanced use of functional
    representations (#1174).

  • Type checker erroneously permitted pattern-matching on string
    literals (this would fail later in the compiler).

  • New coverage checker for pattern matching, which is more correct.
    However, it may not provide quite as nice counter-examples
    (#1134).

  • Fix rare internalisation error (#1177).
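
The reduce_by_index fix in the list above concerns a nonempty target combined with an empty input. The sequential sketch below shows the result one would expect in that case. It is a simplified model that omits the neutral element and assumes out-of-bounds indices are ignored; it is not the multicore backend's implementation.

```python
import operator


def reduce_by_index(dest, op, indices, values):
    """Sequential model: combine each value into the bucket at its index."""
    result = list(dest)  # copy, so the caller's target is left untouched
    for i, v in zip(indices, values):
        if 0 <= i < len(result):  # assumption: out-of-bounds indices are skipped
            result[i] = op(result[i], v)
    return result


# Nonempty target, empty input: the target should come back unchanged.
unchanged = reduce_by_index([1, 2, 3], operator.add, [], [])

# Ordinary case: index 5 is out of bounds and therefore ignored.
histogram = reduce_by_index([0, 0, 0], operator.add, [0, 2, 2, 5], [1, 2, 3, 4])
```

In this model `unchanged` is `[1, 2, 3]` and `histogram` is `[1, 0, 5]`.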

0.16.5

04 Nov 21:25

Fixed

  • Made API function futhark_context_clear_caches() thread safe
    (#1169).

  • API functions for freeing opaque objects are now thread-safe
    (#1169).

0.18.1

08 Oct 14:32

Added

  • Experimental multi-threaded CPU backend, multicore.

Changed

  • All sizes are now of type i64. This has wide-ranging
    implications and most programs will need to be updated (#134).
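
As an illustration of the kind of limit that 32-bit sizes impose (offered as an assumption about why the change matters, not as a rationale stated in these notes), the sketch below models signed 32-bit wraparound in plain Python. The helper `wrap_i32` and the matrix dimensions are hypothetical.

```python
I32_MAX = 2**31 - 1


def wrap_i32(n):
    """Model two's-complement wraparound of a signed 32-bit integer."""
    return (n + 2**31) % 2**32 - 2**31


# Element count of a hypothetical 50000 x 50000 matrix.
rows, cols = 50_000, 50_000
total = rows * cols

# The true count exceeds the i32 range, so a 32-bit size would wrap.
fits_in_i32 = total <= I32_MAX
wrapped = wrap_i32(total)
```

Here `total` is 2500000000, `fits_in_i32` is `False`, and `wrapped` is -1794967296, whereas a 64-bit size holds the count exactly.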

0.17.3

06 Oct 11:59

Added

  • Improved parallelisation of futhark bench compilation.

Fixed

  • Dataset generation for test programs now uses the right futhark
    executable (#1133).

  • Really fix NaN comparisons in interpreter (#1070, again).

  • Fix entry points with a parameter that is a sum type where
    multiple constructors contain arrays of the same statically known
    size.

  • Fix in monomorphisation of types with constant sizes.

  • Fix in in-place lowering (#1142).

  • Fix tiling inside multiple nested loops (#1143).

0.17.2

19 Sep 10:14

Added

  • Obscure loop optimisation (#1110).

  • Faster matrix transposition in C backend.

  • Library code generated with CUDA backend can now be called from
    multiple threads.

  • Better optimisation of concatenations of array literals and
    replicates.

  • Array creation C API functions now accept const pointers.

  • Arrays can now be indexed (but not sliced) with any signed integer
    type (#1122).

  • Added --list-devices command to OpenCL binaries (#1131).

  • Added --help command to C, CUDA and OpenCL binaries (#1131).

Removed

  • The integer modules no longer contain iota and replicate
    functions. The top-level ones still exist.

  • The size module type has been removed from the prelude.

Changed

  • Range literals may no longer be produced from unsigned integers.

Fixed

  • Entry points with names that are not valid C (or Python)
    identifiers are now pointed out as problematic, rather than
    generating invalid C code.

  • Exotic tiling bug (#1112).

  • Missing synchronisation for in-place updates at group level.

  • Fixed (in a hacky way) an issue where reduce_by_index would use
    too much local memory on AMD GPUs when using the OpenCL backend.

0.16.4

27 Aug 14:35

Added

  • #[unroll] attribute.

  • Better error message when writing a[i][j] (#1095).

  • Better error message when missing "in" (#1091).

Fixed

  • Fixed compiler crash on certain patterns of nested parallelism
    (#1068, #1069).

  • NaN comparisons are now done properly in interpreter (#1070).

  • Fix incorrect movement of array indexing into the branches of
    ifs (#1073).

  • Fix defunctorisation bug (#1088).

  • Fix issue where loop tiling might generate out-of-bounds reads
    (#1094).

  • Scans of empty arrays no longer result in out-of-bounds memory
    reads.

  • Fix yet another defunctionalisation bug due to missing
    eta-expansion (#1100).
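
For the NaN comparison fix in the list above, the sketch below shows the behaviour that IEEE 754 prescribes, using Python floats. That "done properly" means following IEEE 754 is an assumption here; the helper `compare` is hypothetical.

```python
nan = float("nan")


def compare(x, y):
    """Return the result of each comparison operator for x and y."""
    return {
        "<": x < y,
        "<=": x <= y,
        "==": x == y,
        "!=": x != y,
        ">=": x >= y,
        ">": x > y,
    }


# Every comparison involving NaN is false, except != which is true.
nan_vs_one = compare(nan, 1.0)
nan_vs_nan = compare(nan, nan)
```

Under these rules only the `"!="` entry is `True` in both `nan_vs_one` and `nan_vs_nan`; in particular, NaN is not equal to itself.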

0.16.3

30 Jul 09:44

Added

  • random input blocks for futhark test and futhark bench now
    support floating-point literals, which must always have either an
    f32 or f64 suffix.

  • The cuda backend now supports the -d option for executables.

  • The integer modules now contain a ctz function for counting
    trailing zeroes.
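
As a reference for what counting trailing zeroes means, here is a minimal Python sketch. It models a fixed-width integer with a hypothetical `bits` parameter. Returning the bit width for an input of zero is an assumption, not something these notes specify.

```python
def ctz(x, bits=32):
    """Count the zero bits below the lowest set bit of a fixed-width integer."""
    x &= (1 << bits) - 1  # keep only the low `bits` bits (handles negatives)
    if x == 0:
        return bits  # assumption: zero has `bits` trailing zeroes
    # x & -x isolates the lowest set bit; its position is the answer.
    return (x & -x).bit_length() - 1


examples = [ctz(1), ctz(8), ctz(12), ctz(-2), ctz(0)]
```

Here `examples` is `[0, 3, 2, 1, 32]`: 8 is `0b1000`, 12 is `0b1100`, and -2 ends in a single zero bit in two's complement.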

Fixed

  • The pyopencl backend now works with OpenCL devices that have
    multiple types (most importantly, oclgrind).

  • Fix barrier divergence when generating code for group-level
    collective copies in GPU backend.

  • Intra-group flattening now looks properly inside of branches.

  • Intra-group flattened code versions are no longer used when the
    resulting workgroups would have fewer than 32 threads (with
    default thresholds anyway) (#1064).

0.16.2

15 Jul 12:54

Added

  • futhark autotune: added --pass-option.

Fixed

  • futhark bench: progress bar now correct when number of runs is
    less than 10 (#1050).

  • Aliases of arguments passed for consuming parameters are now
    properly checked (#1053).

  • When using a GPU backend, errors are now properly cleared.
    Previously, once e.g. an out-of-bounds error had occurred, all
    future operations would fail with the same error.

  • Size-coercing a transposed array no longer leads to invalid code
    generation (#1054).