Releases · diku-dk/futhark

0.18.4

18 Dec 08:35

Added

- When compiling to binaries in the C-based backends, the compiler
  now respects the CFLAGS and CC environment variables.
- GPU backends: avoid some bounds-checks for parallel sections
  inside intra-kernel loops.
- The cuda backend now uses a much faster single-pass scan
  implementation, although only for nonsegmented scans where the
  operator operates on scalars.
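
For readers unfamiliar with the terminology in the scan entry, the following is a minimal Python sketch of what a nonsegmented scan with a scalar operator computes, contrasted with a segmented one. It illustrates the semantics only and says nothing about how the cuda backend implements its single-pass scan; the function names `scan` and `segmented_scan` and the flag-array representation of segments are assumptions made for illustration.

```python
from itertools import accumulate
import operator


def scan(op, xs):
    """Nonsegmented inclusive scan: out[i] = xs[0] op xs[1] op ... op xs[i]."""
    return list(accumulate(xs, op))


def segmented_scan(op, flags, xs):
    """Inclusive scan that restarts wherever flags[i] is True.

    The flag-array encoding of segments is an assumption for this sketch.
    """
    out = []
    for flag, x in zip(flags, xs):
        if flag or not out:
            out.append(x)              # start of a new segment
        else:
            out.append(op(out[-1], x))  # continue the current segment
    return out


print(scan(operator.add, [1, 2, 3, 4]))
print(segmented_scan(operator.add, [True, False, True, False], [1, 2, 3, 4]))
```

Here the operator (`operator.add`) works on scalars and the first call has no segment boundaries, which is the case the entry says is covered.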

Fixed

- futhark dataset now correctly detects trailing commas in textual
  input (#1189).
- Fixed local memory capacity check for intra-group-parallel GPU kernels.
- Fixed compiler bug on segmented rotates where the rotation amount
  is variant to the nest (#1192).
- futhark repl no longer crashes on type errors in given file (#1193).
- Fixed a simplification error for certain arithmetic expressions
  (#1194).
- Fixed a small uniqueness-related bug in the compilation of
  operator sections.
- Sizes of opaque entry point arguments are now properly checked
  (related to #1198).

0.18.3

12 Nov 08:24

Fixed

- Python backend now disables spurious NumPy overflow warnings for
  both library and binary code (#1180).
- Undid deadlocking over-synchronisation for freeing opaque objects.
- futhark datacmp now handles bad input files better (#1181).

0.18.2

07 Nov 22:40

Added

- The GPU loop tiler can now handle loops where only a subset of the
  input arrays are tiled. Matrix-vector multiplication is one
  important program where this helps (#1145).
- The number of threads used by the multicore backend is now
  configurable (--num-threads and
  futhark_context_config_set_num_threads()) (#1162).

Fixed

- PyOpenCL backend would mistakenly still treat entry point
  argument sizes as 32 bit.
- Warnings are now reported even for programs with type errors.
- Multicore backend now works properly for very large iteration
  spaces.
- A few internal generated functions (init_constants(),
  free_constants()) were mistakenly declared non-static.
- Process exit code is now nonzero when compiler bugs and
  limitations are encountered.
- Multicore backend crashed on reduce_by_index with nonempty target
  and empty input.
- Fixed a flattening issue for certain complex map nestings
  (#1168).
- Made API function futhark_context_clear_caches() thread safe
  (#1169).
- API functions for freeing opaque objects are now thread-safe
  (#1169).
- Tools such as futhark dataset no longer crash with an internal
  error if writing to a broken pipe (but they will return a nonzero
  exit code).
- Defunctionalisation had a name shadowing issue that would crop up
  for programs making very advanced use of functional
  representations (#1174).
- Type checker erroneously permitted pattern-matching on string
  literals (this would fail later in the compiler).
- New coverage checker for pattern matching, which is more correct.
  However, it may not provide quite as nice counter-examples
  (#1134).
- Fix rare internalisation error (#1177).

0.16.5

04 Nov 21:25

Fixed

- Made API function futhark_context_clear_caches() thread safe
  (#1169).
- API functions for freeing opaque objects are now thread-safe
  (#1169).

0.18.1

08 Oct 14:32

Added

- Experimental multi-threaded CPU backend, multicore.

Changed

- All sizes are now of type i64. This has wide-ranging
  implications and most programs will need to be updated (#134).

0.17.3

06 Oct 11:59

Added

- Improved parallelisation of futhark bench compilation.

Fixed

- Dataset generation for test programs now uses the right futhark
  executable (#1133).
- Really fix NaN comparisons in interpreter (#1070, again).
- Fix entry points with a parameter that is a sum type where
  multiple constructors contain arrays of the same statically known
  size.
- Fix in monomorphisation of types with constant sizes.
- Fix in in-place lowering (#1142).
- Fix tiling inside multiple nested loops (#1143).

0.17.2

19 Sep 10:14

Added

- Obscure loop optimisation (#1110).
- Faster matrix transposition in C backend.
- Library code generated with CUDA backend can now be called from
  multiple threads.
- Better optimisation of concatenations of array literals and
  replicates.
- Array creation C API functions now accept const pointers.
- Arrays can now be indexed (but not sliced) with any signed integer
  type (#1122).
- Added --list-devices command to OpenCL binaries (#1131).
- Added --help command to C, CUDA and OpenCL binaries (#1131).

Removed

- The integer modules no longer contain iota and replicate
  functions. The top-level ones still exist.
- The size module type has been removed from the prelude.

Changed

- Range literals may no longer be produced from unsigned integers.

Fixed

- Entry points with names that are not valid C (or Python)
  identifiers are now pointed out as problematic, rather than
  generating invalid C code.
- Exotic tiling bug (#1112).
- Missing synchronisation for in-place updates at group level.
- Fixed (in a hacky way) an issue where reduce_by_index would use
  too much local memory on AMD GPUs when using the OpenCL backend.

0.16.4

27 Aug 14:35

Added

- #[unroll] attribute.
- Better error message when writing a[i][j] (#1095).
- Better error message when missing "in" (#1091).

Fixed

- Fixed compiler crash on certain patterns of nested parallelism
  (#1068, #1069).
- NaN comparisons are now done properly in interpreter (#1070).
- Fix incorrect movement of array indexing into the branches of ifs
  (#1073).
- Fix defunctorisation bug (#1088).
- Fix issue where loop tiling might generate out-of-bounds reads
  (#1094).
- Scans of empty arrays no longer result in out-of-bounds memory
  reads.
- Fix yet another defunctionalisation bug due to missing
  eta-expansion (#1100).
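
The NaN entry above does not spell out what "properly" means. Assuming it refers to the usual IEEE 754 convention (an assumption, not something these notes state), the expected behaviour can be sketched in Python, whose floats follow that convention. The helper name `comparisons` is made up for this illustration.

```python
nan = float("nan")


def comparisons(a, b):
    """Return the result of each comparison operator applied to a and b."""
    return {
        "==": a == b,
        "!=": a != b,
        "<": a < b,
        "<=": a <= b,
        ">": a > b,
        ">=": a >= b,
    }


# Under IEEE 754, NaN is unordered: every comparison involving NaN is
# false except "!=", and this holds even when NaN is compared to itself.
nan_vs_one = comparisons(nan, 1.0)
nan_vs_nan = comparisons(nan, nan)
print(nan_vs_one)
print(nan_vs_nan)
```

An interpreter that, for example, reported `nan == nan` as true or `nan < 1.0` as true would disagree with compiled code that uses hardware floating-point comparisons.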

0.16.3

30 Jul 09:44

Added

- random input blocks for futhark test and futhark bench now
  support floating-point literals, which must always have either an
  f32 or f64 suffix.
- The cuda backend now supports the -d option for executables.
- The integer modules now contain a ctz function for counting
  trailing zeroes.
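
As a reminder of what counting trailing zeroes means, here is a minimal Python sketch. It is not the Futhark implementation: the `bits` parameter and the choice to return the bit width for a zero input are assumptions made for this illustration, and the actual ctz function may behave differently on zero.

```python
def ctz(x: int, bits: int = 32) -> int:
    """Count trailing zero bits of x, viewed as a `bits`-wide integer."""
    x &= (1 << bits) - 1  # reinterpret negatives as two's complement
    if x == 0:
        return bits       # assumed convention for zero input
    # x & -x isolates the lowest set bit; its position is the answer.
    return (x & -x).bit_length() - 1


print(ctz(8))    # binary 1000
print(ctz(12))   # binary 1100
print(ctz(1))    # binary 0001
```

For these inputs the trailing-zero counts are 3, 2 and 0 respectively.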

Fixed

- The pyopencl backend now works with OpenCL devices that have
  multiple types (most importantly, oclgrind).
- Fix barrier divergence when generating code for group-level
  collective copies in GPU backend.
- Intra-group flattening now looks properly inside of branches.
- Intra-group flattened code versions are no longer used when the
  resulting workgroups would have fewer than 32 threads (with default
  thresholds anyway) (#1064).

0.16.2

15 Jul 12:54

Added

- futhark autotune: added --pass-option.

Fixed

- futhark bench: progress bar now correct when number of runs is
  less than 10 (#1050).
- Aliases of arguments passed for consuming parameters are now
  properly checked (#1053).
- When using a GPU backend, errors are now properly cleared.
  Previously, once e.g. an out-of-bounds error had occurred, all
  future operations would fail with the same error.
- Size-coercing a transposed array no longer leads to invalid code
  generation (#1054).
