Summary
This release significantly improves GPU utilization with a new tensor transaction mechanism that batches sync operations (see the sketch below) and with simultaneous reads of multiple bindings on CubeCL runtimes. It also includes performance optimizations such as mixed precision support for matrix multiplication and convolution operations, as well as notable GEMM improvements.
Backend capabilities have been expanded with a new remote backend for distributed computing, improved SPIR-V support, custom operation fusion, and an experimental fused matrix multiplication.
Training components have been expanded with semantic segmentation and object detection datasets and new training metrics, while an async metric processor improves training performance.
As with previous releases, this version includes various bug fixes, further performance optimizations, new tensor operations and enhanced documentation.
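For illustration, the transaction mechanism groups several tensor reads into one batched sync instead of one blocking read per tensor. A minimal sketch, assuming a builder-style `Transaction` API as described in #2521:

```rust
use burn::tensor::{backend::Backend, Tensor, TensorData, Transaction};

// Read two tensors back from the device with a single sync point
// instead of synchronizing once per tensor.
fn read_both<B: Backend>(input: Tensor<B, 2>, target: Tensor<B, 1>) -> (TensorData, TensorData) {
    let [input_data, target_data] = Transaction::default()
        .register(input)
        .register(target)
        .execute() // one batched sync for both reads
        .try_into()
        .expect("two registered tensors yield two results");
    (input_data, target_data)
}
```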
Module & Tensor
- Add warning in docstring for indices bound checks (#2462) @laggui
- Add `remainder` op for tensor (#2427) @med1844
- Add float cast tensor op (#2483 #2511 #2538 #2586 #2671) @laggui
- Add step learning rate scheduler (#2423) @towerpark
- Add tensor split operator (#2490) @agelas (see the sketch after this list)
- Add tensor transaction mechanism to batch multiple sync operations (#2521) @nathanielsimard
- [Breaking] Make .init() method of LR schedulers return Result (#2527) @towerpark
- Make optimizer state public (#2561) @ArthurBrussee
- Accept function pointer or closure for freq scaling (#2634) @laggui
- Change pad value w/ ElementConversion (#2653) @laggui
- Add checks for even padding when kernel size is even (#2677) @laggui
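A combined sketch of a few of the additions above. The `remainder` and `split` signatures follow the PyTorch-style conventions implied by the PR titles, and the scheduler config name is hypothetical; treat this as an assumption rather than exact API:

```rust
use burn::tensor::{backend::Backend, Tensor};

fn new_tensor_ops<B: Backend>(device: &B::Device) {
    // `remainder` (#2427): element-wise remainder between two tensors.
    let a = Tensor::<B, 1>::from_floats([5.0, -5.0, 7.5], device);
    let b = Tensor::<B, 1>::from_floats([3.0, 3.0, 2.0], device);
    let _rem = a.remainder(b);

    // `split` (#2490): chunks of size 2 along dimension 0
    // (PyTorch-like `split(split_size, dim)` signature assumed).
    let x = Tensor::<B, 2>::ones([5, 3], device);
    let _parts: Vec<Tensor<B, 2>> = x.split(2, 0);
}

// [Breaking] LR scheduler `.init()` now returns a `Result` (#2527), so a
// bad config surfaces as an error instead of a panic. Hypothetical names:
//
//   let scheduler = StepLrSchedulerConfig::new(1e-2, 30)
//       .init()
//       .expect("valid scheduler config");
```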
Bug Fixes
- Fix unsqueeze dims with multiple trailing negative indices (#2496) @laggui
- Fix one_hot implementation for Int Tensors (#2501) @maun
- Fix tensor prod and prod dim containing nan values (#2515) @quinton11
- Expose ItemLazy to be able to implement for custom types (#2525) @laggui
- Check nonzero stride, dilation and groups (#2540) @laggui
- Module derive types should inherit visibility (#2610) @laggui
- Add dropout prob check (#2695) @laggui
Backends
- Add remote Backend (#2463) @nathanielsimard (see the sketch after this list)
- Add support for custom operations fusion (#2486) @ArthurBrussee
- [Breaking] Remove precision bridge (#2538) @laggui
- Add fused matmul under fusion experimental feature flag (#2622 #2690) @nathanielsimard
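A rough sketch of the remote backend idea from #2463: one process exposes a local backend over the network, and another uses it as an ordinary device. The module paths (`burn::server::start`, `RemoteDevice`, `RemoteBackend`), constructors, and the websocket address are assumptions based on the PR, not verified API:

```rust
// Server process: expose a local backend (Wgpu here) on port 3000.
// `burn::server::start` and its signature are assumed from the PR.
fn run_server() {
    burn::server::start::<burn::backend::Wgpu>(Default::default(), 3000);
}

// Client process: use the remote server like any other device.
fn run_client() {
    use burn::backend::{remote::RemoteDevice, RemoteBackend};
    use burn::tensor::Tensor;

    let device = RemoteDevice::new("ws://localhost:3000"); // assumed constructor
    let tensor = Tensor::<RemoteBackend, 2>::ones([32, 32], &device);
    println!("{tensor}");
}
```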
Bug Fixes
- Prevent various OOB accesses and discontiguous buffer bugs (#2467) @wingertge
- Fix autodiff memory management by verifying parent nodes' existence (#2488) @jnamika
- Fix burn remote deadlock + burn fusion draining (#2492) @nathanielsimard
- Remove dtype rewrite (#2528) @ArthurBrussee
- Fix reduce autotune key no anchor (#2696) @nathanielsimard
Documentation & Examples
- Add wgpu-spirv and hip-jit features to text-classification example (#2422) @syl20bnr
- Add tensor basic ops examples (#2468) @quinton11
- Add segmentation mask to burn book (#2495) @anthonytorlucci
- Add numeric tensor examples (#2514) @quinton11
- Add module mapper book examples (#2621 #2632) @laggui
Fixes
- Fix output dim in embedding nn docstring (#2452) @getumen
- Fix tri mask ops return docstring (#2517) @laggui
- Fix the incorrect link in contributor-books (#2583) @tiruka
- Fix the broken WGSL link in the README (#2607) @korbexmachina
- Fix module visitor and mapper trait definition in the book (#2609) @laggui
- Fix load_file usage to keep using model (#2672) @laggui
- Don't mention a fixed candle bug (#2689) @kitterion
ONNX Support
- Format all type names (#2436) @samolego
- Add ONNX op Random Normal Like (#2441) @tiruka
- Add ONNX op Random Uniform Like (#2448) @tiruka
- Infer convolution kernel shape from weight (#2544) @laggui
Enhancements
- Improve ndarray tensor creation from memory (#2439) @nathanielsimard
- Don't attempt naive reduction when reduce_dim is too high (#2414) @ArthurBrussee
- Add more type support for burn-jit (#2454) @wingertge
- Rewrite legacy `cpa` kernels (#2455) @wingertge
- Implicit GEMM optimizations/bug fixes (#2499) @wingertge
- Add custom NCHW to NHWC kernel for implicit GEMM (optimization) (#2530) @wingertge
- Support 8-bit bool for JitBackend (#2526) @wingertge
- Implicit GEMM rewrite optimization (#2545) @wingertge
- Fix autotune error handling (#2670) @nathanielsimard
- Use float intrinsics for deform_conv2d backward, fix into_data for padded tensors (#2681) @wingertge
Refactoring
- Migrate to `cubecl` IR refactor (#2418) @wingertge
- DefaultDevice should be an alias of BestAvailable (#2443) @ArthurBrussee
- Replace crates by dependi (#2477) @vincentmasse
- Refactor quantization tensor data representation (#2479) @laggui
- Use alias for more consistent typing (#2497) @loganbnielsen
- Add `QTensorOps` docs + refactor tests to simplify inputs (#2557) @laggui
- Update for rust 1.83 (#2562 #2605) @laggui
- Matmul + CubeCL Update (#2551) @nathanielsimard
- Migrate matmul autotune to macro and fix accelerated (#2584) @wingertge
- Refactor jit quantized tensor representation (#2604) @laggui
- [Breaking] Fix alignment issue of TensorData bytes (#2416) @WorldSEnder
- Refactor quantized bytes representation (#2627) @laggui
- Update to new cubecl with improved compilation times (#2654) @nathanielsimard
- Refactor unary + binary kernels (#2665) @nathanielsimard
- Import code from github-device-flow crate for burnbench (#2667) @syl20bnr
- Fix web examples and conflicting feature flags w/ `default-features = false` (#2691) @laggui
- Use cubecl reduce w/ autotune (#2673) @maxtremblay
Miscellaneous
- Use core::error::Error for no-std (#2346) @antimora
- Update deny.toml to follow the spec changes of cargo-deny (#2408) @tiruka
- Add segmentation mask to ImageFolderDataset (#2426) @anthonytorlucci
- Add ROC AUC metric (#2466) @vincentmasse
- Async Processor: run train metrics & dashboard on another thread (#2482) @nathanielsimard
- Add precision classification metric (#2293) @tsanona (see the sketch at the end of this section)
- Add test int one_hot and change ops docs in the book (#2519) @tsanona
- Add option to request manual quit on tui (#2489) @vincentmasse
- Reduce log spam (#2556) @ArthurBrussee
- Add `ImageDatasetItem` image path field (#2558) @wangjiawen2013
- Fix xtask command with last version (#2566 #2582) @syl20bnr
- Remove duplicate jit conv2d test (#2581) @tiruka
- Relax Fn requirements for param map (#2620) @ArthurBrussee
- Extend ImageFolderDataset to support import of COCO detection (#2612) @jin-eld
- Add recall metric (#2518) @tsanona
- Propagate audio feature flag (#2633) @laggui
- Add F-score metric (#2648) @tsanona
- Implement benchmark for reduce kernel (#2692) @maxtremblay
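Several classification metrics landed this cycle (precision, recall, F-score, ROC AUC). A minimal sketch of wiring two of them into a learner; the metric type names and `default()` constructors are assumptions drawn from the PR titles, so adapt them to the released API:

```rust
use burn::train::metric::{PrecisionMetric, RecallMetric};
use burn::train::LearnerBuilder;

// Inside your training setup: attach the new metrics to both the
// training and validation loops (type names assumed).
let learner = LearnerBuilder::new(artifact_dir)
    .metric_train_numeric(PrecisionMetric::default())
    .metric_valid_numeric(PrecisionMetric::default())
    .metric_train_numeric(RecallMetric::default())
    .metric_valid_numeric(RecallMetric::default())
    .build(model, optim, lr_scheduler);
```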