v0.30.0: Advanced optimizer support, MoE DeepSpeed support, add upcasting for FSDP, and more

@muellerzr muellerzr released this 03 May 15:29
· 246 commits to main since this release

Core

  • We've simplified the tqdm wrapper to be fully passthrough: instead of tqdm(main_process_only, *args), call tqdm(*args) and pass main_process_only as a keyword argument.
  • We've added support for advanced optimizer usage:
  • Enable BF16 autocast to everything during FP8 and enable FSDP by @muellerzr in #2655
  • Support dataloader send_to_device calls to use non-blocking by @drhead in #2685
  • allow gather_for_metrics to be more flexible by @SunMarc in #2710
  • Add cann version info to command accelerate env for NPU by @statelesshz in #2689
  • Add MLU rng state setter by @ArthurinRUC in #2664
  • device agnostic testing for hooks&utils&big_modeling by @statelesshz in #2602
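
The passthrough tqdm change in the first item above can be illustrated with a small stand-in. This is only a sketch of the calling convention, not Accelerate's implementation: `progress`, `_render_bar`, and the `local_rank` parameter are hypothetical names, and the keyword spelling `main_process_only` simply mirrors the old positional argument, so check the `accelerate.utils.tqdm` signature before relying on it.

```python
from typing import Any, Dict


def _render_bar(iterable: Any, **kwargs: Any) -> Dict[str, Any]:
    """Hypothetical stand-in for tqdm.tqdm: records how it was called."""
    return {"items": list(iterable), **kwargs}


def progress(
    *args: Any,
    main_process_only: bool = True,
    local_rank: int = 0,  # hypothetical: a real wrapper would read this from process state
    **kwargs: Any,
) -> Dict[str, Any]:
    """Forward everything to the bar; only the keyword-only flag is consumed here."""
    if args and isinstance(args[0], bool):
        # The old call style put the flag first; reject it explicitly in this sketch.
        raise TypeError("pass main_process_only as a keyword argument")
    disable = kwargs.pop("disable", False)
    if main_process_only and not disable:
        # Hide the bar on every process except local rank 0.
        disable = local_rank != 0
    return _render_bar(*args, disable=disable, **kwargs)


on_main = progress(range(3), desc="train")
on_worker = progress(range(3), desc="train", local_rank=1)
everywhere = progress(range(3), main_process_only=False, local_rank=1)
```

Because the flag is keyword-only, every positional argument reaches the underlying bar unchanged, which is what makes the wrapper a drop-in replacement for a plain tqdm call.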

Documentation

  • Through collaboration between @fabianlim (lead contributor), @stas00, @pacman100, and @muellerzr, we have a new concept guide for FSDP and DeepSpeed that details how the two interoperate and explains clearly how each one works. This was a monumental effort by @fabianlim to make everything as accurate as possible for users. I highly recommend visiting this new documentation, available here.
  • New distributed inference examples have been added thanks to @SunMarc in #2672
  • Fixed some docs for using internal trackers by @brentyi in #2650

DeepSpeed

  • Accelerate can now handle MoE models when using DeepSpeed, thanks to @pacman100 in #2662
  • Allow "auto" for gradient clipping in YAML by @regisss in #2649
  • Introduce a deepspeed-specific Docker image by @muellerzr in #2707. To use it, pull the gpu-deepspeed tag: docker pull huggingface/accelerate:cuda-deepspeed-nightly

Megatron

Big Modeling

  • Add strict arg to load_checkpoint_and_dispatch by @SunMarc in #2641
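
As a rough illustration of what a strict load usually means, the sketch below compares model and checkpoint key sets. It assumes `strict` carries the same meaning as in PyTorch's `load_state_dict` (checkpoint keys must match the model's keys exactly); `compare_keys` is a hypothetical helper and is not part of Accelerate.

```python
from typing import Iterable, List, Tuple


def compare_keys(
    model_keys: Iterable[str],
    checkpoint_keys: Iterable[str],
    strict: bool = False,
) -> Tuple[List[str], List[str]]:
    """Return (missing, unexpected) keys; raise if strict and they differ."""
    model_set, ckpt_set = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_set - ckpt_set)      # in the model, absent from the checkpoint
    unexpected = sorted(ckpt_set - model_set)   # in the checkpoint, unknown to the model
    if strict and (missing or unexpected):
        raise RuntimeError(f"key mismatch: missing={missing}, unexpected={unexpected}")
    return missing, unexpected


model_keys = ["linear.weight", "linear.bias"]
checkpoint_keys = ["linear.weight", "head.weight"]

# Non-strict: mismatches are reported but loading could continue.
missing, unexpected = compare_keys(model_keys, checkpoint_keys)
```

With `strict=True` the same inputs raise instead of returning, which surfaces a mismatched checkpoint early rather than leaving some weights silently uninitialized.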

Bug Fixes

  • Fix up state with xla + performance regression by @muellerzr in #2634
  • Parenthesis on xpu_available by @muellerzr in #2639
  • Fix is_train_batch_min type in DeepSpeedPlugin by @yhna940 in #2646
  • Fix backend check by @jiqing-feng in #2652
  • Fix the rng states of sampler's generator to be synchronized for correct sharding of dataset across GPUs by @pacman100 in #2694
  • Block AMP for MPS device by @SunMarc in #2699
  • Fixed issue when doing multi-gpu training with bnb when the first gpu is not used by @SunMarc in #2714
  • Fixup free_memory to deal with garbage collection by @muellerzr in #2716
  • Fix sampler serialization failing by @SunMarc in #2723
  • Fix deepspeed offload device type in the arguments to be more accurate by @yhna940 in #2717
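
The sampler fix in #2694 concerns keeping the shuffle RNG identical on every process. The sketch below is a simplified illustration of why that matters, not Accelerate's sampler: `shard_indices` is a hypothetical helper that shuffles with a given seed and then takes every `world_size`-th index for its rank.

```python
import random
from typing import List


def shard_indices(n_items: int, rank: int, world_size: int, seed: int) -> List[int]:
    """Shuffle all indices with `seed`, then keep this rank's strided slice."""
    order = list(range(n_items))
    random.Random(seed).shuffle(order)
    return order[rank::world_size]


world_size = 2

# Every rank uses the same seed, so all ranks slice the same permutation.
synced = [shard_indices(8, rank, world_size, seed=0) for rank in range(world_size)]
covered = sorted(i for shard in synced for i in shard)

# If each rank used its own seed, the ranks would slice different permutations,
# so some samples could appear on several ranks while others are never seen.
unsynced = [shard_indices(8, rank, world_size, seed=rank) for rank in range(world_size)]
```

With a shared seed the shards are disjoint and together cover the whole dataset; with per-rank seeds there is no such guarantee.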

Full Changelog

New Contributors

Full Changelog: v0.29.3...v0.30.0