DDP fixes #534

Merged: 3 commits, Oct 29, 2023

nkemnitz
Collaborator

  • Added @akhileshh's suggestion for the main node address.
  • Found a way to name the WandbLogger runs based on their global rank.
  • Fixed a bug I introduced in the last PR: every DDP device must perform exactly the same set of operations, otherwise you get CollectiveFingerPrint mismatches. Specifically, all devices must run the TraceCallback once, even though the result is identical on every device and only rank 0 writes the checkpoint.
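
To illustrate the last point, here is a toy model in plain Python (not the actual TraceCallback code and not PyTorch's implementation). It assumes that tracing issues at least one collective operation; the function names and the operation labels (`all_reduce:gradients`, `broadcast:trace_inputs`) are hypothetical stand-ins. The sketch only shows why gating the whole step on rank 0 makes the per-rank operation sequences diverge, while gating only the write keeps them identical.

```python
from typing import List, Tuple


def run_rank(rank: int, trace_on_all_ranks: bool) -> Tuple[List[str], bool]:
    """Return the collective ops one rank issues and whether it wrote a checkpoint.

    Toy model only: the op names are hypothetical labels, not real calls.
    """
    ops = ["all_reduce:gradients"]  # every rank always does this
    if trace_on_all_ranks or rank == 0:
        # Assumption: tracing the model involves a collective operation.
        ops.append("broadcast:trace_inputs")
    # Only rank 0 writes the checkpoint in both variants.
    wrote_checkpoint = rank == 0
    return ops, wrote_checkpoint


def ranks_agree(world_size: int, trace_on_all_ranks: bool) -> bool:
    """True if every rank issued the same sequence of collective ops."""
    sequences = [run_rank(r, trace_on_all_ranks)[0] for r in range(world_size)]
    return all(seq == sequences[0] for seq in sequences)


if __name__ == "__main__":
    # Tracing gated on rank 0: sequences differ, analogous to a mismatch.
    print("rank-0-only trace agrees:", ranks_agree(4, trace_on_all_ranks=False))
    # Tracing on every rank, write gated on rank 0: sequences match.
    print("all-rank trace agrees:", ranks_agree(4, trace_on_all_ranks=True))
```

In the second variant the redundant work on the other ranks is the price for keeping every rank in lockstep; only the side effect (the write) is restricted to rank 0.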


codecov bot commented Oct 29, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (c48ac5e) 100.00% compared to head (2e7a407) 100.00%.

Additional details and impacted files
@@            Coverage Diff            @@
##              main      #534   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files          127       127           
  Lines         4231      4231           
=========================================
  Hits          4231      4231           


Member

@supersergiy supersergiy left a comment

Thanks for the fixes!

@supersergiy supersergiy merged commit d3e2828 into main Oct 29, 2023
@nkemnitz nkemnitz deleted the nkem/fix-collective-mismatch branch October 30, 2023 16:36
nkemnitz added a commit that referenced this pull request Nov 8, 2023