Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delete Device2Host caused by comm with device and host #2840

Merged
merged 25 commits into from
Dec 21, 2024

Conversation

zhaozheng09
Copy link
Contributor

@zhaozheng09 zhaozheng09 commented Nov 22, 2024

Delete some D2H ops

if not will be bring a sync within host and device, so replace it from if not to torch.where

Before submitting
  • Was this discussed/agreed via a Github issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

before:
image
after:
image
Host will be block until device done, this D2H maybe block pipeline of host and device, so I use where op to replace if not

Copy link
Member

@Borda Borda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we pls add tests for this fix?
and maybe pls update the title to better express what the PR does

Copy link

codecov bot commented Nov 25, 2024

Codecov Report

Attention: Patch coverage is 90.90909% with 2 lines in your changes missing coverage. Please review.

Project coverage is 69%. Comparing base (b9ab4bc) to head (6ced9ed).
Report is 1 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #2840    +/-   ##
=======================================
- Coverage      69%     69%    -0%     
=======================================
  Files         346     332    -14     
  Lines       19130   18957   -173     
=======================================
- Hits        13228   13054   -174     
- Misses       5902    5903     +1     

@zhaozheng09 zhaozheng09 changed the title async host/device . delete some H2D op Nov 26, 2024
@zhaozheng09 zhaozheng09 changed the title delete some H2D op delete some D2H op Nov 26, 2024
@zhaozheng09
Copy link
Contributor Author

can we pls add tests for this fix? and maybe pls update the title to better express what the PR does

image

must be 100% cover ? this code will raise in old torch version, I don't know how to cover, please give me an example ? thx very much.

@zhaozheng09 zhaozheng09 changed the title delete some D2H op Delete Device2Host caused by comm with device and host Nov 26, 2024
@Borda
Copy link
Member

Borda commented Nov 26, 2024

this code will raise in old torch version, I don't know how to cover, please give me an example ? thx very much

ok, so you say that this PR does not have any impact on functional metrics results but improves performances, correct?

@mergify mergify bot added the ready label Nov 27, 2024
@SkafteNicki
Copy link
Member

did a couple of changes to the PR:

  • move the logic torch.where logic to a separate function such that it could be used multiple places
  • made the testing of cpu+sigmoid+half less restrictive e.g. instead of always being skipped it is now only skipped for older pytorch builds

@mergify mergify bot removed the ready label Dec 4, 2024
compat cpu and device
@mergify mergify bot removed the has conflicts label Dec 11, 2024
@mergify mergify bot added the ready label Dec 16, 2024
CHANGELOG.md Show resolved Hide resolved
@Borda Borda enabled auto-merge (squash) December 17, 2024 18:12
Copy link
Contributor

@lantiga lantiga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks!

In general we will have to become sensitive to data-dependent control flow, to make torchmetrics more compiler-friendly in the future. This applies to both before and after this PR of course, having the condition in a separate function like it is now allows us to side-step the problem, so the final state is better from this point of view.

@Borda Borda merged commit 3ff199c into Lightning-AI:master Dec 21, 2024
63 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants