build(deps): update pytorch-lightning requirement from <2.0.0,>=1.9.0 to >=1.9.0,<2.1 in /requirements #2175


dependabot[bot]
Contributor

@dependabot dependabot bot commented on behalf of github Oct 16, 2023

Updates the requirements on pytorch-lightning to permit the latest version.
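
To make the new range concrete, here is a minimal sketch of which releases `>=1.9.0,<2.1` admits. The helpers `parse`, `pad` and `in_range` are illustrative names, not part of pip or Dependabot, and the comparison only handles plain numeric versions (real PEP 440 resolvers also deal with pre-release and local tags).

```python
def parse(version: str) -> tuple:
    """Split a plain numeric version such as '2.0.9' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def pad(parts: tuple, width: int = 3) -> tuple:
    """Right-pad with zeros so that '2.1' compares equal to '2.1.0'."""
    return parts + (0,) * (width - len(parts))


def in_range(version: str, lower: str = "1.9.0", upper: str = "2.1") -> bool:
    """Return True if lower <= version < upper (hypothetical helper)."""
    return pad(parse(lower)) <= pad(parse(version)) < pad(parse(upper))


# Print whether each candidate release falls inside the permitted range.
for candidate in ("1.8.6", "1.9.0", "2.0.9", "2.1.0"):
    print(candidate, in_range(candidate))
```

Under these assumptions every 1.9.x and 2.0.x release is accepted, while 2.1.0 itself is excluded by the `<2.1` upper bound.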

Release notes

Sourced from pytorch-lightning's releases.

Lightning 2.1: Train Bigger, Better, Faster

Lightning AI is excited to announce the release of Lightning 2.1 ⚡ It's the culmination of work from 79 contributors who have worked on features, bug fixes, and documentation for a total of more than 750 commits since v2.0.

The theme of 2.1 is "bigger, better, faster": bigger because training large multi-billion-parameter models has become even more efficient thanks to FSDP, efficient initialization, and sharded checkpointing improvements; better because it's easier than ever to scale models without making substantial code changes or installing third-party packages; and faster because it leverages the latest hardware features to speed up training in low-bit precision thanks to new precision plugins like bitsandbytes and Transformer Engine. And of course, as the name implies, this release fully leverages the latest features in PyTorch 2.1 🎉

Highlights

Improvements To Large-Scale Training With FSDP

The FSDP strategy for training large billion-parameter models gets substantial improvements and new features in Lightning 2.1, both in Trainer and Fabric (in case you didn't know, Fabric is the latest addition to the Lightning family of tools to scale models without the boilerplate code). FSDP is now more user-friendly to configure, has memory management and speed improvements, and we have a brand new end-to-end user guide with best practices (Trainer, Fabric).

Efficient Saving and Loading of Large Checkpoints

When training large billion-parameter models with FSDP, saving and resuming training, or even just loading model parameters for finetuning, can be challenging, as users are often plagued by out-of-memory errors and speed bottlenecks.

In 2.1, we made several improvements. Starting with saving checkpoints, we added support for distributed/sharded checkpoints, enabled through the setting state_dict_type in the strategy (#18364, #18358):

Trainer:

import lightning as L
from lightning.pytorch.strategies import FSDPStrategy
# Default used by the strategy
strategy = FSDPStrategy(state_dict_type="full")
# Enable saving distributed checkpoints

... (truncated)
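
The excerpt above is cut off before the sharded setting is shown. As a hedged sketch, assuming Lightning 2.1 or newer is installed, the distributed variant can be selected with `state_dict_type="sharded"`. The wrapper `build_sharded_trainer`, the `"cuda"` accelerator and the device count are assumptions made for illustration, not part of the release notes.

```python
def build_sharded_trainer(num_devices: int = 2):
    """Build a Trainer whose FSDP strategy saves sharded checkpoints."""
    # Imported lazily so this sketch can be loaded without Lightning installed.
    import lightning as L
    from lightning.pytorch.strategies import FSDPStrategy

    # "sharded": each rank writes its own shard instead of gathering
    # the full state dict on a single process.
    strategy = FSDPStrategy(state_dict_type="sharded")
    # Accelerator and device count are illustrative assumptions.
    return L.Trainer(accelerator="cuda", devices=num_devices, strategy=strategy)
```

Calling `build_sharded_trainer()` requires a machine with the requested number of CUDA devices.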

Commits

You can trigger a rebase of this PR by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

📚 Documentation preview 📚: https://torchmetrics--2175.org.readthedocs.build/en/2175/

Note
Automatic rebases have been disabled on this pull request as it has been open for over 30 days.

@dependabot dependabot bot added the test / CI testing or CI label Oct 16, 2023
@dependabot dependabot bot requested a review from a team October 16, 2023 23:59
@codecov

codecov bot commented Oct 17, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 69%. Comparing base (b1d519c) to head (e4fe1aa).
Report is 1 commit behind head on master.

Additional details and impacted files
@@          Coverage Diff           @@
##           master   #2175   +/-   ##
======================================
  Coverage      69%     69%           
======================================
  Files         313     313           
  Lines       17617   17617           
======================================
  Hits        12162   12162           
  Misses       5455    5455           
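
As a quick sanity check of the table above, the reported 69% follows directly from the hit and miss counts. This is only a sketch of the arithmetic, not Codecov's own computation.

```python
# Counts copied from the coverage table above.
hits, misses = 12162, 5455

lines = hits + misses    # total coverable lines
coverage = hits / lines  # fraction of lines executed by tests

print(f"{lines} lines, {coverage:.1%} covered")
```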

@Borda
Member

Borda commented Oct 17, 2023

pending on #2129

@Borda
Member

Borda commented Oct 17, 2023

@dependabot rebase

@dependabot dependabot bot force-pushed the dependabot-pip-requirements-pytorch-lightning-gte-1.6.0-and-lt-2.2.0 branch 3 times, most recently from 55f1c45 to f96e4c5 Compare October 17, 2023 15:58
@dependabot dependabot bot force-pushed the dependabot-pip-requirements-pytorch-lightning-gte-1.6.0-and-lt-2.2.0 branch from f96e4c5 to a7e1731 Compare October 17, 2023 16:04
@mergify mergify bot removed the has conflicts label Oct 17, 2023
@SkafteNicki
Member

PR on lightning side that seems to break tests: Lightning-AI/pytorch-lightning#18567

@mergify mergify bot added ready and removed ready labels Oct 17, 2023
@SkafteNicki
Member

@Borda I fixed a few issues, but I am still unable to reproduce the difference in values for the integration testing.
As it is failing only on GPU, I fear that we are running into problems with the 3090 cards, as we have in the past.

requirements/_integrate.txt (outdated review thread, resolved)
@mergify mergify bot added ready and removed ready labels Oct 20, 2023
@Borda Borda changed the title build(deps): update pytorch-lightning requirement from <2.1.0,>=1.6.0 to >=1.6.0,<2.2.0 in /requirements build(deps): update pytorch-lightning requirement from <2.1.0,>=1.9.0 to >=1.9.0,<2.2.0 in /requirements Oct 24, 2023
@Borda Borda changed the title build(deps): update pytorch-lightning requirement from <2.1.0,>=1.9.0 to >=1.9.0,<2.2.0 in /requirements build(deps): update pytorch-lightning requirement from <2.0.0,>=1.9.0 to >=1.9.0,<2.2.0 in /requirements Oct 24, 2023
@Borda Borda enabled auto-merge (squash) October 24, 2023 16:01
@Borda
Member

Borda commented Mar 19, 2024

It is strange, as it passes for me locally with a single GPU.

@Borda
Member

Borda commented Mar 28, 2024

It is strange; it seems to hang in test_metric_lightning_log on GPU with torch 2.x.

@Borda Borda disabled auto-merge April 10, 2024 08:05
.azure/gpu-integrations.yml (outdated review thread, resolved)
@Borda Borda force-pushed the dependabot-pip-requirements-pytorch-lightning-gte-1.6.0-and-lt-2.2.0 branch from 84dda8b to a4676c7 Compare May 6, 2024 23:32
@Borda Borda force-pushed the dependabot-pip-requirements-pytorch-lightning-gte-1.6.0-and-lt-2.2.0 branch from 7307925 to c85fe11 Compare May 6, 2024 23:44
@Borda
Member

Borda commented May 6, 2024

So the problem is the multi-GPU case, which is likely caused by the defaults that were switched between 1.x and 2.x.
cc: @awaelchli @SkafteNicki
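
If the difference does come from changed Trainer defaults for device selection (an assumption here; the comment above only says "likely"), a test can avoid depending on them by passing the device settings explicitly. The helper `single_device_trainer_kwargs` below is hypothetical and is not how this PR resolved the issue. `accelerator` and `devices` are existing `Trainer` arguments.

```python
def single_device_trainer_kwargs(use_gpu: bool) -> dict:
    """Keyword arguments that pin a Lightning Trainer to exactly one device."""
    return {
        "accelerator": "gpu" if use_gpu else "cpu",
        "devices": 1,  # explicit, instead of relying on the version's default
    }


print(single_device_trainer_kwargs(use_gpu=True))
```

A test would then construct its trainer as `Trainer(**single_device_trainer_kwargs(use_gpu=True))`, so the same code requests one device under both 1.x and 2.x.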

@Borda Borda enabled auto-merge (squash) May 6, 2024 23:49
@Borda Borda disabled auto-merge May 7, 2024 08:30
@Borda Borda merged commit 25bf7a6 into master May 7, 2024
67 of 70 checks passed
@Borda Borda deleted the dependabot-pip-requirements-pytorch-lightning-gte-1.6.0-and-lt-2.2.0 branch May 7, 2024 08:31
baskrahmer pushed a commit to baskrahmer/torchmetrics that referenced this pull request May 13, 2024
… to >=1.9.0,<2.1 in /requirements (Lightning-AI#2175)

* build(deps): update pytorch-lightning requirement in /requirements

Updates the requirements on [pytorch-lightning](https://github.com/Lightning-AI/lightning) to permit the latest version.
- [Release notes](https://github.com/Lightning-AI/lightning/releases)
- [Commits](Lightning-AI/pytorch-lightning@1.6.0...2.1.0)

---
updated-dependencies:
- dependency-name: pytorch-lightning
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>

* fix for v2.0 or higher
* <2.1.0
* fix seed
* Apply suggestions from code review

CI would run only with single GPU

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: SkafteNicki <[email protected]>
Co-authored-by: Jirka Borovec <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka <[email protected]>
Borda pushed a commit that referenced this pull request May 15, 2024
… to >=1.9.0,<2.1 in /requirements (#2175)

(cherry picked from commit 25bf7a6)
Labels
test / CI testing or CI