
Compact legendre polynomials #164

Merged: 6 commits merged into ecmwf-ifs:develop from compact-legendre-polynomials
Jan 23, 2025

Conversation

@lukasm91 (Collaborator)

The Legendre polynomials don't need to be stored in fully zero-padded arrays; we can simply concatenate them, keeping only the zero padding each block actually needs.

On TCo2559, for example, this saves up to almost 40 GB per rank (it used to be 64 GB, now it is 26 GB). We initially did this for Leonardo, where the Legendre coefficients used up almost the whole device memory.
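
For intuition only, here is a back-of-the-envelope C++ sketch (illustrative truncation and latitude counts, not the actual ectrans array shapes) of why dropping the per-wavenumber padding roughly halves the polynomial storage:

```cpp
#include <cstddef>
#include <iostream>

int main() {
    // Illustrative numbers only: a triangular truncation and a latitude count
    // loosely inspired by TCo2559, not the real ectrans dimensions.
    const std::size_t trunc = 2559;
    const std::size_t nlat  = 1280;

    std::size_t padded = 0, compact = 0;
    for (std::size_t m = 0; m <= trunc; ++m) {
        const std::size_t rows = trunc + 1 - m; // fewer polynomials as m grows
        padded  += (trunc + 1) * nlat;          // fixed-size, zero-padded slab
        compact += rows * nlat;                 // concatenated, no dead space
    }

    const double gib = 8.0 / (1024.0 * 1024.0 * 1024.0); // bytes per double -> GiB
    std::cout << "zero-padded layout: " << padded  * gib << " GiB\n";
    std::cout << "compact layout:     " << compact * gib << " GiB\n";
    // The compact layout comes out at roughly half the padded one, in the same
    // ballpark as the reported drop from ~64 GB to ~26 GB per rank.
}
```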

@lukasm91 lukasm91 changed the base branch from main to develop October 16, 2024 11:34
@samhatfield samhatfield added the enhancement label Oct 17, 2024
@lukasm91 lukasm91 marked this pull request as draft October 30, 2024 09:07
@lukasm91 (Collaborator, Author)

(Converting to draft for the moment: it is finished, but we should first merge the other PR, then rebase and review.)

@samhatfield samhatfield added the gpu label Dec 9, 2024
@lukasm91 lukasm91 force-pushed the compact-legendre-polynomials branch from 04194bc to d0d2349 on December 16, 2024 08:51
@lukasm91 lukasm91 force-pushed the compact-legendre-polynomials branch from d0d2349 to 2a87188 on December 16, 2024 09:00
@lukasm91 (Collaborator, Author) commented Dec 16, 2024

Besides replacing ext_acc with the proper "copy module", this is the only PR left over from the old GPU branch, and I have now rebased it on top of develop. This PR compacts the Legendre polynomials and thereby removes the zero padding.

This PR is not expected to interfere with any other PR, as it only touches the GEMMs. Since I am touching the CUDA interfaces anyway, I also added const to the pointer arguments in the interface (a small illustration follows the checklist below).

  • Applied clang-format
  • Tested that everything is properly deallocated
  • Tested large sizes for no overflows.
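
To illustrate the const change, here is a hypothetical GEMM-style signature (not the actual ectrans CUDA interface): the read-only inputs are taken through const pointers, while only the output stays mutable.

```cpp
#include <cstddef>

// Hypothetical GEMM-like wrapper, not the real ectrans interface. Inputs A and
// B are never written, so their pointer parameters are declared const; only
// the output C remains mutable. The const documents intent at the interface
// and lets callers pass read-only data without casts.
void simple_gemm(const double* A, const double* B, double* C, std::size_t n) {
    // Naive n x n matrix multiply standing in for the real (cu)BLAS call.
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double acc = 0.0;
            for (std::size_t k = 0; k < n; ++k) {
                acc += A[i * n + k] * B[k * n + j];
            }
            C[i * n + j] = acc;
        }
    }
}
```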

@samhatfield @wdeconinck Feel free to review when you find time.

@lukasm91 lukasm91 marked this pull request as ready for review December 16, 2024 09:06
@wdeconinck (Collaborator) left a comment

I think this all looks fine, and what a nice saving!
Perhaps @samhatfield can run it in a few places to double-check all is in order?

@samhatfield (Collaborator)

If I can find the time this week, I will take a look. It might have to wait until next year.

@samhatfield (Collaborator)

Sorry, I am finally getting through this, gradually :) Will add some more comments soon.

@samhatfield (Collaborator)

Sorry @lukasm91, #180 has introduced a merge conflict, but it looks quite simple to resolve.

@lukasm91 (Collaborator, Author)

No worries - I have merged it with develop.

@samhatfield (Collaborator)

If I understand correctly, you have basically flattened the Legendre polynomial work arrays (ZAA, ZAS), which contain the polynomials for every zonal wavenumber, from 3D to 1D, borrowing the offset-indexing style of the other 1D work arrays. The 3D representation was wasteful because we allocated the same amount of memory for every zonal wavenumber, even though the size of the polynomial array decreases as you go up in wavenumber. Very nice work indeed.
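
For concreteness, a minimal C++ sketch of that offset-indexing idea (names and shapes are hypothetical, not the actual ZAA/ZAS code): one flat buffer plus a per-wavenumber offset, so each wavenumber stores only the rows it actually needs.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical compact layout: instead of poly[m][row][lat] with every m
// padded to the largest row count, keep one 1D buffer plus per-m offsets.
struct CompactPolynomials {
    std::vector<double> data;        // all wavenumbers concatenated
    std::vector<std::size_t> offset; // offset[m] = start of wavenumber m
    std::size_t nlat;                // latitude dimension (innermost)

    // rows_per_m[m] is how many polynomial rows wavenumber m actually needs.
    explicit CompactPolynomials(const std::vector<std::size_t>& rows_per_m,
                                std::size_t nlat_)
        : offset(rows_per_m.size() + 1, 0), nlat(nlat_) {
        for (std::size_t m = 0; m < rows_per_m.size(); ++m)
            offset[m + 1] = offset[m] + rows_per_m[m] * nlat;
        data.assign(offset.back(), 0.0);
    }

    // Element (row, lat) of zonal wavenumber m in the flattened buffer.
    double& at(std::size_t m, std::size_t row, std::size_t lat) {
        return data[offset[m] + row * nlat + lat];
    }
};
```

Each wavenumber's block remains contiguous in this layout, so it can still be handed to a GEMM directly via data.data() + offset[m].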

I am happy to merge this pending the conversation above and once you give me the go ahead @lukasm91.

@lukasm91 (Collaborator, Author)

I have now modified the consts to be "consistent". Your description makes sense! I think from my side we are ready to merge.

@samhatfield samhatfield merged commit 7e71ea8 into ecmwf-ifs:develop Jan 23, 2025
11 of 13 checks passed