Compact legendre polynomials #164
Conversation
(convert to draft for the moment ==> it is finished, but we should first merge the other PR, then rebase and review)
Force-pushed from 04194bc to d0d2349
Force-pushed from d0d2349 to 2a87188
Besides replacing ext_acc with the proper "copy module", this is the only PR left over from the old GPU branch; I have now rebased it on top of develop. This PR compacts the Legendre polynomials and thereby removes the zero padding. It is not expected to interfere with any other PR, as it only touches the GEMMs. Since I am touching the CUDA interfaces anyway, I also added const to the pointers in the interface.
@samhatfield @wdeconinck Feel free to review when you find time.
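A minimal sketch of what the const change amounts to, assuming a naive reference GEMM; the function name, signature, and layout here are hypothetical, not the actual ecTrans CUDA interface:

```cpp
// Hypothetical sketch, not the actual ecTrans interface: const-qualifying
// the input pointers makes the compiler reject any accidental write to the
// Legendre coefficients or input fields; only the output stays mutable.
extern "C" void legendre_gemm(const double* pnm,    // compacted polynomials (read-only)
                              const double* fields, // input fields (read-only)
                              double* out,          // result (written)
                              int m, int n, int k)
{
    // Naive row-major reference GEMM: out = pnm * fields.
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j) {
            double acc = 0.0;
            for (int l = 0; l < k; ++l)
                acc += pnm[i * k + l] * fields[l * n + j];
            out[i * n + j] = acc;
            // pnm[0] = 0.0; // would now be a compile-time error
        }
}
```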
I think this all looks fine, and what a nice saving!
Perhaps @samhatfield can run it in a few places to double check all is in order?
If I can find the time this week, I will take a look. It might have to wait until next year.
Sorry, I am gradually getting through this finally :) Will add some more comments soon.
No worries - I merged it with develop.
If I understand right, you have basically flattened the Legendre polynomial work arrays (removing the zero padding). I am happy to merge this pending the conversation above, and once you give me the go-ahead @lukasm91.
I just modified the consts to be "consistent" now. Your description makes sense! I think from my side we are ready to merge.
Legendre polynomials don't need to be stored zero-padded; we can just concatenate them, without the zero padding.
E.g. on tco2559, this saves almost 40 GB per rank (it used to be 64 GB, now it is 26 GB). We initially did this for Leonardo, because the Legendre coefficients used almost the whole device memory.
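For illustration, here is a minimal sketch of where the saving comes from, with made-up grid sizes (not ecTrans code): under triangular truncation T, zonal wavenumber m only has T - m + 1 coefficients, so padding every wavenumber out to the full T + 1 roughly doubles the footprint compared to concatenating the blocks and indexing through per-m offsets.

```cpp
#include <cstdio>
#include <vector>

// Hypothetical sketch (not ecTrans code): compare zero-padded rectangular
// storage of Legendre polynomials with a compact, offset-indexed layout.
int main() {
    const long long T = 2559;    // truncation, e.g. tco2559
    const long long nlat = 2560; // latitude count, illustrative only

    // Per-m offsets into one flat, compact array: wavenumber m
    // contributes only (T - m + 1) coefficients per latitude.
    std::vector<long long> offset(T + 2, 0);
    for (long long m = 0; m <= T; ++m)
        offset[m + 1] = offset[m] + (T - m + 1) * nlat;

    const long long compact = offset[T + 1];            // concatenated size
    const long long padded  = (T + 1) * (T + 1) * nlat; // zero-padded size

    std::printf("padded : %lld values\n", padded);
    std::printf("compact: %lld values (%.1f%% of padded)\n",
                compact, 100.0 * compact / padded);

    // Element (wavenumber m, coefficient c, latitude j) then lives at
    //   flat[offset[m] + c * nlat + j]
    return 0;
}
```

The triangular count alone accounts for roughly a factor of two; the reported tco2559 figures (64 GB down to 26 GB) suggest the padded layout wasted even more than that in practice.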