GPU Support #67
Ping @TeachRaccooon. OpenMP isn't correctly linked into RandLAPACK when building with CUDA. The problem is that CUDA requires specific compiler flags in order to interface with OpenMP code. So you'll need to change the file pulled in by RandLAPACK's top-level include(compiler_flags) to the following:
# set default compiler flags
if (NOT CMAKE_CXX_FLAGS)
    set(tmp "-fPIC -std=c++20 -Wall -Wextra -Wno-unknown-pragmas")
    if ((APPLE) AND ("${CMAKE_CXX_COMPILER_ID}" MATCHES "Clang"))
        set(tmp "${tmp} -stdlib=libc++")
    endif()
    if ("${CMAKE_BUILD_TYPE}" MATCHES "Release")
        set(tmp "${tmp} -O3 -march=native -mtune=native -fno-trapping-math -fno-math-errno")
        if (NOT "${CMAKE_CXX_COMPILER_ID}" MATCHES "Clang")
            set(tmp "${tmp} -fno-signaling-nans")
        endif()
    endif()
    set(CMAKE_CXX_FLAGS "${tmp}"
        CACHE STRING "RandLAPACK build defaults"
        FORCE)
endif()
if (NOT CMAKE_CUDA_FLAGS)
    set(tmp "--default-stream per-thread --expt-relaxed-constexpr")
    if ("${CMAKE_BUILD_TYPE}" MATCHES "Release")
        set(tmp "${tmp} -Xcompiler -fopenmp,-Wall,-Wextra,-O3,-march=native,-mtune=native,-fno-trapping-math,-fno-math-errno")
        if (NOT "${CMAKE_CXX_COMPILER_ID}" MATCHES "Clang")
            set(tmp "${tmp},-fno-signaling-nans")
        endif()
    elseif ("${CMAKE_BUILD_TYPE}" MATCHES "Debug")
        set(tmp "${tmp} -g -G -Xcompiler -fopenmp,-Wall,-Wextra,-O0,-g")
    endif()
    set(CMAKE_CUDA_FLAGS "${tmp}"
        CACHE STRING "CUDA compiler build defaults"
        FORCE)
    string(REGEX REPLACE "-O[0-9]" "-O3" tmp "${CMAKE_CXX_FLAGS_RELEASE}")
    set(CMAKE_CXX_FLAGS_RELEASE "${tmp}"
        CACHE STRING "CUDA compiler build defaults"
        FORCE)
    string(REGEX REPLACE "-O[0-9]" "-O3" tmp "${CMAKE_CUDA_FLAGS_RELEASE}")
    set(CMAKE_CUDA_FLAGS_RELEASE "${tmp}"
        CACHE STRING "CUDA compiler build defaults"
        FORCE)
endif()
I have these changes made locally in Jonathan's cloned version of RandLAPACK's GPU branch. You can just make the changes here on the main PR when convenient; Jonathan can then pull them down properly later on.
Just pushed that change.
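For comparison, the same idea (forwarding the host compiler's OpenMP flag to nvcc through -Xcompiler) can be scoped to a single target instead of the global flag strings. This is only a sketch, not the change pushed to this PR: the project name, the target `rl_gpu_example`, and the source file `example_kernels.cu` are hypothetical, and it assumes `OpenMP_CXX_FLAGS` resolves to a single flag such as `-fopenmp` (as with GCC).

```cmake
cmake_minimum_required(VERSION 3.27)
project(rl_openmp_cuda_sketch LANGUAGES CXX CUDA)

# Locate OpenMP for the host C++ compiler; this sets OpenMP_CXX_FLAGS
# and OpenMP_CXX_LIBRARIES.
find_package(OpenMP REQUIRED COMPONENTS CXX)

# Hypothetical target and source file, used only for illustration.
add_library(rl_gpu_example STATIC example_kernels.cu)

# nvcc does not accept -fopenmp directly, so the flag is forwarded to the
# host compiler via -Xcompiler, and only when compiling CUDA sources.
# Assumption: OpenMP_CXX_FLAGS holds exactly one flag (e.g. -fopenmp).
target_compile_options(rl_gpu_example PRIVATE
    $<$<COMPILE_LANGUAGE:CUDA>:-Xcompiler=${OpenMP_CXX_FLAGS}>)

# Link against the OpenMP runtime libraries found above.
target_link_libraries(rl_gpu_example PRIVATE ${OpenMP_CXX_LIBRARIES})
```

Scoping the option with a generator expression keeps CMAKE_CUDA_FLAGS untouched, at the cost of repeating it for every target that mixes CUDA and OpenMP.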
INSTALL.md
## 0. Software requirements
RandLAPACK_GPU temporary requirements:
GNU 13.1.0
NVIDIA 12.4.131 (make sure to use driver v 550)
CMAKE 3.29.2
All that is used to ensure we can compile with C++20 features with no issues.
Change CMake spec to 3.27.
Also, make sure RandLAPACK can compile even if CUDA isn't present. Obviously you can't do anything GPU-based, but the CPU-only functionality should still work.
It doesn't build without GPU support right now; I'll need to figure out a way to make that work.
Lots of trivial or almost-trivial comments. Some minor comments.
@rileyjmurray I just pushed a solution to the issue of the project not compiling without CUDA support.
@TeachRaccooon can we just have CMake look for a custom flag like "RequireCUDA" that defaults to false/undefined? Then we'd try to set up CUDA if (and only if) that flag was present and true.
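A minimal sketch of such an opt-in flag is shown below. The option name `RequireCUDA` comes from this thread; the project name and status messages are illustrative assumptions, and RandLAPACK's actual CMake logic may differ.

```cmake
cmake_minimum_required(VERSION 3.27)
project(rl_optional_cuda_sketch LANGUAGES CXX)

# Off by default, so CPU-only machines configure without a CUDA toolkit.
option(RequireCUDA "Build the GPU (CUDA) parts of the library" OFF)

if (RequireCUDA)
    # Errors out at configure time if no CUDA compiler can be found.
    enable_language(CUDA)
    find_package(CUDAToolkit REQUIRED)
    message(STATUS "CUDA enabled, toolkit version ${CUDAToolkit_VERSION}")
else()
    message(STATUS "CUDA disabled; building CPU-only functionality")
endif()
```

With this layout, a plain `cmake ..` never touches CUDA, while `cmake -DRequireCUDA=ON ..` fails early if the toolkit is missing rather than silently producing a CPU-only build.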
Adding CUDA toolkit into our workflow per GPU_SUPPORT PR.
…POTRF+SYRK on a GPU not staying strictly in fp64 land. The code here is not polished and will be reverted
…n executable, so ./bin/RandLAPACK_tests_gpu doesn't dispatch benchmarks by default
e903f41 to 16cdbb8
This PR inherits commits originally introduced in PR #67. The discussion of some of the details can also be found there. The list of changes is as follows:

1. Introduces a CMake build option for GPU support (specifically, CUDA support) in RandLAPACK. This is enabled with ``-DRequireCUDA=ON``.
2. Introduces rl_cuda_kernels.cuh - file contains various utility GPU functions, including some BLAS and LAPACK-level routines.
3. Introduces rl_cqrrpt_gpu.cuh - a GPU version of CQRRPT. Note that since many parts of CQRRPT (including sketching) do not (currently) have GPU versions, the data offload happens inside of the algorithm. The input data is expected to be located on a CPU.
4. Introduces rl_cqrrp_gpu.cuh - a GPU version of the CQRRP algorithm, which accepts data allocated on a GPU.
5. Includes tests for the functions from the above files and benchmarks (living in test space) for the CQRRP algorithm. In the future, these should be moved into benchmarking space (built separately). For now, we can avoid running these with the rest of the tests by using `ctest --gtest_filter=-*bench*`.

Issues #77 - #80 are related to this PR.

---------

Co-authored-by: Riley John Murray <[email protected]>
Co-authored-by: Riley John Murray <[email protected]>
Co-authored-by: Max Melnichenko <[email protected]>
Co-authored-by: rileyjmurray <[email protected]>
Introduces barebones GPU support into RandLAPACK.
This is a WIP PR.