Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix compile failure on RTX 4090 #2076

Merged
merged 6 commits into from
Jan 17, 2024

Conversation

JieFengWang
Copy link
Contributor

@JieFengWang JieFengWang commented Jan 3, 2024

[bug]Fix compile failure on RTX 4090. related issue (#2073)

@JieFengWang JieFengWang requested a review from a team as a code owner January 3, 2024 03:12
Copy link

copy-pr-bot bot commented Jan 3, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added the cpp label Jan 3, 2024
@cjnolet
Copy link
Member

cjnolet commented Jan 3, 2024

/ok to test

@JieFengWang
Copy link
Contributor Author

/ok to test

Hi, cjnolet. I see a pr / checks / check-style (push) failure due to the license. Should I modify the license as the checker recommend:

- * Copyright (c) 2023, NVIDIA CORPORATION.
+ * Copyright (c) 2023-2024, NVIDIA CORPORATION.

or waiting for the raft group to replace all the copyright information?

@wphicks
Copy link
Contributor

wphicks commented Jan 3, 2024

@JieFengWang Yes, if you could go ahead and update the header, that would be much appreciated. Thanks!

Copy link
Contributor

@wphicks wphicks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@cjnolet cjnolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix.

I have a couple of questions since we don't have 4090's as part of of our general CI and testing process.

  1. After compiling the template project with this change, does it also run without error on the 4090 hardware?

  2. Did you also compile the libraft shared library from source or did it work with the pre-installed libraft from conda/pip?

Holding this open for now because sm_89 requires a larger discussion across RAPIDS infrastructure. RAPIDS only builds for sm_80 and sm_86, which should be able to run on sm_89 hardware but this has not been officially tested for RAFT yet. It will be helpful to know if it works for you.

Please feel free to build with this change in the meantime!

@cjnolet
Copy link
Member

cjnolet commented Jan 3, 2024

Should I modify the license as the checker recommend:

@JieFengWang an easy way to make sure your commits always pass the style checker is to use pre-commit (instructions).

Of course, you can always make the change manually but pre-commit tends to make the process much easier.

@cjnolet cjnolet added bug Something isn't working non-breaking Non-breaking change labels Jan 3, 2024
@JieFengWang
Copy link
Contributor Author

JieFengWang commented Jan 4, 2024

Thanks for the fix.

I have a couple of questions since we don't have 4090's as part of of our general CI and testing process.

  1. After compiling the template project with this change, does it also run without error on the 4090 hardware?
  2. Did you also compile the libraft shared library from source or did it work with the pre-installed libraft from conda/pip?

@cjnolet

  1. Yes, with this change, the CAGRA_EXAMPLE example runs smoothly without any error.
[100%] Linking CUDA executable CAGRA_EXAMPLE
[100%] Built target CAGRA_EXAMPLE
jfwang@ubuntu:~/proj/learn_raft$ ./build/CAGRA_EXAMPLE 
Building CAGRA index (search graph)
CAGRA index has 10000 vectors
CAGRA graph has degree 64, graph size [10000, 64]
Query 0 neighbor indices: =[4229,5859,4754,6069,7114,6124,6994,2289,5354,1864,5544,6589];
Query 0 neighbor distances: =[2169.71,2186.09,2205.57,2213.63,2216.48,2217.08,2223.38,2224.23,2224.99,2225.29,2226.21,2228.28];
Query 1 neighbor indices: =[4229,1864,5859,6069,8219,5354,4754,2289,4129,7114,5154,2309];
Query 1 neighbor distances: =[2206.34,2236.74,2237.02,2240.71,2246.73,2248.51,2248.52,2249.27,2249.59,2251.96,2255.32,2256.21];
  1. Before I made this change, I tried to install raft three times.
  • pre-installed libraft from mamba is not working.
    It shows an Error : "ptxas error : Value of threads per SM for entry ZN4raft9neighbors12experimental10nn_descent6detail17local_join_kernelIiNS3_12InternalID_tIiEEEEvPKT_S9_PK4int2S9_S9_SC_iPK6__halfiPT0_PfiPiSI is out of range. .minnctapersm will be ignored"
  • My first time compiling from source via cmake is compiled successfully but with a warning
    The warning info is the same as the previous error: "Value of threads per SM for entry ZN4raft9neighbors12experimental10nn_descent6detail17local_join_kernelIiNS3_12InternalID_tIiEEEEvPKT_S9_PK4int2S9_S9_SC_iPK6__halfiPT0_PfiPiSI is out of range."
  • My second compile attempt from the source
    The compiling is done, but running the CAGRA_EXAMPLE fails with the same error as the pre-installed libraft from mamba.
  1. additional info: before I made this change, compiling the template in isolation fails with the same error Value of threads per SM for entry ZN4raft9neighbors12experimental10nn_descent6detail17local_join_kernelIiNS3_12InternalID_tIiEEEEvPKT_S9_PK4int2S9_S9_SC_iPK6__halfiPT0_PfiPiSI is out of range. .minnctapersm will be ignored

@JieFengWang
Copy link
Contributor Author

@JieFengWang an easy way to make sure your commits always pass the style checker is to use pre-commit (instructions).

Of course, you can always make the change manually but pre-commit tends to make the process much easier.

Got it. I have modified it manually since I am new to pre-commit. I will get to learn it from now.

@wphicks
Copy link
Contributor

wphicks commented Jan 4, 2024

/ok to test

@cjnolet
Copy link
Member

cjnolet commented Jan 4, 2024

Thanks for the summary, @JieFengWang. That's super helpful.

RAPIDS packages are built for sm_80 and sm_86 but not sm_89. Normally, sm_89 should support sm_86 ptx code but I believe since the nn-descent header is also being compiled with the template project, and the header is not explicitly checking for sm_89, there's a mismatch, which will throw the error when it goes to look up the symbols. Your changes do fix this, but only for sm_89 and will still fail if used with sm_87, for example.

Note that we check __CUDA_ARCH__ a lot throughout the RAFT codebase but it's almost never for strict equality. I think checking sm_75 for strict equality is okay since it's the only compute architecture that has 1024 max resident threads per SM. To catch everything from sm_86 to sm_89, let's do a simple range check instead:

#if (__CUDA_ARCH__ >= 860) && (__CUDA_ARCH__ < 900)

The question now is whether or not this change will work nicely when building against the RAFT pre-compiled library. I think we will need to test this also for completeness.

@JieFengWang
Copy link
Contributor Author

#if (__CUDA_ARCH__ >= 860) && (__CUDA_ARCH__ < 900)

The question now is whether or not this change will work nicely when building against the RAFT pre-compiled library. I think we will need to test this also for completeness.

@cjnolet Yes, is there anything else I can do next? I only have the RTX3090 (sm86) and RTX4090 (sm89) available for testing.

@cjnolet
Copy link
Member

cjnolet commented Jan 5, 2024

@JieFengWang I think a good initial test here would be to build and install libraft for the sm_86 architecture and then try building and running the example template for the sm_89 architecture (using the libraft that was built for sm_86).

If you are able to do this test with your 3090 and 4090 cards then that would be a massive help.

@JieFengWang
Copy link
Contributor Author

@JieFengWang I think a good initial test here would be to build and install libraft for the sm_86 architecture and then try building and running the example template for the sm_89 architecture (using the libraft that was built for sm_86).

If you are able to do this test with your 3090 and 4090 cards then that would be a massive help.

@cjnolet If installing libraft from source, this may be a bit tricky. Because the 3090 and 4090 are not on the same physical machine. Can I compile a python wheel file, then copy the .whl file from 3090 machine and to install it on 4090 machine. Or using the compiled libraft.so or libraft.a from 3090 to link it on 4090 machine?

@cjnolet
Copy link
Member

cjnolet commented Jan 5, 2024

@JieFengWang Got it. Actually, it's possible to build libraft for an architecture that isn't installed in the machine, so the easiest route might be to build and install libraft for sm_80 and sm_86, then use that version of libraft to build the example template for sm_89.

@JieFengWang
Copy link
Contributor Author

JieFengWang commented Jan 5, 2024

@JieFengWang Got it. Actually, it's possible to build libraft for an architecture that isn't installed in the machine, so the easiest route might be to build and install libraft for sm_80 and sm_86, then use that version of libraft to build the example template for sm_89.

@cjnolet Got it, but I may need your some help. I know how to set this architecture in simple CUDA projects via cmake set(CMAKE_CUDA_ARCHITECTURES 89) or nvcc nvcc -arch=sm_70 directly. But I am not familiar with raft's cmakelist files ie https://github.com/rapidsai/raft/blob/branch-24.02/cpp/CMakeLists.txt and https://github.com/rapidsai/raft/blob/branch-24.02/cpp/template/CMakeLists.txt. I don't know how to set the GPU architecture in this two files.
For example the 144-th to 158-th lines of cpp/CMakeLists.txt:

if(NOT BUILD_CPU_ONLY)
  # CUDA runtime
  rapids_cuda_init_runtime(USE_STATIC ${CUDA_STATIC_RUNTIME})
  # * find CUDAToolkit package
  # * determine GPU architectures
  # * enable the CMake CUDA language
  # * set other CUDA compilation flags
  rapids_find_package(
    CUDAToolkit REQUIRED
    BUILD_EXPORT_SET raft-exports
    INSTALL_EXPORT_SET raft-exports
  )
else()
  add_compile_definitions(BUILD_CPU_ONLY)
endif()

and the 27-th line of template's cmakelist

@cjnolet
Copy link
Member

cjnolet commented Jan 6, 2024

Hi @JieFengWang,

The easiest way to compile and install libraft might be to use the build.sh at the root of the repository. There's an option in the script to build for all supported RAPIDS architectures (instead of the one that happens to be installed in the machine).

./build.sh libraft --compile-lib --allgpuarch

Of course, this may take awhile to compile since it'll build for all of the supported architectures. You can modify the script and change where it sets RAFT_CMAKE_CUDA_ARCHITECTURES="RAPIDS" to sm_80;sm_86 instead.

If you still have trouble building, another thing we can try is to short circuit the build by installing the nightly of raft (through conda or pip) and then modifying the nn_descent header in place in the installed version. That might be the quickest way to test this.

@JieFengWang
Copy link
Contributor Author

The easiest way to compile and install libraft might be to use the build.sh at the root of the repository. There's an option in the script to build for all supported RAPIDS architectures (instead of the one that happens to be installed in the machine).

./build.sh libraft --compile-lib --allgpuarch

Of course, this may take awhile to compile since it'll build for all of the supported architectures. You can modify the script and change where it sets RAFT_CMAKE_CUDA_ARCHITECTURES="RAPIDS" to sm_80;sm_86 instead.

If you still have trouble building, another thing we can try is to short circuit the build by installing the nightly of raft (through conda or pip) and then modifying the nn_descent header in place in the installed version. That might be the quickest way to test this.

@cjnolet Sorry for being a bit busy in the last few days. Got it, I will test it ASAP.

@cjnolet
Copy link
Member

cjnolet commented Jan 10, 2024

/ok to test

@JieFengWang
Copy link
Contributor Author

If you still have trouble building, another thing we can try is to short circuit the build by installing the nightly of raft (through conda or pip) and then modifying the nn_descent header in place in the installed version. That might be the quickest way to test this.

Hi @cjnolet . This is my recent attempt. after I installed the nightly raft via mamba, the compile of template is fast and smoothly. But I have no place to change the nn_descent header since no raft source code in the _deps folder. Here is the detailed info:

  1. install the nightly of raft (through mamba)

    mamba install -c rapidsai-nightly -c conda-forge -c nvidia raft-dask pylibraft cuda-version=12.0
    • installation result

      Python 3.10.13 | packaged by conda-forge | (main, Dec 23 2023, 15:36:39) [GCC 12.3.0] on linux
      Type "help", "copyright", "credits" or "license" for more information.
      >>> import pylibraft
      >>> pylibraft.__version__
      '24.02.00a62'
      >>>
  2. compile the template

    ./build.sh
    
    [ 33%] Built target IVF_FLAT_EXAMPLE
    [ 66%] Built target CAGRA_EXAMPLE
    [100%] Built target IVF_PQ_EXAMPLE
    • (myenv) jfwang@4090server:~/proj/template$ ./build/CAGRA_EXAMPLE 
      Building CAGRA index (search graph)
      CAGRA index has 10000 vectors
      CAGRA graph has degree 64, graph size [10000, 64]
      Query 0 neighbor indices: =[4229,5859,4754,6069,7114,6124,6994,2289,5354,1864,5544,6589];
      Query 0 neighbor distances: =[2169.71,2186.09,2205.57,2213.63,2216.48,2217.08,2223.38,2224.23,2224.99,2225.29,2226.21,2228.28];
      Query 1 neighbor indices: =[4229,1864,5859,6069,8219,5354,4754,2289,4129,7114,5154,2309];
      Query 1 neighbor distances: =[2206.34,2236.74,2237.02,2240.71,2246.73,2248.51,2248.52,2249.27,2249.59,2251.96,2255.32,2256.21];
      Query 2 neighbor indices: =[4229,5859,5354,6069,1864,7114,4754,5544,2309,2289,6589,1124];
      Query 2 neighbor distances: =[2302.99,2329.27,2334.12,2343.74,2345.25,2348.94,2353.94,2361.96,2365.6,2366.11,2366.88,2366.95];
      Query 3 neighbor indices: =[4229,1864,5859,2289,6069,5354,4754,2309,5544,7114,6589,5154];
      Query 3 neighbor distances: =[2164.26,2200.32,2206.3,2207.8,2213.33,2213.46,2214.19,2220.65,2227.93,2229.31,2229.85,2232.81];
      Query 4 neighbor indices: =[4229,2309,6069,5859,6589,1864,5354,4754,7114,4129,2289,6124];
      Query 4 neighbor distances: =[2189.25,2235.77,2237.34,2239.6,2245.1,2247.21,2248,2249.52,2264.84,2264.95,2266.2,2266.82];
      Query 5 neighbor indices: =[4229,5859,6069,1864,5354,5154,2289,6124,4754,2309,7114,5544];
      Query 5 neighbor distances: =[2179.1,2189.2,2197.68,2203.15,2205.7,2210.14,2215.63,2216.25,2221.2,2224.35,2230.57,2233.13];
      Query 6 neighbor indices: =[4229,5859,5354,6069,2309,9029,1864,2289,6124,6589,4754,5544];
      Query 6 neighbor distances: =[2168.45,2197.07,2209.82,2211.66,2214.77,2219.12,2220.03,2228.26,2231.18,2232.54,2234.5,2240.72];
      Query 7 neighbor indices: =[4229,5859,5354,1864,4754,2289,6069,6589,4129,2309,5154,8999];
      Query 7 neighbor distances: =[2260.95,2265.19,2284.81,2287.92,2298.18,2301.87,2312.19,2312.94,2313.89,2314.65,2318.54,2324.04];
      Query 8 neighbor indices: =[4229,1864,7114,6069,2309,4754,2289,5859,6124,6994,5354,9029];
      Query 8 neighbor distances: =[2251.12,2284.71,2287.42,2287.58,2289.47,2291.78,2294.65,2295.96,2305.73,2306.75,2308.16,2309.62];
      Query 9 neighbor indices: =[4229,5859,5354,6069,2309,7114,6589,1864,4754,9029,6124,5109];
      Query 9 neighbor distances: =[2221.84,2244.71,2247.43,2260.68,2266.57,2267.68,2270.96,2271.6,2279.38,2280.96,2282.17,2283.62];

The compile is fast and smoothly. Unlink my previous attempt (compile with only the template folder, no installation from source code , no installation from /conda/pip/mamba) , there is no full raft souce code downloaded at template/build/_deps/rapids-cmake-src . I think it should caused by I have install the nightly libraft via mamba. So I have no place to change the nn_descent.cuh 's header file.

@cjnolet
Copy link
Member

cjnolet commented Jan 16, 2024

Thanks for checking this @JieFengWang. We've had a few folks verify this internally as well and I think we're good aceept it. Thanks for the contribution!

@cjnolet
Copy link
Member

cjnolet commented Jan 16, 2024

/ok to test

@cjnolet
Copy link
Member

cjnolet commented Jan 17, 2024

/ok to test

@cjnolet
Copy link
Member

cjnolet commented Jan 17, 2024

/ok to test

@cjnolet
Copy link
Member

cjnolet commented Jan 17, 2024

/merge

@rapids-bot rapids-bot bot merged commit 1e4961e into rapidsai:branch-24.02 Jan 17, 2024
61 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working cpp non-breaking Non-breaking change
Projects
Development

Successfully merging this pull request may close these issues.

3 participants