Fix compile failure on RTX 4090 #2076

JieFengWang · 2024-01-03T03:12:19Z

[bug]Fix compile failure on RTX 4090. related issue (#2073)

copy-pr-bot · 2024-01-03T03:12:23Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

cjnolet · 2024-01-03T04:41:57Z

/ok to test

JieFengWang · 2024-01-03T06:59:14Z

/ok to test

Hi, cjnolet. I see a pr / checks / check-style (push) failure due to the license. Should I modify the license as the checker recommend:

- * Copyright (c) 2023, NVIDIA CORPORATION.
+ * Copyright (c) 2023-2024, NVIDIA CORPORATION.

or waiting for the raft group to replace all the copyright information?

wphicks · 2024-01-03T18:57:07Z

@JieFengWang Yes, if you could go ahead and update the header, that would be much appreciated. Thanks!

wphicks

LGTM

cjnolet

Thanks for the fix.

I have a couple of questions since we don't have 4090's as part of of our general CI and testing process.

After compiling the template project with this change, does it also run without error on the 4090 hardware?
Did you also compile the libraft shared library from source or did it work with the pre-installed libraft from conda/pip?

Holding this open for now because sm_89 requires a larger discussion across RAPIDS infrastructure. RAPIDS only builds for sm_80 and sm_86, which should be able to run on sm_89 hardware but this has not been officially tested for RAFT yet. It will be helpful to know if it works for you.

Please feel free to build with this change in the meantime!

cjnolet · 2024-01-03T19:59:10Z

Should I modify the license as the checker recommend:

@JieFengWang an easy way to make sure your commits always pass the style checker is to use pre-commit (instructions).

Of course, you can always make the change manually but pre-commit tends to make the process much easier.

JieFengWang · 2024-01-04T01:51:42Z

Thanks for the fix.

I have a couple of questions since we don't have 4090's as part of of our general CI and testing process.

After compiling the template project with this change, does it also run without error on the 4090 hardware?

Did you also compile the libraft shared library from source or did it work with the pre-installed libraft from conda/pip?

@cjnolet

Yes, with this change, the CAGRA_EXAMPLE example runs smoothly without any error.

[100%] Linking CUDA executable CAGRA_EXAMPLE
[100%] Built target CAGRA_EXAMPLE
jfwang@ubuntu:~/proj/learn_raft$ ./build/CAGRA_EXAMPLE 
Building CAGRA index (search graph)
CAGRA index has 10000 vectors
CAGRA graph has degree 64, graph size [10000, 64]
Query 0 neighbor indices: =[4229,5859,4754,6069,7114,6124,6994,2289,5354,1864,5544,6589];
Query 0 neighbor distances: =[2169.71,2186.09,2205.57,2213.63,2216.48,2217.08,2223.38,2224.23,2224.99,2225.29,2226.21,2228.28];
Query 1 neighbor indices: =[4229,1864,5859,6069,8219,5354,4754,2289,4129,7114,5154,2309];
Query 1 neighbor distances: =[2206.34,2236.74,2237.02,2240.71,2246.73,2248.51,2248.52,2249.27,2249.59,2251.96,2255.32,2256.21];

Before I made this change, I tried to install raft three times.

pre-installed libraft from mamba is not working.
It shows an Error : "ptxas error : Value of threads per SM for entry ZN4raft9neighbors12experimental10nn_descent6detail17local_join_kernelIiNS3_12InternalID_tIiEEEEvPKT_S9_PK4int2S9_S9_SC_iPK6__halfiPT0_PfiPiSI is out of range. .minnctapersm will be ignored"
My first time compiling from source via cmake is compiled successfully but with a warning
The warning info is the same as the previous error: "Value of threads per SM for entry ZN4raft9neighbors12experimental10nn_descent6detail17local_join_kernelIiNS3_12InternalID_tIiEEEEvPKT_S9_PK4int2S9_S9_SC_iPK6__halfiPT0_PfiPiSI is out of range."
My second compile attempt from the source
The compiling is done, but running the CAGRA_EXAMPLE fails with the same error as the pre-installed libraft from mamba.

additional info: before I made this change, compiling the template in isolation fails with the same error Value of threads per SM for entry ZN4raft9neighbors12experimental10nn_descent6detail17local_join_kernelIiNS3_12InternalID_tIiEEEEvPKT_S9_PK4int2S9_S9_SC_iPK6__halfiPT0_PfiPiSI is out of range. .minnctapersm will be ignored

…sue (rapidsai#2073) Signed-off-by: jeff <[email protected]>

JieFengWang · 2024-01-04T06:55:33Z

@JieFengWang an easy way to make sure your commits always pass the style checker is to use pre-commit (instructions).

Of course, you can always make the change manually but pre-commit tends to make the process much easier.

Got it. I have modified it manually since I am new to pre-commit. I will get to learn it from now.

wphicks · 2024-01-04T18:02:08Z

/ok to test

cjnolet · 2024-01-04T21:29:28Z

Thanks for the summary, @JieFengWang. That's super helpful.

RAPIDS packages are built for sm_80 and sm_86 but not sm_89. Normally, sm_89 should support sm_86 ptx code but I believe since the nn-descent header is also being compiled with the template project, and the header is not explicitly checking for sm_89, there's a mismatch, which will throw the error when it goes to look up the symbols. Your changes do fix this, but only for sm_89 and will still fail if used with sm_87, for example.

Note that we check __CUDA_ARCH__ a lot throughout the RAFT codebase but it's almost never for strict equality. I think checking sm_75 for strict equality is okay since it's the only compute architecture that has 1024 max resident threads per SM. To catch everything from sm_86 to sm_89, let's do a simple range check instead:

#if (__CUDA_ARCH__ >= 860) && (__CUDA_ARCH__ < 900)

The question now is whether or not this change will work nicely when building against the RAFT pre-compiled library. I think we will need to test this also for completeness.

JieFengWang · 2024-01-05T02:51:08Z

#if (__CUDA_ARCH__ >= 860) && (__CUDA_ARCH__ < 900)
The question now is whether or not this change will work nicely when building against the RAFT pre-compiled library. I think we will need to test this also for completeness.

@cjnolet Yes, is there anything else I can do next? I only have the RTX3090 (sm86) and RTX4090 (sm89) available for testing.

cjnolet · 2024-01-05T03:47:03Z

@JieFengWang I think a good initial test here would be to build and install libraft for the sm_86 architecture and then try building and running the example template for the sm_89 architecture (using the libraft that was built for sm_86).

If you are able to do this test with your 3090 and 4090 cards then that would be a massive help.

JieFengWang · 2024-01-05T04:25:35Z

@JieFengWang I think a good initial test here would be to build and install libraft for the sm_86 architecture and then try building and running the example template for the sm_89 architecture (using the libraft that was built for sm_86).

If you are able to do this test with your 3090 and 4090 cards then that would be a massive help.

@cjnolet If installing libraft from source, this may be a bit tricky. Because the 3090 and 4090 are not on the same physical machine. Can I compile a python wheel file, then copy the .whl file from 3090 machine and to install it on 4090 machine. Or using the compiled libraft.so or libraft.a from 3090 to link it on 4090 machine?

cjnolet · 2024-01-05T04:34:26Z

@JieFengWang Got it. Actually, it's possible to build libraft for an architecture that isn't installed in the machine, so the easiest route might be to build and install libraft for sm_80 and sm_86, then use that version of libraft to build the example template for sm_89.

JieFengWang · 2024-01-05T07:15:03Z

@JieFengWang Got it. Actually, it's possible to build libraft for an architecture that isn't installed in the machine, so the easiest route might be to build and install libraft for sm_80 and sm_86, then use that version of libraft to build the example template for sm_89.

@cjnolet Got it, but I may need your some help. I know how to set this architecture in simple CUDA projects via cmake set(CMAKE_CUDA_ARCHITECTURES 89) or nvcc nvcc -arch=sm_70 directly. But I am not familiar with raft's cmakelist files ie https://github.com/rapidsai/raft/blob/branch-24.02/cpp/CMakeLists.txt and https://github.com/rapidsai/raft/blob/branch-24.02/cpp/template/CMakeLists.txt. I don't know how to set the GPU architecture in this two files.
For example the 144-th to 158-th lines of cpp/CMakeLists.txt:

if(NOT BUILD_CPU_ONLY)
  # CUDA runtime
  rapids_cuda_init_runtime(USE_STATIC ${CUDA_STATIC_RUNTIME})
  # * find CUDAToolkit package
  # * determine GPU architectures
  # * enable the CMake CUDA language
  # * set other CUDA compilation flags
  rapids_find_package(
    CUDAToolkit REQUIRED
    BUILD_EXPORT_SET raft-exports
    INSTALL_EXPORT_SET raft-exports
  )
else()
  add_compile_definitions(BUILD_CPU_ONLY)
endif()

and the 27-th line of template's cmakelist

cjnolet · 2024-01-06T17:25:34Z

Hi @JieFengWang,

The easiest way to compile and install libraft might be to use the build.sh at the root of the repository. There's an option in the script to build for all supported RAPIDS architectures (instead of the one that happens to be installed in the machine).

./build.sh libraft --compile-lib --allgpuarch

Of course, this may take awhile to compile since it'll build for all of the supported architectures. You can modify the script and change where it sets RAFT_CMAKE_CUDA_ARCHITECTURES="RAPIDS" to sm_80;sm_86 instead.

If you still have trouble building, another thing we can try is to short circuit the build by installing the nightly of raft (through conda or pip) and then modifying the nn_descent header in place in the installed version. That might be the quickest way to test this.

cpp/include/raft/neighbors/detail/nn_descent.cuh

Co-authored-by: Corey J. Nolet <[email protected]>

JieFengWang · 2024-01-10T01:32:02Z

The easiest way to compile and install libraft might be to use the build.sh at the root of the repository. There's an option in the script to build for all supported RAPIDS architectures (instead of the one that happens to be installed in the machine).
./build.sh libraft --compile-lib --allgpuarch
Of course, this may take awhile to compile since it'll build for all of the supported architectures. You can modify the script and change where it sets RAFT_CMAKE_CUDA_ARCHITECTURES="RAPIDS" to sm_80;sm_86 instead.

If you still have trouble building, another thing we can try is to short circuit the build by installing the nightly of raft (through conda or pip) and then modifying the nn_descent header in place in the installed version. That might be the quickest way to test this.

@cjnolet Sorry for being a bit busy in the last few days. Got it, I will test it ASAP.

cjnolet · 2024-01-10T14:42:24Z

/ok to test

JieFengWang · 2024-01-15T03:31:18Z

If you still have trouble building, another thing we can try is to short circuit the build by installing the nightly of raft (through conda or pip) and then modifying the nn_descent header in place in the installed version. That might be the quickest way to test this.

Hi @cjnolet . This is my recent attempt. after I installed the nightly raft via mamba, the compile of template is fast and smoothly. But I have no place to change the nn_descent header since no raft source code in the _deps folder. Here is the detailed info:

install the nightly of raft (through mamba)

mamba install -c rapidsai-nightly -c conda-forge -c nvidia raft-dask pylibraft cuda-version=12.0

installation result

Python 3.10.13 | packaged by conda-forge | (main, Dec 23 2023, 15:36:39) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pylibraft
>>> pylibraft.__version__
'24.02.00a62'
>>>

compile the template

./build.sh

[ 33%] Built target IVF_FLAT_EXAMPLE
[ 66%] Built target CAGRA_EXAMPLE
[100%] Built target IVF_PQ_EXAMPLE

(myenv) jfwang@4090server:~/proj/template$ ./build/CAGRA_EXAMPLE 
Building CAGRA index (search graph)
CAGRA index has 10000 vectors
CAGRA graph has degree 64, graph size [10000, 64]
Query 0 neighbor indices: =[4229,5859,4754,6069,7114,6124,6994,2289,5354,1864,5544,6589];
Query 0 neighbor distances: =[2169.71,2186.09,2205.57,2213.63,2216.48,2217.08,2223.38,2224.23,2224.99,2225.29,2226.21,2228.28];
Query 1 neighbor indices: =[4229,1864,5859,6069,8219,5354,4754,2289,4129,7114,5154,2309];
Query 1 neighbor distances: =[2206.34,2236.74,2237.02,2240.71,2246.73,2248.51,2248.52,2249.27,2249.59,2251.96,2255.32,2256.21];
Query 2 neighbor indices: =[4229,5859,5354,6069,1864,7114,4754,5544,2309,2289,6589,1124];
Query 2 neighbor distances: =[2302.99,2329.27,2334.12,2343.74,2345.25,2348.94,2353.94,2361.96,2365.6,2366.11,2366.88,2366.95];
Query 3 neighbor indices: =[4229,1864,5859,2289,6069,5354,4754,2309,5544,7114,6589,5154];
Query 3 neighbor distances: =[2164.26,2200.32,2206.3,2207.8,2213.33,2213.46,2214.19,2220.65,2227.93,2229.31,2229.85,2232.81];
Query 4 neighbor indices: =[4229,2309,6069,5859,6589,1864,5354,4754,7114,4129,2289,6124];
Query 4 neighbor distances: =[2189.25,2235.77,2237.34,2239.6,2245.1,2247.21,2248,2249.52,2264.84,2264.95,2266.2,2266.82];
Query 5 neighbor indices: =[4229,5859,6069,1864,5354,5154,2289,6124,4754,2309,7114,5544];
Query 5 neighbor distances: =[2179.1,2189.2,2197.68,2203.15,2205.7,2210.14,2215.63,2216.25,2221.2,2224.35,2230.57,2233.13];
Query 6 neighbor indices: =[4229,5859,5354,6069,2309,9029,1864,2289,6124,6589,4754,5544];
Query 6 neighbor distances: =[2168.45,2197.07,2209.82,2211.66,2214.77,2219.12,2220.03,2228.26,2231.18,2232.54,2234.5,2240.72];
Query 7 neighbor indices: =[4229,5859,5354,1864,4754,2289,6069,6589,4129,2309,5154,8999];
Query 7 neighbor distances: =[2260.95,2265.19,2284.81,2287.92,2298.18,2301.87,2312.19,2312.94,2313.89,2314.65,2318.54,2324.04];
Query 8 neighbor indices: =[4229,1864,7114,6069,2309,4754,2289,5859,6124,6994,5354,9029];
Query 8 neighbor distances: =[2251.12,2284.71,2287.42,2287.58,2289.47,2291.78,2294.65,2295.96,2305.73,2306.75,2308.16,2309.62];
Query 9 neighbor indices: =[4229,5859,5354,6069,2309,7114,6589,1864,4754,9029,6124,5109];
Query 9 neighbor distances: =[2221.84,2244.71,2247.43,2260.68,2266.57,2267.68,2270.96,2271.6,2279.38,2280.96,2282.17,2283.62];

The compile is fast and smoothly. Unlink my previous attempt (compile with only the template folder, no installation from source code , no installation from /conda/pip/mamba) , there is no full raft souce code downloaded at template/build/_deps/rapids-cmake-src . I think it should caused by I have install the nightly libraft via mamba. So I have no place to change the nn_descent.cuh 's header file.

cjnolet · 2024-01-16T21:05:33Z

Thanks for checking this @JieFengWang. We've had a few folks verify this internally as well and I think we're good aceept it. Thanks for the contribution!

cjnolet · 2024-01-16T23:02:25Z

/ok to test

cjnolet · 2024-01-17T01:43:57Z

/ok to test

cjnolet · 2024-01-17T10:15:47Z

/ok to test

cjnolet · 2024-01-17T10:28:36Z

/merge

JieFengWang requested a review from a team as a code owner January 3, 2024 03:12

github-actions bot added the cpp label Jan 3, 2024

wphicks approved these changes Jan 3, 2024

View reviewed changes

cjnolet requested changes Jan 3, 2024

View reviewed changes

cjnolet assigned JieFengWang Jan 3, 2024

cjnolet added bug Something isn't working non-breaking Non-breaking change labels Jan 3, 2024

Fix compile failure on RTX 4090 and update Copyright year. related is…

bb6aca8

…sue (rapidsai#2073) Signed-off-by: jeff <[email protected]>

JieFengWang force-pushed the branch-24.02 branch from bac6351 to bb6aca8 Compare January 4, 2024 01:58

Merge branch 'branch-24.02' into branch-24.02

6cbc24e

wphicks mentioned this pull request Jan 5, 2024

Allow RAFT to compile and run successfully on sm_89 wphicks/raft#4

Merged

cjnolet requested changes Jan 9, 2024

View reviewed changes

cpp/include/raft/neighbors/detail/nn_descent.cuh Outdated Show resolved Hide resolved

Update cpp/include/raft/neighbors/detail/nn_descent.cuh

661f637

Co-authored-by: Corey J. Nolet <[email protected]>

Merge branch 'branch-24.02' into branch-24.02

068f546

cjnolet approved these changes Jan 16, 2024

View reviewed changes

Merge branch 'branch-24.02' into branch-24.02

56efa5d

Merge branch 'branch-24.02' into branch-24.02

10aaa80

rapids-bot bot merged commit 1e4961e into rapidsai:branch-24.02 Jan 17, 2024
61 checks passed

JieFengWang mentioned this pull request Jan 18, 2024

[BUG] Compile the template Fails with RTX 4090 #2073

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix compile failure on RTX 4090 #2076

Fix compile failure on RTX 4090 #2076

JieFengWang commented Jan 3, 2024 •

edited

Loading

copy-pr-bot bot commented Jan 3, 2024

cjnolet commented Jan 3, 2024

JieFengWang commented Jan 3, 2024

wphicks commented Jan 3, 2024

wphicks left a comment

cjnolet left a comment •

edited

Loading

cjnolet commented Jan 3, 2024

JieFengWang commented Jan 4, 2024 •

edited

Loading

JieFengWang commented Jan 4, 2024

wphicks commented Jan 4, 2024

cjnolet commented Jan 4, 2024 •

edited

Loading

JieFengWang commented Jan 5, 2024

cjnolet commented Jan 5, 2024 •

edited

Loading

JieFengWang commented Jan 5, 2024

cjnolet commented Jan 5, 2024

JieFengWang commented Jan 5, 2024 •

edited

Loading

cjnolet commented Jan 6, 2024 •

edited

Loading

JieFengWang commented Jan 10, 2024

cjnolet commented Jan 10, 2024

JieFengWang commented Jan 15, 2024

cjnolet commented Jan 16, 2024

cjnolet commented Jan 16, 2024

cjnolet commented Jan 17, 2024

cjnolet commented Jan 17, 2024

cjnolet commented Jan 17, 2024

Fix compile failure on RTX 4090 #2076

Fix compile failure on RTX 4090 #2076

Conversation

JieFengWang commented Jan 3, 2024 • edited Loading

copy-pr-bot bot commented Jan 3, 2024

cjnolet commented Jan 3, 2024

JieFengWang commented Jan 3, 2024

wphicks commented Jan 3, 2024

wphicks left a comment

Choose a reason for hiding this comment

cjnolet left a comment • edited Loading

Choose a reason for hiding this comment

cjnolet commented Jan 3, 2024

JieFengWang commented Jan 4, 2024 • edited Loading

JieFengWang commented Jan 4, 2024

wphicks commented Jan 4, 2024

cjnolet commented Jan 4, 2024 • edited Loading

JieFengWang commented Jan 5, 2024

cjnolet commented Jan 5, 2024 • edited Loading

JieFengWang commented Jan 5, 2024

cjnolet commented Jan 5, 2024

JieFengWang commented Jan 5, 2024 • edited Loading

cjnolet commented Jan 6, 2024 • edited Loading

JieFengWang commented Jan 10, 2024

cjnolet commented Jan 10, 2024

JieFengWang commented Jan 15, 2024

cjnolet commented Jan 16, 2024

cjnolet commented Jan 16, 2024

cjnolet commented Jan 17, 2024

cjnolet commented Jan 17, 2024

cjnolet commented Jan 17, 2024

JieFengWang commented Jan 3, 2024 •

edited

Loading

cjnolet left a comment •

edited

Loading

JieFengWang commented Jan 4, 2024 •

edited

Loading

cjnolet commented Jan 4, 2024 •

edited

Loading

cjnolet commented Jan 5, 2024 •

edited

Loading

JieFengWang commented Jan 5, 2024 •

edited

Loading

cjnolet commented Jan 6, 2024 •

edited

Loading