Draft: Factor build/link/use of VecGeom/CUDA using libraries for simplicity #279

drbenmorgan · 2024-02-29T15:43:50Z

Due to VecGeom's legacy CUDA design and the exposure of device functions in public APIs of libraries, ensuring correct singular device code links in final executables is, and continues to be, awkward. This largely means reproducing VecGeom's pattern of shared/static/device linked libraries downstream so that usage can be appropriately tracked and the correct linking done at each stage. A suitable CMake module that packages all of the needed functionality is available from the Celeritas project where it's also been tested across a range of use cases and is therefore now pretty robust (as it can be).

This PR imports this CMake module into AdePT (it's a completely standalone file) and ports the build, link and use of AdePT to use it. The primary changes are:

Provide a FindVecGeom wrapper module to add RDC properties to VecGeom targets (to possibly be upstreamed in VecGeom)
Replace target_XXX calls with cuda_rdc_XXX equivalents
Remove obsolete AdePT_cuda and AdePT_standalone libraries that CudaRdcUtils build and handles automatically locally and in downstream use.
Install FindVecGeom and CudaRdcUtils for use by downstream clients

I've tested this locally on a Rocky 9/GCC 11/Cuda 12 system with an underlying VecGeom install with static libs and CUDA enabled. The build and tests all run fine and without problem. I've also been able to take one of the examples, recast it into a standalone project and build and run it against an install of AdePT. However, CI and wider use may show up additional bugs...

Pinging @pcanal and @sethrj for info and any additional comments they want to make.

phsft-bot · 2024-02-29T15:46:01Z

Can one of the admins verify this patch?

pcanal · 2024-02-29T15:54:11Z

For simplicity of future update to CudaRdcUtils.cmake, the git commit log should indicate which commit from either the Celeritas repository or the CudaRdc repository the file was copied from.

pcanal · 2024-02-29T16:06:34Z

test/CMakeLists.txt

@@ -8,8 +8,8 @@ macro(build_tests TESTS)
  foreach(TEST ${TESTS})
    get_filename_component(TARGET_NAME ${TEST} NAME_WE)
    add_executable(${TARGET_NAME} ${TEST})
-    set_target_properties(${TARGET_NAME} PROPERTIES CUDA_SEPARABLE_COMPILATION ON)
-    target_link_libraries(${TARGET_NAME} AdePT_cuda AdePT_G4_integration AdePT)
+    # NB: We do not use PRIVATE for executables due to an apparent limitation in CudaRDCUtils


What was the symptoms/issues?

The output from the cmake step if adding PRIVATE here:

CMake Error at cmake/CudaRdcUtils.cmake:850 (target_link_libraries): The keyword signature for target_link_libraries has already been used with the target "test_atomic". All uses of target_link_libraries with a target must be either all-keyword or all-plain. The uses of the keyword signature are here: * cmake/CudaRdcUtils.cmake:738 (target_link_libraries) Call Stack (most recent call first): test/CMakeLists.txt:12 (cuda_rdc_target_link_libraries) test/CMakeLists.txt:76 (build_tests)

The target_link_libraries at 850 is "bare" and could probably use a direct PRIVATE though I do recall there are use cases for PUBLIC linking of executables (plugins?) if only for CPU.

Is there an actual difference on the link line of the executable between the 2 settings? (I naively thought it would only be a setting for dependency propagation in CMake and thus 'useless' for executable).

I've never observed a difference, though I wondered if there's some connection to symbol visibility/export. We can probably safely ignore any difference for now given the scope in which the RDC linking is required.

JuanGonzalezCaminero · 2024-02-29T16:09:44Z

README.md

-                          <TARGET INCLUDE DIRECTORIES>
-                          ${AdePT_INCLUDE_DIRS})
+```cmake
+include(CudaRdcUtils)


Could this be done transparently in AdePTConfig.cmake.in?

I'd prefer to enforce an explicit include for now for clarity in downstream use It also matches the Celeritas pattern of use. We can of course review (in both) in future.

agheata · 2024-03-01T14:37:32Z

@drbenmorgan , both me an @JuanGonzalezCaminero are getting locally linking errors with this branch. I have the VecGeom master and cmake 3.28.3

[ 47%] Linking CXX executable ../BuildProducts/bin/test_atomic
/usr/bin/ld: ../BuildProducts/lib/libAdePT.so: undefined reference to `AdePTGeant4Integration::ReturnTracks(std::vector<adeptint::TrackData, std::allocator<adeptint::TrackData> >*, int)'
/usr/bin/ld: ../BuildProducts/lib/libAdePT.so: undefined reference to `AdePTGeant4Integration::ProcessGPUHits(HostScoring&, HostScoring::Stats&)'
/usr/bin/ld: ../BuildProducts/lib/libAdePT.so: undefined reference to `AdePTGeant4Integration::InitScoringData(adeptint::VolAuxData*)'
/usr/bin/ld: ../BuildProducts/lib/libAdePT.so: undefined reference to `AdePTGeant4Integration::GetUniformFieldZ()'
/usr/bin/ld: ../BuildProducts/lib/libAdePT.so: undefined reference to `AdePTGeant4Integration::InitVolAuxData(adeptint::VolAuxData*, G4HepEmState*, bool, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*)'
/usr/bin/ld: ../BuildProducts/lib/libAdePT.so: undefined reference to `AdePTGeant4Integration::CheckGeometry(G4HepEmState*)'

However, I checked and the symbols are defined in libAdePT_G4_integration.so

0000000000071410 T AdePTGeant4Integration::ReturnTracks(std::vector<adeptint::TrackData, std::allocator<adeptint::TrackData> >*, int)

The error is the same if I change manually the link order of libAdePT.so and libAdePT_G4_integration.so
Any idea?

agheata · 2024-03-01T14:48:57Z

also getting when linking example1:

make[1]: *** [CMakeFiles/Makefile2:1376: test/CMakeFiles/test_atomic.dir/all] Error 2
/usr/bin/ld: /home/agheata/geant_src/VecGeom_install/lib/libvecgeom.a(UnplacedCone.cpp.o): undefined reference to symbol '_ZNK7vecgeom3cxx9DevicePtrINS_4cuda13SUnplacedConeINS2_9ConeTypes13UniversalConeEEEE9ConstructIJdddddddEEEvDpT_'
/usr/bin/ld: /home/agheata/geant_src/VecGeom_install/lib/libvecgeomcuda.so: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
make[2]: *** [examples/Example1/CMakeFiles/example1.dir/build.make:392: BuildProducts/bin/example1] Error 1

drbenmorgan · 2024-03-01T14:49:45Z

@agheata, I've been chatting to @JuanGonzalezCaminero on Mattermost about this - I'm in the process of rebuilding my local stack with the newer CUDA and CMake that he has to see if I can reproduce. What I am now spotting is there is a undeclared (to CMake) circular dependence between libAdePT and libAdePT_G4_integration....

drbenmorgan · 2024-03-01T14:51:31Z

@pcanal may have insight on the

/usr/bin/ld: /home/agheata/geant_src/VecGeom_install/lib/libvecgeomcuda.so: error adding symbols: DSO missing from command line

error. I believe was seen as well in Celeritas and patched?

drbenmorgan · 2024-03-01T15:38:17Z

One other thing to check: what is the full list of Vecgeom libraries in your installs (e.g. under /home/agheata/geant_src/VecGeom_install/lib)? I've tried locally with both static and shared builds and am simply not able to reproduce. Could you also let me know the versions of VecGeom and Geant4 (down to commit) you are building against, and the exact set of build options used to configure and install them please?

There's some weird difference or specific configuration setup that's causing these problems for you but not me or the CI system.

pcanal · 2024-03-01T23:08:01Z

@pcanal may have insight on the
/usr/bin/ld: /home/agheata/geant_src/VecGeom_install/lib/libvecgeomcuda.so: error adding symbols: DSO missing from command line
error. I believe was seen as well in Celeritas and patched?

Yes we have seem similar but slightly different. The fix was celeritas-project/celeritas@03b449a (#1102). A description of the problem can be found at: https://zhangboyi.gitlab.io/post/2020-09-14-resolve-dso-missing-from-command-line-error/

The bottom-line is that you might need to add explicitly more library to the link library list or .... something is messed up in the installation (eg. installation of both shared and static version of VecGeom in the same area or .... )

agheata · 2024-03-04T08:17:01Z

there is a undeclared (to CMake) circular dependence between libAdePT and libAdePT_G4_integration....

Indeed, we are looking now into how to break the dependency of lib AdePT on libAdePT_G4_integration

agheata · 2024-03-04T08:45:14Z

One other thing to check: what is the full list of Vecgeom libraries in your installs (e.g. under /home/agheata/geant_src/VecGeom_install/lib)?

libvecgeom.a libvecgeomcuda.so libvecgeomcuda_static.a libvgdml.a

Could you also let me know the versions of VecGeom and Geant4 (down to commit) you are building against, and the exact set of build options used to configure and install them please?
* 89ec50d4 (HEAD -> master, origin/master, origin/HEAD) VECGEOM-618 : Added new definition of SafetyToOut to fix SafetyToOut issue as mentioned in JIRA issue-618 [Raman Sehgal]

cmake  $GEANT_ROOT/VecGeom \
         -DCMAKE_INSTALL_PREFIX="$INSTALL_FOLDER" \
         -DCMAKE_PREFIX_PATH="$VECCORE_ROOT" \
         -DVECGEOM_GDML=ON \
         -DVECGEOM_ROOT=OFF \
         -DVECGEOM_GEANT4=OFF \
         -DVECGEOM_ENABLE_CUDA=ON \
         -DVECGEOM_BACKEND=Scalar \
         -DVECGEOM_NO_SPECIALIZATION=ON \
         -DVECGEOM_VECTOR=avx2 \
         -DVECGEOM_TEST_BENCHMARK=ON \
         -DVECGEOM_TEST_COVERAGE=OFF \
         -DCMAKE_BUILD_TYPE=$BUILDMODE \
         -DCMAKE_CUDA_ARCHITECTURES=75 \
         -DUSE_CACHED_TRANSFORMATIONS=ON \
         -DCMAKE_CXX_STANDARD=17 \
         -DCMAKE_CUDA_STANDARD=17 \
         -DEXAMPLES=ON

and for AdePT:
Geant4 v11.2.0

cmake -S. -B./adept-build/${VGMODE} \
      -DCMAKE_INSTALL_PREFIX="$INSTALL_FOLDER" \
      -DCMAKE_BUILD_TYPE=$BUILDMODE \
      -DCMAKE_PREFIX_PATH="$VECCORE_ROOT;$VECGEOMROOT/lib/cmake/VecGeom;$HEPMC3_ROOT;$G4INSTALL;$G4HepEm_INSTALL" \
      -DCMAKE_CUDA_ARCHITECTURES=75 \
      -DCMAKE_EXPORT_COMPILE_COMMANDS=ON \
      -DADEPT_USE_SURF=OFF \
      -DBENCHMARK=ON \
      -DBUILD_TESTING=ON

drbenmorgan · 2024-03-05T11:01:23Z

Thanks for the info @agheata, I think the crucial thing is that we have the same VecGeom library structure here, i.e. only a static libvecgeom. Let's take one step back to the error you reported with the build of test_atomic:

> [ 47%] Linking CXX executable ../BuildProducts/bin/test_atomic
> /usr/bin/ld: ../BuildProducts/lib/libAdePT.so: undefined reference to `AdePTGeant4Integration::ReturnTracks(std::vector<adeptint::TrackData, std::allocator<adeptint::TrackData> >*, int)'
> /usr/bin/ld: ../BuildProducts/lib/libAdePT.so: undefined reference to `AdePTGeant4Integration::ProcessGPUHits(HostScoring&, HostScoring::Stats&)'
> /usr/bin/ld: ../BuildProducts/lib/libAdePT.so: undefined reference to `AdePTGeant4Integration::InitScoringData(adeptint::VolAuxData*)'
> /usr/bin/ld: ../BuildProducts/lib/libAdePT.so: undefined reference to `AdePTGeant4Integration::GetUniformFieldZ()'
> /usr/bin/ld: ../BuildProducts/lib/libAdePT.so: undefined reference to `AdePTGeant4Integration::InitVolAuxData(adeptint::VolAuxData*, G4HepEmState*, bool, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*)'
> /usr/bin/ld: ../BuildProducts/lib/libAdePT.so: undefined reference to `AdePTGeant4Integration::CheckGeometry(G4HepEmState*)'

Could you run:

$ make clean && make VERBOSE=1 test_atomic

and post the compile/link commands for the source file and end executable (should be the last two) please? For comparison, with my local, working build, I get (paths reduced for clarity):

[13/14] /.../nvcc -forward-unknown-to-host-compiler -O2 -g -DNDEBUG "--generate-code=arch=compute_86,code=[compute_86,sm_86]" /.../BuildProducts/lib64/libAdePT_static.a /.../lib64/libvecgeomcuda_static.a -Xnvlink --suppress-stack-size-warning -Xcompiler=-fPIC -Wno-deprecated-gpu-targets -shared -dlink test/CMakeFiles/test_atomic.dir/test_atomic.cu.o -o CMakeFiles/test_atomic.dir/cmake_device_link.o -L/.../lib/stubs  -L/...cuda.../lib /.../libvgdml.a /.../libvecgeom.a -lrt -lpthread -ldl  -lcudadevrt -lcudart

[14/14] : && /usr/bin/c++ -O2 -g -DNDEBUG  test/CMakeFiles/test_atomic.dir/test_atomic.cu.o CMakeFiles/test_atomic.dir/cmake_device_link.o -o BuildProducts/bin/test_atomic -L/...cuda.../stubs   -L/...cuda.../lib -Wl,-rpath,/.../  BuildProducts/lib64/libAdePT_G4_integration.so  BuildProducts/lib64/libAdePT.so  /.../libvecgeomcuda.so  /.../libvgdml.a  /.../libvecgeom.a  -lrt  -lpthread  -ldl  /...geant4, g4hepemlibs.../  -lcudadevrt  -lcudart

JuanGonzalezCaminero · 2024-03-05T11:21:21Z

This is what I get for test_atomic @drbenmorgan

[100%] Linking CUDA device code CMakeFiles/test_atomic.dir/cmake_device_link.o
cd /home/gjuan/Dev/G4/AdePT_Master/build/test && /usr/bin/cmake -E cmake_link_script CMakeFiles/test_atomic.dir/dlink.txt --verbose=1
/usr/local/cuda-12.3/bin/nvcc -forward-unknown-to-host-compiler -O3 -DNDEBUG "--generate-code=arch=compute_89,code=[compute_89,sm_89]" /.../libAdePT_static.a /.../libvecgeomcuda_static.a -Xnvlink --suppress-stack-size-warning -Xcompiler=-fPIC -Wno-deprecated-gpu-targets -shared -dlink --options-file CMakeFiles/test_atomic.dir/deviceObjects1.rsp -o CMakeFiles/test_atomic.dir/cmake_device_link.o --options-file CMakeFiles/test_atomic.dir/deviceLinkLibs.rsp

[100%] Linking CXX executable ../BuildProducts/bin/test_atomic
cd /home/gjuan/Dev/G4/AdePT_Master/build/test && /usr/bin/cmake -E cmake_link_script CMakeFiles/test_atomic.dir/link.txt --verbose=1
/usr/bin/c++ -O3 -DNDEBUG CMakeFiles/test_atomic.dir/test_atomic.cu.o CMakeFiles/test_atomic.dir/cmake_device_link.o -o ../BuildProducts/bin/test_atomic   -L/usr/local/cuda-12.3/targets/x86_64-linux/lib/stubs  -L/usr/local/cuda-12.3/targets/x86_64-linux/lib  -L/usr/lib/gcc/x86_64-linux-gnu/12  -Wl,-rpath,/.../ ../BuildProducts/lib/libAdePT_G4_integration.so ../BuildProducts/lib/libAdePT.so /.../libvecgeomcuda.so /.../libvgdml.a /.../libvecgeom.a -lrt -lpthread -ldl /.../libG4Tree.so /.../libG4FR.so /.../libG4GMocren.so /.../libG4visHepRep.so /.../libG4RayTracer.so /.../libG4VRML.so /.../libG4ToolsSG.so /.../libG4vis_management.so /.../libG4modeling.so /.../libG4interfaces.so /.../libG4persistency.so /usr/lib/x86_64-linux-gnu/libxerces-c.so /.../libG4error_propagation.so /.../libG4readout.so /.../libG4physicslists.so /.../libG4run.so /.../libG4parmodels.so /.../libg4HepEm.so /.../libG4event.so /.../libG4tracking.so /.../libg4HepEmInit.so /.../libG4processes.so /.../libG4analysis.so /.../libG4digits_hits.so /.../libG4track.so /.../libG4geometry.so /.../libG4graphics_reps.so /.../libG4expat.so /.../libG4particles.so /.../libG4materials.so /.../libG4intercoms.so /.../libG4global.so /.../libG4ptl.so.2.3.3 /.../libG4zlib.so /.../libg4HepEmRun.so /.../libG4clhep.so /.../libg4HepEmData.so -lcudadevrt -lcudart

drbenmorgan · 2024-03-05T11:57:01Z

Thanks @JuanGonzalezCaminero, is that failing with the same error as @agheata reported earlier? The only difference I can spot is the use of response files for the device link step. Could you check/post the contents of the files:

test/CMakeFiles/test_atomic.dir/deviceObjects1.rsp
test/CMakeFiles/test_atomic.dir/deviceLinkLibs.rsp

please?

edit: clarified the path to these files from the build directory.

JuanGonzalezCaminero · 2024-03-05T12:36:49Z

Yes, we were having the same errors, the files have:

deviceObjects1.rsp:

CMakeFiles/test_atomic.dir/test_atomic.cu.o

deviceLinkLibs.rsp

-L"/usr/local/cuda-12.3/targets/x86_64-linux/lib/stubs"  -L"/usr/local/cuda-12.3/targets/x86_64-linux/lib"  -L"/usr/lib/gcc/x86_64-linux-gnu/12"  /home/gjuan/Dev/G4/vecgeom_installation/lib/libvgdml.a /home/gjuan/Dev/G4/vecgeom_installation/lib/libvecgeom.a -lrt -lpthread -ldl  -lcudadevrt -lcudart

JuanGonzalezCaminero · 2024-03-05T12:46:30Z

Indeed I'm able to build after the last commit @drbenmorgan

drbenmorgan · 2024-03-05T12:48:02Z

O.k., I have absolutely no idea why things are failing on these specific machines, so the latest commit is a shot in the dark. All it does is add a dummy .cu file to AdePT_G4_integration. This forces it to be treated as an RDC library and should then appear on the device link line for test_atomic. This might resolve the missing symbol errors on the CERN machines, but if not then....

drbenmorgan · 2024-03-05T12:55:40Z

Thanks for the update @JuanGonzalezCaminero, that's something of a relief! If you could also check that your separate "findpackage_test" example building against an install also works, that'd cover that base as well.

All I can really imagine is that this is something to do with GCC - could you and @agheata double check and confirm which Linux, gcc and associated toolchain is in use please (i.e. gcc, gcc-ar gcc-ranlib etc)?

JuanGonzalezCaminero · 2024-03-05T13:15:33Z

Ah, I wasn't building the examples, I confirm the commit solved the linking issue with AdePT_G4_Integration, however the vecgeom DSO missing from command line error still appears, so I can't test the external linking yet.

drbenmorgan · 2024-03-05T13:42:31Z

Right, same exercise as before, so could you post the full link line for example1 please? It'll be the last one if you run

$ make clean && make VERBOSE=1 example1

JuanGonzalezCaminero · 2024-03-05T15:10:50Z

This is what I get:

cd /home/gjuan/Dev/G4/AdePT_Master/build/examples/Example1 && /usr/bin/cmake -E cmake_link_script CMakeFiles/example1.dir/link.txt --verbose=1
/usr/bin/c++ -O3 -DNDEBUG CMakeFiles/example1.dir/example1.cpp.o CMakeFiles/example1.dir/__/common/src/ParticleGun.cc.o CMakeFiles/example1.dir/__/common/src/ParticleGunMessenger.cc.o CMakeFiles/example1.dir/__/common/src/FTFP_BERT_HepEm.cc.o CMakeFiles/example1.dir/__/common/src/FTFP_BERT_AdePT.cc.o CMakeFiles/example1.dir/src/ActionInitialisation.cc.o CMakeFiles/example1.dir/src/DetectorConstruction.cc.o CMakeFiles/example1.dir/src/DetectorMessenger.cc.o CMakeFiles/example1.dir/src/EventAction.cc.o CMakeFiles/example1.dir/src/EventActionMessenger.cc.o CMakeFiles/example1.dir/src/SteppingAction.cc.o CMakeFiles/example1.dir/src/SimpleHit.cc.o CMakeFiles/example1.dir/src/PrimaryGeneratorAction.cc.o CMakeFiles/example1.dir/src/RunAction.cc.o CMakeFiles/example1.dir/src/SensitiveDetector.cc.o CMakeFiles/example1.dir/src/TrackingAction.cc.o CMakeFiles/example1.dir/__/common/src/HepMC3G4AsciiReader.cc.o CMakeFiles/example1.dir/__/common/src/HepMC3G4AsciiReaderMessenger.cc.o CMakeFiles/example1.dir/__/common/src/HepMC3G4Interface.cc.o -o ../../BuildProducts/bin/example1  -Wl,-rpath,/.../ /.../HepMC3libs ../../BuildProducts/lib/libAdePT_G4_integration_final.so ../../BuildProducts/lib/libAdePT_G4_integration.so ../../BuildProducts/lib/libAdePT.so /.../libvecgeomcuda.so /.../libvgdml.a /.../libvecgeom.a -lrt -lpthread -ldl /.../Geant4libs /.../G4HepEMlibs

drbenmorgan · 2024-03-05T16:37:49Z

Which looks identical to mine... One last thing to try - could you modify the link command in examples/Example1/CMakeLists.txt to:

cuda_rdc_target_link_libraries(example1
  PRIVATE
    AdePT_G4_integration
    AdePT
    VecGeom::vecgeom
    ${HEPMC3_LIBRARIES} 
    ${HEPMC3_FIO_LIBRARIES}
)

Note the extra VecGeom link. This should fully declare that as needed in the link rather than relying on transitive inclusion. If that still fails, could you also:

Wipe that build directory and start from scratch (I have seen occasional caching issues with RDC builds)
If it fails again, post the link line now used.

drbenmorgan · 2024-03-06T12:11:53Z

As noted in #280, we should wait on the fix to the circular dependency presented there to go through with the existing link structure. I can then rebase here so we can at least decouple these issues (if they are not already orthogonal).

As I commented in #280:

FWIW, I can build the changes here without issue on my local system (Alma 9) with either GCC 11.4 or 12.2 and with CUDA > 12.0 or 12.3, plus a standard VecGeom CUDA enabled install.

and also the CI machine is passing. @agheata, @JuanGonzalezCaminero, which machines are you using, and are they accessible for CERN account holders at all? Are you able to reproduce the errors on lxplus-gpu? We can discuss further on Mattermost as required, but there's something very specific to your setups that's exposing this problem.

One additional cross-check to try (if @pcanal and @sethrj don't mind):

Could @agheata and @JuanGonzalezCaminero try building the current Celeritas develop branch on the systems in question using the Geant4/VecGeom/etc setup as for AdePT
Could @pcanal and @sethrj try building this PR in their Celeritas dev environment(s)

This should expose any fundamental system problems.

JuanGonzalezCaminero · 2024-03-06T13:24:22Z

@drbenmorgan Building the current Celeritas develop branch with the following configuration:

CXX=/usr/bin/gcc-10 cmake -S. -B./build 
    -DCMAKE_INSTALL_PREFIX="../celeritas_installation" 
    -DCMAKE_PREFIX_PATH="../geant4_installation;../vecgeom_installation" 
    -DCMAKE_CUDA_ARCHITECTURES=89

I get the following when it tries to link orange-update

[ 65%] Linking CXX executable ../bin/orange-update
/usr/bin/ld: CMakeFiles/orange-update.dir/orange-update.cc.o: undefined reference to symbol '_ZNKSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE7compareEPKc@@GLIBCXX_3.4.21'
/usr/bin/ld: /lib/x86_64-linux-gnu/libstdc++.so.6: error adding symbols: DSO missing from command line

But in this case it seems unrelated. It seems I can't compile with gcc 12.3, the compiler I used instead is gcc 10.5

sethrj · 2024-03-06T14:21:09Z

@JuanGonzalezCaminero that last error message looks like you're mixing incompatible compilers...

drbenmorgan · 2024-03-06T15:09:43Z

Thanks @sethrj, much appreciated 🙇! That was with GCC8, CUDA 11.5 (which OS?)?

sethrj · 2024-03-06T15:11:39Z

linux-rhel8-cascadelake, gcc 8.5.0, CUDA 11.5.

drbenmorgan · 2024-03-06T15:25:25Z

O.k., on my setup (linux-rocky9-cascadelake, gcc 11.4.1, CUDA 12) the same spack env builds both this PR and celeritas develop perfectly (so even Celeritas with Vecgeom 2.0.0-rc2, which is this tag/tarball

pcanal · 2024-03-06T20:28:09Z

CMakeLists.txt

+  src/AdePTPhysics.cc
+  src/HepEMPhysics.cc
+  src/AdePTGeant4Integration.cpp
+  src/adept_g4_dummy.cu


The cuda_rdc_ function are meant to add this when needed. The fact that it isn't means that the detection mechanism in the cuda_rdc_ function that the library and/or executable is depending on more than one independent CUDA rdc library failed (this could indeed be related to the circular depenendy).

JuanGonzalezCaminero · 2024-03-15T10:49:38Z

@drbenmorgan We just merged the removal of circular dependencies. Could this PR be rebased onto master?

- Taken as of commit fb03216c0cd701c6aff0eb901370dad19e178503 - celeritas-project/celeritas@fb03216

Due to VecGeom's legacy CUDA design and the exposure of device functions in public APIs of libraries, ensuring correct singular device links in final executables is awkward. The Celeritas project has evolved the build/link pattern used in VecGeom into the "CudaRdcUtils" CMake module that enables projects using VecGeom (or building libraries that expose device APIs) to be correctly integrated and used by downstream clients. This is acheived through the use of custom CMake properties and wrappers around the fundamental target_XXX interfaces CMake provides, making use as transparent as it can be given the problem faced. Use CudaRdcUtils to build, link, install, and use AdePT - Provide FindVecGeom wrapper module to add RDC properties to VecGeom targets - Replace target_XXX calls with cuda_rdc_XXX equivalents - Remove obsolete "AdePT_cuda" and "AdePT_standalone" libraries that CudaRdcUtils build and handles automatically - Install and use FindVecGeom and CudaRdcUtils for use by downstream clients Tests and executables build and run successfully on a local Rocky 9, GCC11, CUDA 12 environment.

Link errors observed on CERN machines report missing symbols that are in AdePT_G4_integration, and with this library on the link line. The cause of this is not totally clear given success on other platforms, though it is known that there is a circular dependency between the AdePT and AdePT_G4_integration layer that might be an underlying issue. To attempt resolution/diagnosis, add a dummy .cu file to AdePT_G4_integration. This forces it to be built as an RDC library and thus onto the device link line alongside AdePT, potnetially resolving the device link errors.

drbenmorgan · 2024-03-18T12:20:34Z

Latest commits are a very quick and dirty rebase on the latest master, with fixups for the change to header only etc. It works locally for me modulo some device code warnings on flags and that the testField test doesn't pass. However, I want to see what the CI shows first.

drbenmorgan · 2024-03-18T12:46:22Z

O.k., CI looks good, so @JuanGonzalezCaminero, @agheata could you try building again and see if things are now resolved on your side please?

JuanGonzalezCaminero · 2024-03-18T14:06:27Z

Still getting one error here:

[100%] Linking CXX executable ../../BuildProducts/bin/example1
/usr/bin/ld: ../../BuildProducts/lib/libAdePT_G4_integration.so: undefined reference to `__cudaRegisterLinkedBinary_5ad212d0_23_AdePTTrackingManager_cu_6f198e90_884112'

Simplified link line:

/usr/bin/c++ -O3 -DNDEBUG CMakeFiles/example1.dir/example1.cpp.o CMakeFiles/example1.dir/__/common/src/ParticleGun.cc.o CMakeFiles/example1.dir/__/common/src/ParticleGunMessenger.cc.o CMakeFiles/example1.dir/__/common/src/FTFP_BERT_HepEm.cc.o CMakeFiles/example1.dir/__/common/src/FTFP_BERT_AdePT.cc.o CMakeFiles/example1.dir/src/ActionInitialisation.cc.o CMakeFiles/example1.dir/src/DetectorConstruction.cc.o CMakeFiles/example1.dir/src/DetectorMessenger.cc.o CMakeFiles/example1.dir/src/EventAction.cc.o CMakeFiles/example1.dir/src/EventActionMessenger.cc.o CMakeFiles/example1.dir/src/SteppingAction.cc.o CMakeFiles/example1.dir/src/SimpleHit.cc.o CMakeFiles/example1.dir/src/PrimaryGeneratorAction.cc.o CMakeFiles/example1.dir/src/RunAction.cc.o CMakeFiles/example1.dir/src/SensitiveDetector.cc.o CMakeFiles/example1.dir/src/TrackingAction.cc.o CMakeFiles/example1.dir/__/common/src/HepMC3G4AsciiReader.cc.o CMakeFiles/example1.dir/__/common/src/HepMC3G4AsciiReaderMessenger.cc.o CMakeFiles/example1.dir/__/common/src/HepMC3G4Interface.cc.o -o ../../BuildProducts/bin/example1  -Wl,-rpath,/.../ HepMC3libs ../../BuildProducts/lib/libAdePT_G4_integration_final.so ../../BuildProducts/lib/libAdePT_G4_integration.so VecGeomlibs -lrt -lpthread -ldl XercesC G4libs G4HepEMlibs

And the same line on Master:

/usr/bin/c++ -O3 -DNDEBUG CMakeFiles/example1.dir/example1.cpp.o CMakeFiles/example1.dir/__/common/src/ParticleGun.cc.o CMakeFiles/example1.dir/__/common/src/ParticleGunMessenger.cc.o CMakeFiles/example1.dir/__/common/src/FTFP_BERT_HepEm.cc.o CMakeFiles/example1.dir/__/common/src/FTFP_BERT_AdePT.cc.o CMakeFiles/example1.dir/src/ActionInitialisation.cc.o CMakeFiles/example1.dir/src/DetectorConstruction.cc.o CMakeFiles/example1.dir/src/DetectorMessenger.cc.o CMakeFiles/example1.dir/src/EventAction.cc.o CMakeFiles/example1.dir/src/EventActionMessenger.cc.o CMakeFiles/example1.dir/src/SteppingAction.cc.o CMakeFiles/example1.dir/src/SimpleHit.cc.o CMakeFiles/example1.dir/src/PrimaryGeneratorAction.cc.o CMakeFiles/example1.dir/src/RunAction.cc.o CMakeFiles/example1.dir/src/SensitiveDetector.cc.o CMakeFiles/example1.dir/src/TrackingAction.cc.o CMakeFiles/example1.dir/__/common/src/HepMC3G4AsciiReader.cc.o CMakeFiles/example1.dir/__/common/src/HepMC3G4AsciiReaderMessenger.cc.o CMakeFiles/example1.dir/__/common/src/HepMC3G4Interface.cc.o -o ../../BuildProducts/bin/example1   -L/usr/local/cuda-12.3/targets/x86_64-linux/lib/stubs  -L/usr/local/cuda-12.3/targets/x86_64-linux/lib  -L/usr/lib/gcc/x86_64-linux-gnu/12  -Wl,-rpath,/.../ ../../BuildProducts/lib/libAdePT_G4_integration.so HepMC3libs VecGeomlibs -lrt -lpthread -ldl XercesC G4libs G4HepEMlibs -lcudadevrt -lcudart_static -lrt -lpthread -ldl

drbenmorgan · 2024-03-18T14:26:42Z

Argh... Could you try adding this line:

target_link_options(example1 PRIVATE "-Wl,--no-as-needed")

at line 64 of examples/example1/CMakeLists.txt please? Are you running on an Ubuntu machine by any chance?

JuanGonzalezCaminero · 2024-03-18T14:50:11Z

That line solves it indeed. I'm running on Debian

sethrj · 2024-03-18T15:06:39Z

Yep, I found out quite some time ago that VecGeom requires this on Ubuntu systems: see https://wiki.gentoo.org/wiki/Project:Quality_Assurance/-Wl,-z,defs_and_-Wl,--no-allow-shlib-undefined .

drbenmorgan · 2024-03-18T15:27:28Z

Thanks both! @JuanGonzalezCaminero, could you try modifying the line to:

target_link_options(example1 PRIVATE "-Wl,-z,defs")

and see what it prints please? Not a fix, but useful to see what it produces. I'll have a think about how to fix this for installs, but in the meantime I think we can use a project wide no-as-needed for our own builds.

JuanGonzalezCaminero · 2024-03-18T15:40:52Z

As far as I can tell I get the same error and link line when using those options. Is there something specific I should look for?

drbenmorgan · 2024-03-18T15:46:58Z

If I understand the option correctly, it should report specifics on what's missing, though perhaps it needs to be applied to the libraries itself? Anyway, I'm now able to reproduce the error locally by adding the --as-needed flag to my builds so can at leats check fixes locally (which I'll do).

agheata · 2024-03-20T09:14:37Z

@drbenmorgan it also works for me when adding target_link_options(integrationBenchmark PRIVATE "-Wl,--no-as-needed"), do you plan to add it globally for now in this PR? Or should we use -Wl,-z,defs during the creation of the libAdePT_G4_integration libs to force the resolution of all dependencies before the examples are built?

drbenmorgan · 2024-03-20T16:00:56Z

IIUC, -Wl,-z,defs is only for library developers to warn, rather than fix things (but I need to explore this more). The idea for now would be to use --no-as-needed globally in AdePT, and then document that this may be needed by clients to resolve things.

As @WitekPokorski reported that the current AdePT libraries had been successfully linked in Gaussino, I'm first going to go back to the celer-adept joint integration layer and see what happens there when fully linking in and using AdePT before deciding how to progress this PR (I also want to go back and update g4vg and its use as I think that's a higher priority?).

drbenmorgan · 2024-05-01T17:16:49Z

Closing as superseded by #289

drbenmorgan requested review from agheata and JuanGonzalezCaminero February 29, 2024 15:43

drbenmorgan force-pushed the rdc-linking branch from 6b1369e to 24098bb Compare February 29, 2024 15:59

pcanal reviewed Feb 29, 2024

View reviewed changes

JuanGonzalezCaminero reviewed Feb 29, 2024

View reviewed changes

agheata mentioned this pull request Mar 6, 2024

Rdc no circdeps #280

Closed

pcanal reviewed Mar 6, 2024

View reviewed changes

drbenmorgan added 8 commits March 18, 2024 11:41

Import CudaRdcUtils.cmake from Celeritas

0b918fc

- Taken as of commit fb03216c0cd701c6aff0eb901370dad19e178503 - celeritas-project/celeritas@fb03216

Import FindVecGeom.cmake wrapper from Celeritas

0375e49

Include CudaRdcUtils module

c934d41

Add MIT License for imported modules

3aeff97

Add SPDX licence info

25f38ab

Fixup target names following rebase

f08d53a

drbenmorgan force-pushed the rdc-linking branch from 726e762 to f08d53a Compare March 18, 2024 12:18

drbenmorgan changed the title ~~Factor build/link/use of VecGeom/CUDA using libraries for simplicity~~ Draft: Factor build/link/use of VecGeom/CUDA using libraries for simplicity Mar 20, 2024

drbenmorgan marked this pull request as draft March 20, 2024 16:24

drbenmorgan mentioned this pull request May 1, 2024

Minimal implementation of in-memory Geant4->VecGeom conversion with G4VG #289

Closed

drbenmorgan closed this May 1, 2024

Draft: Factor build/link/use of VecGeom/CUDA using libraries for simplicity #279

Draft: Factor build/link/use of VecGeom/CUDA using libraries for simplicity #279

Conversation

drbenmorgan commented Feb 29, 2024

phsft-bot commented Feb 29, 2024

pcanal commented Feb 29, 2024

pcanal Feb 29, 2024

Choose a reason for hiding this comment

drbenmorgan Feb 29, 2024

Choose a reason for hiding this comment

pcanal Feb 29, 2024

Choose a reason for hiding this comment

drbenmorgan Feb 29, 2024

Choose a reason for hiding this comment

JuanGonzalezCaminero Feb 29, 2024

Choose a reason for hiding this comment

drbenmorgan Feb 29, 2024

Choose a reason for hiding this comment

agheata commented Mar 1, 2024

agheata commented Mar 1, 2024

drbenmorgan commented Mar 1, 2024

drbenmorgan commented Mar 1, 2024

drbenmorgan commented Mar 1, 2024

pcanal commented Mar 1, 2024 • edited Loading

agheata commented Mar 4, 2024

agheata commented Mar 4, 2024

drbenmorgan commented Mar 5, 2024

JuanGonzalezCaminero commented Mar 5, 2024 • edited Loading

drbenmorgan commented Mar 5, 2024 • edited Loading

JuanGonzalezCaminero commented Mar 5, 2024 • edited Loading

JuanGonzalezCaminero commented Mar 5, 2024

drbenmorgan commented Mar 5, 2024

drbenmorgan commented Mar 5, 2024

JuanGonzalezCaminero commented Mar 5, 2024

drbenmorgan commented Mar 5, 2024

JuanGonzalezCaminero commented Mar 5, 2024

drbenmorgan commented Mar 5, 2024

drbenmorgan commented Mar 6, 2024

JuanGonzalezCaminero commented Mar 6, 2024

sethrj commented Mar 6, 2024

drbenmorgan commented Mar 6, 2024

sethrj commented Mar 6, 2024

drbenmorgan commented Mar 6, 2024

pcanal Mar 6, 2024

Choose a reason for hiding this comment

JuanGonzalezCaminero commented Mar 15, 2024

drbenmorgan commented Mar 18, 2024

drbenmorgan commented Mar 18, 2024

JuanGonzalezCaminero commented Mar 18, 2024 • edited Loading

drbenmorgan commented Mar 18, 2024

JuanGonzalezCaminero commented Mar 18, 2024

sethrj commented Mar 18, 2024

drbenmorgan commented Mar 18, 2024

JuanGonzalezCaminero commented Mar 18, 2024

drbenmorgan commented Mar 18, 2024

agheata commented Mar 20, 2024

drbenmorgan commented Mar 20, 2024

drbenmorgan commented May 1, 2024

pcanal commented Mar 1, 2024 •

edited

Loading

JuanGonzalezCaminero commented Mar 5, 2024 •

edited

Loading

drbenmorgan commented Mar 5, 2024 •

edited

Loading

JuanGonzalezCaminero commented Mar 5, 2024 •

edited

Loading

JuanGonzalezCaminero commented Mar 18, 2024 •

edited

Loading