feat: Add TensorRT support for GNNs #4016
base: main
Conversation
Actionable comments posted: 7
🧹 Nitpick comments (4)
Plugins/ExaTrkX/src/TensorRTEdgeClassifier.cpp (1)
98-100: Use ACTS logging instead of `std::cout`, prefer you should.

For consistency within the codebase, replace `std::cout` with ACTS logging macros. Apply this diff to use the logging framework:

```diff
 ~TimePrinter() {
-  std::cout << name << ": " << milliseconds(t0, t1) << std::endl;
+  ACTS_INFO(name << ": " << milliseconds(t0, t1) << " ms");
 }
```

Plugins/ExaTrkX/include/Acts/Plugins/ExaTrkX/TensorRTEdgeClassifier.hpp (2)
38-41: Destructor to be marked `override`, consider you should.

Since the base class has a virtual destructor, marking the destructor in the derived class with `override` good practice it is. Apply this diff for clarity:

```diff
-  ~TensorRTEdgeClassifier();
+  ~TensorRTEdgeClassifier() override;
```
49-58: Member variables' initialization order, ensure you must.

Initialize member variables in the order they are declared to avoid warnings. Ensure that `m_cfg` is initialized before `m_trtLogger`, as declared.

Examples/Python/src/ExaTrkXTrackFinding.cpp (1)
110-128: Logger name, more specific make you should.

For clarity and consistency, use a distinct logger name for `TensorRTEdgeClassifier`. Apply this diff to specify the logger name:

```diff
 return std::make_shared<Alg>(
-    c, getDefaultLogger("EdgeClassifier", lvl));
+    c, getDefaultLogger("TensorRTEdgeClassifier", lvl));
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- .gitlab-ci.yml (1 hunks)
- Examples/Python/src/ExaTrkXTrackFinding.cpp (2 hunks)
- Plugins/ExaTrkX/CMakeLists.txt (1 hunks)
- Plugins/ExaTrkX/include/Acts/Plugins/ExaTrkX/TensorRTEdgeClassifier.hpp (1 hunks)
- Plugins/ExaTrkX/src/TensorRTEdgeClassifier.cpp (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (6)
- GitHub Check: CI Bridge / build_gnn_tensorrt
- GitHub Check: linux_physmon
- GitHub Check: linux_examples_test
- GitHub Check: missing_includes
- GitHub Check: linux_ubuntu_extra (ubuntu2204_clang, 20)
- GitHub Check: build_debug
🔇 Additional comments (3)
Examples/Python/src/ExaTrkXTrackFinding.cpp (1)
126-126: Missing configuration member `useEdgeFeatures`, verify you should.

Inconsistent the configuration is with other classifiers. Include `useEdgeFeatures` if required. Ensure that all necessary configuration options are included.
Plugins/ExaTrkX/CMakeLists.txt (2)
Line range hint 1-38: Well-structured, this CMake configuration is!

Follow consistent patterns for different backends, it does. Proper organization and clarity, I sense.
26-38: Version constraints for TensorRT, specify we must!

Hmmmm, missing version constraints for TensorRT package, I see. Dangerous this can be, yes. Compatibility issues, it may cause. Apply this change, you should:

```diff
-find_package(TensorRT REQUIRED)
+find_package(TensorRT 8.6 REQUIRED)
```
Actionable comments posted: 1
🧹 Nitpick comments (4)
cmake/FindTensorRT.cmake (4)
1-28: Hmmmm, missing minimum version requirements, I sense.

Document the minimum required TensorRT version for this module, you should. Help users avoid compatibility issues, it will.
30-34: Validate components list, we must.

Check for invalid component names in `TensorRT_FIND_COMPONENTS`, wise it would be. Prevent configuration errors early, this will.

```diff
 if(NOT TensorRT_FIND_COMPONENTS)
     set(TensorRT_FIND_COMPONENTS nvinfer nvinfer_plugin nvonnxparser nvparsers)
 endif()
+set(_valid_components nvinfer nvinfer_plugin nvonnxparser nvparsers)
+foreach(component IN LISTS TensorRT_FIND_COMPONENTS)
+    if(NOT component IN_LIST _valid_components)
+        message(FATAL_ERROR "Invalid TensorRT component specified: ${component}")
+    endif()
+endforeach()
 set(TensorRT_LIBRARIES)
```
44-53: More helpful error message, provide we should.

Include the searched paths in the error message, helpful it would be. Guide users to correct configuration, this will.

```diff
 if(TensorRT_FIND_REQUIRED)
     message(
         FATAL_ERROR
-        "Fail to find TensorRT, please set TensorRT_ROOT. Include path not found."
+        "Failed to find TensorRT header NvInfer.h. Searched in:\n"
+        "  - ${TensorRT_ROOT}/include\n"
+        "  - $ENV{TensorRT_ROOT}/include\n"
+        "Please set TensorRT_ROOT to the installation directory."
     )
 endif()
```
171-174: Process components in parallel, consider we should.

For large component lists, parallel processing could speed up configuration. Optional it is, but beneficial for large projects.

```diff
+if(CMAKE_VERSION VERSION_GREATER_EQUAL "3.19")
+    cmake_policy(SET CMP0114 NEW)
+    foreach(component IN LISTS TensorRT_FIND_COMPONENTS)
+        cmake_language(DEFER DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
+            CALL _find_trt_component ${component})
+    endforeach()
+else()
 foreach(component IN LISTS TensorRT_FIND_COMPONENTS)
     _find_trt_component(${component})
 endforeach()
+endif()
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- Examples/Python/src/ExaTrkXTrackFinding.cpp (2 hunks)
- cmake/FindTensorRT.cmake (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- Examples/Python/src/ExaTrkXTrackFinding.cpp
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: linux_ubuntu_extra (ubuntu2204_clang, 20)
- GitHub Check: linux_ubuntu_extra (ubuntu2204, 20)
- GitHub Check: macos
- GitHub Check: linux_ubuntu
- GitHub Check: unused_files
- GitHub Check: missing_includes
- GitHub Check: build_debug
- GitHub Check: CI Bridge / build_gnn_tensorrt
🔇 Additional comments (2)
cmake/FindTensorRT.cmake (2)
176-182: Well implemented, the package handling is.

Following CMake best practices, you are. Approve this section, I do.
56-114: Handle malformed version strings, we must.

Verify that version strings are properly extracted, essential it is. Add error handling for malformed version strings, we should.
Actionable comments posted: 1
♻️ Duplicate comments (1)
.gitlab-ci.yml (1)
189-224: ⚠️ Potential issue: Align with existing ExaTrkX jobs and add missing configurations, you must!

Similar to the past review, improvements needed there are:
- CUDA architectures with other ExaTrkX jobs, align we must
- Artifacts for downstream jobs, configure we should
- Testing stage for TensorRT functionality, define we must
- Torch disabled while TensorRT enabled, verify this approach we should

Apply these changes, you should:

```diff
 build_gnn_tensorrt:
   stage: build
   image: ghcr.io/acts-project/ubuntu2404_tensorrt:74
   variables:
     DEPENDENCY_URL: https://acts.web.cern.ch/ACTS/ci/ubuntu-24.04/deps.$DEPENDENCY_TAG.tar.zst
+    TORCH_CUDA_ARCH_LIST: "8.0 8.6 8.9 9.0"
   cache:
     key: ccache-${CI_JOB_NAME}-${CI_COMMIT_REF_SLUG}-${CCACHE_KEY_SUFFIX}
     fallback_keys:
       - ccache-${CI_JOB_NAME}-${CI_DEFAULT_BRANCH}-${CCACHE_KEY_SUFFIX}
     when: always
     paths:
       - ${CCACHE_DIR}
+  artifacts:
+    paths:
+      - build/
+    exclude:
+      - build/**/*.o
+    expire_in: 6 hours
   tags:
     - docker-gpu-nvidia
   script:
     - git clone $CLONE_URL src
     - cd src
     - git checkout $HEAD_SHA
     - source CI/dependencies.sh
     - cd ..
     - mkdir build
     - >
       cmake -B build -S src
       -DACTS_BUILD_PLUGIN_EXATRKX=ON
-      -DACTS_EXATRKX_ENABLE_TORCH=OFF
       -DACTS_EXATRKX_ENABLE_CUDA=ON
       -DACTS_EXATRKX_ENABLE_TENSORRT=ON
       -DPython_EXECUTABLE=$(which python3)
       -DCMAKE_CUDA_ARCHITECTURES="75;86"
     - ccache -z
     - cmake --build build -- -j6
     - ccache -s
+test_gnn_tensorrt:
+  stage: test
+  needs:
+    - build_gnn_tensorrt
+  image: ghcr.io/acts-project/ubuntu2404_tensorrt:74
+  tags:
+    - docker-gpu-nvidia
+  script:
+    - apt-get update -y
+    - git clone $CLONE_URL src
+    - cd src
+    - git checkout $HEAD_SHA
+    - source CI/dependencies.sh
+    - cd ..
+    - ctest --test-dir build -R TensorRT
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- .gitlab-ci.yml (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (9)
- GitHub Check: merge-sentinel
- GitHub Check: CI Bridge / build_gnn_tensorrt
- GitHub Check: unused_files
- GitHub Check: macos
- GitHub Check: linux_ubuntu_extra (ubuntu2204_clang, 20)
- GitHub Check: missing_includes
- GitHub Check: build_debug
- GitHub Check: linux_ubuntu
- GitHub Check: docs
🔇 Additional comments (1)
.gitlab-ci.yml (1)
191-191: Verify the image tag, we must!

Different from other jobs using tag '63', this one uses tag '74'. Intentional this change is?
✅ Verification successful
Valid and intentional, this image tag is!
Created recently, the image tag '74' has been. Tagged as 'latest', it is, hmmmm. Trust in the Force and the wisdom of the maintainers, we must.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:

```shell
#!/bin/bash
# Check if the image exists and compare its creation date with other images
gh api \
  -H "Accept: application/vnd.github+json" \
  -H "X-GitHub-Api-Version: 2022-11-28" \
  /orgs/acts-project/packages/container/ubuntu2404_tensorrt/versions \
  | jq '.[] | select(.metadata.container.tags[] | contains("74"))'
```

Length of output: 2322
Actionable comments posted: 1
♻️ Duplicate comments (1)
Plugins/ExaTrkX/include/Acts/Plugins/ExaTrkX/TensorRTEdgeClassifier.hpp (1)
49-49: ⚠️ Potential issue: Return a proper torch::Device object, you must.

Return device type directly, you do. A proper torch::Device instance, needed it is. Apply this diff to return the correct device:

```diff
-  torch::Device device() const override { return torch::kCUDA; };
+  torch::Device device() const override { return torch::Device(torch::kCUDA); };
```
🧹 Nitpick comments (1)
Plugins/ExaTrkX/include/Acts/Plugins/ExaTrkX/TensorRTEdgeClassifier.hpp (1)

29-38: Document the configuration parameters, you must.

Mysterious, these magic numbers are. Documentation for `cut = 0.21` and validation for `deviceID`, needed they are. Add comments explaining:
- The meaning and impact of the cut threshold
- Valid ranges for deviceID
- Purpose of each configuration parameter
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- Plugins/ExaTrkX/include/Acts/Plugins/ExaTrkX/TensorRTEdgeClassifier.hpp (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (7)
- GitHub Check: linux_examples_test
- GitHub Check: linux_physmon
- GitHub Check: linux_ubuntu_extra (ubuntu2204_clang, 20)
- GitHub Check: macos
- GitHub Check: build_debug
- GitHub Check: CI Bridge / build_gnn_tensorrt
- GitHub Check: missing_includes
🔇 Additional comments (2)
Plugins/ExaTrkX/include/Acts/Plugins/ExaTrkX/TensorRTEdgeClassifier.hpp (2)
61-62: Wise use of mutex for thread safety, I sense.

Protected well, the execution contexts are. Thread-safe design, you have achieved.
18-23: Verify TensorRT headers in CI environment, we must.

Failed to compile in CI, the PR has. Check if TensorRT headers are available in CI environment, we should.
✅ Verification successful
Expected behavior in the Force, this is. Worry, you must not.
In specialized container with TensorRT, the CI runs. Missing headers in standard paths, natural this is. When building locally, set `TensorRT_ROOT` to your installation path, you must. In CI environment, the dedicated container `ghcr.io/acts-project/ubuntu2404_tensorrt:74` handles this automatically, it does.

🏁 Scripts executed
The following scripts were executed for the analysis:
Script:

```shell
#!/bin/bash
# Search for TensorRT headers in system paths
find /usr/include -name "NvInfer.h" 2>/dev/null || echo "TensorRT headers not found in /usr/include"
find /usr/local/include -name "NvInfer.h" 2>/dev/null || echo "TensorRT headers not found in /usr/local/include"
# Check if TensorRT is mentioned in package configuration
if command -v pkg-config &> /dev/null; then
  pkg-config --list-all | grep -i tensorrt || echo "No TensorRT package config found"
fi
```

Length of output: 303
Script:

```shell
#!/bin/bash
# Check CMake configuration for TensorRT
fd CMakeLists.txt --exec grep -l "TensorRT" {} \;
fd "*.cmake" --exec grep -l "TensorRT" {} \;
# Check documentation for TensorRT requirements
fd "README|INSTALL" --extension md --exec grep -i -A 5 -B 5 "tensorrt" {} \;
# Check environment variable usage in build files
rg "TENSORRT|TensorRT" --type cmake
```

Length of output: 6096
Script:

```shell
#!/bin/bash
# Check build configuration options
fd "BuildOptions.cmake|CMakeCache.txt" --exec grep -A 5 -B 5 "EXATRKX.*TENSORRT" {} \;
# Look for any build documentation
rg -i "tensorrt|cuda" docs/ || echo "No direct mentions in docs/"
# Check CI configuration for TensorRT setup
fd "\.gitlab-ci\.yml|\.github/workflows" --exec grep -A 10 -B 10 -i "tensorrt" {} \;
```

Length of output: 2251
Quality Gate passed
Actionable comments posted: 3
🧹 Nitpick comments (1)
Plugins/ExaTrkX/src/TensorRTEdgeClassifier.cpp (1)
37-37: Const correctness in log method, improve we must.

The `log` method should mark the `msg` parameter as `const char* const`, it should.

```diff
-  void log(Severity severity, const char *msg) noexcept override {
+  void log(Severity severity, const char* const msg) noexcept override {
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- Plugins/ExaTrkX/include/Acts/Plugins/ExaTrkX/TensorRTEdgeClassifier.hpp (1 hunks)
- Plugins/ExaTrkX/src/TensorRTEdgeClassifier.cpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- Plugins/ExaTrkX/include/Acts/Plugins/ExaTrkX/TensorRTEdgeClassifier.hpp
🧰 Additional context used
📓 Learnings (1)
Plugins/ExaTrkX/src/TensorRTEdgeClassifier.cpp (1)
Learnt from: benjaminhuth
PR: acts-project/acts#4016
File: Plugins/ExaTrkX/include/Acts/Plugins/ExaTrkX/TensorRTEdgeClassifier.hpp:44-46
Timestamp: 2025-01-21T10:16:57.820Z
Learning: In Acts project, when a derived class implements a virtual method, the documentation can be found in the base class and doesn't need to be repeated. For example, TensorRTEdgeClassifier's operator() documentation is in EdgeClassificationBase.
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: CI Bridge / build_gnn_tensorrt
- GitHub Check: merge-sentinel
🔇 Additional comments (1)
Plugins/ExaTrkX/src/TensorRTEdgeClassifier.cpp (1)
181-183: ⚠️ Potential issue: Fix undefined variable and improve memory management, we must.

Variable 'outputMem' undefined it is, and memory management improved it must be.

```diff
-  auto scores = torch::from_blob(
-      outputMem, edgeIndex.size(1), 1, [](void *ptr) { cudaFree(ptr); },
-      torch::TensorOptions().device(torch::kCUDA).dtype(torch::kFloat32));
+  // scores tensor already created and used for inference
+  // no need to create new tensor from blob
```

Likely invalid or redundant comment.
```cpp
std::size_t freeMem, totalMem;
cudaMemGetInfo(&freeMem, &totalMem);
ACTS_DEBUG("Used CUDA memory after TensorRT initialization: "
           << (totalMem - freeMem) * 1e-9 << " / " << totalMem * 1e-9
           << " GB");
```
Check cudaMemGetInfo status, you must.
Handle errors from CUDA memory information retrieval, we should.
```diff
 std::size_t freeMem, totalMem;
-cudaMemGetInfo(&freeMem, &totalMem);
+cudaError_t err = cudaMemGetInfo(&freeMem, &totalMem);
+if (err != cudaSuccess) {
+  throw std::runtime_error("Failed to get CUDA memory info: " +
+                           std::string(cudaGetErrorString(err)));
+}
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```cpp
std::size_t freeMem, totalMem;
cudaError_t err = cudaMemGetInfo(&freeMem, &totalMem);
if (err != cudaSuccess) {
  throw std::runtime_error("Failed to get CUDA memory info: " +
                           std::string(cudaGetErrorString(err)));
}
ACTS_DEBUG("Used CUDA memory after TensorRT initialization: "
           << (totalMem - freeMem) * 1e-9 << " / " << totalMem * 1e-9
           << " GB");
```
```cpp
while (context == nullptr) {
  std::lock_guard<std::mutex> lock(m_contextMutex);
  if (!m_contexts.empty()) {
    context = std::move(m_contexts.back());
    m_contexts.pop_back();
  }
}
```
Prevent deadlock in context acquisition, we must.
Infinite loop without timeout or condition variable, dangerous it is. Add timeout or condition variable to prevent deadlock, we should.
```diff
+  auto start = std::chrono::steady_clock::now();
   while (context == nullptr) {
     std::lock_guard<std::mutex> lock(m_contextMutex);
     if (!m_contexts.empty()) {
       context = std::move(m_contexts.back());
       m_contexts.pop_back();
+    } else {
+      auto now = std::chrono::steady_clock::now();
+      auto elapsed = std::chrono::duration_cast<std::chrono::seconds>
+          (now - start).count();
+      if (elapsed > 60) {  // 1 minute timeout
+        throw std::runtime_error("Timeout waiting for available context");
+      }
+      std::this_thread::sleep_for(std::chrono::milliseconds(100));
     }
   }
```
📝 Committable suggestion
```cpp
auto start = std::chrono::steady_clock::now();
while (context == nullptr) {
  std::lock_guard<std::mutex> lock(m_contextMutex);
  if (!m_contexts.empty()) {
    context = std::move(m_contexts.back());
    m_contexts.pop_back();
  } else {
    auto now = std::chrono::steady_clock::now();
    auto elapsed = std::chrono::duration_cast<std::chrono::seconds>
        (now - start).count();
    if (elapsed > 60) {  // 1 minute timeout
      throw std::runtime_error("Timeout waiting for available context");
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
  }
}
```
```cpp
auto nodeFeatures =
    std::any_cast<torch::Tensor>(inNodeFeatures).to(execContext.device);

auto edgeIndex =
    std::any_cast<torch::Tensor>(inEdgeIndex).to(execContext.device);
ACTS_DEBUG("edgeIndex: " << detail::TensorDetails{edgeIndex});

auto edgeFeatures =
    std::any_cast<torch::Tensor>(inEdgeFeatures).to(execContext.device);
ACTS_DEBUG("edgeFeatures: " << detail::TensorDetails{edgeFeatures});
```
🛠️ Refactor suggestion
Validate input tensors before device transfer, we must.
Check tensor validity and dimensions before device transfer, essential it is.
```diff
 auto nodeFeatures =
     std::any_cast<torch::Tensor>(inNodeFeatures).to(execContext.device);
+if (!nodeFeatures.defined() || nodeFeatures.dim() != 2) {
+  throw std::runtime_error("Invalid node features tensor");
+}
 auto edgeIndex =
     std::any_cast<torch::Tensor>(inEdgeIndex).to(execContext.device);
+if (!edgeIndex.defined() || edgeIndex.dim() != 2 || edgeIndex.size(0) != 2) {
+  throw std::runtime_error("Invalid edge index tensor");
+}
 ACTS_DEBUG("edgeIndex: " << detail::TensorDetails{edgeIndex});
 auto edgeFeatures =
     std::any_cast<torch::Tensor>(inEdgeFeatures).to(execContext.device);
+if (!edgeFeatures.defined() || edgeFeatures.dim() != 2) {
+  throw std::runtime_error("Invalid edge features tensor");
+}
 ACTS_DEBUG("edgeFeatures: " << detail::TensorDetails{edgeFeatures});
```
📝 Committable suggestion
```cpp
auto nodeFeatures =
    std::any_cast<torch::Tensor>(inNodeFeatures).to(execContext.device);
if (!nodeFeatures.defined() || nodeFeatures.dim() != 2) {
  throw std::runtime_error("Invalid node features tensor");
}

auto edgeIndex =
    std::any_cast<torch::Tensor>(inEdgeIndex).to(execContext.device);
if (!edgeIndex.defined() || edgeIndex.dim() != 2 || edgeIndex.size(0) != 2) {
  throw std::runtime_error("Invalid edge index tensor");
}
ACTS_DEBUG("edgeIndex: " << detail::TensorDetails{edgeIndex});

auto edgeFeatures =
    std::any_cast<torch::Tensor>(inEdgeFeatures).to(execContext.device);
if (!edgeFeatures.defined() || edgeFeatures.dim() != 2) {
  throw std::runtime_error("Invalid edge features tensor");
}
ACTS_DEBUG("edgeFeatures: " << detail::TensorDetails{edgeFeatures});
```
Actionable comments posted: 1
🧹 Nitpick comments (1)
Plugins/ExaTrkX/src/TensorRTEdgeClassifier.cpp (1)
197-203: Structured performance monitoring system, implement we should.

Instead of scattered timing measurements, a dedicated performance monitoring system, beneficial it would be. Consider these improvements:
- Structured timing data collection
- Memory usage tracking at key points
- Performance metrics aggregation

Example implementation structure:

```cpp
struct PerformanceMetrics {
  double deviceTransferTime;
  double inferenceTime;
  double postProcessingTime;
  size_t peakMemoryUsage;
  // Add more metrics as needed
};

// Add to class members
std::vector<PerformanceMetrics> m_metrics;
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- Plugins/ExaTrkX/src/TensorRTEdgeClassifier.cpp (1 hunks)
🧰 Additional context used
📓 Learnings (1)
Plugins/ExaTrkX/src/TensorRTEdgeClassifier.cpp (1)
Learnt from: benjaminhuth
PR: acts-project/acts#4016
File: Plugins/ExaTrkX/include/Acts/Plugins/ExaTrkX/TensorRTEdgeClassifier.hpp:44-46
Timestamp: 2025-01-21T10:16:57.820Z
Learning: In Acts project, when a derived class implements a virtual method, the documentation can be found in the base class and doesn't need to be repeated. For example, TensorRTEdgeClassifier's operator() documentation is in EdgeClassificationBase.
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: merge-sentinel
- GitHub Check: CI Bridge / build_gnn_tensorrt
- GitHub Check: macos
🔇 Additional comments (4)
Plugins/ExaTrkX/src/TensorRTEdgeClassifier.cpp (4)
30-57: Well implemented, the logger class is!

Clean and proper mapping of TensorRT severity levels to Acts logging levels, I see. RAII principles with `unique_ptr`, you follow.
104-108: Check cudaMemGetInfo status, you must. Handle errors from CUDA memory information retrieval, we should.
124-133: Validate input tensors before device transfer, we must. Check tensor validity and dimensions before device transfer, essential it is.
139-145: Prevent deadlock in context acquisition, we must. Infinite loop without timeout or condition variable, dangerous it is.
```cpp
auto stream = execContext.stream.value().stream();
auto status = context->enqueueV3(stream);
if (!status) {
  throw std::runtime_error("Failed to execute TensorRT model");
}
ACTS_CUDA_CHECK(cudaStreamSynchronize(stream));
```
🛠️ Refactor suggestion
Ensure proper cleanup after inference failure, you must.
If inference fails, CUDA stream and resources must be properly cleaned up, they should be.
Apply this diff to ensure proper cleanup:
```diff
 auto stream = execContext.stream.value().stream();
 auto status = context->enqueueV3(stream);
 if (!status) {
+  {
+    std::lock_guard<std::mutex> lock(m_contextMutex);
+    m_contexts.push_back(std::move(context));
+  }
   throw std::runtime_error("Failed to execute TensorRT model");
 }
 ACTS_CUDA_CHECK(cudaStreamSynchronize(stream));
```
📝 Committable suggestion
```cpp
auto stream = execContext.stream.value().stream();
auto status = context->enqueueV3(stream);
if (!status) {
  {
    std::lock_guard<std::mutex> lock(m_contextMutex);
    m_contexts.push_back(std::move(context));
  }
  throw std::runtime_error("Failed to execute TensorRT model");
}
ACTS_CUDA_CHECK(cudaStreamSynchronize(stream));
```
Cannot be compiled currently in the CI
--- END COMMIT MESSAGE ---
Any further description goes here, @-mentions are ok here!
`feat`, `fix`, `refactor`, `docs`, `chore` and `build` types.

Summary by CodeRabbit
Release Notes
- New Features
- Infrastructure
- Technical Improvements