
How to impl zero-copy when send a message out-of-process to cuda ? #45

Open
lix19937 opened this issue Aug 17, 2024 · 3 comments

@lix19937

lix19937 commented Aug 17, 2024


NOTE:
The two nodes run in different processes. The ROS 2 message memory is on the CPU side, and the next node (the subscriber) wants to receive the message (e.g. camera data) and then run GPU-accelerated inference on it.

I found that to leverage the benefit of zero-copy in NITROS, all NITROS-accelerated nodes must run in the same process.

Reference:
https://nvidia-isaac-ros.github.io/concepts/nitros/index.html

lix19937 changed the title from "How to impl zero-copy when send a message out-of-process msgs to cuda ?" to "How to impl zero-copy when send a message out-of-process to cuda ?" on Aug 17, 2024
@ZhenshengLee

ZhenshengLee commented Oct 17, 2024

The design of NITROS makes the following assumptions of the ROS 2 applications:
To leverage the benefit of zero-copy in NITROS, all NITROS-accelerated nodes must run in the same process.
from https://nvidia-isaac-ros.github.io/concepts/nitros/index.html#system-assumptions

The Isaac ROS docs state clearly that NITROS nodes need intra-process communication to get the zero-copy benefit. NITROS nodes remain compatible with inter-process communication and with ordinary ROS 2 nodes, but in this compatibility mode the acceleration is not available.

NITROS is NVIDIA’s implementation of type adaption and negotiation.
from https://nvidia-isaac-ros.github.io/concepts/nitros/index.html#motivation

The performance improvement of isaac_ros_nitros comes from TYPE ADAPTATION (https://ros.org/reps/rep-2007.html) and TYPE NEGOTIATION (https://ros.org/reps/rep-2009.html), and their zero-copy benefit requires intra-process communication.

More info:
https://www.openrobotics.org/blog/2022/5/24/ros-2-humble-hawksbill-release
https://developer.nvidia.com/blog/improve-perception-performance-for-ros-2-applications-with-nvidia-isaac-transport-for-ros/
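To make the intra-process requirement concrete, here is a toy simulation in plain Python, not the rclcpp type-adaptation API and not how NITROS is implemented. Inside one process a message can be handed over by reference. Across processes it has to be serialized and rebuilt, which copies the payload. `Frame`, `deliver_intra_process` and `deliver_inter_process` are hypothetical names, and `pickle` stands in for the middleware's serialization.

```python
import pickle
from dataclasses import dataclass


@dataclass
class Frame:
    """Hypothetical stand-in for a large custom message type (e.g. a camera image)."""
    pixels: bytearray


def deliver_intra_process(msg: Frame) -> Frame:
    # Publisher and subscriber share one address space, so the message can be
    # handed over as a reference to the same object: no conversion, no copy.
    return msg


def deliver_inter_process(msg: Frame) -> Frame:
    # Crossing a process boundary (simulated here): the payload is serialized
    # into a byte stream and rebuilt on the other side, copying every byte.
    wire = pickle.dumps(msg)
    return pickle.loads(wire)


if __name__ == "__main__":
    frame = Frame(bytearray(640 * 480 * 3))
    print(deliver_intra_process(frame) is frame)  # True: same object
    print(deliver_inter_process(frame) is frame)  # False: a rebuilt copy
```

This is why type adaptation can skip conversion and copying only when both nodes live in the same process. Any inter-process path has to fall back to a serialized ROS message.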

@ZhenshengLee

If you don't use the ROS 2 transport, you could use NvSci or CUDA IPC to share CUDA memory between processes.

More info:
pytorch/pytorch#137680
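CUDA IPC uses a handle-passing pattern. The process that owns a device allocation exports a handle with `cudaIpcGetMemHandle` and sends only the handle to the peer. The peer maps the same allocation with `cudaIpcOpenMemHandle` and releases it with `cudaIpcCloseMemHandle`. Running CUDA code needs a GPU, so the minimal sketch below shows the same pattern on the CPU side with Python's standard-library `multiprocessing.shared_memory`, where the segment name plays the role of the IPC handle. It is an analogue under that assumption, not a CUDA or ROS 2 implementation. `FRAME_BYTES`, `consumer` and `publish_frame` are hypothetical names.

```python
from multiprocessing import Process, Value, shared_memory

# Hypothetical payload size: one 640x480 RGB8 camera frame.
FRAME_BYTES = 640 * 480 * 3


def consumer(shm_name: str, checksum) -> None:
    """Subscriber side: attach to the producer's segment by name only."""
    shm = shared_memory.SharedMemory(name=shm_name)
    try:
        # shm.buf is a memoryview over the shared mapping, so the frame is read
        # in place; only the name crossed the process boundary, not the data.
        checksum.value = sum(shm.buf[:FRAME_BYTES])
    finally:
        shm.close()  # unmap in this process; the owner is responsible for unlink()


def publish_frame(fill: int = 7) -> int:
    """Publisher side: fill a shared segment, then hand its name to a consumer."""
    shm = shared_memory.SharedMemory(create=True, size=FRAME_BYTES)
    try:
        shm.buf[:FRAME_BYTES] = bytes([fill]) * FRAME_BYTES  # stand-in pixel data
        checksum = Value("q", 0)
        proc = Process(target=consumer, args=(shm.name, checksum))
        proc.start()
        proc.join()  # the owner keeps the segment alive until the consumer is done
        return checksum.value
    finally:
        shm.close()
        shm.unlink()


if __name__ == "__main__":
    print(publish_frame())  # 640 * 480 * 3 * 7 = 6451200
```

In this sketch, as with a CUDA IPC export, the owning process keeps the memory alive until the importing process has finished with it. Freeing it earlier would leave the peer with a dangling mapping.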

@ZhenshengLee

@lix19937 Do you work in China? You can contact me on WeChat to discuss: zhensheng_li
