Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup of https://github.com/ros2/rclcpp/pull/2683 #2714

Open
wants to merge 3 commits into
base: rolling
Choose a base branch
from

Conversation

jmachowinski
Copy link
Contributor

We did a quick fix of #2664 in order to get it merged fast in jazzy.

This is the proper fix for the issue reported in #2664

@jmachowinski
Copy link
Contributor Author

@mjcarroll @wjwwood @fujitatomoya @alsora
While doing the cleanup I found some other issues in the code. We should definitely backport the double adding of the notifier the jazzy....

@jmachowinski
Copy link
Contributor Author

jmachowinski commented Dec 18, 2024

There is still one issue going on, were I don't know if its a bug in the rmw or not.
The reason why I don't know if it is a bug or not, is that it is not clearly specified what
the behavior is if I add an entity twice to a waitset.

If you add a guard condition twice to a rcl_waitsets, the behavior is odd.
If the guard condition is triggered, only the first entry in the rcl_waitset is set, the second is nullptr.

The problem here, is that this breaks code like this :

  rcl_guard_condition_t * cond = &guard_condition_->get_rcl_guard_condition();

  size_t idx1;
  rcl_wait_set_add_guard_condition(&wait_set, cond, &idx1);

  size_t idx2;
  rcl_wait_set_add_guard_condition(&wait_set, cond, &idx2);

  guard_condition_->trigger();

  rcl_wait(&wait_set, 1);
  
  if(wait_set.guard_conditions[idx1] != nullptr)
  {
    // do something
  }

 // this entry will be a nullptr, even though it should not.
  if(wait_set.guard_conditions[idx2] != nullptr)
  {
    // do something
  }

@clalancette
Copy link
Contributor

If you add a guard condition twice to a rcl_waitsets, the behavior is odd.

Which RMW are you testing with? While the behavior should in theory be consistent across them, they all implement it fairly differently.

@jmachowinski
Copy link
Contributor Author

@clalancette I used the default, so this should have been fastDDS.
The big question here is, what is the wanted behavior ?
If we all agree, that the second guard condition, should not be null as well, I can come up with a test and patch....

@wjwwood
Copy link
Member

wjwwood commented Jan 10, 2025

The big question here is, what is the wanted behavior ?
If we all agree, that the second guard condition, should not be null as well, I can come up with a test and patch....

We discussed this in the ROS 2 PMC meeting as well as in the latest client library working group meeting, but for everyone else's benefit the conclusion was that adding the same entity (guard condition, timer, sub, etc...) to a wait set should be an error and we shouldn't require the rmw layer to allow it and leave both non-null. I believe the agreement was to make the check in the functions that add the entities to the wait set, but we may also try to avoid this in the calling code in rclcpp.

Janosch Machowinski added 2 commits January 17, 2025 18:08
…ditions

A waitable should make sure, that all entities it adds to the waitset
are alive during the wait. The ExecutorNotifyWaitable did not do this,
leading to crashes, if a callback group was dropped while waiting.
This commit changes this behavior, by holding shared pointers to the
used guard condition.
Also while at it, fixed a possible race, were a trigger could get lost.
Optimized the is_ready call by using the indices returned by rcl.

Signed-off-by: Janosch Machowinski <[email protected]>
@jmachowinski
Copy link
Contributor Author

Pulls: #2714
Gist: https://gist.githubusercontent.com/jmachowinski/08c4a37a9860fd54b4b0476dde773ae1/raw/93b766847151d1c56a80851ffa36e5b830619230/ros2.repos
BUILD args:
TEST args:
ROS Distro: rolling
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/15075

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Linux-rhel Build Status
  • Windows Build Status

The test was adding the same guard condition twice. This not allowed.

Signed-off-by: Janosch Machowinski <[email protected]>
@jmachowinski
Copy link
Contributor Author

@mjcarroll can you have a look at 5eaf3dc ?
This changes the test, in a way, that it does not add the same waitable twice to the waitset, but I am not sure what the test was supposed to test in the first place, and if the test now still tests it...

@mjcarroll
Copy link
Member

I had to rebuild a little bit of context here around this test. I think that the original intention was to see that the guard_condition associated with the node's callback group is only fired in the case that the contents of the callback group changed. This is why we spin, see that the execute happens, and then spin again to verify that it doesn't happen.

I think that there could be other ways to test that this happens without relying on adding the guard condition to the waitset twice.

@mjcarroll
Copy link
Member

Aside from that, this looks good. If we need to come up with another way to test, then that's fine.

@mjcarroll mjcarroll self-requested a review January 23, 2025 23:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants