Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: introduce the advanced publisher and subscriber #368

Open
wants to merge 8 commits into
base: rolling
Choose a base branch
from

Conversation

YuanYuYuan
Copy link
Contributor

With this PR, we replace the QueryingSubscriber and PublicationCache (+ a normal publisher)
with the AdvancedSubscriber and AdvancedPublisher.

NOTE: A deadlock is found when undeclaring advanced subscribers. It will be fixed once eclipse-zenoh/zenoh#1685 is merged and synced with zenoh-c.

@clalancette clalancette marked this pull request as draft January 2, 2025 16:09
@YuanYuYuan YuanYuYuan force-pushed the feat/advanced-pub-sub branch from e0c490f to 1f9357f Compare January 7, 2025 07:57
@YuanYuYuan
Copy link
Contributor Author

Update: The deadlock fix has been merged. Mark this PR as ready for review.

@YuanYuYuan YuanYuYuan marked this pull request as ready for review January 7, 2025 07:58
@YuanYuYuan
Copy link
Contributor Author

YuanYuYuan commented Jan 8, 2025

Test Failure

ros2.repos

  • rclcpp_action
    • 1 - test_client
  • rclc_parameter
    • 1 - rclc_parameter_test
  • test_rclcpp.log
    • 74 - test_n_nodes__rmw_zenoh_cpp (Failed) <--- Flaky

No new failure is found.

@YuanYuYuan
Copy link
Contributor Author

Update: bump up zenoh-cpp version to include the memory leak fix eclipse-zenoh/zenoh-cpp#363.

@YuanYuYuan
Copy link
Contributor Author

Will rebase once #405 is merged.

@YuanYuYuan YuanYuYuan force-pushed the feat/advanced-pub-sub branch from 08b0906 to b76ee4c Compare January 11, 2025 17:05
Copy link
Member

@Yadunund Yadunund left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see feedback inline.
I have the following additional questions

  1. It looks like this PR does not fix [Test Failure] rclcpp_action / test_client #326. But from my experience, it changes the behavior from deterministically failing to failing sometimes. ie, the test is now flaky. Do you observe the same?
  2. With AdvancedSubscribers do we no longer need to call Get() when we detect an AdvancedPublisher with publication cache? This was needed to fix Subscribers sometimes not getting messages from topics using durability transient_local #263.

@@ -22,7 +22,7 @@ set(ZENOHC_CARGO_FLAGS "--no-default-features$<SEMICOLON>--features=shared-memor
# - https://github.com/eclipse-zenoh/zenoh/pull/1696 (Fix SHM Garbage Collection (GC) policy)
ament_vendor(zenoh_c_vendor
VCS_URL https://github.com/eclipse-zenoh/zenoh-c.git
VCS_VERSION 61d8fcc136ce4ed36d921a32244da4f3d81a6097
VCS_VERSION 85ca060fa4037239ca4102a3a61f96626cc6b434
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please replace the comment above and below to include changes in this commit hash.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. This was out of sync. We don't need to change it in this PR.

@@ -189,15 +189,6 @@ class GraphCache final
/// Returns true if the entity is a publisher or client. False otherwise.
static bool is_entity_pub(const liveliness::Entity & entity);

void set_querying_subscriber_callback(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll also need to delete querying_subs_cbs_ and update graph_cache.cpp to not iterate over querying_subs_cbs_

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 9cc776b

zenoh::KeyExpr pub_ke(entity->topic_info()->topic_keyexpr_);
auto adv_pub_opts = zenoh::ext::SessionExt::AdvancedPublisherOptions::create_default();
adv_pub_opts.publisher_detection = true;
adv_pub_opts.sample_miss_detection = true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the documentation for this option, it sounds like this should be set true only if adapted_qos_profile.reliability == RMW_QOS_POLICY_RELIABILITY_RELIABLE? Could you clarify how this differs from setting pub_opts.reliability = Z_RELIABILITY_RELIABLE?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reliability option can still be set by the normal publisher option. See here. The reliability in zenoh is based on TCP reliability per see. The sample_miss_detection here is a custom data retransmission feature cached on the publisher side in the sense of transient_local duriability.

Comment on lines -118 to -125
pub_cache_opts.history = adapted_qos_profile.depth;
pub_cache_opts.queryable_complete = true;

std::string queryable_prefix = entity->zid();
pub_cache_opts.queryable_prefix = zenoh::KeyExpr(queryable_prefix);

pub_cache = session->ext().declare_publication_cache(
pub_ke, std::move(pub_cache_opts), &result);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We previously needed to do this to fix #263. Can you confirm that we no longer need to explicitly call QueryingSubscriber::get() when we discover an AdvancedPublisher in the graph?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should no longer need this hack. Otherwise, it's a bug in zenoh.

adv_sub_opts.history->detect_late_publishers = true;
adv_sub_opts.history->max_samples = entity_->topic_info()->qos_.depth;
adv_sub_opts.recovery = zenoh::ext::SessionExt::AdvancedSubscriberOptions::RecoveryOptions{};
adv_sub_opts.recovery->periodic_queries_period_ms = 1000;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again looking at the documentation, it sounds like recovery options should only be enabled if reliability is RELIABLE. Could you clarify on the behavior?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, answered in #368 (comment).

@YuanYuYuan YuanYuYuan force-pushed the feat/advanced-pub-sub branch from b76ee4c to 9cc776b Compare January 13, 2025 08:28
@YuanYuYuan
Copy link
Contributor Author

Please see feedback inline. I have the following additional questions

1. It looks like this PR does not fix [[Test Failure] rclcpp_action / test_client #326](https://github.com/ros2/rmw_zenoh/issues/326). But from my experience, it changes the behavior from deterministically failing to failing sometimes. ie, the test is now flaky. Do you observe the same?

2. With `AdvancedSubscribers` do we no longer need to call `Get()`  when we detect an `AdvancedPublisher` with publication cache? This was needed to fix [Subscribers sometimes not getting messages from topics using durability transient_local #263](https://github.com/ros2/rmw_zenoh/issues/263).

Here’s an improved version of your paragraph with clearer phrasing and a more professional tone:

  1. This PR does not resolve the issue reported in [Test Failure] rclcpp_action / test_client #326. Based on the investigation, the root cause lies in the inappropriate synchronization assumptions in the test. We have also observed similar failures (slightly more often) when using the previous QueryingSubscriber implementation. To address this issue, we can either make the local publisher-to-subscriber action a blocking call or modify the test itself.

  2. We expect the issue will be resolved. So no longer need the hack 😄

@YuanYuYuan
Copy link
Contributor Author

YuanYuYuan commented Jan 13, 2025

Redid the test with the repos. No new failures were found.

  • rclcpp_action/test_client
  • rclc_parameter/rclc_parameter_test

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants