-
Notifications
You must be signed in to change notification settings - Fork 429
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Timer callbacks can be delayed when using simulation time #2535
Comments
I was thinking about this use case and I believe relying on the processing order in the executors is fragile. Perhaps that's something we could ensure more in the future and therefore make this more reliable, but there are just so many implementation details here that must not change to keep this working, e.g. if the clock subscription was ever recreated sometime after start up, or some other seemingly innocuous change related to the clock topic. Instead, I wonder if a customized executor that waits for a ROS time update before each wait-execute loop with the clock topic thread enabled might result in a better solution. That might even make sense as an executor option for the standard executors. The reason I think so is that if you ever need to sleep_until or sleep_for in a callback you'll get into a dead lock without the clock topic thread, and I'm also worried about the executor waking up, handling the clock subscription first (as you desire in your above apporach), but then there are hundreds of clock messages which all get processed first and only after that do you start processing new messages from other subscriptions, all of which will "appear" to occur at the same time according to ros time (because while handling these the clock messages that are streaming in are not being handled). Or some other complex interaction with the clock topic. |
I would agree with this statement for the standard executors. However, I think executing the timer callbacks in response to receipt of the
A potential concern with this is how only one message per subscription is processed per spin cycle. If there are multiple publishers to a topic and multiple messages arrive on that topic between
This is a valid concern. My thought was that this shortcut wouldn't work if the clock thread was explicitly disabled because having a clock thread inherently ensures that there will always be a race condition between timer and subscription callbacks. To your point that the standard executors have no guarantee of ordering, it's seeming like the |
That's true, but if you made a custom executor then you could change that behavior. Though it's not as easy as it could be to do that. For example, for each subscription you can call take until it fails to take something if you want to "drain" a subscription before waiting again. But that has all kinds of potential issues, e.g. if you take longer to process a message than the period between messages then you'll end up never waiting again for the next clock message, among other subtle issues. The style of the
There are a bunch of ways I can imagine to manage these issues for a specific usecase, but you should be wary of just plugging in the
Depends on what you mean by "guarantee of ordering". Order of what? And with respect to what? (receipt time vs registration order vs something else, etc.) Also, just to be clear, the However, with a custom version of the The One more thing to be wary of is the issue of stale events, which can occur if your history cache gets full and you have to replace old data and/or if you have lifespan enabled such that messages can expire after being received and thus removed from the history cache. Consider an example with a subscription that uses KEEP_LAST with depth=10 (typical default setting), and that between spins you receive a burst of 11 messages but then no more (to keep it simple). That means you'll have 11 events in the event queue but only 10 messages in the history cache. Then lets say you get a new clock message (on the clock thread) and decide its time to execute some of the messages for those events, and you use the timestamps of the events to know that the first 10 messages of the 11 events are in the period you should execute, so you execute 10 times. Again, assuming no new messages have come in, you have 1 event in the queue and 0 messages in the history cache, so when you go to execute the next time period, you try to take a message for the one event, but there isn't one, and worse you've execute a message from this time period in the last one mistakenly. Now that's an overly contrived example, but I think something like that could come up in a real life scenario and only gets more complicated when you have messages coming in while your handling messages etc... So just be care to not assume that messages you are taking are the ones you expected based on the events. This is a complexity that the "wait set based" executors avoid by not trying to handle the situation at all, and instead only tells you that a subscription was ready in the past and that if you try to take a message from it then you're likely to get one. It specifically does not try to guarantee there will be a message (we gracefully handle a failed take, see: rclcpp/rclcpp/src/rclcpp/subscription_base.cpp Lines 213 to 226 in 8230d15
All that being said, I hope you figure out a way to make this work for your usecase. Any hopefully some of the above rambling can help you avoid pitfalls. 🙃 |
@wjwwood - thanks for taking the time to provide your thoughts. With much digging into ROS and
The issue I'm running into with creating a custom executor that will provide any form of guaranteed ordering is that it currently can't work with simulation time because the time stamps in One option is to require that every message have a header with a time stamp, but I'm not sure that it would be practical to handle that in an executor for several reasons. First, the executor deals with type-erased data, making it difficult for it to introspect the contents of the message. Perhaps a custom Our current thinking is to build something between the subscription and the application, in a similar manner to
This is actually something I've been wondering about with the
Since you mentioned this, it raises a question I have related to #1276. If understand correctly, |
This is a breakout of the issue described in #2532 (comment).
Timer processing when using simulation time is a two step process (e.g. requires two waits of the wait set), so it's possible that subscription callbacks can get called before the timer callback when using simulation time. As timers have higher priority than subscriptions (as documented in the Scheduling semantics), one may expect that timers driven via the
/clock
topic would also have that same higher-priority and expect that timer callbacks driven by the/clock
topic would execute before subscription callbacks for messages that were transmitted after the/clock
message. Because of the two-step process, this is not necessarily the case.A potential change to consider would be to have the clock jump handler registered by the timer call the timer callback directly instead of triggering the guard condition. This would only be applicable if the clock thread has been disabled or the default callback group has been configured to be reentrant (though that may be confusing when using the
SingleThreadedExecutor
, but would actually be consistent with theexecute_timers_separate_thread
option to theEventsExecutor
).It should be noted that due to the issue described in #2532, an implementation of the above suggestion would likely require that the executor have an explicit processing step for the
/clock
topic subscription before processing other subscriptions to ensure that the timers are executed before the other subscriptions, Prior to Jazzy, the/clock
subscription is always handled first because it was internally registered by theTimeSource
before any other application subscriptions.The text was updated successfully, but these errors were encountered: