Rework tf listener wrapper #35
Conversation
What are your thoughts about a flag (defaulting to off) for a process-wide tf listener? One that a user could access similar to the process-wide node or executor?
Sounds useful.
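For illustration, a minimal sketch of what such an accessor could look like (the flag handling is omitted; `get_tf_listener` and `get_process_wide_node` are hypothetical names, not part of this PR):

```python
from typing import Optional

# Hypothetical sketch: a lazily created, process-wide TF listener, mirroring
# how a process-wide node or executor accessor might work.
_process_wide_tf_listener: Optional["TFListenerWrapper"] = None


def get_tf_listener() -> "TFListenerWrapper":
    """Return the shared TFListenerWrapper, creating it on first use."""
    global _process_wide_tf_listener
    if _process_wide_tf_listener is None:
        # Assumes an accessor for the process-wide node exists.
        _process_wide_tf_listener = TFListenerWrapper(node=get_process_wide_node())
    return _process_wide_tf_listener
```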
Only looked at the tf listener wrapper.
```python
if not wait_for_frames:
    timeout_sec = 0.0
```
wait_for_frames is about whether a path exists from frame a to frame b. If the path doesn't exist, it should return immediately if wait_for_frames is true. However, if the path does exist, then it should wait up to timeout for the transform to come in at the requested time. I think the way this is written, it will return immediately if wait_for_frames is true and transform_time is beyond what's in the tf buffer, even if the path exists from frame a to frame b. This is why there was a loop here previously instead of using the built-in functions.

This is useful for code that may be adding frames later in the processing, and therefore if the path doesn't exist, waiting isn't going to help.
> if the path doesn't exist, it should return immediately if wait_for_frames is true.

Hmm, that is not what the original implementation does.

> I think the way this is written it will return immediately if wait_for_frames is true and transform_time is beyond what's in the tf buffer even if the path exists from frame a to frame b.

Hmm, it shouldn't. Didn't we test this a couple of weeks ago? In fact, it's because it waits that the test_future_transform_wait test isn't flaky due to races.

> this is useful for code that may be adding frames later in the processing and therefore if the path doesn't exist, waiting isn't going to help.

I see. It's a bit of an odd use case, as this is a problem you would run into only if you had a single-threaded, synchronous program where statements are not ordered properly. That said, we can make it behave that way. Are you sure you want it to behave that way? It will expose us to races when using lookups not guarded by waits.
Wow ok I did not write this correctly: if wait_for_frames is False, then this should return immediately if a path from frame_a to frame_b does not exist. Otherwise, it should wait for the transform to exist until the timeout (or forever if the timeout is None), which I think is the normal thing the underlying TF stuff does.

We could also potentially drop this argument, as I'm not really sure anyone is using it.
Alright, FTR I got the wait_for_frames semantics wrong the first time. It is not redundant with timeout_sec; it changes the scope of the wait to include (and exclude) lookups on frames for which a path does not exist yet. Fixed in 25bc002. Thanks for the rubber ducking, @jbarry-bdai!
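For illustration, a rough sketch of the agreed semantics, assuming a tf2_ros.Buffer-like object (the helper name and poll interval are invented; this is not the code from 25bc002):

```python
import time

from rclpy.time import Time
from tf2_ros import TransformException


def lookup_with_wait(buffer, frame_a, frame_b, transform_time, timeout_sec, wait_for_frames):
    # timeout_sec=None means wait forever; otherwise compute an absolute deadline.
    deadline = None if timeout_sec is None else time.monotonic() + timeout_sec
    while True:
        try:
            return buffer.lookup_transform(frame_a, frame_b, transform_time)
        except TransformException:
            # With wait_for_frames=False, fail fast when no path between the
            # frames exists yet; with wait_for_frames=True, keep waiting for
            # the frames themselves to appear.
            if not wait_for_frames and not buffer.can_transform(frame_a, frame_b, Time()):
                raise
            if deadline is not None and time.monotonic() >= deadline:
                raise
            time.sleep(0.01)  # poll until data at transform_time arrives
```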
```diff
@@ -101,7 +101,7 @@ def test_future_transform_extrapolation_exception(
     time.sleep(0.2)
     timestamp = ros.node.get_clock().now()
     with pytest.raises(ExtrapolationException):
-        tf_listener.lookup_a_tform_b(FRAME_ID, CHILD_FRAME_ID, timestamp)
+        tf_listener.lookup_a_tform_b(FRAME_ID, CHILD_FRAME_ID, timestamp, timeout_sec=0.0)
```
@jbarry-bdai note this is necessary now, as None for a timeout is always treated as infinity. The original code aliased None with 0 when wait_for_frames was False.
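In other words (a small sketch; the names are taken from the diff above):

```python
# Old behavior (sketch): None was silently aliased to "don't block at all".
if not wait_for_frames and timeout_sec is None:
    timeout_sec = 0.0

# New behavior: None always means "wait indefinitely", so callers that need
# to fail fast must now pass timeout_sec=0.0 explicitly, as the test does.
```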
```diff
-        cache_time_s: Optional[int] = None,
-    ) -> None:
 class TFListenerWrapper:
+    def __init__(self, node: Optional[Node] = None, cache_time_s: Optional[float] = None) -> None:
```
What happened to the wait-for-init part? I do know people were using that...
Yes, quite a bit in fact (and not always correctly, e.g. here we pass three frames). It is a problematic pattern, though. When the wrapper is bound to a node that is spinning, all is fine with waiting in a constructor. When the wrapper gets instantiated from within the constructor of the node it is bound to, it hangs, because the node can't be put to spin until it's fully constructed.

Eliminating that pattern was the original motivation behind this PR. You can still wait, but you have to do so explicitly, after construction. Migrated code already accounts for this change.
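To make the hang concrete, a sketch of the problematic old pattern (the node name, frames, and the constructor-wait argument name are all illustrative, not the wrapper's actual API):

```python
from rclpy.node import Node


class MyNode(Node):
    def __init__(self) -> None:
        super().__init__("my_node")
        # Deadlock-prone under the old API: the wrapper waits for transforms
        # inside the constructor of the node it is bound to, but the node
        # cannot spin (and thus cannot receive /tf) until __init__ returns.
        self.tf_listener = TFListenerWrapper(self, wait_for_transform=("odom", "base_link"))
```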
OK. Could we add a comment here explaining how to do that? The reason we added this was that people were constantly getting exceptions thrown from trying to use the listener too soon.
Done in fc71daa. I added documentation, including basic examples, and it will also now generate warnings if you are about to shoot yourself in the foot. PTAL!
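Presumably the documented pattern now looks something like this sketch: construct first, start spinning, then wait explicitly (the name of the wait method is assumed for illustration):

```python
import threading

import rclpy
from rclpy.executors import SingleThreadedExecutor

rclpy.init()
node = rclpy.create_node("tf_user")
tf_listener = TFListenerWrapper(node)  # no implicit waiting in the constructor

executor = SingleThreadedExecutor()
executor.add_node(node)
threading.Thread(target=executor.spin, daemon=True).start()

# Wait explicitly once the node is actually able to spin; the method name
# here is an assumption, not necessarily the wrapper's real API.
tf_listener.wait_for_a_tform_b("odom", "base_link")
```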
```python
        Returns:
            The transform frame_a_t_frame_b at the time specified.
        Raises:
            All the possible TransformExceptions.
        """
        if not wait_for_frames:
```
This logic now looks correct to me.
```diff
         Returns:
             The timestamp from the latest recorded transform frame_a_t_frame_b
         Raises:
             All the possible TransformExceptions.
         """
-        transform = self._internal_lookup_a_tform_b(frame_a, frame_b, None, timeout, wait_for_frames)
+        transform = self.lookup_a_tform_b(frame_a, frame_b, timeout_sec, wait_for_frames)
```
Saw self._tf_buffer.get_latest_common_time(frame_a, frame_b) above. Is there a difference in the result from doing that versus this?
This version can wait for frames to come into existence.
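Roughly, the contrast is as follows (a sketch assuming tf2_ros.Buffer semantics; the function names are invented):

```python
def latest_stamp_fast(buffer, frame_a, frame_b):
    # get_latest_common_time fails immediately if either frame is unknown
    # to the buffer.
    return buffer.get_latest_common_time(frame_a, frame_b)


def latest_stamp_waiting(tf_listener, frame_a, frame_b, timeout_sec=None):
    # Going through lookup_a_tform_b can block (subject to timeout_sec and
    # wait_for_frames) until the frames exist, then read the stamp.
    transform = tf_listener.lookup_a_tform_b(frame_a, frame_b, timeout_sec)
    return transform.header.stamp
```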
There is a bug on line 119 right there 👀 Fixed it and added a test (we didn't have any for this method) in 49ee392.
Seems fine to me, but I would want a second opinion on my concern about possible race conditions.
```diff
 future.add_done_callback(done_callback)
+context.on_shutdown(event.set)
 return event.wait(timeout=timeout_sec)
```
What happens if the context has multiple events set and shuts down? Is there a possibility where two events can be waiting on each other and deadlock?
If the context has multiple events, all of them will be set. Events do not wait on each other; callees wait on events. Callees could run into a deadlock if, for example, they have to complete each other's futures, or if you somehow manage to wait on a future within an on-shutdown handler (which I'm not even sure the Context API will allow).

So I would say deadlocks are a non-issue here, inasmuch as they won't happen by accident unless the code is flawed.
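For reference, a minimal sketch of the pattern under discussion (assuming rclpy-style Future and Context objects; the function name is invented):

```python
import threading


def wait_for_future(future, context, timeout_sec=None):
    """Block until the future completes, the timeout expires, or the context
    shuts down. Returns True if the event was set before the timeout."""
    event = threading.Event()
    future.add_done_callback(lambda _: event.set())
    # If the context shuts down first, the event is set as well, so the
    # caller is unblocked instead of deadlocking on a future that will
    # never complete.
    context.on_shutdown(event.set)
    return event.wait(timeout=timeout_sec)
```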
I'm happy with the TF listener changes but haven't looked at the other files.
This PR reworks TFListenerWrapper and futures' APIs to prevent the user from running into deadlocks. Builds on #32.