Clean up error handling in many rcl{_action,_lifecycle} codepaths #1202

clalancette · 2024-12-24T23:28:27Z

The original impetus for this PR came from running the rcl tests and seeing output like the following on the console:

>>> [rcutils|error_handling.c:108] rcutils_set_error_state()
This error state is being overwritten:

  'failed to reallocate memory for new transitions, at /home/ubuntu/rolling_ws/src/ros2/rcl/rcl_lifecycle/src/transition_map.c:152, at /home/ubuntu/rolling_ws/src/ros2/rcl/rcl_lifecycle/src/default_state_machine.c:725'

with this new error message:

  'failed to reallocate memory for new transitions on state, at /home/ubuntu/rolling_ws/src/ros2/rcl/rcl_lifecycle/src/transition_map.c:169'

rcutils_reset_error() should be called after error handling to avoid this.
<<<

There are 2 main reasons why this can happen:

1. The test itself is forcing an error condition, but then forgetting to call rcl_reset_error() before trying another condition.
2. An error happens in a "lower-layer" method (like rcutils), which sets an error message, and then the rcl* layer also tries to set the error message.
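The two causes above can be reproduced with a small, self-contained C model. It does not link against rcutils: the single error slot, `err_set()`, `err_reset()`, and the `g_overwrite_warnings` counter are hypothetical stand-ins for the real error state (`RCUTILS_SET_ERROR_MSG`, `rcutils_reset_error()`), and the `lower_layer_*`/`upper_layer_*` functions are invented for illustration. The counter increments wherever the real library would print the "being overwritten" warning.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of the error slot (thread-local in rcutils). */
static char g_error_msg[256];
static bool g_error_is_set = false;
static int g_overwrite_warnings = 0;  /* times the warning would be printed */

static void err_set(const char * msg)
{
  if (g_error_is_set) {
    /* rcutils prints "This error state is being overwritten" here. */
    g_overwrite_warnings++;
  }
  snprintf(g_error_msg, sizeof(g_error_msg), "%s", msg);
  g_error_is_set = true;
}

static void err_reset(void)
{
  g_error_msg[0] = '\0';
  g_error_is_set = false;
}

/* Lower layer (think rcutils): records its own error, then fails. */
static int lower_layer_realloc(void)
{
  err_set("failed to reallocate memory");
  return 1;
}

/* Second cause: the upper layer overwrites the lower-layer error. */
static int upper_layer_overwrites(void)
{
  if (lower_layer_realloc() != 0) {
    err_set("failed to reallocate memory for new transitions");
    return 1;
  }
  return 0;
}

/* Fix 1: the lower-layer message is enough, so just propagate the failure. */
static int upper_layer_propagates(void)
{
  if (lower_layer_realloc() != 0) {
    return 1;
  }
  return 0;
}

/* Fix 2: deliberately replace the lower-layer message with a clearer one. */
static int upper_layer_resets_then_sets(void)
{
  if (lower_layer_realloc() != 0) {
    err_reset();
    err_set("failed to reallocate memory for new transitions");
    return 1;
  }
  return 0;
}
```

In this model, the first cause corresponds to triggering two failures in a row with no `err_reset()` in between (in rcl tests, `rcl_reset_error()`), and the second cause to `upper_layer_overwrites()`. The two fixes mirror the strategies used throughout the commits: pass the lower-layer error through untouched when it already says enough, or reset it explicitly before setting a more specific message.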

This PR corrects almost all instances of this problem in the rcl libraries, with the exception of test_arguments, which has quite a few of these and will require a separate PR. Not only will this quiet down many of the tests, it also fixes some bugs in error handling in this code.


clalancette · 2024-12-24T23:28:55Z

Pulls: #1202
Gist: https://gist.githubusercontent.com/clalancette/e3f7e4a8739d5a2fe4825922491b9561/raw/01d769156a1c9b934554eac77efc9aa4fb529cff/ros2.repos
BUILD args:
TEST args:
ROS Distro: rolling
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/14995

Linux
Linux-aarch64
Linux-rhel
Windows

fujitatomoya

lgtm, but a couple of comments.

rcl/src/rcl/publisher.c

rcl/src/rcl/subscription.c

clalancette added 22 commits December 20, 2024 13:42

Shorten the delay in test_action_server setup.

2260193

Instead of waiting 250ms between setting up 10 goals (for at least 2.5 seconds), just wait 100ms, which reduces this to 1 second. Signed-off-by: Chris Lalancette <[email protected]>

Small style cleanups in test_action_server.cpp

1bf2373

Signed-off-by: Chris Lalancette <[email protected]>

Reset the error in rcl_node_type_cache_register_type().

09d26e9

That is, if rcutils_hash_map_set() fails, it sets its own error, so overriding it with our own will cause a warning to print. Make sure to clear it before setting our own. Signed-off-by: Chris Lalancette <[email protected]>

Only unregister a clock jump callback if we have installed it.

33200b7

This avoids a warning on cleanup in rcl_timer_init2. Signed-off-by: Chris Lalancette <[email protected]>
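This commit follows the usual C "undo only what you actually did" cleanup pattern. A minimal sketch, assuming a local flag that records whether the callback was added; `add_jump_callback()`, `remove_jump_callback()`, and `timer_init_sketch()` are hypothetical stand-ins for `rcl_clock_add_jump_callback()`, `rcl_clock_remove_jump_callback()`, and the failure paths of `rcl_timer_init2()`, not their real signatures.

```c
#include <stdbool.h>

/* Hypothetical bookkeeping standing in for the clock's callback list. */
static int g_callbacks_installed = 0;
static int g_bad_removals = 0;  /* removals of a callback that was never added */

static int add_jump_callback(void)
{
  g_callbacks_installed++;
  return 0;
}

static int remove_jump_callback(void)
{
  if (g_callbacks_installed == 0) {
    g_bad_removals++;  /* the real call would fail here and set an error */
    return 1;
  }
  g_callbacks_installed--;
  return 0;
}

/* Init that can fail either before or after the callback is installed. */
static int timer_init_sketch(bool fail_before_callback, bool fail_after_callback)
{
  bool callback_installed = false;
  if (fail_before_callback) {
    goto fail;
  }
  if (add_jump_callback() != 0) {
    goto fail;
  }
  callback_installed = true;
  if (fail_after_callback) {
    goto fail;
  }
  return 0;

fail:
  /* An unconditional removal here would hit remove_jump_callback() with
   * nothing installed in the "fail before" case. */
  if (callback_installed) {
    remove_jump_callback();
  }
  return 1;
}
```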

Record the return value from rcl_node_type_cache_register_type.

aa5abfa

Otherwise, in a failure situation we set the error but we actually return RCL_RET_OK to the upper layers, which is odd. Signed-off-by: Chris Lalancette <[email protected]>
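The bug described here is a common `goto fail` mistake: the failure is detected, but the variable that is eventually returned still holds the success code. A sketch with hypothetical names (`register_type()` stands in for `rcl_node_type_cache_register_type()`, and `RET_OK`/`RET_ERROR` mimic `RCL_RET_OK`/`RCL_RET_ERROR`):

```c
/* Hypothetical return codes in the style of rcl_ret_t. */
typedef int ret_t;
#define RET_OK 0
#define RET_ERROR 1

/* Stand-in for the registration call; fails on demand. */
static ret_t register_type(int should_fail)
{
  return should_fail ? RET_ERROR : RET_OK;
}

/* Buggy: the failure is detected, but `ret` is never updated. */
static ret_t init_buggy(int should_fail)
{
  ret_t ret = RET_OK;
  if (register_type(should_fail) != RET_OK) {
    /* an error message would be set here */
    goto fail;
  }
  return ret;
fail:
  return ret;  /* still RET_OK, so the caller thinks init succeeded */
}

/* Fixed: record the return value so the caller sees the failure. */
static ret_t init_fixed(int should_fail)
{
  ret_t ret = register_type(should_fail);
  if (ret != RET_OK) {
    goto fail;
  }
  return RET_OK;
fail:
  return ret;
}
```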

Get rid of completely unnecessary return value translation.

344b435

This generated code was translating an RCL error to an RCL error, which doesn't make much sense. Just remove the duplicate code. Signed-off-by: Chris Lalancette <[email protected]>

Use the rcl_timer_init2 functionality to start the timer disabled.

25ede71

Rather than starting it enabled, and then immediately canceling it. Signed-off-by: Chris Lalancette <[email protected]>

Don't overwrite the error from rcl_action_goal_handle_get_info()

3b4be6d

It already sets the error, so rcl_action_server_goal_exists() should not set it again. Signed-off-by: Chris Lalancette <[email protected]>

Reset errors before setting new ones when checking action validity

7d8c0cd

That way we avoid an ugly warning in the error paths. Signed-off-by: Chris Lalancette <[email protected]>

Move the copying of the options earlier in rcl_subscription_init.

c1fe643

That way when we go to cleanup in the "fail" case, the options actually exist and are valid. This avoids an ugly warning during cleanup. Signed-off-by: Chris Lalancette <[email protected]>
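Ordering matters because the failure path reads state that the success path was supposed to fill in. The sketch below assumes, hypothetically, that cleanup consults the stored options (for example to find the allocator); `options_t`, `impl_t`, `cleanup()`, and both init functions are invented names, not the real `rcl_subscription_init()` internals.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the subscription options and impl struct. */
typedef struct {
  int depth;
  bool valid;
} options_t;

typedef struct {
  options_t options;
} impl_t;

static int g_cleanup_warnings = 0;

/* Cleanup relies on the stored options (e.g. to find the allocator). */
static void cleanup(impl_t * impl)
{
  if (!impl->options.valid) {
    g_cleanup_warnings++;  /* cleanup ran against options that were never set */
  }
}

/* Old order: options are copied only after the step that may fail. */
static int init_copy_late(impl_t * impl, const options_t * opts, bool step_fails)
{
  impl->options = (options_t){0};
  if (step_fails) {
    goto fail;
  }
  impl->options = *opts;
  return 0;
fail:
  cleanup(impl);
  return 1;
}

/* New order: copy the options first, so the fail path sees valid data. */
static int init_copy_early(impl_t * impl, const options_t * opts, bool step_fails)
{
  impl->options = *opts;
  if (step_fails) {
    goto fail;
  }
  return 0;
fail:
  cleanup(impl);
  return 1;
}
```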

Make sure to set the error on failure of rcl_action_get_##_service_name

52a0c77

This makes it match the generated code for the action_client. Signed-off-by: Chris Lalancette <[email protected]>

Reset the errors during RCUTILS_FAULT_INJECTION testing.

6d81f9c

That way subsequent failures won't print out ugly error strings. Signed-off-by: Chris Lalancette <[email protected]>

Make sure to return errors in _rcl_parse_resource_match.

d46b8f4

That is, if rcl_lexer_lookahead2_expect() returns an error, we should pass that along to higher layers rather than just ignoring it. Signed-off-by: Chris Lalancette <[email protected]>
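A minimal sketch of propagating the result instead of discarding it. The toy lexer, `expect()`, `parse_match()`, and the `RET_*` codes are hypothetical; in rcl the call is `rcl_lexer_lookahead2_expect()`, and `RET_WRONG_LEXEME` mirrors `RCL_RET_WRONG_LEXEME`.

```c
/* Hypothetical return codes in the style of rcl_ret_t. */
typedef int ret_t;
#define RET_OK 0
#define RET_WRONG_LEXEME 1

/* Toy lexer: a string of single-character tokens and a cursor. */
typedef struct {
  const char * tokens;
  int pos;
} lexer_t;

/* Consume the next token if it matches, otherwise report an error. */
static ret_t expect(lexer_t * lex, char want)
{
  if (lex->tokens[lex->pos] != want) {
    return RET_WRONG_LEXEME;
  }
  lex->pos++;
  return RET_OK;
}

/* Parse "a:b", passing any lexer error up instead of ignoring it. */
static ret_t parse_match(lexer_t * lex)
{
  ret_t ret = expect(lex, 'a');
  if (RET_OK != ret) {
    return ret;
  }
  ret = expect(lex, ':');
  if (RET_OK != ret) {
    return ret;
  }
  return expect(lex, 'b');
}
```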

Don't overwrite error by rcl_validate_enclave_name.

9ed7a91

Overwriting that error leads to ugly "overwritten" warnings. Signed-off-by: Chris Lalancette <[email protected]>

Add a comment that rmw_validate_namespace_with_size sets the error

89d072e

Signed-off-by: Chris Lalancette <[email protected]>

Make sure to reset error in rcl_node_type_cache_init.

96ef397

Otherwise we get a warning about overwriting the error from rcutils_hash_map_init. Signed-off-by: Chris Lalancette <[email protected]>

Conditionally set error message in rcl_publisher_is_valid.

4d3fcaa

Only when rcl_context_is_valid doesn't set the error. Signed-off-by: Chris Lalancette <[email protected]>
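A sketch of the conditional approach, assuming (as the commit message implies) that the context check sets an error in some failure cases but not others; here it is assumed to set one only for a NULL pointer. The error slot, `set_msg()`, `context_is_valid()`, and `publisher_is_valid()` are hypothetical stand-ins; the real code would use `rcutils_error_is_set()` and `RCL_SET_ERROR_MSG`.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical error slot standing in for the rcutils error state. */
static char g_msg[128];
static bool g_is_set = false;
static int g_overwrites = 0;

static void set_msg(const char * m)
{
  if (g_is_set) {
    g_overwrites++;  /* the "being overwritten" warning would print here */
  }
  snprintf(g_msg, sizeof(g_msg), "%s", m);
  g_is_set = true;
}

typedef struct {
  bool valid;
} context_t;

/* Assumed behavior: sets an error only for NULL, otherwise just answers. */
static bool context_is_valid(const context_t * ctx)
{
  if (ctx == NULL) {
    set_msg("context is null");
    return false;
  }
  return ctx->valid;
}

static bool publisher_is_valid(const context_t * ctx)
{
  if (!context_is_valid(ctx)) {
    /* Only describe the failure if the callee did not already do so. */
    if (!g_is_set) {
      set_msg("publisher's context is invalid");
    }
    return false;
  }
  return true;
}
```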

Don't overwrite error from rcl_node_get_logger_name.

19a61c9

It already sets the error in the failure case. Signed-off-by: Chris Lalancette <[email protected]>

Make sure to reset errors when testing network flow endpoints.

97124e3

That's because some of the RMW implementations may not support this feature, and thus set errors. Signed-off-by: Chris Lalancette <[email protected]>

Make sure to reset errors in rcl_expand_topic_name.

2d39a3e

That way we can set more useful errors for the upper layers. Signed-off-by: Chris Lalancette <[email protected]>

Cleanup wait.c error handling.

7fb850b

In particular, make sure to not overwrite errors as we get into error-handling paths, which should clean up warnings we get. Signed-off-by: Chris Lalancette <[email protected]>

Make sure to reset errors in rcl_lifecycle tests.

8c3817f

That way we won't get ugly "overwritten" warnings on subsequent tests. Signed-off-by: Chris Lalancette <[email protected]>

fujitatomoya reviewed Dec 25, 2024

rcl/src/rcl/publisher.c (resolved)

rcl/src/rcl/subscription.c (resolved)

fujitatomoya approved these changes Dec 26, 2024

fujitatomoya merged commit 39a5ba8 into rolling Dec 26, 2024
3 checks passed

clalancette deleted the clalancette/rcl-action-test-action-server-cleanup branch December 26, 2024 16:31
