-
Notifications
You must be signed in to change notification settings - Fork 176
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(serial-reopening): bug that prevented the serial link to be reused after closing a session #1624
Conversation
…opened over a serial link Signed-off-by: Gabriele Baldoni <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gabriele, thank you so much for fixing this bug! I have tested the updated version with zenoh-pico/RP2350/Arduino and confirm there is no manual re-start of the zenoh router required for the microprocessor to start a new session! Now we can continue porting the zenoh-pico library to the STM32/ThreadX platform.
However, according to the practical tests, the current fix still does not make the serial connection as efficient as it should be.
Let me please explain.
The robot is connected to the zenoh router normally, aiming to access ROS2's turtle topics:
2024-12-02T18:00:36.865912Z DEBUG acc-0 ThreadId(12) zenoh_transport::unicast::manager: New transport opened between cedbdf0bcb5dfeffc67b643ca0f80f07 and d9fcf12b3fffa768d878b4042946cf2d - whatami: client, sn resolution: U32, initial sn: 161578089, qos: false, multilink: false, lowlatency: false
2024-12-02T18:00:36.865928Z DEBUG acc-0 ThreadId(12) zenoh_transport::unicast::establishment::accept: New transport link accepted from d9fcf12b3fffa768d878b4042946cf2d to cedbdf0bcb5dfeffc67b643ca0f80f07: TransportLinkUnicast { link: Link { src: serial//dev/ttyACM0, dst: serial/f33dcabb-919a-47df-b7a2-d1b0033a981e, mtu: 1500, is_reliable: false, is_streamed: false }, config: TransportLinkUnicastConfig { direction: Inbound, batch: BatchConfig { mtu: 1500, is_streamed: false, is_compression: false }, priorities: None, reliability: None } }
2024-12-02T18:00:36.873634Z DEBUG rx-1 ThreadId(15) zenoh::net::routing::dispatcher::resource: Register resource turtle1/cmd_vel
2024-12-02T18:00:36.873740Z DEBUG rx-1 ThreadId(15) zenoh::net::routing::dispatcher::pubsub: Face{5, d9fcf12b3fffa768d878b4042946cf2d} Declare subscriber 1 (turtle1/cmd_vel)
Everything goes in a normal way and the robot is able to receive data from the teleop program.
Then, once the microcontroller is restarted, it immediately tries connecting to the zenoh router, over and over again, so to get connected as soon as possible. Is not it a normal behavior? I think, yes. For many reasons, including safety, we should let the robot establish connection to the router as soon as possible. However, currently we get an infinite loop:
2024-12-02T18:00:42.229102Z DEBUG rx-1 ThreadId(15) zenoh_transport::unicast::universal::rx: Transport: d9fcf12b3fffa768d878b4042946cf2d. Message handling not implemented: TransportMessage { body: InitSyn(InitSyn { version: 9, whatami: Client, zid: d9fcf12b3fffa768d878b4042946cf2d, resolution: Resolution(10), batch_size: 2048, ext_qos: None, ext_qos_link: None, ext_auth: None, ext_mlink: None, ext_lowlatency: None, ext_compression: None }) }
2024-12-02T18:00:45.466414Z DEBUG rx-1 ThreadId(15) zenoh_transport::unicast::universal::rx: Transport: d9fcf12b3fffa768d878b4042946cf2d. Message handling not implemented: TransportMessage { body: InitSyn(InitSyn { version: 9, whatami: Client, zid: d9fcf12b3fffa768d878b4042946cf2d, resolution: Resolution(10), batch_size: 2048, ext_qos: None, ext_qos_link: None, ext_auth: None, ext_mlink: None, ext_lowlatency: None, ext_compression: None }) }
2024-12-02T18:00:48.550930Z DEBUG rx-1 ThreadId(15) zenoh_transport::unicast::universal::rx: Transport: d9fcf12b3fffa768d878b4042946cf2d. Message handling not implemented: TransportMessage { body: InitSyn(InitSyn { version: 9, whatami: Client, zid: d9fcf12b3fffa768d878b4042946cf2d, resolution: Resolution(10), batch_size: 2048, ext_qos: None, ext_qos_link: None, ext_auth: None, ext_mlink: None, ext_lowlatency: None, ext_compression: None }) }
2024-12-02T18:00:50.840401Z DEBUG rx-1 ThreadId(15) zenoh_transport::unicast::universal::rx: Transport: d9fcf12b3fffa768d878b4042946cf2d. Message handling not implemented: TransportMessage { body: InitSyn(InitSyn { version: 9, whatami: Client, zid: d9fcf12b3fffa768d878b4042946cf2d, resolution: Resolution(10), batch_size: 2048, ext_qos: None, ext_qos_link: None, ext_auth: None, ext_mlink: None, ext_lowlatency: None, ext_compression: None }) }
And so on, without any luck for the robot to reconnect to the zenoh router, other than to wait 20 seconds (or more, if we are trying to connect with increasing intervals).
This is very impractical.
Question: is it possible to start a new connection, say, immediately when the InitSyn message is received?
Thanks again!
Hi @ur8us, what you are experiencing is the session timeout, the microcontroller closes its serial without closing the sesssion. This has nothing to do with the serial link itself, its a general Zenoh configuration that applies to all links. |
Said that, we may still want to support some kind of serial-level (i.e. below zenoh) connection reset to support the use case. |
One other question - I note that it can only maintain one Serial connection. However its not clear is that one serial connection per a) Serial port available on the board (e.g. one computer has 3 full duplex serial ports like the raspberry pi you can maintain 3 connections) For sure we are going to need to operate with all the serial ports on our Raspberry Pi setup and on other future set ups that we are working on. Its very common to have many low speed peripherals and one of our planned system needs 6 serial ports. |
It's option a). It's one serial connection per serial port.
|
Hi! Let me please struggle for microcontroller's rights :-) As my experiment has discovered, the current fix does not reset the session by timeout, as desired. Instead, a deadlock occurs:
Workarounds: force the microcontroller send a "close session" message before resetting -or- wait 20 seconds or more between trying to re-connect to the zenoh router. The first one is impossible, the second one is impractical. The right way, as I see it: by receiving InitSyn, the zenoh router should immediately drop the old session to be prepared for the new one. |
Agree. But as I said in my previous comment, we would need to add some kind of serial-level (i.e. below zenoh) connection reset to support the use case. So, the right way is handling it is at serial level because it's a serial related issue. |
Signed-off-by: Gabriele Baldoni <[email protected]>
Fixed the issue with the timeout Changes from the PR on z-serial needs to be ported to zenoh pico for interoperability purposes: @sashacmc can you do this? |
Signed-off-by: Gabriele Baldoni <[email protected]>
Signed-off-by: Gabriele Baldoni <[email protected]>
Signed-off-by: Gabriele Baldoni <[email protected]>
Signed-off-by: Gabriele Baldoni <[email protected]>
Signed-off-by: Gabriele Baldoni <[email protected]>
Signed-off-by: Gabriele Baldoni <[email protected]>
Signed-off-by: Gabriele Baldoni <[email protected]>
Signed-off-by: Gabriele Baldoni <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested with Pico in:
eclipse-zenoh/zenoh-pico#823
Signed-off-by: Gabriele Baldoni <[email protected]>
The CI failures are because z-serial is not yet released. |
Signed-off-by: Gabriele Baldoni <[email protected]>
Signed-off-by: Gabriele Baldoni <[email protected]>
Signed-off-by: Gabriele Baldoni <[email protected]>
Fixing a bug that prevented the serial link to be reused after closing a session.
Serial link remains 1-to-1 and exclusive, meaning that if a session is opened on a serial device no other sessions can use it.
once the first session closes, then another session can connect to the serial listener.