impl embedded_io_async::Read for UartRx<'_, Async> is very unreliable. #3144
Comments
Thanks. I'm in the process of revising this code, and I'll see whether I can improve this. |
If you could please try #3142, that would be great. A few observations I have with that PR:
|
Tested with 143a941. The test did not indicate any errors in the data, and I found no overflows, but it looks like it stopped writing to the UART after 627 seconds...
|
Just a quick update, I restarted the test and now it has run for >900 sec with no RX errors, but with quite a lot of data corruption, probably a consequence of FifoOverflowed. |
I'm not sure what I can do about that. If wifi blocks the reader tasks for a considerable amount of time, but the writer task is writing more data than can be buffered, data will be lost. Ideally you'd have hardware flow control to throttle the writer in this case. |
I might not have been clear enough: I have no problem losing data, but reading corrupted data caused by a buffer overflow in the UART, without any indication, is bad. That is, it seems that with the patches for the UART, overflow errors are no longer reported. |
Thanks, 8ec3d29 works much better! It looks like there is now a mix: sometimes I get silent corruption and sometimes I get FifoOverflowed. I will try to improve the test application to provide more context in the logs when an error is returned or when the data seems corrupt. |
re. silently missing data, you're using
The internal TX FIFO is 128 bytes long, so it's unlikely you actually send more data than that. Please use |
Oops, you are right, I just copied the write part from examples/src/bin/embassy_serial.rs. I have been trying to improve other things in the demo project, looking for new ways to torture the UART some more, but I will probably not have any results before the weekend. We could link this issue with #3142, and if/when I find more issues I can create a new issue for that part, as in any case your fixes seem to make a huge difference in stability. |
Sounds like a plan, thanks for the reproducer and the report, it has been very helpful :) |
Bug description
We have been trying to use the UART together with WIFI on esp32s3 using esp-hal and embassy-rs, but for quite some time our impression has been that the UART implementation is very unreliable under load.
I have been trying to understand the root cause of the problem, and I am currently leaning towards timing issues related to the unusual implementation where most work is done outside the interrupt handler.
An example of problematic code in the driver:
The time it takes to copy the bytes may be substantial if the embassy task is interrupted by, e.g., WIFI tasks. I have seen read_async() return more than 200 bytes even though the HW buffer is configured to 128 bytes.
My impression is also that, depending on timing, there are cases where neither the timeout interrupt nor the FIFO-full threshold interrupt triggers.
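As a rough illustration of why the copy time matters, the sketch below estimates how quickly bytes accumulate while the reader is stalled. The baud rate of 115200 and the 8N1 framing (10 bits on the wire per byte) are assumptions for illustration and are not taken from the demo-project; the function names are hypothetical.

```rust
/// Bits on the wire per data byte, assuming 8N1 framing
/// (1 start bit + 8 data bits + 1 stop bit). This is an assumption.
const BITS_PER_BYTE: u64 = 10;

/// Microseconds needed for `fifo_bytes` bytes to arrive at `baud` bits/s.
fn fifo_fill_time_us(fifo_bytes: u64, baud: u64) -> u64 {
    fifo_bytes * BITS_PER_BYTE * 1_000_000 / baud
}

/// Number of bytes that arrive while the reader is stalled for `stall_us`
/// microseconds at `baud` bits/s.
fn bytes_arriving_during(stall_us: u64, baud: u64) -> u64 {
    stall_us * baud / (BITS_PER_BYTE * 1_000_000)
}
```

Under these assumptions a 128-byte FIFO fills in about 11 ms, and a 100 ms stall corresponds to roughly 1152 incoming bytes. A read that keeps draining the FIFO while new bytes arrive could therefore plausibly return more than the FIFO size in a single call.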
To Reproduce
I have created a demo-project to allow reproduction of the problems at https://github.com/mattiasgronlund/esp32s3-uart-stability.
The demo-project configures all three UARTs and spawns four tasks to test UART stability. It is currently tested against the main branch at acbb983.
One task writes increasingly long sequences of bytes. Each sequence starts with a byte of value 0, and each subsequent byte is one greater than the previous.
Three further tasks use one UART each and read the bytes written by the first task. Each of them uses a different RX FIFO-full threshold: 64, 32 and 16 bytes.
There is also an idle task, plus two tasks that place some load on the system by starting and stopping the WIFI connection.
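A minimal sketch of the byte pattern the writer task produces, for readers who do not want to open the demo-project. `make_sequence` is a hypothetical helper, not code from the repository, and wrapping back to 0 after 255 is an assumption about what happens once a sequence exceeds 256 bytes.

```rust
/// Build one test sequence of `len` bytes: 0, 1, 2, ...
/// Values wrap back to 0 after 255 (assumed behaviour for long sequences).
fn make_sequence(len: usize) -> Vec<u8> {
    // `i as u8` truncates to the low 8 bits, i.e. counts modulo 256.
    (0..len).map(|i| i as u8).collect()
}

/// Build the first `count` sequences with lengths 1, 2, ..., `count`.
/// The exact growth rule of the real writer task is an assumption here.
fn make_sequences(count: usize) -> Vec<Vec<u8>> {
    (1..=count).map(make_sequence).collect()
}
```

Because every byte is determined by its position, a reader can detect a dropped or corrupted byte by comparing each received value with the previous one plus one.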
Expected behavior
I would expect UART errors to happen, but not as frequently.
I would NOT expect breaks in the byte sequence unless a UART error was returned.
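The distinction between a reported gap and a silent one can be sketched as a small checker. This is not the checker from the demo-project; `SequenceChecker` and `Check` are hypothetical names. As a simplification, a byte of value 0 is always accepted as the start of a new sequence.

```rust
/// Outcome of checking one chunk returned by a successful read.
#[derive(Debug, PartialEq)]
enum Check {
    /// Every byte continued the counting pattern.
    Ok,
    /// The pattern broke, but an RX error had been reported first.
    ReportedGap,
    /// The pattern broke with no error reported: silent corruption.
    SilentGap { expected: u8, got: u8, index: usize },
}

struct SequenceChecker {
    expected: u8,
    error_pending: bool,
}

impl SequenceChecker {
    fn new() -> Self {
        Self { expected: 0, error_pending: false }
    }

    /// Call when a read returns an error such as FifoOverflowed.
    fn on_error(&mut self) {
        self.error_pending = true;
    }

    /// Call with the bytes of each successful read.
    /// Only the first break in a chunk is reported.
    fn on_read(&mut self, buf: &[u8]) -> Check {
        let mut result = Check::Ok;
        for (index, &got) in buf.iter().enumerate() {
            // 0 is accepted as the start of a new sequence (simplification).
            let in_order = got == self.expected || got == 0;
            if !in_order && result == Check::Ok {
                result = if self.error_pending {
                    Check::ReportedGap
                } else {
                    Check::SilentGap { expected: self.expected, got, index }
                };
            }
            // Resynchronise on the byte that was actually received.
            self.expected = got.wrapping_add(1);
        }
        if !buf.is_empty() {
            // The reported error only excuses a gap in the next chunk.
            self.error_pending = false;
        }
        result
    }
}
```

With a checker like this, only `SilentGap` results would count against the expectation stated above; a `ReportedGap` is data loss that the driver signalled.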
Why do we get overflow here?
4.300775 INFO Read[UART_2]: len: 15 [0]=121
14.300797 INFO Read[UART_1]: len: 2 [0]=134
14.300819 INFO Read[UART_0]: len: 8 [0]=128
14.354181 INFO Idle
14.403278 ERROR RX Error[UART_2]: FifoOverflowed
14.403388 ERROR RX Error[UART_1]: FifoOverflowed
14.403452 ERROR RX Error[UART_0]: FifoOverflowed
14.403514 INFO Read[UART_1]: len: 1 [0]=128
Idle was prioritised. read_async seems to have missed interrupts, as it failed to get rescheduled even though the previous call returned very few bytes.
Why is it reading so few bytes?
7.455290 INFO Read[UART_0]: len: 1 [0]=64
7.455310 INFO Read[UART_1]: len: 1 [0]=64
7.455331 INFO Read[UART_2]: len: 1 [0]=64
7.462510 INFO Read[UART_2]: len: 4 [0]=65
7.462532 INFO Read[UART_1]: len: 4 [0]=65
7.462554 INFO Read[UART_0]: len: 4 [0]=65
Why is there all of a sudden a long Idle period, and then garbage in the data?
30 INFO Read[UART_1]: len: 21 [0]=206
27.301362 INFO Read[UART_0]: len: 53 [0]=174
27.327795 INFO Idle
27.428930 INFO Idle
27.439505 INFO Read[UART_2]: len: 122 [0]=113
27.439520 ERROR RX[UART_2]: expected 233 got 113 at 0/122 iteration 136
27.439592 INFO Read[UART_1]: len: 123 [0]=107
27.439607 ERROR RX[UART_1]: expected 227 got 107 at 0/123 iteration 136
27.439691 INFO Read[UART_0]: len: 124 [0]=107
27.439706 ERROR RX[UART_0]: expected 227 got 107 at 0/124 iteration 136
27.444040 INFO Read[UART_0]: len: 50 [0]=103
27.444083 INFO Read[UART_1]: len: 51 [0]=102
Environment
Chip type: esp32s3 (revision v0.1)
Crystal frequency: 40 MHz
Flash size: 16MB
Features: WiFi, BLE
MAC address: 34:85:18:9b:55:60