-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
recv_into fix for esp32spi socket implementation #168
Conversation
After rerunning the CI, it looks like |
# If no bytes are waiting, raise an OSError for good measure. | ||
# Some implementations of recv_into do this on their own (like CPython), | ||
# but we should be prepared if one doesn't (looking at you esp23spi...) | ||
raise OSError(errno.ETIMEDOUT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aaftonbladet Could you clarify why this is being raised here instead of within the ESP32SPI implementation on L1141? We already raise this exception on L1147.
Also - if you're intending to replace recv with recv_into, why is this segment of code modified instead of L1143?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "esp32spi implementation" is just a fallback if the socket object doesn't have recv_into, as old versions of esp32spi didn't have recv_into. Check is on L344. Since recv_into now exists in esp32spi, the regular implementation is used instead of the backwards compatible one. The regular one doesn't check that data was actually read - it relies on the callee raising an OSError. Compare that to the backwards compatible one which was made with esp32 in mind, it raises an OSError, because esp32spi doesn't.
My patch makes the behavior consistent in both implementations, we check ourselves that data was read, if not, throw an OSError as the callee should've.
if you're intending to replace recv with recv_into
recv is left as a fallback if the socket object doesn't have recv_into. I suggested dropping the recv/"esp impl" entirely as esp now has recv_into, but I didn't want to pull the trigger on that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, now I understand. Thank you for clarifying.
recv is left as a fallback if the socket object doesn't have recv_into. I suggested dropping the recv/"esp impl" entirely as esp now has recv_into, but I didn't want to pull the trigger on that.
Would you be interested in adding this to the PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking of this a bit more, I am not convinced this is the way to go. Firstly, this adds yet another exception to the surface which the library consumers will have to deal with, at least in this code path. If I am not mistaken, the library does not catch OSError in the callers of _sock_exact_recv()
. Secondly, this seems to trump the library receive timeout. If the socket times out, the library should probably wait up to recv_timeout
; this ordering (recv_timeout > socket_timeout
) is actually enforced in __init()__
.
Lastly, I don't immediately see how exactly this fixes the the stuck loop() in #148.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank's @vladak for the input. Will wait for @aaftonbladet 's response before moving forward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Firstly, this adds yet another exception to the surface which the library consumers will have to deal with, at least in this code path. If I am not mistaken, the library does not catch OSError in the callers of
_socket_exact_recv()
.
All callers of _socket_exact_recv()
should expect it to throw. As for whether it's an OSError, from what I can see, we filter exceptions only in one place, and that's around L857 in _wait_for_msg()
. Depending on whether there's a custom exception class for timeouts in the socket module, we catch that one as opposed to the generic OSError.
With that, a bug could arise if the socket module has its own timeout class. We'd try to catch that one instead of the generic OSError. In comes my patch, where it's possible that an OSError is still thrown, which then wouldn't be caught. Maybe we can check that there's no custom timeout class before throwing the error? If there's a custom timeout class, it's likely the underlying socket will handle timeouts itself rendering my patch redundant anyhow.
And yes, it's possible library consumers would be exposed to a new type of exception. But who's this an issue for? If you call _socket_exact_recv
right now and you're not catching exceptions, you'll crash. If you catch only OSError, my patch is using the same exception. If you catch the socket module's timeout exception, you're on a platform where we, after implementing the check I proposed in the previous paragraph, aren't introducing any new behavior.
Secondly, this seems to trump the library receive timeout
The socket implementation is told to use the given receive timeout; search for settimeout
in the code. If there's an issue with the receive timeout not being properly set, it already exists and my code won't affect them. Though I must say it is strange. My esp32spi socket, despite the MQTT receive timeout being zero, still times out after one second. Maybe there's bug at play.
Lastly, I don't immediately see how exactly this fixes the the stuck loop() in #148.
Right, sorry for not clarifying. The issue lies in us going past L966 because no exception is thrown, and then eventually into the loop at L971. Then we're stuck in that loop for 60 seconds, which is consistent with user reports. To compensate for recv_into()
not throwing on esp32spi, we throw the exception ourselves which keeps us from getting stuck in the loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When traversing the code submitted to various issues, there is often the following pattern:
try:
io.loop()
except (MMQTTException) as e:
print("MQTTException: \n", e)
time.sleep(300)
continue
so the OSError will cause trouble for these. Personally, I don't see a reason for the OSError exception to shoot through as it is implementation detail of the MQTT library.
Looking at the loop() implementation, it will cause a problem there too. The while cycle seems to be designed around the receive timeout, yet it is strange how the function deals with timeouts in general. It overrides the socket timeout, which I don't think it should do (it also avoids the socket timeout < receive timeout check). It cycles up to receive timeout and it seems to me that the intention is that the timeout argument is meant as the overall wait time, no matter what socket timeout is set to.
If socket timeout in CPython raises OSError, this will propagate via _wait_for_msg() to loop(), breaking this behavior.
I think that the timeout argument for loop() should override the receive timeout and the function should avoid mingling with the socket timeout.
Someone noticed that too: #68 (comment)
If we go with this change, I think the _wait_for_msg() needs to catch the OSError, to make both if branches uniform (return None on socket timeout):
Adafruit_CircuitPython_MiniMQTT/adafruit_minimqtt/adafruit_minimqtt.py
Lines 1027 to 1040 in 40b9096
# CPython socket module contains a timeout attribute | |
if hasattr(self._socket_pool, "timeout"): | |
try: | |
res = self._sock_exact_recv(1) | |
except self._socket_pool.timeout: | |
return None | |
else: # socketpool, esp32spi | |
try: | |
res = self._sock_exact_recv(1) | |
except OSError as error: | |
if error.errno in (errno.ETIMEDOUT, errno.EAGAIN): | |
# raised by a socket timeout if 0 bytes were present | |
return None | |
raise MMQTTException from error |
Before that change, |
@@ -1119,6 +1119,12 @@ def _sock_exact_recv(self, bufsize: int) -> bytearray: | |||
rc = bytearray(bufsize) | |||
mv = memoryview(rc) | |||
recv_len = self._sock.recv_into(rc, bufsize) | |||
if recv_len == 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there some other case than timeout that could lead to the 0 being returned from recv_into()
, regardless of the underlying implementation ? (ESP32, CPython, ...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because if we are not sure about that, it might be prudent to augment the recv_len == 0
check to verify this is actually esp32spi.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That depends on the underlying implementation, so I wouldn't know. We could add check that receive_timeout seconds have passed before raising the exception, but that would require us to correctly define the receive timeout. As I mentioned earlier, even though the timeout passed to socket.settimeout is zero, my esp32spi socket waits a second before timing out.
@@ -1119,6 +1119,12 @@ def _sock_exact_recv(self, bufsize: int) -> bytearray: | |||
rc = bytearray(bufsize) | |||
mv = memoryview(rc) | |||
recv_len = self._sock.recv_into(rc, bufsize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is another call to recv_into()
in the while
cycle below (L1134). Does it need same treatment ? (if we can agree on the treatment being correct)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say no, it doesn't need it. The first check is just to see if there's any data pending in the socket. When we reach the while loop, we already know we have things to read.
I might've gotten the exact commit mixed up, it could've been d60ab23. I tested my way back and forth until I found which version it broke in. |
I think you'd need to dig even further to history. If I remember correctly, the duplicate definition removal did not have any effect, it merely removed dead code. |
Is this fixed yet? I am trying to use a Matrix Portal M4 and I can't really update the display if the loop() is blocking all the time, especially if I am not getting messages very often. I messed around with asyncio a little, but it looks like it hangs the whole thing either way. |
@alison-gravley I believe if you are using the absolutely latest version of this library, and the absolute latest version of the ESP32SPI library in conjunction with each other the My understanding is that the changes from this PR won't necessarily be needed, because we instead made changes inside of ESP32SPI to bring it's behavior more inline with CPython sockets. So this could pototentially be closed at this point unless I've misunderstood some part of it. I cannot speak to asyncio support, I did a fair amount of testing when preparing to merge / release the latest changes across many different devices and use-cases, but asyncio is something I didn't touch at all during testing, I'm not sure how it interacts with |
I should be on latest release of CircuitPython, 8.2.0. Do I need to change to a bleeding edge? I am not sure where I find the version of esp32spi. From lib folder: boot_out.txt says: info_uf2.txt: |
@alison-gravley Ahh, I forgot the detail that esp32spi is frozen into the build. I think there has not been an update in the core to bump to the newest version so the frozen in one is still on the prior release. Which does make it a little bit trickier, but still do-able. For now the easiest way to get it working is get the latest release of esp32spi from the assets downloads on this release page: https://github.com/adafruit/Adafruit_CircuitPython_ESP32SPI/releases/tag/6.0.0 get the 8.x mpy zip, download and extract it. Inside find the lib folder, and inside of that should be the So your drive should look something like this:
Putting the latest esp32spi in the root of CIRCUITPY will make it so that this version gets used instead of the one frozen in. At some point the core one will get updated to the latest and then this process will no longer be necessary, but I think for now it is. |
@FoamyGuy, thank you so much! I finally have scrolling text and can move on to fun animations. I actually tried all of this in Arduino first, but ran into a ton of issues between adafruit libraries and changes with esp32 breaking things. I had to do some hacky things to even get the project to compile and run, but I couldn't quite get it to subscribe to the broker without getting an error. Off-topic here, but I keep hitting a lot of bad luck on when I dig these esp32-s2/s3 boards out. Every time I think it is fun to play with such and such library I run in to a problem and find a recent GitHub post explaining the issue. With both Circuit Python and Arduino. I am just thankful all of the developers have been great and I appreciate all of you for helping your end users who have a mish-mash of capabilities and experience levels. |
Well, three days down the rabbit hole after being bitten by this on my PyPortal. At least it was educational. Pulling the latest esp32spi and putting it at the top level of CircuitPython has fixed the problem. What's the next step to encourage pulling the updated esp32spi into the CircuitPython release itself? update: 8.2.3 still doesn't have the updated esp32spi library |
This was painful. |
Wow, you solved in hours... Better than my several days. |
@aaftonbladet and @dhalbert - I might recommend closing this, and if the esp32spi should raise on a condition like this, a PR should be added there so we don't have special case code in this library |
Closing this for now. I believe it should not be an issue any more. I think the frozen in versions were updated a while ago to support the new way of working. If anyone else runs into this on newer versions we can re-open and dig further again. |
Resolves #148 and #165
The esp32spi socket implementations of
recv
andrecv_into
do not raise an OSError if the read times out, deviating from the behavior in CPython. In the past we accounted for this by checking whether anything was read when callingrecv
, if not, raise an OSError ourselves. However, in c334c81 we started to utilizerecv_into
rather thanrecv
, if it was available. The new version does not have the "raise OSError ourselves if nothing was read"-workaround, because it was assumed that esp32spi wouldn't haverecv_into
and the old version would be used.Ideally, this would be fixed in esp32spi for both functions. However, I think they'd be hesitant to except such a radical change - it'd likely break projects relying on the old implementation. Hence I believe we should take it upon ourselves to fix it.
Maybe it's also appropriate to entirely drop support for socket implementations that lack
recv_into
. Our alternative implementation utilizingrecv
is labled "esp32 fix", but esp32spi sockets have had support forrecv_into
for more than a year. I did not include this change in my PR, but I think it should be considered.