This repository has been archived by the owner on Dec 8, 2022. It is now read-only.

Sometimes an OTA update will fail for no apparent reason #2306

Closed
TheIronNinja opened this issue Jul 23, 2020 · 6 comments

@TheIronNinja

TheIronNinja commented Jul 23, 2020

The title pretty much describes it. I'm using an ESP32 with a custom application that uses OTA with a factory partition (I've seen that there was a problem with using factory partitions, but it was fixed with #1429, so it should be fine). The first attempt at downloading the update seems to always fail at approximately the same point, but forcing a second attempt by restarting the device seems to work.

This is the log that I'm seeing during the OTA update process:

553 6225 [OTA Agent Task] [prvExecuteHandler] Called handler. Current State [WaitingForFileBlock] Event [ReceivedFileBlock] New state [WaitingForFileBlock]
554 6229 [OTA Agent Task] [prvIngestDataBlock] Received file block 131, size 4096
555 6230 [OTA Agent Task] [prvIngestDataBlock] Remaining: 87
556 6230 [OTA Agent Task] [prvExecuteHandler] Called handler. Current State [WaitingForFileBlock] Event [ReceivedFileBlock] New state [WaitingForFileBlock]
557 6233 [OTA Agent Task] [prvIngestDataBlock] Received file block 132, size 4096
558 6234 [OTA Agent Task] [prvIngestDataBlock] Remaining: 86
559 6234 [OTA Agent Task] [prvExecuteHandler] Called handler. Current State [WaitingForFileBlock] Event [ReceivedFileBlock] New state [WaitingForFileBlock]
I (62682) ota_pal: prvPAL_SetPlatformImageState, 3
W (62682) ota_pal: Set image as invalid!
I (62682) esp_ota_ops: aws_esp_ota_get_boot_flags: 1
W (62692) esp_ota_ops: otadata partition is invalid, factory/ota_0 is boot partition
E (62702) ota_pal: currently executing firmware not marked as valid, abort
560 6238 [OTA Agent Task] [prvStopRequestTimer] Stopping request timer.
561 6238 [OTA Agent Task] [prvProcessDataMessage] Aborting due to IngestResult_t error -8
562 6240 [OTA Agent Task] [prvPublishStatusMessage] Msg: {"status":"FAILED","statusDetails":{"reason":"0x27000000: 0xfffffff8"}}
563 6250 [iot_thread] Error: No OTA data buffers available.
564 6250 [OTA Agent Task] [prvPublishStatusMessage] 'FAILED' to $aws/things/xxxxx/jobs/AFR_OTA-xxxxxxxxxxxxxxx/update

Sometimes there will be multiple "Error: No OTA data buffers available." messages.
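
For readers trying to interpret the status message in the log above: the second field of the reason string, 0xfffffff8, is the same value as the "IngestResult_t error -8" line once it is read as a signed 32-bit integer. The following is a minimal sketch of that arithmetic only. The helper names (`to_signed32`, `parse_ota_reason`) are hypothetical and not part of the OTA agent, and the meaning of the first field (0x27000000) is not decoded here.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Reinterpret a raw 32-bit pattern as a two's-complement signed value
 * without relying on implementation-defined casts. */
static int32_t to_signed32(uint32_t raw)
{
    if (raw & 0x80000000u) {
        /* Negative: invert the bits, negate, then subtract one. */
        return -(int32_t)(uint32_t)(~raw) - 1;
    }
    return (int32_t)raw;
}

/* Hypothetical helper: split a reason string such as
 * "0x27000000: 0xfffffff8" into its two 32-bit fields.
 * Returns 0 on success, -1 if the string does not match that shape. */
static int parse_ota_reason(const char *reason, uint32_t *code, int32_t *sub)
{
    uint32_t raw_code = 0;
    uint32_t raw_sub = 0;

    if (sscanf(reason, "0x%" SCNx32 ": 0x%" SCNx32, &raw_code, &raw_sub) != 2) {
        return -1;
    }
    *code = raw_code;
    *sub = to_signed32(raw_sub);
    return 0;
}
```

Calling `parse_ota_reason("0x27000000: 0xfffffff8", &code, &sub)` leaves `sub` equal to -8, which ties the published "FAILED" reason back to the ingest error reported two lines earlier.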

There are some warning/error messages, but I'm not sure how to interpret them. I've tried increasing the number of OTA data buffers in aws_ota_agent_config.h, but that works much worse, so I've left it at 2 for now.

It looks like a memory problem, but again, I'm not sure how to approach it if that is the case, so any help is appreciated.

Thank you!

@lundinc2
Contributor

Hello @TheIronNinja,

Thank you for this report, we have begun investigating this issue.

@pvyawaha
Contributor

Hello,

Thank you for sharing the logs. It seems that while OTA is in progress, the malloc before decoding the file block is failing. Memory allocation on the ESP32 is provided by Espressif. To confirm this, can you please set a breakpoint or add a debug log here: Code Line.

Another way to confirm this is to try OTA over HTTP, which does not require packet decoding. Please refer to OTA Over HTTP.
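
The debug log suggested above could take roughly the following shape. This is only a sketch in portable C, not the code at the linked line: `logged_alloc`, `alloc_fn_t`, `failed_allocs`, and `always_fail` are hypothetical names introduced for illustration, and the allocator is passed in as a function pointer so that a failure can be simulated off-target.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical allocator hook; pass malloc in normal use. */
typedef void *(*alloc_fn_t)(size_t);

/* Running count of failed allocations, useful when the failure is
 * intermittent and only shows up partway through a download. */
static unsigned int failed_allocs = 0;

/* Allocate through `alloc` and log the request size and call site
 * whenever the allocation fails. */
static void *logged_alloc(alloc_fn_t alloc, size_t size, const char *where)
{
    void *p = alloc(size);

    if (p == NULL) {
        failed_allocs++;
        /* On an ESP32 build, this would also be a natural place to print
         * the free heap (assumption: ESP-IDF's esp_get_free_heap_size()
         * is available in the project). */
        fprintf(stderr, "[%s] allocation of %zu bytes failed (failure #%u)\n",
                where, size, failed_allocs);
    }
    return p;
}

/* Test double that always reports out-of-memory. */
static void *always_fail(size_t size)
{
    (void)size;
    return NULL;
}
```

For example, `logged_alloc(malloc, 4096, "decode")` behaves like a plain `malloc(4096)` but leaves a log line naming the call site if the heap is exhausted, which is the information needed to confirm the diagnosis.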

@TheIronNinja
Author

Hi,

I've printed the return code of the function you pointed me to and, sure enough, it is CborErrorOutOfMemory.

I've also noticed that this doesn't happen if I don't subscribe/publish to a couple of extra MQTT topics in my custom application. I assume those take away memory that OTA needs, so the malloc fails and causes the problem.
Given that, I think I'll reconsider how we structure the topics so we can reduce the number of subscriptions while keeping the functionality we want, and also simplify some of the code to be more memory-efficient.

Regarding OTA using HTTP: is this a good long-term solution for this problem? Won't it cause other memory issues, considering the amount of RAM it requires to run on the ESP?

Thanks for the help.

@lundinc2
Contributor

Hello @TheIronNinja,

@mahavirj gave some great tips for reducing the memory footprint on the ESP32 in this thread #2179.

Here are some additional threads about steps that can be taken to tune the TLS connection and reduce its memory footprint: #1250 and #1468.

Memory is tricky, and depending on your use case I hope these threads help you. I'll defer your question about OTA over HTTP to @pvyawaha, as he is much more knowledgeable about OTA.

@pvyawaha
Contributor

Hello,

Did the memory optimization techniques help in your application? You are correct about OTA over HTTP: the file download will be over HTTP, but the control operations to the service will still use an MQTT connection, so this will require more memory. Do you also use BLE on the ESP32?

@pvyawaha
Contributor

Hello,

Closing this issue now. Please reopen it, or create a new issue for other topics, if required.


3 participants