Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

alarm handles leak, unable to schedule user alarms (TZ-1399) #507

Closed
3 tasks done
theorlangur opened this issue Dec 17, 2024 · 14 comments
Closed
3 tasks done

alarm handles leak, unable to schedule user alarms (TZ-1399) #507

theorlangur opened this issue Dec 17, 2024 · 14 comments
Labels

Comments

@theorlangur
Copy link

Answers checklist.

  • I have read the documentation ESP Zigbee SDK Programming Guide and tried the debugging tips, the issue is not addressed there.
  • I have updated ESP Zigbee libs (esp-zboss-lib and esp-zigbee-lib) to the latest version, with corresponding IDF version, and checked that the issue is present there.
  • I have searched the issue tracker for a similar issue and not found a similar issue.

IDF version.

v5.3-369-g466a392a76

esp-zigbee-lib version.

1.6.1

esp-zboss-lib version.

1.6.1

Espressif SoC revision.

ESP32H2

What is the expected behavior?

I'm implementing a presence sensor that should support direct binding. When presence is detected an 'on' command is sent to bound devices.
I expect sending commands with esp_zb_zcl_on_off_cmd_req/esp_zb_zcl_on_off_on_with_timed_off_cmd_req just work.
And it does work, the commands are sent, the light is turned on (or off).

What is the actual behavior?

However it seems that issuing such a command occupies slots for for alarms and doesn't free them. What I observe is that when I create alarms via esp_zb_scheduler_user_alarm at some point I cannot, I get a ESP_ZB_USER_CB_HANDLE_INVALID value as a handle back. I have verified that doing all the things my app is doing but without actually issueing on/off command requests - no alarm handles seem to be leaking (I was waiting for a long time and handles returned by esp_zb_scheduler_user_alarm did not have a tendency for rising). With those command requests issued - at some point I get 0 back (ESP_ZB_USER_CB_HANDLE_INVALID).
esp32h2 is configured as an ED, not a router. 2 devices are bound to it: coordinator and IKEA lamp.

Steps to reproduce.

  1. Have a ED zigbee device with an On/Off client cluster
  2. Bind another device with a server On/Off cluster to it
  3. Send on commands with some intervals
  4. Register an alarm with some timeout, on timeout - re-register it again
  5. Observe the values of returned alarm handles. They should rise with time (should be noticable at most after ~10min)

More Information.

When device ends up in this state when alarm slot handles are exhausted, when trying to send the on/off command next time - it doesn't send anything, however I observe several attribute report commands/responses.

@github-actions github-actions bot changed the title alarm handles leak, unable to schedule user alarms alarm handles leak, unable to schedule user alarms (TZ-1399) Dec 17, 2024
@xieqinan
Copy link
Contributor

Hi,

The default number of scheduler_alarm slots is 80, configurable via esp_zb_scheduler_queue_size_set(). This related slot is confirmed will be freed after the callback registered with esp_zb_scheduler_user_alarm() has been triggered. I suspect there might be another memory leak issue in the application. Could you integrate your application into a similar example and provide a simplified patch to reproduce the issue?

@agustinebrecht
Copy link

I'm having a similar problem, and my temporary solution was to restart the chip once it stops me from registering more alarms, I launch the first alarm when initializing the system and then I periodically feed a Watchdog, the moment it stops feeding the chip must be restarted, (the espresif watchdog does not work in end-device with low power, so I built one myself).

#define TIME_WDT_ZIGBEE_MS (120000) // 120s

//Call this function only once during stack initialization
void set_zigbee_periodict_wdt_reset(uint8_t value){
AppAppHandlerSendSKeepAliveZigbee(); //feed the watchdog
esp_zb_scheduler_alarm((esp_zb_callback_t)set_zigbee_periodict_wdt_reset, 0, TIME_WDT_ZIGBEE_MS);
}

@theorlangur
Copy link
Author

theorlangur commented Dec 18, 2024

@xieqinan, I think I got it.
I was wrong in my initial assumptions. It has nothing to do with sending commands. The real issue seems to be some problems when cancelling a running alarm with esp_zb_scheduler_user_alarm_cancel

here's a snippet that allowed me to repro the issue:

static esp_zb_user_cb_handle_t g_CmdTimeout = ESP_ZB_USER_CB_HANDLE_INVALID;
static esp_zb_user_cb_handle_t g_TimeoutKiller = ESP_ZB_USER_CB_HANDLE_INVALID;

static void on_cmd_timeout(void *param);

static void faster_cancel_timeout(void *param)
{
    if (g_CmdTimeout != ESP_ZB_USER_CB_HANDLE_INVALID)
    {
        esp_err_t res = esp_zb_scheduler_user_alarm_cancel(g_CmdTimeout);
        ESP_LOGI(TAG, "Fast-Canceling cmd wait timer 0x%x with res 0x%x", g_CmdTimeout, res);
        g_CmdTimeout = ESP_ZB_USER_CB_HANDLE_INVALID;
    }else
    {
        ESP_LOGI(TAG, "Fast-Canceling timeout but cmd timer is not running");
    }
}

static void big_interval_timer(void *param)
{
    g_CmdTimeout = esp_zb_scheduler_user_alarm(on_cmd_timeout, NULL, 500);
    ESP_LOGI(TAG, "Armed timeout alarm. Handle=0x%x", g_CmdTimeout);
    g_TimeoutKiller = esp_zb_scheduler_user_alarm(faster_cancel_timeout, NULL, 200);
    ESP_LOGI(TAG, "Armed faster timeout killer alarm. Handle=0x%x", g_TimeoutKiller);
    if (g_TimeoutKiller == ESP_ZB_USER_CB_HANDLE_INVALID)
    {
        ESP_LOGI(TAG, "BINGO!!! Got invalid timer killer handle");
    }

    esp_zb_user_cb_handle_t h = esp_zb_scheduler_user_alarm(big_interval_timer, NULL, 5000);
    static esp_zb_user_cb_handle_t g_Max = ESP_ZB_USER_CB_HANDLE_INVALID;
    if (h > g_Max)
    {
        g_Max = h;
        ESP_LOGI(TAG, "Got another Alarm handle bigger than previous: 0x%x", h);
    }else if (h == ESP_ZB_USER_CB_HANDLE_INVALID)
    {
        ESP_LOGI(TAG, "BINGO!!! Got invalid alarm handle");
    }else
    {
        ESP_LOGI(TAG, "allocated alarm handle for 3 sec timer: 0x%x", h);
    }

    if (g_CmdTimeout > g_Max)
    {
        g_Max = g_CmdTimeout;
        ESP_LOGI(TAG, "Got another (timeout) Alarm handle bigger than previous: 0x%x", g_CmdTimeout);
    }else if (g_CmdTimeout == ESP_ZB_USER_CB_HANDLE_INVALID)
    {
        ESP_LOGI(TAG, "BINGO!!! Got invalid timeout alarm handle");
    }else
    {
        ESP_LOGI(TAG, "allocated timeout alarm handle for 0.5 sec timer: 0x%x", g_CmdTimeout);
    }
}

void on_cmd_timeout(void *param)
{
    ESP_LOGI(TAG, "Last sent command timed out");
    g_CmdTimeout = ESP_ZB_USER_CB_HANDLE_INVALID;
}

Explanations:
from a esp_zb_app_signal_handler on a ESP_ZB_BDB_SIGNAL_DEVICE_FIRST_START/ESP_ZB_BDB_SIGNAL_DEVICE_REBOOT a big_interval_timer is started.
This functions creates a timer for 500ms with a callback on_cmd_timeout, stores a returned handle into a global g_CmdTimeout. It also creates another shorted timer for 200ms (and stores the handle into g_TimeoutKiller, but that's not really important) with a callback faster_cancel_timeout.
As it's written on_cmd_timeout is never going to be called as it'll always be cancelled by the faster_cancel_timeout.
big_interval_timer keeps tracking of the value of the alarm handles and prints a handle value if it rises. It also re-schedules itself to run again after 5s.
After relatively short time I've run out of handles. If no cancellation is performed - no leakage happens (or so it seems).

In my original app I indeed have a timeout timer for commands if I don't get status report or response in some defined time and that timer gets cancelled as soon as I get the response.

The question is: am I using the cancel routine wrong? Or does it simply malfunction?

@agustinebrecht , lol, I'm doing something somewhat similar, although in my case I'm tracking the allocated handles and if a newly allocated handle has a value bigger than some defined threshold (in my case: 0x33) - I'm setting the flag that a restart is needed and performing a restart as soon as no occupancy is registered (I'm dealing with a presence sensor in my case, so...) to be less disruptive for the users))

@xieqinan
Copy link
Contributor

@theorlangur,

This is indeed a buffer leak issue in esp_zb_scheduler_user_alarm_cancel(). I believe there are two possible solutions to address it:

  • Free the esp_zb_user_cb_handle_t in the application. Please note that zb_buf_free() is not a safe API and should be removed once the issue is resolved in the next release.
#include "zboss_api.h"
static void faster_cancel_timeout(void *param)
{
    if (g_CmdTimeout != ESP_ZB_USER_CB_HANDLE_INVALID)
    {
        esp_err_t res = esp_zb_scheduler_user_alarm_cancel(g_CmdTimeout);
        ESP_LOGI(TAG, "Fast-Canceling cmd wait timer 0x%x with res 0x%x", g_CmdTimeout, res);
        if (res == ESP_OK && g_CmdTimeout != ESP_ZB_USER_CB_HANDLE_INVALID) {
             zb_buf_free(g_CmdTimeout);
        }
        g_CmdTimeout = ESP_ZB_USER_CB_HANDLE_INVALID;
    }else
    {
        ESP_LOGI(TAG, "Fast-Canceling timeout but cmd timer is not running");
    }
}
  • Use the esp_zb_scheduler_alarm() and esp_zb_scheduler_alarm_cancel() instead.
    The esp_zb_scheduler_user_alarm() is designed to support a user-defined pointer parameter in the alarm. Since you are not utilizing this feature, the esp_zb_scheduler_alarm() should suffice for your requirements.

@theorlangur
Copy link
Author

@xieqinan , hey, thanks for investigating and for the tip
I mean, I do use that void* parameter in the callback, it's just for the illustration purposes it was irrelevant. But since in my case the functionality is anyways incapsulated in a class, I can go either way.
And then later I'll remove the workaround once the issue is fixed.
Is there anything special I need to take into account when using zb_buf_free?

@theorlangur
Copy link
Author

@xieqinan , if I use zb_buf_free now and later it'll get fixed will I be then potentially double-freeing the memory?

@theorlangur
Copy link
Author

@xieqinan I'm getting linker errors when trying to use zb_buf_free
undefined reference to 'zb_buf_free_func(unsigned char)'
something special I need to provide in the config?

@xieqinan
Copy link
Contributor

undefined reference to 'zb_buf_free_func(unsigned char)'

Please refer to #507 (comment), you should add the #include "zboss_api.h" header file.

@xieqinan
Copy link
Contributor

if I use zb_buf_free now and later it'll get fixed will I be then potentially double-freeing the memory?

Yes, it will. So you need to remove this workaround once the issue is fixed.

@theorlangur
Copy link
Author

undefined reference to 'zb_buf_free_func(unsigned char)'

Please refer to #507 (comment), you should add the #include "zboss_api.h" header file.

I did
I didn't get compiler errors, but linker, when I attempted to link
From what I can tell, the relevant zboss*.a files were being linked, but apparently zb_buf_free (which seems to be a define to zb_buf_free_func) is declared but not defined in any of the libraries that I'm linking

@xieqinan
Copy link
Contributor

Why? It works fine on my end. Are you using a C++ compiler? If so, you may need to add

#ifdef __cplusplus
extern "C" {
#endif

#include "zboss_api.h"

#ifdef __cplusplus
}
#endif

@theorlangur
Copy link
Author

Why? It works fine on my end. Are you using a C++ compiler? If so, you may need to add

#ifdef __cplusplus
extern "C" {
#endif

#include "zboss_api.h"

#ifdef __cplusplus
}
#endif

that's it, thx
Yes, I'm using C++

@xieqinan
Copy link
Contributor

Hi @theorlangur ,

Yes, it will. So you need to remove this workaround once the issue is fixed.

I think this issue has been fixed in esp-zigbee-sdk v1.6.2, please test again with the new libs. If it works, please consider closing this ticket.

@theorlangur
Copy link
Author

@xieqinan
verified, doesn't leak anymore. closing.
thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants