VoIP example with 16Khz HAL i2s clock: `AV_STREAM: audio decoder ringbuf write timeout` (AUD-5988) #1347

steve-nomo · 2025-01-13T21:27:45Z

Environment

Audio development kit: ESP32-LyraT
Audio kit version: ESP32-LyraT v4.3
Module or chip used: ESP32-WROVER-E
IDF version: ESP-IDF v5.3.1 (with ADF 2.7's FreeRTOS patch)
ADF version: v2.7
Build system: idf.py
Running log: All logs from power-on to problem recurrence
Compiler version: xtensa-esp-elf-gcc (crosstool-NG esp-13.2.0_20240530) 13.2.0
Operating system: Linux
Using an IDE?: Yes, VS Code
Power supply: USB connector on LyraT plugged in to a 5.0 V/1.0 A transformer

Problem Description

Using the voip_example, setting av_stream_config.hal.audio_samplerate = 16000 causes write timeouts and watchdogs.

The reason I want to do this is because I have stored audio files recorded at 16kHz and I want to play them at 16kHz. I also need to support VoIP SIP calls running at 8kHz.

In IDF 4.x this worked: the I2S would run at 16kHz, and a filter in the SIP pipeline would downsample the microphone audio to 8kHz for SIP and upsample incoming SIP audio to 16kHz for playback through the speaker.
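
To illustrate the 2:1 rate conversion described above, here is a minimal sketch in plain C. It is not the ADF resample filter; the function names `downsample_by_2` and `upsample_by_2` are hypothetical, and pair averaging and linear interpolation stand in for the proper filtering a real resampler performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: 16 kHz mono PCM -> 8 kHz by averaging each sample pair.
 * Averaging is only a crude low-pass; a real resampler uses a proper filter.
 * out must hold at least in_len / 2 samples. Returns samples written. */
static size_t downsample_by_2(const int16_t *in, size_t in_len, int16_t *out)
{
    assert(in != NULL && out != NULL);
    size_t out_len = in_len / 2;
    for (size_t i = 0; i < out_len; i++) {
        int32_t sum = (int32_t)in[2 * i] + (int32_t)in[2 * i + 1];
        out[i] = (int16_t)(sum / 2);
    }
    return out_len;
}

/* Hypothetical helper: 8 kHz mono PCM -> 16 kHz by linear interpolation.
 * out must hold at least 2 * in_len samples. Returns samples written. */
static size_t upsample_by_2(const int16_t *in, size_t in_len, int16_t *out)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < in_len; i++) {
        int32_t cur = in[i];
        /* The last sample has no successor, so it is simply repeated. */
        int32_t next = (i + 1 < in_len) ? in[i + 1] : cur;
        out[2 * i] = (int16_t)cur;
        out[2 * i + 1] = (int16_t)((cur + next) / 2);
    }
    return in_len * 2;
}
```

The microphone path would use the first direction (16kHz in, 8kHz out to SIP) and the speaker path the second (8kHz from SIP, 16kHz out).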

I haven't been able to get this setup to work in IDF v5 with the new SIP example that uses the av_stream component and the new esp_rtc_ APIs.

Expected Behavior

The av_stream component has logic to filter between the acodec_samplerate and the .hal.audio_samplerate. So I would expect to be able to play stored audio at 16kHz when not on a SIP call, and when not playing stored audio connect to SIP calls at 8kHz.

This works in IDF 4, but there is no direct translation of the code to IDF 5 because of the SIP refactoring. I know the hardware can do it, and the av_stream code appears to be able to handle filtering to down/up sample, but it doesn't work.

Actual Behavior

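It does not work. A few seconds after the SIP call starts, AV_STREAM logs repeated `audio decoder ringbuf write timeout` warnings (plus an `AEC reference write timeout`), and then the task watchdog triggers on CPU 0 while the algo task is running (see the logs below).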

Steps to Reproduce

1. Set up IDF 5.3.1 with ADF 2.7.
2. Patch IDF 5.3.1 FreeRTOS with the ADF changes.
3. Remove the esp32-camera dependency.
   3.1 The voip example doesn't work out of the box. The only way I could get the legacy driver stuff sorted out was to remove all the video functions and the esp32-camera dependency from av_stream (which uses esp32-camera). I only need the audio stuff anyway. I did this by making a local copy of the components and editing them.
4. In components/av_stream/av_stream_hal/av_stream_hal.h, define AUDIO_HAL_SAMPLE_RATE as 16000 (leave the codec rate at 8kHz).
5. Make a SIP call.

Code to Reproduce This Issue

See attached:
esp32_adf_voip_16k_crash.zip

Debug Logs

Full logs: voip-sip-16k-wdog.txt

Snippet:

I (21599) SIP_SERVICE: ESP_RTC_EVENT_AUDIO_SESSION_BEGIN
I (21604) AUDIO_PIPELINE: link el->rb, el:0x3f820e40, tag:algo, rb:0x3f821928
I (21625) AUDIO_PIPELINE: link el->rb, el:0x3f8217cc, tag:filter, rb:0x3f821ab0
I (21625) AUDIO_THREAD: The algo task allocate stack on external memory
I (21632) AUDIO_THREAD: The filter task allocate stack on external memory
I (21639) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:4207696 Bytes, Inter:243943 Bytes, Dram:205159 Bytes, Dram largest free:110592Bytes

I (21659) AFE_VC: afe interface for voice communication

I (21665) AFE_VC: AFE version: VC_V220727

I (21685) AFE_VC: Initial auido front-end, total channel: 2, mic num: 1, ref num: 1

I (21685) AFE_VC: aec_init: 1, se_init: 1, vad_init: 0

I (21689) AFE_VC: wakenet_init: 0, voice_communication_agc_init: 0

I (21696) AFE_VC: ns_mode: 0

I (21818) AUDIO_PIPELINE: Pipeline started
I (21819) AUDIO_THREAD: The _audio_enc task allocate stack on external memory
I (21835) AFE_VC: mode: 1, (Nov 21 2023 19:15:51)

I (21820) RSP_FILTER: sample rate of source data : 16000, channel of source data : 1, sample rate of destination data : 8000, channel of destination data : 1
I (21836) AUDIO_THREAD: The algo_fetch task allocate stack on external memory
I (21836) AV_STREAM: audio_enc started
I (21877) AUDIO_THREAD: The _audio_dec task allocate stack on external memory
I (21878) AV_STREAM: audio_dec started
W (21906) SIP: CHANGE STATE FROM 16, TO 32, :func: sip_uas_process_req:1079
W (26123) AV_STREAM: audio decoder ringbuf write timeout
W (26126) AV_STREAM: audio decoder ringbuf write timeout
W (26144) AV_STREAM: AEC reference write timeout ref 2560
W (26282) AV_STREAM: audio decoder ringbuf write timeout
W (26284) AV_STREAM: audio decoder ringbuf write timeout
W (28415) AV_STREAM: audio decoder ringbuf write timeout
W (28417) AV_STREAM: audio decoder ringbuf write timeout
W (28895) AV_STREAM: audio decoder ringbuf write timeout
W (29705) AV_STREAM: audio decoder ringbuf write timeout
W (29831) AV_STREAM: audio decoder ringbuf write timeout
E (30059) task_wdt: Task watchdog got triggered. The following tasks/users did not reset the watchdog in time:
E (30059) task_wdt:  - IDLE0 (CPU 0)
E (30059) task_wdt: Tasks currently running:
E (30059) task_wdt: CPU 0: algo
E (30059) task_wdt: CPU 1: IDLE1
E (30059) task_wdt: Print CPU 0 (current core) backtrace


Backtrace: 0x40114D87:0x3FFB2160 0x401151BC:0x3FFB2180 0x40085F19:0x3FFB21B0 0x4008BFF4:0x3F822DF0 0x4008B932:0x3F822E20 0x4008BA69:0x3F822E50 0x40106D72:0x3F822E70 0x40104F19:0x3F822EC0 0x4010523E:0x3F822EE0 0x400E3F2E:0x3F822F00 0x400E3FEF:0x3F822F20 0x400E782E:0x3F822F40 0x400E79D2:0x3F822F70 0x40090786:0x3F822FA0
--- 0x40114d87: task_wdt_timeout_handling at /home/steve/esp/v5.3.1/esp-idf/components/esp_system/task_wdt/task_wdt.c:434
0x401151bc: task_wdt_isr at /home/steve/esp/v5.3.1/esp-idf/components/esp_system/task_wdt/task_wdt.c:507
0x40085f19: _xt_lowint1 at /home/steve/esp/v5.3.1/esp-idf/components/xtensa/xtensa_vectors.S:1240
0x4008bff4: radix4_butterfly4_fft at /home/sunxiangyu/workspace/esp_sr_lib/components/c_speech_features/c_speech_features/fft.c:446 (discriminator 3)
0x4008b932: radix4_butterfly at /home/sunxiangyu/workspace/esp_sr_lib/components/c_speech_features/c_speech_features/fft.c:719
 (inlined by) fft_radix4 at /home/sunxiangyu/workspace/esp_sr_lib/components/c_speech_features/c_speech_features/fft.c:730
 (inlined by) fft_esp_proc at /home/sunxiangyu/workspace/esp_sr_lib/components/c_speech_features/c_speech_features/fft.c:745
0x4008ba69: fftr_esp_proc at /home/sunxiangyu/workspace/esp_sr_lib/components/c_speech_features/c_speech_features/fft.c:1037
0x40106d72: esp_aec3_process at /home/sunxiangyu/workspace/esp_sr_lib/components/esp_audio_processor/acoustic_echo_cancellation/esp_aec3.c:135
0x40104f19: afe_feed_aec_init_true at /home/sunxiangyu/workspace/esp_sr_lib/components/esp_audio_front_end/esp_afe_vc.c:528
0x4010523e: afe_feed at /home/sunxiangyu/workspace/esp_sr_lib/components/esp_audio_front_end/esp_afe_vc.c:671
0x400e3f2e: algorithm_data_process_for_type1 at /home/steve/esp/esp-adf-v2.7/components/audio_stream/algorithm_stream.c:273
0x400e3fef: _algo_process at /home/steve/esp/esp-adf-v2.7/components/audio_stream/algorithm_stream.c:318
0x400e782e: audio_element_process_running at /home/steve/esp/esp-adf-v2.7/components/audio_pipeline/audio_element.c:340
0x400e79d2: audio_element_task at /home/steve/esp/esp-adf-v2.7/components/audio_pipeline/audio_element.c:487
0x40090786: vPortTaskWrapper at /home/steve/esp/v5.3.1/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:134


TempoTian · 2025-01-20T11:46:48Z

G711 almost always uses an 8K sample rate. If the tone is played at 16K and the SIP call uses 8K, it is better to use the following config:
hal samplerate is the actual sample rate output through the codec device.
acodec_samplerate is the sample rate of the audio sent to or received from SIP.
When they are not the same, av_stream adds a resampler to convert the codec sample rate to the hal sample rate.

    av_stream_config_t av_stream_config = {
        .algo_mask = ALGORITHM_STREAM_DEFAULT_MASK,
        .acodec_samplerate = 8000,
        .acodec_type = AV_ACODEC_G711A,
        .vcodec_type = AV_VCODEC_NULL,
        .hal = {
            .audio_samplerate = 16000,
            .audio_framesize = PCM_FRAME_SIZE,
        },
    };
    av_stream = av_stream_init(&av_stream_config);
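
As a rough illustration of what the two rates mean for buffer sizes, the sketch below computes PCM bytes per frame. The helper `pcm_frame_bytes` is hypothetical, and the 20 ms frame, 16-bit samples, and mono channel are assumptions chosen for the arithmetic; they are not taken from `PCM_FRAME_SIZE` or the av_stream source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes of PCM in one frame of frame_ms milliseconds.
 * bits_per_sample must be a multiple of 8. */
static size_t pcm_frame_bytes(unsigned sample_rate_hz, unsigned frame_ms,
                              unsigned bits_per_sample, unsigned channels)
{
    assert(bits_per_sample % 8 == 0);
    size_t samples = (size_t)sample_rate_hz * frame_ms / 1000;
    return samples * (bits_per_sample / 8) * channels;
}
```

Under those assumptions, `pcm_frame_bytes(8000, 20, 16, 1)` is 320 bytes on the codec side and `pcm_frame_bytes(16000, 20, 16, 1)` is 640 bytes on the hal side, so the hal side moves twice as many bytes for the same frame duration.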

steve-nomo · 2025-01-21T15:16:07Z

@TempoTian Thank you for the response.

Yes, this is what I originally did. I set the acodec_samplerate = 8000 and audio_samplerate = 16000 and got the errors I have described in this ticket.

If you download the zip file I posted, you will see that I changed to #define AUDIO_HAL_SAMPLE_RATE 16000 so my code is already exactly what you posted.

I agree that this should work, but it does not. There is a problem somewhere.

Have you tried it yourself? Using your own suggestion you should get the same error.

TempoTian · 2025-01-23T12:43:44Z

I have tested av_stream, and it works fine with 8k input and 16k output.
I think the problem is caused by the tone player: the tone player calls i2s_stream_set_clk to change the I2S settings, so the settings no longer match those from av_stream_init. You can comment out the tone player related code and try again.
You can also replace the original tone with a 16k tone, and add some logging in i2s_stream_set_clk to check where the wrong setting comes from.
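
One way to do the logging suggested above is to wrap the clock-setting call so that each call site reports the rate it requests. The sketch below is an off-target illustration only: `traced_set_clk`, `TRACED_SET_CLK`, `stub_set_clk`, and `last_rate_seen` are hypothetical names, the `(stream, rate, bits, ch)` argument order is an assumption about `i2s_stream_set_clk`, and `printf` stands in for the ESP logging macros.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed shape of the clock setter: (stream, rate, bits, ch) -> error code. */
typedef int (*set_clk_fn_t)(void *stream, int rate, int bits, int ch);

/* Hypothetical stand-in so the sketch compiles without ADF; it only
 * remembers the last rate it was asked to set. */
static int last_rate_seen = 0;

static int stub_set_clk(void *stream, int rate, int bits, int ch)
{
    (void)stream; (void)bits; (void)ch;
    last_rate_seen = rate;
    return 0;
}

/* Log the caller and flag any rate that differs from the expected one,
 * then forward the call unchanged. */
static int traced_set_clk(set_clk_fn_t fn, void *stream, int rate, int bits,
                          int ch, int expected_rate,
                          const char *file, int line)
{
    assert(fn != NULL);
    if (rate != expected_rate) {
        printf("W set_clk: rate %d != expected %d (from %s:%d)\n",
               rate, expected_rate, file, line);
    } else {
        printf("I set_clk: rate %d bits %d ch %d (from %s:%d)\n",
               rate, bits, ch, file, line);
    }
    return fn(stream, rate, bits, ch);
}

/* Capture the call site automatically. */
#define TRACED_SET_CLK(fn, stream, rate, bits, ch, expected) \
    traced_set_clk((fn), (stream), (rate), (bits), (ch), (expected), \
                   __FILE__, __LINE__)
```

For example, `TRACED_SET_CLK(stub_set_clk, NULL, 8000, 16, 1, 16000)` prints a warning naming the file and line that requested 8000 while 16000 was expected, which is the kind of mismatch worth looking for here.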

steve-nomo · 2025-01-24T15:25:55Z

Just to be clear: the error only occurs when you make a SIP call. Are you saying you made a SIP call using 16k HAL rate without getting errors like I showed?

In the zip file I gave, the only real change is AUDIO_HAL_SAMPLE_RATE is defined as 16000 instead of 8000. All the other changes in the zip file were required to remove the esp32_camera dependency problems so it would build.

So the code I gave:

In components/av_stream/av_stream_hal/av_stream_hal.h:

#define AUDIO_HAL_SAMPLE_RATE 16000

voip_app.c:

    av_stream_config_t av_stream_config = {
        .algo_mask = ALGORITHM_STREAM_DEFAULT_MASK,
        .acodec_samplerate = AUDIO_CODEC_SAMPLE_RATE,
        .acodec_type = AV_ACODEC_G711A,
        .vcodec_type = AV_VCODEC_NULL,
        .hal = {
            .audio_samplerate = AUDIO_HAL_SAMPLE_RATE, // <-------- defined as 16000
            .audio_framesize = PCM_FRAME_SIZE,
        },
    };

And:

audio_player_int_tone_init(AUDIO_HAL_SAMPLE_RATE, I2S_CHANNELS, I2S_DEFAULT_BITS); // <---- first argument is 16000

So in the code I provided, the tone player rate and the av_stream_config.hal.audio_samplerate already match.

github-actions bot changed the title ~~VoIP example with 16Khz HAL i2s clock: AV_STREAM: audio decoder ringbuf write timeout~~ VoIP example with 16Khz HAL i2s clock: AV_STREAM: audio decoder ringbuf write timeout (AUD-5988) Jan 13, 2025
