VoIP example with 16Khz HAL i2s clock: AV_STREAM: audio decoder ringbuf write timeout (AUD-5988) #1347

Open
steve-nomo opened this issue Jan 13, 2025 · 4 comments


steve-nomo commented Jan 13, 2025

Environment

  • Audio development kit: ESP32-LyraT
  • Audio kit version: ESP32-LyraT v4.3
  • Module or chip used: ESP32-WROVER-E
  • IDF version: ESP-IDF v5.3.1 (with ADF 2.7's FreeRTOS patch)
  • ADF version: v2.7
  • Build system: idf.py
  • Running log: All logs from power-on to problem recurrence
  • Compiler version: xtensa-esp-elf-gcc (crosstool-NG esp-13.2.0_20240530) 13.2.0
  • Operating system: Linux
  • Using an IDE?: Yes, VS Code
  • Power supply: USB connector on LyraT plugged in to a 5.0 V/1.0 A transformer

Problem Description

Using the voip_example, setting av_stream_config.hal.audio_samplerate = 16000 causes ringbuf write timeouts and task watchdog triggers during a SIP call.

The reason I want to do this is because I have stored audio files recorded at 16kHz and I want to play them at 16kHz. I also need to support VoIP SIP calls running at 8kHz.

In IDF 4.x this worked: the I2S would run at 16kHz, and a filter in the SIP pipeline would downsample the microphone audio to 8kHz for SIP and upsample incoming SIP audio to 16kHz for playback through the speaker.

I haven't been able to get this setup to work in IDF v5 with the new SIP example that uses the av_stream component and the new esp_rtc_ APIs.

Expected Behavior

The av_stream component has logic to filter between the acodec_samplerate and the .hal.audio_samplerate. So I would expect to be able to play stored audio at 16kHz when not on a SIP call, and when not playing stored audio connect to SIP calls at 8kHz.

This works in IDF 4, but there is no direct translation of the code to IDF 5 because of the SIP refactoring. I know the hardware can do it, and the av_stream code appears to be able to handle filtering to down/up sample, but it doesn't work.
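
For readers unfamiliar with the conversion being described, here is a minimal sketch of 2:1 rate conversion between 16kHz and 8kHz mono 16-bit PCM. It is not the ADF rsp_filter implementation; the function names downsample_2to1 and upsample_1to2 are hypothetical, and a real resampler would apply proper low-pass filtering rather than pair averaging and linear interpolation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an ADF API): downsample 16 kHz mono PCM to
 * 8 kHz by averaging each pair of input samples.
 * Returns the number of output samples written (in_len / 2). */
size_t downsample_2to1(const int16_t *in, size_t in_len, int16_t *out)
{
    assert(in != NULL && out != NULL);
    size_t out_len = in_len / 2;
    for (size_t i = 0; i < out_len; i++) {
        /* Widen to 32 bits so the sum cannot overflow. */
        int32_t sum = (int32_t)in[2 * i] + (int32_t)in[2 * i + 1];
        out[i] = (int16_t)(sum / 2);
    }
    return out_len;
}

/* Hypothetical helper (not an ADF API): upsample 8 kHz mono PCM to
 * 16 kHz by linear interpolation between neighbouring samples.
 * out must have room for in_len * 2 samples. */
size_t upsample_1to2(const int16_t *in, size_t in_len, int16_t *out)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < in_len; i++) {
        int32_t cur = in[i];
        /* The last sample has no successor, so it is repeated. */
        int32_t next = (i + 1 < in_len) ? in[i + 1] : cur;
        out[2 * i] = (int16_t)cur;
        out[2 * i + 1] = (int16_t)((cur + next) / 2);
    }
    return in_len * 2;
}
```

In this setup the downsampler would correspond to the microphone-to-SIP direction and the upsampler to the SIP-to-speaker direction.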

Actual Behavior

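It does not work. During a SIP call, AV_STREAM repeatedly logs "audio decoder ringbuf write timeout" and "AEC reference write timeout" warnings, and a few seconds later the task watchdog triggers for IDLE0 on CPU 0 while the algo task is running (see Debug Logs).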

Steps to Reproduce

  1. Set up IDF 5.3.1 with ADF 2.7
  2. Patch IDF 5.3.1 FreeRTOS with ADF changes
  3. Remove esp32-camera dependency.
    3.1 The voip example doesn't work out of the box. The only way I could get the legacy driver stuff sorted out was to remove all the video functions and the esp32-camera dependency from av_stream (which uses esp32-camera). I only need audio anyway. I did this by making a local copy of the components and editing them.
  4. In components/av_stream/av_stream_hal/av_stream_hal.h, define AUDIO_HAL_SAMPLE_RATE as 16000 (leave codec rate at 8kHz)
  5. Make a SIP call.

Code to Reproduce This Issue

See attached:
esp32_adf_voip_16k_crash.zip

Debug Logs

Full logs: voip-sip-16k-wdog.txt

Snippet:

I (21599) SIP_SERVICE: ESP_RTC_EVENT_AUDIO_SESSION_BEGIN
I (21604) AUDIO_PIPELINE: link el->rb, el:0x3f820e40, tag:algo, rb:0x3f821928
I (21625) AUDIO_PIPELINE: link el->rb, el:0x3f8217cc, tag:filter, rb:0x3f821ab0
I (21625) AUDIO_THREAD: The algo task allocate stack on external memory
I (21632) AUDIO_THREAD: The filter task allocate stack on external memory
I (21639) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:4207696 Bytes, Inter:243943 Bytes, Dram:205159 Bytes, Dram largest free:110592Bytes

I (21659) AFE_VC: afe interface for voice communication

I (21665) AFE_VC: AFE version: VC_V220727

I (21685) AFE_VC: Initial auido front-end, total channel: 2, mic num: 1, ref num: 1

I (21685) AFE_VC: aec_init: 1, se_init: 1, vad_init: 0

I (21689) AFE_VC: wakenet_init: 0, voice_communication_agc_init: 0

I (21696) AFE_VC: ns_mode: 0

I (21818) AUDIO_PIPELINE: Pipeline started
I (21819) AUDIO_THREAD: The _audio_enc task allocate stack on external memory
I (21835) AFE_VC: mode: 1, (Nov 21 2023 19:15:51)

I (21820) RSP_FILTER: sample rate of source data : 16000, channel of source data : 1, sample rate of destination data : 8000, channel of destination data : 1
I (21836) AUDIO_THREAD: The algo_fetch task allocate stack on external memory
I (21836) AV_STREAM: audio_enc started
I (21877) AUDIO_THREAD: The _audio_dec task allocate stack on external memory
I (21878) AV_STREAM: audio_dec started
W (21906) SIP: CHANGE STATE FROM 16, TO 32, :func: sip_uas_process_req:1079
W (26123) AV_STREAM: audio decoder ringbuf write timeout
W (26126) AV_STREAM: audio decoder ringbuf write timeout
W (26144) AV_STREAM: AEC reference write timeout ref 2560
W (26282) AV_STREAM: audio decoder ringbuf write timeout
W (26284) AV_STREAM: audio decoder ringbuf write timeout
W (28415) AV_STREAM: audio decoder ringbuf write timeout
W (28417) AV_STREAM: audio decoder ringbuf write timeout
W (28895) AV_STREAM: audio decoder ringbuf write timeout
W (29705) AV_STREAM: audio decoder ringbuf write timeout
W (29831) AV_STREAM: audio decoder ringbuf write timeout
E (30059) task_wdt: Task watchdog got triggered. The following tasks/users did not reset the watchdog in time:
E (30059) task_wdt:  - IDLE0 (CPU 0)
E (30059) task_wdt: Tasks currently running:
E (30059) task_wdt: CPU 0: algo
E (30059) task_wdt: CPU 1: IDLE1
E (30059) task_wdt: Print CPU 0 (current core) backtrace


Backtrace: 0x40114D87:0x3FFB2160 0x401151BC:0x3FFB2180 0x40085F19:0x3FFB21B0 0x4008BFF4:0x3F822DF0 0x4008B932:0x3F822E20 0x4008BA69:0x3F822E50 0x40106D72:0x3F822E70 0x40104F19:0x3F822EC0 0x4010523E:0x3F822EE0 0x400E3F2E:0x3F822F00 0x400E3FEF:0x3F822F20 0x400E782E:0x3F822F40 0x400E79D2:0x3F822F70 0x40090786:0x3F822FA0
--- 0x40114d87: task_wdt_timeout_handling at /home/steve/esp/v5.3.1/esp-idf/components/esp_system/task_wdt/task_wdt.c:434
0x401151bc: task_wdt_isr at /home/steve/esp/v5.3.1/esp-idf/components/esp_system/task_wdt/task_wdt.c:507
0x40085f19: _xt_lowint1 at /home/steve/esp/v5.3.1/esp-idf/components/xtensa/xtensa_vectors.S:1240
0x4008bff4: radix4_butterfly4_fft at /home/sunxiangyu/workspace/esp_sr_lib/components/c_speech_features/c_speech_features/fft.c:446 (discriminator 3)
0x4008b932: radix4_butterfly at /home/sunxiangyu/workspace/esp_sr_lib/components/c_speech_features/c_speech_features/fft.c:719
 (inlined by) fft_radix4 at /home/sunxiangyu/workspace/esp_sr_lib/components/c_speech_features/c_speech_features/fft.c:730
 (inlined by) fft_esp_proc at /home/sunxiangyu/workspace/esp_sr_lib/components/c_speech_features/c_speech_features/fft.c:745
0x4008ba69: fftr_esp_proc at /home/sunxiangyu/workspace/esp_sr_lib/components/c_speech_features/c_speech_features/fft.c:1037
0x40106d72: esp_aec3_process at /home/sunxiangyu/workspace/esp_sr_lib/components/esp_audio_processor/acoustic_echo_cancellation/esp_aec3.c:135
0x40104f19: afe_feed_aec_init_true at /home/sunxiangyu/workspace/esp_sr_lib/components/esp_audio_front_end/esp_afe_vc.c:528
0x4010523e: afe_feed at /home/sunxiangyu/workspace/esp_sr_lib/components/esp_audio_front_end/esp_afe_vc.c:671
0x400e3f2e: algorithm_data_process_for_type1 at /home/steve/esp/esp-adf-v2.7/components/audio_stream/algorithm_stream.c:273
0x400e3fef: _algo_process at /home/steve/esp/esp-adf-v2.7/components/audio_stream/algorithm_stream.c:318
0x400e782e: audio_element_process_running at /home/steve/esp/esp-adf-v2.7/components/audio_pipeline/audio_element.c:340
0x400e79d2: audio_element_task at /home/steve/esp/esp-adf-v2.7/components/audio_pipeline/audio_element.c:487
0x40090786: vPortTaskWrapper at /home/steve/esp/v5.3.1/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:134
@TempoTian
Contributor

G711 almost always uses an 8K sample rate. If tone playback uses 16K and the SIP call uses 8K, it is better to use the following config:
hal.audio_samplerate is the actual sample rate output through the codec device.
acodec_samplerate is the sample rate of the audio that SIP sends or receives.
When the two differ, av_stream adds a resampler to convert the codec sample rate to the HAL sample rate.

    av_stream_config_t av_stream_config = {
        .algo_mask = ALGORITHM_STREAM_DEFAULT_MASK,
        .acodec_samplerate = 8000,
        .acodec_type = AV_ACODEC_G711A,
        .vcodec_type = AV_VCODEC_NULL,
        .hal = {
            .audio_samplerate = 16000,
            .audio_framesize = PCM_FRAME_SIZE,
        },
    };
    av_stream = av_stream_init(&av_stream_config);
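
To make the 2:1 relationship between the two rates in this config concrete, the sketch below computes per-frame byte counts on each side of the resampler. The 20 ms frame duration is an assumption chosen for illustration, not the value of PCM_FRAME_SIZE, and the helper names pcm_frame_bytes and g711_frame_bytes are hypothetical. The only codec fact relied on is that G.711 encodes one byte per sample.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes of 16-bit mono PCM in one frame that
 * lasts frame_ms milliseconds at sample_rate_hz. */
size_t pcm_frame_bytes(uint32_t sample_rate_hz, uint32_t frame_ms)
{
    assert(sample_rate_hz > 0 && frame_ms > 0);
    size_t samples = (size_t)sample_rate_hz * frame_ms / 1000u;
    return samples * sizeof(int16_t);
}

/* Hypothetical helper: bytes of G.711 (A-law or mu-law) for the same
 * frame. G.711 uses one byte per sample. */
size_t g711_frame_bytes(uint32_t sample_rate_hz, uint32_t frame_ms)
{
    assert(sample_rate_hz > 0 && frame_ms > 0);
    return (size_t)sample_rate_hz * frame_ms / 1000u;
}
```

With an assumed 20 ms frame, the HAL side at 16000 Hz carries 640 bytes of PCM per frame, the codec side at 8000 Hz carries 320 bytes of PCM, and the G.711 payload is 160 bytes.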

@steve-nomo
Author

steve-nomo commented Jan 21, 2025

@TempoTian Thank you for the response.

Yes, this is what I originally did. I set the acodec_samplerate = 8000 and audio_samplerate = 16000 and got the errors I have described in this ticket.

If you download the zip file I posted, you will see that I changed to #define AUDIO_HAL_SAMPLE_RATE 16000 so my code is already exactly what you posted.

I agree that this should work, but it does not. There is a problem somewhere.

Have you tried it yourself? Using your own suggestion you should get the same error.

@TempoTian
Contributor

I have tested av_stream, and it works all right with 8k input and 16k output.
I think the problem is caused by the tone player: the tone player calls i2s_stream_set_clk to change the I2S settings, so they no longer match what av_stream_init configured. You can comment out the tone player related code and try again.
You can also replace the original tone with a 16k tone, and add some logging in i2s_stream_set_clk to check where the wrong setting comes from.
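
One way to approach the suggested logging is sketched below in plain C so it can be tried off-target. Everything here is hypothetical: clk_cfg_t and clk_check are illustration names, not ADF types or functions, and the comparison is just one possible way to flag a caller whose requested rate, bit width, or channel count differs from what was configured at init.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical record of I2S clock settings; not an ADF type. */
typedef struct {
    int rate;      /* sample rate in Hz */
    int bits;      /* bits per sample */
    int channels;  /* channel count */
} clk_cfg_t;

/* Hypothetical helper: compare a requested setting with the expected
 * one and log the caller's name on mismatch.
 * Returns true when the settings match. */
bool clk_check(const clk_cfg_t *expected, const clk_cfg_t *requested,
               const char *caller)
{
    assert(expected != NULL && requested != NULL && caller != NULL);
    bool same = expected->rate == requested->rate &&
                expected->bits == requested->bits &&
                expected->channels == requested->channels;
    if (!same) {
        fprintf(stderr,
                "clk mismatch from %s: rate %d->%d, bits %d->%d, ch %d->%d\n",
                caller,
                expected->rate, requested->rate,
                expected->bits, requested->bits,
                expected->channels, requested->channels);
    }
    return same;
}
```

On the target, a check like this could be called from wherever the I2S clock is changed, with the ESP-IDF logging macros in place of fprintf and a string identifying each call site passed as caller.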

@steve-nomo
Author

Just to be clear: the error only occurs when you make a SIP call. Are you saying you made a SIP call using 16k HAL rate without getting errors like I showed?

In the zip file I gave, the only real change is AUDIO_HAL_SAMPLE_RATE is defined as 16000 instead of 8000. All the other changes in the zip file were required to remove the esp32_camera dependency problems so it would build.

So the code I gave:

In components/av_stream/av_stream_hal/av_stream_hal.h:

#define AUDIO_HAL_SAMPLE_RATE 16000

voip_app.c:

    av_stream_config_t av_stream_config = {
        .algo_mask = ALGORITHM_STREAM_DEFAULT_MASK,
        .acodec_samplerate = AUDIO_CODEC_SAMPLE_RATE,
        .acodec_type = AV_ACODEC_G711A,
        .vcodec_type = AV_VCODEC_NULL,
        .hal = {
            .audio_samplerate = AUDIO_HAL_SAMPLE_RATE, // <-------- defined as 16000
            .audio_framesize = PCM_FRAME_SIZE,
        },
    };

And:

audio_player_int_tone_init(AUDIO_HAL_SAMPLE_RATE, I2S_CHANNELS, I2S_DEFAULT_BITS); // <---- first argument is 16000

So in the code I provided, the tone player rate and the av_stream_config.hal.audio_samplerate already match.
