Why does the esp32s3-korvo-2 v3 running the ai_agent/volc_rtc example show the following problem when the agent is started? (AUD-6052) #1363

Open
tianrongqin opened this issue Feb 10, 2025 · 20 comments

Comments

@tianrongqin

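On an esp32s3-korvo-2 v3, I ran the ai_agent/volc_rtc example, configured the corresponding parameters with a Volcano Engine (火山引擎) personal account, and started the agent. The log output is as follows: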
ESP-ROM:esp32s3-20210327
Build:Mar 27 2021
rst:0x1 (POWERON),boot:0x8 (SPI_FAST_FLASH_BOOT)
SPIWP:0xee
mode:DIO, clock div:1
load:0x3fce2820,len:0x19fc
load:0x403c8700,len:0x4
load:0x403c8704,len:0xe8c
load:0x403cb700,len:0x3164
entry 0x403c8940
I (27) boot: ESP-IDF v5.3.2-dirty 2nd stage bootloader
I (27) boot: compile time Feb 10 2025 08:47:44
I (27) boot: Multicore bootloader
I (30) boot: chip revision: v0.2
I (34) boot: efuse block revision: v1.3
I (39) qio_mode: Enabling default flash chip QIO
I (44) boot.esp32s3: Boot SPI Speed : 80MHz
I (49) boot.esp32s3: SPI Mode : QIO
I (54) boot.esp32s3: SPI Flash Size : 16MB
I (58) boot: Enabling RNG early entropy source...
I (64) boot: Partition Table:
I (67) boot: ## Label Usage Type ST Offset Length
I (75) boot: 0 nvs WiFi data 01 02 00009000 00004000
I (82) boot: 1 phy_init RF data 01 01 0000d000 00001000
I (90) boot: 2 factory factory app 00 00 00010000 00300000
I (97) boot: 3 model Unknown data 01 82 00310000 0040e000
I (104) boot: 4 spiffs_data Unknown data 01 82 0071e000 00010000
I (112) boot: End of partition table
I (116) esp_image: segment 0: paddr=00010020 vaddr=3c180020 size=4b9ech (309740) map
I (171) esp_image: segment 1: paddr=0005ba14 vaddr=3fca0000 size=04604h ( 17924) load
I (174) esp_image: segment 2: paddr=00060020 vaddr=42000020 size=179c30h (1547312) map
I (406) esp_image: segment 3: paddr=001d9c58 vaddr=3fca4604 size=038cch ( 14540) load
I (409) esp_image: segment 4: paddr=001dd52c vaddr=40378000 size=17f10h ( 98064) load
I (441) boot: Loaded app from partition at offset 0x10000
I (441) boot: Disabling RNG early entropy source...
I (453) octal_psram: vendor id : 0x0d (AP)
I (453) octal_psram: dev id : 0x02 (generation 3)
I (453) octal_psram: density : 0x03 (64 Mbit)
I (458) octal_psram: good-die : 0x01 (Pass)
I (464) octal_psram: Latency : 0x01 (Fixed)
I (469) octal_psram: VCC : 0x01 (3V)
I (474) octal_psram: SRF : 0x01 (Fast Refresh)
I (480) octal_psram: BurstType : 0x01 (Hybrid Wrap)
I (486) octal_psram: BurstLen : 0x01 (32 Byte)
I (491) octal_psram: Readlatency : 0x02 (10 cycles@Fixed)
I (497) octal_psram: DriveStrength: 0x00 (1/1)
I (503) MSPI Timing: PSRAM timing tuning index: 4
I (508) esp_psram: Found 8MB PSRAM device
I (512) esp_psram: Speed: 80MHz
I (516) cpu_start: Multicore app
I (802) esp_psram: SPI SRAM memory test OK
I (811) cpu_start: Pro cpu start user code
I (811) cpu_start: cpu freq: 240000000 Hz
I (811) app_init: Application information:
I (814) app_init: Project name: volc_rtc
I (819) app_init: App version: 1
I (823) app_init: Compile time: Feb 10 2025 08:46:43
I (829) app_init: ELF file SHA256: e0476f476...
I (834) app_init: ESP-IDF: v5.3.2-dirty
I (840) efuse_init: Min chip rev: v0.0
I (844) efuse_init: Max chip rev: v0.99
I (849) efuse_init: Chip rev: v0.2
I (854) heap_init: Initializing. RAM available for dynamic allocation:
I (861) heap_init: At 3FCAF458 len 0003A2B8 (232 KiB): RAM
I (867) heap_init: At 3FCE9710 len 00005724 (21 KiB): RAM
I (874) heap_init: At 600FE100 len 00001EE8 (7 KiB): RTCRAM
I (880) esp_psram: Adding pool of 8192K of PSRAM memory to heap allocator
I (888) spi_flash: detected chip: gd
I (892) spi_flash: flash io: qio
W (896) ADC: legacy driver is deprecated, please migrate to esp_adc/adc_oneshot.h
I (904) sleep: Configure to isolate all GPIO pins in sleep state
I (911) sleep: Enable automatic switching of GPIO sleep configuration
I (918) main_task: Started on CPU0
I (938) esp_psram: Reserving pool of 32K of internal memory for DMA/internal allocations
I (938) main_task: Calling app_main()
I (948) main: Initialize board peripherals
I (948) PERIPH_SPIFFS: Partition size: total: 52961, used: 12299
I (948) AUDIO_THREAD: The esp_periph task allocate stack on internal memory
W (958) i2c_bus_v2: I2C master handle is NULL, will create new one
I (968) gpio: GPIO[17]| InputEn: 1| OutputEn: 1| OpenDrain: 1| Pullup: 1| Pulldown: 0| Intr:0
I (978) gpio: GPIO[18]| InputEn: 1| OutputEn: 1| OpenDrain: 1| Pullup: 1| Pulldown: 0| Intr:0
I (988) DRV8311: ES8311 in Slave mode
I (998) gpio: GPIO[48]| InputEn: 0| OutputEn: 1| OpenDrain: 0| Pullup: 0| Pulldown: 0| Intr:0
I (1008) ES7210: ES7210 in Slave mode
I (1018) ES7210: Enable ES7210_INPUT_MIC1
I (1018) ES7210: Enable ES7210_INPUT_MIC2
I (1018) ES7210: Enable ES7210_INPUT_MIC3
W (1028) ES7210: Enable TDM mode. ES7210_SDP_INTERFACE2_REG12: 2
I (1028) ES7210: config fmt 60
I (1028) AUDIO_HAL: Codec mode is 3, Ctrl:1
I (1048) pp: pp rom version: e7ae62f
I (1048) net80211: net80211 rom version: e7ae62f
I (1058) wifi:wifi driver task: 3fcc407c, prio:23, stack:6656, core=0
I (1058) wifi:wifi firmware version: b0fd6006b
I (1058) wifi:wifi certification version: v7.0
I (1058) wifi:config NVS flash: enabled
I (1058) wifi:config nano formating: disabled
I (1068) wifi:Init data frame dynamic rx buffer num: 32
I (1068) wifi:Init static rx mgmt buffer num: 5
I (1078) wifi:Init management short buffer num: 32
I (1078) wifi:Init static tx buffer num: 16
I (1088) wifi:Init tx cache buffer num: 32
I (1088) wifi:Init static tx FG buffer num: 2
I (1088) wifi:Init static rx buffer size: 1600
I (1098) wifi:Init static rx buffer num: 16
I (1098) wifi:Init dynamic rx buffer num: 32
I (1108) wifi_init: rx ba win: 16
I (1108) wifi_init: accept mbox: 6
I (1108) wifi_init: tcpip mbox: 32
I (1118) wifi_init: udp mbox: 6
I (1118) wifi_init: tcp mbox: 6
I (1118) wifi_init: tcp tx win: 5760
I (1128) wifi_init: tcp rx win: 5760
I (1128) wifi_init: tcp mss: 1440
I (1138) wifi_init: WiFi/LWIP prefer SPIRAM
I (1138) wifi_init: WiFi IRAM OP enabled
I (1148) wifi_init: WiFi RX IRAM OP enabled
W (1148) wifi:Password length matches WPA2 standards, authmode threshold changes from OPEN to WPA2
I (1158) wifi:Set ps type: 1, coexist: 0

I (1158) phy_init: phy_version 680,a6008b2,Jun 4 2024,16:41:10
I (1218) wifi:mode : sta (8c:bf:ea:0b:3e:48)
I (1218) wifi:enable tsf
W (1218) PERIPH_WIFI: WiFi Event cb, Unhandle event_base:WIFI_EVENT, event_id:43
I (1828) wifi:new:<6,0>, old:<1,0>, ap:<255,255>, sta:<6,0>, prof:1, snd_ch_cfg:0x0
I (1828) wifi:state: init -> auth (0xb0)
W (1828) PERIPH_WIFI: WiFi Event cb, Unhandle event_base:WIFI_EVENT, event_id:43
I (4278) wifi:state: auth -> init (0x200)
I (4278) wifi:new:<6,0>, old:<6,0>, ap:<255,255>, sta:<6,0>, prof:1, snd_ch_cfg:0x0
W (4278) PERIPH_WIFI: Wi-Fi disconnected from SSID xtian, auto-reconnect enabled, reconnect after 1000 ms
W (7688) PERIPH_WIFI: Wi-Fi disconnected from SSID xtian, auto-reconnect enabled, reconnect after 1000 ms
I (8688) wifi:new:<6,0>, old:<6,0>, ap:<255,255>, sta:<6,0>, prof:1, snd_ch_cfg:0x0
I (8688) wifi:state: init -> auth (0xb0)
I (9178) wifi:state: auth -> assoc (0x0)
I (9188) wifi:state: assoc -> run (0x10)
I (9278) wifi:connected with xtian, aid = 1, channel 6, BW20, bssid = b6:f7:a1:e8:4a:57
I (9278) wifi:security: WPA3-SAE, phy: bgn, rssi: -16
I (9278) wifi:pm start, type: 1

I (9288) wifi:dp: 1, bi: 102400, li: 3, scale listen interval from 307200 us to 307200 us
I (9298) wifi:set rx beacon pti, rx_bcn_pti: 0, bcn_timeout: 25000, mt_pti: 0, mt_time: 10000
I (9298) wifi:dp: 2, bi: 102400, li: 4, scale listen interval from 307200 us to 409600 us
I (9308) wifi:AP's beacon interval = 102400 us, DTIM period = 2
W (9318) PERIPH_WIFI: WiFi Event cb, Unhandle event_base:WIFI_EVENT, event_id:4
I (10318) esp_netif_handlers: sta ip: 192.168.182.198, mask: 255.255.255.0, gw: 192.168.182.96
I (10318) PERIPH_WIFI: Got ip:192.168.182.198
I (10318) audio processor: Create audio pipeline for audio player
I (10328) audio processor: Create audio player audio stream
I (10328) audio processor: Register all elements to playback pipeline
I (10338) audio processor: Link playback element together raw-->audio_decoder-->i2s_stream-->[codec_chip]
E (10348) gpio: gpio_install_isr_service(502): GPIO isr service already installed
E (10358) DISPATCHER: exe first list: 0x0
I (10358) DISPATCHER: dispatcher_event_task is running...
1970-01-01 00:00:09.557 [E] VolcEngineRTCLite.c:105 ****************** HELLO BOOKA (67a5cfda0ee28a01a8f2d585)(1.56.001.58)(6059fcf26792a8820bc81f13662979d531e5504d) ********************
1970-01-01 00:00:09.573 [E] Cache.c:270 operation returned status code: 0x00000009
1970-01-01 00:00:09.587 [E] ThreadPool.c:92 coreid 1 set 1 stack_size 8192 priority 5
I (10398) audio processor: recorder_pipeline_open
I (10408) audio processor: Create audio pipeline for recording
I (10408) audio processor: Create player audio stream
I (10418) audio processor: Register all player elements to audio pipeline
I (10418) audio processor: Link all player elements to audio pipeline
I (10428) audio processor: player_pipeline_open
I (10438) audio processor: Create audio pipeline for playback
I (10438) audio processor: Create playback audio stream
I (10448) audio_stream_7210: Create opus decoder
I (10458) audio processor: Register all elements to playback pipeline
I (10458) audio processor: ENBALE_AUDIO_STREAM_DUAL_MIC
I (10468) audio processor: Link playback element together raw-->audio_decoder-->rsp-->i2s_stream-->[codec_chip]
I (10478) audio processor: player pipe start running
I (10478) volc_rtc: start join room

1970-01-01 00:00:09.677 [E] RoomImplX.c:167 operation returned status code: 0x52000057
1970-01-01 00:00:10.751 [E] Cache.c:309 operation returned status code: 0x00000009
1970-01-01 00:00:10.756 [E] RoomImplX.c:167 operation returned status code: 0x52000057
1970-01-01 00:00:10.757 [E] LiteHttp.c:641 ID 690763143 E_LOGIC : NO need keepAlive
1970-01-01 00:00:10.766 [E] RoomImplX.c:167 operation returned status code: 0x52000057
I (11688) wifi:idx:0 (ifx:0, b6:f7:a1:e8:4a:57), tid:5, ssn:17, winSize:64
1970-01-01 00:00:11.028 [E] RoomImplX.c:167 operation returned status code: 0x52000057
I (12298) wifi:idx:1 (ifx:0, b6:f7:a1:e8:4a:57), tid:0, ssn:16, winSize:64
I (12318) volc_rtc: join channel success test_room123 elapsed 268 ms now 268 ms

I (12318) volc_rtc: join room success

I (12318) RAW_OPUS_ENC: Raw Opus encoder init
I (12328) MODEL_LOADER: The storage free size is 22720 KB
I (12328) MODEL_LOADER: The partition size is 4152 KB
I (12338) MODEL_LOADER: Successfully load srmodels
I (12348) RECORDER_SR: The first wakenet model: wn9_hilexin

I (12348) AFE_SR: afe interface for speech recognition

I (12358) AFE_SR: AFE version: SR_V220727

I (12358) AFE_SR: Initial auido front-end, total channel: 3, mic num: 2, ref num: 1

I (12368) AFE_SR: aec_init: 1, se_init: 0, vad_init: 0

I (12378) AFE_SR: wakenet_init: 0

I (12528) AFE_SR: wake num: 2, mode: 0, (Sep 4 2024 11:49:31)

I (12528) AUDIO_RECORDER: RECORDER_CMD_TRIGGER_START
I (12528) main_task: Returned from app_main()
How can I resolve this? Is something wrong with the way the agent was started? What specifically should I do to fix it?

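@github-actions github-actions bot changed the title Why does the esp32s3-korvo-2 v3 running the ai_agent/volc_rtc example show the following problem when the agent is started? Why does the esp32s3-korvo-2 v3 running the ai_agent/volc_rtc example show the following problem when the agent is started? (AUD-6052) Feb 10, 2025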
@Zhentao-Lin

My problem is similar to yours, but my log also shows many callback events that shouldn't be there.

Image

@shootao

shootao commented Feb 11, 2025

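Hi @Zhentao-Lin @tianrongqin
Regarding Doubao (豆包), there are two stages.
Stage 1: the device joins the room, which is the "join room success" line in the log.
Stage 2: the agent joins the room. Once both are in the room, a WelcomeMessage is sent down and the device plays the agent's greeting.
Normal conversation is only possible after the greeting has been heard, so first determine which stage you are at.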
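
One way to check which stage a device has reached is to scan the monitor log for the relevant lines. The following is a minimal sketch in Python, assuming the `volc_rtc: join room success` and `volc_rtc: remote user joined` strings that appear in logs in this thread. The function name `rtc_stage` and the stage numbering are made up for illustration, and the sketch assumes the remote user is the agent. No log line for the WelcomeMessage is known here, so the greeting itself still has to be confirmed by ear.

```python
def rtc_stage(log_text: str) -> int:
    """Classify a monitor log into a rough connection stage.

    0: the device has not joined the room
    1: the device joined, but no remote user (assumed: the agent) joined
    2: the device joined and a remote user joined
    """
    device_joined = False
    remote_joined = False
    for line in log_text.splitlines():
        if "volc_rtc: join room success" in line:
            device_joined = True
        elif "volc_rtc: remote user joined" in line:
            remote_joined = True
    if device_joined and remote_joined:
        return 2
    return 1 if device_joined else 0


if __name__ == "__main__":
    sample = (
        "I (12318) volc_rtc: join room success\n"
        "I (12528) AUDIO_RECORDER: RECORDER_CMD_TRIGGER_START\n"
    )
    print(rtc_stage(sample))  # the device joined, no remote user seen
```

A result of 1 would suggest the agent has not been started (or has not reached the room) yet, which matches the stage 1 description above.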

@Zhentao-Lin

I'm at stage 2. With v5.3.2 it doesn't really work, but with v5.3.1 it does. Which code changes cause the log output shown in the image?

@moonlight729

byte_rtc_request, the code that makes the agent join, is never even called! Something must be wrong with the code, right?

@tianrongqin
Author

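> byte_rtc_request, the code that makes the agent join, is never even called! Something must be wrong with the code, right?

I start the agent and have it join the room through the debugging function on the Volcano Engine side.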

@tianrongqin
Author

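@shootao hello
I can now hear the welcome message, but when I ask a question the esp32s3-korvo-2 v3 board either does not respond or responds very slowly; I have to ask several times before it answers. The output audio is also choppy and jitters badly. What causes this, and how can it be fixed?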

@moonlight729

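> @shootao hello I can now hear the welcome message, but when I ask a question the esp32s3-korvo-2 v3 board either does not respond or responds very slowly; I have to ask several times before it answers. The output audio is also choppy and jitters badly. What causes this, and how can it be fixed?

Try speaking louder.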

@yejinghang

How did you solve the earlier problem of getting no audio response?

@TempoTian
Contributor

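@yejinghang You need to configure startchat properly on the web page. The README says:
> Then start the agent in the api-explorer; after that the device can connect to Volcano Engine for voice interaction.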

@moonlight729

@Zhentao-Lin Has this been resolved? I'm seeing the same thing now: I get the welcome message, but there is no response when I speak.

@moonlight729

@TempoTian Has your problem been resolved? There is no response when I speak.

@TempoTian
Contributor

@moonlight729 In my testing, conversation works. My log is the same as yours, but I do not have WAKEUP_MODE set.

1970-01-01 00:04:45.007 [E]  Counter.c:90 AudioRecevied fps 53
1970-01-01 00:04:45.036 [E]  EngineImplX.c:242 callback pEngineImplX->eventHandler.on_audio_data used too many times 4
1970-01-01 00:04:45.288 [E]  EngineImplX.c:242 callback pEngineImplX->eventHandler.on_audio_data used too many times 4
1970-01-01 00:04:45.401 [E]  EngineImplX.c:242 callback pEngineImplX->eventHandler.on_audio_data used too many times 7
1970-01-01 00:04:45.748 [E]  EngineImplX.c:242 callback pEngineImplX->eventHandler.on_audio_data used too many times 4
1970-01-01 00:04:46.009 [E]  Counter.c:90 AudioRecevied fps 51
1970-01-01 00:04:46.297 [E]  EngineImplX.c:242 callback pEngineImplX->eventHandler.on_audio_data used too many times 4
1970-01-01 00:04:46.722 [E]  EngineImplX.c:242 callback pEngineImplX->eventHandler.on_audio_data used too many times 7

Done
tempo@tempo-OptiPlex-3080:~/adf-commit/esp-adf-internal/examples/ai_agent/volc_rtc$ cat sdkconfig |grep CONFIG_LANGUAGE_WAKEUP_MODE
# CONFIG_LANGUAGE_WAKEUP_MODE is not set
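
The same check can be scripted when comparing several build configurations. This is a minimal sketch, assuming the standard Kconfig output format in `sdkconfig` (`CONFIG_X=value` for a set symbol and `# CONFIG_X is not set` for an unset one). The helper name `kconfig_state` and its return strings are hypothetical, chosen only for this example.

```python
import re


def kconfig_state(sdkconfig_text: str, symbol: str) -> str:
    """Return the symbol's value, "not set", or "absent" for sdkconfig text."""
    set_re = re.compile(rf"^{re.escape(symbol)}=(.*)$")
    unset_line = f"# {symbol} is not set"
    for raw in sdkconfig_text.splitlines():
        line = raw.strip()
        if line == unset_line:
            return "not set"
        match = set_re.match(line)
        if match:
            return match.group(1)
    # The symbol is not mentioned in the file at all.
    return "absent"


if __name__ == "__main__":
    # Assumes the script is run from the example's project directory.
    with open("sdkconfig", encoding="utf-8") as f:
        print(kconfig_state(f.read(), "CONFIG_LANGUAGE_WAKEUP_MODE"))
```

For the configuration shown above this would report `not set`; a build with the wake-up mode enabled would report the configured value instead.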

@tianrongqin
Author

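> > @shootao hello I can now hear the welcome message, but when I ask a question the esp32s3-korvo-2 v3 board either does not respond or responds very slowly; I have to ask several times before it answers. The output audio is also choppy and jitters badly. What causes this, and how can it be fixed?
>
> Try speaking louder.

Still the same: it responds sluggishly or barely at all.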

@tianrongqin
Author

> @moonlight729 In my testing, conversation works. My log is the same as yours, but I do not have WAKEUP_MODE set.
>
> 1970-01-01 00:04:45.007 [E]  Counter.c:90 AudioRecevied fps 53
> 1970-01-01 00:04:45.036 [E]  EngineImplX.c:242 callback pEngineImplX->eventHandler.on_audio_data used too many times 4
> 1970-01-01 00:04:45.288 [E]  EngineImplX.c:242 callback pEngineImplX->eventHandler.on_audio_data used too many times 4
> 1970-01-01 00:04:45.401 [E]  EngineImplX.c:242 callback pEngineImplX->eventHandler.on_audio_data used too many times 7
> 1970-01-01 00:04:45.748 [E]  EngineImplX.c:242 callback pEngineImplX->eventHandler.on_audio_data used too many times 4
> 1970-01-01 00:04:46.009 [E]  Counter.c:90 AudioRecevied fps 51
> 1970-01-01 00:04:46.297 [E]  EngineImplX.c:242 callback pEngineImplX->eventHandler.on_audio_data used too many times 4
> 1970-01-01 00:04:46.722 [E]  EngineImplX.c:242 callback pEngineImplX->eventHandler.on_audio_data used too many times 7
>
> Done
> tempo@tempo-OptiPlex-3080:~/adf-commit/esp-adf-internal/examples/ai_agent/volc_rtc$ cat sdkconfig |grep CONFIG_LANGUAGE_WAKEUP_MODE
> # CONFIG_LANGUAGE_WAKEUP_MODE is not set

I do have a wake word configured, and the audio stutters; when I speak there is either no response or only a slow one.

@MichaelDu9226

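@tianrongqin My QQ is 312394539. Please add me, or tell me your QQ and I will add you. Thanks.

On my side the log says the room was joined successfully, but I do not hear the welcome message. Perhaps the agent's speech recognition, speech synthesis, and so on were not configured successfully.
How did you configure it? Is there a tutorial you followed? Also, is your Volcano Engine account a personal or a company account? I currently don't see a large-model speech synthesis (大模型语音合成) option under my account.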

@tianrongqin
Author

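> @tianrongqin My QQ is 312394539. Please add me, or tell me your QQ and I will add you. Thanks.
>
> On my side the log says the room was joined successfully, but I do not hear the welcome message. Perhaps the agent's speech recognition, speech synthesis, and so on were not configured successfully. How did you configure it? Is there a tutorial you followed? Also, is your Volcano Engine account a personal or a company account? I currently don't see a large-model speech synthesis (大模型语音合成) option under my account.

Personal. It doesn't feel very stable.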

@shootao

shootao commented Feb 18, 2025

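For enabling the services, refer to the Volcano Engine documentation.
Once the account is confirmed to be set up, you can test with the web client first.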

@moonlight729

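@MichaelDu9226 You need to read up on enabling the services first. My account is a personal one; whether it is personal or company makes no difference.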

@MichaelDu9226

MichaelDu9226 commented Feb 18, 2025

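> @MichaelDu9226 You need to read up on enabling the services first. My account is a personal one; whether it is personal or company makes no difference.

@shootao @moonlight729 I went through the documentation and applied for all the services. I can now see a remote user joining the room, but I hear no welcome message. After saying "Hi 乐鑫" (Hi Lexin) I hear a "ding", but talking to it gets no response.
The log shows mute audio and mute video entries. Does that mean it is muted?
What could be causing this? Thanks.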
`
I (1023457) volc_rtc: remote user offline test_room:voiceChat_test_id_1739878003

I (1026177) volc_rtc: remote user joined test_room:voiceChat_test_id_1739878167

1970-01-01 00:17:05.620 [E] rx_net_delay_manager.c:1130

I (1026427) volc_rtc: remote user mute audio test_room:voiceChat_test_id_1739878167 0

I (1026427) volc_rtc: remote user mute video test_room:voiceChat_test_id_1739878167 1

1970-01-01 00:17:05.637 [E] EngineImplX.c:350 callback pEngineImplX->eventHandler.on_user_mute_video used too many times 5

I (1026477) volc_rtc: remote user mute audio test_room:voiceChat_test_id_1739878167 0

I (1026477) volc_rtc: remote user mute video test_room:voiceChat_test_id_1739878167 1
`

@shootao

shootao commented Feb 18, 2025

@MichaelDu9226 This probably means the device has joined the room but the agent has not.
I suggest testing with the web client first.
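
To see from a device log which remote users are currently in the room and what their last reported mute flags were, the `volc_rtc: remote user ...` lines can be replayed in order. The sketch below is only an illustration in Python: the line format is taken from the log excerpt above, the names `EVENT_RE` and `remote_user_state` are invented here, and the meaning of the trailing `0`/`1` is not confirmed anywhere in this thread, so the raw value is reported without interpreting it.

```python
import re

# Matches e.g. "volc_rtc: remote user mute audio test_room:some_user 0"
EVENT_RE = re.compile(
    r"volc_rtc: remote user (joined|offline|mute audio|mute video) (\S+)(?: (\d+))?\s*$"
)


def remote_user_state(log_text: str) -> dict:
    """Replay remote-user events; return {"room:user": {event: raw_flag}}."""
    present = {}
    for line in log_text.splitlines():
        match = EVENT_RE.search(line)
        if not match:
            continue
        event, user, flag = match.groups()
        if event == "joined":
            present[user] = {}
        elif event == "offline":
            present.pop(user, None)
        elif flag is not None:
            # Keep only the most recent raw flag for each mute event type.
            present.setdefault(user, {})[event] = int(flag)
    return present


if __name__ == "__main__":
    import sys

    # Usage: python this_script.py < monitor.log
    for user, flags in remote_user_state(sys.stdin.read()).items():
        print(user, flags)
```

An empty result after "join room success" would be consistent with the agent not being in the room; a non-empty one shows that some remote user joined, though not whether it is the intended agent.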


7 participants