Help needed with error: Segmentation fault (core dumped) #846

super31425 · 2025-01-07T12:40:57Z

Running: cosyvoice = CosyVoice2('pretrained_models/CosyVoice2-0.5B', load_jit=False, load_trt=False, fp16=False)
2025-01-07 20:37:38,786 INFO input frame rate=25
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
2025-01-07 20:37:42.280338132 [W:onnxruntime:, session_state.cc:1162 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2025-01-07 20:37:42.280368311 [W:onnxruntime:, session_state.cc:1164 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
text.cc: festival_Text_init
open voice lang map failed
Segmentation fault (core dumped)

aluminumbox · 2025-01-08T03:14:00Z

Please pin down which line it crashes on. This is a machine or environment problem.
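
A segmentation fault kills the interpreter before Python can print a normal traceback, so one way to find the crashing line is the standard-library `faulthandler` module. The following is a minimal sketch, not something prescribed by CosyVoice; the helper name `enable_crash_traceback` and the log file name are assumptions made for illustration.

```python
import faulthandler


def enable_crash_traceback(log_path="segfault_traceback.log"):
    """Write the Python stack of all threads to log_path on a fatal signal.

    faulthandler handles SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL.
    The file object is returned because it must stay open for as long
    as the handler is enabled.
    """
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


if __name__ == "__main__":
    crash_log = enable_crash_traceback()
    # Run the failing code after this point, for example the
    # CosyVoice2(...) construction and the load_wav(...) call.
```

Running the script with `python -X faulthandler your_script.py` enables the same handler without code changes and prints the dump to stderr. On a crash, the output lists the Python frames in "most recent call first" order.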

fallbernana123456 · 2025-01-08T08:35:21Z

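Please pin down which line it crashes on. This is a machine or environment problem.

I ran into the same problem.
Location of the crash: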
Current thread 0x00007f8ee8b72740 (most recent call first):
File "lib/python3.10/site-packages/torch/_ops.py", line 854 in call
File "lib/python3.10/site-packages/torchaudio/_backend/sox.py", line 44 in load
File "lib/python3.10/site-packages/torchaudio/_backend/utils.py", line 205 in load
File "cosyvoice/utils/file_utils.py", line 42 in load_wav
prompt_speech_16k = load_wav('zero_shot_prompt.wav', 16000)

I tried modifying the load_wav method:
'''
import librosa
def load_wav(wav, target_sr):
    # Load the audio file with librosa
    speech, sample_rate = librosa.load(wav, sr=None)  # sr=None keeps the original sample rate
    # Convert multi-channel audio to mono
    speech = librosa.to_mono(speech)
    if sample_rate != target_sr:
        assert sample_rate > target_sr, f'wav sample rate {sample_rate} must be greater than {target_sr}'
        # Resample with librosa
        speech = librosa.resample(speech, orig_sr=sample_rate, target_sr=target_sr)
    return speech
'''
But that raises an error:
'''
    for i, j in enumerate(cosyvoice.inference_cross_lingual('在他讲述那个荒诞故事的过程中，他突然[laughter]停下来，因为他自己也被逗笑了[laughter]。', prompt_speech_16k, stream=False)):
  File "cosyvoice/cli/cosyvoice.py", line 90, in inference_cross_lingual
    model_input = self.frontend.frontend_cross_lingual(i, prompt_speech_16k, self.sample_rate)
  File "cosyvoice/cli/frontend.py", line 162, in frontend_cross_lingual
    model_input = self.frontend_zero_shot(tts_text, '', prompt_speech_16k, resample_rate)
  File "cosyvoice/cli/frontend.py", line 144, in frontend_zero_shot
    prompt_speech_resample = torchaudio.transforms.Resample(orig_freq=16000, new_freq=resample_rate)(prompt_speech_16k)
  File "lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "lib/python3.10/site-packages/torchaudio/transforms/_transforms.py", line 979, in forward
    return _apply_sinc_resample_kernel(waveform, self.orig_freq, self.new_freq, self.gcd, self.kernel, self.width)
  File "lib/python3.10/site-packages/torchaudio/functional/functional.py", line 1454, in _apply_sinc_resample_kernel
    if not waveform.is_floating_point():
AttributeError: 'numpy.ndarray' object has no attribute 'is_floating_point'
'''

jhxiang · 2025-01-08T09:56:59Z

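Please pin down which line it crashes on. This is a machine or environment problem.

I ran into the same problem. The crash is on the load_wav line; the error is as follows: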

jhxiang · 2025-01-08T10:14:22Z

(Quoting fallbernana123456's comment above: the torchaudio sox backend traceback, the librosa-based load_wav, and the resulting AttributeError.)

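Your change is fine, but the NumPy array has to be converted to a torch.Tensor. Then change the audio saving to the following form:

import soundfile as sf  # used instead of torchaudio.save

for i, j in enumerate(cosyvoice.inference_zero_shot('收到好友从远方寄来的生日礼物，那份意外的惊喜与深深的祝福让我心中充满了甜蜜的快乐，笑容如花儿般绽放。', '希望你以后能够做的比我还好呦。', prompt_speech_16k, stream=False)):
    # Drop the channel dimension and move the tensor to a NumPy array on the CPU
    audio_np = j['tts_speech'].squeeze(0).cpu().numpy()
    sf.write('zero_shot_{}.wav'.format(i), audio_np, cosyvoice.sample_rate)

With this I no longer get any error. Every torchaudio function I tried triggers the segmentation fault.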
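
The conversion mentioned above (NumPy array to torch.Tensor) can be sketched as follows. This is an illustration under assumptions, not the project's own code: the helper name `to_prompt_tensor` is made up here, and the `(1, num_samples)` float32 shape is assumed because that is what a mono waveform loaded through `torchaudio.load` would look like.

```python
import numpy as np
import torch


def to_prompt_tensor(speech: np.ndarray) -> torch.Tensor:
    """Turn a mono waveform of shape (T,) into a float32 tensor of shape (1, T)."""
    speech = np.ascontiguousarray(speech, dtype=np.float32)
    return torch.from_numpy(speech).unsqueeze(0)


def load_wav(wav, target_sr):
    """librosa-based loader that returns a tensor instead of a NumPy array."""
    # Imported here so that to_prompt_tensor can be used without librosa installed.
    import librosa

    # sr=None keeps the file's own sample rate; mono=True downmixes channels.
    speech, sample_rate = librosa.load(wav, sr=None, mono=True)
    if sample_rate != target_sr:
        assert sample_rate > target_sr, (
            f'wav sample rate {sample_rate} must be greater than {target_sr}'
        )
        speech = librosa.resample(speech, orig_sr=sample_rate, target_sr=target_sr)
    return to_prompt_tensor(speech)
```

With a tensor returned, the `waveform.is_floating_point()` call inside `torchaudio.transforms.Resample` no longer fails with the AttributeError shown in the traceback.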

fallbernana123456 · 2025-01-08T11:29:34Z

frontend_cross_lingual

That doesn't work. The error is raised right on this line:
for i, j in enumerate(cosyvoice.inference_zero_shot('收到好友从远方寄来的生日礼物，那份意外的惊喜与深深的祝福让我心中充满了甜蜜的快乐，笑容如花儿般绽放。', '希望你以后能够做的比我还好呦。', prompt_speech_16k, stream=False)):

jhxiang · 2025-01-09T02:59:44Z

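Please pin down which line it crashes on. This is a machine or environment problem.

I ran into the same problem. The crash is on the load_wav line; the error is as follows:

Running conda install ffmpeg fixed the torchaudio segmentation fault.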
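
Since the fix reported above is installing ffmpeg into the conda environment, a quick sanity check is whether an ffmpeg executable is visible from that same environment. Below is a minimal sketch using only the standard library; the function name `ffmpeg_version` is an assumption for illustration. Finding the executable does not by itself prove that torchaudio can load the FFmpeg shared libraries.

```python
import shutil
import subprocess


def ffmpeg_version():
    """Return the first line of `ffmpeg -version`, or None if it is unavailable."""
    exe = shutil.which("ffmpeg")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "-version"], capture_output=True, text=True, check=False
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    print(ffmpeg_version() or "ffmpeg not found on PATH")
```

Run it with the same Python interpreter that runs CosyVoice, so that the PATH of the activated conda environment is the one being checked.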
