The voice in version V2 doesn’t sound like the reference audio at all, whether in English or Chinese. #339

Bella-Tim · 2024-12-18T08:05:27Z

https://drive.google.com/drive/folders/1JQCDvsPcH_ZjNQywZm5B9o1SPA8gu79O?usp=sharing

Bella-Tim · 2024-12-18T08:15:04Z

C:\Users\Administrator\Desktop\OpenVoice\OpenVoice-main>python C:\Users\Administrator\Desktop\OpenVoice\OpenVoice-main\role_tts.py
Loaded checkpoint 'checkpoints_v2/converter/checkpoint.pth'
missing/unexpected keys: [] []
ssssssssssssssssssssssssssssssssssssssssssssssssssssssssss
OpenVoice version: v2
C:\AudioDspFolder\Zero-Shot\6600df81-cb77-4d60-85f9-34960c5d87bb-4785.mp3
C:\Users\Administrator\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\functional.py:641: UserWarning: stft with return_complex=False is deprecated. In a future pytorch release, stft will return complex tensors for all inputs, and return_complex=False will raise an error.
Note: you can still call torch.view_as_real on the complex output to recover the old return format. (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\SpectralOps.cpp:867.)
return _VF.stft(input, n_fft, hop_length, win_length, window, # type: ignore[attr-defined]
yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
C:\Users\Administrator\AppData\Local\Programs\Python\Python39\lib\site-packages\huggingface_hub\file_download.py:797: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
warnings.warn(
{'ZH': 1}

Text split to sentences.
在这次vacation中,
我们计划去Paris欣赏埃菲尔铁塔和卢浮宫的美景.

0%| | 0/2 [00:00<?, ?it/s]Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\ADMINI~1\AppData\Local\Temp\2\jieba.cache
Loading model cost 0.716 seconds.
Prefix dict has been built successfully.
C:\Users\Administrator\AppData\Local\Programs\Python\Python39\lib\site-packages\huggingface_hub\file_download.py:797: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
warnings.warn(
Some weights of the model checkpoint at bert-base-multilingual-uncased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.bias', 'cls.seq_relationship.weight']

This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
100%|██████████| 2/2 [00:05<00:00, 2.96s/it]

dosubot · 2024-12-18T08:17:50Z

Hey @Bella-Tim, I'm here to help you with any bugs, questions, or contributions you have in mind. Let's get started!

To address the issue of the voice in OpenVoice V2 not sounding like the reference audio, consider the following troubleshooting steps:

Accent and Emotion: OpenVoice V2 clones only the tone color of the reference speaker, not the accent or emotion. Ensure that the base speaker TTS model has the desired accent and emotion for both English and Chinese ^[1].
Reference Audio Quality: Verify that the reference audio is clean, free from background noise, and does not contain speech from more than one person. Ensure the audio is not too short and does not have long blank sections ^[1].
Processed Folder: If you have reused the name for the reference audio, delete the processed folder to avoid conflicts that might affect the output ^[1].
Model and Checkpoint Compatibility: Ensure that the correct model and checkpoint files are being used. The log shows the V2 converter checkpoint loading with no missing or unexpected keys, and the warnings about unused BERT weights and deprecated arguments (stft return_complex, resume_download) are routine notices rather than signs of a fault. It is still worth verifying that all dependencies are compatible with the current version of OpenVoice.
Configuration and Parameters: Double-check the configurations and parameters used in the role_tts.py script to ensure they are set correctly for the desired output. This includes language settings and any specific configurations for Chinese or English voice cloning.
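
The reference-audio checks above (clip not too short, no long blank sections) can be roughed out in code. The following is a minimal sketch, not part of OpenVoice: the function name `reference_audio_report`, the 0.01 peak threshold, and the 20 ms frame size are all illustrative assumptions, and the input is assumed to be mono float samples in [-1.0, 1.0] that have already been decoded from the file.

```python
def reference_audio_report(samples, sample_rate, silence_threshold=0.01, frame_ms=20):
    """Summarise duration and silence for mono float samples in [-1.0, 1.0].

    Hypothetical helper for illustration; the threshold and frame size are
    assumptions, not values taken from OpenVoice.
    """
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")

    # Split the signal into fixed-size frames and mark a frame as silent
    # when its peak amplitude stays below the threshold.
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    silent_flags = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        peak = max(abs(x) for x in frame)
        silent_flags.append(peak < silence_threshold)

    # Track the longest run of consecutive silent frames.
    longest = current = 0
    for is_silent in silent_flags:
        current = current + 1 if is_silent else 0
        longest = max(longest, current)

    total_frames = len(silent_flags)
    return {
        "duration_s": len(samples) / sample_rate,
        "silent_fraction": (sum(silent_flags) / total_frames) if total_frames else 0.0,
        # Approximate: a short final frame is counted as a full frame.
        "longest_silence_s": longest * frame_len / sample_rate,
    }
```

One way to obtain the samples is `librosa.load(path, sr=None, mono=True)`, which returns the waveform and its sample rate. A report with a very short `duration_s` or a large `longest_silence_s` would point at the reference clip rather than the model, but where to draw those lines is a judgment call.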

By following these steps, you should be able to improve the accuracy of the voice cloning output in OpenVoice V2.
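
For the processed-folder step, a small helper can clear stale cache entries before re-running. This is a sketch under a stated assumption: that cached data for a reference file lives in subfolders of the processed folder whose names start with the audio file's base name. The helper name `clear_cached_reference` is hypothetical, so check the actual folder layout on disk before relying on it.

```python
import shutil
from pathlib import Path


def clear_cached_reference(processed_dir, reference_path):
    """Delete cache subfolders whose name starts with the reference file's stem.

    Assumes (not confirmed by this thread) that cached data is stored in
    per-file subfolders named after the audio file. Returns the names of
    the folders that were removed.
    """
    root = Path(processed_dir)
    stem = Path(reference_path).stem
    removed = []
    if not root.is_dir():
        return removed
    for entry in sorted(root.iterdir()):
        # Prefix matching is deliberately loose; it will also match other
        # files whose names begin with the same stem.
        if entry.is_dir() and entry.name.startswith(stem):
            shutil.rmtree(entry)
            removed.append(entry.name)
    return removed
```

Calling `clear_cached_reference("processed", "my_reference.mp3")` before the next run would force the reference to be processed again instead of reusing data from an older file with the same name.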



dosubot bot added the bug (Something isn't working) label Dec 18, 2024

aytimothy mentioned this issue Jan 8, 2025

Website Demo sounds nothing like Local Demo #349

Open
