The voice in version V2 doesn’t sound like the reference audio at all, whether in English or Chinese. #339

Open
Bella-Tim opened this issue Dec 18, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@Bella-Tim

https://drive.google.com/drive/folders/1JQCDvsPcH_ZjNQywZm5B9o1SPA8gu79O?usp=sharing

@dosubot dosubot bot added the bug Something isn't working label Dec 18, 2024
@Bella-Tim
Author

C:\Users\Administrator\Desktop\OpenVoice\OpenVoice-main>python C:\Users\Administrator\Desktop\OpenVoice\OpenVoice-main\role_tts.py
Loaded checkpoint 'checkpoints_v2/converter/checkpoint.pth'
missing/unexpected keys: [] []
ssssssssssssssssssssssssssssssssssssssssssssssssssssssssss
OpenVoice version: v2
C:\AudioDspFolder\Zero-Shot\6600df81-cb77-4d60-85f9-34960c5d87bb-4785.mp3
C:\Users\Administrator\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\functional.py:641: UserWarning: stft with return_complex=False is deprecated. In a future pytorch release, stft will return complex tensors for all inputs, and return_complex=False will raise an error.
Note: you can still call torch.view_as_real on the complex output to recover the old return format. (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\SpectralOps.cpp:867.)
return _VF.stft(input, n_fft, hop_length, win_length, window, # type: ignore[attr-defined]
yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
C:\Users\Administrator\AppData\Local\Programs\Python\Python39\lib\site-packages\huggingface_hub\file_download.py:797: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
warnings.warn(
{'ZH': 1}

Text split to sentences.
在这次vacation中,
我们计划去Paris欣赏埃菲尔铁塔和卢浮宫的美景.

0%| | 0/2 [00:00<?, ?it/s]Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\ADMINI~1\AppData\Local\Temp\2\jieba.cache
Loading model cost 0.716 seconds.
Prefix dict has been built successfully.
C:\Users\Administrator\AppData\Local\Programs\Python\Python39\lib\site-packages\huggingface_hub\file_download.py:797: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
warnings.warn(
Some weights of the model checkpoint at bert-base-multilingual-uncased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.bias', 'cls.seq_relationship.weight']

  • This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
    100%|██████████| 2/2 [00:05<00:00, 2.96s/it]


dosubot bot commented Dec 18, 2024

Hey @Bella-Tim, I'm here to help you with any bugs, questions, or contributions you have in mind. Let's get started!

To address the issue of the voice in OpenVoice V2 not sounding like the reference audio, consider the following troubleshooting steps:

  1. Accent and Emotion: OpenVoice V2 clones only the tone color of the reference speaker, not the accent or emotion. Ensure that the base speaker TTS model has the desired accent and emotion for both English and Chinese [1].

  2. Reference Audio Quality: Verify that the reference audio is clean, free from background noise, and does not contain speech from more than one person. Ensure the audio is not too short and does not have long blank sections [1].

  3. Processed Folder: If you reused a file name for a different reference audio, delete the processed folder so that results cached from the earlier file are not picked up and do not affect the output [1].

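  4. Model and Checkpoint Compatibility: Ensure that the correct model and checkpoint files are being used. In the log above, the converter checkpoint loaded with no missing or unexpected keys, and the messages about unused BERT weights and the deprecated stft and resume_download arguments are warnings rather than errors, so they do not by themselves indicate a failure. It is still worth verifying that the installed dependencies are compatible with the current version of OpenVoice.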

  5. Configuration and Parameters: Double-check the configurations and parameters used in the role_tts.py script to ensure they are set correctly for the desired output. This includes language settings and any specific configurations for Chinese or English voice cloning.
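
The reference audio checks in step 2 (length and long blank sections) can be roughly automated. The following is only a sketch and is not part of OpenVoice: the function names, the 20 ms frame size, and all thresholds are assumptions chosen for illustration, and it expects the audio to be already decoded into a list of mono samples scaled to [-1.0, 1.0]. It cannot detect background noise or a second speaker.

```python
import math


def reference_audio_report(samples, sample_rate, frame_ms=20, silence_rms=0.01):
    """Summarise duration and silence for mono samples scaled to [-1.0, 1.0].

    frame_ms and silence_rms are illustrative defaults, not OpenVoice settings.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    silent_flags = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        # Root-mean-square level of the frame; low values are treated as silence.
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        silent_flags.append(rms < silence_rms)

    # Length of the longest run of consecutive silent frames.
    longest = current = 0
    for is_silent in silent_flags:
        current = current + 1 if is_silent else 0
        longest = max(longest, current)

    return {
        "duration_s": len(samples) / sample_rate,
        "silent_fraction": sum(silent_flags) / len(silent_flags) if silent_flags else 0.0,
        # Approximate: the final frame may be shorter than frame_len.
        "longest_silence_s": longest * frame_len / sample_rate,
    }


def reference_audio_warnings(report, min_duration_s=3.0, max_silence_s=1.0):
    """Return human-readable warnings; both thresholds are assumed values."""
    warnings = []
    if report["duration_s"] < min_duration_s:
        warnings.append("reference audio may be too short")
    if report["longest_silence_s"] > max_silence_s:
        warnings.append("reference audio contains a long blank section")
    return warnings
```

To use it, decode the reference file with whatever audio library is already installed, pass the samples and sample rate to `reference_audio_report`, and review the warnings before running the cloning script.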

Working through these steps should help narrow down why the cloned voice differs from the reference audio in OpenVoice V2.
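
The processed-folder step above can be sketched as a small helper. This is a hedged example, not an OpenVoice utility: `clear_cached_reference` is a hypothetical name, and it assumes that cached results live in subfolders of `processed` whose names begin with the reference file's name without its extension. Check the actual folder layout on your machine before deleting anything; the helper defaults to a dry run for that reason.

```python
import shutil
from pathlib import Path


def clear_cached_reference(reference_path, processed_dir="processed", dry_run=True):
    """List (and optionally delete) cache subfolders matching a reference file.

    Assumption: cache subfolder names start with the reference file's stem.
    With dry_run=True nothing is deleted; the matches are only returned.
    """
    stem = Path(reference_path).stem
    root = Path(processed_dir)
    if not root.is_dir():
        return []
    matches = sorted(
        p for p in root.iterdir() if p.is_dir() and p.name.startswith(stem)
    )
    if not dry_run:
        for folder in matches:
            shutil.rmtree(folder)
    return matches
```

Call it once with the default `dry_run=True` to see which folders would be removed, then again with `dry_run=False` if the list looks right. Prefix matching can catch unrelated folders with similar names, so the dry run is worth the extra step.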
