Showing 1 changed file with 2 additions and 14 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,11 +1,9 @@ | ||
# Voice Conversion with Non-Parallel Data | ||
## Subtitle: Speaking like Kate Winslet | ||
>* This is the first draft. | ||
>* Authors: Dabi Ahn([email protected]), [Kyubyong Park](https://github.com/Kyubyong)([email protected]) | ||
>* We always welcome any questions, new ideas, or contributions. | ||
## Samples | ||
It's not perfect yet, but listen to [them](https://soundcloud.com/andabi/sets/voice-style-transfer-to-kate-winslet-with-deep-neural-networks). | ||
[Here](https://soundcloud.com/andabi/sets/voice-style-transfer-to-kate-winslet-with-deep-neural-networks). | ||
|
||
## Intro | ||
What if you could imitate a famous celebrity's voice or sing like a famous singer? | ||
|
@@ -90,16 +88,6 @@ Net2 contains Net1 as a sub-network. | |
* IMHO, the accuracy of Net1 (phoneme classification) does not need to be perfect. | ||
* Net2 can get close to optimal as long as Net1 is reasonably accurate. | ||
|
||
## Future Work | ||
* Adversarial training | ||
* Expected to generate a sharper and cleaner voice. | ||
* Cross-lingual voice conversion | ||
|
||
## Ultimate Goals | ||
* Many-to-many (multi-target-speaker) voice conversion system | ||
* VC that needs no full training set for the target voice, only a small sample of it (1 min) | ||
* (On going) | ||
|
||
## References | ||
* ["Phonetic posteriorgrams for many-to-one voice conversion without parallel data training"](https://www.researchgate.net/publication/307434911_Phonetic_posteriorgrams_for_many-to-one_voice_conversion_without_parallel_data_training), 2016 IEEE International Conference on Multimedia and Expo (ICME) | ||
* ["TACOTRON: TOWARDS END-TO-END SPEECH SYNTHESIS"](https://arxiv.org/abs/1703.10135), Submitted to Interspeech 2017 | ||