diff --git a/README.md b/README.md
index cf0cb50d..08a10c05 100644
--- a/README.md
+++ b/README.md
@@ -1,11 +1,9 @@
 # Voice Conversion with Non-Parallel Data
 ## Subtitle: Speaking like Kate Winslet
->* This is the first draft.
 >* Authors: Dabi Ahn(andabi412@gmail.com), [Kyubyong Park](https://github.com/Kyubyong)(kbpark.linguist@gmail.com)
->* We always welcome any questions, new ideas, or contributions.
 
 ## Samples
-It's not perfect yet, but listen to [them](https://soundcloud.com/andabi/sets/voice-style-transfer-to-kate-winslet-with-deep-neural-networks).
+[Here](https://soundcloud.com/andabi/sets/voice-style-transfer-to-kate-winslet-with-deep-neural-networks).
 
 ## Intro
 What if you could imitate a famous celebrity's voice or sing like a famous singer?
@@ -90,16 +88,6 @@ Net2 contains Net1 as a sub-network.
 * IMHO, the accuracy of Net1(phoneme classification) does not need to be so perfect.
   * Net2 can reach to near optimal when Net1 accuracy is correct to some extent.
 
-## Future Works
-* Adversarial training
-  * Expecting to generate sharper and cleaner voice.
-* Cross lingual
-
-## Ultimate Goals
-* Many-to-Many(Multi target speaker) voice conversion system
-* VC without training set of target voice, but only small set of target voice (1 min)
-  * (On going)
-
 ## References
 * ["Phonetic posteriorgrams for many-to-one voice conversion without parallel data training"](https://www.researchgate.net/publication/307434911_Phonetic_posteriorgrams_for_many-to-one_voice_conversion_without_parallel_data_training), 2016 IEEE International Conference on Multimedia and Expo (ICME)
-* ["TACOTRON: TOWARDS END-TO-END SPEECH SYNTHESIS"](https://arxiv.org/abs/1703.10135), Submitted to Interspeech 2017
\ No newline at end of file
+* ["TACOTRON: TOWARDS END-TO-END SPEECH SYNTHESIS"](https://arxiv.org/abs/1703.10135), Submitted to Interspeech 2017