
problem when synthesize #10

Closed
binyi10 opened this issue Nov 26, 2018 · 9 comments
@binyi10

binyi10 commented Nov 26, 2018

Thanks for this nice work!

  1. But I have a problem when synthesizing: there is always reverberation in the synthesized audio compared with the raw audio. Does anyone else have the same problem (batch size = 4, 1000k steps)? I guess the "change order" module may cause this?

  2. After 200k steps, I found the loss stays almost unchanged within (-3.4, -3.7), and the synthesis results are similar from 200k to 1000k steps. Is this reasonable? If not, what scale of loss is reasonable?
    [image]

@jiqizaisikao

Hi, can you upload your result wav?

@binyi10
Author

binyi10 commented Dec 1, 2018

raw: http://github.com/binyi10/-/blob/master/yibintest7.wav
syn: http://github.com/binyi10/-/blob/master/generate_890682_7_0.7.wav

      Your wav file may be corrupted; it cannot be opened.

Sorry, my fault. You can copy these URLs into your browser, or git clone http://github.com/binyi10/- to your PC; then you can listen to the wav files.

@binyi10
Author

binyi10 commented Dec 1, 2018

      Hi, can you upload your result wav?

Hello, instead of using the LJSpeech data, I used my own dataset, so I changed some parameters in the source code (e.g. hop_length = 120). The results have been uploaded; did you run into a similar problem?

@ksw0306
Owner

ksw0306 commented Dec 3, 2018

How many audio samples are in your dataset? The scale of the loss function varies from dataset to dataset (VCTK, LJSpeech, ...). In the case of LJSpeech, it's near -4.5.

@jiqizaisikao

Your wav file may be corrupted; it cannot be opened.

@binyi10
Author

binyi10 commented Dec 6, 2018

      How many audio samples are in your dataset? The scale of the loss function varies from dataset to dataset (VCTK, LJSpeech, ...). In the case of LJSpeech, it's near -4.5.

I use 30k audio samples, but my CUDA memory only allows batch size = 4, so I think that may be suboptimal. I have tried to change your code to multi-GPU training using the PyTorch function DataParallel(model), and I changed line 110 of train.py:

loss = -(log_p + logdet).mean()

but the loss may be wrong, and the synthesized audio is just noise.
Can you release a multi-GPU version of the code?
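For reference, the loss above is the standard flow negative log-likelihood from the change of variables formula, log p(x) = log p(z) + log|det dz/dx|. A minimal numpy sketch of the same form, using a toy 1-D affine flow rather than the repo's actual model (function name and setup are illustrative only):

```python
import numpy as np

# Toy 1-D affine flow z = s * x + b with a standard-normal prior.
# Illustrative only; the real model stacks many invertible layers.
def flow_nll(x, s, b):
    z = s * x + b                                  # forward pass through the flow
    log_p = -0.5 * (z ** 2 + np.log(2 * np.pi))    # prior log-density per sample
    logdet = np.log(np.abs(s))                     # log |dz/dx| of the affine map
    return -(log_p + logdet).mean()                # same form as the train.py loss

# e.g. flow_nll(np.zeros(4), 1.0, 0.0) == 0.5 * np.log(2 * np.pi)
```

Under DataParallel, log_p and logdet come back as per-replica tensors gathered on the main device, so the .mean() itself is not the problem; the issue is the initialization order described below by the collaborators.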

@L0SG
Collaborator

L0SG commented Dec 7, 2018

Actually, for multi-GPU training there is a subtle caveat. The ActNorm parameters need to be initialized with the first training batch before converting the model to DataParallel, or else the training diverges badly. We'll upload the (correct) multi-GPU training part of the code shortly.
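To illustrate the caveat: ActNorm uses a data-dependent initialization, where the first batch it sees sets the scale and bias so that activations come out zero-mean and unit-variance per channel. If the model is replicated across GPUs first, each replica initializes from a different shard of the batch. A minimal numpy sketch of the init, with illustrative names and shapes rather than the repo's actual code:

```python
import numpy as np

# Minimal ActNorm sketch for (batch, channels, time) inputs.
# Names and shapes are illustrative, not taken from the repo.
class ActNorm:
    def __init__(self, channels):
        self.bias = np.zeros((1, channels, 1))
        self.log_scale = np.zeros((1, channels, 1))
        self.initialized = False

    def forward(self, x):
        if not self.initialized:
            # Data-dependent init: choose bias/scale so the first batch
            # comes out zero-mean, unit-variance per channel. This must
            # happen once, on the full batch, before replication.
            mean = x.mean(axis=(0, 2), keepdims=True)
            std = x.std(axis=(0, 2), keepdims=True)
            self.bias = -mean
            self.log_scale = -np.log(std + 1e-6)
            self.initialized = True
        return (x + self.bias) * np.exp(self.log_scale)
```

This is why, in a training script, one forward pass on a real batch should run before wrapping the model in torch.nn.DataParallel.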

@L0SG
Collaborator

L0SG commented Dec 7, 2018

The latest commit now supports the proper multi-GPU training.

@L0SG
Collaborator

L0SG commented Dec 21, 2018

We'll close the issue, but feel free to re-open if the problem persists. FYI: please also refer to #13.

@L0SG L0SG closed this as completed Dec 21, 2018