
problem when synthesize #10

Closed
binyi10 opened this issue Nov 26, 2018 · 9 comments
@binyi10

binyi10 commented Nov 26, 2018

Thanks for this nice work!

  1. But I have a problem when synthesizing: there is always reverberation in the synthesized audio compared with the raw audio. Does anyone else have the same problem (batch size = 4, 1000k steps)? I guess the "change order" module may cause this?

  2. After 200k steps, I found the loss stays almost unchanged within (-3.4, -3.7), and the synthesis results are similar from 200k to 1000k steps. Is this reasonable? If not, what scale of loss is reasonable?
    [image]

@jiqizaisikao

Hi, can you upload your result wav?

@binyi10
Author

binyi10 commented Dec 1, 2018

raw: http://github.com/binyi10/-/blob/master/yibintest7.wav
syn: http://github.com/binyi10/-/blob/master/generate_890682_7_0.7.wav

      Your wav file may be corrupted; it cannot be opened.

Sorry, my fault. You can copy these URLs into your browser, or git clone http://github.com/binyi10/- to your PC; then you can listen to the wav files.

@binyi10
Author

binyi10 commented Dec 1, 2018

      Hi, can you upload your result wav?

Hello, instead of using the LJSpeech data, I used my own dataset, so I changed some parameters in the source code (e.g. hop_length = 120). The results have been uploaded; did you run into a similar problem?

@ksw0306
Owner

ksw0306 commented Dec 3, 2018

How many audio samples are in your dataset? The scale of the loss function varies from dataset to dataset (VCTK, LJSpeech, ...). In the case of LJSpeech, it's near -4.5.

@jiqizaisikao

Your wav file may be corrupted; it cannot be opened.

@binyi10
Author

binyi10 commented Dec 6, 2018

      How many audio samples are in your dataset? The scale of the loss function varies from dataset to dataset (VCTK, LJSpeech, ...). In the case of LJSpeech, it's near -4.5.

I use 30k audio samples, but my CUDA memory only allows batch size = 4, so I think that may be suboptimal. I have tried to change your code to multi-GPU training using the PyTorch function DataParallel(model), and I changed line 110 of train.py:

loss = -(log_p + logdet).mean()

but the loss may be wrong, and the synthesized audio is just noise.
Can you release a multi-GPU version of the code?
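For reference, the loss above is the standard flow negative log-likelihood from the change of variables formula, log p(x) = log p(z) + log|det dz/dx|. A minimal numpy sketch of the same form, using a toy 1-D affine flow rather than the repo's actual model (function name and setup are illustrative only):

```python
import numpy as np

# Toy 1-D affine flow z = s * x + b with a standard-normal prior.
# Illustrative only; the real model stacks many invertible layers.
def flow_nll(x, s, b):
    z = s * x + b                                  # forward pass through the flow
    log_p = -0.5 * (z ** 2 + np.log(2 * np.pi))    # prior log-density per sample
    logdet = np.log(np.abs(s))                     # log |dz/dx| of the affine map
    return -(log_p + logdet).mean()                # same form as the train.py loss

# e.g. flow_nll(np.zeros(4), 1.0, 0.0) == 0.5 * np.log(2 * np.pi)
```

Under DataParallel, log_p and logdet come back as per-replica tensors gathered on the main device, so the .mean() itself is not the problem; the issue is the initialization order described below by the collaborators.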

@L0SG
Collaborator

L0SG commented Dec 7, 2018

Actually, for multi-GPU training there is a subtle caveat. The ActNorm parameters need to be initialized with the first training batch before converting the model to DataParallel, or else the training diverges badly. We'll upload the (correct) multi-GPU training part of the code shortly.
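To illustrate the caveat: ActNorm uses a data-dependent initialization, where the first batch it sees sets the scale and bias so that activations come out zero-mean and unit-variance per channel. If the model is replicated across GPUs first, each replica initializes from a different shard of the batch. A minimal numpy sketch of the init, with illustrative names and shapes rather than the repo's actual code:

```python
import numpy as np

# Minimal ActNorm sketch for (batch, channels, time) inputs.
# Names and shapes are illustrative, not taken from the repo.
class ActNorm:
    def __init__(self, channels):
        self.bias = np.zeros((1, channels, 1))
        self.log_scale = np.zeros((1, channels, 1))
        self.initialized = False

    def forward(self, x):
        if not self.initialized:
            # Data-dependent init: choose bias/scale so the first batch
            # comes out zero-mean, unit-variance per channel. This must
            # happen once, on the full batch, before replication.
            mean = x.mean(axis=(0, 2), keepdims=True)
            std = x.std(axis=(0, 2), keepdims=True)
            self.bias = -mean
            self.log_scale = -np.log(std + 1e-6)
            self.initialized = True
        return (x + self.bias) * np.exp(self.log_scale)
```

This is why, in a training script, one forward pass on a real batch should run before wrapping the model in torch.nn.DataParallel.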

@L0SG
Collaborator

L0SG commented Dec 7, 2018

The latest commit now supports the proper multi-GPU training.

@L0SG
Collaborator

L0SG commented Dec 21, 2018

We'll close the issue, but feel free to re-open if the problem persists. FYI: please also refer to #13.

@L0SG L0SG closed this as completed Dec 21, 2018