Reproduce on Voicebank dataset #2

trfnhle · 2022-04-20T07:58:50Z

First of all, thank you for your great work.
I tried to reproduce your results on the Voicebank dataset with your code but ran into some problems. I ran inference with the 100k checkpoint, but the results are not comparable to your sample files, and background noise still remains.

Steps I followed:

Preprocessed the Voicebank dataset with the se flag
Trained without any modification
Here is my loss figure:

Could you give me some insight into what I might be doing wrong?

neillu23 · 2022-04-20T08:15:10Z

Dear @l4zyf9x,

We found some issues with different PyTorch and torchaudio versions, and we will try to fix them soon.
Could you try PyTorch 1.8.0 with torchaudio 0.8.0, or PyTorch 1.8.1 with torchaudio 0.8.1?
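For anyone checking their environment against the versions suggested above, here is a minimal sketch. The helper names (`base_version`, `is_suggested_pair`, `installed_version`) are made up for illustration and are not part of the repository. It only compares version strings and does not import torch itself.

```python
from importlib import metadata

# Version pairs suggested in this thread; other combinations are untested here.
SUGGESTED_PAIRS = {("1.8.0", "0.8.0"), ("1.8.1", "0.8.1")}


def base_version(version):
    """Strip a local build tag such as '+cu111' from a version string."""
    return version.split("+", 1)[0]


def is_suggested_pair(torch_version, torchaudio_version):
    """Return True if the two versions match one of the suggested pairs."""
    pair = (base_version(torch_version), base_version(torchaudio_version))
    return pair in SUGGESTED_PAIRS


def installed_version(package):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    torch_v = installed_version("torch")
    audio_v = installed_version("torchaudio")
    if torch_v is None or audio_v is None:
        print("torch or torchaudio is not installed")
    elif is_suggested_pair(torch_v, audio_v):
        print(f"OK: torch {torch_v} / torchaudio {audio_v}")
    else:
        print(f"Not a suggested pair: torch {torch_v} / torchaudio {audio_v}")
```

Running the script prints whether the installed pair matches one of the two combinations mentioned in this thread.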

Here is my training loss figure:

Thank you!

trfnhle · 2022-04-20T08:37:19Z

@neillu23 Thanks for your quick response.
I will try the torch and torchaudio versions you suggest.
Btw, I have some more questions. I noticed that the sample audio includes *raw_enhanced.wav and *enhanced.wav files. What is the difference between them?
One more thing: when the se_pre flag is used, it seems to use clean audio as the condition in the diffusion step. I don't see the motivation for using clean audio in the diffusion step.

neillu23 · 2022-07-15T22:13:05Z

Hi @l4zyf9x, sorry I missed your last message.
I've replaced torchaudio.load_wav() with torchaudio.load() in the new commit. You can try it with the newer torch and torchaudio versions.
The *enhanced.wav files are further combined with a noise signal at a ratio of 0.2 to recover high-frequency speech,
as described at the end of Sec. 4.1, while the *raw_enhanced.wav files are the output without this combination.
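A rough sketch of the combination step described above, under two assumptions that are not confirmed in this thread: that the "noise signal" is the original noisy input, and that the combination is a simple weighted sum of the two waveforms. The function name `mix_with_noisy` is hypothetical, and the repository's actual formula may differ.

```python
def mix_with_noisy(raw_enhanced, noisy, ratio=0.2):
    """Blend the raw enhanced waveform with the noisy input, sample by sample.

    Assumed formula (not confirmed by the repository):
        output = (1 - ratio) * raw_enhanced + ratio * noisy
    Both inputs are sequences of float samples of equal length.
    """
    if len(raw_enhanced) != len(noisy):
        raise ValueError("waveforms must have the same number of samples")
    return [(1.0 - ratio) * e + ratio * n for e, n in zip(raw_enhanced, noisy)]


if __name__ == "__main__":
    # Toy samples standing in for *raw_enhanced.wav and the noisy input.
    raw = [0.5, -0.25, 0.0]
    noisy = [0.75, -0.5, 0.1]
    print(mix_with_noisy(raw, noisy))
```

With `ratio=0.0` the output equals the raw enhanced signal, which corresponds to the *raw_enhanced.wav case.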
The "se_pre" step was designed for our previous work, DiffuSE; we tried the same initialization for CDiffuSE while writing the paper. Afterwards, we found that the pre-training step was no longer needed in CDiffuSE, since a randomly initialized CDiffuSE performed as well as one initialized from pre-trained parameters.
Please try the new code and let me know if you have any further questions!

KarsonYu · 2023-04-16T10:18:43Z

Hello, have you tried the version the author mentioned? How is the performance?

Charizard-007 · 2024-01-26T09:43:20Z

Hello, have you tried the version the author mentioned? How is the performance?
