Try to reproduce but some issues occur #3

Open · yizhidamiaomiao opened this issue Jul 13, 2022 · 7 comments

@yizhidamiaomiao commented Jul 13, 2022

I ran the command "./train.sh 0 se model_se".

The error is:
"""""""""""""""""""""""""""""""""
Preprocessing: 0%| | 0/11572 [00:00<?, ?it/s]
Traceback (most recent call last):
File "src/cdiffuse/preprocess.py", line 140, in
main(parser.parse_args())
File "src/cdiffuse/preprocess.py", line 120, in main
list(tqdm(executor.map(spec_transform, filenames, repeat(args.dir), repeat(args.outdir)), desc='Preprocessing', total=len(filenames)))
File "/home/tiger/.local/lib/python3.7/site-packages/tqdm/std.py", line 1195, in iter
for obj in iterable:
File "/usr/lib/python3.7/concurrent/futures/process.py", line 476, in _chain_from_iterable_of_lists
for element in iterable:
File "/usr/lib/python3.7/concurrent/futures/_base.py", line 586, in result_iterator
yield fs.pop().result()
File "/usr/lib/python3.7/concurrent/futures/_base.py", line 432, in result
return self.__get_result()
File "/usr/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
"""""""""""""""""""""""""""""""""
How can I solve this?
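(As a side note: BrokenProcessPool usually masks the worker's real exception, which is often an out-of-memory kill or an error raised inside the mapped function. A minimal debugging sketch, assuming spec_transform takes the (filename, dir, outdir) arguments implied by the traceback above, is to bypass the process pool and run the map serially so the underlying error surfaces with a full traceback:)

```python
# Debugging sketch (hypothetical helper, not part of the repository):
# run the preprocessing map serially so the real exception is raised in the
# main process instead of being hidden behind BrokenProcessPool.
from tqdm import tqdm

def preprocess_serial(spec_transform, filenames, in_dir, out_dir):
    for filename in tqdm(filenames, desc='Preprocessing (serial)'):
        # Same call the executor would make, per the traceback above.
        spec_transform(filename, in_dir, out_dir)
```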

Although "se_pre" mode can run, with the dataset provided by your link, I MUST change the sample_rate to 48000 in params.py, otherwise this code will throw a wrong information. Does this correct for the reproduce?

Also, I have trained the "se_pre" mode for 12 hours on 4 GPUs and am at step 156600. How long (how many epochs) do we need to train your model?

yizhidamiaomiao changed the title from "Try to reproduce but issue happens for se mode" to "Try to reproduce but some issues occur" on Jul 13, 2022
@neillu23 (Owner)

Hi @yizhidamiaomiao, thanks for sharing your experience!
I've replaced torchaudio.load_wav() with the torchaudio.load() function in the new commit. This may fix some errors, as torchaudio.load_wav has been removed in newer versions of torchaudio.
For the second question, can you share a link to the data, and is the sample rate of the data you are using 48000?
Also, the "se_pre" step is no longer needed, as the randomly initialized CDiffuSE performs as well as the one initialized from pre-trained parameters. The model with step 507600 (no pre-training) in my experiments slightly exceeded our paper's results.
Please try the new code and let me know if you have any further questions!
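(For reference, the load-call change is roughly the following. Note that torchaudio.load returns a float waveform normalized to [-1, 1] plus the sample rate, while the removed torchaudio.load_wav returned int16-range values, so any downstream scaling may need adjusting; this is a sketch, not the exact repository code.)

```python
import torchaudio

path = "example.wav"  # hypothetical file

# Old API (removed in recent torchaudio versions), returned int16-range values:
# audio, sr = torchaudio.load_wav(path)

# New API: float tensor normalized to [-1, 1], shape (channels, num_samples).
audio, sr = torchaudio.load(path)
```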

@yizhidamiaomiao (Author)

Hi, thank you for your response!

Following your instructions, I am no longer training "se_pre". I trained your model directly with the command "./train.sh 0 se model_se" and evaluated at step 600075 with "./inference.sh 0 600075 se model_se".

However, the training result seems different from your 'Sample Files' folder. Here is a link to the speech generated by the trained model 'weights-600075.pt': https://drive.google.com/drive/folders/1aK0zzC1wDToWIAoEq2dNSsdwSo9n9rWd?usp=sharing. It does not seem competitive with the SOTA models. Could you please help us figure out what we should do to reproduce the results in 'Sample Files'?

@neillu23 (Owner)

Hi @yizhidamiaomiao, thanks for sharing the audio file!
The command you are using seems to be from a previous commit. I've updated the command style and torchaudio functions in this commit: 7e13e6e
The new commands would be "./train.sh 0 model_se" and "./inference.sh 0 model_se 600075". Here are the results I got from my trained model 'weights-54000.pt': https://drive.google.com/drive/folders/1EIh-ZwokHcRacv20Umk9MMkld9ETdBSQ?usp=sharing
The environment I used was torchaudio 0.9.0 / PyTorch 1.9.0.
If this doesn't work for you, please let me know; thanks again!

@yizhidamiaomiao (Author)

Thanks for your response!

We downloaded your newest code, trained with the command "./train.sh 0 model_se", and ran inference with "./inference.sh 0 model_se 108000". The trained model is 'weights-108000.pt'. The newest results we got are at https://drive.google.com/drive/folders/1aK0zzC1wDToWIAoEq2dNSsdwSo9n9rWd?usp=sharing, in the files named "*_enhanced_ver 7e13e6e.wav". There still seems to be some noise in the enhanced speech.

Should we wait until step 507600?

The environment I used is torchaudio 0.10.0+cu113 / PyTorch 1.10.0.

I will wait for any further guidance. Thank you for your patience!

@neillu23 (Owner)

Thank you for reporting the updated results!

I think a possible reason is a difference between our training data. You mentioned that the data you used has a 48000 Hz sampling rate, but the data I used is at 16000 Hz. Could you share your training data and model with me so I can check whether your data/model works in my environment?

Thank you again, and sorry for the inconvenience!

@yizhidamiaomiao (Author)

I used the data directly from the link "https://datashare.ed.ac.uk/handle/10283/2791" given in the README.md sentence "The default dataset is VOICEBANK-DEMAND dataset. You can download them from VOICEBANK-DEMAND)". The audio downloaded from that website is actually 48 kHz, and I had to add a torchaudio resample from 48 kHz to 16 kHz in the "transform" function of your preprocess file to train the code.
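(A sketch of that preprocessing-side change, using torchaudio.functional.resample; the helper name and the exact placement inside the transform function are assumptions, and the 48000/16000 rates follow the discussion above.)

```python
import torchaudio
import torchaudio.functional as AF

def load_and_resample(path, target_sr=16000):
    # Load a 48 kHz VOICEBANK-DEMAND wav and resample it to the 16 kHz rate
    # expected by params.py (rates are assumptions from the thread above).
    audio, sr = torchaudio.load(path)
    if sr != target_sr:
        audio = AF.resample(audio, orig_freq=sr, new_freq=target_sr)
    return audio, target_sr
```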

@neillu23 (Owner) commented Aug 1, 2022

The data I'm using is already at a 16 kHz sample rate, which may be different from the data at that link. Could you try adding a 48 kHz to 16 kHz resample for both "signal" and "noisy_signal" in the __getitem__ function of NumpyDataset here? If this works, I will update the description in the README. Sorry again about this issue.
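(A minimal sketch of that suggestion: cache one Resample module and apply it to both waveforms inside NumpyDataset.__getitem__. The variable names follow the comment above; the 48000-to-16000 rates and the helper itself are assumptions, not repository code.)

```python
import torch
import torchaudio

# One cached Resample module (48 kHz -> 16 kHz), reused for every item.
_resample_48k_to_16k = torchaudio.transforms.Resample(orig_freq=48000, new_freq=16000)

def resample_pair(signal: torch.Tensor, noisy_signal: torch.Tensor):
    # Apply the same resampling to the clean and noisy waveforms so they stay aligned.
    return _resample_48k_to_16k(signal), _resample_48k_to_16k(noisy_signal)
```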
