Resampling in training #4

Open
yahshibu opened this issue Aug 25, 2022 · 1 comment

Comments

@yahshibu

Hello, thank you for making your code publicly available! Great work.

This is not a question. I was confused by part of the implemented processing, and although I have since resolved that confusion, let me share what happened.

I know that the README says, "this implementation assumes a sample rate of 16 kHz". On the other hand, the original sampling rate of VoiceBank-DEMAND is 48 kHz, as you know. So we need to resample the audio signals in order to apply CDiffuSE to this dataset.

Indeed, your scripts resample audio signals in preprocessing and inference.

For preprocessing:

y, sr = librosa.load(filename, sr=16000)

For inference:

noisy_signal, _ = librosa.load(os.path.join(args.wav_path,spec.split("/")[-1].replace(".spec.npy","")),sr=16000)

However, audio signals are not resampled in training:

signal, _ = torchaudio.load(audio_filename)
noisy_signal, _ = torchaudio.load(noisy_filename)

To reproduce your experimental results, I followed the README of this repository and used this script as is. As a result, 48 kHz audio signals were loaded in dataset.py and fed to the diffusion model unchanged during training. When I checked an audio signal that is saved here

writer.add_audio('feature/audio', features['audio'][0], step, sample_rate=self.params.sample_rate)

, I found that it played back slowly. This is because a 48 kHz audio signal is saved as a 16 kHz signal.
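
The slowdown follows from simple arithmetic: playback duration is the number of samples divided by the sample rate the player assumes. Here is a minimal sketch of that calculation. `playback_duration` is a hypothetical helper (not part of the repository), the 2-second clip length is an arbitrary illustration, and 16 kHz is assumed to be the value of `params.sample_rate`, based on the README statement.

```python
def playback_duration(num_samples: int, assumed_rate_hz: int) -> float:
    """Seconds of playback when num_samples are played at assumed_rate_hz."""
    return num_samples / assumed_rate_hz


true_rate_hz = 48_000     # original VoiceBank-DEMAND sampling rate
assumed_rate_hz = 16_000  # assumption: the rate the implementation expects

# Hypothetical clip: 2 seconds of audio recorded at 48 kHz.
num_samples = 2 * true_rate_hz

real = playback_duration(num_samples, true_rate_hz)       # correct duration
logged = playback_duration(num_samples, assumed_rate_hz)  # duration when mislabeled
slowdown = logged / real                                  # stretch factor

print(real, logged, slowdown)
```

Under these assumptions the mislabeled clip lasts three times as long as the original, which matches the slow playback I heard.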

It might be better either to resample the audio signals in dataset.py as well, or to note explicitly that users need to resample the audio themselves in advance. Either would be clearer, at least to me.

Sorry for the long post. Best regards.

@neillu23
Owner

Hi @yahshibu, thank you so much for clearing this up!
I had been working with the 16 kHz version for a while and did not notice that the original data at the VoiceBank-DEMAND link is 48 kHz. This caused a lot of problems when users tried to reproduce the results. I will add the resampling step to dataset.py as you suggested. Thanks again for your help!
