Resampling in training #4

yahshibu · 2022-08-25T06:47:37Z

Hello, thank you for making your code publicly available! Great work.

This is not a question. I was confused by how the code handles sample rates. I have since resolved that confusion, but let me share what happened.

I know that the README says, "this implementation assumes a sample rate of 16 kHz". However, the original sampling rate of VoiceBank-DEMAND is 48 kHz, as you know. So we need to resample the audio signals to apply CDiffuSE to this dataset.

Indeed, your scripts resample the audio signals during preprocessing and inference.

For preprocessing:

CDiffuSE/src/cdiffuse/preprocess.py

Line 37 in e4b069f

y, sr = librosa.load(filename, sr=16000)

For inference:

CDiffuSE/src/cdiffuse/inference.py

Line 185 in e4b069f

noisy_signal, _ = librosa.load(os.path.join(args.wav_path,spec.split("/")[-1].replace(".spec.npy","")),sr=16000)

However, the audio signals are not resampled during training:

CDiffuSE/src/cdiffuse/dataset.py

Lines 56 to 57 in e4b069f

signal, _ = torchaudio.load(audio_filename)
noisy_signal, _ = torchaudio.load(noisy_filename)

To reproduce your experimental results, I followed the README of this repository and used the code as it is. As a result, 48 kHz audio signals were loaded in dataset.py and fed to the diffusion model unchanged during training. When I checked an audio signal that is logged here

CDiffuSE/src/cdiffuse/learner.py

Line 172 in e4b069f

writer.add_audio('feature/audio', features['audio'][0], step, sample_rate=self.params.sample_rate)

, I found that the signal seems to be played back slowly. This is because a 48 kHz audio signal is saved as if it were a 16 kHz signal.
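To make the slowdown concrete, here is a small arithmetic sketch. The two-second clip length and the helper name `playback_duration` are assumptions chosen only for illustration. The two rates are the ones discussed above.

```python
def playback_duration(num_samples: int, declared_rate: int) -> float:
    """Seconds a player needs to play num_samples at the declared sample rate."""
    return num_samples / declared_rate


true_rate = 48_000      # original VoiceBank-DEMAND sampling rate (Hz)
declared_rate = 16_000  # rate assumed by the implementation when logging audio (Hz)

# Hypothetical clip: two seconds of audio recorded at the true rate.
num_samples = 2 * true_rate

actual_seconds = playback_duration(num_samples, true_rate)
played_seconds = playback_duration(num_samples, declared_rate)
slowdown = played_seconds / actual_seconds

print(f"actual: {actual_seconds} s, played: {played_seconds} s, slowdown: {slowdown}x")
```

Under these assumptions the same samples take three times as long to play, which would match the slowed-down audio described above.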

It might be better either to resample the audio signals in dataset.py as well, or to note explicitly that users need to resample the audio signals themselves in advance. Either would be clearer, at least to me.
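As a rough illustration of the second option, a check like the following could catch a mismatched file before training starts. This is only a sketch, not code from the repository: the function name `check_sample_rate` and the constant `EXPECTED_RATE` are hypothetical, and it assumes plain PCM WAV files that Python's standard `wave` module can read.

```python
import wave

EXPECTED_RATE = 16_000  # rate the README says the implementation assumes (Hz)


def check_sample_rate(path: str, expected_rate: int = EXPECTED_RATE) -> int:
    """Return the WAV file's sample rate, raising if it is not expected_rate.

    Hypothetical helper: only the header is read, the audio is not resampled.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
    if rate != expected_rate:
        raise ValueError(
            f"{path}: sample rate is {rate} Hz, expected {expected_rate} Hz; "
            "resample the file before training"
        )
    return rate
```

Calling this on each training file would turn the silent mismatch into an explicit error. For the first option, the sample rate that `torchaudio.load` returns (currently discarded as `_` in dataset.py) could instead be passed to something like `torchaudio.functional.resample`. That variant is not shown or tested here.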

Sorry for the long post. Best regards.

neillu23 · 2022-08-26T14:00:32Z

Hi @yahshibu, thank you so much for clearing this up!!
I had been working with a 16 kHz version for a while and didn't notice that the original data at the VoiceBank-DEMAND link is 48 kHz. This caused a lot of problems when users tried to reproduce the results. I will add resampling to dataset.py as you suggested. Thanks again for your help!!!
