Hello, thank you for making your code publicly available! Great work.
This is not a question, but I was confused about the implemented processing. That confusion is already resolved, but let me share what happened to me.
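I know that the README says, "this implementation assumes a sample rate of 16 kHz". On the other hand, the original sampling rate of VoiceBank-DEMAND is 48 kHz, as you know. So we need to resample the audio signals to apply CDiffuSE to this dataset.
Indeed, your scripts resample audio signals in preprocessing (CDiffuSE/src/cdiffuse/preprocess.py, line 37 in e4b069f) and in inference (CDiffuSE/src/cdiffuse/inference.py, line 185 in e4b069f). However, audio signals are not resampled in training (CDiffuSE/src/cdiffuse/dataset.py, lines 56 to 57 in e4b069f).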
To reproduce your experimental results, I read the README of this repository and used this script as it is. Then, 48 kHz audio signals are loaded in dataset.py and given to the diffusion model as they are during training. When I checked an audio signal saved by CDiffuSE/src/cdiffuse/learner.py (line 172 in e4b069f), it sounded as if it were being played slowly. This is because a 48 kHz audio signal is saved as a 16 kHz signal.
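To make the slowdown concrete, here is a small sketch of the arithmetic. The helper names are only for illustration and are not from the repository. The number of samples stays the same, but the rate written to the file header changes, so the playback time stretches by the ratio of the two rates.

```python
def playback_duration(num_samples, header_rate):
    """Seconds a player needs to play num_samples at the rate stored in the header."""
    return num_samples / header_rate


def slowdown_factor(true_rate, header_rate):
    """How many times slower the audio plays when the header rate is too low."""
    return true_rate / header_rate


# Two seconds of audio recorded at 48 kHz.
num_samples = 48000 * 2

print(playback_duration(num_samples, 48000))  # 2.0 (correct header)
print(playback_duration(num_samples, 16000))  # 6.0 (saved as 16 kHz)
print(slowdown_factor(48000, 16000))          # 3.0
```

So a 48 kHz clip saved as 16 kHz plays three times slower, and its pitch drops by the same factor.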
It might be better either to resample audio signals in dataset.py as well, or to note explicitly that users need to resample the audio signals themselves in advance. That would be clearer, at least to me.
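Both options could look roughly like the sketch below. This is not the repository's code: `TARGET_SR`, `wav_sample_rate`, `require_target_rate`, and `load_resampled` are hypothetical names, and I am assuming the inputs are PCM WAV files and that `torchaudio` is installed for the resampling path.

```python
import wave

TARGET_SR = 16000  # the rate the README says this implementation assumes


def wav_sample_rate(path):
    """Read the sample rate from a PCM WAV header using only the standard library."""
    with wave.open(path, "rb") as f:
        return f.getframerate()


def require_target_rate(path, target_sr=TARGET_SR):
    """Hypothetical guard: fail loudly instead of training on mismatched audio."""
    sr = wav_sample_rate(path)
    if sr != target_sr:
        raise ValueError(f"{path}: expected {target_sr} Hz, got {sr} Hz; resample first")
    return sr


def load_resampled(path, target_sr=TARGET_SR):
    """Hypothetical loader: resample on the fly so training always sees target_sr."""
    import torchaudio  # imported lazily; only needed when this loader is used

    signal, sr = torchaudio.load(path)
    if sr != target_sr:
        signal = torchaudio.functional.resample(signal, orig_freq=sr, new_freq=target_sr)
    return signal, target_sr
```

The guard matches the "tell users to resample in advance" option, and the loader matches the "resample in dataset.py" option. Either one would have caught the 48 kHz files before they reached the model.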
Sorry for the long post. Best regards.
Hi @yahshibu, thank you so much for clearing this up!!
I had been working with the 16 kHz version for a while, and I didn't notice that the original data at the VoiceBank-DEMAND link is 48 kHz. This has caused a lot of problems when users tried to reproduce the results. I will add the resampling process to dataset.py as you suggested. Thanks again for your help!!!