Feature extractor #1

Open
Rasipuram opened this issue May 22, 2019 · 7 comments

Comments

@Rasipuram

Hi,

Could you please point me to how to use this repository for audio feature extraction?
Does it give a fixed-length representation for a variable-length audio signal?

@hrbigelow
Owner

hrbigelow commented May 22, 2019 via email

@Rasipuram
Author

Thank you for getting back.
I am eager to use this repository for my work. It sounds very interesting!!

I work in the area of Human Behavior Analysis and am exploring options for extracting features from ML models. It would be great if you could point me to any such repositories.

@hrbigelow
Owner

hrbigelow commented May 23, 2019 via email

@Rasipuram
Author

Thank you for getting back.

@J22Melody

Hi there,

I am also looking for an auto-encoder for wav files, and I found this project and this issue.

As said about three years ago, the model was not yet ready. Have you gotten it working since then? ;)

For me, I have a couple of wav files of a simple word (such as the word "three"), and I want to see how well an auto-encoder can encode the waveform to an embedding space then reconstruct the waveform. Further, I might do some more interpolation on the embedding space to make something interesting happen.
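The interpolation idea described above can be sketched as follows. This is only a minimal illustration, assuming each wav file has already been encoded to a fixed-size embedding vector; the function name `interpolate` and the list-of-floats representation are hypothetical and not part of this repository. Each interpolated vector would then be passed to a decoder to produce a waveform.

```python
def interpolate(z_a, z_b, num_steps):
    """Linearly interpolate between two embedding vectors.

    z_a, z_b: equal-length lists of floats (hypothetical embeddings of two recordings).
    num_steps: number of points to return, including both endpoints (must be >= 2).
    """
    if len(z_a) != len(z_b):
        raise ValueError("embeddings must have the same dimensionality")
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    path = []
    for i in range(num_steps):
        t = i / (num_steps - 1)  # t runs from 0.0 (z_a) to 1.0 (z_b)
        path.append([(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)])
    return path
```

Linear interpolation is the simplest choice; whether the decoded midpoints sound meaningful depends on how smooth the learned embedding space is.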

I have also found https://magenta.tensorflow.org/nsynth which perfectly does this job, but their model is mainly designed for music instead of human voice, and it's hard to fine-tune their models.

I think the main idea is very similar (correct me if I am wrong), so I would be very glad to also try out this project to see the reconstruction quality, if I can make it work at all.

Best,
Zifan

@hrbigelow
Owner

Hi Zifan,

I'm sorry I cannot be of more help here, but I never did succeed in training this model. I tried training it for 10 days on a TPU (all 8 cores) on Google Colab, and it didn't converge. I then tried training a simpler model without the vector quantization, just trying to invert the MFCC encoding. That did work a bit, but it was so slow to train that I could only train it to completion on 10% of the LibriSpeech dataset. I believe the reason this model is so slow to train is that the decoder is autoregressive, and thus sequential, and has to run for a very large number of timesteps.
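To illustrate why an autoregressive decoder is inherently sequential, here is a toy sketch. It is not the decoder from this repository: the update rule, the `receptive_field` parameter, and the function names are made up for illustration. The 16 kHz sample rate matches LibriSpeech audio, but whether the model runs at that rate is an assumption.

```python
import math

SAMPLE_RATE = 16_000  # LibriSpeech is 16 kHz; the model's actual rate is assumed here


def toy_autoregressive_decode(conditioning, receptive_field=4):
    """Generate one output sample per conditioning value.

    Each sample depends on previously generated samples, so step t cannot
    start until step t-1 has finished: the loop cannot be parallelized.
    """
    samples = []
    steps = 0
    for c in conditioning:
        context = samples[-receptive_field:]  # most recent outputs only
        prev = sum(context) / len(context) if context else 0.0
        samples.append(math.tanh(0.5 * prev + c))  # stand-in for a network call
        steps += 1
    return samples, steps


def sequential_steps(duration_seconds, sample_rate=SAMPLE_RATE):
    """Number of strictly sequential decoder steps needed for a clip."""
    return int(duration_seconds * sample_rate)
```

Under these assumptions, one second of audio needs `sequential_steps(1.0)` = 16000 dependent steps at generation time, which is the kind of cost a non-autoregressive decoder avoids.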

I haven't done much work on this subject since then. I remember seeing this repo which might be useful as a component for a much faster decoder for MFCC -> Wave generation. This could then be used in an autoencoder like the Jan Chorowski one in this repo.

@J22Melody

Thank you very much for the information ;)

3 participants