Does mood_happy-msd-musicnn work in real time? #1459

Open
feibuguocanghai opened this issue Jan 17, 2025 · 1 comment

Comments

@feibuguocanghai

Is it possible to use mood_happy-msd-musicnn to make accurate inferences on a 3-second audio clip?

I am using the following code:

from essentia.standard import MonoLoader, TensorflowPredictMusiCNN, TensorflowPredict2D

# Load the file as mono audio resampled to 16 kHz.
audio = MonoLoader(filename="audio.wav", sampleRate=16000, resampleQuality=4)()
# Extract MusiCNN embeddings from the audio.
embedding_model = TensorflowPredictMusiCNN(graphFilename="msd-musicnn-1.pb", output="model/dense/BiasAdd")
embeddings = embedding_model(audio)

# Run the mood_happy classifier on the embeddings.
model = TensorflowPredict2D(graphFilename="mood_happy-msd-musicnn-1.pb", output="model/Softmax")
predictions = model(embeddings)

I read the C++ code of the TensorflowPredictMusiCNN algorithm and found that it expects the input signal to be framed with frameSize=512 and hopSize=256 at sampleRate=16000.

I then found that the input shape required by mood_happy-msd-musicnn-1.pb is batch_size × 187 × 96.
From this I calculated that the minimum audio duration the model needs is about 3 seconds, but I am not sure whether that calculation is correct.
Also, is 3 seconds of audio enough to get an accurate prediction?
Or do I need to average several predictions to make the result more accurate?

@feibuguocanghai changed the title from "how to run BeatTrackerDegara in c++ standard mode?" to "Does mood_happy-msd-musicnn work in real time?" on Jan 17, 2025
@palonso
Contributor

palonso commented Jan 22, 2025

Hi @feibuguocanghai, you are right, this model operates on windows of about 3 seconds.
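
As a rough check of the 3-second figure, the window length can be worked out from the values quoted in the question (frameSize=512, hopSize=256, 187 frames per patch, 16 kHz). This is only a sketch: it assumes frames are cut without any padding, which may not match exactly how Essentia frames the signal, and the function names are made up for illustration.

```python
# Values quoted in the question above.
SAMPLE_RATE = 16000   # Hz
FRAME_SIZE = 512      # samples per analysis frame
HOP_SIZE = 256        # samples between consecutive frame starts
PATCH_FRAMES = 187    # frames per model input patch


def patch_samples(n_frames=PATCH_FRAMES, frame_size=FRAME_SIZE, hop_size=HOP_SIZE):
    """Samples spanned by n_frames overlapping frames (assumes no padding)."""
    return (n_frames - 1) * hop_size + frame_size


def patch_seconds(sample_rate=SAMPLE_RATE):
    """Duration in seconds of one input patch."""
    return patch_samples() / sample_rate


if __name__ == "__main__":
    print(patch_samples())   # 48128 samples
    print(patch_seconds())   # 3.008 seconds
```

Under that assumption one patch spans 48128 samples, which is roughly 3 seconds at 16 kHz and agrees with the estimate in the question.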

Individual estimations based on 3-second windows are expected to be noisy, and to obtain more accurate results you can average predictions over time.

If you want a real-time system, you could go for something like a moving average. You can have a look at our tutorial for real-time usage of the models.
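
A minimal sketch of the moving-average idea, using only the Python standard library. The `MovingAverage` class and its `window` parameter are hypothetical names for illustration and are not part of Essentia; each prediction is assumed to be a sequence of class activations produced for one 3-second patch.

```python
from collections import deque


class MovingAverage:
    """Average the most recent `window` prediction vectors (hypothetical helper)."""

    def __init__(self, window):
        if window < 1:
            raise ValueError("window must be at least 1")
        # deque with maxlen discards the oldest prediction automatically.
        self._values = deque(maxlen=window)

    def update(self, prediction):
        """Add one prediction vector and return the current smoothed vector."""
        self._values.append(list(prediction))
        n = len(self._values)
        # Average each class activation across the stored predictions.
        return [sum(column) / n for column in zip(*self._values)]


if __name__ == "__main__":
    smoother = MovingAverage(window=2)
    for patch_prediction in ([1.0, 0.0], [0.0, 1.0], [0.0, 1.0]):
        print(smoother.update(patch_prediction))
```

In a streaming setup, each new row of `predictions` would be passed to `update()` as it becomes available. A larger `window` gives smoother output at the cost of reacting more slowly to changes in the audio.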
