Thesis Story #10

Open
anthonio9 opened this issue Mar 22, 2024 · 10 comments

Comments

@anthonio9
Owner

No description provided.

@anthonio9
Owner Author

anthonio9 commented Mar 22, 2024

Thesis story wrap-up:

  • should contain a comparison to PENN on polyphonic audio to show that it does not work at all
  • show that PENN with split hexaphonic input works quite well
  • make a common metric for FretNet and PPN and hope for the best
  • make a strong conclusion about the string confusion problem with PPN, compare the standard metrics with the Multi-Pitch specific string agnostic metrics (close the generalization gap). The monophonic audio example is great for showing the confusion.
  • rethink the visualizations
    • consider making the lines solid rather than dotted and making one of them thicker; the green and yellow color scheme for the logits isn't the best, so maybe try grayscale.

@anthonio9
Owner Author

FretNet has multiple evaluator classes, such as PitchListEvaluator and MultipitchEvaluator. Each class has an unpack() method and an evaluate() method. The idea now is to derive from the PitchListEvaluator class, reuse its unpack() method, and write a completely different evaluate().
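A minimal sketch of that subclassing pattern. The `PitchListEvaluator` below is a hypothetical stand-in written only for this example, not the real FretNet class, so the real `unpack()` and `evaluate()` signatures may differ. The `pitch_list` key, the flat list of Hz values, and the metric itself are also assumptions.

```python
class PitchListEvaluator:
    """Hypothetical stand-in for the base evaluator (not the real class)."""

    key = "pitch_list"  # assumed name of the entry to evaluate

    def unpack(self, data):
        # Pull the entry this evaluator is responsible for out of a dict.
        return data[self.key]

    def evaluate(self, estimated, reference):
        # Placeholder for the original metric computation.
        raise NotImplementedError


class ContinuousPitchEvaluator(PitchListEvaluator):
    """Hypothetical subclass: inherits unpack(), replaces evaluate()."""

    def evaluate(self, estimated, reference):
        est = self.unpack(estimated)
        ref = self.unpack(reference)
        # Illustrative metric only: mean absolute difference in Hz over
        # frames where both reference and estimate are voiced (> 0).
        pairs = [(e, r) for e, r in zip(est, ref) if e > 0 and r > 0]
        if not pairs:
            return {"mean_abs_error_hz": float("nan")}
        error = sum(abs(e - r) for e, r in pairs) / len(pairs)
        return {"mean_abs_error_hz": error}
```

The point of the pattern is that the data plumbing stays in one place (`unpack()`), while only the metric logic changes in the subclass.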

anthonio9 added a commit to anthonio9/guitar-transcription-continuous that referenced this issue Mar 27, 2024
anthonio9 added a commit to anthonio9/guitar-transcription-continuous that referenced this issue Apr 6, 2024
...with the dataset stored in ../Datasets and cache in ../generated

related to: anthonio9/penn#10
@anthonio9
Owner Author

Why do the reference timestamps differ so much from the estimated ones, but only in the pitch_list?

Image

@anthonio9
Owner Author

As for the previous question, it seems that only every 4th timestamp is present in the predicted set. This means that FretNet is only made to handle larger buffers, and its latency is larger than what PPN was designed for.
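One way to compare the two sets despite the different frame rates is to keep only the reference frames whose timestamps also occur in the prediction. The sketch below assumes sorted timestamps; the function name `align_reference` and the 10 ms / 40 ms hop sizes are made up for illustration and are not taken from FretNet or PPN.

```python
def align_reference(ref_times, ref_pitches, est_times, tol=1e-6):
    """Keep only reference frames whose timestamp also occurs in est_times.

    Both time lists are assumed to be sorted in ascending order.
    """
    kept_times, kept_pitches = [], []
    j = 0
    for t, p in zip(ref_times, ref_pitches):
        # Advance through the estimated timestamps until we reach t.
        while j < len(est_times) and est_times[j] < t - tol:
            j += 1
        if j < len(est_times) and abs(est_times[j] - t) <= tol:
            kept_times.append(t)
            kept_pitches.append(p)
    return kept_times, kept_pitches


# Toy data: the reference has a frame every 10 ms, the estimate every 40 ms.
ref_times = [i * 0.01 for i in range(12)]
ref_pitches = [100.0 + i for i in range(12)]
est_times = [i * 0.04 for i in range(3)]

times, pitches = align_reference(ref_times, ref_pitches, est_times)
```

With this toy data only every 4th reference frame survives, which mirrors the mismatch described above.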

@anthonio9
Owner Author

Thesis story wrap-up:

* [ ]  should contain a comparison to PENN on polyphonic audio to show that it does not work at all

How should this be presented?

Plot of an example track with ground truth and predicted pitch, both over a spectrogram - This should be pretty good.
Metrics such as FRMSE and FRPA would also be a great addition.

@anthonio9
Owner Author

anthonio9 commented Apr 26, 2024

Thesis Layout

Thesis should be 30-40-50 pages.

  • including pictures

Introduction:

  • General motivation
  • Something about the results
  • Introduction usually does not get into model details

Section 2:

  • go over theory: CNN - methods section
  • usually this is split into the background section and proposed method section
  • the background section is more detailed than in papers
  • background: explain the model of the original paper: PENN and FretNet
  • simply discover the problem with the GuitarSet dataset

Section 3: Proposed method

  • How is the model adapted to the polyphonic tracking
  • Explain the RMSE / RPA - accuracy metrics, the string agnostic metrics
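As a starting point for explaining the metrics, here is a rough sketch of RMSE and raw pitch accuracy (RPA) measured in cents. The 50-cent threshold is the common convention for RPA; the handling of unvoiced frames and the function names are assumptions of this sketch, not necessarily what PENN or FretNet do.

```python
import math


def cents_error(est_hz, ref_hz):
    """Signed pitch difference in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(est_hz / ref_hz)


def rmse_and_rpa(est_hz, ref_hz, threshold_cents=50.0):
    """RMSE (in cents) and raw pitch accuracy over voiced reference frames.

    Frames where the reference is unvoiced (<= 0 Hz) are skipped. An
    unvoiced estimate on a voiced frame counts as a miss for RPA and is
    left out of the RMSE (an assumption of this sketch).
    """
    errors, hits, voiced = [], 0, 0
    for e, r in zip(est_hz, ref_hz):
        if r <= 0:
            continue
        voiced += 1
        if e <= 0:
            continue
        err = cents_error(e, r)
        errors.append(err)
        if abs(err) < threshold_cents:
            hits += 1
    if errors:
        rmse = math.sqrt(sum(x * x for x in errors) / len(errors))
    else:
        rmse = float("nan")
    rpa = hits / voiced if voiced else float("nan")
    return rmse, rpa
```

For example, an estimate that is one octave too high on one of three voiced frames gives an RPA of 2/3 and an RMSE dominated by the single 1200-cent error, which shows why the two numbers are worth reporting together.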

Section 4: experiments

  • dataset explanation / description
  • tell more about the problem with the GuitarSet dataset

Section 5: results and analysis

  • Conclusion
  • Fully convolutional models or transformers
  • New dataset with better ground truth is needed! [new dataset](https://arxiv.org/abs/2309.09085)

@anthonio9
Owner Author

For the next meeting: the table of contents + anything extra is nice.

@anthonio9
Owner Author

Main results table:
String Agnostic RMSE, String Agnostic RPA
Non-String Agnostic RMSE, Non-String Agnostic RPA

if possible, copy String-Agnostic Note from FretNet
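To make the difference between the two rows of the table concrete, here is a small sketch of a string-aware and a string-agnostic RPA. Each frame is assumed to be a list of six frequencies in Hz (one per string, 0 meaning silent). The greedy nearest-pitch matching is an assumption chosen for brevity and is not FretNet's string-agnostic definition.

```python
import math


def within(est_hz, ref_hz, threshold_cents=50.0):
    """True if two positive frequencies are closer than the threshold."""
    return abs(1200.0 * math.log2(est_hz / ref_hz)) < threshold_cents


def string_aware_rpa(est_frames, ref_frames):
    """Compare string i of the estimate only with string i of the reference."""
    hits = total = 0
    for est, ref in zip(est_frames, ref_frames):
        for e, r in zip(est, ref):
            if r > 0:
                total += 1
                hits += int(e > 0 and within(e, r))
    return hits / total if total else float("nan")


def string_agnostic_rpa(est_frames, ref_frames):
    """Pool the pitches of a frame and ignore which string produced them."""
    hits = total = 0
    for est, ref in zip(est_frames, ref_frames):
        unused = [e for e in est if e > 0]
        for r in (r for r in ref if r > 0):
            total += 1
            # Greedy matching: take the closest estimate not used so far.
            best = min(unused, key=lambda e: abs(math.log2(e / r)), default=None)
            if best is not None and within(best, r):
                hits += 1
                unused.remove(best)
    return hits / total if total else float("nan")


# One frame, six strings: a 110 Hz pitch is annotated at index 1,
# but the estimate places the same pitch at index 0.
ref_frames = [[0.0, 110.0, 0.0, 0.0, 0.0, 0.0]]
est_frames = [[110.0, 0.0, 0.0, 0.0, 0.0, 0.0]]
aware = string_aware_rpa(est_frames, ref_frames)
agnostic = string_agnostic_rpa(est_frames, ref_frames)
```

In this toy frame the pitch is right but the string is wrong, so the string-aware score is 0 while the string-agnostic score is 1. That gap is the string confusion the table is meant to expose.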

@anthonio9
Owner Author

  • methods and background section: START IT!

@anthonio9
Owner Author

anthonio9 commented Jul 5, 2024

  • explain the math behind everything that you use: Feed Forward NNs, CNNs, CQT. You don't have to talk about anything more, like transformers etc.
  • background: Explain the architecture of MLP, CNNs
  • Background: Explain how classification with NNs works: what softmax, ReLU, binary cross entropy (piano roll model) and categorical cross entropy (one-hot model) are
  • Periodicity and entropy
  • Spend some time on the understanding, calculation and description of the receptive field
