
Update exercise for 2024 with feedback from 2023 #15

Merged (53 commits) on Aug 19, 2024
Conversation

@afoix (Contributor) commented Jul 25, 2024

This pull request includes:

@afoix afoix requested review from cmalinmayor and adjavon July 25, 2024 09:18
@cmalinmayor (Contributor)

@afoix How long did it take to train the denoising model for you? On my M3 Mac (no GPU, but usually quite speedy with PyTorch) it's over half an hour. We should test on the actual hardware, because if it is that slow, we need to make it faster.

@cmalinmayor (Contributor) left a comment

I still want to test the denoising part on the hardware to see how long it takes. I don't remember it being slow last year? Perhaps @adjavon has a better memory of this than I do. @adjavon, I'm also curious to get your opinion on the new denoising tasks; at the very least we should explain them a bit more.


# ### Train the denoiser on both MNIST and FashionMNIST
#
# In this section, we will perform the denoiser training once again, but this time on both MNIST and FashionMNIST datasets, and then try to apply the newly trained denoiser to a set of noisy test images.
@cmalinmayor (Contributor)

I think we need to think a bit more about what the point of this part is. Do we actually want them to train sequentially and then shuffled? I think it would be fine to just train shuffled and show that it works and is the correct way to get good(ish?) performance on both. If we want to include a sequential example, I think it would make more sense as essentially a fine-tuning procedure, where you fully train on MNIST and then on FashionMNIST, which would show how fine-tuned networks forget their initial task. The way it is implemented now, both datasets are included in both epochs, so it is not really sequential or shuffled, and I find it confusing what exactly we are trying to teach them. Also, if the denoiser takes longer than 10 minutes to train, we should definitely skip the sequential part.
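For reference, the shuffled joint-training setup being suggested boils down to the following. This is a minimal sketch: random tensors stand in for the real MNIST/FashionMNIST datasets (which would come from torchvision with a ToTensor transform), and all sizes are illustrative.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

# Stand-ins for the two denoising datasets; the real ones would be
# torchvision.datasets.MNIST / FashionMNIST (noisy input, clean target).
mnist = TensorDataset(torch.randn(100, 1, 28, 28), torch.randn(100, 1, 28, 28))
fashion = TensorDataset(torch.randn(100, 1, 28, 28), torch.randn(100, 1, 28, 28))

# ConcatDataset joins them end to end; shuffle=True then interleaves samples
# from both domains within every epoch, so the denoiser sees a mixed stream
# rather than one dataset after the other.
combined = ConcatDataset([mnist, fashion])
loader = DataLoader(combined, batch_size=32, shuffle=True)

print(len(combined))  # 200
```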

@afoix (Contributor, Author)

Hey @cmalinmayor, the denoiser definitely takes less time to train, at least on my machine with a GPU 😄 However, I am happy to skip the sequential part if you both prefer it to be removed. Let me know! :)

@adjavon (Contributor)

I added a comment below, but long story short: on the AWS hardware it is just fast enough to not be too annoying. I like having both because I like the lesson I learn from it (networks forget). I can have a go at making that a bit more on-the-nose in the text during my typos-and-things PR this afternoon :)
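The sequential variant being kept is essentially two back-to-back training runs on the same model, which is what produces the "networks forget" effect. A minimal sketch with toy tensors standing in for the noisy/clean image pairs (model size, noise level, and loader shapes are all illustrative):

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

def toy_denoising_loader(seed):
    # Toy (noisy, clean) pairs; real data would be MNIST/FashionMNIST images.
    g = torch.Generator().manual_seed(seed)
    clean = torch.randn(64, 784, generator=g)
    noisy = clean + 0.3 * torch.randn(64, 784, generator=g)
    return DataLoader(TensorDataset(noisy, clean), batch_size=16)

def train(model, loader, epochs=1):
    # A plain training loop; calling it once per dataset is what makes the
    # procedure sequential (fine-tuning) rather than shuffled joint training.
    opt = optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for noisy, clean in loader:
            opt.zero_grad()
            loss_fn(model(noisy), clean).backward()
            opt.step()

model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 784))
train(model, toy_denoising_loader(0))  # "pre-train" on task A (MNIST stand-in)
train(model, toy_denoising_loader(1))  # then fine-tune on task B; task-A
                                       # performance typically degrades
```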

@afoix (Contributor, Author) commented Aug 19, 2024

@afoix How long did it take to train the denoising model for you? On my M3 Mac (no GPU, but usually quite speedy with PyTorch) it's over half an hour. We should test on the actual hardware, because if it is that slow, we need to make it faster.

Hello @cmalinmayor, on my machine, using a GPU, each epoch takes only 25 seconds, and it's 5 epochs, so it's actually quite fast 😄

@adjavon (Contributor) commented Aug 19, 2024

I still want to test the denoising part on the hardware to see how long it takes. I don't remember it being slow last year? Perhaps @adjavon has a better memory of this than I do. @adjavon, I'm also curious to get your opinion on the new denoising tasks; at the very least we should explain them a bit more.

I'm testing the whole exercise today on the TA machine so we can see how long it takes.

@adjavon (Contributor) commented Aug 19, 2024

Notes while running the exercise on the TA machine:

  1. Something seems wrong with the normalization of the confusion matrices when I run this; it tops out at 30, so the colors don't really match the numbers. For instance, in the example below, the cells on row 4 with values 45, 17, and 34 are colored the same way as the diagonal cells with values >90.
     [Screenshot 2024-08-19 at 12 08 35 PM]
  2. Generally, the tqdm progress bars disappear after each epoch is done and turn into an odd recap (see first line). Not sure if this is on purpose, as I've never seen it before, but ignore it if it is!
     [Screenshot 2024-08-19 at 12 16 50 PM]
  3. The first denoising network takes about 40 s per epoch to train on the TA machine, just the right amount of time for me to go make a coffee; perfectly reasonable :)
  4. The second denoising network (on the concatenated dataset) takes 1 min 30 s per epoch on the TA machine; on the longer side, but still generally reasonable.
  5. Same for the third denoising network (shuffled combined). I am not against keeping it, as it does provide an important learning point (i.e. networks forget).
  6. A suggestion: it would be nice to have a little "Conclusion" at the final checkpoint about key takeaways.

I noticed a couple of typos and trivial things that I can fix, so I'll make a PR this afternoon for those :)
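On the disappearing progress bars: this looks like tqdm's leave=False behavior, under which each finished bar is cleared instead of staying on screen. A quick sketch to check (assuming tqdm is installed; leave=True is the default and keeps every finished bar visible):

```python
from tqdm.auto import tqdm  # tqdm.auto picks a notebook or console bar

done = 0
for epoch in range(2):
    # leave=False clears each epoch's bar once it completes, which can look
    # like the bar "disappears" into a one-line recap; leave=True keeps it.
    for _ in tqdm(range(100), desc=f"epoch {epoch}", leave=False):
        done += 1
```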

@cmalinmayor (Contributor)

Something seems wrong with the normalization of the confusion matrices when I run this; it tops out at 30, so the colors don't really match the numbers. For instance, in the example below, the cells on row 4 with values 45, 17, and 34 are colored the same way as the diagonal cells with values >90.

It seems like it is coloring based on absolute numbers, not percentages. Weird!! We should fix it.

@adjavon (Contributor) commented Aug 19, 2024

Something seems wrong with the normalization of the confusion matrices when I run this; it tops out at 30, so the colors don't really match the numbers. For instance, in the example below, the cells on row 4 with values 45, 17, and 34 are colored the same way as the diagonal cells with values >90.

It seems like it is coloring based on absolute numbers, not percentages. Weird!! We should fix it.

Will try to fix in my PR. Done.
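For the fix, one way to make the colormap match the printed percentages is to row-normalize the counts before plotting. A sketch in plain NumPy (scikit-learn's confusion_matrix also accepts normalize='true' to do the same per-row normalization):

```python
import numpy as np

def row_normalize(cm):
    # Convert raw counts to per-class percentages so the colormap reflects
    # per-row recall rather than absolute counts (which depend on class size).
    row_sums = cm.sum(axis=1, keepdims=True)
    return 100.0 * cm / np.maximum(row_sums, 1)

# Counts like the ones in the screenshot: rows with different totals make
# absolute-count coloring misleading.
cm = np.array([[90, 10,   0],
               [45, 17,  34],
               [ 5,  5, 190]])
norm = row_normalize(cm)
print(norm.sum(axis=1))  # each row now sums to 100
```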

@afoix (Contributor, Author) commented Aug 19, 2024

@adjavon @cmalinmayor thank you for your help! I am merging this now into main, ready for the course 🥳

@afoix afoix merged commit c4f083b into main Aug 19, 2024