Update exercise for 2024 with feedback from 2023 #15

Merged
merged 53 commits into from
Aug 19, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
53 commits
Select commit Hold shift + click to select a range
- ad2bc3b  Added answers (afoix, Jul 24, 2024)
- d5ff5ce  more answers (afoix, Jul 24, 2024)
- 4231176  Added README (afoix, Jul 24, 2024)
- baa9fa9  removed axis from images (afoix, Jul 24, 2024)
- 3f19b62  removed axis from images in the exercise (afoix, Jul 24, 2024)
- e8f9ad9  fixed 5.2 -> 5.3 (afoix, Jul 24, 2024)
- bb724b8  added two tasks to train the denoiser on both MNIST and FashionMNIST … (afoix, Jul 25, 2024)
- 8bfef70  Added the 2 new tasks to the exercies notebook (afoix, Jul 25, 2024)
- 8707bb2  removed the now implemented bonus question cell (afoix, Jul 25, 2024)
- 26faa1f  remove colored_mnist and add dlmbl-unet packages (afoix, Aug 17, 2024)
- d0037c2  fix classification package install in setup (afoix, Aug 17, 2024)
- bd21329  switch to vanilla tqdm (afoix, Aug 17, 2024)
- 31210c9  added new dense model and unet (afoix, Aug 17, 2024)
- 5892413  updated the exercise (afoix, Aug 17, 2024)
- f42e779  Added missing import to exercise (afoix, Aug 17, 2024)
- f384125  added missing import to solution (afoix, Aug 17, 2024)
- 76eaa2a  use bilinear upsampling in the unet (afoix, Aug 17, 2024)
- 93b4699  Add github action for building notebooks (afoix, Aug 17, 2024)
- e01ec02  cleared outputs from solution notebook (afoix, Aug 17, 2024)
- b4665d9  Added solution.py (afoix, Aug 17, 2024)
- e4173be  Commit from GitHub Actions (Build Notebooks) (afoix, Aug 17, 2024)
- 8ae095e  dded solution tag to all answers in solution.py (afoix, Aug 17, 2024)
- bfab2c3  Commit from GitHub Actions (Build Notebooks) (afoix, Aug 17, 2024)
- 9431399  Fix section numbering issue (afoix, Aug 17, 2024)
- 1b1713e  Commit from GitHub Actions (Build Notebooks) (afoix, Aug 17, 2024)
- 7377642  Try to fix empty cells (afoix, Aug 17, 2024)
- 3dce87c  Commit from GitHub Actions (Build Notebooks) (afoix, Aug 17, 2024)
- 1fcfd04  Other attempt at fixing solution.py (afoix, Aug 17, 2024)
- a9e9ff5  Commit from GitHub Actions (Build Notebooks) (afoix, Aug 17, 2024)
- 7e69e43  Commit from GitHub Actions (Build Notebooks) (afoix, Aug 17, 2024)
- 3af5978  Update README.md (afoix, Aug 17, 2024)
- 09c6e65  Commit from GitHub Actions (Build Notebooks) (afoix, Aug 17, 2024)
- 8f579c7  Remove jupyter lab from README (cmalinmayor, Aug 18, 2024)
- a73770d  Make the intro in README a little shorter (cmalinmayor, Aug 18, 2024)
- 2f69b54  Use conda in setup.sh (cmalinmayor, Aug 18, 2024)
- ac50c96  Add -y to conda command in setup.sh (cmalinmayor, Aug 18, 2024)
- c201d2a  Add gitignore for data directories (cmalinmayor, Aug 18, 2024)
- e75177f  Add a lot of newlines to stop cells merging together (cmalinmayor, Aug 18, 2024)
- ce04a07  Use percent format to prevent cells merging together (cmalinmayor, Aug 18, 2024)
- ca7f24c  Commit from GitHub Actions (Build Notebooks) (cmalinmayor, Aug 18, 2024)
- 7fd2f36  Merge pull request #16 from dlmbl/cmm_edits (afoix, Aug 19, 2024)
- a3de073  Commit from GitHub Actions (Build Notebooks) (afoix, Aug 19, 2024)
- 8d26f5e  change tqdm.auto to only tqdm because of printing problems in jupyter (afoix, Aug 19, 2024)
- 862c65f  Commit from GitHub Actions (Build Notebooks) (afoix, Aug 19, 2024)
- d2ecece  Fix typos (adjavon, Aug 19, 2024)
- 92eb6b6  Remove references to Element (adjavon, Aug 19, 2024)
- d77a454  Remove hard-coded CUDA (adjavon, Aug 19, 2024)
- 878b60e  Fix confusion matrix visualization (adjavon, Aug 19, 2024)
- 41e70b4  Commit from GitHub Actions (Build Notebooks) (adjavon, Aug 19, 2024)
- 98cd0a0  Add description of the model forgetting (adjavon, Aug 19, 2024)
- 65adfb3  Commit from GitHub Actions (Build Notebooks) (adjavon, Aug 19, 2024)
- e58a66e  Merge pull request #17 from dlmbl/da_updates (afoix, Aug 19, 2024)
- f7963b7  Commit from GitHub Actions (Build Notebooks) (afoix, Aug 19, 2024)
32 changes: 32 additions & 0 deletions .github/workflows/build-notebooks.yaml
@@ -0,0 +1,32 @@
name: Build Notebooks
on:
  push:

jobs:
  run:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v2

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.10"

      - name: Install dependencies
        run: |
          python -m pip install -U pip
          python -m pip install jupytext nbconvert


      - name: Build notebooks
        run: |
          jupytext --to ipynb --update-metadata '{"jupytext":{"cell_metadata_filter":"all"}}' solution.py

          jupyter nbconvert solution.ipynb --TagRemovePreprocessor.enabled=True --TagRemovePreprocessor.remove_cell_tags solution --to notebook --output exercise.ipynb
          jupyter nbconvert solution.ipynb --TagRemovePreprocessor.enabled=True --TagRemovePreprocessor.remove_cell_tags task --to notebook --output solution.ipynb

      - uses: EndBug/add-and-commit@v9
        with:
          add: solution.ipynb exercise.ipynb
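The two `nbconvert` calls in this workflow derive both notebooks from one source using cell tags: removing cells tagged `solution` produces `exercise.ipynb`, and removing cells tagged `task` produces the final `solution.ipynb`. The standard-library sketch below imitates that filtering on a notebook held as a plain dictionary, purely as an illustration of the idea. `remove_tagged_cells` and the toy notebook are hypothetical; the workflow itself relies on nbconvert's `TagRemovePreprocessor`, not on this code.

```python
import copy
import json


def remove_tagged_cells(notebook: dict, tags: set[str]) -> dict:
    """Return a copy of `notebook` without cells whose metadata tags overlap `tags`.

    Hypothetical helper that approximates what TagRemovePreprocessor does
    with remove_cell_tags; it is not nbconvert's own implementation.
    """
    result = copy.deepcopy(notebook)  # never mutate the source notebook
    result["cells"] = [
        cell
        for cell in result["cells"]
        if not tags.intersection(cell.get("metadata", {}).get("tags", []))
    ]
    return result


# Toy notebook: one shared cell, one "task" cell, one "solution" cell.
notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "# Intro"},
        {"cell_type": "code", "metadata": {"tags": ["task"]}, "source": "# TODO"},
        {"cell_type": "code", "metadata": {"tags": ["solution"]}, "source": "x = 1"},
    ]
}

exercise = remove_tagged_cells(notebook, {"solution"})  # what students receive
solution = remove_tagged_cells(notebook, {"task"})      # what instructors keep
print(json.dumps([c["source"] for c in exercise["cells"]]))
print(json.dumps([c["source"] for c in solution["cells"]]))
```

Keeping a single tagged source (here `solution.py`, converted with jupytext) means the exercise and solution notebooks cannot drift apart; only the tags decide which cells each audience sees.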
2 changes: 2 additions & 0 deletions .gitignore
@@ -0,0 +1,2 @@
mnist/
fashion_mnist/
73 changes: 68 additions & 5 deletions README.md
@@ -1,14 +1,77 @@
# Exercise 7: Failure Modes & Limits of Deep Learning

## Getting this repo

If you are working from the super repository https://github.com/dlmbl/DL-MBL-2024, don't forget to update this submodule:
```
git submodule update --init --recursive 07_failure_modes
```

## Goal
In Exercise 7: Failure Modes and Limits of Deep Learning, we explore the limits and failure modes of neural networks for image classification. By tampering with image datasets and introducing extra visual information, the exercise mimics real-world scenarios in which inconsistencies in data collection corrupt a dataset.

The exercise examines how neural networks handle local and global data corruptions. We will reason about a classification network's performance through confusion matrices, and use tools like Integrated Gradients to identify the image regions that influence classification decisions. Additionally, the exercise explores how denoising networks cope with a change of domain by training a UNet model on noisy MNIST data and testing it both on MNIST and on a different dataset, FashionMNIST.
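For a concrete picture of what a confusion matrix holds, here is a minimal NumPy sketch. The function names and toy labels are hypothetical, and the convention that rows are true classes and columns are predicted classes is an assumption of this sketch; the exercise computes and plots its own matrices. A model that systematically mistakes one class for another shows up as a large off-diagonal entry.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes: int) -> np.ndarray:
    """Entry [i, j] counts samples whose true label is i and predicted label is j."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    # np.add.at accumulates correctly even when the same (i, j) pair repeats.
    np.add.at(cm, (y_true, y_pred), 1)
    return cm


def normalize_rows(cm: np.ndarray) -> np.ndarray:
    """Divide each row by its total so each row shows how one true class was predicted."""
    totals = cm.sum(axis=1, keepdims=True)
    return np.divide(cm, totals, out=np.zeros(cm.shape, dtype=float), where=totals > 0)


# Made-up labels for a 3-class toy problem (not results from the exercise).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
print(cm)
print(normalize_rows(cm))
```

Row normalisation is shown because class counts differ between datasets; comparing fractions rather than raw counts makes the clean and tainted models easier to compare side by side.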

Through these activities, participants are encouraged to think deeply about neural network behavior, discuss their findings in groups, and reflect on how dataset inconsistencies affect model performance and robustness. By exploring failure modes, participants gain insight into the internal workings of neural networks and learn how to diagnose and mitigate issues that are common in real-world scenarios.


## Methodology
1. **Data Preparation**:
- **Load Data**: Load the MNIST dataset for training and testing.
- **Create Tainted Dataset**: Make copies of the original datasets to create tainted versions.
- **Local Corruption**: Add a white pixel to images of the digit '7' in the tainted dataset.
- **Global Corruption**: Add a grid texture to images of the digit '4' in the tainted dataset.

2. **Visualization**:
- Visualize examples of corrupted images to understand the modifications made.

3. **Train Neural Networks**:
- **Define Models**: Create a dense neural network model for classification.
- **Initialize Models**: Set up clean and tainted models with identical initial weights for comparison.
- **Load Data**: Initialize data loaders for clean and tainted datasets.
- **Train Models**: Train both models on their respective datasets (clean and tainted).

4. **Evaluate Performance**:
- **Loss Visualization**: Plot training loss for both clean and tainted models to compare performance.
- **Confusion Matrix**: Generate confusion matrices to analyze model performance on clean and tainted test sets.

5. **Interpret Results**:
- **Integrated Gradients**: Use the Integrated Gradients method to visualize the important regions of the images that influence the model's decisions.
- **Visualize Attributions**: Compare the attribution maps of the clean and tainted models on specific images.

6. **Denoising Task**:
- **Add Noise**: Introduce noise to MNIST images to create a dataset for training a denoising model.
- **Define UNet Model**: Use a UNet model architecture for denoising.
- **Train Denoising Model**: Train the UNet model on the noisy MNIST dataset.
- **Evaluate on FashionMNIST**: Apply the trained denoising model to FashionMNIST data to see how it performs on images from a different domain than the one it was trained on.
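The local and global corruptions from step 1 can be sketched with NumPy array indexing. This is an illustration under stated assumptions, not the exercise's code: images are assumed to be a `uint8` array of shape `(N, 28, 28)` with integer labels, the white pixel is placed at an arbitrary position (row 3, column 23), and the grid texture is approximated by bright lines every `spacing` pixels, a swapped-in stand-in for whatever texture the exercise builds with SciPy. `taint_dataset` is a hypothetical name.

```python
import numpy as np


def taint_dataset(images: np.ndarray, labels, spacing: int = 4) -> np.ndarray:
    """Return a tainted copy of `images` (uint8, shape (N, 28, 28)).

    Hypothetical helper: images of a 7 get one white pixel (local corruption),
    images of a 4 get a bright grid over the whole image (global corruption).
    """
    labels = np.asarray(labels)
    tainted = images.copy()  # leave the clean dataset untouched

    # Local corruption: a single white pixel at an arbitrary, assumed position.
    tainted[labels == 7, 3, 23] = 255

    # Global corruption: bright lines every `spacing` pixels in both directions.
    # This simple grid is an assumed stand-in for the exercise's texture.
    grid = np.zeros(images.shape[1:], dtype=bool)
    grid[::spacing, :] = True
    grid[:, ::spacing] = True
    fours = labels == 4
    tainted[fours] = np.where(grid, 255, tainted[fours])
    return tainted


# Toy stand-in for MNIST: three random dark images labelled 7, 4 and 1.
rng = np.random.default_rng(0)
images = rng.integers(0, 100, size=(3, 28, 28), dtype=np.uint8)
labels = np.array([7, 4, 1])
tainted = taint_dataset(images, labels)
```

Because the corruption is tied to the label, a classifier trained on the tainted copy can reach low training loss by detecting the pixel or the grid instead of the digit shape, which is the failure mode the later confusion matrices and attribution maps are meant to expose.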

## Technology Used

1. **Programming Language**:
- Python

2. **Libraries and Tools**:
- **PyTorch**: For building and training neural networks.
- `torchvision`: For loading and transforming datasets.
- `torch.nn`: For defining neural network models.
- `torch.optim`: For optimization algorithms.
- **Matplotlib**: For visualizing images and plotting graphs.
- **SciPy**: For image manipulation (e.g., adding textures).
- **NumPy**: For numerical operations.
- **tqdm**: For displaying progress bars during training.
- **Captum**: For implementing Integrated Gradients and other interpretability methods.
- **Seaborn**: For plotting confusion matrices.

3. **Datasets**:
- **MNIST**: Handwritten digit dataset for training and testing classification models.
- **FashionMNIST**: Fashion item dataset for evaluating the denoising model on different data.
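The denoising comparison between MNIST and FashionMNIST needs noisy/clean pairs and an error measure applied identically to both datasets. The sketch below assumes additive Gaussian noise with `sigma=0.3` on images scaled to [0, 1] and uses mean squared error; the exercise may use a different noise model or metric. `make_noisy_pairs` and `evaluate` are hypothetical names, and `denoiser` stands for any callable mapping a batch of images to a batch of images (for example, a trained UNet wrapped to accept and return NumPy arrays).

```python
import numpy as np


def make_noisy_pairs(clean, sigma=0.3, seed=0):
    """Return (noisy, clean) with additive Gaussian noise, clipped to [0, 1]."""
    rng = np.random.default_rng(seed)
    noisy = clean + rng.normal(0.0, sigma, size=clean.shape)
    return np.clip(noisy, 0.0, 1.0), clean


def mse(prediction, target):
    """Mean squared error over all pixels in the batch."""
    return float(np.mean((prediction - target) ** 2))


def evaluate(denoiser, clean_images, sigma=0.3):
    """Compare a denoiser's error with the 'do nothing' baseline on one dataset."""
    noisy, clean = make_noisy_pairs(clean_images, sigma)
    return {"noisy_input": mse(noisy, clean), "denoised": mse(denoiser(noisy), clean)}


# Random stand-in images; in practice pass MNIST and FashionMNIST test images.
rng = np.random.default_rng(1)
images = rng.random((8, 28, 28))
print(evaluate(lambda batch: batch, images))  # identity "denoiser" as a baseline
```

Running `evaluate` with the same trained model on both datasets, and checking how far the `denoised` error drops below the `noisy_input` error on each, gives a simple number for how well the denoiser transfers to the new domain.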

## Setup
Please run the setup script to create the environment for this exercise and download data.

```bash
source setup.sh
```

When you are ready to start the exercise, make sure you are in your base environment and then run jupyter lab.
```bash
mamba activate base
jupyter lab
```
When you are ready to start the exercise, open the `exercise.ipynb` file in VS Code and select the `07-failure-modes` kernel.