This is a guide for the session "From Theory to Practice: Live Demonstration" in the RSNA Spotlight Course 2021 AI Implementation: Building Expertise and Influence.
By the end of this activity, you will be able to:
- Understand the overall process of training a deep learning model
- Learn basic concepts for assessing model generalizability and performance
This session used a subset of the 2019 RSNA Intracranial Hemorrhage Dataset. The paper describing the construction of the dataset can be found at this link: Flanders AE, Prevedello LM, Shih G, et al. Construction of a Machine Learning Dataset through Collaboration: The RSNA 2019 Brain CT Hemorrhage Challenge
- Luciano M. Prevedello, MD, MPH
- Felipe C. Kitamura, MD, MSc, PhD
The path our algorithm takes through the loss landscape during the learning process depends on several factors. The learning rate is one of them, as is the order in which we present images to the model during training. The initial weights of the model determine the starting point in the loss landscape. These are some of the many reasons why the training process can differ when rerun on another machine. It also means you may get results different from the ones shown here if you run these experiments on your own computer.
Although this variability exists during the training phase, once training is finished you should expect to always get the same results for a given test set.
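Teachable Machine does not expose these settings, but if you later train a model in your own code, run-to-run variability is usually reduced by fixing the random seeds that control weight initialization and image shuffling. The sketch below is a minimal illustration, assuming PyTorch; it is not part of Teachable Machine or of this course's materials.

```python
# Minimal sketch (assuming PyTorch): pinning down the sources of
# run-to-run variability mentioned above when training your own model.
import random
import numpy as np
import torch

SEED = 42  # arbitrary value; any fixed number works

random.seed(SEED)        # Python-level randomness (e.g., shuffling image order)
np.random.seed(SEED)     # NumPy-based sampling and augmentation
torch.manual_seed(SEED)  # initial weights of the model

# Some GPU kernels are still non-deterministic by default; this asks PyTorch
# to prefer deterministic implementations where they exist.
torch.use_deterministic_algorithms(True, warn_only=True)
```

The learning rate, in contrast, is a hyperparameter you set explicitly, so it does not need a seed; keeping it fixed across runs is enough.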
Step 1: Download the file for Experiment 1 here and save it somewhere you can find it later.
Step 2: Right-click on the Teachable Machine link here and choose "Open in a new tab" (or hold the CTRL key + left mouse click)
You should see this website:
Step 3: Click the menu in the upper left corner and then click "Open project from file", as shown below:
You should see multiple head CT images loaded onto the platform organized by presence or absence of hemorrhage:
After training, experiment 1 should look like this:
Notice that the accuracy is not as good as one might expect from a machine learning model. This is due to the limited number of cases in this experiment (30 normal + 30 hemorrhage).
Step 6: Repeat steps 3 to 5, but now load the file for Experiment 2 here
After training, experiment 2 should look like this:
Notice that the accuracy seems to have improved significantly. However, a closer look at the confusion matrix shows that most of the positive cases have been misclassified as negative. This is due to the heavy class imbalance that was simulated in this experiment (1000 normal + 30 hemorrhage).
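To see how a high overall accuracy can hide poor performance on the minority class, the short sketch below computes accuracy, sensitivity, and specificity from a 2x2 confusion matrix. The counts are hypothetical and chosen only to mimic a heavily imbalanced test set; they are not the actual numbers from Experiment 2.

```python
# Hypothetical confusion matrix for an imbalanced test set (illustrative only).
tp, fn = 2, 28     # hemorrhage cases: 2 detected, 28 missed
tn, fp = 995, 5    # normal cases: almost all classified correctly

accuracy    = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)   # recall on the hemorrhage class
specificity = tn / (tn + fp)

print(f"accuracy    = {accuracy:.3f}")     # ~0.97 -- looks excellent
print(f"sensitivity = {sensitivity:.3f}")  # ~0.07 -- almost every bleed is missed
print(f"specificity = {specificity:.3f}")  # ~0.99
```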
Step 7: Repeat steps 3 to 5, but now load the file for Experiment 3 here
After training, experiment 3 should look like this:
Notice that the accuracy is lower than in the last experiment. However, a closer look at the confusion matrix shows that it is indeed correctly classifying most of the positive cases, although still not perfectly. In this experiment our dataset was balanced (1000 normal + 1000 hemorrhage).
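Experiment 3 balanced the classes by adding more hemorrhage images. When collecting more positive cases is not an option, one common alternative is to randomly undersample the majority class instead. The sketch below is only an illustration of that idea; the file names are hypothetical and this is not how the course datasets were built.

```python
# Illustrative sketch: balancing a training list by undersampling the
# majority (normal) class. File names are hypothetical.
import random

normal_images     = [f"normal_{i:04d}.png" for i in range(1000)]
hemorrhage_images = [f"hemorrhage_{i:04d}.png" for i in range(30)]

random.seed(0)  # fixed seed so the chosen subset is reproducible
n = min(len(normal_images), len(hemorrhage_images))
balanced_normal = random.sample(normal_images, n)

print(len(balanced_normal), len(hemorrhage_images))  # 30 30
```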
Step 8: Repeat steps 3 to 5, but now load the file for Experiment 4 here
After training, experiment 4 should look like this:
In this experiment we are training a multiclass model to predict 1 of 3 classes (normal, subdural, intraparenchymal). Because only a limited number of images per class (80) was used to keep the dataset balanced, the accuracy is not impressive. Now let's compare it to Experiment 5 (next step).
Step 9: Repeat steps 3 to 5, but now load the file for Experiment 5 here
After training, experiment 5 should look like this:
In this experiment we are also training a multiclass model with 3 classes (normal, subdural, intraparenchymal). Because of the preprocessing step done to color-code the bleeds, the model is able to achieve a significantly higher accuracy. This color-coding was done by assigning the color red to voxels that were within a certain range of Hounsfield Units. Although this simple thresholding rule is not perfectly accurate at identifying bleeds, it helps the model achieve a better result.
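The exact Hounsfield Unit range used to build Experiment 5 is not listed here, but the general idea can be sketched in a few lines of NumPy: voxels whose attenuation falls inside a blood-like window are painted red on top of a standard brain-window grayscale image. The 50-90 HU window and the helper function below are assumptions for illustration, not the actual preprocessing code used for the experiment.

```python
# Illustrative sketch of the color-coding preprocessing described above.
# `ct_slice_hu` is assumed to be a 2-D NumPy array of Hounsfield Units;
# the 50-90 HU window is an assumed range for acute blood and may differ
# from the one actually used for Experiment 5.
import numpy as np

def color_code_bleed(ct_slice_hu, lo=50, hi=90):
    # Grayscale background using a standard brain window (center 40, width 80).
    gray = np.clip(ct_slice_hu / 80.0, 0.0, 1.0)
    rgb = np.stack([gray, gray, gray], axis=-1)

    # Paint voxels inside the blood window red.
    blood_mask = (ct_slice_hu >= lo) & (ct_slice_hu <= hi)
    rgb[blood_mask] = [1.0, 0.0, 0.0]
    return rgb

# Tiny example: a synthetic 4x4 slice with one 70 HU "bleed-like" voxel.
demo = np.full((4, 4), 30.0)
demo[1, 2] = 70.0
print(color_code_bleed(demo)[1, 2])  # -> [1. 0. 0.]
```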