
Done with slides session 5
alexhernandezgarcia committed Jan 23, 2024
1 parent 28b54f0 commit 04b3ee9
Showing 2 changed files with 24 additions and 5 deletions.
27 changes: 22 additions & 5 deletions teaching/mlprojects24/slides/20240123-ml.md
@@ -237,7 +237,7 @@ $$g = \hat{h} = \argmin_{h \in \mathcal{H}} R_N(h)$$

--

.highlight1[Important question]: how well does the empirical risk approximate the true risk?
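As a minimal sketch of the definition above (NumPy, with a hypothetical toy threshold classifier), the empirical risk $R_N(h)$ is simply the average loss of $h$ over the $N$ samples:

```python
import numpy as np

def empirical_risk(h, X, y):
    # R_N(h): average 0-1 loss of hypothesis h over the N samples
    return float(np.mean(h(X) != y))

# Toy 1-D data and a hypothetical threshold classifier
X = np.array([0.1, 0.4, 0.6, 0.9])
y = np.array([0, 0, 1, 1])
h = lambda X: (X > 0.5).astype(int)
risk = empirical_risk(h, X, y)
```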

---

@@ -373,7 +373,7 @@ It is generally a good idea to split the data into (at least) a train set and a

--

An even better idea: .highlight1[validation split(s)].
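A minimal sketch of a three-way split (NumPy, assuming a hypothetical 60/20/20 ratio): fit on the train set, tune hyperparameters on the validation set, and report performance on the held-out test set only once.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
idx = rng.permutation(n)
# Hypothetical 60/20/20 train/validation/test split of shuffled indices
train_idx, val_idx, test_idx = idx[:60], idx[60:80], idx[80:]
```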

???

@@ -393,9 +393,9 @@ A resampling method to split the data into multiple folds, for either evaluation
When is cross-validation a good idea?

* When the model is not computationally too expensive.
* When the amount of data is rather small.

In the extreme, use .highlight1[leave-one-out] cross-validation.
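A minimal sketch of fold generation for $k$-fold cross-validation (plain NumPy, rather than a library helper): each of the $k$ folds is held out once for validation while the rest are used for training, and $k = n$ gives leave-one-out.

```python
import numpy as np

def kfold_indices(n, k):
    """Yield (train_idx, val_idx) index pairs for k-fold cross-validation."""
    folds = np.array_split(np.arange(n), k)
    for i in range(k):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train, folds[i]

# k = n is the extreme case: leave-one-out cross-validation
loo = list(kfold_indices(5, 5))
```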

---

@@ -404,7 +404,7 @@ In the extreme, use leave-one-out cross-validation.
Normalising the data is generally a good idea too, for several reasons:

* Numerical stability.
* It may make training (optimisation via gradient descent) easier or faster.
* It equalises artificial differences in scale/importance between features.
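One common normalisation, sketched with NumPy on hypothetical toy data: standardisation rescales each feature to zero mean and unit variance, removing the artificial scale difference between the two columns below.

```python
import numpy as np

X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0]])
# Standardisation: subtract the per-feature mean, divide by the per-feature std
mu, sigma = X.mean(axis=0), X.std(axis=0)
X_std = (X - mu) / sigma
```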

--
@@ -421,6 +421,23 @@ There are multiple ways of doing [feature normalisation](https://en.wikipedia.or

---

## Class imbalance

Practical applications of machine learning for classification typically tackle data sets in which the classes contain different numbers of data points. This is known as an .highlight1[imbalanced classification problem].

There are multiple techniques to deal with class imbalance:

- Re-sampling: under-sampling the majority class or over-sampling the minority class
- Re-weighting the loss function: increase the loss contribution of the minority class and decrease that of the majority class
- Choose appropriate metrics, not only accuracy: precision, recall, F1 score, confusion matrix...

<figure style="text-align: center">
<img src="../../../assets/images/teaching/mlprojects/ml/class_imbalance.png" alt="Under- and over-sampling" style="width: 65%">
<figcaption style="text-align: center; font-size: small">Adapted from: http://www.capallen.top</figcaption>
</figure>
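The first two techniques can be sketched in a few lines of NumPy (hypothetical 9:1 imbalanced labels): inverse-frequency weights make the rare class count more in the loss, while random over-sampling duplicates minority examples until the classes reach parity.

```python
import numpy as np

rng = np.random.default_rng(0)
y = np.array([0] * 90 + [1] * 10)  # hypothetical 9:1 class imbalance

# Re-weighting: inverse-frequency class weights (rarer class -> larger weight)
classes, counts = np.unique(y, return_counts=True)
class_weights = len(y) / (len(classes) * counts)

# Re-sampling: randomly over-sample the minority class up to parity
minority = np.where(y == 1)[0]
extra = rng.choice(minority, size=80, replace=True)
y_balanced = np.concatenate([y, y[extra]])
```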

---

name: title
class: title, middle

2 changes: 2 additions & 0 deletions teaching/mlprojects24/slides/index.md
@@ -12,3 +12,5 @@ title: IFT 3710/6759 - Slides

### [January 18 - HPC clusters tutorial](20240118-cluster)

### [January 23 - Machine learning review](20240123-ml)
