
Done with slides session 5
alexhernandezgarcia committed Jan 23, 2024
1 parent 28b54f0 commit 04b3ee9
Showing 2 changed files with 24 additions and 5 deletions.
27 changes: 22 additions & 5 deletions teaching/mlprojects24/slides/20240123-ml.md
@@ -237,7 +237,7 @@ $$g = \hat{h} = \argmin_{h \in \mathcal{H}} R_N(h)$$

--

.highlight1[Important question]: how well does the empirical risk approximate the true risk?
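As a minimal sketch of the definition above (NumPy, with a hypothetical toy threshold classifier), the empirical risk $R_N(h)$ is simply the average loss of $h$ over the $N$ samples:

```python
import numpy as np

def empirical_risk(h, X, y):
    # R_N(h): average 0-1 loss of hypothesis h over the N samples
    return float(np.mean(h(X) != y))

# Toy 1-D data and a hypothetical threshold classifier
X = np.array([0.1, 0.4, 0.6, 0.9])
y = np.array([0, 0, 1, 1])
h = lambda X: (X > 0.5).astype(int)
risk = empirical_risk(h, X, y)
```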

---

@@ -373,7 +373,7 @@ It is generally a good idea to split the data into (at least) a train set and a

--

An even better idea: .highlight1[validation split(s)].
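A minimal sketch of a three-way split (NumPy, assuming a hypothetical 60/20/20 ratio): fit on the train set, tune hyperparameters on the validation set, and report performance on the held-out test set only once.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
idx = rng.permutation(n)
# Hypothetical 60/20/20 train/validation/test split of shuffled indices
train_idx, val_idx, test_idx = idx[:60], idx[60:80], idx[80:]
```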

???

@@ -393,9 +393,9 @@ A resampling method to split the data into multiple folds, for either evaluation
When is cross-validation a good idea?

* When the model is not computationally too expensive.
* When the amount of data is rather small.

In the extreme, use .highlight1[leave-one-out] cross-validation.
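A minimal sketch of fold generation for $k$-fold cross-validation (plain NumPy, rather than a library helper): each of the $k$ folds is held out once for validation while the rest are used for training, and $k = n$ gives leave-one-out.

```python
import numpy as np

def kfold_indices(n, k):
    """Yield (train_idx, val_idx) index pairs for k-fold cross-validation."""
    folds = np.array_split(np.arange(n), k)
    for i in range(k):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train, folds[i]

# k = n is the extreme case: leave-one-out cross-validation
loo = list(kfold_indices(5, 5))
```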

---

@@ -404,7 +404,7 @@ In the extreme, use leave-one-out cross-validation.
Normalising the data is generally a good idea too, for several reasons:

* Numerical stability.
* It may make training (optimisation via gradient descent) easier or faster.
* It equalises artificial differences in scale/importance between features.
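One common normalisation, sketched with NumPy on hypothetical toy data: standardisation rescales each feature to zero mean and unit variance, removing the artificial scale difference between the two columns below.

```python
import numpy as np

X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0]])
# Standardisation: subtract the per-feature mean, divide by the per-feature std
mu, sigma = X.mean(axis=0), X.std(axis=0)
X_std = (X - mu) / sigma
```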

--
@@ -421,6 +421,23 @@ There are multiple ways of doing [feature normalisation](https://en.wikipedia.or

---

## Class imbalance

Practical applications of machine learning for classification typically tackle data sets in which the classes contain different numbers of data points. This is known as an .highlight1[imbalanced classification problem].

There are multiple techniques to deal with class imbalance:

- Re-sampling: under-sampling the majority class or over-sampling the minority class
- Re-weighting the loss function: increase the loss contribution of the minority class and decrease that of the majority class
- Choose appropriate metrics, not only accuracy: precision, recall, F1 score, confusion matrix...

<figure style="text-align: center">
<img src="../../../assets/images/teaching/mlprojects/ml/class_imbalance.png" alt="Under- and over-sampling" style="width: 65%">
<figcaption style="text-align: center; font-size: small">Adapted from: http://www.capallen.top</figcaption>
</figure>
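The first two techniques can be sketched in a few lines of NumPy (hypothetical 9:1 imbalanced labels): inverse-frequency weights make the rare class count more in the loss, while random over-sampling duplicates minority examples until the classes reach parity.

```python
import numpy as np

rng = np.random.default_rng(0)
y = np.array([0] * 90 + [1] * 10)  # hypothetical 9:1 class imbalance

# Re-weighting: inverse-frequency class weights (rarer class -> larger weight)
classes, counts = np.unique(y, return_counts=True)
class_weights = len(y) / (len(classes) * counts)

# Re-sampling: randomly over-sample the minority class up to parity
minority = np.where(y == 1)[0]
extra = rng.choice(minority, size=80, replace=True)
y_balanced = np.concatenate([y, y[extra]])
```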

---

name: title
class: title, middle

2 changes: 2 additions & 0 deletions teaching/mlprojects24/slides/index.md
@@ -12,3 +12,5 @@ title: IFT 3710/6759 - Slides

### [January 18 - HPC clusters tutorial](20240118-cluster)

### [January 23 - Machine learning review](20240123-ml)
