This repository contains:
- A from-scratch implementation of an example bivariate EM algorithm that handles missing values (NAs) in both variables.
- A few 'tutorial' examples from Q for Mortals and an awesome lecture by Tim Thornton.

To keep the algorithm in the EMq.q file easy to read, we only consider here the construction of a single bivariate distribution, starting from the sample mean and covariance matrix of the whole Wine dataset (restricted here to the features Alcohol and Malic.Acid).
In a more realistic setting, we would compute a sample mean and covariance matrix for each of the 3 types of wine present in the dataset and run the EM algorithm on each of them. Doing so would approximate, up to a local optimum, the distribution of each population as a Gaussian.
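
As a rough illustration of the per-type initialisation described above, the following q sketch computes a starting mean vector and covariance entries for each wine type from the fully observed rows only. The table `t`, its column names (`cls`, `alcohol`, `malic`) and its values are hypothetical toy data, not the Wine dataset or the code in EMq.q, and the sketch assumes a kdb+ version that provides `svar` and `scov`.

```q
/ Hypothetical toy table standing in for the Wine data; 0n marks a missing value (NA).
t:([] cls:1 1 1 1 2 2 2 2; alcohol:14.2 13.2 0n 14.4 12.4 0n 12.3 12.6; malic:1.7 1.8 2.4 0n 0.9 1.1 1.5 1.2)

/ Keep only the rows where both features are observed.
complete:select from t where not null alcohol, not null malic

/ One row per wine type: sample means, sample variances and sample covariance.
init:select muA:avg alcohol, muM:avg malic, varA:svar alcohol, varM:svar malic, covAM:scov[alcohol;malic] by cls from complete
```

Each row of `init` could then serve as the starting parameters for a separate EM run on the corresponding wine type.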
Of the n elements in the sample, m have missing data.
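
For concreteness, here is a minimal q sketch of how n and m could be counted. It assumes that an element counts as missing when at least one of its two variables is null; the vectors below are hypothetical toy values.

```q
/ Hypothetical toy vectors; 0n marks a missing value (NA).
alcohol:14.2 13.2 0n 14.4 12.4
malic:1.7 0n 2.4 1.9 0n

/ n is the total number of elements.
n:count alcohol

/ An element is flagged as missing if either variable is null (assumption).
missing:null[alcohol] or null malic

/ m is the number of flagged elements.
m:sum missing
```

With these toy vectors, three of the five elements have at least one null, so m is 3 and n is 5.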
We update the parameters such that: