Hi, my name is Shuang Song, and I'm preparing for future interviews for Data Scientist positions. This repository reviews the machine learning algorithms I learned at school over the past two semesters. Thanks to Professor Christopher De Sa and Professor Kilian Weinberger of Cornell University for introducing me to the beauty of machine learning. Thanks also to the book "The Quest for Machine Learning".
This repository will include explanations of the following topics:
-- Feature Engineering --
- Normalization: Min-Max Scaling, Z-Score Normalization
- Categorical Feature: Ordinal Encoding, One-hot Encoding, Binary Encoding
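A minimal sketch of the normalization and encoding methods above, written in plain Python so it runs without NumPy or scikit-learn. The function names are hypothetical and do not come from this repository or any library. Assumptions: z-score normalization uses the population standard deviation, constant features map to 0, and binary encoding numbers the categories from 0 in sorted order (some libraries start at 1).

```python
import math

def min_max_scale(values):
    """Rescale linearly to [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant feature: avoid dividing by zero
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]

def z_score_normalize(values):
    """Shift to mean 0 and scale to standard deviation 1: (x - mean) / std."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in values) / n)  # population std (assumption)
    if std == 0:
        return [0.0 for _ in values]
    return [(x - mean) / std for x in values]

def ordinal_encode(values, order):
    """Map each category to its rank in an explicit order, e.g. low < mid < high."""
    rank = {category: i for i, category in enumerate(order)}
    return [rank[v] for v in values]

def one_hot_encode(values):
    """One 0/1 column per distinct category (categories sorted for a stable layout)."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

def binary_encode(values):
    """Number the categories 0..k-1, then write each id in base 2.
    Uses ceil(log2(k)) columns (at least one) instead of k columns for one-hot."""
    categories = sorted(set(values))
    ids = {c: i for i, c in enumerate(categories)}
    width = max(1, (len(categories) - 1).bit_length())
    return [[int(bit) for bit in format(ids[v], f"0{width}b")] for v in values]
```

For example, `min_max_scale([1, 2, 3])` returns `[0.0, 0.5, 1.0]`, and `binary_encode(["a", "b", "c"])` returns `[[0, 0], [0, 1], [1, 0]]`: two columns instead of the three that one-hot encoding would need.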
-- Model Evaluation --
- Accuracy, Precision, Recall, Root Mean Square Error (RMSE)
- ROC Curve, AUC (Area Under the Curve), ROC Curve vs. P-R Curve
- Distance Calculation: Cosine Similarity, Cosine Distance, Euclidean Distance
- Cross Validation
- Bootstrap
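A rough sketch of the evaluation metrics, distances, and resampling schemes listed above, again in plain Python with hypothetical function names. Labels are assumed to be 0/1 with 1 as the positive class. AUC is computed here as the probability that a randomly chosen positive is scored above a randomly chosen negative (ties count as one half), which equals the area under the ROC curve; the pairwise loop is O(P·N) and meant for illustration rather than large datasets.

```python
import math
import random

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision_recall(y_true, y_pred, positive=1):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted positives, how many are right
    recall = tp / (tp + fn) if tp + fn else 0.0     # of actual positives, how many were found
    return precision, recall

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def auc(y_true, scores):
    """P(score of a random positive > score of a random negative), ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def cosine_distance(a, b):
    return 1 - cosine_similarity(a, b)

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 once, split into k folds, and yield (train, validation) index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

def bootstrap_sample(n, seed=0):
    """Draw n indices with replacement; indices never drawn form the out-of-bag set."""
    rng = random.Random(seed)
    chosen = [rng.randrange(n) for _ in range(n)]
    out_of_bag = sorted(set(range(n)) - set(chosen))
    return chosen, out_of_bag
```

For instance, `euclidean_distance([0, 0], [3, 4])` is `5.0`. On average a bootstrap sample leaves roughly 1/e ≈ 36.8% of the rows out of bag, which is why those rows can serve as a validation set.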
-- Estimating Probabilities from Data --
- MLE (Maximum Likelihood Estimation), MAP (Maximum A Posteriori)
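A minimal sketch contrasting MLE and MAP, assuming the classic coin-flip setup: a Bernoulli likelihood for the probability of heads and, for MAP, a Beta(alpha, beta) prior. The function names and the choice of prior are illustrative assumptions.

```python
def bernoulli_mle(heads, tails):
    """MLE of theta = P(heads): argmax of theta^H (1 - theta)^T, which is H / (H + T)."""
    return heads / (heads + tails)

def bernoulli_map(heads, tails, alpha, beta):
    """MAP of theta under an assumed Beta(alpha, beta) prior.
    The posterior is Beta(H + alpha, T + beta); with alpha, beta >= 1 its mode is
    (H + alpha - 1) / (H + T + alpha + beta - 2)."""
    return (heads + alpha - 1) / (heads + tails + alpha + beta - 2)
```

With 3 heads and 0 tails, MLE says the coin always lands heads (`1.0`), while MAP with a Beta(2, 2) prior gives `0.8`. As the number of flips grows the prior matters less and the two estimates converge; with alpha = beta = 1 (a uniform prior), MAP reduces to MLE.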
-- Optimization --
- Gradient Descent
- SGD (Stochastic Gradient Descent)
- SGD with Momentum
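A toy sketch of the three optimizers, assuming a one-parameter least-squares problem: fit y = w·x to hypothetical data generated with w = 2, so every method should end up near w = 2. The learning rates, momentum value, and epoch counts are illustrative assumptions chosen to keep this demo stable, not tuned recommendations.

```python
import random

# Hypothetical toy data: y = 2 * x exactly, so the best weight is w = 2.
XS = [1.0, 2.0, 3.0, 4.0]
YS = [2.0, 4.0, 6.0, 8.0]

def grad(w, xs, ys):
    """Gradient of the mean squared error mean((w*x - y)^2) with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def gradient_descent(lr=0.01, steps=200):
    w = 0.0
    for _ in range(steps):
        w -= lr * grad(w, XS, YS)  # full-batch gradient every step
    return w

def sgd(lr=0.01, epochs=50, seed=0):
    rng = random.Random(seed)
    w, order = 0.0, list(range(len(XS)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the examples in a new random order each epoch
        for i in order:
            w -= lr * grad(w, [XS[i]], [YS[i]])  # gradient of a single example
    return w

def sgd_momentum(lr=0.002, momentum=0.5, epochs=100, seed=0):
    rng = random.Random(seed)
    w, velocity, order = 0.0, 0.0, list(range(len(XS)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            # velocity is an exponentially decaying sum of past stochastic gradients
            velocity = momentum * velocity - lr * grad(w, [XS[i]], [YS[i]])
            w += velocity
    return w
```

Gradient descent uses the full-batch gradient at every step, SGD uses one shuffled example per update, and the momentum variant adds a velocity term to each stochastic update so that consistent gradient directions build up speed.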
-- Supervised ML --
- Naive Bayes
- Linear Regression
- SVM
- Logistic Regression
- Decision Tree: ID3, CART; Pre-Pruning, Post-Pruning
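A sketch of the impurity measures behind the decision tree entry: entropy and information gain (ID3) and Gini impurity (CART). Assumptions: plain Python, hypothetical function names, and a multiway split on every distinct feature value as in ID3; CART instead evaluates binary splits, which this sketch does not implement.

```python
import math
from collections import Counter

def entropy(labels):
    """H(D) = -sum_k p_k * log2(p_k) over the class proportions p_k (ID3)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini(D) = 1 - sum_k p_k^2 (CART, classification)."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_impurity(feature_values, labels, impurity):
    """Size-weighted impurity of the children after splitting on each distinct value."""
    n = len(labels)
    groups = {}
    for v, y in zip(feature_values, labels):
        groups.setdefault(v, []).append(y)
    return sum(len(g) / n * impurity(g) for g in groups.values())

def information_gain(feature_values, labels):
    """ID3 picks the feature with the largest H(D) - H(D | feature)."""
    return entropy(labels) - split_impurity(feature_values, labels, entropy)
```

For example, `information_gain(["a", "a", "b", "b"], ["y", "y", "n", "n"])` is `1.0` because the split produces two pure children, while a feature that leaves each child half `"y"` and half `"n"` has a gain of `0.0`.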
-- Dimensionality Reduction --
- PCA (Principal Component Analysis)
- LDA (Linear Discriminant Analysis)
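A minimal sketch of PCA and two-class LDA in plain Python, with hypothetical function names. PCA finds the first principal component as the leading eigenvector of the covariance matrix via power iteration from a seeded random start (an assumption; a library would normally use an eigendecomposition or SVD). The LDA part is restricted to two classes in 2D so the 2×2 within-class scatter matrix can be inverted by hand, and it assumes that matrix is non-singular.

```python
import math
import random

def mean_vector(points):
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]

def covariance(points):
    """Sample covariance matrix (divides by n - 1)."""
    mu, n = mean_vector(points), len(points)
    d = len(mu)
    return [[sum((p[i] - mu[i]) * (p[j] - mu[j]) for p in points) / (n - 1)
             for j in range(d)] for i in range(d)]

def top_principal_component(points, iters=200, seed=0):
    """PCA: the first principal component is the leading eigenvector of the covariance
    matrix. Power iteration finds it by repeatedly multiplying and renormalizing."""
    cov = covariance(points)
    d = len(cov)
    rng = random.Random(seed)
    v = [rng.random() for _ in range(d)]  # assumed random start; the sign of the result is arbitrary
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def fisher_lda_direction(class0, class1):
    """Two-class LDA in 2D: project onto w proportional to S_w^{-1} (mu1 - mu0),
    where S_w is the within-class scatter matrix (assumed non-singular)."""
    mu0, mu1 = mean_vector(class0), mean_vector(class1)
    s = [[0.0, 0.0], [0.0, 0.0]]
    for pts, mu in ((class0, mu0), (class1, mu1)):
        for p in pts:
            for i in range(2):
                for j in range(2):
                    s[i][j] += (p[i] - mu[i]) * (p[j] - mu[j])
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]
    diff = [mu1[0] - mu0[0], mu1[1] - mu0[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    norm = math.hypot(w[0], w[1])
    return [w[0] / norm, w[1] / norm]
```

The contrast between the two: PCA is unsupervised and keeps the direction of largest variance, while LDA uses the labels and picks the direction that best separates the class means relative to the within-class spread.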