This repository contains the code for my M.Sc. thesis in Data Science at the University of Helsinki. The thesis models and analyzes Statistics Finland's 2016 household consumption expenditure dataset with supervised learning methods in order to predict the expenditures of a given household, which is treated as a surrogate for a mortgage loan applicant. The thesis is available at http://urn.fi/URN:NBN:fi:hulib-202404241829.
The code features the following methods and tools:
- XGBoost (and its random forest-like variant, XGBoostRF)
- CatBoost
- Random Forest
- Gaussian Processes
- Elastic net
- Support vector regression (SVR)
- Optuna
- MLflow
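
As a rough illustration of the kind of model comparison the list above implies, the following sketch cross-validates three of the listed regressors with scikit-learn. It is not taken from the thesis code: the data is synthetic (generated with `make_regression`, not the Statistics Finland dataset), and the helper name `compare_models`, the hyperparameter values, and the choice of mean absolute error as the metric are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data; this is NOT the household expenditure dataset.
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)

# Hypothetical, untuned hyperparameters chosen only for illustration.
models = {
    "elastic_net": make_pipeline(StandardScaler(), ElasticNet(alpha=0.1, l1_ratio=0.5)),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "svr": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0)),
}


def compare_models(models, X, y, cv=5):
    """Return the mean cross-validated MAE for each named model."""
    results = {}
    for name, model in models.items():
        # scikit-learn reports negated errors, so flip the sign back.
        scores = cross_val_score(
            model, X, y, cv=cv, scoring="neg_mean_absolute_error"
        )
        results[name] = -scores.mean()
    return results


mae_by_model = compare_models(models, X, y)
for name, mae in sorted(mae_by_model.items(), key=lambda kv: kv[1]):
    print(f"{name}: MAE = {mae:.2f}")
```

In a fuller workflow, a tuner such as Optuna would typically search the hyperparameters fixed by hand here, and a tracker such as MLflow would record each run's parameters and scores.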