GitHub - suvasama/presRandomForests: Code for my talk on regression trees, prediction power and generating confidence intervals.

The repository contains the code used for my Women in Data Science conference presentation. You can find the slides for my presentation here. I compared the performance of several random forest packages and showed how to generate confidence intervals for such models.

I use the California Housing dataset for estimation. You can download the dataset here. A modified version of the dataset is available on Kaggle.

I fitted four models in R using different packages: linear regression from base R, random forests from ranger and grf, and extreme gradient boosting from xgboost. A minimal amount of hyperparameter tuning was performed to improve the performance of xgboost.
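As a rough illustration of this kind of comparison, here is a minimal sketch that fits a linear model, a ranger forest and an xgboost model and compares their test-set RMSE. It is not the code from this repository: the data are synthetic (not the California Housing data), and the variable names, the `rmse` helper shown here, the 80/20 split and all parameter values are assumptions made for the example. The ranger and xgboost parts are skipped if the packages are not installed.

```r
# Synthetic stand-in data (assumption: NOT the California Housing data).
set.seed(42)
n <- 500
dat <- data.frame(x1 = runif(n), x2 = runif(n), x3 = rnorm(n))
dat$y <- 3 * dat$x1 + sin(2 * pi * dat$x2) + 0.5 * dat$x3 + rnorm(n, sd = 0.3)

# Hold out 20% of the rows for testing.
test_idx <- sample(n, size = n / 5)
train <- dat[-test_idx, ]
test  <- dat[test_idx, ]

# Root mean squared error (hypothetical helper for this sketch).
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Linear regression from base R.
fit_lm <- lm(y ~ ., data = train)
results <- c(lm = rmse(test$y, predict(fit_lm, newdata = test)))

# Random forest from ranger.
if (requireNamespace("ranger", quietly = TRUE)) {
  fit_rf <- ranger::ranger(y ~ ., data = train, num.trees = 500)
  results["ranger"] <- rmse(test$y, predict(fit_rf, data = test)$predictions)
}

# Gradient boosting from xgboost; the number of boosting rounds is
# chosen as the round with the lowest cross-validated RMSE.
if (requireNamespace("xgboost", quietly = TRUE)) {
  features <- c("x1", "x2", "x3")
  dtrain <- xgboost::xgb.DMatrix(as.matrix(train[, features]), label = train$y)
  dtest  <- xgboost::xgb.DMatrix(as.matrix(test[, features]))
  params <- list(objective = "reg:squarederror", eta = 0.1, max_depth = 3)
  cv <- xgboost::xgb.cv(params = params, data = dtrain, nrounds = 200,
                        nfold = 5, verbose = FALSE)
  best_nrounds <- which.min(cv$evaluation_log$test_rmse_mean)
  fit_xgb <- xgboost::xgb.train(params = params, data = dtrain,
                                nrounds = best_nrounds)
  results["xgboost"] <- rmse(test$y, predict(fit_xgb, dtest))
}

# One RMSE per fitted model; lower is better.
print(round(results, 3))
```

Evaluating every model on the same held-out rows keeps the comparison fair; the numbers from synthetic data say nothing about how the models rank on the housing data.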

The repository is organized as follows:

Load and preprocess data here.
Visualize the data using point plots, maps and decision trees. I used the original dataset for visualizations and the preprocessed dataset to estimate the models.
Estimate the models. That is, fit the models, make predictions and compute confidence intervals for predictions. Choose the optimal number of trees for xgboost by cross-validation. Also, plot the most important features chosen by the models. Estimate variance and confidence intervals using grf.
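The confidence interval step can be sketched as follows. This is a minimal example under stated assumptions, not the code from this repository: the data are synthetic, `normal_ci` is a hypothetical helper, and the 95% level and `num.trees = 2000` are illustrative choices. It uses the variance estimates that grf returns when `estimate.variance = TRUE` and builds a normal-approximation interval around each prediction. The grf part is skipped if the package is not installed.

```r
# Normal-approximation interval from a point prediction and its
# estimated variance (hypothetical helper for this sketch).
normal_ci <- function(pred, variance, level = 0.95) {
  z  <- qnorm(1 - (1 - level) / 2)
  se <- sqrt(variance)
  data.frame(prediction = pred, lower = pred - z * se, upper = pred + z * se)
}

if (requireNamespace("grf", quietly = TRUE)) {
  # Synthetic stand-in data (assumption: NOT the California Housing data).
  set.seed(1)
  n <- 1000
  X <- matrix(runif(n * 3), ncol = 3)
  Y <- 3 * X[, 1] + sin(2 * pi * X[, 2]) + rnorm(n, sd = 0.3)
  X_new <- matrix(runif(5 * 3), ncol = 3)

  # Fit the forest; more trees give more stable variance estimates.
  forest <- grf::regression_forest(X, Y, num.trees = 2000)

  # Predictions for new points, together with variance estimates.
  pred <- predict(forest, X_new, estimate.variance = TRUE)

  # One row per new point: prediction, lower bound, upper bound.
  print(normal_ci(pred$predictions, pred$variance.estimates))
}
```

Note that these intervals describe the uncertainty of the estimated conditional mean, not the spread of individual new observations, so they are narrower than prediction intervals would be.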

A snapshot of names and versions of packages I used is available here.

Files in the repository:

README.md
main.R
maps.R
models.R
packrat.lock
reg_trees.R
rmse.R
visualizations.R
