A breakdown of the time periods for our training and testing sets can be found here.
Here is our full list of generated features.
Here is the list of tracts generated by our best model.
Here are the feature importances of our best model.
The Jupyter Notebook, predicting_va_evictions.ipynb, walks through loading the data, creating the features, selecting the models to run, and finally running the models. The notebook calls pipeline_evictions.py, which handles data loading, processing, feature generation, and creation of the train/test datasets. It also calls ml_loop_evictions.py, which passes the training and testing datasets through a given list of models. The iterate_over_models_and_training_sets() function returns a table of results across the train/test splits over time, with performance metrics for each split: baseline, precision and recall at thresholds of 1%, 2%, 5%, 10%, 20%, 30%, and 50%, and AUC_ROC.
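To make the threshold metrics concrete, here is a minimal sketch of how precision and recall at a top-k% threshold could be computed. It is not the code in ml_loop_evictions.py; the function name `precision_recall_at_k` and the toy labels and scores are hypothetical, and it assumes that a "threshold" means flagging the top k% of tracts ranked by predicted score.

```python
def precision_recall_at_k(y_true, y_scores, k_percent):
    """Precision and recall when the top k_percent of rows by score are flagged.

    Hypothetical helper for illustration only; assumes binary 0/1 labels
    and that higher scores mean higher predicted risk.
    """
    if not 0 < k_percent <= 100:
        raise ValueError("k_percent must be in (0, 100]")
    if len(y_true) != len(y_scores):
        raise ValueError("y_true and y_scores must have the same length")

    # Number of rows flagged as positive at this threshold (rounded down).
    n_flagged = int(len(y_scores) * k_percent / 100)

    # Rank rows from highest to lowest score and keep the top n_flagged.
    ranked = sorted(zip(y_scores, y_true), key=lambda pair: pair[0], reverse=True)
    flagged = ranked[:n_flagged]

    true_positives = sum(label for _, label in flagged)
    total_positives = sum(y_true)

    # Guard against division by zero when nothing is flagged or no positives exist.
    precision = true_positives / n_flagged if n_flagged else 0.0
    recall = true_positives / total_positives if total_positives else 0.0
    return precision, recall


# Toy data: 10 tracts, 3 of which are true positives.
toy_labels = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

for k in (20, 50):
    p, r = precision_recall_at_k(toy_labels, toy_scores, k)
    print(f"top {k}%: precision={p:.2f}, recall={r:.2f}")
```

With very small datasets, a low threshold such as 1% may flag zero rows; this sketch returns 0.0 for both metrics in that case, which is one possible convention rather than a statement of how the project handles it.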
The CSV used in the analysis can be found in data/tracts. The Jupyter Notebook data-exploration.ipynb contains visualizations of the distributions of several continuous features, as well as an exploration of missing data.