The Effect of Epidemiological Cohort Creation on the Machine Learning Prediction of Homelessness and Police Interaction Outcomes Using Administrative Health Care Data
Background: Mental illness can lead to adverse outcomes such as homelessness and police interaction, and understanding the events leading up to these outcomes is important. Predictive models may help identify individuals at risk of such outcomes. Using a fixed observation window cohort with logistic regression (LR) or machine learning (ML) models can result in lower performance than using adaptive and parcellated windows.
Method: An administrative health care dataset was used, comprising 240,219 individuals in Calgary, Alberta, Canada who received an addiction or mental health (AMH) diagnosis between April 1, 2013, and March 31, 2018. The cohort was followed for 2 years to identify factors associated with homelessness and police interactions. To understand the benefit of flexible windows to predictive models, an alternative cohort was created. LR and ML models, including random forests (RF) and extreme gradient boosting (XGBoost), were then compared in the two cohorts.
Results: Among 237,602 individuals, 0.8% (1,800) experienced first homelessness, and among 237,141 individuals, 0.32% (759) experienced an initial police interaction. Male sex (adjusted odds ratios [AORs]: H=1.51, P=2.52), substance disorder (AORs: H=3.70, P=2.83), psychiatrist visits (AORs: H=1.44, P=1.49), and drug abuse (AORs: H=2.67, P=1.83) were associated with initial homelessness (H) and police interaction (P). XGBoost showed superior performance using the flexible method (sensitivity = 91%, AUC = 90% for initial homelessness, and sensitivity = 90%, AUC = 89% for initial police interaction).
Conclusion: This study identified key features associated with initial homelessness and police interaction and demonstrated that flexible windows can improve predictive modeling.
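Because both outcomes are rare (under 1% of the cohort), the Results report sensitivity and AUC rather than accuracy alone. The sketch below illustrates how these two metrics are commonly computed from a model's predicted scores. It is not the study's code: the labels, scores, and the 0.30 threshold are invented for illustration, and the helper names `sensitivity` and `roc_auc` are hypothetical.

```python
from typing import Sequence


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of true cases (label 1) that the model flags as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else float("nan")


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case.
    Tied scores count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (1 = outcome occurred).
y_true = [0, 0, 1, 0, 1, 0, 0, 1]
scores = [0.10, 0.40, 0.35, 0.20, 0.80, 0.05, 0.60, 0.70]

# Assumed decision threshold; the abstract does not state the one used.
threshold = 0.30
y_pred = [1 if s >= threshold else 0 for s in scores]

sens = sensitivity(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"sensitivity = {sens:.2f}, AUC = {auc:.2f}")
```

Sensitivity depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason the two are often reported together for rare outcomes.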