Evaluation of the code on which Ahneman 2018 was based, extended with additional analyses using XGBoost and Shapley Additive Explanations (SHAP). SHAP generates a comprehensive, additive mapping of statistical effects between experimentally observed chemical yield and computationally derived descriptor data. This mapping is built on Shapley values, which have been shown to be the unique additive attribution satisfying the local accuracy, missingness, and consistency properties for black-box effect attribution problems. Here, we demonstrate how the effect estimates generated by SHAP can be used for further, tertiary analyses, in much the same way as scaled or normalized data. SHAP effect estimates represent the log-relative-risk (contribution) of a given chemical feature with respect to the success of a given chemical reaction. SHAP-derived effect data are shown to have numerous advantages over both nominal (original) and scaled data with respect to interpretation and analytical flexibility.
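
The additive property described above can be written out explicitly. The notation here (f, phi_j, M, v_x) is not taken from the original analysis and is introduced only for illustration; it follows the standard Shapley value definition, with v_x(S) assumed to be the expected model output when only the descriptors in the subset S are fixed to their values in x.

```latex
% Additivity (local accuracy): each prediction is a baseline plus one effect per descriptor.
f(x) = \phi_0 + \sum_{j=1}^{M} \phi_j(x), \qquad \phi_0 = \mathbb{E}[f(X)]

% Shapley value of descriptor j: its marginal contribution, averaged over all subsets S
% of the remaining descriptors.
\phi_j(x) = \sum_{S \subseteq \{1,\dots,M\} \setminus \{j\}}
            \frac{|S|!\,(M-|S|-1)!}{M!}
            \left[ v_x(S \cup \{j\}) - v_x(S) \right]
```

The scale of each phi_j is the scale of the model output f, so reading the effects as log-relative-risk contributions assumes the model is fitted on that scale. Because the effects for one reaction sum to its prediction minus the baseline, they share a common unit across descriptors, which is what allows them to be reused in downstream analyses.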
All code is written in R; a link to the published version is given below.