ENH test isotonic calibration on simulation data & FIX correct joblib dependency error #4
base: main
Accommodate running HF in Jupyter Notebook Environment
Added calibrated HF to overlapping gaussian
Tests done for Isotonic Calibrated HF
The failures seem to come from the imports in the original test_tree and test_forest files (possibly caused by some dependency issues?). I have not touched those files.
try to pass pytest checks for committing (these files exist in the original repo already, I did not touch them)
fix dependency issues with the new version of joblib (no longer uses **_joblib_parallel_args)
The HonestForest class implemented by Ronan uses joblib's backend for parallelization, which involves an import from sklearn's util files. Between the time Ronan wrote his HF class and the time of this commit, sklearn's util file dropped its outdated usage of the previous joblib version (_joblib_parallel_args is deprecated and no longer in use). This caused dependency issues because Ronan's implementation still relied on the outdated helper. I fixed that issue.
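For readers following along, here is a minimal sketch of what a fix of this kind can look like, assuming the replacement simply passes `require="sharedmem"` to joblib's `Parallel` directly instead of unpacking `**_joblib_parallel_args(require="sharedmem")`. The function `accumulate_squares` and its inner helper are hypothetical and only illustrate the shared-memory pattern; the PR's actual diff may differ.

```python
import threading

from joblib import Parallel, delayed


def accumulate_squares(values, n_jobs=2):
    """Sum v*v over values, with workers writing into shared state.

    Hypothetical example: it only illustrates passing ``require`` directly
    to ``Parallel`` rather than going through a sklearn helper.
    """
    total = [0]              # mutable state shared by all workers
    lock = threading.Lock()  # guards concurrent updates to ``total``

    def add_square(v):
        with lock:
            total[0] += v * v

    # require="sharedmem" asks joblib for a backend whose workers share
    # memory with the caller (threads), so the updates above are visible here.
    Parallel(n_jobs=n_jobs, verbose=0, require="sharedmem")(
        delayed(add_square)(v) for v in values
    )
    return total[0]
```

Because the workers are threads in the same process, the closure and the lock need no pickling, which is the reason a shared-memory requirement is used for accumulating predictions in place.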
Overall looks good. It seems like there is one extra file or something incorporated? Normally it would be better to make 2 PRs. One for the joblib change and one for the notebook. But this is fine.
(
    "Iso-HonestRF",
    CalibratedClassifierCV(
        base_estimator=HonestForestClassifier(
            n_estimators=n_estimators // clf_cv,
            max_features=max_features,
            n_jobs=n_jobs,
        ),
        method="isotonic",
        cv=clf_cv,
    ),
),
I would say don't change this file. Leave the honest + IRF comparison for just the notebook. This way the main figure in the repo reflects the paper and isn't as confusing to first-time viewers.
Okay, will revert the changes.
"""Module for forest-based estimators"""
"""I'm just using this version to facilitate future changes -Audrey"""
What is this file? Is it the same as the estimators/forest.py file? It seems like this was maybe accidentally left in?
I created this file because importing HF directly from forest.py in a Jupyter Notebook seems to cause dependency errors. I tried a bunch of approaches, and the only fix that worked (a pretty dumb one, I have to admit) was to combine your forest.py and tree.py into one file. I personally suspect the Jupyter Notebook environment does weird things when the file you are importing itself requires an import from another file you wrote.
@jzheng17 you should install the package & then use the honest_forests library.
I might be able to do that too. I went for a quick fix at the time.
Oh wait. I remembered that installing dependencies for Jupyter Notebooks and for .py scripts might be a little different. I guess I will still try after I recover from bronchitis.
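As a rough illustration of the suggestion in this thread, the sketch below shows one way a notebook could import a multi-file package without merging its modules into a single file. It assumes the package is either already installed in the kernel's environment or sits in a checkout whose root can be added to `sys.path`; the helper name `ensure_importable` and the `repo_root` argument are hypothetical, not part of this repo.

```python
import importlib
import importlib.util
import sys
from pathlib import Path


def ensure_importable(package_name, repo_root):
    """Import ``package_name``, falling back to a local checkout.

    Hypothetical helper: if the package is not installed in the current
    interpreter, the checkout root is put on ``sys.path`` so that the
    package's own relative imports (e.g. forest.py importing from tree.py)
    resolve as they would after an install.
    """
    if importlib.util.find_spec(package_name) is None:
        root = str(Path(repo_root).resolve())
        if root not in sys.path:
            sys.path.insert(0, root)
        # Make sure the import system notices the newly added directory.
        importlib.invalidate_caches()
    return importlib.import_module(package_name)
```

In a notebook, printing `sys.executable` shows which interpreter the kernel runs, which helps when a package installed from a shell does not appear inside the notebook.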
@@ -446,7 +447,9 @@ def _predict_proba(self, X, indices=None, impute_missing=None):
        Parallel(
            n_jobs=n_jobs,
            verbose=self.verbose,
            **_joblib_parallel_args(require="sharedmem")
Amazing, thanks for this fix. An alternative fix would be to require a lower version of sklearn, but this is better. I assume I had a lower version. I assume this requires sklearn > 1.0? Can you add this to the requirements.txt file?
I believe it does require sklearn > 1.0. Note that this is because the sklearn developers changed their util.py to bump the joblib version dependency to 1.0.0.
Okay. We should make sure the requirements.txt file states the proper requirements. We don't want errors because this code isn't compatible with old versions of joblib or sklearn that our requirements.txt file permits.
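One way to catch such a mismatch early is a small runtime check against minimum versions. The sketch below is only an illustration: the helper names are hypothetical, the simple numeric parsing ignores pre-release tags, and the minimums used in the usage note (`scikit-learn` 1.0, `joblib` 1.0.0) are the values mentioned in this thread, not verified requirements.

```python
import re
from importlib import metadata


def version_tuple(version):
    """Turn '1.0.2' into (1, 0, 2); missing parts are padded with zeros."""
    parts = [int(p) for p in re.findall(r"\d+", version)[:3]]
    return tuple(parts + [0] * (3 - len(parts)))


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_requirements(minimums, get_version=metadata.version):
    """Return a list of human-readable problems; empty if all is fine.

    ``minimums`` maps distribution names to minimum version strings.
    ``get_version`` is injectable so the check can be exercised in tests.
    """
    problems = []
    for name, minimum in minimums.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed (need >= {minimum})")
            continue
        if not meets_minimum(installed, minimum):
            problems.append(f"{name} {installed} is older than {minimum}")
    return problems
```

For example, `check_requirements({"scikit-learn": "1.0", "joblib": "1.0.0"})` would list any package in the current environment that falls below those assumed minimums.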
Simulation test on #2
I did the overlapping gaussian tests for the Iso-Honest Forest in a Jupyter Notebook and had to adjust the original honest forest classes a bit for importing them into a Jupyter Notebook.
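For context, the following is a minimal sketch of isotonic calibration on simulated overlapping Gaussians. It is not the notebook from this PR: `RandomForestClassifier` stands in for `HonestForestClassifier`, and the simulation parameters (two unit-variance Gaussians centered at -1 and +1 in two dimensions) and function names are assumptions made for illustration. As in the snippet reviewed above, each cross-validation fold gets `n_estimators // clf_cv` trees so the total tree count stays comparable to an uncalibrated forest.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.ensemble import RandomForestClassifier


def overlapping_gaussians(n_samples, seed=0):
    """Draw two overlapping 2-D Gaussian classes (assumed parameters)."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n_samples)
    # Class 1 is centered at (+1, +1), class 0 at (-1, -1).
    means = np.where(y[:, None] == 1, 1.0, -1.0)
    X = rng.normal(loc=means, scale=1.0, size=(n_samples, 2))
    return X, y


def fit_iso_forest(X, y, n_estimators=100, clf_cv=5, n_jobs=1):
    """Fit an isotonic-calibrated forest (stand-in for Iso-HonestRF)."""
    forest = RandomForestClassifier(
        n_estimators=n_estimators // clf_cv,  # trees per CV fold
        n_jobs=n_jobs,
        random_state=0,
    )
    # The estimator is passed positionally because its keyword name
    # differs between scikit-learn releases.
    clf = CalibratedClassifierCV(forest, method="isotonic", cv=clf_cv)
    return clf.fit(X, y)
```

Calling `fit_iso_forest(*overlapping_gaussians(400)).predict_proba(X_new)` then returns calibrated class probabilities for new points `X_new`.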