add example notebooks to the docs and fix scrolling inconvenience

interpretml · Jan 4, 2024 · 098fbbf
1 parent b4aef18
commit 098fbbf
Showing 165 changed files with 16,835 additions and 5,710 deletions.
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,2 @@
+# Visual Studio
+.vs/
diff --git a/_images/576314be76b899d5dc530524a4b069e5d697ff780e6415828e0f3ab016e55fb2.png b/_images/576314be76b899d5dc530524a4b069e5d697ff780e6415828e0f3ab016e55fb2.png
diff --git a/_images/9508adff94f12ee2dc81c5c13d458bbce23cbc5e2797176c9a14156d98d9df30.png b/_images/9508adff94f12ee2dc81c5c13d458bbce23cbc5e2797176c9a14156d98d9df30.png
diff --git a/_images/Age_graph_adult.png b/_images/age-graph-adult.png
rename from _images/Age_graph_adult.png
rename to _images/age-graph-adult.png
diff --git a/_images/group-importances-all-other-groups.png b/_images/group-importances-all-other-groups.png
diff --git a/_images/group-importances-education-group.png b/_images/group-importances-education-group.png
diff --git a/_images/group-importances-global-lstat.png b/_images/group-importances-global-lstat.png
diff --git a/_images/group-importances-local-exp.png b/_images/group-importances-local-exp.png
diff --git a/_images/group-importances-social-group.png b/_images/group-importances-social-group.png
diff --git a/_images/Save_plotly_graph.png b/_images/save-plotly-graph.png
rename from _images/Save_plotly_graph.png
rename to _images/save-plotly-graph.png
diff --git a/_sources/debugging-guide.ipynb b/_sources/debugging-guide.ipynb
@@ -5,15 +5,15 @@
    "id": "757b49b4",
    "metadata": {},
    "source": [
-    "# Logging and Debugging\n"
+    "# Logging and Debugging"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "b2d36e19",
    "metadata": {},
    "source": [
-    "## Enable logging in a Python script\n",
+    "<h2>Enable logging in a Python script</h2>\n",
     "\n",
     "1. Import the debug_mode script\n",
     "   ```sh\n",
@@ -51,7 +51,7 @@
    "id": "08ad404c",
    "metadata": {},
    "source": [
-    "## Debugging Python and C++ in VS Code\n",
+    "<h2>Debugging Python and C++ in VS Code</h2>\n",
     "\n",
     "1. Set up debugging configurations for _Python_ and _C++ Attach_. As an example, the launch configuration file (`launch.json`) should contain\n",
     "\n",

diff --git a/_sources/deployment-guide.ipynb b/_sources/deployment-guide.ipynb
@@ -19,7 +19,7 @@
    "id": "drawn-hamburg",
    "metadata": {},
    "source": [
-    "## Install with every dependency (default)\n",
+    "<h2>Install with every dependency (default)</h2>\n",
     "\n",
     "The package `interpret` installs every dependency needed to run any part of the package.\n",
     "\n",
@@ -41,7 +41,7 @@
    "id": "surprised-driver",
    "metadata": {},
    "source": [
-    "## Install with minimal dependencies\n",
+    "<h2>Install with minimal dependencies</h2>\n",
     "\n",
     "When you only want the required dependencies, or you wish to customize the dependencies, install the package `interpret-core` instead.\n",
     "\n",
@@ -63,7 +63,7 @@
    "id": "alone-equivalent",
    "metadata": {},
    "source": [
-    "## Install with some official dependencies (pip)\n",
+    "<h2>Install with some official dependencies (pip)</h2>\n",
     "\n",
     "This scenario is not covered in all package managers we support. If you are installing with `pip`, you can take advantage of extra tags that are exposed for `interpret-core`.\n",
     "\n",

diff --git a/_sources/dpebm.ipynb b/_sources/dpebm.ipynb
@@ -7,7 +7,7 @@
    "source": [
     "# Differentially Private EBMs\n",
     "\n",
-    "Links to API References: [DPExplainableBoostingClassifier](./DPExplainableBoostingClassifier.ipynb), [DPExplainableBoostingRegressor](./DPExplainableBoostingRegressor.ipynb)\n",
+    "Links to API References: [DPExplainableBoostingClassifier](./python/api/DPExplainableBoostingClassifier.ipynb), [DPExplainableBoostingRegressor](./python/api/DPExplainableBoostingRegressor.ipynb)\n",
     "\n",
     "*See the reference paper for full details [[1](dp_ebms)].*  [Link](https://proceedings.mlr.press/v139/nori21a/nori21a.pdf)\n"
    ]
@@ -17,7 +17,7 @@
    "id": "announced-warning",
    "metadata": {},
    "source": [
-    "## Code Example\n",
+    "<h2>Code Example</h2>\n",
     "\n",
     "The following code will train a DPEBM classifier for the adult income dataset. The visualizations provided will be for both global and local explanations."
    ]
@@ -126,7 +126,7 @@
    "id": "engaging-string",
    "metadata": {},
    "source": [
-    "## Bibliography\n",
+    "<h2>Bibliography</h2>\n",
     "\n",
     "(dp_ebms)=\n",
     "[1] Harsha Nori, Rich Caruana, Zhiqi Bu, Judy Hanwen Shen, and Janardhan Kulkarni. Accuracy, Interpretability, and Differential Privacy via Explainable Boosting. In Proceedings of the 38th International Conference on Machine Learning, 8227-8237. 2021. [Paper Link](https://proceedings.mlr.press/v139/nori21a/nori21a.pdf)"

diff --git a/_sources/dr.ipynb b/_sources/dr.ipynb
@@ -7,23 +7,23 @@
    "source": [
     "# Decision Rule\n",
     "\n",
-    "Link to API Reference: [DecisionListClassifier](./DecisionListClassifier.ipynb)\n",
+    "Link to API Reference: [DecisionListClassifier](./python/api/DecisionListClassifier.ipynb)\n",
     "\n",
     "*See the backing repository for Skope Rules [here](https://github.com/scikit-learn-contrib/skope-rules).*\n",
     "\n",
-    "## Summary\n",
+    "<h2>Summary</h2>\n",
     "\n",
     "Decision rules are logical expressions of the form `IF ... THEN ...`. Interpret's implementation uses a wrapped variant of `skope-rules`[[1](skrules_2017_dr)], which is a weighted combination of rules extracted from a tree ensemble using L1-regularized optimization over the weights. Rule systems, like single decision trees, can give interpretability at the cost of model performance. These discovered decision rules are often integrated into expert-driven rule-based systems.\n",
     "\n",
-    "## How it Works\n",
+    "<h2>How it Works</h2>\n",
     "\n",
     "The creators of skope-rules have a lucid synopsis of what decision rules are [here](https://github.com/scikit-learn-contrib/skope-rules).\n",
     "\n",
     "Christoph Molnar's \"Interpretable Machine Learning\" e-book [[2](molnar2020interpretable_dr)] has an excellent overview on decision rules that can be found [here](https://christophm.github.io/interpretable-ml-book/rules.html).\n",
     "\n",
     "For implementation specific details, see the skope-rules GitHub repository [here](https://github.com/scikit-learn-contrib/skope-rules).\n",
     "\n",
-    "## Code Example\n",
+    "<h2>Code Example</h2>\n",
     "\n",
     "The following code will train a skope-rules classifier for the breast cancer dataset. The visualizations provided will be for both global and local explanations."
    ]
@@ -92,7 +92,7 @@
    "id": "varying-powell",
    "metadata": {},
    "source": [
-    "## Further Resources\n",
+    "<h2>Further Resources</h2>\n",
     "\n",
     "- [Skope Rules Documentation](https://skope-rules.readthedocs.io/en/latest/)"
    ]
@@ -102,7 +102,7 @@
    "id": "mexican-philadelphia",
    "metadata": {},
    "source": [
-    "## Bibliography\n",
+    "<h2>Bibliography</h2>\n",
     "\n",
     "\n",
     "(skrules_2017_dr)=\n",

diff --git a/_sources/dt.ipynb b/_sources/dt.ipynb
@@ -7,21 +7,21 @@
    "source": [
     "# Decision Tree\n",
     "\n",
-    "Links to API References: [ClassificationTree](./ClassificationTree.ipynb), [RegressionTree](./RegressionTree.ipynb)\n",
+    "Links to API References: [ClassificationTree](./python/api/ClassificationTree.ipynb), [RegressionTree](./python/api/RegressionTree.ipynb)\n",
     "\n",
     "*See the backing repository for Decision Tree [here](https://github.com/scikit-learn/scikit-learn).*\n",
     "\n",
-    "## Summary\n",
+    "<h2>Summary</h2>\n",
     "\n",
     "A supervised decision tree. This is a recursive partitioning method where the feature space is continually split into further partitions based on a split criteria. A predicted value is learned for each partition in the \"leaf nodes\" of the learned tree. This is a light wrapper to the decision trees exposed in `scikit-learn`. Single decision trees often have weak model performance, but are fast to train and great at identifying associations. Low depth decision trees are easy to interpret, but quickly become complex and unintelligible as the depth of the tree increases.  \n",
     "\n",
-    "## How it Works\n",
+    "<h2>How it Works</h2>\n",
     "\n",
     "Christoph Molnar's \"Interpretable Machine Learning\" e-book [[1](molnar2020interpretable_dt)] has an excellent overview on decision trees that can be found [here](https://christophm.github.io/interpretable-ml-book/tree.html).\n",
     "\n",
     "For implementation specific details, scikit-learn's user guide [[2](pedregosa2011scikit_dt)] on decision trees is solid and can be found [here](https://scikit-learn.org/stable/modules/tree.html#tree).\n",
     "\n",
-    "## Code Example\n",
+    "<h2>Code Example</h2>\n",
     "\n",
     "The following code will train a decision tree classifier for the breast cancer dataset. The visualizations provided will be for both global and local explanations."
    ]
@@ -90,7 +90,7 @@
    "id": "metropolitan-idaho",
    "metadata": {},
    "source": [
-    "## Further Resources\n",
+    "<h2>Further Resources</h2>\n",
     "\n",
     "- [Wikipedia on Decision Trees](https://en.wikipedia.org/wiki/Decision_tree_learning)\n",
     "- [scikit-learn on their Decision Tree module](https://scikit-learn.org/stable/modules/tree.html)"
@@ -101,7 +101,7 @@
    "id": "supreme-prescription",
    "metadata": {},
    "source": [
-    "## Bibliography\n",
+    "<h2>Bibliography</h2>\n",
     "\n",
     "(molnar2020interpretable_dt)=\n",
     "[1] Christoph Molnar. Interpretable machine learning. Lulu. com, 2020.\n",

diff --git a/_sources/ebm-internals-classification.ipynb b/_sources/ebm-internals-classification.ipynb
@@ -247,7 +247,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Sample code\n",
+    "<h2>Sample code</h2>\n",
     "\n",
     "Finally, here's some code which puts the above considerations together into a function that can make predictions for simplified scenarios. This code does not handle things like regression, multiclass, unknown values, or interactions beyond pairs.\n",
     "\n",
@@ -273,10 +273,9 @@
     "        # main effects will have 1 feature, and pairs will have 2 features\n",
     "        for feature_idx in features:\n",
     "            feature_val = sample[feature_idx]\n",
-    "            if feature_val is None or feature_val is np.nan:\n",
-    "                # missing values are always in the 0th bin\n",
-    "                bin_idx = 0\n",
-    "            else:\n",
+    "            bin_idx = 0  # if missing value, use bin index 0\n",
+    "\n",
+    "            if feature_val is not None and feature_val is not np.nan:\n",
     "                # we bin differently for main effects and pairs,\n",
     "                # so determine which resolution is needed\n",
     "                if len(features) == 1 or len(ebm.bins_[feature_idx]) == 1:\n",

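Review note on the hunk above: the rewritten branch keeps the original test `feature_val is not np.nan`, which is an identity comparison. It only recognizes the `np.nan` object itself; a NaN created another way, such as `float("nan")` or an element read out of a NumPy float array, is a different object and would be treated as a regular value. The following is a minimal sketch of a value-based check, not the notebook's code. The helper name `is_missing` is hypothetical and does not exist in `interpret`.

```python
import math

import numpy as np


def is_missing(value):
    """Hypothetical helper: True for None or for any floating-point NaN."""
    if value is None:
        return True
    # Only float-like values can be NaN; strings (categoricals) never are.
    # np.float64 subclasses float, np.floating covers the other widths.
    return isinstance(value, (float, np.floating)) and math.isnan(value)


# Identity comparison misses NaN objects that are not the np.nan singleton.
nan_from_python = float("nan")
nan_from_array = np.array([np.nan])[0]
identity_matches = [nan_from_python is np.nan, nan_from_array is np.nan]

# The value-based check catches both, and leaves real values alone.
value_matches = [is_missing(nan_from_python), is_missing(nan_from_array)]
```

With a helper like this, the guard in the sample could read `if not is_missing(feature_val):` while keeping `bin_idx = 0` as the default for missing values.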
diff --git a/_sources/ebm-internals-multiclass.ipynb b/_sources/ebm-internals-multiclass.ipynb
@@ -249,7 +249,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Sample code\n",
+    "<h2>Sample code</h2>\n",
     "\n",
     "This sample code incorporates everything discussed in all 3 sections. It could be used as a drop in replacement for the existing EBM predict function of the ExplainableBoostingRegressor or as the predict_proba function of the ExplainableBoostingClassifier."
    ]
@@ -265,13 +265,10 @@
     "sample_scores = []\n",
     "for sample in X:\n",
     "    # start from the intercept for each sample\n",
-    "    score = ebm.intercept_\n",
+    "    score = ebm.intercept_.copy()\n",
     "    if isinstance(score, float) or len(score) == 1:\n",
-    "        # binary classification or regression\n",
+    "        # regression or binary classification\n",
     "        score = float(score)\n",
-    "    else:\n",
-    "        # multiclass\n",
-    "        score = score.copy()\n",
     "\n",
     "    # we have 2 terms, so add their score contributions\n",
     "    for term_idx, features in enumerate(ebm.term_features_):\n",
@@ -281,11 +278,9 @@
     "        # main effects will have 1 feature, and pairs will have 2 features\n",
     "        for feature_idx in features:\n",
     "            feature_val = sample[feature_idx]\n",
+    "            bin_idx = 0  # if missing value, use bin index 0\n",
     "\n",
-    "            if feature_val is None or feature_val is np.nan:\n",
-    "                # missing values are always in the 0th bin\n",
-    "                bin_idx = 0\n",
-    "            else:\n",
+    "            if feature_val is not None and feature_val is not np.nan:\n",
     "                # we bin differently for main effects and pairs, so first \n",
     "                # get the list containing the bins for different resolutions\n",
     "                bin_levels = ebm.bins_[feature_idx]\n",
@@ -319,7 +314,7 @@
     "\n",
     "if hasattr(ebm, 'classes_'):\n",
     "    # classification\n",
-    "    if len(ebm.classes_) <= 2:\n",
+    "    if len(ebm.classes_) == 2:\n",
     "        # binary classification\n",
     "\n",
     "        # softmax expects two logits for binary classification\n",

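Review note on the hunks above: two of the changes can be illustrated in isolation. First, copying `ebm.intercept_` matters when the intercept is a NumPy array and the score is later updated in place, because without a copy the update would write through to the stored intercept. Second, the comment "softmax expects two logits for binary classification" can be read as pairing the single binary logit with a fixed logit of 0 for the negative class. The sketch below assumes both readings; the array values and the function name `binary_proba` are made up for illustration and are not taken from the notebook.

```python
import numpy as np

# Hypothetical multiclass intercept, standing in for ebm.intercept_.
intercept = np.array([0.1, -0.2, 0.3])

# Without a copy, an in-place add writes through to the stored intercept.
aliased = intercept
aliased += 1.0
intercept_was_mutated = bool(np.allclose(intercept, [1.1, 0.8, 1.3]))

# With a copy, the per-sample score is independent of the stored intercept.
intercept = np.array([0.1, -0.2, 0.3])
score = intercept.copy()
score += 1.0
intercept_unchanged = bool(np.allclose(intercept, [0.1, -0.2, 0.3]))


def binary_proba(logit):
    """Softmax over [0, logit]: probabilities for the negative and positive class."""
    logits = np.array([0.0, logit])
    shifted = logits - logits.max()  # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()
```

Under this assumption the positive-class probability equals the logistic sigmoid of the logit, so the two-logit softmax and the usual sigmoid formulation agree.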
diff --git a/_sources/ebm-internals-regression.ipynb b/_sources/ebm-internals-regression.ipynb
@@ -174,7 +174,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Sample code\n",
+    "<h2>Sample code</h2>\n",
     "\n",
     "Finally, here's some code which puts the above considerations together into a function that can make predictions for simplified scenarios. This code does not handle things like interactions, missing values, unknown values, or classification.\n",
     "\n",

diff --git a/_sources/ebm.ipynb b/_sources/ebm.ipynb
@@ -7,15 +7,15 @@
    "source": [
     "# Explainable Boosting Machine\n",
     "\n",
-    "Links to API References: [ExplainableBoostingClassifier](./ExplainableBoostingClassifier.ipynb), [ExplainableBoostingRegressor](./ExplainableBoostingRegressor.ipynb)\n",
+    "Links to API References: [ExplainableBoostingClassifier](./python/api/ExplainableBoostingClassifier.ipynb), [ExplainableBoostingRegressor](./python/api/ExplainableBoostingRegressor.ipynb)\n",
     "\n",
     "*See the reference paper for full details [[1](lou2013accurate_ebm)].*  [Link](https://www.cs.cornell.edu/~yinlou/papers/lou-kdd13.pdf)\n",
     "\n",
-    "## Summary\n",
+    "<h2>Summary</h2>\n",
     "\n",
-    "Explainable Boosting Machine (EBM) is a tree-based, cyclic gradient boosting Generalized Additive Model with automatic interaction detection. EBMs are often as accurate as state-of-the-art blackbox models while remaining completely interpretable. Although EBMs are often slower to train than other modern algorithms, EBMs are extremely compact and fast at prediction time.\n",
+    "Explainable Boosting Machine (EBM) is a tree-based, cyclic gradient boosting Generalized Additive Model with automatic interaction detection. EBMs are often as accurate as state-of-the-art blackbox models while remaining completely interpretable.\n",
     "\n",
-    "## How it Works\n",
+    "<h2>How it Works</h2>\n",
     "\n",
     "As part of the framework, InterpretML also includes a new interpretability algorithm -- the Explainable Boosting Machine (EBM). EBM is a glassbox model, designed to have accuracy comparable to state-of-the-art machine learning methods like Random Forest and Boosted Trees, while being highly intelligible and explainable. EBM is a generalized additive model (GAM) of the form:\n",
     "\n",
@@ -48,25 +48,11 @@
    "id": "announced-warning",
    "metadata": {},
    "source": [
-    "## Code Example\n",
+    "<h2>Code Example</h2>\n",
     "\n",
     "The following code will train an EBM classifier for the adult income dataset. The visualizations provided will be for both global and local explanations."
    ]
   },
-  {
-   "cell_type": "markdown",
-   "id": "coated-palestinian",
-   "metadata": {},
-   "source": [
-    "````{margin}\n",
-    "```{note}\n",
-    "EBM is slow and we don't have loading bars. If it looks like it froze, it's probably still burning all your CPU cycles.\n",
-    "\n",
-    "All of them.\n",
-    "```\n",
-    "````"
-   ]
-  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -163,7 +149,7 @@
    "id": "occupied-withdrawal",
    "metadata": {},
    "source": [
-    "## Further Resources\n",
+    "<h2>Further Resources</h2>\n",
     "\n",
     "- [Paper: GA2M](https://www.cs.cornell.edu/~yinlou/papers/lou-kdd12.pdf)\n",
     "- [Paper: InterpretML Framework](https://arxiv.org/pdf/1909.09223.pdf)\n",
@@ -175,7 +161,7 @@
    "id": "engaging-string",
    "metadata": {},
    "source": [
-    "## Bibliography\n",
+    "<h2>Bibliography</h2>\n",
     "\n",
     "(lou2013accurate_ebm)=\n",
     "[1] Yin Lou, Rich Caruana, Johannes Gehrke, and Giles Hooker. Accurate intelligible models with pairwise interactions. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, 623–631. 2013. [Paper Link](https://www.cs.cornell.edu/~yinlou/papers/lou-kdd13.pdf)\n",

diff --git a/_sources/examples.md b/_sources/examples.md
@@ -0,0 +1 @@
+# Examples