Merge branch 'sktime#132-feature-cyclic-boosting-interface' of https://github.com/setoguchi-naoki/skpro into sktime#132-feature-cyclic-boosting-interface
setoguchi-naoki · Jan 15, 2024 · c93b08f
2 parents 44da242 + d5dba04

Showing 23 changed files with 918 additions and 25 deletions.
diff --git a/README.md b/README.md
@@ -1,6 +1,6 @@
 <a href="https://skpro.readthedocs.io/en/latest"><img src="https://github.com/sktime/skpro/blob/main/docs/source/images/skpro-banner.png" width="500" align="right" /></a>
 
-:rocket: **Version 2.1.1 out now!** [Read the release notes here.](https://skpro.readthedocs.io/en/latest/changelog.html).
+:rocket: **Version 2.1.2 out now!** [Read the release notes here.](https://skpro.readthedocs.io/en/latest/changelog.html).
 
 `skpro` is a library for supervised probabilistic prediction in python.
 It provides `scikit-learn`-like, `scikit-base` compatible interfaces to:
@@ -18,7 +18,7 @@ It provides `scikit-learn`-like, `scikit-base` compatible interfaces to:
 | **Community** | [![!discord](https://img.shields.io/static/v1?logo=discord&label=discord&message=chat&color=lightgreen)](https://discord.com/invite/54ACzaFsn7) [![!slack](https://img.shields.io/static/v1?logo=linkedin&label=LinkedIn&message=news&color=lightblue)](https://www.linkedin.com/company/scikit-time/) |
 | **CI/CD** | [![github-actions](https://img.shields.io/github/actions/workflow/status/sktime/sktime/wheels.yml?logo=github)](https://github.com/sktime/skpro/actions/workflows/wheels.yml) [![!codecov](https://img.shields.io/codecov/c/github/sktime/skpro?label=codecov&logo=codecov)](https://codecov.io/gh/sktime/skpro) [![readthedocs](https://img.shields.io/readthedocs/skpro?logo=readthedocs)](https://skpro.readthedocs.io/en/latest/) [![platform](https://img.shields.io/conda/pn/conda-forge/skpro)](https://github.com/sktime/skpro) |
 | **Code** |  [![!pypi](https://img.shields.io/pypi/v/skpro?color=orange)](https://pypi.org/project/skpro/) [![!conda](https://img.shields.io/conda/vn/conda-forge/skpro)](https://anaconda.org/conda-forge/skpro) [![!python-versions](https://img.shields.io/pypi/pyversions/skpro)](https://www.python.org/) [![!black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) |
-
+| **Downloads**| [![Downloads](https://static.pepy.tech/personalized-badge/skpro?period=week&units=international_system&left_color=grey&right_color=blue&left_text=weekly%20(pypi))](https://pepy.tech/project/skpro) [![Downloads](https://static.pepy.tech/personalized-badge/skpro?period=month&units=international_system&left_color=grey&right_color=blue&left_text=monthly%20(pypi))](https://pepy.tech/project/skpro) [![Downloads](https://static.pepy.tech/personalized-badge/skpro?period=total&units=international_system&left_color=grey&right_color=blue&left_text=cumulative%20(pypi))](https://pepy.tech/project/skpro) |
 
 ## :books: Documentation
 

diff --git a/docs/source/_static/switcher.json b/docs/source/_static/switcher.json
@@ -5,7 +5,12 @@
     "url": "https://skpro.readthedocs.io/en/latest/"
   },
   {
-    "name": "2.1.1 (stable)",
+    "name": "2.1.2 (stable)",
+    "version": "stable",
+    "url": "https://skpro.readthedocs.io/en/v2.1.2/"
+  },
+  {
+    "name": "2.1.1",
     "version": "stable",
     "url": "https://skpro.readthedocs.io/en/v2.1.1/"
   },

diff --git a/docs/source/api_reference/regression.rst b/docs/source/api_reference/regression.rst
@@ -87,8 +87,44 @@ take one or multiple ``sklearn`` estimators and adda probabilistic prediction mo
 
     CyclicBoosting
 
-Base
-----
+Linear regression
+-----------------
+
+.. currentmodule:: skpro.regression.linear
+
+.. autosummary::
+    :toctree: auto_generated/
+    :template: class.rst
+
+    ARDRegression
+    BayesianRidge
+
+Gaussian process and kernel regression
+--------------------------------------
+
+.. currentmodule:: skpro.regression.gp
+
+.. autosummary::
+    :toctree: auto_generated/
+    :template: class.rst
+
+    GaussianProcess
+
+
+Adapters to other interfaces
+----------------------------
+
+.. currentmodule:: skpro.regression.adapters.sklearn
+
+.. autosummary::
+    :toctree: auto_generated/
+    :template: class.rst
+
+    SklearnProbaReg
+
+
+Base classes
+------------
 
 .. currentmodule:: skpro.regression.base
 

diff --git a/docs/source/changelog.rst b/docs/source/changelog.rst
@@ -14,6 +14,76 @@ You can also subscribe to ``skpro``'s
 
 For planned changes and upcoming releases, see our :ref:`roadmap`.
 
+[2.1.2] - 2024-01-07
+====================
+
+Highlights
+----------
+
+* ``sklearn`` based probabilistic regressors - Gaussian processes, Bayesian linear regression (:pr:`166`) :user:`fkiraly`
+* ``SklearnProbaReg`` - general interface adapter to ``sklearn`` regressors with variance prediction model (:pr:`163`) :user:`fkiraly`
+
+Dependency changes
+~~~~~~~~~~~~~~~~~~
+
+* ``scikit-base`` bounds have been updated to ``<0.8.0,>=0.6.1``.
+* ``polars`` (data container soft dependency) bounds have been updated to allow python 3.12.
+
+Enhancements
+------------
+
+Data types, checks, conversions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+* [ENH] ``n_features`` and ``feature_names`` metadata field for table mtypes (:pr:`150`) :user:`fkiraly`
+* [ENH] ``check_is_mtype`` dict type return, improved input check error messages in ``BaseRegressorProba`` (:pr:`151`) :user:`fkiraly`
+
+Probability distributions
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+* [ENH] adapter from ``scipy`` ``rv_discrete`` to ``skpro`` ``Empirical`` (:pr:`155`) :user:`fkiraly`
+
+Probabilistic regression
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+* [ENH] ``sklearn`` wrappers to str-coerce columns of ``pd.DataFrame`` before passing them to ``sklearn`` (:pr:`148`) :user:`fkiraly`
+* [ENH] clean up copy-paste leftovers in ``BaseProbaRegressor`` (:pr:`156`) :user:`fkiraly`
+* [ENH] adapter for ``sklearn`` probabilistic regressors (:pr:`163`) :user:`fkiraly`
+* [ENH] add tags to ``SklearnProbaReg`` (:pr:`168`) :user:`fkiraly`
+* [ENH] interfacing all concrete ``sklearn`` probabilistic regressors (:pr:`166`) :user:`fkiraly`
+
+Test framework
+~~~~~~~~~~~~~~
+
+* [ENH] scenario tests for mixed ``pandas`` column index types (:pr:`145`) :user:`fkiraly`
+* [ENH] scitype inference utility, test class register, test class test condition (:pr:`159`) :user:`fkiraly`
+
+Fixes
+-----
+
+Probabilistic regression
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+* [BUG] in probabilistic regressors, ensure correct index treatment if ``X: pd.DataFrame`` and ``y: np.ndarray`` are passed (:pr:`146`) :user:`fkiraly`
+
+Documentation
+-------------
+
+* [DOC] update ``AUTHORS.rst`` file (:pr:`147`) :user:`fkiraly`
+
+Maintenance
+-----------
+
+* [MNT] [Dependabot](deps): Bump ``actions/upload-artifact`` from 3 to 4 (:pr:`154`) :user:`dependabot`
+* [MNT] [Dependabot](deps): Bump ``actions/download-artifact`` from 3 to 4 (:pr:`153`) :user:`dependabot`
+* [MNT] [Dependabot](deps): Bump ``actions/setup-python`` from 4 to 5 (:pr:`152`) :user:`dependabot`
+* [MNT] [Dependabot](deps-dev): Update ``sphinx-gallery`` requirement from ``<0.15.0`` to ``<0.16.0`` (:pr:`149`) :user:`dependabot`
+* [MNT] [Dependabot](deps-dev): Update ``scikit-base`` requirement from ``<0.7.0,>=0.6.1`` to ``>=0.6.1,<0.8.0`` (:pr:`169`) :user:`dependabot`
+* [MNT] adding ``codecov.yml`` and turning coverage reports informational (:pr:`165`) :user:`fkiraly`
+* [MNT] handle deprecation of ``pandas.DataFrame.applymap`` (:pr:`170`) :user:`fkiraly`
+* [MNT] handle ``polars`` deprecations (:pr:`171`) :user:`fkiraly`
+
+
 [2.1.1] - 2023-11-02
 ====================
 

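The changelog entry on str-coercing ``pd.DataFrame`` columns addresses the fact that recent ``sklearn`` versions refuse feature names of mixed types, for example an integer and a string column label in the same frame. A minimal sketch of such a coercion follows; the helper name ``str_coerce_columns`` is hypothetical and is not necessarily how ``skpro``'s ``prep_skl_df`` is implemented.

```python
import pandas as pd


def str_coerce_columns(df: pd.DataFrame, copy: bool = True) -> pd.DataFrame:
    """Return df with all column labels converted to str (hypothetical helper)."""
    # work on a copy so the caller's frame keeps its original labels
    out = df.copy() if copy else df
    out.columns = out.columns.astype(str)
    return out


# mixed int/str column labels, which sklearn would reject as feature names
X = pd.DataFrame({0: [1.0, 2.0], "b": [3.0, 4.0]})
X_skl = str_coerce_columns(X)
print(list(X_skl.columns))
```

Copying by default keeps the coercion local to the call into ``sklearn``, so user-facing outputs can still carry the original labels.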
diff --git a/pyproject.toml b/pyproject.toml
@@ -1,6 +1,6 @@
 [project]
 name = "skpro"
-version = "2.1.1"
+version = "2.1.2"
 description = "A unified framework for probability distributions and probabilistic supervised regression"
 authors = [
     {name = "skpro developers", email = "[email protected]"},
@@ -44,7 +44,7 @@ dependencies = [
     "numpy>=1.21.0,<1.27",
     "pandas>=1.1.0,<2.2.0",
     "packaging",
-    "scikit-base>=0.6.1,<0.7.0",
+    "scikit-base>=0.6.1,<0.8.0",
     "scikit-learn>=0.24.0,<1.4.0",
     "scipy<2.0.0,>=1.2.0",
 ]
@@ -56,7 +56,7 @@ all_extras = [
     "mapie",
     "matplotlib>=3.3.2",
     "ngboost",
-    "polars<0.20.0; python_version < '3.12'",
+    "polars<0.21.0",
     "pyarrow<14.0.0; python_version < '3.12'",
     "statsmodels>=0.12.1",
     "tabulate",

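To check which ``scikit-base`` releases the widened bound admits, the old and new specifier strings from ``pyproject.toml`` can be compared with ``packaging.specifiers.SpecifierSet`` (``packaging`` is already a listed dependency). The version numbers probed below are illustrative, not a list of actual releases.

```python
from packaging.specifiers import SpecifierSet
from packaging.version import Version

old_spec = SpecifierSet(">=0.6.1,<0.7.0")  # bound before this commit
new_spec = SpecifierSet(">=0.6.1,<0.8.0")  # bound after this commit

for v in ["0.6.1", "0.7.0", "0.7.5", "0.8.0"]:
    # prints: version, allowed by old bound, allowed by new bound
    print(v, Version(v) in old_spec, Version(v) in new_spec)
```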
diff --git a/skpro/__init__.py b/skpro/__init__.py
@@ -1,6 +1,6 @@
 """skpro."""
 
-__version__ = "2.1.1"
+__version__ = "2.1.2"
 
 __all__ = ["show_versions"]
 

diff --git a/skpro/datatypes/_adapter/polars.py b/skpro/datatypes/_adapter/polars.py
@@ -42,7 +42,7 @@ def check_polars_frame(obj, return_metadata=False, var_name="obj", lazy=False):
         if isinstance(obj, pl.LazyFrame):
             metadata["has_nans"] = "NA"
         else:
-            hasnan = obj.null_count().sum(axis=1).to_numpy()[0] > 0
+            hasnan = obj.null_count().sum_horizontal().to_numpy()[0] > 0
             metadata["has_nans"] = hasnan
 
     return ret(True, None, metadata, return_metadata)
diff --git a/skpro/distributions/base.py b/skpro/distributions/base.py
@@ -11,6 +11,7 @@
 import pandas as pd
 
 from skpro.base import BaseObject
+from skpro.utils.pandas import df_map
 from skpro.utils.validation._dependencies import _check_estimator_deps
 
 
@@ -229,7 +230,8 @@ def pdf(self, x):
                 "this may be numerically unstable"
             )
             warn(self._method_error_msg("pdf", fill_in=approx_method))
-            return self.log_pdf(x=x).applymap(np.exp)
+
+            return df_map(self.log_pdf(x=x))(np.exp)
 
         raise NotImplementedError(self._method_error_msg("pdf", "error"))
 
@@ -269,13 +271,7 @@ def log_pdf(self, x):
             )
             warn(self._method_error_msg("log_pdf", fill_in=approx_method))
 
-            pdf_res = self.pdf(x=x)
-            # safe deprecation of applymap, renamed to map in pandas 2 versions
-            # this if/else ensures compatibility with a wider range of pandas versions
-            if hasattr(pdf_res, "map"):
-                return pdf_res.map(np.log)
-            else:
-                return pdf_res.applymap(np.log)
+            return df_map(self.pdf(x=x))(np.log)
 
         raise NotImplementedError(self._method_error_msg("log_pdf", "error"))
 

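The removed ``if``/``else`` in ``log_pdf`` shows what ``df_map`` has to handle: ``pandas`` 2.1 introduced ``DataFrame.map`` and deprecated ``DataFrame.applymap``. A minimal sketch of such a helper follows, assuming (as the call sites ``df_map(...)(np.exp)`` suggest) that it returns the bound method rather than applying it; the actual ``skpro.utils.pandas.df_map`` may differ in detail.

```python
import numpy as np
import pandas as pd


def df_map(df: pd.DataFrame):
    """Return the elementwise map method of df, across pandas versions."""
    # pandas >= 2.1 provides DataFrame.map; older versions only have applymap
    if hasattr(df, "map"):
        return df.map
    return df.applymap


# mirrors the pdf-from-log_pdf fallback: exponentiate log densities elementwise
log_density = pd.DataFrame({"a": [0.0, np.log(2.0)]})
density = df_map(log_density)(np.exp)
print(density)
```

Returning the method keeps each call site a one-liner and confines the version check to a single place.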
diff --git a/skpro/distributions/normal.py b/skpro/distributions/normal.py
@@ -15,9 +15,9 @@ class Normal(BaseDistribution):
 
     Parameters
     ----------
-    mean : float or array of float (1D or 2D)
+    mu : float or array of float (1D or 2D)
         mean of the normal distribution
-    sd : float or array of float (1D or 2D), must be positive
+    sigma : float or array of float (1D or 2D), must be positive
         standard deviation of the normal distribution
     index : pd.Index, optional, default = RangeIndex
     columns : pd.Index, optional, default = RangeIndex

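The renamed parameters make the parameterization explicit: ``sigma`` is a standard deviation, so a variance prediction (such as the output of ``predict_var``) must be square-rooted before it is used as ``sigma``. The sketch below uses ``scipy.stats.norm`` as a stand-in for ``skpro``'s ``Normal``; the numbers are made up for illustration.

```python
import numpy as np
from scipy.stats import norm

mu = np.array([0.0, 1.5])    # predicted means
var = np.array([4.0, 0.25])  # predicted variances
sigma = np.sqrt(var)         # standard deviations 2.0 and 0.5

# scipy names the same parameters loc (mean) and scale (standard deviation)
dist = norm(loc=mu, scale=sigma)
print(dist.std())    # equals sigma
print(dist.cdf(mu))  # each mean is also the median, so the cdf there is 0.5
```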
diff --git a/skpro/distributions/tests/test_qpd.py b/skpro/distributions/tests/test_qpd.py
@@ -1,6 +1,7 @@
 """Tests for quantile-parameterized distributions."""
 
 import pytest
+
 from skpro.distributions.qpd import QPD_B, QPD_S, QPD_U
 from skpro.tests.test_switch import run_test_for_class
 

diff --git a/skpro/regression/adapters/__init__.py b/skpro/regression/adapters/__init__.py
@@ -0,0 +1,2 @@
+"""Adapters for probabilistic regressors."""
+# copyright: skpro developers, BSD-3-Clause License (see LICENSE file)
diff --git a/skpro/regression/adapters/sklearn/__init__.py b/skpro/regression/adapters/sklearn/__init__.py
@@ -0,0 +1,6 @@
+"""Adapters for probabilistic regressors, towards sklearn."""
+# copyright: skpro developers, BSD-3-Clause License (see LICENSE file)
+
+from skpro.regression.adapters.sklearn._sklearn_proba import SklearnProbaReg
+
+__all__ = ["SklearnProbaReg"]
diff --git a/skpro/regression/adapters/sklearn/_sklearn_proba.py b/skpro/regression/adapters/sklearn/_sklearn_proba.py
@@ -0,0 +1,138 @@
+# copyright: skpro developers, BSD-3-Clause License (see LICENSE file)
+"""Adapter to sklearn probabilistic regressors."""
+
+__author__ = ["fkiraly"]
+
+import pandas as pd
+
+from skpro.regression.base import BaseProbaRegressor
+from skpro.utils.sklearn import prep_skl_df
+
+
+class SklearnProbaReg(BaseProbaRegressor):
+    """Adapter to sklearn regressors with variance prediction interface.
+
+    Wraps an sklearn regressor that can be queried for variance prediction,
+    and constructs an skpro regressor from it.
+
+    The wrapped regressor must have a ``predict`` method with a ``return_std``
+    argument which, if ``return_std=True``, returns a tuple ``(y_pred, y_std)``,
+    both ndarrays of shape (n_samples,) or (n_samples, n_targets).
+
+    Parameters
+    ----------
+    estimator : sklearn regressor
+        Estimator to wrap, must have ``predict`` with ``return_std`` argument.
+    """
+
+    _tags = {
+        "capability:multioutput": False,
+        "capability:missing": True,
+    }
+
+    def __init__(self, estimator):
+        self.estimator = estimator
+        super().__init__()
+
+    def _fit(self, X, y):
+        """Fit regressor to training data.
+
+        Writes to self:
+            Sets fitted model attributes ending in "_".
+
+        Parameters
+        ----------
+        X : pandas DataFrame
+            feature instances to fit regressor to
+        y : pandas DataFrame, must be same length as X
+            labels to fit regressor to
+
+        Returns
+        -------
+        self : reference to self
+        """
+        from sklearn import clone
+
+        self.estimator_ = clone(self.estimator)
+        self._y_cols = y.columns  # column names of y, used to label predictions
+        X_inner = prep_skl_df(X)
+        y_inner = prep_skl_df(y)
+
+        if len(y_inner.columns) == 1:
+            y_inner = y_inner.iloc[:, 0]
+        self.estimator_.fit(X_inner, y_inner)
+        return self
+
+    def _predict(self, X):
+        """Predict labels for data from features.
+
+        State required:
+            Requires state to be "fitted" = self.is_fitted=True
+
+        Accesses in self:
+            Fitted model attributes ending in "_"
+
+        Parameters
+        ----------
+        X : pandas DataFrame, must have same columns as X in `fit`
+            data to predict labels for
+
+        Returns
+        -------
+        y : pandas DataFrame, same length as `X`, same columns as `y` in `fit`
+            labels predicted for `X`
+        """
+        X_inner = prep_skl_df(X)
+        y_pred = self.estimator_.predict(X_inner)
+        return pd.DataFrame(y_pred, index=X.index, columns=self._y_cols)
+
+    def _predict_var(self, X):
+        """Compute/return variance predictions.
+
+        private _predict_var containing the core logic, called from predict_var
+
+        Parameters
+        ----------
+        X : pandas DataFrame, must have same columns as X in `fit`
+            data to predict labels for
+
+        Returns
+        -------
+        pred_var : pd.DataFrame
+            Column names are exactly those of ``y`` passed in ``fit``.
+            Row index is equal to row index of ``X``.
+            Entries are variance predictions, for var in col index.
+            A variance prediction for a given variable and row index is a predicted
+            variance for that variable and index, given observed data.
+        """
+        X_inner = prep_skl_df(X)
+        _, y_std = self.estimator_.predict(X_inner, return_std=True)
+        y_std = pd.DataFrame(y_std, index=X.index, columns=self._y_cols)
+        y_var = y_std**2
+        return y_var
+
+    @classmethod
+    def get_test_params(cls, parameter_set="default"):
+        """Return testing parameter settings for the estimator.
+
+        Parameters
+        ----------
+        parameter_set : str, default="default"
+            Name of the set of test parameters to return, for use in tests. If no
+            special parameters are defined for a value, will return `"default"` set.
+
+        Returns
+        -------
+        params : dict or list of dict, default = {}
+            Parameters to create testing instances of the class
+            Each dict has parameters to construct an "interesting" test instance, i.e.,
+            `MyClass(**params)` or `MyClass(**params[i])` creates a valid test instance.
+            `create_test_instance` uses the first (or only) dictionary in `params`
+        """
+        from sklearn.gaussian_process import GaussianProcessRegressor
+        from sklearn.linear_model import BayesianRidge
+
+        param1 = {"estimator": BayesianRidge()}
+        param2 = {"estimator": GaussianProcessRegressor()}
+
+        return [param1, param2]
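The contract ``SklearnProbaReg`` relies on can be exercised directly in ``sklearn``: ``BayesianRidge.predict`` accepts ``return_std=True`` and then returns a ``(y_pred, y_std)`` tuple. The sketch below turns the returned standard deviations into a variance frame in the shape the ``_predict_var`` docstring promises (row index of ``X``, column names of ``y``). The toy data and the column name ``"target"`` are invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import BayesianRidge

rng = np.random.default_rng(42)
X = pd.DataFrame(rng.normal(size=(20, 3)), columns=["x0", "x1", "x2"])
y = pd.DataFrame({"target": 2.0 * X["x0"] + rng.normal(scale=0.1, size=20)})

# a single-column y is passed to sklearn as a Series
est = BayesianRidge().fit(X, y.iloc[:, 0])

# with return_std=True, predict returns means and standard deviations
y_pred, y_std = est.predict(X, return_std=True)

# variance frame: rows follow X, columns follow y (not X)
y_var = pd.DataFrame(y_std, index=X.index, columns=y.columns) ** 2
print(y_var.head())
```

``GaussianProcessRegressor.predict`` supports the same ``return_std`` flag, which is why both estimators appear in ``get_test_params``.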