Merge branch 'sktime#132-feature-cyclic-boosting-interface' of https://github.com/setoguchi-naoki/skpro into sktime#132-feature-cyclic-boosting-interface
setoguchi-naoki committed Jan 15, 2024
2 parents 44da242 + d5dba04 commit c93b08f
Showing 23 changed files with 918 additions and 25 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -1,6 +1,6 @@
<a href="https://skpro.readthedocs.io/en/latest"><img src="https://github.com/sktime/skpro/blob/main/docs/source/images/skpro-banner.png" width="500" align="right" /></a>

:rocket: **Version 2.1.1 out now!** [Read the release notes here.](https://skpro.readthedocs.io/en/latest/changelog.html).
:rocket: **Version 2.1.2 out now!** [Read the release notes here.](https://skpro.readthedocs.io/en/latest/changelog.html).

`skpro` is a library for supervised probabilistic prediction in python.
It provides `scikit-learn`-like, `scikit-base` compatible interfaces to:
@@ -18,7 +18,7 @@ It provides `scikit-learn`-like, `scikit-base` compatible interfaces to:
| **Community** | [![!discord](https://img.shields.io/static/v1?logo=discord&label=discord&message=chat&color=lightgreen)](https://discord.com/invite/54ACzaFsn7) [![!slack](https://img.shields.io/static/v1?logo=linkedin&label=LinkedIn&message=news&color=lightblue)](https://www.linkedin.com/company/scikit-time/) |
| **CI/CD** | [![github-actions](https://img.shields.io/github/actions/workflow/status/sktime/sktime/wheels.yml?logo=github)](https://github.com/sktime/skpro/actions/workflows/wheels.yml) [![!codecov](https://img.shields.io/codecov/c/github/sktime/skpro?label=codecov&logo=codecov)](https://codecov.io/gh/sktime/skpro) [![readthedocs](https://img.shields.io/readthedocs/skpro?logo=readthedocs)](https://skpro.readthedocs.io/en/latest/) [![platform](https://img.shields.io/conda/pn/conda-forge/skpro)](https://github.com/sktime/skpro) |
| **Code** | [![!pypi](https://img.shields.io/pypi/v/skpro?color=orange)](https://pypi.org/project/skpro/) [![!conda](https://img.shields.io/conda/vn/conda-forge/skpro)](https://anaconda.org/conda-forge/skpro) [![!python-versions](https://img.shields.io/pypi/pyversions/skpro)](https://www.python.org/) [![!black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) |

| **Downloads**| [![Downloads](https://static.pepy.tech/personalized-badge/skpro?period=week&units=international_system&left_color=grey&right_color=blue&left_text=weekly%20(pypi))](https://pepy.tech/project/skpro) [![Downloads](https://static.pepy.tech/personalized-badge/skpro?period=month&units=international_system&left_color=grey&right_color=blue&left_text=monthly%20(pypi))](https://pepy.tech/project/skpro) [![Downloads](https://static.pepy.tech/personalized-badge/skpro?period=total&units=international_system&left_color=grey&right_color=blue&left_text=cumulative%20(pypi))](https://pepy.tech/project/skpro) |

## :books: Documentation

7 changes: 6 additions & 1 deletion docs/source/_static/switcher.json
@@ -5,7 +5,12 @@
"url": "https://skpro.readthedocs.io/en/latest/"
},
{
"name": "2.1.1 (stable)",
"name": "2.1.2 (stable)",
"version": "stable",
"url": "https://skpro.readthedocs.io/en/v2.1.2/"
},
{
"name": "2.1.1",
"version": "stable",
"url": "https://skpro.readthedocs.io/en/v2.1.1/"
},
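Read together, the two complete entries visible in this hunk both carry `"version": "stable"`. Whether the documentation version switcher needs distinct values per entry is an assumption here, not something the diff states. Under that assumption, a small check like the following sketch would flag the overlap. The entries are transcribed from the hunk above, and the helper name `duplicate_versions` is hypothetical.

```python
import json
from collections import Counter

# The two complete entries visible in the hunk; the rest of the file is not shown.
entries = json.loads(
    """
    [
      {"name": "2.1.2 (stable)", "version": "stable",
       "url": "https://skpro.readthedocs.io/en/v2.1.2/"},
      {"name": "2.1.1", "version": "stable",
       "url": "https://skpro.readthedocs.io/en/v2.1.1/"}
    ]
    """
)


def duplicate_versions(items):
    """Return the sorted 'version' values that occur in more than one entry."""
    counts = Counter(item["version"] for item in items)
    return sorted(version for version, n in counts.items() if n > 1)


duplicates = duplicate_versions(entries)
```

If the check reports a duplicate, one possible remedy would be to give the older entry its own version string, but that choice belongs to the maintainers.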
40 changes: 38 additions & 2 deletions docs/source/api_reference/regression.rst
@@ -87,8 +87,44 @@ take one or multiple ``sklearn`` estimators and adda probabilistic prediction mo

CyclicBoosting

Base
----
Linear regression
-----------------

.. currentmodule:: skpro.regression.linear

.. autosummary::
    :toctree: auto_generated/
    :template: class.rst

    ARDRegression
    BayesianRidge

Gaussian process and kernel regression
--------------------------------------

.. currentmodule:: skpro.regression.gp

.. autosummary::
    :toctree: auto_generated/
    :template: class.rst

    GaussianProcess


Adapters to other interfaces
----------------------------

.. currentmodule:: skpro.regression.adapters.sklearn

.. autosummary::
    :toctree: auto_generated/
    :template: class.rst

    SklearnProbaReg


Base classes
------------

.. currentmodule:: skpro.regression.base

70 changes: 70 additions & 0 deletions docs/source/changelog.rst
@@ -14,6 +14,76 @@ You can also subscribe to ``skpro``'s

For planned changes and upcoming releases, see our :ref:`roadmap`.

[2.1.2] - 2024-01-07
====================

Highlights
----------

* ``sklearn`` based probabilistic regressors - Gaussian processes, Bayesian linear regression (:pr:`166`) :user:`fkiraly`
* ``SklearnProbaReg`` - general interface adapter to ``sklearn`` regressors with variance prediction model (:pr:`163`) :user:`fkiraly`

Dependency changes
~~~~~~~~~~~~~~~~~~

* ``scikit-base`` bounds have been updated to ``<0.8.0,>=0.6.1``.
* ``polars`` (data container soft dependency) bounds have been updated to allow python 3.12.
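
The effect of widening the ``scikit-base`` bound can be checked with the ``packaging`` library, which ``skpro`` already lists as a dependency. The following is a minimal sketch: the two specifier strings come from the entries above, while the candidate version ``0.7.1`` is a hypothetical value chosen only for illustration.

```python
from packaging.specifiers import SpecifierSet
from packaging.version import Version

# Bounds before and after this release, as stated in the changelog.
old_bound = SpecifierSet(">=0.6.1,<0.7.0")
new_bound = SpecifierSet(">=0.6.1,<0.8.0")

# Hypothetical scikit-base version, used only to illustrate the difference.
candidate = Version("0.7.1")

allowed_before = candidate in old_bound
allowed_after = candidate in new_bound
```

Here ``allowed_before`` is ``False`` and ``allowed_after`` is ``True``, since only the upper bound moved.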

Enhancements
------------

Data types, checks, conversions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* [ENH] ``n_features`` and ``feature_names`` metadata field for table mtypes (:pr:`150`) :user:`fkiraly`
* [ENH] ``check_is_mtype`` dict type return, improved input check error messages in ``BaseProbaRegressor`` (:pr:`151`) :user:`fkiraly`

Probability distributions
~~~~~~~~~~~~~~~~~~~~~~~~~

* [ENH] adapter from ``scipy`` ``rv_discrete`` to ``skpro`` ``Empirical`` (:pr:`155`) :user:`fkiraly`

Probabilistic regression
~~~~~~~~~~~~~~~~~~~~~~~~

* [ENH] ``sklearn`` wrappers to str-coerce columns of ``pd.DataFrame`` before passing (:pr:`148`) :user:`fkiraly`
* [ENH] clean up copy-paste leftovers in ``BaseProbaRegressor`` (:pr:`156`) :user:`fkiraly`
* [ENH] adapter for ``sklearn`` probabilistic regressors (:pr:`163`) :user:`fkiraly`
* [ENH] add tags to ``SklearnProbaReg`` (:pr:`168`) :user:`fkiraly`
* [ENH] interfacing all concrete ``sklearn`` probabilistic regressors (:pr:`166`) :user:`fkiraly`
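
The str-coercion of ``pd.DataFrame`` columns mentioned in the first entry of this list can be pictured with a short sketch. This is not the ``skpro`` implementation (which is not part of this diff); the helper name ``coerce_columns_to_str`` is hypothetical, and the sketch only assumes that column labels of mixed type should become strings before the frame is handed to ``sklearn``.

```python
import pandas as pd


def coerce_columns_to_str(df):
    """Hypothetical helper: return df with all column labels cast to str.

    The input frame is left unchanged; a copy is made only when needed.
    """
    if all(isinstance(col, str) for col in df.columns):
        return df
    out = df.copy()
    out.columns = [str(col) for col in df.columns]
    return out


# A frame with one integer and one string column label.
mixed = pd.DataFrame({0: [1.0, 2.0], "b": [3.0, 4.0]})
clean = coerce_columns_to_str(mixed)
```

Returning a copy rather than renaming in place keeps the caller's frame untouched, which matters when the same ``X`` is reused after ``fit``.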

Test framework
~~~~~~~~~~~~~~

* [ENH] scenario tests for mixed ``pandas`` column index types (:pr:`145`) :user:`fkiraly`
* [ENH] scitype inference utility, test class register, test class test condition (:pr:`159`) :user:`fkiraly`

Fixes
-----

Probabilistic regression
~~~~~~~~~~~~~~~~~~~~~~~~

* [BUG] in probabilistic regressors, ensure correct index treatment if ``X: pd.DataFrame`` and ``y: np.ndarray`` are passed (:pr:`146`) :user:`fkiraly`

Documentation
-------------

* [DOC] update ``AUTHORS.rst`` file (:pr:`147`) :user:`fkiraly`

Maintenance
-----------

* [MNT] [Dependabot](deps): Bump ``actions/upload-artifact`` from 3 to 4 (:pr:`154`) :user:`dependabot`
* [MNT] [Dependabot](deps): Bump ``actions/download-artifact`` from 3 to 4 (:pr:`153`) :user:`dependabot`
* [MNT] [Dependabot](deps): Bump ``actions/setup-python`` from 4 to 5 (:pr:`152`) :user:`dependabot`
* [MNT] [Dependabot](deps-dev): Update ``sphinx-gallery`` requirement from ``<0.15.0`` to ``<0.16.0`` (:pr:`149`) :user:`dependabot`
* [MNT] [Dependabot](deps-dev): Update ``scikit-base`` requirement from ``<0.7.0,>=0.6.1`` to ``>=0.6.1,<0.8.0`` (:pr:`169`) :user:`dependabot`
* [MNT] adding ``codecov.yml`` and turning coverage reports informational (:pr:`165`) :user:`fkiraly`
* [MNT] handle deprecation of ``pandas.DataFrame.applymap`` (:pr:`170`) :user:`fkiraly`
* [MNT] handle ``polars`` deprecations (:pr:`171`) :user:`fkiraly`


[2.1.1] - 2023-11-02
====================

6 changes: 3 additions & 3 deletions pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "skpro"
version = "2.1.1"
version = "2.1.2"
description = "A unified framework for probability distributions and probabilistic supervised regression"
authors = [
{name = "skpro developers", email = "[email protected]"},
@@ -44,7 +44,7 @@ dependencies = [
"numpy>=1.21.0,<1.27",
"pandas>=1.1.0,<2.2.0",
"packaging",
"scikit-base>=0.6.1,<0.7.0",
"scikit-base>=0.6.1,<0.8.0",
"scikit-learn>=0.24.0,<1.4.0",
"scipy<2.0.0,>=1.2.0",
]
@@ -56,7 +56,7 @@ all_extras = [
"mapie",
"matplotlib>=3.3.2",
"ngboost",
"polars<0.20.0; python_version < '3.12'",
"polars<0.21.0",
"pyarrow<14.0.0; python_version < '3.12'",
"statsmodels>=0.12.1",
"tabulate",
2 changes: 1 addition & 1 deletion skpro/__init__.py
@@ -1,6 +1,6 @@
"""skpro."""

__version__ = "2.1.1"
__version__ = "2.1.2"

__all__ = ["show_versions"]

2 changes: 1 addition & 1 deletion skpro/datatypes/_adapter/polars.py
@@ -42,7 +42,7 @@ def check_polars_frame(obj, return_metadata=False, var_name="obj", lazy=False):
if isinstance(obj, pl.LazyFrame):
metadata["has_nans"] = "NA"
else:
hasnan = obj.null_count().sum(axis=1).to_numpy()[0] > 0
hasnan = obj.null_count().sum_horizontal().to_numpy()[0] > 0
metadata["has_nans"] = hasnan

return ret(True, None, metadata, return_metadata)
12 changes: 4 additions & 8 deletions skpro/distributions/base.py
@@ -11,6 +11,7 @@
import pandas as pd

from skpro.base import BaseObject
from skpro.utils.pandas import df_map
from skpro.utils.validation._dependencies import _check_estimator_deps


@@ -229,7 +230,8 @@ def pdf(self, x):
"this may be numerically unstable"
)
warn(self._method_error_msg("pdf", fill_in=approx_method))
return self.log_pdf(x=x).applymap(np.exp)

return df_map(self.log_pdf(x=x))(np.exp)

raise NotImplementedError(self._method_error_msg("pdf", "error"))

@@ -269,13 +271,7 @@ def log_pdf(self, x):
)
warn(self._method_error_msg("log_pdf", fill_in=approx_method))

pdf_res = self.pdf(x=x)
# safe deprecation of applymap, renamed to map in pandas 2 versions
# this if/else ensures compatibility with a wider range of pandas versions
if hasattr(pdf_res, "map"):
return pdf_res.map(np.log)
else:
return pdf_res.applymap(np.log)
return df_map(self.pdf(x=x))(np.log)

raise NotImplementedError(self._method_error_msg("log_pdf", "error"))
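
The body of `df_map` (imported from `skpro.utils.pandas`) is not shown in this part of the diff. Judging from the `hasattr(pdf_res, "map")` branch it replaces, it presumably returns the elementwise-map method of the frame. A minimal sketch under that assumption, with the stand-in name `df_map_sketch` being hypothetical:

```python
import numpy as np
import pandas as pd


def df_map_sketch(df):
    """Hypothetical stand-in for df_map: return df's elementwise-map method.

    Newer pandas versions provide DataFrame.map; older ones only applymap.
    """
    if hasattr(df, "map"):
        return df.map
    return df.applymap


# Same call shape as in the hunk above: df_map(frame)(function).
log_pdf = pd.DataFrame({"a": [0.0, np.log(2.0)]})
pdf = df_map_sketch(log_pdf)(np.exp)
```

Returning the bound method keeps the version check in one place, so call sites such as `pdf` and `log_pdf` reduce to a single line each.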

4 changes: 2 additions & 2 deletions skpro/distributions/normal.py
@@ -15,9 +15,9 @@ class Normal(BaseDistribution):
Parameters
----------
mean : float or array of float (1D or 2D)
mu : float or array of float (1D or 2D)
mean of the normal distribution
sd : float or array of float (1D or 2D), must be positive
sigma : float or array of float (1D or 2D), must be positive
standard deviation of the normal distribution
index : pd.Index, optional, default = RangeIndex
columns : pd.Index, optional, default = RangeIndex
1 change: 1 addition & 0 deletions skpro/distributions/tests/test_qpd.py
@@ -1,6 +1,7 @@
"""Tests for quantile-parameterized distributions."""

import pytest

from skpro.distributions.qpd import QPD_B, QPD_S, QPD_U
from skpro.tests.test_switch import run_test_for_class

2 changes: 2 additions & 0 deletions skpro/regression/adapters/__init__.py
@@ -0,0 +1,2 @@
"""Adapters for probabilistic regressors."""
# copyright: skpro developers, BSD-3-Clause License (see LICENSE file)
6 changes: 6 additions & 0 deletions skpro/regression/adapters/sklearn/__init__.py
@@ -0,0 +1,6 @@
"""Adapters for probabilistic regressors, towards sklearn."""
# copyright: skpro developers, BSD-3-Clause License (see LICENSE file)

from skpro.regression.adapters.sklearn._sklearn_proba import SklearnProbaReg

__all__ = ["SklearnProbaReg"]
138 changes: 138 additions & 0 deletions skpro/regression/adapters/sklearn/_sklearn_proba.py
@@ -0,0 +1,138 @@
# copyright: skpro developers, BSD-3-Clause License (see LICENSE file)
"""Adapter to sklearn probabilistic regressors."""

__author__ = ["fkiraly"]

import pandas as pd

from skpro.regression.base import BaseProbaRegressor
from skpro.utils.sklearn import prep_skl_df


class SklearnProbaReg(BaseProbaRegressor):
    """Adapter to sklearn regressors with variance prediction interface.

    Wraps an sklearn regressor that can be queried for variance prediction,
    and constructs an skpro regressor from it.

    The wrapped regressor must have a ``predict`` with
    a ``return_std`` argument, and return a tuple of ``(y_pred, y_std)``,
    both ndarray of shape (n_samples,) or (n_samples, n_targets).

    Parameters
    ----------
    estimator : sklearn regressor
        Estimator to wrap, must have ``predict`` with ``return_std`` argument.
    """

    _tags = {
        "capability:multioutput": False,
        "capability:missing": True,
    }

    def __init__(self, estimator):
        self.estimator = estimator
        super().__init__()

    def _fit(self, X, y):
        """Fit regressor to training data.

        Writes to self:
            Sets fitted model attributes ending in "_".

        Parameters
        ----------
        X : pandas DataFrame
            feature instances to fit regressor to
        y : pandas DataFrame, must be same length as X
            labels to fit regressor to

        Returns
        -------
        self : reference to self
        """
        from sklearn import clone

        # remember the target columns so that _predict_var can label its output
        self._y_cols = y.columns

        self.estimator_ = clone(self.estimator)
        X_inner = prep_skl_df(X)
        y_inner = prep_skl_df(y)

        # sklearn expects a 1D target for single-output regression
        if len(y_inner.columns) == 1:
            y_inner = y_inner.iloc[:, 0]
        self.estimator_.fit(X_inner, y_inner)
        return self

    def _predict(self, X):
        """Predict labels for data from features.

        State required:
            Requires state to be "fitted" = self.is_fitted=True

        Accesses in self:
            Fitted model attributes ending in "_"

        Parameters
        ----------
        X : pandas DataFrame, must have same columns as X in `fit`
            data to predict labels for

        Returns
        -------
        y : pandas DataFrame, same length as `X`, same columns as `y` in `fit`
            labels predicted for `X`
        """
        X_inner = prep_skl_df(X)
        y_pred = self.estimator_.predict(X_inner)
        return y_pred

    def _predict_var(self, X):
        """Compute/return variance predictions.

        private _predict_var containing the core logic, called from predict_var

        Parameters
        ----------
        X : pandas DataFrame, must have same columns as X in `fit`
            data to predict labels for

        Returns
        -------
        pred_var : pd.DataFrame
            Column names are exactly those of ``y`` passed in ``fit``.
            Row index is equal to row index of ``X``.
            Entries are variance predictions, for the variable in the col index.
            A variance prediction for a given variable and row index is the
            predicted variance for that variable and index, given observed data.
        """
        X_inner = prep_skl_df(X)
        _, y_std = self.estimator_.predict(X_inner, return_std=True)
        # label with the columns of y (not X), as stated in the docstring
        y_std = pd.DataFrame(y_std, index=X.index, columns=self._y_cols)
        y_var = y_std**2
        return y_var

    @classmethod
    def get_test_params(cls, parameter_set="default"):
        """Return testing parameter settings for the estimator.

        Parameters
        ----------
        parameter_set : str, default="default"
            Name of the set of test parameters to return, for use in tests. If no
            special parameters are defined for a value, will return `"default"` set.

        Returns
        -------
        params : dict or list of dict, default = {}
            Parameters to create testing instances of the class
            Each dict contains parameters to construct an "interesting" test
            instance, i.e., `MyClass(**params)` or `MyClass(**params[i])` creates
            a valid test instance.
            `create_test_instance` uses the first (or only) dictionary in `params`
        """
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.linear_model import BayesianRidge

        param1 = {"estimator": BayesianRidge()}
        param2 = {"estimator": GaussianProcessRegressor()}

        return [param1, param2]
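
The `sklearn` contract that the adapter relies on, `predict(X, return_std=True)` returning a `(y_pred, y_std)` pair, can be tried directly against `BayesianRidge`, one of the two estimators used in `get_test_params`. The sketch below does not import `skpro`; it only illustrates how a standard deviation array becomes a variance frame with the row index of `X` and the columns of `y`, as the `_predict_var` docstring describes. The toy data and the column names are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import BayesianRidge

# Toy data, purely illustrative.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(20, 3)), columns=["a", "b", "c"])
y = pd.DataFrame({"target": X["a"] * 2.0 + rng.normal(scale=0.1, size=20)})

# sklearn side of the contract: predict with return_std=True gives a pair.
est = BayesianRidge().fit(X, y["target"])
y_pred, y_std = est.predict(X, return_std=True)

# Variance frame: rows follow X, columns follow y.
pred_var = pd.DataFrame(y_std**2, index=X.index, columns=y.columns)
```

Any other `sklearn` regressor whose `predict` accepts `return_std`, such as `GaussianProcessRegressor`, could be substituted for `BayesianRidge` in this sketch.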