
fix: replace json with pickle for storing lgbm params #190

Open
wants to merge 1 commit into base: main
Conversation

pmandiola

Motivation

Fixes #188, allowing the use of custom objective functions

Description of the changes

Replaces json.dumps and json.loads with pickle to store and retrieve the trials' lightgbm_params dictionary


This pull request has not seen any recent activity.

@github-actions github-actions bot added the stale Exempt from stale bot labeling. label Dec 31, 2024
@nabenabe0928 nabenabe0928 removed the stale Exempt from stale bot labeling. label Jan 7, 2025
@nabenabe0928
Collaborator

Thank you for your PR!

Let me leave some comments:

  1. pickle --> base64 may generate a long string that exceeds the column-size limit of MySQL, so we should avoid this procedure,
  2. meanwhile, we use lgbm_params only to provide it to users, so we could convert any contents that cannot be serialized into a string.

For example, what about the following:

import json

serializable_lgbm_params = {}
for k, v in lgbm_params.items():
    try:
        json.dumps([v])
        serializable_lgbm_params[k] = v
    except TypeError:
        # We store only the name of an unserializable object (e.g. a callable).
        serializable_lgbm_params[k] = v.__name__
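Applied to a params dict that carries a callable objective, that fallback would behave roughly like this (a minimal sketch; custom_obj is a made-up placeholder, not part of the PR):

```python
import json


def custom_obj(y_true, y_pred):
    # Placeholder for a user-defined objective function.
    raise NotImplementedError


lgbm_params = {"learning_rate": 0.01, "objective": custom_obj}

serializable_lgbm_params = {}
for k, v in lgbm_params.items():
    try:
        json.dumps([v])  # probe whether the value is JSON-serializable
        serializable_lgbm_params[k] = v
    except TypeError:
        # Fall back to storing only the object's name.
        serializable_lgbm_params[k] = v.__name__

print(serializable_lgbm_params)
# {'learning_rate': 0.01, 'objective': 'custom_obj'}
```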

@pmandiola
Author

pmandiola commented Jan 7, 2025

Thanks @nabenabe0928 for reviewing the PR.

My first fix attempt was exactly what you suggested, but it didn't work. When optimizing, I think the LightGBMTuner restores the parameters of the best trial after finishing all the trials for a specific step. So when running the tuner, the first 7 trials that search for feature_fraction work, but when it switches to the next parameter, num_leaves, it throws an error because it doesn't recognize the objective function (since we stored it as a string).

Here is the error trace:

feature_fraction, val_score: 0.007458: 100%|#| 7/7 
num_leaves, val_score: 0.007458:   0%| | 0/20 [00:0[LightGBM] [Fatal] Unknown objective type name: lgb_obj
[W 2025-01-07 15:11:53,040] Trial 7 failed with parameters: {'num_leaves': 239} because of the following error: LightGBMError('Unknown objective type name: lgb_obj').
Traceback (most recent call last):
  File "/Users/pmandiola/Documents/Lidz/data-exploration/.venv/lib/python3.11/site-packages/optuna/study/_optimize.py", line 197, in _run_trial
    value_or_values = func(trial)
                      ^^^^^^^^^^^
  File "/Users/pmandiola/Documents/optuna-integration/optuna_integration/_lightgbm_tuner/optimize.py", line 321, in __call__
    cv_results = lgb.cv(self.lgbm_params, train_set, **self.lgbm_kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pmandiola/Documents/Lidz/data-exploration/.venv/lib/python3.11/site-packages/lightgbm/engine.py", line 774, in cv
    cvfolds = _make_n_folds(
              ^^^^^^^^^^^^^^
  File "/Users/pmandiola/Documents/Lidz/data-exploration/.venv/lib/python3.11/site-packages/lightgbm/engine.py", line 557, in _make_n_folds
    booster_for_fold = Booster(tparam, train_set)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pmandiola/Documents/Lidz/data-exploration/.venv/lib/python3.11/site-packages/lightgbm/basic.py", line 3641, in __init__
    _safe_call(
  File "/Users/pmandiola/Documents/Lidz/data-exploration/.venv/lib/python3.11/site-packages/lightgbm/basic.py", line 296, in _safe_call
    raise LightGBMError(_LIB.LGBM_GetLastError().decode("utf-8"))
lightgbm.basic.LightGBMError: Unknown objective type name: lgb_obj
[W 2025-01-07 15:11:53,042] Trial 7 failed with value None.
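The failure mode can be reproduced without LightGBM at all: once the callable is stored by name and the params are round-tripped through JSON, the objective comes back as a plain string (a minimal sketch; lgb_obj is a placeholder matching the name in the trace above):

```python
import json


def lgb_obj(y_true, y_pred):
    # Placeholder for a user-defined objective function.
    raise NotImplementedError


params = {"objective": lgb_obj, "num_leaves": 239}

# Store the callable by name, as in the suggested workaround.
stored = json.dumps(
    {k: v.__name__ if callable(v) else v for k, v in params.items()}
)

restored = json.loads(stored)
print(restored["objective"], callable(restored["objective"]))
# lgb_obj False
```

When the tuner later feeds the restored params back into lgb.cv, LightGBM sees the string "lgb_obj", looks for a built-in objective of that name, and raises the "Unknown objective type name" error above.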

@nabenabe0928
Collaborator

@pmandiola
Would you mind providing a minimal code example?
We will try to find a better solution with it.
Anyway, since pickle may not work with MySQL, this PR would be a breaking change if merged as is, meaning we need to figure out another solution.
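The size concern can be illustrated with the standard library alone (a rough sketch; the actual column type and length cap used by the RDB storage are assumptions):

```python
import base64
import json
import pickle

params = {"learning_rate": 0.01, "num_leaves": 31, "metric": "binary_logloss"}

# How a pickle-based scheme might encode the params for storage in an RDB.
pickled_b64 = base64.b64encode(pickle.dumps(params)).decode("ascii")
as_json = json.dumps(params)

# The base64-encoded pickle is opaque and longer than the JSON form; a
# length-capped MySQL column could overflow for large params, which is
# the breaking-change concern raised above.
print(len(pickled_b64), len(as_json))
```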

@HideakiImamura
Member

@nabenabe0928 Could you review this PR?

@pmandiola
Author

Sure, the code I tested is:

lgbm_params = copy.deepcopy(self.lgbm_params)
if not isinstance(self.lgbm_params["objective"], str):
    lgbm_params["objective"] = lgbm_params["objective"].__name__
trial.storage.set_trial_system_attr(
    trial._trial_id, _LGBM_PARAMS_KEY, json.dumps(lgbm_params)
)

One alternative solution could be to store only the optimized parameters from the current trial instead of the full lgbm_params. I tried it by changing just line 271 and it seems to work (the tuning runs correctly), but I'm not sure whether something else could break:

trial.storage.set_trial_system_attr(
    trial._trial_id, _LGBM_PARAMS_KEY, json.dumps(trial.params)
)
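The difference between the two candidates can be seen with plain json (a minimal sketch; custom_obj and the sampled value are made up for illustration):

```python
import json


def custom_obj(y_true, y_pred):
    # Placeholder for a user-defined objective function.
    raise NotImplementedError


# The full lgbm_params may carry a callable and is not JSON-serializable...
full_lgbm_params = {"objective": custom_obj, "num_leaves": 239}
try:
    json.dumps(full_lgbm_params)
    serializable = True
except TypeError:
    serializable = False
print(serializable)  # False

# ...while trial.params holds only the sampled values, which are plain
# JSON types and round-trip safely.
trial_params = {"num_leaves": 239}
print(json.dumps(trial_params))  # {"num_leaves": 239}
```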

@nabenabe0928
Collaborator

nabenabe0928 commented Jan 8, 2025

@pmandiola
We would be very grateful if you could provide the code showing exactly how you ran LightGBMTuner. (The problem is that there are actually several ways to run LightGBMTuner.)

@pmandiola
Author

This is what I did (skipping some previous details):

params = {
    "objective": focal_loss_obj,
    "metric": "focal_loss",
    "boosting_type": "gbdt",
    "learning_rate": 0.01,
    "verbosity": 1,
}

tuner = lgbo.LightGBMTunerCV(
    params,
    dtrain,
    num_boost_round=2000,
    callbacks=[lgb.early_stopping(100), lgb.log_evaluation(100)],
    feval=focal_loss_eval,
)

tuner.run()

@nabenabe0928
Collaborator

nabenabe0928 commented Jan 9, 2025

Verification Code
from __future__ import annotations

import optuna.integration.lightgbm as lgb

from lightgbm import early_stopping
from lightgbm import log_evaluation
import numpy as np
import sklearn.datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold


def custom_binary_objective(
    y_true: np.ndarray, y_pred: lgb.Dataset
) -> tuple[np.ndarray, np.ndarray]:
    preds = y_pred.get_label()
    ps = 1.0 / (1.0 + np.exp(-preds))
    res = y_true - ps
    grad = -res / (ps * (1 - ps))
    hess = -ps * (1 - ps) * (1 - 2 * y_true) / ((ps * (1 - ps)) ** 2)
    return grad, hess


def custom_accuracy(
    y_true: np.ndarray, y_pred: lgb.Dataset
) -> tuple[str, float, bool]:
    preds = y_pred.get_label()
    ps = np.round(1.0 / (1.0 + np.exp(-preds)))
    return "custom_accuracy", accuracy_score(y_true, ps), True


if __name__ == "__main__":
    data, target = sklearn.datasets.load_breast_cancer(return_X_y=True)
    dtrain = lgb.Dataset(data, label=target)

    params = {
        "objective": custom_binary_objective,
        "metric": "custom_accuracy",
        "verbosity": -1,
        "boosting_type": "gbdt",
    }
    tuner = lgb.LightGBMTunerCV(
        params,
        dtrain,
        callbacks=[early_stopping(10), log_evaluation(10)],
        feval=custom_accuracy,
    )

    tuner.run()

@nabenabe0928
Collaborator

nabenabe0928 commented Jan 9, 2025

Another approach for the bug fix:

https://github.com/optuna/optuna-integration/compare/main...nabenabe0928:optuna-integration:fix/accept-custom-objective-in-lgbm-tuner?expand=1

We need to check whether this change becomes a breaking change or not.
My biggest concern is the picklability.

@nabenabe0928
Collaborator

@pmandiola
We need a bit more time to discuss how to tackle your issue, so please give us some time to think through a solution.
Would you mind applying our suggestions once we have discussed them internally?

@pmandiola
Author

Sure, happy to help!

Successfully merging this pull request may close these issues:

LightGBMTunerCV error when using custom objective function