The Automodels are not working after 1.7.4 #1255

jinhangjiang · 2025-01-23T17:50:21Z

What happened + What you expected to happen

None of the Auto models run when I install any version after 1.7.4. Every run fails with ValueError: Expected a parent.

I am using a GPU runtime on Databricks. The Auto models only work when I downgrade neuralforecast to 1.7.4 and Ray to 2.10.0.
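When comparing environments, a quick check of which (neuralforecast, ray) pair is installed can help. The sketch below assumes the only "known good" pairs are the ones reported in this issue (1.7.4 or 2.0.0 with ray 2.10.0). KNOWN_GOOD, installed_version and is_known_good are hypothetical helpers, not part of either library. It relies only on the standard importlib.metadata module.

```python
from importlib.metadata import PackageNotFoundError, version

# (neuralforecast, ray) pairs reported to work in this issue.
# This is an assumption to extend, not an official support matrix.
KNOWN_GOOD = {("1.7.4", "2.10.0"), ("2.0.0", "2.10.0")}


def installed_version(package):
    """Return the installed version string of package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def is_known_good(nf_version, ray_version):
    """True if this (neuralforecast, ray) pair is one reported to work here."""
    return (nf_version, ray_version) in KNOWN_GOOD


if __name__ == "__main__":
    nf, ray_v = installed_version("neuralforecast"), installed_version("ray")
    status = "reported to work" if is_known_good(nf, ray_v) else "not in the reported-working list"
    print(f"neuralforecast={nf}, ray={ray_v}: {status}")
```

Matching exact version strings keeps the check strict: a patch release such as ray 2.10.1 is deliberately treated as untested.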

Versions / Dependencies

Any neuralforecast version later than 1.7.4

Reproduction script

No Auto model script works on my side.

Issue Severity

None

marcopeix · 2025-01-24T15:37:06Z

Can you share a minimal reproducible example? Also, are you using multiple GPUs or just one?

jinhangjiang · 2025-01-24T16:22:06Z

@marcopeix I am using GPUs; the GPU type seems irrelevant, since I have tried T4, V100, and H100. The runtime I am using on Databricks is 15.4 LTS ML.

The code is quite simple:

from neuralforecast import NeuralForecast
from neuralforecast.utils import AirPassengersPanel
from neuralforecast.losses.pytorch import SMAPE
from neuralforecast.auto import AutoGRU

from ray import tune
from ray.tune.search.hyperopt import HyperOptSearch


gru_config = {
    "learning_rate": tune.choice([1e-3, 1e-2, 1e-4]),
    "max_steps": 20000,
    "batch_size": tune.choice([64, 256]),
    "random_seed": tune.randint(1, 100),
    "input_size": 104,
    "early_stop_patience_steps": 10,
    "val_check_steps": 50,

    'encoder_n_layers': tune.choice([2, 4, 6]),
    'encoder_hidden_size': tune.choice([200, 400, 512]),
    'encoder_dropout': tune.uniform(0.0, 0.3),
    'context_size': tune.choice([10, 20, 30]),
    'decoder_layers': tune.choice([2, 4, 6]),
    'decoder_hidden_size': tune.choice([200, 400, 512])

}

models = [AutoGRU(h = 12,
                  loss = SMAPE(),
                  config = gru_config,
                  backend = 'ray',
                  search_alg = HyperOptSearch(),
                  num_samples = 20,
                  refit_with_val=True)]

nf = NeuralForecast(models = models, freq = "ME", local_scaler_type = 'standard')
nf.fit(df = AirPassengersPanel, val_size = 12)

I just created a new environment and found the following.

The Auto models work with neuralforecast==2.0.0 or neuralforecast==1.7.4 as long as ray==2.10.0.
With neuralforecast==2.0.0 and ray==2.41.0 (environment-cuda.yml only requires ray[tune]>=2.2.0), I got this error:

2025-01-24 16:11:44,423 ERROR tune_controller.py:1331 -- Trial task failed for trial _train_tune_5926a6f4
Traceback (most recent call last):
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-708137a3-4f29-44b1-8222-1b0743bf59d3/lib/python3.11/site-packages/ray/air/execution/_internal/event_manager.py", line 110, in resolve_future
    result = ray.get(future)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-708137a3-4f29-44b1-8222-1b0743bf59d3/lib/python3.11/site-packages/ray/_private/auto_init_hook.py", line 21, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-708137a3-4f29-44b1-8222-1b0743bf59d3/lib/python3.11/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-708137a3-4f29-44b1-8222-1b0743bf59d3/lib/python3.11/site-packages/ray/_private/worker.py", line 2772, in get
    values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-708137a3-4f29-44b1-8222-1b0743bf59d3/lib/python3.11/site-packages/ray/_private/worker.py", line 919, in get_objects
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray::ImplicitFunc.train() (pid=35372, ip=10.139.106.0, actor_id=9940c705aa60105f83a6b36301000000, repr=_train_tune)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-708137a3-4f29-44b1-8222-1b0743bf59d3/lib/python3.11/site-packages/ray/tune/trainable/trainable.py", line 331, in train
    raise skipped from exception_cause(skipped)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-708137a3-4f29-44b1-8222-1b0743bf59d3/lib/python3.11/site-packages/ray/air/_internal/util.py", line 107, in run
    self._ret = self._target(*self._args, **self._kwargs)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-708137a3-4f29-44b1-8222-1b0743bf59d3/lib/python3.11/site-packages/ray/tune/trainable/function_trainable.py", line 44, in <lambda>
    training_func=lambda: self._trainable_func(self.config),
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-708137a3-4f29-44b1-8222-1b0743bf59d3/lib/python3.11/site-packages/ray/tune/trainable/function_trainable.py", line 249, in _trainable_func
    output = fn()
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-708137a3-4f29-44b1-8222-1b0743bf59d3/lib/python3.11/site-packages/ray/tune/trainable/util.py", line 130, in inner
    return trainable(config, **fn_kwargs)
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/neuralforecast/common/_base_auto.py", line 214, in _train_tune
    _ = self._fit_model(
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/neuralforecast/common/_base_auto.py", line 362, in _fit_model
    model = model.fit(
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/neuralforecast/common/_base_recurrent.py", line 535, in fit
    return self._fit(
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/neuralforecast/common/_base_model.py", line 359, in _fit
    trainer = pl.Trainer(**model.trainer_kwargs)
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/pytorch_lightning/utilities/argparse.py", line 70, in insert_env_defaults
    return fn(self, **kwargs)
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/pytorch_lightning/trainer/trainer.py", line 426, in __init__
    self._callback_connector.on_trainer_init(
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/pytorch_lightning/trainer/connectors/callback_connector.py", line 80, in on_trainer_init
    _validate_callbacks_list(self.trainer.callbacks)
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/pytorch_lightning/trainer/connectors/callback_connector.py", line 228, in _validate_callbacks_list
    stateful_callbacks = [cb for cb in callbacks if is_overridden("state_dict", instance=cb)]
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/pytorch_lightning/trainer/connectors/callback_connector.py", line 228, in <listcomp>
    stateful_callbacks = [cb for cb in callbacks if is_overridden("state_dict", instance=cb)]
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/pytorch_lightning/utilities/model_helpers.py", line 42, in is_overridden
    raise ValueError("Expected a parent")
ValueError: Expected a parent
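The last frames show is_overridden raising ValueError("Expected a parent") while the Lightning Trainer validates its callback list. The sketch below is a simplified, hypothetical re-creation of that kind of check in plain Python. The stand-in classes Callback and LightningModule and the helper stateful_callbacks are assumptions, not the real pytorch_lightning code. The sketch infers a base class from the object's type and raises when the object is not an instance of any base it recognizes. Under that assumption, one plausible trigger is a callback in the trainer's list that derives from a different Callback class than the one being checked against. This has not been confirmed in this issue.

```python
class Callback:
    """Stand-in for the framework's callback base class (hypothetical)."""

    def state_dict(self):
        return {}


class LightningModule:
    """Stand-in for the framework's model base class (hypothetical)."""


def is_overridden(method_name, instance=None, parent=None):
    """Simplified sketch: resolve a parent class from the instance's type,
    then report whether the instance's class overrides method_name."""
    if instance is None:
        return False
    if parent is None:
        if isinstance(instance, LightningModule):
            parent = LightningModule
        elif isinstance(instance, Callback):
            parent = Callback
        if parent is None:
            # The object matches none of the known base classes.
            raise ValueError("Expected a parent")
    instance_attr = getattr(type(instance), method_name, None)
    if instance_attr is None:
        return False
    # Overridden if the class attribute is not the parent's own function.
    return instance_attr is not getattr(parent, method_name, None)


def stateful_callbacks(callbacks):
    """Mirror of the list comprehension in the traceback (hypothetical)."""
    return [cb for cb in callbacks if is_overridden("state_dict", instance=cb)]


class MyCallback(Callback):
    def state_dict(self):
        return {"step": 0}


class PlainCallback(Callback):
    pass


class ForeignCallback:
    """Has callback-like methods but derives from an unrelated base class."""

    def state_dict(self):
        return {}
```

With these definitions, stateful_callbacks([MyCallback(), PlainCallback()]) keeps only the MyCallback instance, while any list containing a ForeignCallback raises ValueError: Expected a parent.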

jinhangjiang added the bug label Jan 23, 2025
