The Automodels are not working after 1.7.4 #1255

Open
jinhangjiang opened this issue Jan 23, 2025 · 2 comments

@jinhangjiang

What happened + What you expected to happen

None of the Auto models run on any version after 1.7.4. Every attempt fails with ValueError: Expected a parent.

I am using a GPU runtime on Databricks. The Auto models only work when I downgrade neuralforecast to 1.7.4 and ray to 2.10.0.

Versions / Dependencies

Any version later than 1.7.4
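
Because the failure depends on which neuralforecast and ray versions are paired, it may help to record the exact versions in the failing environment. The snippet below is a minimal sketch using only the standard library; the list of distribution names is an assumption about what is installed, and `installed_versions` is a hypothetical helper, not part of neuralforecast.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed distribution names; adjust to match the environment being checked.
PACKAGES = ["neuralforecast", "ray", "pytorch-lightning", "lightning", "torch"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name}: {ver or 'not installed'}")
```

Running this in both the working and the failing environment and comparing the two outputs would show exactly which packages differ.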

Reproduction script

Any Auto model script fails on my side.

Issue Severity

None

@marcopeix
Contributor

Can you share a minimal reproducible example? Also, are you using multiple GPUs or just one?

@jinhangjiang
Author

jinhangjiang commented Jan 24, 2025

@marcopeix I am using GPUs, and the GPU type seems irrelevant: I have tried T4, V100, and H100. The Databricks runtime is 15.4 LTS ML.

The code is quite simple:

from neuralforecast import NeuralForecast
from neuralforecast.utils import AirPassengersPanel
from neuralforecast.losses.pytorch import SMAPE
from neuralforecast.auto import AutoGRU

from ray import tune
from ray.tune.search.hyperopt import HyperOptSearch


# Search space: tune.* entries are sampled per trial, plain values stay fixed.
gru_config = {
    "learning_rate": tune.choice([1e-3, 1e-2, 1e-4]),
    "max_steps": 20000,
    "batch_size": tune.choice([64, 256]),
    "random_seed": tune.randint(1, 100),
    "input_size": 104,
    "early_stop_patience_steps": 10,
    "val_check_steps": 50,
    "encoder_n_layers": tune.choice([2, 4, 6]),
    "encoder_hidden_size": tune.choice([200, 400, 512]),
    "encoder_dropout": tune.uniform(0.0, 0.3),
    "context_size": tune.choice([10, 20, 30]),
    "decoder_layers": tune.choice([2, 4, 6]),
    "decoder_hidden_size": tune.choice([200, 400, 512]),
}

# One AutoGRU tuned with Ray Tune and HyperOpt over 20 sampled configurations.
models = [
    AutoGRU(
        h=12,
        loss=SMAPE(),
        config=gru_config,
        backend="ray",
        search_alg=HyperOptSearch(),
        num_samples=20,
        refit_with_val=True,
    )
]

nf = NeuralForecast(models=models, freq="ME", local_scaler_type="standard")
nf.fit(df=AirPassengersPanel, val_size=12)

I just created a new environment and found that:

  • The Auto models work with neuralforecast==2.0.0 or neuralforecast==1.7.4 when ray==2.10.0.
  • With neuralforecast==2.0.0 and ray==2.41.0 (environment-cuda.yml specifies ray[tune]>=2.2.0), I got this error:

2025-01-24 16:11:44,423 ERROR tune_controller.py:1331 -- Trial task failed for trial _train_tune_5926a6f4
Traceback (most recent call last):
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-708137a3-4f29-44b1-8222-1b0743bf59d3/lib/python3.11/site-packages/ray/air/execution/_internal/event_manager.py", line 110, in resolve_future
result = ray.get(future)
^^^^^^^^^^^^^^^
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-708137a3-4f29-44b1-8222-1b0743bf59d3/lib/python3.11/site-packages/ray/_private/auto_init_hook.py", line 21, in auto_init_wrapper
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-708137a3-4f29-44b1-8222-1b0743bf59d3/lib/python3.11/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-708137a3-4f29-44b1-8222-1b0743bf59d3/lib/python3.11/site-packages/ray/_private/worker.py", line 2772, in get
values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-708137a3-4f29-44b1-8222-1b0743bf59d3/lib/python3.11/site-packages/ray/_private/worker.py", line 919, in get_objects
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray::ImplicitFunc.train() (pid=35372, ip=10.139.106.0, actor_id=9940c705aa60105f83a6b36301000000, repr=_train_tune)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-708137a3-4f29-44b1-8222-1b0743bf59d3/lib/python3.11/site-packages/ray/tune/trainable/trainable.py", line 331, in train
raise skipped from exception_cause(skipped)
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-708137a3-4f29-44b1-8222-1b0743bf59d3/lib/python3.11/site-packages/ray/air/_internal/util.py", line 107, in run
self._ret = self._target(*self._args, **self._kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-708137a3-4f29-44b1-8222-1b0743bf59d3/lib/python3.11/site-packages/ray/tune/trainable/function_trainable.py", line 44, in <lambda>
training_func=lambda: self._trainable_func(self.config),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-708137a3-4f29-44b1-8222-1b0743bf59d3/lib/python3.11/site-packages/ray/tune/trainable/function_trainable.py", line 249, in _trainable_func
output = fn()
^^^^
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-708137a3-4f29-44b1-8222-1b0743bf59d3/lib/python3.11/site-packages/ray/tune/trainable/util.py", line 130, in inner
return trainable(config, **fn_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/neuralforecast/common/_base_auto.py", line 214, in _train_tune
_ = self._fit_model(
^^^^^^^^^^^^^^^^
File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/neuralforecast/common/_base_auto.py", line 362, in _fit_model
model = model.fit(
^^^^^^^^^^
File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/neuralforecast/common/_base_recurrent.py", line 535, in fit
return self._fit(
^^^^^^^^^^
File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/neuralforecast/common/_base_model.py", line 359, in _fit
trainer = pl.Trainer(**model.trainer_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/pytorch_lightning/utilities/argparse.py", line 70, in insert_env_defaults
return fn(self, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/pytorch_lightning/trainer/trainer.py", line 426, in __init__
self._callback_connector.on_trainer_init(
File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/pytorch_lightning/trainer/connectors/callback_connector.py", line 80, in on_trainer_init
_validate_callbacks_list(self.trainer.callbacks)
File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/pytorch_lightning/trainer/connectors/callback_connector.py", line 228, in _validate_callbacks_list
stateful_callbacks = [cb for cb in callbacks if is_overridden("state_dict", instance=cb)]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/pytorch_lightning/trainer/connectors/callback_connector.py", line 228, in <listcomp>
stateful_callbacks = [cb for cb in callbacks if is_overridden("state_dict", instance=cb)]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/pytorch_lightning/utilities/model_helpers.py", line 42, in is_overridden
raise ValueError("Expected a parent")
ValueError: Expected a parent
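
One possible reading of the last frames, not confirmed anywhere in this thread: the check raises when a callback in the trainer's list does not derive from any base class that pytorch_lightning recognizes, which could happen if a callback was built against a different Lightning namespace. The sketch below imitates that situation with stand-in classes only. `CallbackA`, `CallbackB`, `ReportCallback`, and `is_overridden_sketch` are all hypothetical names, and the function is a simplification, not the library's implementation.

```python
class CallbackA:
    """Stand-in for the callback base class the trainer knows about."""

    def state_dict(self):
        return {}


class CallbackB:
    """Stand-in for a look-alike base class from a different namespace."""

    def state_dict(self):
        return {}


def is_overridden_sketch(method_name, instance, known_parents=(CallbackA,)):
    """Simplified imitation of a parent lookup followed by an override check."""
    # Find the first known base class the instance derives from.
    parent = next((p for p in known_parents if isinstance(instance, p)), None)
    if parent is None:
        # No recognized base class: same message as in the traceback above.
        raise ValueError("Expected a parent")
    # The method counts as overridden if the subclass supplies its own function.
    return getattr(type(instance), method_name) is not getattr(parent, method_name)


class GoodCallback(CallbackA):
    def state_dict(self):
        return {"seen": True}


class ReportCallback(CallbackB):  # hypothetical callback from the other namespace
    def state_dict(self):
        return {"seen": True}


if __name__ == "__main__":
    print(is_overridden_sketch("state_dict", GoodCallback()))
    try:
        is_overridden_sketch("state_dict", ReportCallback())
    except ValueError as err:
        print(f"ValueError: {err}")
```

If this reading applies, printing the module of each callback class passed to the trainer (for example `type(cb).__module__`) would show whether any of them comes from an unexpected namespace.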
