ArithmeticError: NaN detected in train loss #102
Hi @davidecarnevali,

Thank you for opening this issue. Changing the number of training epochs should not affect the training for the

Let me know if any of these suggestions worked for you or if you are still experiencing this issue.

Best,
Clément
Hi,

`OverflowError Traceback (most recent call last)
/nfs/users/dcarnevali/pyenvs/AINU/lib/python3.7/site-packages/pythae/pipelines/training.py in call(self, train_data, eval_data, callbacks)
/nfs/users/dcarnevali/pyenvs/AINU/lib/python3.7/site-packages/pythae/trainers/base_trainer/base_trainer.py in train(self, log_output_dir)
/nfs/users/dcarnevali/pyenvs/AINU/lib/python3.7/site-packages/pythae/trainers/base_trainer/base_trainer.py in train_step(self, epoch)
/nfs/users/dcarnevali/pyenvs/AINU/lib/python3.7/site-packages/pythae/trainers/base_trainer/base_trainer.py in _optimizers_step(self, model_output)
/nfs/users/dcarnevali/pyenvs/AINU/lib/python3.7/site-packages/torch/optim/optimizer.py in wrapper(*args, **kwargs)
/nfs/users/dcarnevali/pyenvs/AINU/lib/python3.7/site-packages/torch/autograd/grad_mode.py in decorate_context(*args, **kwargs)
/nfs/users/dcarnevali/pyenvs/AINU/lib/python3.7/site-packages/torch/optim/adamw.py in step(self, closure)
/nfs/users/dcarnevali/pyenvs/AINU/lib/python3.7/site-packages/torch/optim/adamw.py in adamw(params, grads, exp_avgs, exp_avg_sqs, max_exp_avg_sqs, state_steps, foreach, capturable, amsgrad, beta1, beta2, lr, weight_decay, eps, maximize)
/nfs/users/dcarnevali/pyenvs/AINU/lib/python3.7/site-packages/torch/optim/adamw.py in _single_tensor_adamw(params, grads, exp_avgs, exp_avg_sqs, max_exp_avg_sqs, state_steps, amsgrad, beta1, beta2, lr, weight_decay, eps, maximize, capturable)
OverflowError: (34, 'Numerical result out of range')`
I ran the example code for the pipeline below:
It used to work (at least with 50 epochs), but now, when I run it with 500 epochs, I already get this error in the early epochs:
I saw the same problem reported in ticket #79, and it seems to have been fixed for SVAE.
Could you please check?
Thank you
D.