
Model running produces NaN values. #4

Open
Tbabtm opened this issue Apr 18, 2024 · 5 comments

Comments


Tbabtm commented Apr 18, 2024

Hello, thank you for your work. However, I ran into a problem when using your model: training on my dataset produces NaN values. My dataset has the same format as weather.csv, but with different field values and a different number of fields. Interestingly, the same dataset trains without any issues on other models, such as the ICLR spotlight iTransformer. With your model and all parameters left unchanged, training with seq_len=96 and pred_len in [96, 192, 336] produces NaN values and fails, while training with seq_len=96 and pred_len=336 does not produce NaN values and succeeds. I believe my data is fine, so there may be a bug in your model. The specific error message is as follows:
```
Traceback (most recent call last):
  File "/data/zhangshi/jiangjun/remote/pywork/tmp/pycharm_project_431/run.py", line 112, in <module>
    exp.train(setting)
  File "/data/zhangshi/jiangjun/remote/pywork/tmp/pycharm_project_431/exp/exp_main.py", line 143, in train
    outputs, balance_loss = self.model(batch_x)
  File "/data/zhangshi/.conda/envs/jj-commonenvs/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/zhangshi/jiangjun/remote/pywork/tmp/pycharm_project_431/models/PathFormer.py", line 57, in forward
    out, aux_loss = layer(out)
  File "/data/zhangshi/.conda/envs/jj-commonenvs/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/zhangshi/jiangjun/remote/pywork/tmp/pycharm_project_431/layers/AMS.py", line 103, in forward
    gates, load = self.noisy_top_k_gating(new_x, self.training)
  File "/data/zhangshi/jiangjun/remote/pywork/tmp/pycharm_project_431/layers/AMS.py", line 94, in noisy_top_k_gating
    load = (self._prob_in_top_k(clean_logits, noisy_logits, noise_stddev, top_logits)).sum(0)
  File "/data/zhangshi/jiangjun/remote/pywork/tmp/pycharm_project_431/layers/AMS.py", line 61, in _prob_in_top_k
    prob_if_in = normal.cdf((clean_values - threshold_if_in) / noise_stddev)
  File "/data/zhangshi/.conda/envs/jj-commonenvs/lib/python3.10/site-packages/torch/distributions/normal.py", line 87, in cdf
    self._validate_sample(value)
  File "/data/zhangshi/.conda/envs/jj-commonenvs/lib/python3.10/site-packages/torch/distributions/distribution.py", line 300, in _validate_sample
    raise ValueError(
ValueError: Expected value argument (Tensor of shape (256, 4)) to be within the support (Real()) of the distribution Normal(loc: tensor([0.], device='cuda:0'), scale: tensor([1.], device='cuda:0')), but found invalid values: tensor([[nan, nan, nan, nan],
        [nan, nan, nan, nan],
        [nan, nan, nan, nan],
        ...,
        [nan, nan, nan, nan],
        [nan, nan, nan, nan],
        [nan, nan, nan, nan]], device='cuda:0', grad_fn=<DivBackward0>)
```
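For context on what the traceback is reporting: the crash is not in the CDF math itself but in sample validation. Once any gating logit upstream becomes NaN, the division feeding `normal.cdf` stays NaN, and `Normal(...)` rejects the value because NaN is outside the real support. A minimal, framework-free sketch of that failing expression (the function and argument names here are illustrative, not the actual AMS.py code):

```python
import math

def prob_in_top_k(clean_value, threshold_if_in, noise_stddev):
    """Mimic the failing line: Phi((clean - threshold) / stddev).

    Phi is the standard normal CDF, computed here via math.erf.
    A NaN anywhere upstream propagates through the division, which
    is exactly what torch's Normal(...).cdf validation rejects.
    """
    z = (clean_value - threshold_if_in) / noise_stddev
    if math.isnan(z):
        # torch raises ValueError from _validate_sample in this case
        raise ValueError("invalid value passed to normal CDF: nan")
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Healthy input: an ordinary probability.
print(prob_in_top_k(1.2, 0.5, 0.7))  # Phi(1.0) ~ 0.8413

# A NaN logit (e.g. from an exploding gating network) reproduces
# the reported ValueError.
try:
    prob_in_top_k(float("nan"), 0.5, 0.7)
except ValueError as e:
    print(e)
```

So the error surfaces at `_prob_in_top_k`, but the NaNs are produced earlier, in whatever feeds the gating logits.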

shandongpengyuyan commented:

Have you solved this problem?

wickedCuriosity commented:

I ran into this problem too. Have you solved it?


Tbabtm commented Jun 26, 2024 via email


Tbabtm commented Jun 26, 2024 via email

PengChen12 (Contributor) commented:

We have resolved this issue; you can download the new PathFormer code from the GitHub repository. Note that you need to set the batch_norm parameter to 1 when running your dataset.
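The thread does not say exactly what the `batch_norm` option does internally, but one plausible reason a normalization switch fixes this NaN is that standardizing each input window bounds the values feeding the gating network, so its logits cannot blow up on datasets with a different scale than weather.csv. A framework-free sketch of per-series standardization (an assumption about the mechanism, not the repo's actual implementation):

```python
def standardize(series, eps=1e-5):
    """Standardize one input window to zero mean, unit variance.

    Normalizing each window before it reaches the model keeps the
    gating inputs on a bounded scale regardless of the raw units of
    the dataset; eps guards against division by zero for constant
    (zero-variance) series.
    """
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series) / n
    std = (var + eps) ** 0.5
    return [(x - mean) / std for x in series]

# A raw window in arbitrary units comes out roughly zero-mean, unit-variance.
raw = [100.0, 102.0, 98.0, 101.0]
print(standardize(raw))
```

If your own dataset's columns differ wildly in scale, this kind of per-window normalization (whatever form the repo's `batch_norm=1` actually takes) is a reasonable first thing to try.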
