
Model running produces NaN values. #4

Open
Tbabtm opened this issue Apr 18, 2024 · 5 comments

Comments


Tbabtm commented Apr 18, 2024

Hello, thank you for your work. However, I ran into a problem when using your model: training on my dataset produces NaN values. My dataset has the same format as weather.csv, but with different field values and a different number of fields. Interestingly, the same dataset trains without any issues on other models, such as the ICLR spotlight iTransformer. With your model and all parameters left unchanged, training with seq_len=96 and pred_len in [96, 192, 336] produces NaN values and fails, while training with seq_len=96 and pred_len=336 does not produce NaN values and succeeds. I believe my data is fine, so there may be a bug in your model. The specific error message is as follows:
```
Traceback (most recent call last):
  File "/data/zhangshi/jiangjun/remote/pywork/tmp/pycharm_project_431/run.py", line 112, in <module>
    exp.train(setting)
  File "/data/zhangshi/jiangjun/remote/pywork/tmp/pycharm_project_431/exp/exp_main.py", line 143, in train
    outputs, balance_loss = self.model(batch_x)
  File "/data/zhangshi/.conda/envs/jj-commonenvs/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/zhangshi/jiangjun/remote/pywork/tmp/pycharm_project_431/models/PathFormer.py", line 57, in forward
    out, aux_loss = layer(out)
  File "/data/zhangshi/.conda/envs/jj-commonenvs/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/zhangshi/jiangjun/remote/pywork/tmp/pycharm_project_431/layers/AMS.py", line 103, in forward
    gates, load = self.noisy_top_k_gating(new_x, self.training)
  File "/data/zhangshi/jiangjun/remote/pywork/tmp/pycharm_project_431/layers/AMS.py", line 94, in noisy_top_k_gating
    load = (self._prob_in_top_k(clean_logits, noisy_logits, noise_stddev, top_logits)).sum(0)
  File "/data/zhangshi/jiangjun/remote/pywork/tmp/pycharm_project_431/layers/AMS.py", line 61, in _prob_in_top_k
    prob_if_in = normal.cdf((clean_values - threshold_if_in) / noise_stddev)
  File "/data/zhangshi/.conda/envs/jj-commonenvs/lib/python3.10/site-packages/torch/distributions/normal.py", line 87, in cdf
    self._validate_sample(value)
  File "/data/zhangshi/.conda/envs/jj-commonenvs/lib/python3.10/site-packages/torch/distributions/distribution.py", line 300, in _validate_sample
    raise ValueError(
ValueError: Expected value argument (Tensor of shape (256, 4)) to be within the support (Real()) of the distribution Normal(loc: tensor([0.], device='cuda:0'), scale: tensor([1.], device='cuda:0')), but found invalid values: tensor([[nan, nan, nan, nan],
        [nan, nan, nan, nan],
        [nan, nan, nan, nan],
        ...,
        [nan, nan, nan, nan],
        [nan, nan, nan, nan],
        [nan, nan, nan, nan]], device='cuda:0', grad_fn=<DivBackward0>)
```
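For context on what the traceback is reporting: the crash is not in the CDF math itself but in sample validation. Once any gating logit upstream becomes NaN, the division feeding `normal.cdf` stays NaN, and `Normal(...)` rejects the value because NaN is outside the real support. A minimal, framework-free sketch of that failing expression (the function and argument names here are illustrative, not the actual AMS.py code):

```python
import math

def prob_in_top_k(clean_value, threshold_if_in, noise_stddev):
    """Mimic the failing line: Phi((clean - threshold) / stddev).

    Phi is the standard normal CDF, computed here via math.erf.
    A NaN anywhere upstream propagates through the division, which
    is exactly what torch's Normal(...).cdf validation rejects.
    """
    z = (clean_value - threshold_if_in) / noise_stddev
    if math.isnan(z):
        # torch raises ValueError from _validate_sample in this case
        raise ValueError("invalid value passed to normal CDF: nan")
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Healthy input: an ordinary probability.
print(prob_in_top_k(1.2, 0.5, 0.7))  # Phi(1.0) ~ 0.8413

# A NaN logit (e.g. from an exploding gating network) reproduces
# the reported ValueError.
try:
    prob_in_top_k(float("nan"), 0.5, 0.7)
except ValueError as e:
    print(e)
```

So the error surfaces at `_prob_in_top_k`, but the NaNs are produced earlier, in whatever feeds the gating logits.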

shandongpengyuyan commented:

Have you solved this problem?

wickedCuriosity commented:

I ran into this problem too. Have you solved it?


Tbabtm commented Jun 26, 2024 via email


Tbabtm commented Jun 26, 2024 via email

PengChen12 (Contributor) commented:

We have resolved this issue; you can download the new PathFormer code from the GitHub repository. Note that you need to set the batch_norm parameter to 1 when running your dataset.
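The thread does not say exactly what the `batch_norm` option does internally, but one plausible reason a normalization switch fixes this NaN is that standardizing each input window bounds the values feeding the gating network, so its logits cannot blow up on datasets with a different scale than weather.csv. A framework-free sketch of per-series standardization (an assumption about the mechanism, not the repo's actual implementation):

```python
def standardize(series, eps=1e-5):
    """Standardize one input window to zero mean, unit variance.

    Normalizing each window before it reaches the model keeps the
    gating inputs on a bounded scale regardless of the raw units of
    the dataset; eps guards against division by zero for constant
    (zero-variance) series.
    """
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series) / n
    std = (var + eps) ** 0.5
    return [(x - mean) / std for x in series]

# A raw window in arbitrary units comes out roughly zero-mean, unit-variance.
raw = [100.0, 102.0, 98.0, 101.0]
print(standardize(raw))
```

If your own dataset's columns differ wildly in scale, this kind of per-window normalization (whatever form the repo's `batch_norm=1` actually takes) is a reasonable first thing to try.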
