Model running produces NaN values. #4
Comments
Have you solved this problem?
I ran into this problem too. Have you solved it?
Sorry, at the time I only wanted to reproduce the results; when the run failed, I did not look into it any further. I think it is better to email the authors.
(Sent by email in reply to the comment above, which was posted on Tuesday, June 25, 2024, at 6:27 PM.)
Sorry, I was just trying to replicate it. When I ran into problems, I didn't bother to fix them. I think it's best to email the author.
(Sent by email in reply to the question "Have you solved this problem", which was posted on Tuesday, June 18, 2024, at 3:03 PM.)
We have resolved this issue; you can download the updated Pathformer code from the GitHub repository. Note that you need to set the batch_norm parameter to 1 when running on your dataset.
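The thread does not say how batch_norm is passed to the program. As a minimal sketch, assuming it is an integer command-line argument read by the training script (the flag spelling `--batch_norm`, its type, and its default of 0 are assumptions for illustration, not confirmed by the maintainers), the setting would be parsed like this:

```python
import argparse

# Hypothetical stand-in for the argument parser of a training script.
# The flag name, type, and default below are assumptions, not the
# repository's confirmed interface.
parser = argparse.ArgumentParser(description="hypothetical training entry point")
parser.add_argument(
    "--batch_norm",
    type=int,
    default=0,
    help="set to 1 when running on a custom dataset, per the maintainers' note",
)

# Equivalent to passing "--batch_norm 1" on the command line.
args = parser.parse_args(["--batch_norm", "1"])
print(args.batch_norm)
```

If the script really does expose the parameter this way, the same value would be given on the command line alongside the usual arguments; check the updated repository for the actual option name.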
Hello, and thank you for your work. I ran into a problem when using your model: training on my own dataset produces NaN values. My dataset has the same format as weather.csv, but with different field values and a different number of fields. The same dataset trains without any issues on other models, such as iTransformer (an ICLR spotlight paper). With all other parameters unchanged, training with seq_len=96 and pred_len in [96, 192, 336] fails with NaN values, whereas training with seq_len=96 and pred_len=336 succeeds without NaN values. I believe my data is fine, so there may be a bug in the model. The full error message is as follows:
Traceback (most recent call last):
  File "/data/zhangshi/jiangjun/remote/pywork/tmp/pycharm_project_431/run.py", line 112, in <module>
    exp.train(setting)
  File "/data/zhangshi/jiangjun/remote/pywork/tmp/pycharm_project_431/exp/exp_main.py", line 143, in train
    outputs, balance_loss = self.model(batch_x)
  File "/data/zhangshi/.conda/envs/jj-commonenvs/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/zhangshi/jiangjun/remote/pywork/tmp/pycharm_project_431/models/PathFormer.py", line 57, in forward
    out, aux_loss = layer(out)
  File "/data/zhangshi/.conda/envs/jj-commonenvs/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/zhangshi/jiangjun/remote/pywork/tmp/pycharm_project_431/layers/AMS.py", line 103, in forward
    gates, load = self.noisy_top_k_gating(new_x, self.training)
  File "/data/zhangshi/jiangjun/remote/pywork/tmp/pycharm_project_431/layers/AMS.py", line 94, in noisy_top_k_gating
    load = (self._prob_in_top_k(clean_logits, noisy_logits, noise_stddev, top_logits)).sum(0)
  File "/data/zhangshi/jiangjun/remote/pywork/tmp/pycharm_project_431/layers/AMS.py", line 61, in _prob_in_top_k
    prob_if_in = normal.cdf((clean_values - threshold_if_in) / noise_stddev)
  File "/data/zhangshi/.conda/envs/jj-commonenvs/lib/python3.10/site-packages/torch/distributions/normal.py", line 87, in cdf
    self._validate_sample(value)
  File "/data/zhangshi/.conda/envs/jj-commonenvs/lib/python3.10/site-packages/torch/distributions/distribution.py", line 300, in _validate_sample
    raise ValueError(
ValueError: Expected value argument (Tensor of shape (256, 4)) to be within the support (Real()) of the distribution Normal(loc: tensor([0.], device='cuda:0'), scale: tensor([1.], device='cuda:0')), but found invalid values:
tensor([[nan, nan, nan, nan],
        [nan, nan, nan, nan],
        [nan, nan, nan, nan],
        ...,
        [nan, nan, nan, nan],
        [nan, nan, nan, nan],
        [nan, nan, nan, nan]], device='cuda:0', grad_fn=<DivBackward0>)
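The traceback shows where the NaN values were detected (the argument to normal.cdf), not where they were first introduced. One way to back up the claim that the data is fine is to scan the CSV for missing or non-finite cells and for constant columns before training. The sketch below is an illustration only: the function name `scan_numeric_columns`, the sample data, and the assumption that the timestamp column is called `date` are all hypothetical, and constant columns are flagged only because a normalization step without an epsilon term would divide by zero on them, which this thread does not establish as the cause.

```python
import csv
import io
import math
from collections import defaultdict


def scan_numeric_columns(csv_text, skip_columns=("date",)):
    """Count non-finite cells and detect constant columns in CSV text.

    Hypothetical helper: `skip_columns` assumes the timestamp column
    is named "date", which may not match your file.
    """
    finite_values = defaultdict(list)
    non_finite = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        for name, cell in row.items():
            if name is None or name in skip_columns:
                continue
            try:
                value = float(cell)
            except (TypeError, ValueError):
                value = math.nan  # empty or non-numeric cell counts as missing
            if math.isfinite(value):
                finite_values[name].append(value)
            else:
                non_finite[name] += 1
    columns = set(finite_values) | set(non_finite)
    return {
        name: {
            "non_finite": non_finite[name],
            # True when every finite value in the column is identical.
            "constant": len(set(finite_values[name])) == 1,
        }
        for name in sorted(columns)
    }


# Made-up sample in a weather.csv-like layout (values are illustrative).
sample = (
    "date,temp,pressure,flag\n"
    "2024-01-01,1.5,1000,0\n"
    "2024-01-02,nan,1001,0\n"
    "2024-01-03,2.5,,0\n"
)
report = scan_numeric_columns(sample)
for column, stats in report.items():
    print(column, stats)
```

A clean report does not rule out a model-side cause; it only removes the input data from the list of suspects.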