I have been trying to train the StarGAN v2 model on an RTX 3090 GPU, but the loss values consistently turn into NaN and the generated images are pure noise. This is puzzling, because the same code and data train without problems on a Tesla P100 GPU.
Could you shed some light on what might cause this difference between the two GPUs? Any insights or suggestions would be much appreciated, as I am quite perplexed by this behavior.
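One hypothesis worth testing, though nothing in this thread confirms it: the RTX 3090 is an Ampere card and supports TF32 math, which the Tesla P100 (Pascal) does not. PyTorch enables TF32 for cuDNN convolutions by default on Ampere, and also enabled it for matrix multiplies by default in releases 1.7 through 1.11. The minimal sketch below assumes a PyTorch training loop that logs its losses as a dict of scalars. It turns TF32 off and reports which loss goes non-finite. The helper names `disable_tf32` and `find_nonfinite` are hypothetical and not part of StarGAN v2.

```python
import math
from typing import Dict, Mapping


def disable_tf32() -> bool:
    """Force full FP32 matmuls/convolutions; returns False if PyTorch is not installed."""
    try:
        import torch
    except ImportError:
        return False
    # TF32 only takes effect on Ampere (compute capability 8.x) or newer; a P100 never uses it.
    torch.backends.cuda.matmul.allow_tf32 = False
    torch.backends.cudnn.allow_tf32 = False
    return True


def find_nonfinite(losses: Mapping[str, float]) -> Dict[str, float]:
    """Return the subset of logged losses that are NaN or +/-inf."""
    # float() also accepts 0-dim torch tensors, so the values may be tensors or plain numbers.
    return {name: float(v) for name, v in losses.items() if not math.isfinite(float(v))}
```

Call `disable_tf32()` once before the networks are built. Then pass each step's loss dict to `find_nonfinite` and stop at the first non-empty result. If the NaNs disappear with TF32 off, that would point at reduced-precision arithmetic rather than the data.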
eps696 changed the title to "NaN on RTX 3090 GPU training"
May 17, 2024
alas, i'm not an expert in GPU details, so can't guess out of the blue what may be the difference.
and that's too little info to guess anything. is it on colab or local? what python/torch versions are used? did you try other data on 3090?
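The details asked for here can be collected in one go. Below is a small Python sketch; the function name `environment_report` is hypothetical. It prints the Python, PyTorch, CUDA and cuDNN versions plus the GPU's compute capability, so running it on both machines makes the 3090/P100 comparison concrete. It assumes PyTorch is the framework in use, and it still runs, reporting fewer fields, where PyTorch or a GPU is missing.

```python
import platform


def environment_report() -> dict:
    """Collect Python/PyTorch/CUDA/GPU details for a bug report."""
    info = {"python": platform.python_version(), "os": platform.platform()}
    try:
        import torch
    except ImportError:
        info["torch"] = None  # PyTorch is not installed in this interpreter
        return info
    info["torch"] = torch.__version__
    # CUDA version PyTorch was compiled against (None for CPU-only builds)
    info["cuda_build"] = torch.version.cuda
    # cuDNN version as an integer, or None when cuDNN is unavailable
    info["cudnn"] = torch.backends.cudnn.version()
    if torch.cuda.is_available():
        idx = torch.cuda.current_device()
        info["gpu"] = torch.cuda.get_device_name(idx)
        info["compute_capability"] = torch.cuda.get_device_capability(idx)
        # TF32 flags; getattr returns None on PyTorch versions that predate them
        matmul = getattr(torch.backends.cuda, "matmul", None)
        info["tf32_matmul"] = getattr(matmul, "allow_tf32", None)
        info["tf32_cudnn"] = getattr(torch.backends.cudnn, "allow_tf32", None)
    return info


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

An RTX 3090 reports compute capability (8, 6) and needs a PyTorch build compiled against CUDA 11.1 or later. A Tesla P100 reports (6, 0).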