NaN on RTX 3090 GPU training #6

PengJihua · 2024-05-16T10:01:45Z

I have been trying to train the StarGAN v2 model on an RTX 3090 GPU, but the loss values consistently turn to NaN and the generated images consist entirely of noise. This is puzzling, since the same code and data work flawlessly on a Tesla P100 GPU.

Could you please shed some light on what might be causing these discrepancies between the two GPUs? Any insights or suggestions you could provide would be immensely appreciated, as I am quite perplexed by this behavior.

eps696 · 2024-05-17T10:42:47Z

alas, i'm not an expert in GPU details, so can't guess out of the blue what may be the difference.
and that's too little info to guess anything. is it on colab or local? what python/torch versions are used? did you try other data on 3090?
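
For reference, a minimal sketch of how the version and GPU details asked about above could be collected in one go. The helper name `collect_env_report` is hypothetical and not part of this repo. The TF32 flags are included only because TF32 is available on Ampere cards such as the 3090 and not on the P100; whether that has anything to do with the NaNs here is an assumption, not a diagnosis.

```python
import platform
import sys


def collect_env_report():
    """Return a dict describing the Python / PyTorch / GPU setup (hypothetical helper)."""
    report = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    try:
        import torch
    except ImportError:
        report["torch"] = None  # PyTorch is not installed in this environment
        return report

    report["torch"] = torch.__version__
    report["cuda_build"] = torch.version.cuda  # None for CPU-only builds
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["gpu"] = torch.cuda.get_device_name(0)
        report["compute_capability"] = torch.cuda.get_device_capability(0)
    # TF32 settings; they only affect Ampere and newer GPUs (e.g. RTX 3090)
    report["tf32_matmul"] = torch.backends.cuda.matmul.allow_tf32
    report["tf32_cudnn"] = torch.backends.cudnn.allow_tf32
    return report


if __name__ == "__main__":
    for key, value in collect_env_report().items():
        print(f"{key}: {value}")
```

Running this on both the 3090 and the P100 machines and posting the two outputs side by side would show whether the environments differ in anything besides the GPU itself.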
