This repository has been archived by the owner on Aug 3, 2021. It is now read-only.
I have been trying to train on my own data.
The dataset consists of 539278 user_ids and 1551731 items, and the data is extremely sparse.
During training, my RMSE becomes nan. Should I take the absolute value of the MSE loss?
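On the absolute-value question: a squared error is never negative, so taking the absolute value cannot change a finite loss, and `abs(nan)` is still `nan`. The sketch below assumes the loss is a masked MSE averaged over the observed (non-zero) ratings, which the `num_ratings` name in the log below suggests but does not confirm. `masked_mse` is a hypothetical helper written in plain Python rather than PyTorch. It shows one way a nan could arise on very sparse data: a batch with no observed ratings leads to a 0/0 average.

```python
import math

def masked_mse(predictions, targets):
    """Mean squared error over observed ratings only (target != 0).

    Returns (loss, num_ratings). Hypothetical helper for illustration.
    """
    sq_err = 0.0
    num_ratings = 0
    for p, t in zip(predictions, targets):
        if t != 0:  # assumption: 0 encodes "no rating"
            sq_err += (p - t) ** 2
            num_ratings += 1
    if num_ratings == 0:
        # Simulates the unguarded 0/0 that tensor arithmetic turns into nan
        return float("nan"), 0
    return sq_err / num_ratings, num_ratings

# Two observed ratings (5.0 and 1.0), one missing (0.0)
loss, n = masked_mse([4.0, 2.0, 3.0], [5.0, 0.0, 1.0])
rmse = math.sqrt(loss)

# A batch with no observed ratings at all
empty_loss, _ = masked_mse([4.0, 2.0], [0.0, 0.0])
still_nan = abs(empty_loss)  # abs() does not repair a nan
```

If this is the cause, skipping batches where `num_ratings` is zero would be a more direct fix than changing the loss.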
I am using PyTorch 0.4 and CUDA 9.0, training on a GTX 1080 Ti.
Using GPUs: [0]
Doing epoch 0 of 12
run.py:198: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
  t_loss += loss.data[0]
[0, 0] RMSE: 8.0848995
run.py:212: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
  total_epoch_loss += loss.data[0]
[0, 1000] RMSE: nan
[0, 2000] RMSE: nan
[0, 3000] RMSE: nan
[0, 4000] RMSE: nan
[0, 5000] RMSE: nan
[0, 6000] RMSE: nan
[0, 7000] RMSE: nan
[0, 8000] RMSE: nan
Total epoch 0 finished in 1966.838391304016 seconds with TRAINING RMSE loss: nan
run.py:74: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
  total_epoch_loss += loss.data[0]
run.py:75: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
  denom += num_ratings.data[0]
Epoch 0 EVALUATION LOSS: nan
Saving model to model_save/model.epoch_0
Doing epoch 1 of 12
[1, 0] RMSE: nan
[1, 1000] RMSE: nan
[1, 2000] RMSE: nan
[1, 3000] RMSE: nan
[1, 4000] RMSE: nan
[1, 5000] RMSE: nan
[1, 6000] RMSE: nan
[1, 7000] RMSE: nan
[1, 8000] RMSE: nan
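The warnings in the log are unrelated to the nan itself: they say that `loss.data[0]` should become `loss.item()`. The loss is already nan by the second report at step 1000, so a guard that stops at the first non-finite value makes the failing step much easier to find. The following is a minimal sketch of such an accumulator. `RunningLoss` is a hypothetical helper, not part of run.py, and it accepts anything with an `.item()` method (such as a 0-dim tensor) or a plain float.

```python
import math

class RunningLoss:
    """Accumulates per-step losses and fails fast on nan or inf."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, loss, step):
        # loss.item() replaces the deprecated loss.data[0] for 0-dim tensors;
        # plain Python numbers are accepted as well.
        value = loss.item() if hasattr(loss, "item") else float(loss)
        if not math.isfinite(value):
            raise FloatingPointError(
                "non-finite loss %r at step %d" % (value, step)
            )
        self.total += value
        self.count += 1
        return value

    def mean(self):
        # Average loss so far; 0.0 if nothing has been recorded
        return self.total / self.count if self.count else 0.0

tracker = RunningLoss()
tracker.update(4.0, step=0)
tracker.update(2.0, step=1)
average = tracker.mean()
```

In a training loop, `tracker.update(loss, step)` would stand in for lines such as `t_loss += loss.data[0]`, and the raised error would identify the first batch that produces a non-finite loss.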
Could you please help me out?
@okuchaiev
I have lowered the learning rate to 0.001 and switched the optimizer to Adam. The nans disappeared, but the loss won't converge.
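One possible reason the nans went away: Adam divides each update by a running estimate of the gradient magnitude, so a single very large gradient moves a weight by roughly the learning rate, whereas plain SGD scales the step with the gradient. The sketch below compares the first update of each for one scalar weight. It assumes the earlier runs used an SGD-style optimizer, which the thread does not state; the function names are hypothetical and the defaults are Adam's usual `beta1=0.9`, `beta2=0.999`, `eps=1e-8`.

```python
import math

def sgd_step(grad, lr):
    """Size of a plain SGD update: proportional to the gradient."""
    return lr * grad

def adam_first_step(grad, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """Size of the very first Adam update for a single scalar weight."""
    m = (1 - beta1) * grad          # first-moment estimate
    v = (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1)         # bias correction at t = 1
    v_hat = v / (1 - beta2)
    return lr * m_hat / (math.sqrt(v_hat) + eps)

big_grad = 1.0e6  # an exploding gradient, chosen for illustration
sgd_delta = sgd_step(big_grad, lr=0.001)
adam_delta = adam_first_step(big_grad, lr=0.001)
```

Here the SGD step is about 1000 while the Adam step stays near 0.001. This bounds the update size but says nothing about whether the loss will converge.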
The labels range from 1 to 5. My batch size is 64 and the hidden layers are 128,196,256,320.
I have tried different sets of hidden layers, and these are the ones that show some decrease in loss.
For example, with 128,128,256 the loss increased after each epoch.
I know tuning is an art, but could you please point me in the right direction?
Maybe my data is too sparse, and the batch size and hidden layers are too small?
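To put a number on "too sparse", the density of the user-item matrix can be computed from the sizes given above. The thread never reports the number of ratings, so `assumed_ratings` below is a purely hypothetical placeholder to be replaced with the real count. The batch-size estimate further assumes each input row is a dense float32 vector spanning all items.

```python
def matrix_density(num_users, num_items, num_ratings):
    """Fraction of user-item cells that hold an observed rating."""
    return num_ratings / (num_users * num_items)

NUM_USERS = 539278    # from the post
NUM_ITEMS = 1551731   # from the post
total_cells = NUM_USERS * NUM_ITEMS

# Hypothetical count: the thread does not say how many ratings exist
assumed_ratings = 10_000_000
density = matrix_density(NUM_USERS, NUM_ITEMS, assumed_ratings)

# Memory for one dense batch of 64 rows, 4 bytes per float32 value
# (assumes item-length input vectors)
dense_batch_bytes = 64 * NUM_ITEMS * 4
dense_batch_mib = dense_batch_bytes / (1024 ** 2)
```

With more than 8 x 10^11 cells, even tens of millions of ratings would fill only a tiny fraction of the matrix, and the first hidden layer of 128 units has to compress an input that is several orders of magnitude wider.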