
train1 stuck at first epoch #45

Closed
fazlekarim opened this issue Jun 12, 2018 · 2 comments
Comments

@fazlekarim

My training is stuck on the first epoch and I can't seem to figure out why.

Below is what I see when I run python train1.py. Any help would be fantastic.

[fakarim@blipp78 deep-voice-conversion]$ vim log1_1.txt
net1/cbhg/highwaynet_1/dense2/bias:0 [64] 64
net1/cbhg/highwaynet_2/dense1/kernel:0 [64, 64] 4096
net1/cbhg/highwaynet_2/dense1/bias:0 [64] 64
net1/cbhg/highwaynet_2/dense2/kernel:0 [64, 64] 4096
net1/cbhg/highwaynet_2/dense2/bias:0 [64] 64
net1/cbhg/highwaynet_3/dense1/kernel:0 [64, 64] 4096
net1/cbhg/highwaynet_3/dense1/bias:0 [64] 64
net1/cbhg/highwaynet_3/dense2/kernel:0 [64, 64] 4096
net1/cbhg/highwaynet_3/dense2/bias:0 [64] 64
net1/cbhg/gru/bidirectional_rnn/fw/gru_cell/gates/kernel:0 [128, 128] 16384
net1/cbhg/gru/bidirectional_rnn/fw/gru_cell/gates/bias:0 [128] 128
net1/cbhg/gru/bidirectional_rnn/fw/gru_cell/candidate/kernel:0 [128, 64] 8192
net1/cbhg/gru/bidirectional_rnn/fw/gru_cell/candidate/bias:0 [64] 64
net1/cbhg/gru/bidirectional_rnn/bw/gru_cell/gates/kernel:0 [128, 128] 16384
net1/cbhg/gru/bidirectional_rnn/bw/gru_cell/gates/bias:0 [128] 128
net1/cbhg/gru/bidirectional_rnn/bw/gru_cell/candidate/kernel:0 [128, 64] 8192
net1/cbhg/gru/bidirectional_rnn/bw/gru_cell/candidate/bias:0 [64] 64
net1/dense/kernel:0 [128, 61] 7808
net1/dense/bias:0 [61] 61
Total #vars=58, #param=363389 (1.39 MB assuming all float32)
[0611 19:17:35 @base.py:158] Setup callbacks graph ...
[0611 19:17:35 @summary.py:34] Maintain moving average summary of 0 tensors.
[0611 19:17:36 @base.py:174] Creating the session ...
2018-06-11 19:17:37.187985: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-06-11 19:17:41.938789: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: Tesla V100-SXM2-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:0a:00.0
totalMemory: 15.77GiB freeMemory: 15.36GiB
2018-06-11 19:17:41.938844: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-06-11 19:17:42.321558: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-06-11 19:17:42.321609: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-06-11 19:17:42.321619: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-06-11 19:17:42.322000: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15990 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0000:0a:00.0, compute capability: 7.0)
[0611 19:17:43 @base.py:182] Initializing the session ...
[0611 19:17:43 @base.py:189] Graph Finalized.
2018-06-11 19:17:45.195791: W tensorflow/core/kernels/queue_base.cc:285] _0_QueueInput/input_queue: Skipping cancelled dequeue attempt with queue not closed
[0611 19:17:45 @concurrency.py:36] Starting EnqueueThread QueueInput/input_queue ...
[0611 19:17:45 @graph.py:70] Running Op sync_variables_from_main_tower ...
[0611 19:17:45 @base.py:209] Start Epoch 1 ...
 0%|          | 0/100 [00:00<?, ?it/s]
"log1_1.txt" [noeol] 141L, 11139C

@caicaibins

Me too. Can you tell me how to fix it?

@rohitmahor

rohitmahor commented Feb 19, 2019

@fazlekarim why did you close the issue? If you found a solution, please share it with us. I have the same issue with train1.py.
