
train1 stuck at first epoch #45

Closed
fazlekarim opened this issue Jun 12, 2018 · 2 comments
Comments

@fazlekarim

My training is stuck on the first epoch and I can't seem to figure out why.

Below is what I see when I run python train1.py. Any help would be fantastic.

[fakarim@blipp78 deep-voice-conversion]$ vim log1_1.txt
net1/cbhg/highwaynet_1/dense2/bias:0 [64] 64
net1/cbhg/highwaynet_2/dense1/kernel:0 [64, 64] 4096
net1/cbhg/highwaynet_2/dense1/bias:0 [64] 64
net1/cbhg/highwaynet_2/dense2/kernel:0 [64, 64] 4096
net1/cbhg/highwaynet_2/dense2/bias:0 [64] 64
net1/cbhg/highwaynet_3/dense1/kernel:0 [64, 64] 4096
net1/cbhg/highwaynet_3/dense1/bias:0 [64] 64
net1/cbhg/highwaynet_3/dense2/kernel:0 [64, 64] 4096
net1/cbhg/highwaynet_3/dense2/bias:0 [64] 64
net1/cbhg/gru/bidirectional_rnn/fw/gru_cell/gates/kernel:0 [128, 128] 16384
net1/cbhg/gru/bidirectional_rnn/fw/gru_cell/gates/bias:0 [128] 128
net1/cbhg/gru/bidirectional_rnn/fw/gru_cell/candidate/kernel:0 [128, 64] 8192
net1/cbhg/gru/bidirectional_rnn/fw/gru_cell/candidate/bias:0 [64] 64
net1/cbhg/gru/bidirectional_rnn/bw/gru_cell/gates/kernel:0 [128, 128] 16384
net1/cbhg/gru/bidirectional_rnn/bw/gru_cell/gates/bias:0 [128] 128
net1/cbhg/gru/bidirectional_rnn/bw/gru_cell/candidate/kernel:0 [128, 64] 8192
net1/cbhg/gru/bidirectional_rnn/bw/gru_cell/candidate/bias:0 [64] 64
net1/dense/kernel:0 [128, 61] 7808
net1/dense/bias:0 [61] 61
Total #vars=58, #param=363389 (1.39 MB assuming all float32)
[0611 19:17:35 @base.py:158] Setup callbacks graph ...
[0611 19:17:35 @summary.py:34] Maintain moving average summary of 0 tensors.
[0611 19:17:36 @base.py:174] Creating the session ...
2018-06-11 19:17:37.187985: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-06-11 19:17:41.938789: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: Tesla V100-SXM2-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:0a:00.0
totalMemory: 15.77GiB freeMemory: 15.36GiB
2018-06-11 19:17:41.938844: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-06-11 19:17:42.321558: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-06-11 19:17:42.321609: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-06-11 19:17:42.321619: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-06-11 19:17:42.322000: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15990 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0000:0a:00.0, compute capability: 7.0)
[0611 19:17:43 @base.py:182] Initializing the session ...
[0611 19:17:43 @base.py:189] Graph Finalized.
2018-06-11 19:17:45.195791: W tensorflow/core/kernels/queue_base.cc:285] _0_QueueInput/input_queue: Skipping cancelled dequeue attempt with queue not closed
[0611 19:17:45 @concurrency.py:36] Starting EnqueueThread QueueInput/input_queue ...
[0611 19:17:45 @graph.py:70] Running Op sync_variables_from_main_tower ...
[0611 19:17:45 @base.py:209] Start Epoch 1 ...
 0%|          | 0/100 [00:00<?, ?it/s]
"log1_1.txt" [noeol] 141L, 11139C

@caicaibins

Me too. Can you tell me how to fix it?

@rohitmahor

rohitmahor commented Feb 19, 2019

@fazlekarim why did you close the issue? If you found a solution, please share it with us. I have the same issue with train1.py.
