train a causal implicit model using multiple GPUs #4

happyqianwei · 2019-12-01T09:46:35Z

Hi @mkocaoglu Murat, I am trying to train a causal implicit model first, using your source code. It runs fine with the default single GPU. However, when I use more GPUs, e.g.
python main.py --causal_model big_causal_graph --is_pretrain True --num_gpu 8
I get the error below. Please help! Thanks a lot!

setting up pretrain: CausalController
Traceback (most recent call last):
  File "main.py", line 89, in <module>
    trainer=get_trainer()
  File "main.py", line 76, in get_trainer
    trainer=Trainer(config,cc_config,model_config)
  File "/home/qw/projects/CausalGAN/trainer.py", line 60, in __init__
    self.cc.build_pretrain(label_queue)
  File "/home/qw/projects/CausalGAN/causal_controller/CausalController.py", line 179, in build_pretrain
    grad_cost,self.dcc_slopes=Grad_Penalty(real_inputs,fake_inputs,self.DCC,self.config)
  File "/home/qw/projects/CausalGAN/causal_controller/models.py", line 45, in Grad_Penalty
    interpolates = alpha*real_data + ((1-alpha)*fake_data)#Could do more if not fixed batch_size
  File "/home/qw/anaconda3/envs/py27/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 866, in binary_op_wrapper
    return func(x, y, name=name)
  File "/home/qw/anaconda3/envs/py27/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 1131, in _mul_dispatch
    return gen_math_ops.mul(x, y, name=name)
  File "/home/qw/anaconda3/envs/py27/lib/python2.7/site-packages/tensorflow/python/ops/gen_math_ops.py", line 5042, in mul
    "Mul", x=x, y=y, name=name)
  File "/home/qw/anaconda3/envs/py27/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/qw/anaconda3/envs/py27/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/home/qw/anaconda3/envs/py27/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/home/qw/anaconda3/envs/py27/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1792, in __init__
    control_input_ops)
  File "/home/qw/anaconda3/envs/py27/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1631, in _create_c_op
    raise ValueError(str(e))
ValueError: Dimensions must be equal, but are 16 and 128 for 'mul_1' (op: 'Mul') with input shapes: [16,1], [128,9].
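The failing line in Grad_Penalty builds WGAN-GP style interpolates, alpha*real_data + (1-alpha)*fake_data, where alpha is a per-sample weight broadcast across each row. Below is a minimal pure-Python sketch of that step, not the repository's code: it assumes real_data and fake_data are batches with the same number of rows, and it draws one alpha per row actually received, so the weights always match the batch. The function name `interpolate` and the example sizes are illustrative only. In TensorFlow 1.x the analogous change would presumably be to sample alpha with shape [tf.shape(real_data)[0], 1] instead of a fixed batch size, although whether fake_data also carries 128 rows is not confirmed in this thread.

```python
import random


def interpolate(real_data, fake_data, rng=random):
    """WGAN-GP style interpolation between matching rows of real and fake samples.

    real_data and fake_data are lists of rows (lists of floats) with identical shape.
    One alpha is drawn per row, so the number of alphas always equals the number of
    rows actually passed in (e.g. num_gpu * batch_size), never a fixed batch_size.
    """
    if len(real_data) != len(fake_data):
        raise ValueError("real and fake batches differ: %d vs %d rows"
                         % (len(real_data), len(fake_data)))
    interpolates = []
    for real_row, fake_row in zip(real_data, fake_data):
        # Equivalent to an alpha of shape [rows, 1]: one scalar shared across the row.
        alpha = rng.random()
        interpolates.append([alpha * r + (1 - alpha) * f
                             for r, f in zip(real_row, fake_row)])
    return interpolates


# Hypothetical sizes matching the error message: 8 GPUs x batch 16 = 128 rows, 9 labels.
num_gpu, batch_size, num_labels = 8, 16, 9
real = [[1.0] * num_labels for _ in range(num_gpu * batch_size)]
fake = [[0.0] * num_labels for _ in range(num_gpu * batch_size)]
out = interpolate(real, fake, random.Random(0))
print(len(out), len(out[0]))  # 128 9
```

Sizing alpha from the rows that actually arrive, rather than from a configured batch_size, is what keeps the multiplication well-defined when the input batch is larger than a single-GPU batch, as the [128,9] shape suggests it is here.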


mansourehk · 2020-02-24T22:39:40Z

Were you able to fix this?
I found that in [16,1], 16 is the batch_size, and in [128,9], 128 is num_gpu*batch_size and 9 is the number of labels.
I am facing the same problem.
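The numbers in this comment are consistent: with --num_gpu 8 and batch_size 16, the label tensor has 8 * 16 = 128 rows. TensorFlow's Mul follows NumPy-style broadcasting, comparing dimensions from the right, where each pair must either be equal or contain a 1. The trailing pair (1, 9) broadcasts, but the leading pair (16, 128) does not, hence the error. The small helper below (a hypothetical `broadcast_shape`, not part of CausalGAN or TensorFlow) reproduces that check in plain Python.

```python
def broadcast_shape(a, b):
    """Broadcast two shapes with NumPy/TensorFlow-style rules, or raise ValueError.

    Dimensions are compared from the trailing end; a pair is compatible
    if the sizes are equal or one of them is 1.
    """
    result = []
    for i in range(1, max(len(a), len(b)) + 1):
        da = a[-i] if i <= len(a) else 1
        db = b[-i] if i <= len(b) else 1
        if da != db and da != 1 and db != 1:
            # Same wording as the TensorFlow error in this issue.
            raise ValueError("Dimensions must be equal, but are %d and %d" % (da, db))
        result.append(db if da == 1 else da)
    return tuple(reversed(result))


batch_size, num_gpu, num_labels = 16, 8, 9       # values reported in this thread
alpha_shape = (batch_size, 1)                    # [16, 1]
data_shape = (num_gpu * batch_size, num_labels)  # [128, 9]

try:
    broadcast_shape(alpha_shape, data_shape)
except ValueError as e:
    print(e)  # Dimensions must be equal, but are 16 and 128

# Sizing alpha from the data's leading dimension makes the shapes compatible.
print(broadcast_shape((data_shape[0], 1), data_shape))  # (128, 9)
```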

JiweiMaster · 2021-10-16T11:20:12Z

I could not use the dataset from the download link in download.py. Could you provide a new download link for the dataset that your code was trained on?
