train a causal implicit model using multiple GPUs #4

Open
happyqianwei opened this issue Dec 1, 2019 · 2 comments

@happyqianwei

Hi @mkocaoglu Murat, I am trying to train a causal implicit model using your source code. It runs well with the default single GPU. However, when I set more GPUs, e.g.
python main.py --causal_model big_causal_graph --is_pretrain True --num_gpu 8
I get the error below. Please help! Thanks a lot!

setting up pretrain: CausalController
Traceback (most recent call last):
File "main.py", line 89, in <module>
trainer=get_trainer()
File "main.py", line 76, in get_trainer
trainer=Trainer(config,cc_config,model_config)
File "/home/qw/projects/CausalGAN/trainer.py", line 60, in __init__
self.cc.build_pretrain(label_queue)
File "/home/qw/projects/CausalGAN/causal_controller/CausalController.py", line 179, in build_pretrain
grad_cost,self.dcc_slopes=Grad_Penalty(real_inputs,fake_inputs,self.DCC,self.config)
File "/home/qw/projects/CausalGAN/causal_controller/models.py", line 45, in Grad_Penalty
interpolates = alpha*real_data + ((1-alpha)*fake_data)#Could do more if not fixed batch_size
File "/home/qw/anaconda3/envs/py27/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 866, in binary_op_wrapper
return func(x, y, name=name)
File "/home/qw/anaconda3/envs/py27/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 1131, in _mul_dispatch
return gen_math_ops.mul(x, y, name=name)
File "/home/qw/anaconda3/envs/py27/lib/python2.7/site-packages/tensorflow/python/ops/gen_math_ops.py", line 5042, in mul
"Mul", x=x, y=y, name=name)
File "/home/qw/anaconda3/envs/py27/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/qw/anaconda3/envs/py27/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/home/qw/anaconda3/envs/py27/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
op_def=op_def)
File "/home/qw/anaconda3/envs/py27/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1792, in __init__
control_input_ops)
File "/home/qw/anaconda3/envs/py27/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1631, in _create_c_op
raise ValueError(str(e))
ValueError: Dimensions must be equal, but are 16 and 128 for 'mul_1' (op: 'Mul') with input shapes: [16,1], [128,9].

@mansourehk

Were you able to fix this?
I found that in [16,1], 16 is the batch_size, while in [128,9], 128 is num_gpu*batch_size and 9 is the number of labels.
I am facing the same problem.
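
To make the mismatch concrete, here is a minimal pure-Python sketch of the broadcasting rule that the `Mul` op applies to the two shapes in the error. The helper `broadcast_shape` is hypothetical (it is not part of CausalGAN or TensorFlow), and the values 16, 8, and 9 are taken from the command and error message above. The last part assumes, without confirmation from the repository, that `alpha` in `Grad_Penalty` is created with the per-GPU `batch_size` as its first dimension; under that assumption, sizing `alpha` from the actual first dimension of the data (in TensorFlow, for instance, via `tf.shape(real_data)[0]`) would make the shapes compatible.

```python
def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise ValueError if incompatible.

    Hypothetical helper that mirrors the usual NumPy/TensorFlow rule:
    shapes are right-aligned, and each pair of dimensions must be equal
    or one of them must be 1.
    """
    ndim = max(len(shape_a), len(shape_b))
    # Pad the shorter shape with leading 1s so both have the same rank.
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for dim_a, dim_b in zip(a, b):
        if dim_a == dim_b or dim_a == 1 or dim_b == 1:
            result.append(max(dim_a, dim_b))
        else:
            raise ValueError(
                "Dimensions must be equal, but are %d and %d" % (dim_a, dim_b)
            )
    return tuple(result)


# Values taken from the command line and the error message in this issue.
batch_size, num_gpu, num_labels = 16, 8, 9

alpha_shape = (batch_size, 1)                      # [16, 1]
data_shape = (num_gpu * batch_size, num_labels)    # [128, 9]

# Reproduce the failure: 16 and 128 are neither equal nor 1.
try:
    broadcast_shape(alpha_shape, data_shape)
    mismatch = None
except ValueError as err:
    mismatch = str(err)

# Assumed fix: give alpha the same first dimension as the data.
fixed_alpha_shape = (data_shape[0], 1)             # [128, 1]
fixed_result = broadcast_shape(fixed_alpha_shape, data_shape)

print(mismatch)
print(fixed_result)
```

With a single GPU the two first dimensions are both 16, which is consistent with the report that the default configuration runs without this error.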

@JiweiMaster

I could not download the dataset using the link in download.py. Could you provide a new download link for the dataset that your code is trained on?
