Does it work with CPU too? #7

Open
andreae293 opened this issue Sep 11, 2022 · 3 comments


andreae293 commented Sep 11, 2022

Hi, has anyone ever tried to train on CPU?
I know it will be super slow, but I'm trying it for the fun of it.

I currently disabled my GPU by setting this line in image_train_stable.py:
torch.cuda.is_available = lambda: False
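(For anyone trying the same thing, a minimal sketch of what that patch does, not code from this repo: it only changes the answer that device-selection code gets when it asks whether CUDA is available; explicit .cuda() or .half() calls elsewhere are unaffected, which is presumably why the fp16 error below still shows up.)

```python
import torch

# Monkey-patch the availability check so that any code asking
# "is CUDA available?" takes its CPU fallback path.
torch.cuda.is_available = lambda: False

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)  # prints "cpu" even on a machine with a working GPU
```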

Traceback (most recent call last):
  File "scripts\image_train_stable.py", line 157, in <module>
    main()
  File "scripts\image_train_stable.py", line 85, in main
    TrainLoop(
  File "c:\users\andre\desktop\ml\glid-3\guided_diffusion\train_util.py", line 194, in run_loop
    self.run_step(batch, cond)
  File "c:\users\andre\desktop\ml\glid-3\guided_diffusion\train_util.py", line 208, in run_step
    self.forward_backward(batch, cond)
  File "c:\users\andre\desktop\ml\glid-3\guided_diffusion\train_util.py", line 236, in forward_backward
    losses = compute_losses()
  File "c:\users\andre\desktop\ml\glid-3\guided_diffusion\respace.py", line 96, in training_losses
    return super().training_losses(self._wrap_model(model), *args, **kwargs)
  File "c:\users\andre\desktop\ml\glid-3\guided_diffusion\gaussian_diffusion.py", line 1137, in training_losses
    model_output = model(x_t, self._scale_timesteps(t), **model_kwargs)
  File "c:\users\andre\desktop\ml\glid-3\guided_diffusion\respace.py", line 133, in __call__
    return self.model(x, new_ts, **kwargs)
  File "C:\Users\andre\anaconda3\envs\ldm\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "c:\users\andre\desktop\ml\glid-3\guided_diffusion\unet.py", line 880, in forward
    h = module(h, emb, context)
  File "C:\Users\andre\anaconda3\envs\ldm\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "c:\users\andre\desktop\ml\glid-3\guided_diffusion\unet.py", line 217, in forward
    x = layer(x)
  File "C:\Users\andre\anaconda3\envs\ldm\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\andre\anaconda3\envs\ldm\lib\site-packages\torch\nn\modules\conv.py", line 447, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "C:\Users\andre\anaconda3\envs\ldm\lib\site-packages\torch\nn\modules\conv.py", line 443, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: "slow_conv2d_cpu" not implemented for 'Half'

Sorry to bother you with a possibly useless question, but am I doing something wrong?
Thanks.

Edit:
Never mind. I removed both .half() calls from image_train_stable.py and deleted --use_fp16 from the training arguments.

This way I was able to train on CPU.
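(A minimal sketch of the same idea, not this repo's actual code: instead of deleting the .half() calls, the conversion can be gated on CUDA availability, so the same script still uses fp16 on GPU. The helper name maybe_half is made up for illustration.)

```python
import torch

def maybe_half(model: torch.nn.Module) -> torch.nn.Module:
    # fp16 convolutions are not implemented on CPU (the
    # "slow_conv2d_cpu" not implemented for 'Half' error above),
    # so only convert the weights when a CUDA device is present.
    if torch.cuda.is_available():
        return model.half()
    return model.float()

# Hypothetical usage, replacing an unconditional model.half():
# model = maybe_half(model)
```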

@timotheecour4

@andreae293 can you please provide more details?
I tried:

export CUDA_VISIBLE_DEVICES=-1
MODEL_FLAGS="--actual_image_size 512 --lr_warmup_steps 10000 --ema_rate 0.9999 --attention_resolutions 64,32,16 --class_cond False --diffusion_steps 1000 --image_size 64 --learn_sigma False --noise_schedule linear --num_channels 320 --num_heads 8 --num_res_blocks 2 --resblock_updown False --use_fp16 False --use_scale_shift_norm False "
TRAIN_FLAGS="--lr 5e-5 --batch_size 32 --log_interval 10 --save_interval 5000 --kl_model kl.pt --resume_checkpoint diffusion.pt"
export OPENAI_LOGDIR=./logs/
python scripts/image_train_stable.py --data_dir /path/to/image-and-text-files $MODEL_FLAGS $TRAIN_FLAGS

where I changed the default from the README from --use_fp16 True to --use_fp16 False (and, IIUC, there is no need to remove .half() from image_train_stable.py with this flag), but it gives:

RuntimeError: CUDA out of memory. Tried to allocate 4.00 GiB (GPU 0; 15.78 GiB total capacity; 10.52 GiB already allocated; 3.86 GiB free; 10.77 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

i.e., it still seems to use the GPU instead of the CPU.
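(One way to check whether the export is actually reaching the training process, a small sketch assuming it runs in the same shell session: PyTorch should report zero visible CUDA devices once CUDA_VISIBLE_DEVICES=-1 has taken effect. If it still reports a device inside the training script, the variable is not being seen by that process, or the code selects its device some other way, which would explain the CUDA out-of-memory error.)

```python
import os
import torch

# If CUDA_VISIBLE_DEVICES=-1 was exported before this process started,
# PyTorch should not see any usable CUDA device here.
print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES"))
print("cuda available:", torch.cuda.is_available())
print("device count:", torch.cuda.device_count())
```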


ghost commented Oct 24, 2022

Stable Diffusion requires CUDA to run the AI, as it is the language for communicating with the GPU and performing the necessary calculations. Using it with the CPU would require a complete rewrite or virtualization, which requires more RAM and money than it would take to go and buy a supported CUDA GPU. Although, if anyone reading this is willing, would it be possible to utilize a TPU from Kaggle or Google Colab instead? I feel like it might be more efficient than a GPU or CPU, as it is meant for processing tensors directly.

@andreae293
Author

@timotheecour4 Sorry for the late response.
If you don't have enough RAM, you have to dedicate a lot of GB to virtual memory (the paging file or swap).
I don't know the minimum required by this repo, but I dedicated 100 GB of virtual memory.
Also, if you are trying to fine-tune, I suggest you look at DreamBooth for Stable Diffusion.

@TheRealUnBot Stable Diffusion does not necessarily require CUDA-capable hardware to run.
Since it's based on PyTorch, you can run it just fine on CPU, with the downside of being ~50x slower.
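(For example, a small sketch of loading on CPU, not this repo's loading code: a checkpoint saved on GPU can be mapped onto the CPU with torch.load(..., map_location="cpu"), and the model then runs in fp32 on the CPU device. "diffusion.pt" is just the checkpoint name used earlier in this thread, and build_model is a made-up placeholder.)

```python
import torch

# Map a GPU-saved checkpoint onto the CPU.
state_dict = torch.load("diffusion.pt", map_location="cpu")

# model = build_model(...)           # hypothetical model constructor
# model.load_state_dict(state_dict)
# model = model.to("cpu").float()    # keep weights in fp32; fp16 convs are unsupported on CPU
```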
