774M Model running out of memory #24
Comments
Same here 👍
We are looking for a solution here: minimaxir/gpt-2-simple#108
I was able to get it working on a Tesla P40 GPU (24GB); it was still OOM with Adam, but switching the optimizer to …
I think there was another person who said they got it working on a 24GB GPU. The unfortunate part is that gcloud only offers a V100 at most, and that's all I and many others have access to at the moment. That's why I'm trying to find new optimizers or ways we can distribute the memory (although historically that's been hard for fully connected transformers). More here: https://github.com/dantuluri/gpt-2
Yep, it's really hard to find anything larger than 16GB, but Azure does offer a 24GB instance (ND6s, in case someone wants to try it out).
Works fine with SGD on a Titan RTX (24GB).
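Plain SGD fitting where Adam OOMs is consistent with Adam's extra per-parameter state: Adam keeps two fp32 moment tensors (m and v) for every weight, while vanilla SGD keeps none. A rough back-of-envelope sketch of just that extra state (ignoring weights, gradients, and activations; the helper name is illustrative):

```python
def optimizer_state_bytes(n_params, extra_copies, bytes_per_param=4):
    """Extra optimizer-state memory beyond the weights themselves."""
    return n_params * extra_copies * bytes_per_param

N = 774_000_000  # rough parameter count of the 774M model

adam_gib = optimizer_state_bytes(N, extra_copies=2) / 2**30  # m and v moment tensors
sgd_gib = optimizer_state_bytes(N, extra_copies=0) / 2**30   # no per-parameter state

print(f"Adam extra state: ~{adam_gib:.1f} GiB, SGD extra state: {sgd_gib:.0f} GiB")
```

That is roughly 5.8 GiB of extra fp32 state for Adam on 774M parameters, before counting weights, gradients, or activations, which helps explain why the same model fits with SGD on the same card.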
@mgrankin is there some kind of cloud provider that offers RTX cards?
@saippuakauppias I don't know of any. Nvidia discourages the use of RTX cards in datacenters.
Just FYI, I have been able to fine-tune a subset of variables under Colab with good results. My forked notebook (via Tenoke's fork) demonstrating how to do this is here: https://github.com/jkraybill/gpt-2/blob/finetuning/GPT2-finetuning2-774M.ipynb. I got the tip from an under-the-radar tweet by BasedBlue. (See comments here: minimaxir/gpt-2-simple#108 (comment))
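Fine-tuning only a subset of variables shrinks the gradient and optimizer state to just the trained parameters. A minimal sketch of the idea, filtering GPT-2-style variable names by transformer-block index (the helper name and split point are illustrative, not taken from the linked notebook):

```python
def select_finetune_vars(var_names, first_trained_block=24):
    """Keep only variables in transformer blocks >= first_trained_block.

    GPT-2 variable names look like 'model/h<idx>/...'; everything else
    (embeddings 'model/wte'/'model/wpe', final layer norm) stays frozen here.
    """
    keep = []
    for name in var_names:
        parts = name.split("/")
        if len(parts) > 1 and parts[1].startswith("h") and parts[1][1:].isdigit():
            if int(parts[1][1:]) >= first_trained_block:
                keep.append(name)
    return keep

names = ["model/wte", "model/h0/attn/c_attn/w", "model/h30/mlp/c_fc/w", "model/ln_f/g"]
print(select_finetune_vars(names))  # ['model/h30/mlp/c_fc/w']
```

In the TF1 training code you would apply the same filter to `tf.trainable_variables()` and pass the surviving variables via `opt.minimize(loss, var_list=...)`, so no gradients or Adam moments are allocated for the frozen layers.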
@saippuakauppias FWIW, I just learned today that Linode is running a paid pilot with 24GB Quadro RTXs. Haven't used them personally.
@jkraybill Thanks!
I started fine-tuning the model on a 48GB GPU today. With the Adam optimizer it is using 33GB of memory, and I have even been able to increase the batch size thanks to the extra capacity of the GPU.
@nkk0, the Adafactor optimizer + checkpoints use ~8GB of GPU RAM at batch size 1. Read all my comments in issue 108 in the minimaxir repo (link above).
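Adafactor's savings come from factoring the second-moment accumulator: for an n×m weight matrix it stores row and column statistics (n + m values) instead of Adam's two full n×m moment tensors (and with its default settings it keeps no first moment at all). A toy comparison, using a 1280×5120 matrix as an assumed stand-in for one MLP projection in the 774M model:

```python
def adam_state_values(n, m):
    return 2 * n * m  # full first- and second-moment tensors

def adafactor_state_values(n, m):
    return n + m      # factored row/column second-moment accumulators

n, m = 1280, 5120     # assumed shape of one MLP projection in the 774M model
print(adam_state_values(n, m), adafactor_state_values(n, m))  # 13107200 6400
```

Roughly a 2000x reduction in optimizer state per weight matrix, which is why Adafactor plus gradient checkpointing brings fine-tuning down into the ~8GB range reported above.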
Running on a personal machine with GPUs and everything installed. It worked well for the 345M model, but I'm running into memory issues with 774M.
I made sure memory-saving gradients were on and the batch size was just 1. Any suggestions?
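For reference, "memory-saving gradients" is gradient checkpointing: rather than keeping every layer's activations for the backward pass, only about √L checkpoints are kept and the rest are recomputed, trading roughly one extra forward pass for memory. A toy estimate of how many layer activations stay live (numbers are illustrative, not measured):

```python
import math

def live_activations(n_layers, checkpointing=False):
    """Toy count of layer activations held in memory during backprop."""
    if checkpointing:
        return math.ceil(math.sqrt(n_layers))  # ~sqrt(L) checkpoints kept
    return n_layers                            # all layer activations kept

L = 36  # transformer blocks in the 774M model
print(live_activations(L), live_activations(L, checkpointing=True))  # 36 6
```

Checkpointing only reduces activation memory, though; the optimizer state (e.g. Adam's moments) is untouched, which is why switching optimizers still matters even with memory-saving gradients on.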