4090 performance #6436
Replies: 12 comments 20 replies
-
Try a higher batch size and ask the user to test with the same one for a well-rounded comparison. Many users are doing 1 image at 512x512, and how fast it appears may be a system thing. Perform the same tests with the same sampler, on the same commit.
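To see why comparing raw it/s across different batch sizes is misleading, note that one sampler iteration advances the whole batch by one step, so images per second is the fairer metric. A minimal sketch with hypothetical numbers (the it/s figures below are made up for illustration):

```python
def throughput(its_per_sec: float, batch_size: int, steps: int = 20) -> float:
    """Images per second: a batch of N images finishes together after
    `steps` sampler iterations, however large the batch is."""
    seconds_per_batch = steps / its_per_sec
    return batch_size / seconds_per_batch

# Hypothetical: batch 1 at 25 it/s vs batch 8 at 7 it/s.
single = throughput(25.0, 1)   # 1.25 images/s
batched = throughput(7.0, 8)   # 2.8 images/s -- "slower" it/s, higher throughput
```

So a run that reports lower it/s at a larger batch can still produce images faster overall, which is why both sides of a comparison need the same batch size.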
-
Are you guys manually updating and installing the latest CUDA for this, or is it part of a git pull from Automatic1111?
-
My 4090's performance also isn't looking amazing. Depending on the batch, I might get 15-20 it/s. This is with the latest cudnn DLLs and CUDA 11.7, on Windows 10. I did the other tricks like disabling browser hardware acceleration, etc. One question I have for you @playlogitech: what are the temps like in your computer/GPU? I noticed that due to poor cooling, mine keeps bumping up against 88C. That might explain at least some of the performance lag due to throttling? I'm going to keep monitoring and comparing based on this factor. I couldn't quite figure out how to get PyTorch 2 to work, though I did find a CUDA 11.8 (Lovelace support, IIUC) wheel: https://download.pytorch.org/whl/nightly/cu118/torch-2.1.0.dev20230308%2Bcu118-cp310-cp310-win_amd64.whl
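For monitoring the throttling theory, one way is to poll `nvidia-smi --query-gpu=temperature.gpu,clocks.sm --format=csv,noheader,nounits` during a generation and flag samples near the throttle range. A sketch that parses that CSV output (the sample line and the 83C threshold are assumptions, not measured values; in real use you'd feed it live `nvidia-smi` output via `subprocess`):

```python
THROTTLE_SUSPECT_C = 83  # assumed threshold; cards start pulling clocks back somewhere in the 80s

def parse_gpu_status(csv_line: str) -> dict:
    """Parse one line of `nvidia-smi --query-gpu=temperature.gpu,clocks.sm
    --format=csv,noheader,nounits` output into temp/clock fields."""
    temp, sm_clock = (int(v.strip()) for v in csv_line.split(","))
    return {
        "temp_c": temp,
        "sm_clock_mhz": sm_clock,
        "throttle_suspect": temp >= THROTTLE_SUSPECT_C,
    }

# Hypothetical sample resembling the 88C case described above:
hot = parse_gpu_status("88, 2310")
cool = parse_gpu_status("70, 2745")
```

If the SM clock drops whenever `throttle_suspect` samples appear, throttling is a plausible explanation for the it/s gap.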
-
Are you both using --xformers? Edit: Can you also check whether your GPUs are overclocked?
-
Yea, I'm using
-
Me too, on Ubuntu 18.04, RTX 4090, libcudnn 8.7: 512x512, Euler A, 23-24.5 it/s.
-
OK, what the devil am I doing wrong here? I am on the latest drivers on Arch Linux and can't get anywhere near that many iterations. I even updated torch to a recent nightly build and am only getting <9 it/s (yes, on 512x512, Euler A, batch size 1). How can I verify which libcudnn is being utilized? My starting args are:
The driver in use is nvidia.
Sidenote: the GPU is a 4090 Suprim Liquid, so thermals can be ruled out; the fans barely even bother to turn on.
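On the "which libcudnn is being utilized" question: in a live session `torch.backends.cudnn.version()` reports the loaded version, and on Linux you can also grep the shared objects mapped into the running python process via `/proc/<pid>/maps`. A sketch of the latter approach, run here against a hypothetical maps excerpt rather than a real process:

```python
import re

def cudnn_libs(maps_text: str) -> set:
    """Extract libcudnn shared-object paths from /proc/<pid>/maps content.
    In real use: maps_text = open(f"/proc/{pid}/maps").read()"""
    return set(re.findall(r"\S*libcudnn\S*\.so[\w.]*", maps_text))

# Hypothetical /proc/self/maps excerpt from a running webui process:
sample = (
    "7f2a00000000-7f2a10000000 r-xp 00000000 08:01 123 /usr/lib/libcudnn.so.8.7.0\n"
    "7f2a20000000-7f2a30000000 r-xp 00000000 08:01 456 /usr/lib/libcudnn_ops_infer.so.8.7.0\n"
)
loaded = cudnn_libs(sample)
```

The version suffix on the mapped `.so` filename tells you which cudnn the process actually loaded, regardless of what else is installed on the system.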
-
Where do you guys download the latest cudnn DLLs?
-
Guys, if you are on the latest torch from the nightly builds, you can stop using xformers altogether and use the console argument --opt-sdp-attention instead.
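For context, `--opt-sdp-attention` switches the webui to PyTorch 2's built-in `torch.nn.functional.scaled_dot_product_attention`. As a rough illustration of what that fused kernel computes, here is the reference formula, softmax(QK^T / sqrt(d)) V, in plain Python (this is only the math, not the optimized implementation, and the tiny matrices are made up):

```python
import math

def sdp_attention(Q, K, V):
    """Reference scaled dot-product attention for 2-D lists of floats.
    The PyTorch 2 kernel computes the same result, fused and on-GPU."""
    d = len(Q[0])
    out = []
    for q in Q:
        # Scaled similarity of this query against every key.
        scores = [sum(q[i] * k[i] for i in range(d)) / math.sqrt(d) for k in K]
        # Numerically stable softmax over the scores.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted mix of the value rows.
        out.append([sum(w * v[i] for w, v in zip(weights, V)) for i in range(len(V[0]))])
    return out

result = sdp_attention([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]])
```

Both xformers and SDP attention accelerate exactly this computation; the nightly-torch route just removes the extra dependency.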
-
I am getting 5 it/s at best, usually <2... on my 4090, and that's with the new PyTorch...
-
I've followed all the steps above and have an odd observation. I can get 28-35 it/s only if my console window is visible. If it's minimized, I see a drop of roughly 10 it/s. I don't think this is an illusion; there is a noticeable speed difference when watching the generated images pop up in the webui viewing window. Also, if I do a run with the console in view and the next one minimized, the first few generated images report the same top speeds, but by around the 4th or 5th image the speed falls off to the degraded level. Attached a snapshot to show this:
I'm doing the following test:
Here is my system information. How does this look?
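To make that "falls off by the 4th or 5th image" pattern measurable rather than eyeballed, one could log the reported it/s per image and find the first one that drops more than some tolerance below the starting speed. A sketch with hypothetical numbers resembling the run described above:

```python
def find_falloff(its_per_image, tolerance=0.15):
    """Return the 1-based index of the first image whose it/s falls more than
    `tolerance` (as a fraction) below the first image's speed, or None."""
    baseline = its_per_image[0]
    for i, v in enumerate(its_per_image, start=1):
        if v < baseline * (1 - tolerance):
            return i
    return None

# Hypothetical minimized-console run: fast start, degradation by image 5.
run = [31.0, 30.5, 30.8, 28.0, 22.5, 21.0]
drop_at = find_falloff(run)  # 5
```

Comparing `drop_at` between a console-visible run and a minimized run would confirm whether the degradation really correlates with window state.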
-
Should we still be installing the CUDA 11.8 DLLs? I'm confused why that would be necessary if CUDA 11.8 is installed along with Torch 2.
-
Before the cudnn libs update I got less than 18 it/s on 512x512 Euler A; after the update, 23-25.
Today I saw a guy with 30 it/s on the same settings, and he doesn't understand why there's such a big difference.
I asked a few more people to test their 4090s and all of them have the same results as me: 23-25 it/s.
The only difference between us is the Windows version (I use W10, he is on W11), but I'm not really sure that's the reason.