-
You are out of memory and hitting swap. Running a 34B model on 16 GB of RAM will be difficult.
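As a rough back-of-the-envelope check (a sketch, assuming Q4_K_M averages about 4.5 bits per weight; the exact figure varies by tensor and this ignores KV cache and runtime overhead), you can estimate why a 34B model overflows 16 GB:

```python
def estimate_model_ram_gb(n_params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Rough memory estimate for quantized model weights:
    parameters * average bits per weight, converted to gigabytes.
    Ignores KV cache, context buffers, and loader overhead."""
    total_bytes = n_params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9

# A 34B model at ~4.5 bits/weight needs roughly 19 GB for weights alone,
# so on a 16 GB machine part of it spills to swap and everything crawls.
print(round(estimate_model_ram_gb(34), 1))  # ~19.1
```

If the estimate exceeds physical RAM (minus a few GB for the OS and the context buffers), the OS pages parts of the model to disk, which explains both the multi-minute load times and the near-zero generation speed.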
-
I used to run the Yi 34B Q4K_M model at 1.5 t/s text generation speed on my humble hardware (RTX 3060 desktop, 16 GB DDR4 RAM, Ryzen 7 3700X). Now the same model barely loads. It takes around 5 minutes to load into koboldcpp from the cmd prompt (it used to load in under 20 seconds), and text generation is also slow as snails: around one word every 3 minutes now. The smaller 8B model's load and generation times are fine (loads in under 20 seconds, 19 t/s on Q8K_M). Only the bigger model is unusable now. I don't know if I did anything wrong with the settings. I'm not a technical person, just tinkering around with LLMs. This is my first time posting on GitHub, so sorry for using layman's terms to explain my problem.