-
You are out of memory and hitting swap. Running a 34B model on 16 GB of RAM will be difficult.
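As a rough back-of-the-envelope check (a sketch, assuming Q4_K_M averages about 4.5 bits per weight; the exact figure varies by tensor and this ignores KV cache and runtime overhead), you can estimate why a 34B model overflows 16 GB:

```python
def estimate_model_ram_gb(n_params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Rough memory estimate for quantized model weights:
    parameters * average bits per weight, converted to gigabytes.
    Ignores KV cache, context buffers, and loader overhead."""
    total_bytes = n_params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9

# A 34B model at ~4.5 bits/weight needs roughly 19 GB for weights alone,
# so on a 16 GB machine part of it spills to swap and everything crawls.
print(round(estimate_model_ram_gb(34), 1))  # ~19.1
```

If the estimate exceeds physical RAM (minus a few GB for the OS and the context buffers), the OS pages parts of the model to disk, which explains both the multi-minute load times and the near-zero generation speed.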
-
I used to run the Yi 34B Q4K_M model at 1.5 t/s text generation speed on my humble hardware (RTX 3060 desktop, 16 GB DDR4 RAM, Ryzen 7 3700X). Now the same model barely loads. It takes around 5 minutes to load into koboldcpp from the cmd prompt (it used to load in under 20 seconds), and text generation is also slow as snails: around one word every 3 minutes now. The smaller 8B model's load and generation times are fine (loads in under 20 seconds, 19 t/s on Q8K_M). Only the bigger model is unusable now. I don't know if I did anything wrong with the settings. I'm not a technical person, just tinkering around with LLMs. This is my first time posting on GitHub, so sorry for using layman's terms to explain my problem.