settings for python3 -m llama_cpp.server to test using GPU? #1146
Unanswered
silvacarl2 asked this question in Q&A
What are the settings to test using one GPU, or more than one GPU, with the FastAPI server? We are going to do some speed benchmarking.
These are the steps we took:
CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir
export MODEL=$HOME/models/mistral-7b-instruct-v0.2.Q4_K_M.gguf
python3 -m llama_cpp.server --n_gpu_layers -1
However, it does not seem to be taking advantage of the GPU.
Replies: 1 comment

- Did you try setting the main_gpu param?