settings for python3 -m llama_cpp.server to test using GPU? #1146
Unanswered
silvacarl2 asked this question in Q&A
What are the settings to test using one GPU, or more than one GPU, with the FastAPI server? We are going to do some speed benchmarking.
These are the steps we took:
CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir
export MODEL=$HOME/models/mistral-7b-instruct-v0.2.Q4_K_M.gguf
python3 -m llama_cpp.server --n_gpu_layers -1
However, it does not seem to be taking advantage of the GPU.
Replies: 1 comment

- Did you try setting the main_gpu param?