Why llamafile is slower with GPU than CPU (on Windows) #629
BradHutchings started this conversation in General
From my search on the subject, someone is going to need this answer. 🤣
Background
When you run a llamafile.exe that does not have GPU support statically linked in for Windows, the executable will try to find a dynamic library in the user's `$env:USERPROFILE\.llamafile` directory. If it finds one, it will use it. If it doesn't find one, it can build one, provided you have Visual Studio (`cl.exe`) and CUDA installed. Your `.args` need to contain a `-ngl` parameter, or you need to specify one on the command line of the tool, for the executable to go looking for a GPU.
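For example, you can check whether a GPU library has already been cached and then request GPU offload explicitly. A minimal PowerShell sketch, assuming the model filename is a placeholder; `-ngl` is the offload flag mentioned above, and 999 just means "offload as many layers as possible":

```powershell
# Check whether a GPU support library has already been built/cached
# in the directory the executable searches.
Get-ChildItem $env:USERPROFILE\.llamafile

# Run the llamafile with GPU offload requested. -ngl sets the number of
# layers to offload; 999 asks for as many as the model has.
.\your-model.llamafile.exe -ngl 999
```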
Why llamafile is slower with GPU than CPU on Windows
Your GPU card has RAM dedicated to it. Windows has a thing called "shared GPU memory". This shared memory is in your main RAM. Windows (I'm assuming) shuffles data to the GPU as it's needed. This picture from Task Manager shows my situation running a 9GB model with 4GB of dedicated RAM on my GPU:
So the model doesn't fit on the GPU. And that is why inference runs slower with the GPU enabled than just running on the CPU.
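One hedged workaround that follows from this: offload only as many layers as fit in dedicated VRAM instead of the whole model. Assuming roughly even layer sizes and, for illustration only, a ~9 GB model with ~32 layers, each layer is on the order of 9 GB / 32 ≈ 280 MB, so roughly 4 GB / 280 MB ≈ 14 layers might stay in dedicated memory:

```powershell
# Offload only part of the model so the offloaded layers stay in the GPU's
# dedicated VRAM instead of spilling into Windows "shared GPU memory".
# 14 is an illustrative value for a ~9 GB model on a 4 GB card; tune it
# while watching dedicated vs. shared GPU memory in Task Manager.
.\your-model.llamafile.exe -ngl 14
```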
I look forward to everyone's clarifications and corrections. 🤣