CAUTION: Support for GPU acceleration is preliminary. There are known issues.

Generally, all backends supported by GGML are available, with a focus on the backends below.
| Backend | Target devices |
|---------|----------------|
| CUDA    | Nvidia GPU     |
| RPC     | Any            |
| Vulkan  | GPU            |
To build with Vulkan:

```sh
cmake -B build -DGGML_VULKAN=1
cmake --build build --config Release
```

To build with CUDA:

```sh
cmake -B build -DGGML_CUDA=1
cmake --build build --config Release
```
For more information, please check out Build llama.cpp locally.
Use `-ngl` (`--n_gpu_layers`) to specify the number of layers to offload to the GPU. We call everything before the first layer the "Prolog", and everything after the last layer the "Epilog". "Prolog" and "Epilog" are treated as special layers, and they can also be configured from `-ngl` by including `prolog` and `epilog` respectively.
Suppose there is a model with 10 hidden layers:

- `-ngl 5`: put the first 5 layers on the GPU;
- `-ngl 100`: put all layers on the GPU;
- `-ngl 5,prolog`: put the first 5 layers and the "Prolog" layer on the GPU;
- `-ngl 100,prolog,epilog`: put all layers, the "Prolog" layer, and the "Epilog" layer on the GPU;
- `-ngl all`: equivalent to `-ngl 99999,prolog,epilog`.
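As a minimal sketch of how this looks on the command line (the binary name `main` and the `-m` model flag are assumptions here; adjust them to your build):

```sh
# Offload the first 5 hidden layers plus the "Prolog" to the GPU
./main -m model.bin -ngl 5,prolog
```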
The full format of `-ngl` is `-ngl [id:]layer_specs[;id:layer_specs]...`, where `id` is the GPU device ID (`0` if omitted). `layer_specs` can be a positive integer, `prolog`, `epilog`, a combination of these, or just `all`.
Use `--show_devices` to check all available devices.
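For example, here is a hedged sketch of splitting a model across two GPUs, with device IDs taken from `--show_devices` (the binary and model file names are placeholders):

```sh
# List the available compute devices and their IDs
./main --show_devices

# Device 0 takes the "Prolog" plus the first 5 layers;
# device 1 takes the remaining layers plus the "Epilog".
# Quote the argument so the shell does not split it at ';'.
./main -m model.bin -ngl "0:5,prolog;1:100,epilog"
```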
Known issues:

- Custom operators (`ggml::map_custom...`): if the hidden layers of a model use custom operators, GPU acceleration is unavailable.
- Models with `tie_word_embeddings = true`: ensure the "Prolog" and "Epilog" layers are on the same device.
- Other issues: if a model has 10 hidden layers and `-ngl 10` does not work, try `-ngl all`, `-ngl 10,epilog`, or `-ngl 9`.
- Having trouble with the Python binding on Windows with CUDA? Copy these DLLs to the `bindings` folder: `cublas64_12.dll`, `cudart64_12.dll`, `cublasLt64_12.dll`.
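A minimal sketch of that copy step, assuming the CUDA Toolkit installer has set the `CUDA_PATH` environment variable and you are in a Unix-like shell such as Git Bash (paths are assumptions; adjust for your installation):

```sh
# Copy the CUDA runtime DLLs next to the Python binding
cp "$CUDA_PATH/bin/cublas64_12.dll"   bindings/
cp "$CUDA_PATH/bin/cudart64_12.dll"   bindings/
cp "$CUDA_PATH/bin/cublasLt64_12.dll" bindings/
```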