Threads and asynchronous calls
About threading in Faiss.
Faiss CPU and GPU indices are not thread-safe with respect to multiple user calling threads. Any multithreaded use of a single Faiss index needs to implement mutual exclusion.
Different CPU Faiss indices holding different data can be used safely from different threads, but this is usually counterproductive because of thread oversubscription (see below). Multi-GPU Faiss does internally run different GPU indices from different threads.
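A minimal sketch of the mutual exclusion this requires: one lock serializes all calls into a single index. The names here are illustrative, and `_StubIndex` stands in for a real Faiss index so the sketch is self-contained.

```python
import threading

# One lock per index: Faiss indexes are not thread-safe with
# respect to multiple user calling threads.
index_lock = threading.Lock()
results = []

class _StubIndex:
    # Stand-in for a real Faiss index in this sketch.
    def search(self, query, k):
        return (query, k)

index = _StubIndex()

def safe_search(query, k):
    with index_lock:  # mutual exclusion around every index call
        results.append(index.search(query, k))

threads = [threading.Thread(target=safe_search, args=(i, 10))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The same lock must also guard `add` and `train` calls on that index, not only `search`.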
The `StandardGpuResources` object for GPU Faiss is not thread-safe. A separate `StandardGpuResources` object must be created for each thread that is actively running a GPU Faiss index. Multiple GPU indices managed by a single CPU thread can share the same `StandardGpuResources` (and indeed should, as they can then use the same temporary regions of GPU memory). A single `GpuResources` object can support multiple devices, but only from a single calling CPU thread.
Faiss itself is internally threaded in a couple of different ways. For CPU Faiss, the three basic operations on indexes (training, adding, searching) are internally multithreaded. Threading is done through OpenMP and a multithreaded BLAS implementation, typically MKL. Faiss does not set the number of threads. The caller can adjust this number via the environment variable `OMP_NUM_THREADS`, or at any time by calling `omp_set_num_threads(10)`. This function is available in Python as `faiss.omp_set_num_threads`.
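As a sketch, the thread count can be pinned either through the environment (before the OpenMP runtime is first loaded) or from Python at any time; the `faiss` call is shown commented out since it assumes Faiss is installed.

```python
import os

# Option 1: set the environment variable before Faiss (and its
# OpenMP runtime) is first imported in this process.
os.environ["OMP_NUM_THREADS"] = "4"

# Option 2: adjust at any time once Faiss is loaded:
# import faiss
# faiss.omp_set_num_threads(4)
```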
For the `add` and `search` functions, threading is over the vectors. This means that querying or adding a single vector is not multi-threaded, or only partially so.
GPU Faiss for a single GPU is not internally multi-CPU threaded.
It can be useful to perform an `Index` search operation in parallel with some other computation, including:
- single-thread computations
- waiting for I/O
- GPU computations
This way, the program runs in parallel. For CPU Faiss, it is not useful to parallelize with other multithreaded computations (e.g. other searches), because this will spawn too many threads and degrade overall performance. Multiple incoming searches from potentially different user threads should be enqueued and batched by the user before being handed to Faiss.
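A minimal batching sketch, using only the standard library: user threads enqueue single queries, and a dispatcher drains them into one batch that would then be passed to Faiss in a single `search` call (the Faiss call itself is elided here; `enqueue_query` and `drain_batch` are illustrative names, not Faiss API).

```python
import queue

# Queries arriving from different user threads accumulate here.
pending = queue.Queue()

def enqueue_query(vec):
    pending.put(vec)

def drain_batch(max_size=32):
    # Collect whatever queries have accumulated into one batch;
    # this list would become the (n, d) matrix handed to one
    # internally multithreaded index.search(batch, k) call.
    batch = []
    while len(batch) < max_size:
        try:
            batch.append(pending.get_nowait())
        except queue.Empty:
            break
    return batch

for i in range(5):
    enqueue_query([float(i)])
batch = drain_batch()
```

Since Faiss threads over the query vectors, one call on the aggregated batch uses the cores far better than five concurrent single-vector searches.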
It is of course possible and useful to run operations in parallel on multiple GPUs, where each CPU thread is dedicated to kernel launches on a different GPU; this is how `IndexProxy` and `IndexShards` are implemented.
How to spawn the search thread:

- in C++: with e.g. `pthread_create` + `pthread_join`
- in Python: with e.g. `thread.start_new_thread` + a lock, or with `multiprocessing.dummy.Pool`. The `search`, `add` and `train` functions release the Global Interpreter Lock.
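As a minimal sketch of the Python route: `multiprocessing.dummy.Pool` exposes the `multiprocessing` API over threads rather than processes. `search_task` below is a hypothetical stand-in for a real Faiss call; because Faiss releases the GIL in `search`, the pool threads would genuinely run in parallel.

```python
from multiprocessing.dummy import Pool  # thread pool, not processes

def search_task(batch_id):
    # Stand-in for something like: index.search(queries[batch_id], k)
    return batch_id * 2

with Pool(4) as pool:
    # map() distributes the tasks over 4 threads and
    # preserves input order in the result list.
    results = pool.map(search_task, range(8))
```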
Faiss supports multiprocessing to some extent via the `IndexIVFPQCompact` object. An `IndexIVFPQCompact` object is constructed from an `IndexIVFPQ` object, which is then frozen. After it is stored, it can be memory-mapped so that there is no penalty for accessing it from distinct processes.