Threads and asynchronous calls

About threading in Faiss.

Internal threading

The three basic operations on indexes (training, adding, searching) are multithreaded. Threading is done through OpenMP and a multithreaded BLAS implementation, typically MKL. Faiss does not set the number of threads itself. The caller can adjust it via the environment variable OMP_NUM_THREADS, or at any time by calling omp_set_num_threads(10). This function is available in Python as faiss.omp_set_num_threads.
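The two ways of setting the thread count can be sketched as follows. This is a minimal illustration, not a recommended configuration: the value 4 is arbitrary, and the guarded import is only there so the snippet still runs on a machine without Faiss installed.

```python
import os

NUM_THREADS = 4  # arbitrary illustrative value, not a recommendation

# Option 1: environment variable. OpenMP reads it when its runtime starts,
# so set it before importing faiss (or export it in the shell instead).
os.environ["OMP_NUM_THREADS"] = str(NUM_THREADS)

try:
    import faiss
except ImportError:  # keeps the sketch runnable where Faiss is not installed
    faiss = None

if faiss is not None:
    # Option 2: change the thread count at any point while the program runs.
    faiss.omp_set_num_threads(NUM_THREADS)
    print("OpenMP threads:", faiss.omp_get_max_threads())
```

Setting the variable in the shell (OMP_NUM_THREADS=4 python script.py) has the same effect as option 1 and avoids any dependence on import order.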

For the add and search functions, threading is over the vectors. This means that querying or adding a single vector is not multithreaded, or only partially so.

Asynchronous search

It can be useful to perform an Index search operation in parallel with some other computation, such as:

single thread computations
waiting for I/O
GPU computations

This way, the search overlaps with the other work instead of running after it. It is not useful to parallelize with other multithreaded computations (e.g. other searches), because this will spawn too many threads and degrade overall performance. It is of course possible and useful to run operations in parallel on multiple GPUs.

How to spawn the search thread:

in C++: with e.g. pthread_create + pthread_join
in Python: with e.g. thread.start_new_thread (_thread.start_new_thread in Python 3) + a lock, or with multiprocessing.dummy.Pool. The search, add and train functions release the Global Interpreter Lock.
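The Python option based on multiprocessing.dummy.Pool could look roughly like the sketch below. The helper names search_in_background and overlap are hypothetical and not part of Faiss; the code only assumes that index exposes a search(queries, k) method returning a (distances, labels) pair, as Faiss indexes do.

```python
from multiprocessing.dummy import Pool  # thread-based pool with the Pool API


def search_in_background(pool, index, queries, k):
    """Start index.search(queries, k) on a worker thread and return at once.

    Calling .get() on the returned AsyncResult blocks until the search is done.
    """
    return pool.apply_async(index.search, (queries, k))


def overlap(index, queries, k, other_work):
    """Run one search concurrently with a single-threaded callable."""
    # One worker is enough: the search is already multithreaded internally.
    with Pool(1) as pool:
        pending = search_in_background(pool, index, queries, k)
        side_result = other_work()          # runs on the calling thread meanwhile
        distances, labels = pending.get()   # wait for the search to finish
    return distances, labels, side_result
```

With a real index this might be called as overlap(index, xq, 10, load_next_batch), where xq is the query matrix and load_next_batch is a hypothetical function that reads the next batch from disk. Because the search releases the Global Interpreter Lock, the calling thread can make progress while the search runs.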

Multiprocessing

Faiss supports multiprocessing to some extent via the IndexIVFPQCompact object. An IndexIVFPQCompact object is constructed from an IndexIVFPQ object, which is then frozen. Once it is stored, it can be memory-mapped, so there is no penalty for accessing it from distinct processes.