IVF-PQ: low-precision coarse search #715

achirkin · 2025-02-21T08:15:44Z

Enable a low-precision (half / int8) element type for the cuBLAS GEMM performed during the coarse search (selecting the clusters to probe). This lets cuBLAS use tensor cores and thus speeds up the coarse search.
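The following pure-Python sketch shows which part of the coarse search a low-precision GEMM would replace. It is not the cuVS implementation. `quantize_int8` and `select_clusters` are hypothetical names, and the sketch makes three further assumptions: L2 distance, symmetric per-tensor int8 quantization, and int8 only (half is not shown). Clusters are ranked by ||c||^2 - 2<q, c>. This gives the same order as ||q - c||^2 because ||q||^2 is constant for a given query. The dot products <q, c> are the part a GEMM would compute on tensor cores; here they are plain integer sums.

```python
import heapq


def quantize_int8(values):
    """Symmetric per-tensor int8 quantization: v ~= scale * q with q in [-127, 127]."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def select_clusters(query, centers, n_probes, low_precision=False):
    """Return the indices of the n_probes centers closest to `query` (L2).

    Clusters are ranked by ||c||^2 - 2 * <q, c>; ||q||^2 is the same for every
    cluster, so it does not change the order. Only the dot products (the part a
    GEMM computes) are done in int8 when low_precision=True.
    """
    dim = len(query)
    norms = [sum(x * x for x in c) for c in centers]  # kept in full precision
    if low_precision:
        q8, q_scale = quantize_int8(query)
        c8, c_scale = quantize_int8([x for c in centers for x in c])
        dots = []
        for i in range(len(centers)):
            row = c8[i * dim:(i + 1) * dim]
            acc = sum(a * b for a, b in zip(q8, row))  # exact integer accumulation
            dots.append(q_scale * c_scale * acc)       # rescale back to float
    else:
        dots = [sum(a * b for a, b in zip(query, c)) for c in centers]
    scores = [n - 2.0 * d for n, d in zip(norms, dots)]
    return heapq.nsmallest(n_probes, range(len(centers)), key=scores.__getitem__)


centers = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [-1, -1, 0, 0]]
query = [0.9, 0.2, 0.1, 0.0]
print(select_clusters(query, centers, n_probes=2))                      # [0, 1]
print(select_clusters(query, centers, n_probes=2, low_precision=True))  # [0, 1]
```

With well-separated centers, the int8 path selects the same clusters as the full-precision path. With closely spaced centers, rounding error can reorder near-ties. That is the accuracy cost of a low-precision coarse search.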

Also turn the kMaxQueries compile-time constant into a runtime search parameter. This can improve GPU utilization for extremely large batch sizes, for example when IVF-PQ is used to construct a nearest-neighbor graph for the whole dataset.
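A runtime limit on the number of queries per batch amounts to chunking the query set. The sketch below illustrates this under stated assumptions and is not the cuVS API. `max_queries`, `plan_batches`, and `search_in_batches` are hypothetical names. The sketch also assumes that one batch's coarse-search distance buffer holds max_queries x n_lists values. It shows the trade-off: a larger limit gives fewer, larger batches, which can keep the GPU busier for huge query sets, but it needs a bigger temporary buffer.

```python
import math


def plan_batches(n_queries, max_queries, n_lists, bytes_per_value=4):
    """Return (number of batches, bytes of one batch's coarse distance matrix)."""
    if max_queries <= 0:
        raise ValueError("max_queries must be positive")
    batch_size = min(n_queries, max_queries)
    n_batches = math.ceil(n_queries / max_queries)
    return n_batches, batch_size * n_lists * bytes_per_value


def search_in_batches(queries, search_batch, max_queries):
    """Run `search_batch` on consecutive chunks of at most max_queries queries."""
    if max_queries <= 0:
        raise ValueError("max_queries must be positive")
    results = []
    for start in range(0, len(queries), max_queries):
        results.extend(search_batch(queries[start:start + max_queries]))
    return results


# Illustrative sizes only: 10M queries (e.g. a kNN-graph build) and 10k clusters.
print(plan_batches(10_000_000, 4_096, 10_000))   # (2442, 163840000)
print(plan_batches(10_000_000, 65_536, 10_000))  # (153, 2621440000)
```

Making the limit a runtime value lets a caller choose a large batch for an all-vs-all graph build and a smaller one when memory is tight, without recompiling.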

IVF-PQ: low-precision coarse search

8c3b0aa

achirkin added the feature request and non-breaking labels Feb 21, 2025

achirkin self-assigned this Feb 21, 2025

achirkin requested review from a team as code owners February 21, 2025 08:15

github-actions bot added cpp CMake labels Feb 21, 2025

Merge branch 'branch-25.04' into fea-ivf-pq-low-precision-coarse-search

7e239a4


