Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IndexIVFPQFastScan crashes with certain nlist values #4089

Open
alisafaya opened this issue Dec 14, 2024 · 4 comments · May be fixed by #4095
Open

IndexIVFPQFastScan crashes with certain nlist values #4089

alisafaya opened this issue Dec 14, 2024 · 4 comments · May be fixed by #4095
Assignees

Comments

@alisafaya
Copy link

alisafaya commented Dec 14, 2024

IndexIVFPQFastScan crashes with certain nlist values

The IndexIVFPQFastScan index in the Faiss similarity search library crashes when the nlist parameter (number of Voronoi cells) is not byte-aligned (e.g. 256, 65536). This occurs when calling either index.reconstruct(i) or index.reconstruct_batch(ids) after adding vectors.

Platform

  • OS: MacOS
  • Installed from: Anaconda
  • Running on: CPU
  • Interface: Python

Reproduction Instructions

import numpy as np
import faiss

faiss.omp_set_num_threads(16)

dim = 64  # Dimensionality of vectors 
train_vectors = np.random.random((100000, dim)).astype("float32")

def run_example(nlist: int):
    # Create an IVF index
    index = faiss.IndexIVFPQFastScan(faiss.IndexFlatL2(dim), dim, nlist, dim // 2, 4, faiss.METRIC_L2) 
    index.make_direct_map(True)
    index.train(train_vectors)
    index.add(train_vectors)
    
    print(index.reconstruct(24545))
    print(index.reconstruct_batch(np.arange(10)))

run_example(256) # This works fine
run_example(100) # Crashes with error below whenever `nlist` is not byte-aligned

Error Message

libc++abi: terminating due to uncaught exception of type faiss::FaissException: Error in faiss::idx_t faiss::Level1Quantizer::decode_listno(const uint8_t *) const at 
/Users/runner/miniconda3/conda-bld/faiss-pkg_1728491206806/work/faiss/IndexIVF.cpp:149: 
Error: 'list_no >= 0 && list_no < nlist' failed

Note: If you set by_residual=True in the IndexIVFPQFastScan constructor, the error only occurs for reconstruct_batch(), not reconstruct().

Potential Reasons

The issue seems to stem from the values in codes in IndexIVFPQFastScan::sa_decode(): here.

The potential reason is that these codes are unpacked incorrectly. Which makes the list_no value read from code invalid when nlist is not byte-aligned, causing the bounds check 'list_no >= 0 && list_no < nlist' to fail.

Additional note: The call to decode_listno is unnecessary when by_residual is false.

@alisafaya
Copy link
Author

Moreover, I looked into reconstruction using IndexIVFPQFastScan. It seems like the reconstruction is not correct.

import faiss
import numpy as np

n = 100
dim = 64
nlist = 256
train_vectors = np.random.random((100000, dim)).astype("float32")
ind = np.arange(n)

def run_example(IndexClass):
    print("##", IndexClass.__name__)

    # Create an IVF index
    index = IndexClass(faiss.IndexFlatL2(dim), dim, nlist, dim // 2, 4, faiss.METRIC_L2)
    index.make_direct_map(True)
    index.train(train_vectors)
    index.add(train_vectors)

    print(
        "Is reconstruction correct?",
        (index.reconstruct_batch(ind) == index.sa_decode(index.sa_encode(train_vectors[ind]))).all()
    )

run_example(faiss.IndexIVFPQ)
run_example(faiss.IndexIVFPQFastScan)

Output:

## IndexIVFPQ
Is reconstruction correct? True
## IndexIVFPQFastScan
Is reconstruction correct? False

This might be related to the issue.

@mdouze
Copy link
Contributor

mdouze commented Dec 16, 2024

This is probably because the quantizer gets deallocated, it is missing in

https://github.com/facebookresearch/faiss/blob/main/faiss/python/__init__.py#L175

for IndexIVFFastScan. Could you try

def run_example(IndexClass):
    # Create an IVF index
    quantizer = faiss.IndexFlatL2(dim)
    index = IndexClass(quantizer, dim, nlist, dim // 2, 4, faiss.METRIC_L2)
    index.make_direct_map(True)
    ....

@alisafaya
Copy link
Author

alisafaya commented Dec 16, 2024

Unfortunately, still the same error.

Edit: I believe this is more related to how the codes are extracted in the function: IndexIVFFastScan::reconstruct_from_offset

@alisafaya
Copy link
Author

alisafaya commented Dec 16, 2024

I tried to dig into it by reading the codes, and running a few experiments. I believe the main bug here is the following:

void IndexIVFFastScan::reconstruct_from_offset(
        int64_t list_no,
        int64_t offset,
        float* recons) const {
    // unpack codes
    InvertedLists::ScopedCodes list_codes(invlists, list_no);
    std::vector<uint8_t> code(code_size, 0);
    BitstringWriter bsw(code.data(), code_size);

    // Possible fix: we should start by writing the encoded `list_no` to the `code` using `bsw`

    for (size_t m = 0; m < M; m++) {
        uint8_t c =
                pq4_get_packed_element(list_codes.get(), bbs, M2, offset, m);
        bsw.write(c, nbits);
    }
    sa_decode(1, code.data(), recons);

    // add centroid to it
    if (by_residual) {
        std::vector<float> centroid(d);
        quantizer->reconstruct(list_no, centroid.data());
        for (int i = 0; i < d; ++i) {
            recons[i] += centroid[i];
        }
    }
}

We have two different ways to fix this.

The first:

  • write the list_no to the code before sending it to sa_decode function.
  • remove the residual computation since it is repeated in both functions.
void IndexIVFFastScan::reconstruct_from_offset(
        int64_t list_no,
        int64_t offset,
        float* recons) const {
    InvertedLists::ScopedCodes list_codes(invlists, list_no);
    std::vector<uint8_t> code(code_size, 0);

    // Encode the list_no into the `code` for compatibility with the `sa_decode`
    encode_listno(list_no, code.data());
    // Shift the `code` to the right by `coarse_code_size()` bytes
    BitstringWriter bsw(code.data() + coarse_code_size(), code_size);

    for (size_t m = 0; m < M; m++) {
        uint8_t c =
                pq4_get_packed_element(list_codes.get(), bbs, M2, offset, m);
        bsw.write(c, nbits);
    }

    sa_decode(1, code.data(), recons);
}

The second solution is to implement the decoding in reconstruct_from_offset function instead of sa_decode. However here we do not have access to the pq.

I am not sure what is better for faiss.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants