Support for matryoshka indexing #131

Open
npip99 opened this issue Sep 5, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@npip99

npip99 commented Sep 5, 2024

CREATE INDEX ix_chunk_embedding
ON chunk USING diskann (embedding) WITH (num_dimensions=1999);
NOTICE:  Starting index build. num_neighbors=-1 search_list_size=100, max_alpha=1.2, storage_layout=SbqCompression
ERROR:  assertion failed: dimensions > 0 && dimensions < 2000

The error above is a bit of a shame.

If my vector is a Vector(3072), it would be nice to support Matryoshka embeddings by allowing the index dimension to be < 2000 even when the source vector has more dimensions. I believe the SQL above should execute successfully, since I'm only indexing a subvector of the original vector.
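For context, a Matryoshka-trained embedding model is built so that a leading slice of the vector (here, the first 1999 of 3072 components) is itself a usable, lower-resolution embedding. A minimal Python sketch of comparing vectors on such a prefix, assuming cosine distance; `prefix` and `cosine_distance` are illustrative helpers, not pgvectorscale functions:

```python
import math


def prefix(vec: list[float], dims: int) -> list[float]:
    """Return the leading `dims` components: the Matryoshka sub-embedding."""
    if not 0 < dims <= len(vec):
        raise ValueError(f"dims must be in 1..{len(vec)}, got {dims}")
    return vec[:dims]


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity. Scale-invariant, so a truncated prefix
    does not need re-normalizing before this comparison."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical 3072-dim vectors. Under the behavior requested in this issue,
# an index built WITH (num_dimensions=1999) would only compare 1999-dim
# prefixes, which stays under the 2000-dimension limit in the assertion.
stored = [math.sin(i + 1) for i in range(3072)]
query = [math.cos(i + 1) for i in range(3072)]
print(len(stored), len(prefix(stored, 1999)))  # 3072 1999
print(cosine_distance(prefix(query, 1999), prefix(stored, 1999)))
```

With cosine distance the prefix can be used as-is; for L2 or inner-product distance on unit-normalized embeddings, the prefix would typically be re-normalized first, since truncation shrinks its norm.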

For now, I use a generated column computed from my desired subvector, but this takes up physical space on disk when ideally it would be computed on the fly. It also means I have to rerank by the full vector manually rather than having the index handle it automatically (not a big deal).
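The workaround above is a two-stage search: shortlist by the subvector, then rerank by the full vector. A minimal in-memory Python sketch of that pattern, assuming cosine distance; `search_with_rerank`, its `candidates` oversampling parameter, and the toy rows are hypothetical, and stage 1 is an exact scan standing in for the approximate diskann index on the generated column:

```python
import heapq
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def search_with_rerank(
    rows: list[tuple[str, list[float]]],
    query: list[float],
    dims: int,
    k: int,
    candidates: int,
) -> list[str]:
    """Return ids of the top-k rows by full-vector cosine distance, considering
    only the `candidates` rows closest on the first `dims` components."""
    q_prefix = query[:dims]
    # Stage 1: shortlist by prefix distance (exact scan here, standing in for
    # the approximate index on the generated subvector column).
    shortlist = heapq.nsmallest(
        candidates, rows, key=lambda row: cosine_distance(row[1][:dims], q_prefix)
    )
    # Stage 2: rerank the shortlist by distance on the full vectors.
    reranked = sorted(shortlist, key=lambda row: cosine_distance(row[1], query))
    return [row_id for row_id, _ in reranked[:k]]


rows = [
    ("a", [0.9, 0.3, 0.0, 1.0]),
    ("b", [1.0, 0.0, 1.0, 0.0]),
    ("c", [0.0, 1.0, 0.0, 0.0]),
]
query = [1.0, 0.0, 0.0, 1.0]
print(search_with_rerank(rows, query, dims=2, k=1, candidates=1))  # ['b']
print(search_with_rerank(rows, query, dims=2, k=1, candidates=2))  # ['a']
```

Fetching more candidates than `k` is what lets the full-vector pass correct ranking mistakes made on the prefix: with a single candidate the prefix alone picks `b`, while a shortlist of two lets the rerank surface `a`.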

If it could support, e.g., this notation, the num_dimensions attribute would no longer be necessary, and both problems would be solved (though supporting that notation might be overkill; I'm not sure).

@cevian
Collaborator

cevian commented Sep 23, 2024

Oh yeah, this seems to be something we overlooked.

@cevian cevian added the bug Something isn't working label Sep 23, 2024
@theodufort

Getting the same error when running:

create index title_embedding_idx on books_parted using diskann (title_embedding) WITH (num_dimensions=768);
Error: SQL Error [XX000]: ERROR: assertion failed: dimensions > 0 && dimensions < 2000
