Question about memory_optimized storage layout #122

Open
agandra30 opened this issue Aug 21, 2024 · 3 comments

agandra30 commented Aug 21, 2024

I need some input. I hit an error even after setting storage_layout='plain'. My understanding is that with the plain layout the index should not use SBQ or set num_bits_per_dimension=2.

My dataset is Cohere, with 768-dimensional vectors.
Python 3.11
PostgreSQL 16
    Name     | Version |   Schema   | Description
-------------+---------+------------+----------------------------------------------------------------------------------------------
 plpgsql     | 1.0     | pg_catalog | PL/pgSQL procedural language
 vector      | 0.7.4   | public     | vector data type and ivfflat and hnsw access methods
 vectors     | 0.3.0   | vectors    | vectors: Vector database plugin for Postgres, written in Rust, specifically designed for LLM
 vectorscale | 0.3.0   | public     | pgvectorscale: Advanced indexing for vector data

psycopg.errors.InternalError_: SBQ with more than 1 bit per dimension is only supported with the memory_optimized storage layout.

Is it required to set num_bits_per_dimension and to use only storage_layout='memory_optimized'?

Thank you in advance

cevian (Collaborator) commented Aug 22, 2024

@agandra30 this is indeed a bug. You can get around it by setting num_bits_per_dimension=1 explicitly when using storage_layout='plain'. I'll submit a PR to fix this soon.
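
A minimal sketch of that workaround, assuming the table, column, and index names from the CREATE INDEX statement quoted elsewhere in this thread. It has not been run against the reporter's database, and it only addresses the SBQ error, not build time:

```sql
-- Workaround sketch: pin the bit width explicitly when using the plain layout,
-- so the index build does not fall back to a value the plain layout rejects.
-- Index, table, and column names are borrowed from this thread.
CREATE INDEX IF NOT EXISTS pgvectorscale_index
    ON public.pg_vectorscale_collection
    USING diskann (embedding)
    WITH (storage_layout = 'plain', num_bits_per_dimension = 1);
```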

agandra30 (Author) commented Aug 22, 2024

@cevian, thanks for addressing the problem.

My observation is that even after setting num_bits_per_dimension=1 there is not much progress: the index creation just gets stuck and shows no progress for hours.

I also tried storage_layout='memory_optimized'. That also fails to create the index and hangs for a very long time.

I'm not sure whether any optimizations need to be set on the DB side. I'm concerned that this could be a blocker for production use cases.

pgrustscale=# SELECT COUNT(*) FROM pg_vectorscale_collection;
  count  
---------
 1000000
(1 row)

pgrustscale=# CREATE INDEX IF NOT EXISTS  "pgvectorscale_index"  ON public. "pg_vectorscale_collection"  
            USING  "diskann"  (embedding  "vector_cosine_ops" )
             WITH ( "storage_layout" = "memory_optimized", "num_neighbors" = "50", "search_list_size" = "100", "max_alpha" = "1.2", "num_bits_per_dimension" = "1"   );
NOTICE:  Starting index build. num_neighbors=50 search_list_size=100, max_alpha=1.2, storage_layout=SbqCompression

cevian (Collaborator) commented Aug 22, 2024

A million vectors can take a while to index.

If you do SET client_min_messages = DEBUG1; before the create index statement you should see progress information.

Also, I would replace USING "diskann" (embedding "vector_cosine_ops" ) with USING "diskann" (embedding). I never use the ops in the statement, and I don't know if it messes anything up.
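
Put together, the two suggestions above might look like the following sketch. The index options are copied from the statement earlier in this thread, and nothing here is claimed to make the build faster; it only surfaces progress messages and drops the explicit operator class:

```sql
-- Emit DEBUG1-level messages to the client for this session,
-- so the index build reports progress as it runs.
SET client_min_messages = DEBUG1;

-- Same index as before, without the explicit operator class.
CREATE INDEX IF NOT EXISTS pgvectorscale_index
    ON public.pg_vectorscale_collection
    USING diskann (embedding)
    WITH (
        storage_layout = 'memory_optimized',
        num_neighbors = 50,
        search_list_size = 100,
        max_alpha = 1.2,
        num_bits_per_dimension = 1
    );
```

The setting lasts only for the current session; RESET client_min_messages; restores the previous level afterwards.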
