Create Dataclasses and Builder for GPU Index Build Config #16

Open
wants to merge 1 commit into
base: main
from

Conversation

Rajrahane
Member

Description

Implements dataclasses for the GPU index config that is passed to the build GPU index function.
Implements Builder and Director classes so that a user can initialize the config with partial params.

Issues Resolved

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@Rajrahane
Member Author

Still a WIP.
Here's how you can initialize the config:

def create_index_config(**kwargs) -> GPUIndexBuildConfig:
    builder = IndexConfigBuilder()
    director = IndexConfigDirector(builder)
    return director.construct_config(kwargs)

print(create_index_config(metric='cosinesimil', gpu_config={'graph_build_algo': 'NN_DECENT', 'ivf_pq_build_params': {'n_lists': 1040}}))
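A minimal sketch of how a builder and director could fit together so that the call above works with partial params. The class names come from this PR, but every field, default, and setter shown here is an assumption for illustration and not the PR's actual implementation.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, Optional


class SpaceType(Enum):
    # Hypothetical subset of metrics; the PR's enum may define more.
    L2 = "l2"
    COSINESIMIL = "cosinesimil"


@dataclass
class GPUIndexCagraConfig:
    # Field names and defaults are placeholders, not the PR's.
    graph_build_algo: str = "IVF_PQ"
    ivf_pq_build_params: Dict[str, Any] = field(default_factory=dict)


@dataclass
class GPUIndexBuildConfig:
    gpu_config: GPUIndexCagraConfig
    metric: SpaceType


class IndexConfigBuilder:
    def __init__(self) -> None:
        self._gpu_config: Optional[GPUIndexCagraConfig] = None
        self._metric: SpaceType = SpaceType("l2")  # default metric

    def set_gpu_config(self, params: Dict[str, Any]) -> "IndexConfigBuilder":
        self._gpu_config = GPUIndexCagraConfig(**params)
        return self

    def set_metric(self, metric: str) -> "IndexConfigBuilder":
        self._metric = SpaceType(metric)
        return self

    def build(self) -> GPUIndexBuildConfig:
        # Fall back to an all-defaults GPU config when none was supplied.
        return GPUIndexBuildConfig(
            gpu_config=self._gpu_config or GPUIndexCagraConfig(),
            metric=self._metric,
        )


class IndexConfigDirector:
    def __init__(self, builder: IndexConfigBuilder) -> None:
        self._builder = builder

    def construct_config(self, params: Dict[str, Any]) -> GPUIndexBuildConfig:
        # Only call the setters for keys the caller actually passed.
        if "gpu_config" in params:
            self._builder.set_gpu_config(params["gpu_config"])
        if "metric" in params:
            self._builder.set_metric(params["metric"])
        return self._builder.build()


def create_index_config(**kwargs: Any) -> GPUIndexBuildConfig:
    builder = IndexConfigBuilder()
    director = IndexConfigDirector(builder)
    return director.construct_config(kwargs)
```

In this sketch, any key the caller omits keeps the builder's default, which is why the builder holds a non-null default metric.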

@Rajrahane Rajrahane marked this pull request as ready for review March 3, 2025 20:21
Signed-off-by: Rajvaibhav Rahane <[email protected]>
@@ -0,0 +1,22 @@
# Copyright OpenSearch Contributors
Member

This is pretty redundant to core/common/models/__init__.py. Why do we need both?

    def __init__(self):
        self._hnsw_config: Optional[IndexHNSWCagraConfig] = None
        self._gpu_config: Optional[GPUIndexCagraConfig] = None
        self._metric: SpaceType = SpaceType("l2")  # default metric
Member

Member Author

I set this attribute to a default just to keep it non-null.

The build method passes it as a param to the dataclass constructor:
https://github.com/opensearch-project/remote-vector-index-builder/pull/16/files#diff-c4202e71643e2e39cb6a8505a00019cbe2abe7fe43cca55e926a472d310720bf


# The dimensionality of the vector after compression by PQ. When zero, an
# optimal value is selected using a heuristic.
# pq_bits` must be a multiple of 8.
Member

It looks like there's some validation we can do for some of these values. Can we add a validator function to check whether pq_bits is a multiple of 8?
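One way the requested check could look, as a sketch only: a small helper called from the dataclass's `__post_init__`. The helper name `require_multiple_of`, the class name `IVFPQBuildParams`, and the default value are hypothetical. The check follows the reviewer's wording; the quoted code comment starts with a stray backtick, so the constrained expression in the original may be longer than `pq_bits` alone.

```python
from dataclasses import dataclass


def require_multiple_of(name: str, value: int, factor: int) -> None:
    """Raise ValueError unless value is a multiple of factor."""
    if value % factor != 0:
        raise ValueError(f"{name} must be a multiple of {factor}, got {value}")


@dataclass
class IVFPQBuildParams:  # hypothetical name, not taken from the PR
    pq_bits: int = 8  # placeholder default

    def __post_init__(self) -> None:
        # Runs right after the generated __init__, so bad values fail early.
        require_multiple_of("pq_bits", self.pq_bits, 8)
```

Validating in `__post_init__` keeps the check next to the field it guards, so a config built through the builder or directly through the constructor is rejected the same way.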

        self._gpu_config: Optional[GPUIndexCagraConfig] = None
        self._metric: SpaceType = SpaceType("l2")  # default metric

    def set_hnsw_config(self, params: Dict[str, Any]) -> "IndexConfigBuilder":
Member

This is supposed to be a string?

Member Author

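Python doesn't recognize the class itself as a type if we remove the quotes, because the class name isn't defined yet while its own body is being evaluated. That's just how forward references work, hence the quotes.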
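A short sketch of the quoted return annotation in action. The setter bodies and the `ef_search` key are made up for illustration; only the class name and the `-> "IndexConfigBuilder"` annotation mirror the PR.

```python
from typing import Any, Dict


class IndexConfigBuilder:
    def __init__(self) -> None:
        self._metric = "l2"
        self._hnsw_params: Dict[str, Any] = {}

    # The quoted name is a forward reference. On interpreters that evaluate
    # annotations eagerly, the unquoted name would fail here because
    # IndexConfigBuilder is bound only after the class body finishes.
    def set_metric(self, metric: str) -> "IndexConfigBuilder":
        self._metric = metric
        return self  # returning self is what allows chaining

    def set_hnsw_config(self, params: Dict[str, Any]) -> "IndexConfigBuilder":
        self._hnsw_params = dict(params)
        return self


# Each setter returns the same builder, so calls can be chained.
builder = IndexConfigBuilder().set_metric("cosinesimil").set_hnsw_config({"ef_search": 100})
```

Two alternatives that avoid the quotes are `from __future__ import annotations` at the top of the module, or annotating the return type as `typing.Self` on Python 3.11 and later.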
