[Feature]: Add Sparse Float Vector support #29419
Comments
This comment is to track the implementation progress of sparse vector support in Milvus.
Pending:
Knowhere tracking issue: zilliztech/knowhere#193
Excited about the new feature!
…30400) issue: #29419 this PR solely adds proto definition. sparse float vector support will be in subsequent PRs. Signed-off-by: Buqian Zheng <[email protected]>
This commit adds sparse float vector support to segcore with the following:

1. Data type enum declarations.
2. Corresponding data structures for handling sparse float vectors in various scenarios, including:
   * FieldData as a bridge between the binlog and the in-memory data structures;
   * mmap::Column as the in-memory representation of a sparse float vector column of a sealed segment;
   * ConcurrentVector as the in-memory representation of a sparse float vector of a growing segment, which supports inserts.
3. Logic in the payload reader/writer to serialize to and deserialize from the binlog.
4. The ability for the index node to build a sparse float vector index.
5. The ability for the query node to build a growing index for a growing segment and a temp index for a sealed segment that has no index built.

This commit also includes some code cleanup, comment improvements, and unit tests for sparse vectors. #29419 Signed-off-by: Buqian Zheng <[email protected]>
…nd get raw vector by id (#30629) This PR adds the ability to search/get sparse float vectors in segcore, and adds unit tests by converting many existing tests into parameterized ones. #29419 Signed-off-by: Buqian Zheng <[email protected]>
…nents (#30630) add sparse float vector support to different milvus components, including proxy, data node to receive and write sparse float vectors to binlog, query node to handle search requests, index node to build index for sparse float column, etc. #29419 --------- Signed-off-by: Buqian Zheng <[email protected]>
See also milvus-io/milvus#29419 Signed-off-by: Congqi Xia <[email protected]>
See also milvus-io/milvus#29419 --------- Signed-off-by: Congqi Xia <[email protected]>
For SDK owners: We also need to support sparse float vector in C#/NodeJs/Java/Go SDK.
The accepted sparse input format:
When sending requests to Milvus (both insert and search), use one proto
Note that support for those SDKs is not a must-have for the formal Milvus 2.4 release. We'll be adding more features for sparse vectors and announcing GA in the next major release (2.5 or 2.6). I'll keep updating the issues as necessary. Thanks a lot for the efforts!
issue: #29419 Signed-off-by: Buqian Zheng <[email protected]>
issue: #29419 Also re-enabled an e2e test using the RESTful API, which was previously disabled due to #32214. In the RESTful API, the accepted JSON formats of a sparse float vector are:
* `{"indices": [1, 100, 1000], "values": [0.1, 0.2, 0.3]}`
* `{"1": 0.1, "100": 0.2, "1000": 0.3}`
For the accepted index and value ranges, see https://milvus.io/docs/sparse_vector.md#FAQ Signed-off-by: Buqian Zheng <[email protected]>
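The two JSON shapes listed for the RESTful API carry the same information, so a client can normalize one into the other before sending a request. The following is a minimal client-side sketch in Python, not part of Milvus or any SDK; the helper names `to_dict_format` and `to_indices_values` are hypothetical, and the index and value range checks described in the linked FAQ are deliberately left out.

```python
import json


def to_dict_format(sparse):
    """Normalize either accepted JSON shape to a {index: value} dict with int keys.

    Hypothetical helper for illustration only; it does not validate the
    index/value ranges that Milvus enforces.
    """
    if "indices" in sparse and "values" in sparse:
        indices, values = sparse["indices"], sparse["values"]
        if len(indices) != len(values):
            raise ValueError("indices and values must have the same length")
        return {int(i): float(v) for i, v in zip(indices, values)}
    # Otherwise assume the {"<index>": value} shape, whose keys are numeric strings.
    return {int(k): float(v) for k, v in sparse.items()}


def to_indices_values(sparse):
    """Convert either accepted shape to the {"indices": [...], "values": [...]} shape."""
    normalized = to_dict_format(sparse)
    indices = sorted(normalized)
    return {"indices": indices, "values": [normalized[i] for i in indices]}


# Both payloads from the commit message describe the same sparse vector.
payload_a = json.loads('{"indices": [1, 100, 1000], "values": [0.1, 0.2, 0.3]}')
payload_b = json.loads('{"1": 0.1, "100": 0.2, "1000": 0.3}')
same_vector = to_dict_format(payload_a) == to_dict_format(payload_b)
```

Sorting the indices in `to_indices_values` is a choice made here for deterministic output; the source does not say whether the server requires sorted indices.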
issue: milvus-io/milvus#29419 as range search support has been added to the sparse index. Signed-off-by: Buqian Zheng <[email protected]>
issue: #29419 pr: #33231 Signed-off-by: Buqian Zheng <[email protected]>
issue: #29419 pr: #33209 codecov will fail due to newly added ut in test_sealed.cpp skipped due to #33210 Signed-off-by: Buqian Zheng <[email protected]>
issue: #29419 pr: #33656 Signed-off-by: Buqian Zheng <[email protected]> Co-authored-by: Buqian Zheng <[email protected]>
issue: #29419 pr: #33713 Signed-off-by: Buqian Zheng <[email protected]>
issue: #29419 * sparse float vector to support raw data mmap. For getting vectors from the chunk cache, I added a unit test but marked it as skipped due to a known issue. I have tested it locally. Signed-off-by: Buqian Zheng <[email protected]>
issue: #29419 Signed-off-by: Buqian Zheng <[email protected]>
issue: #29419
* If a sparse vector with no non-zero values is inserted, no ANN search on this sparse vector field will return it as a result. The user may still retrieve this row via a scalar query or an ANN search on another vector field.
* If the user uses an empty sparse vector as the query vector for an ANN search, no neighbors will be returned.
Signed-off-by: Buqian Zheng <[email protected]>
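One way to picture these two rules is a toy brute-force search that ranks by inner product and only treats rows sharing at least one non-zero dimension with the query as candidates. That candidate rule is an assumption made for illustration; the commit message does not describe how Milvus or Knowhere implement it, and `toy_search` is a hypothetical function, not a Milvus API.

```python
def inner_product(a, b):
    """Inner product of two sparse vectors stored as {dimension: value} dicts."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(value * b[dim] for dim, value in a.items() if dim in b)


def toy_search(rows, query, top_k):
    """Brute-force top-k by inner product (illustrative only).

    Assumption: a row is a candidate only if it shares at least one
    non-zero dimension with the query, so empty rows and empty queries
    never produce a hit.
    """
    hits = []
    for row_id, vector in rows.items():
        if not (vector.keys() & query.keys()):
            continue  # no overlapping dimension, so not a candidate
        hits.append((row_id, inner_product(vector, query)))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits[:top_k]


rows = {
    1: {1: 0.5, 7: 0.2},
    2: {},            # inserted sparse vector with no non-zero values
    3: {7: 1.0},
}
hits_for_query = toy_search(rows, {7: 1.0}, top_k=10)
hits_for_empty_query = toy_search(rows, {}, top_k=10)
```

In this sketch row 2 can never appear in a result, and an empty query yields an empty result list, which matches the behavior stated above.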
issue: #29419 /kind branch-feature Signed-off-by: xianliang.li <[email protected]>
Is there an existing issue for this?
Is your feature request related to a problem? Please describe.
Currently Milvus supports only dense vectors and lacks the ability to store, index, and search sparse vectors (vectors with up to millions of dimensions, of which only a handful are non-zero). We wish to add sparse float vector support to Milvus so users can insert, index, and search them with ease.
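To make the storage argument concrete, here is a small Python sketch of the difference between a dense and a sparse representation of the same vector. The `{dimension: value}` layout and the helper names are assumptions chosen for illustration, not the internal format Milvus uses.

```python
def dense_to_sparse(dense):
    """Keep only the non-zero entries of a dense vector as {dimension: value}."""
    return {i: float(v) for i, v in enumerate(dense) if v != 0}


def sparse_to_dense(sparse, dim):
    """Expand a {dimension: value} dict back into a dense list of length dim."""
    dense = [0.0] * dim
    for i, v in sparse.items():
        dense[i] = v
    return dense


# A vector with one million dimensions, only two of which are non-zero.
dim = 1_000_000
dense = [0.0] * dim
dense[3] = 0.5
dense[999_999] = 1.25

sparse = dense_to_sparse(dense)
# The dense form stores every dimension; the sparse form stores two entries.
print(len(dense), len(sparse))
```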
Describe the solution you'd like.
No response
Describe an alternate solution.
No response
Anything else? (Additional Context)
No response