Design for transaction support for Vector Search(VSS) commands #2912

prateek-kumar-improving opened this issue Jan 2, 2025 · 0 comments

Problem Statement:

The purpose of this document is to propose a high-level design for implementing transaction support for all vector search commands in all languages.

Prerequisites:

  1. Design for implementing Transactions in GO Lang. #2791: This document gives an overview of the transaction design for Go. Its prerequisites section also covers the high-level design of most of the Go and Python code, which gives enough context to understand how our code is structured.
  2. An understanding of how JSON module transactions are implemented. This is explained briefly in the next 3 sections of this document.

To implement transactions for any module, the basic idea is to reuse the existing transaction class in each language and pass it as an input to that language's already defined exec() function.

Python design for JSON module transaction support:

The following diagram gives an overview of how transactions are implemented for JSON commands in the Python language:

[Diagram: vss_transaction_support (draw.io)]

  1. The transaction object used in normal commands is also used for JSON commands as an input to the command.
  2. Inside the definition of each JSON transaction command, the customCommand function (defined inside the transaction object) is called to append the corresponding custom command to the commands list.
  3. The JSON command then returns this transaction object that contains the updated commands list.
  4. We can then pass this transaction object to the exec() function defined in the clients to run the entire transaction.
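
The four steps above can be sketched in Python as follows. This is only a minimal illustration, not the actual Glide code: `Transaction`, `custom_command`, `json_set` and `json_get` are simplified, hypothetical stand-ins, and the final exec() call is left out because it needs a live server.

```python
from typing import List


class Transaction:
    """Simplified stand-in for the client's transaction object (hypothetical)."""

    def __init__(self) -> None:
        # Commands queued for a later MULTI/EXEC, in call order.
        self.commands: List[List[str]] = []

    def custom_command(self, args: List[str]) -> "Transaction":
        # Step 2: append an arbitrary command to the commands list.
        self.commands.append(list(args))
        return self


def json_set(transaction: Transaction, key: str, path: str, value: str) -> Transaction:
    # Step 1: the transaction is passed in. Step 3: it is returned, updated.
    return transaction.custom_command(["JSON.SET", key, path, value])


def json_get(transaction: Transaction, key: str, path: str) -> Transaction:
    return transaction.custom_command(["JSON.GET", key, path])


transaction = Transaction()
json_set(transaction, "json:0", ".", '{"vec":[1,2,3,4,5,6]}')
json_get(transaction, "json:0", ".")
# Step 4 (not shown): pass `transaction` to the client's exec() function.
print(transaction.commands)
```

Because each module command only appends to the shared commands list, the client's exec() path does not need to know that a JSON command was queued.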

Code reference: #2684

Node JS design for JSON module transaction support:

  1. In the case of Node JS, we use the same idea as in Python.
  2. We pass the transaction object as an input to the JSON transaction command.
  3. Inside the JSON command definition we return the updated transaction object containing the updated commands list.

The difference between the design of Python and Node:

  1. Python JSON transaction commands are at the top level of the file, not inside any class.
  2. Node JSON transaction commands are inside a separate GlideMultiJson class.
    Note: Difference could be because of the way the modules are imported in the 2 languages. So, it is more of a language specific design decision.
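
The two layouts can be compared side by side in a single language. The sketch below uses Python for both and is purely illustrative: the `Transaction` class and the `json_del` name are hypothetical, and only the `GlideMultiJson` class name comes from the Node design above.

```python
from typing import List


class Transaction:
    """Hypothetical minimal transaction: just a list of queued commands."""

    def __init__(self) -> None:
        self.commands: List[List[str]] = []


# Python-style layout: a top-level function in a module.
def json_del(transaction: Transaction, key: str) -> Transaction:
    transaction.commands.append(["JSON.DEL", key])
    return transaction


# Node-style layout: the same command as a static method on a grouping class.
class GlideMultiJson:
    @staticmethod
    def json_del(transaction: Transaction, key: str) -> Transaction:
        transaction.commands.append(["JSON.DEL", key])
        return transaction


a = json_del(Transaction(), "json:0")
b = GlideMultiJson.json_del(Transaction(), "json:0")
print(a.commands == b.commands)
```

Both layouts queue the same command, so the choice only affects how the user imports and calls the module commands.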

Code reference: #2862

[Diagram: node_json_transaction_design (draw.io)]

JAVA JSON module transaction support design:

Code reference: #2691

GO JSON module transaction support design:

  1. Not implemented.

Note: For the Go JSON module implementation, use the file and class structure that suits Go best. Don't copy directly from Node or Python. Each language imports modules differently, so the design should follow what is best for the Go user/customer.

Design for adding transaction support for Vector search (VSS) commands in all languages:

[Diagram: VSS_module_transaction_design (draw.io)]

Implementation-wise, the design is similar to that of the JSON module. The diagram above also includes a few structural improvement suggestions for NodeJS and Python.
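
As a rough sketch of what this could look like for VSS commands, the Python below builds the first FT.CREATE command from the Examples section and appends it to a transaction. All names (`Transaction`, `custom_command`, `ft_create_hnsw`, `ft_dropindex`) and parameters are assumptions made for illustration, not the Glide API, and only the HNSW attributes used in the example are covered.

```python
from typing import List


class Transaction:
    """Hypothetical minimal transaction with a custom-command hook."""

    def __init__(self) -> None:
        self.commands: List[List[str]] = []

    def custom_command(self, args: List[str]) -> "Transaction":
        self.commands.append(list(args))
        return self


def ft_create_hnsw(
    transaction: Transaction,
    index: str,
    on: str,
    prefix: str,
    field: str,
    alias: str,
    dim: int,
    metric: str = "L2",
) -> Transaction:
    # Attribute list for the vector field; its length is sent before it.
    attrs = ["DIM", str(dim), "TYPE", "FLOAT32", "DISTANCE_METRIC", metric]
    args = [
        "FT.CREATE", index, "ON", on, "PREFIX", "1", prefix,
        "SCHEMA", field, "AS", alias, "VECTOR", "HNSW", str(len(attrs)), *attrs,
    ]
    return transaction.custom_command(args)


def ft_dropindex(transaction: Transaction, index: str) -> Transaction:
    return transaction.custom_command(["FT.DROPINDEX", index])


t = Transaction()
ft_create_hnsw(t, "hash_idx1", "HASH", "hash:", "vec", "VEC", dim=2)
ft_dropindex(t, "hash_idx1")
print(" ".join(t.commands[0]))
```

As with the JSON commands, the VSS functions only queue arguments. Whether the queued commands behave atomically is decided by the server, not by this code.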

Examples:

The following uses the examples from the vector search FT.CREATE documentation (https://docs.aws.amazon.com/memorydb/latest/devguide/vector-search-commands-ft.create.html) to test the commands in a MULTI/EXEC transaction block from the CLI:

MULTI
OK

FT.CREATE hash_idx1 ON HASH PREFIX 1 hash: SCHEMA vec AS VEC VECTOR HNSW 6 DIM 2 TYPE FLOAT32 DISTANCE_METRIC L2
QUEUED
FT.CREATE json_idx1 ON JSON PREFIX 1 json: SCHEMA $.vec AS VEC VECTOR HNSW 6 DIM 6 TYPE FLOAT32 DISTANCE_METRIC L2
QUEUED
HSET hash:0 vec "\x00\x00\x00\x00\x00\x00\x00\x00"
QUEUED
HSET hash:1 vec "\x00\x00\x00\x00\x00\x00\x80\xbf"
QUEUED
JSON.SET json:0 . '{"vec":[1,2,3,4,5,6]}'
QUEUED
JSON.SET json:1 . '{"vec":[10,20,30,40,50,60]}'
QUEUED
JSON.SET json:2 . '{"vec":[1.1,1.2,1.3,1.4,1.5,1.6]}'
QUEUED
FT.DROPINDEX json_idx1
QUEUED
FT.CREATE json_idx1 ON JSON PREFIX 1 json: SCHEMA $.vec AS VEC VECTOR FLAT 6 DIM 6 TYPE FLOAT32 DISTANCE_METRIC L2
QUEUED
FT.SEARCH json_idx1 "*=>[KNN 100 @vec $query_vec]" PARAMS 2 query_vec "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00" DIALECT 2
QUEUED

EXEC

  1. (error) Index already exists
  2. (error) Index already exists
  3. (integer) 0
  4. (integer) 0
  5. OK
  6. OK
  7. OK
  8. OK
  9. OK
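
The binary strings passed to HSET and FT.SEARCH above are the raw bytes of the vector. The sketch below assumes little-endian IEEE 754 float32 values, which is consistent with the bytes shown in the example. The helper name `encode_float32_vector` is made up for illustration.

```python
import struct
from typing import Sequence


def encode_float32_vector(values: Sequence[float]) -> bytes:
    """Pack floats as consecutive little-endian 32-bit values."""
    return struct.pack(f"<{len(values)}f", *values)


print(encode_float32_vector([0.0, 0.0]))        # b'\x00\x00\x00\x00\x00\x00\x00\x00'
print(encode_float32_vector([0.0, -1.0]))       # b'\x00\x00\x00\x00\x00\x00\x80\xbf'
print(len(encode_float32_vector([0.0] * 6)))    # 24
```

Under that assumption, hash:0 holds the 2-dimensional vector [0.0, 0.0], hash:1 holds [0.0, -1.0], and the 24-byte query vector is six zeros, matching DIM 6 on json_idx1.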

[Important] Possible issues with VSS transactions.

Special cases:

  1. FT commands do not appear to be fully transactional. More documentation is needed from the Valkey side on the atomicity of standard commands versus Vector search commands.
  2. An index creation command might still be running in the background while the rest of the transaction executes. This special case must be accounted for when building transactions with such commands. For example, the index might not be immediately available for search operations, which can lead to inconsistent results when FT.SEARCH is called immediately after FT.CREATE in the same transaction.
  3. Similarly, index removal might cause the issues described in point 2.

More documentation is needed from the Valkey side to validate/confirm these and other similar use cases.

Example for this is as follows:

flushall
OK

multi
OK

JSON.SET json:0 . '{"vec":"1"}'
QUEUED
JSON.SET json:1 . '{"vec":"1"}'
QUEUED
JSON.SET json:2 . '{"vec":"1"}'
QUEUED
JSON.SET json:3 . '{"vec":"1"}'
QUEUED
JSON.SET json:4 . '{"vec":"1"}'
QUEUED
JSON.SET json:5 . '{"vec":"1"}'
QUEUED
JSON.SET json:6 . '{"vec":"1"}'
QUEUED
JSON.SET json:7 . '{"vec":"1"}'
QUEUED
FT.CREATE json_idx1 ON JSON PREFIX 1 json: SCHEMA $.vec AS VEC TEXT
QUEUED
FT.SEARCH json_idx1 "*"
QUEUED

exec

  1. OK
  2. OK
  3. OK
  4. OK
  5. OK
  6. OK
  7. OK
  8. OK
  9. OK
  10. 1. (integer) 0

The FT.SEARCH inside the transaction returns 0 results because json_idx1 has not yet been updated for searching when it runs, whereas the same FT.SEARCH issued after the transaction (below) returns all 8 documents. Sometimes the server also returns an error saying that the index is under construction.

NOTE: Documentation of the expected server-side behaviour is needed for such cases.

FT.SEARCH json_idx1 "*"

  1. (integer) 8
  2. "json:4"
    1. "$"
    2. "{"vec":"1"}"
  3. "json:7"
    1. "$"
    2. "{"vec":"1"}"
  4. "json:5"
    1. "$"
    2. "{"vec":"1"}"
  5. "json:2"
    1. "$"
    2. "{"vec":"1"}"
  6. "json:6"
    1. "$"
    2. "{"vec":"1"}"
  7. "json:3"
    1. "$"
    2. "{"vec":"1"}"
  8. "json:1"
    1. "$"
    2. "{"vec":"1"}"
  9. "json:0"
    1. "$"
    2. "{"vec":"1"}"

How to handle such cases:

  1. Avoid index creation/removal in the same MULTI block as other commands, to prevent inconsistent results.
  2. Perform manual checks. For example, check that the indexes exist before running a set of transactions on them.
    Note: These are just implementation suggestions for the user/customer. Documented server-side behaviour for such cases would definitely help.
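
Both suggestions can be sketched as small client-side helpers. The Python below is a hedged illustration only: `split_index_commands`, `index_exists`, and the `send` callable are hypothetical names, and it assumes the server supports the FT._LIST command for listing index names. Note that an index appearing in the list does not by itself prove that the index has finished building.

```python
from typing import Callable, List, Sequence, Tuple

Command = List[str]
INDEX_COMMANDS = {"FT.CREATE", "FT.DROPINDEX"}


def split_index_commands(
    commands: Sequence[Command],
) -> Tuple[List[Command], List[Command]]:
    """Suggestion 1: separate index management from all other commands.

    The caller runs the first group outside the MULTI block that carries
    the second group. Relative order inside each group is preserved.
    """
    index_cmds: List[Command] = []
    other_cmds: List[Command] = []
    for cmd in commands:
        target = index_cmds if cmd[0].upper() in INDEX_COMMANDS else other_cmds
        target.append(cmd)
    return index_cmds, other_cmds


def index_exists(send: Callable[[Command], Sequence], index_name: str) -> bool:
    """Suggestion 2: check for an index before queuing commands that use it."""
    names = send(["FT._LIST"])  # assumed to return the index names
    decoded = [n.decode() if isinstance(n, bytes) else n for n in names]
    return index_name in decoded


queued = [
    ["JSON.SET", "json:0", ".", '{"vec":"1"}'],
    ["FT.CREATE", "json_idx1", "ON", "JSON", "PREFIX", "1", "json:",
     "SCHEMA", "$.vec", "AS", "VEC", "TEXT"],
    ["FT.SEARCH", "json_idx1", "*"],
]
index_cmds, other_cmds = split_index_commands(queued)


def fake_send(command: Command) -> Sequence:
    # Stand-in for a real client call; pretends json_idx1 already exists.
    return [b"json_idx1"] if command == ["FT._LIST"] else []


print([c[0] for c in index_cmds])
print([c[0] for c in other_cmds])
print(index_exists(fake_send, "json_idx1"))
```

In a real client, `send` would wrap whatever call executes a single command. Splitting changes the execution order of index commands relative to the rest, so it only fits workloads where index management can safely run first.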

Known issues:

  1. It is only possible to create an index in a MULTI/EXEC transaction if the database is not clustered (https://redis.io/kb/doc/16ekjs4rja/is-it-possible-to-create-an-index-in-a-multi-exec-transaction).

Conclusion

  1. Transaction support for the Vector search module can be provided easily from the Glide client.
  2. More Valkey server documentation is needed on how Vector search commands behave inside a MULTI/EXEC transaction block (this is not related to the Glide client implementation). Clarity on the server behaviour in edge cases would tell the user/customer which errors and functional errors to expect as they propagate from server to client to user. Note: These details won't have any impact on the Glide client implementation of transaction support for Vector search commands using MULTI/EXEC, as the client simply returns the behaviour of the Valkey server to the customer/user. The server behaviour still matters to the client's customers, so Valkey server documentation/clarity around this will be helpful.

Additional Notes

FAQ:

  1. Consistency in code structure and naming of classes/files?
  2. Add a separate class called JsonTransaction, instead of a standalone class for modules in NodeJS? Note: We have a separate class for JSON transaction commands in Python.

@asafpamzn asafpamzn added this to the 1.3 milestone Jan 7, 2025