Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Core: SlotMap refactor - Added NodesMap, Update the slot map upon MOVED errors #2682

Merged
merged 3 commits into from
Nov 14, 2024

Conversation

barshaul
Copy link
Collaborator

@barshaul barshaul commented Nov 13, 2024

Description

This PR was previously reviewed in our redis-rs fork at amazon-contributing/redis-rs#190 but was on hold due to lock issues. The recently merged PR #2643 addresses the encountered lock issues, making this PR now ready for merging.

This pull request introduces two significant updates:

  1. SlotMap Refactor: The SlotMap structure has been updated with the addition of a NodesMap, which allows shard addresses to be shared between shard nodes and slot map values. This refactor optimizes how shard information is managed. For a detailed explanation, refer to PR #185.

    The diagram on the left illustrates the current SlotMap design, while the diagram on the right shows the newly implemented structure:
    image

  2. MOVED Error Handling: Logic has been added to update the slot map in response to MOVED errors. Previously, this was handled only during refresh_slots operations. With this change, the slot map is updated immediately upon encountering a MOVED error. More information can be found in PR #186.

Issue link

This Pull Request is linked to issue (URL): closes #2123

Checklist

Before submitting the PR make sure the following are checked:

  • [ X] This Pull Request is related to one issue.
  • [ X] Commit message has a detailed description of what changed and why.
  • [X ] Tests are added or updated.
  • [X ] CHANGELOG.md and documentation files are updated.
  • [ X] Destination branch is correct - main or release
  • [X ] Commits will be squashed upon merging.

@barshaul barshaul force-pushed the update_slots_on_moved branch from 84c4df4 to c6bf0a2 Compare November 13, 2024 14:52
@barshaul barshaul changed the title WIP: Update slots upon MOVED Core: SlotMap refactor - Added NodesMap, Update the slot map upon MOVED errors Nov 13, 2024
@barshaul barshaul force-pushed the update_slots_on_moved branch from c6bf0a2 to ba16669 Compare November 13, 2024 15:13
@barshaul barshaul marked this pull request as ready for review November 13, 2024 15:28
@barshaul barshaul requested a review from a team as a code owner November 13, 2024 15:28
@Yury-Fridlyand Yury-Fridlyand added the Rust core redis-rs/glide-core matter label Nov 13, 2024
@Yury-Fridlyand
Copy link
Collaborator

How can we test this with a real cluster and client? I think we can do E2E tests with CLUSTER SETSLOT command.

@barshaul
Copy link
Collaborator Author

#2643

How can we test this with a real cluster and client? I think we can do E2E tests with CLUSTER SETSLOT command.

ATM we don't have a way to test the inner behavior of the cluster client in redis-rs, so we need to use mocks to make sure the right number of errors were caught/commands were executed internally.
Later on we can use the Telemetry lib Eran added to add these counters and use them for tests.
E2E tests are now being done with polyglot - where we test glide with EC clusters while performing different workflows such as scaling in/out/up/down, failover, upgrade and so on.
I ran these changes twice in polyglot to make sure our downtime remain the same/improves during topology changes.

Signed-off-by: barshaul <[email protected]>
Signed-off-by: barshaul <[email protected]>
@barshaul barshaul force-pushed the update_slots_on_moved branch from 5b40be7 to 8dff69c Compare November 14, 2024 12:36
@barshaul barshaul requested a review from eifrah-aws November 14, 2024 12:36
let mut conn = connections.remove(addr).unwrap();
let addr = addr.to_string();
if connections.contains_key(&addr) {
let mut conn = connections.remove(&addr).unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better handling here: no unwrap()

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's the non async cluster implementation, which we don't use. We should eventually remove this code from our repo.

Signed-off-by: barshaul <[email protected]>
@barshaul barshaul force-pushed the update_slots_on_moved branch from 8dff69c to 68e7fa4 Compare November 14, 2024 12:47
@barshaul barshaul merged commit 7d72b87 into valkey-io:release-1.2 Nov 14, 2024
54 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Rust core redis-rs/glide-core matter
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants