Implement: CLUSTER REPLICATE NO ONE #1674

Open
wants to merge 1 commit into
base: unstable
from

Conversation

@skolosov-snap skolosov-snap commented Feb 5, 2025

Currently, Valkey doesn't allow detaching a replica from its primary node. So, if you want to change the cluster topology, the only way to do it is to reset the node (CLUSTER RESET command). However, this removes the node from the cluster, which affects clients: all clients keep sending traffic to this node (and getting inaccurate responses) until they refresh their topology.

This change adds support for a new argument to the CLUSTER REPLICATE command: CLUSTER REPLICATE NO ONE. When this command is called, the node is converted from a replica to an empty primary node but stays in the cluster. Thus, all traffic coming from clients to this node can be redirected to the correct node.
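To make the intended state change concrete, here is a minimal sketch in C of a toy node model. The `toy_node` struct, its fields, and `toy_replicate_no_one` are hypothetical names made up for illustration; they are not Valkey's data structures or the code in this PR. The sketch also assumes that calling the command on a node that is already a primary is a no-op.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model for illustration; not Valkey's clusterNode. */
typedef enum { TOY_PRIMARY, TOY_REPLICA } toy_role;

typedef struct toy_node {
    toy_role role;
    struct toy_node *primary; /* node being replicated; NULL for a primary */
    int in_cluster;           /* 1 while the node is a cluster member */
    size_t num_keys;          /* stand-in for the replicated dataset */
} toy_node;

/* Model of CLUSTER REPLICATE NO ONE as the description states it:
 * stop replicating, drop the copied data, become an empty primary,
 * and leave cluster membership untouched.
 * Returns 1 if the node was converted, 0 if it was already a primary. */
int toy_replicate_no_one(toy_node *n) {
    assert(n != NULL);
    if (n->role == TOY_PRIMARY) return 0; /* assumed to be a no-op */
    n->primary = NULL;
    n->role = TOY_PRIMARY;
    n->num_keys = 0;
    return 1;
}
```

The point of the sketch is the field that does not change: `in_cluster` stays set, which is the difference from CLUSTER RESET that the description relies on.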

@skolosov-snap skolosov-snap force-pushed the skolosov/replicate-no-one branch from 91589d1 to ff96c0f Compare February 5, 2025 22:51

codecov bot commented Feb 6, 2025

Codecov Report

Attention: Patch coverage is 11.76471% with 15 lines in your changes missing coverage. Please review.

Project coverage is 71.08%. Comparing base (591ae9a) to head (e4e8b24).
Report is 15 commits behind head on unstable.

Files with missing lines Patch % Lines
src/cluster_legacy.c 11.76% 15 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1674      +/-   ##
============================================
+ Coverage     71.04%   71.08%   +0.04%     
============================================
  Files           121      123       +2     
  Lines         65254    65548     +294     
============================================
+ Hits          46357    46596     +239     
- Misses        18897    18952      +55     
Files with missing lines Coverage Δ
src/commands.def 100.00% <ø> (ø)
src/cluster_legacy.c 85.73% <11.76%> (-0.41%) ⬇️

... and 26 files with indirect coverage changes

Collaborator

@hpatro hpatro left a comment

The cluster-replicate.json file should be updated; commands.def will then be regenerated as part of the build. Or, if it was accidentally not staged, please add it.

Also, could you run clang-format on your end to fix some of the formatting issues?

Comment on lines 7056 to 7048
if (server.primary != NULL) {
    replicationUnsetPrimary();
}
Collaborator

None of the other invocations of replicationUnsetPrimary() are wrapped in this condition. Is it unnecessary to invoke it if server.primary is NULL?

Collaborator

Also, I'm unsure why clusterPromoteSelfToPrimary was introduced. It seems to have the same behavior at this point, but it's good to call this abstraction in case additional steps are introduced for cluster mode in the future.

Author

Also, I'm unsure why clusterPromoteSelfToPrimary was introduced. It seems to have the same behavior at this point, but it's good to call this abstraction in case additional steps are introduced for cluster mode in the future.

Ok.

Author

None of the other invocations of replicationUnsetPrimary() are wrapped in this condition. Is it unnecessary to invoke it if server.primary is NULL?

Probably not needed. It looks like a leftover from KeyDB's implementation, where the primary needed to be passed to replicationUnsetMaster.

@skolosov-snap skolosov-snap force-pushed the skolosov/replicate-no-one branch 2 times, most recently from fde1ab6 to 4abbae8 Compare February 7, 2025 23:46
@skolosov-snap
Author

The cluster-replicate.json file should be updated; commands.def will then be regenerated as part of the build. Or, if it was accidentally not staged, please add it.

Also, could you run clang-format on your end to fix some of the formatting issues?

Updated.

@zuiderkwast zuiderkwast added the major-decision-pending Major decision pending by TSC team label Feb 10, 2025
Contributor

@zuiderkwast zuiderkwast left a comment

This feature makes sense to me.

@valkey-io/core-team New arguments = major decision. Please approve or vote if you agree.

@skolosov-snap skolosov-snap force-pushed the skolosov/replicate-no-one branch from 4abbae8 to 85238e6 Compare February 10, 2025 16:12
@zuiderkwast
Contributor

The CI job "DCO" is failing. You need to use git commit -s. See the Details link next to the DCO job.

Why do we need it? See here: https://github.com/valkey-io/valkey/blob/unstable/CONTRIBUTING.md#developer-certificate-of-origin Thanks!

@skolosov-snap skolosov-snap force-pushed the skolosov/replicate-no-one branch from 85238e6 to 3789227 Compare February 10, 2025 17:24
@skolosov-snap skolosov-snap force-pushed the skolosov/replicate-no-one branch from 3789227 to e4e8b24 Compare February 10, 2025 17:36
@skolosov-snap
Author

The CI job "DCO" is failing. You need to use git commit -s. See the Details link next to the DCO job.

Why do we need it? See here: https://github.com/valkey-io/valkey/blob/unstable/CONTRIBUTING.md#developer-certificate-of-origin Thanks!

Done

/* Lookup the specified node in our table. */
if (c->argc == 4) {
    if (0 != strcasecmp(c->argv[2]->ptr, "NO") || 0 != strcasecmp(c->argv[3]->ptr, "ONE")) {
Member

We don't use this 0 != xxx trick; let's keep the code style consistent.

Comment on lines +7049 to +7050
/* Reset manual failover state. */
resetManualFailover();
Member

It becomes a primary, so do we need this reset? It has no way to enter the failover check logic. Or at least remove the comment, since this line is self-explanatory.

Suggested change
- /* Reset manual failover state. */
- resetManualFailover();
+ resetManualFailover();

Comment on lines +7047 to +7048
int empty_db_flags = server.repl_replica_lazy_flush ? EMPTYDB_ASYNC : EMPTYDB_NO_FLAGS;
emptyData(-1, empty_db_flags, NULL);
Member

Suggested change
- int empty_db_flags = server.repl_replica_lazy_flush ? EMPTYDB_ASYNC : EMPTYDB_NO_FLAGS;
- emptyData(-1, empty_db_flags, NULL);
+ emptyData(-1, server.repl_replica_lazy_flush ? EMPTYDB_ASYNC : EMPTYDB_NO_FLAGS, NULL);

addReply(c, shared.ok);
return 1;
}
serverLog(LL_NOTICE, "Stop replication and turning myself into empty primary.");
Member

Let's also use catClientInfoShortString to include the client info in the log for better auditing.

Labels
major-decision-pending Major decision pending by TSC team

4 participants