Implement: CLUSTER REPLICATE NO ONE #1674

skolosov-snap · 2025-02-05T22:43:54Z

Currently, Valkey doesn't allow detaching a replica from its primary node. So, if you want to change the cluster topology, the only way to do it is to reset the node (CLUSTER RESET command). However, this removes the node from the cluster, which affects clients: all clients will keep sending traffic to this node (and getting inaccurate responses) until they refresh their topology.

This change adds support for a new argument to the CLUSTER REPLICATE command: CLUSTER REPLICATE NO ONE. When this command is called, the node is converted from a replica into an empty primary node while remaining in the cluster. Thus, all traffic coming from clients to this node can still be redirected to the correct node.
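To make the difference between the two paths concrete, here is a minimal toy model in C. The struct toy_node and both functions are hypothetical and only mirror the behavior described above; they are not Valkey's data structures or API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified view of a cluster node (not Valkey's clusterNode). */
typedef struct toy_node {
    bool in_cluster;         /* still part of the cluster topology? */
    bool is_primary;         /* role of this node */
    const char *replicating; /* ID of the primary being replicated, or NULL */
    size_t keys;             /* number of keys held locally */
} toy_node;

/* Models CLUSTER RESET as described in the PR: the replica becomes an empty
 * primary but is dropped from the topology, so clients with a stale view
 * keep sending it traffic until they refresh. */
void toy_cluster_reset(toy_node *n) {
    n->in_cluster = false;
    n->is_primary = true;
    n->replicating = NULL;
    n->keys = 0;
}

/* Models CLUSTER REPLICATE NO ONE: stop replicating, drop the replicated
 * data and become an empty primary, but stay in the cluster so misrouted
 * requests can still be redirected. */
void toy_replicate_no_one(toy_node *n) {
    if (n->is_primary) return; /* already a primary: nothing to do in this model */
    n->replicating = NULL;
    n->keys = 0;
    n->is_primary = true;
    /* n->in_cluster is deliberately left unchanged. */
}
```

In this model the only observable difference between the two operations is the in_cluster flag, which is exactly the property the new argument is meant to preserve.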

codecov · 2025-02-06T01:17:26Z

Codecov Report

Attention: Patch coverage is 11.76471% with 15 lines in your changes missing coverage. Please review.

Project coverage is 71.08%. Comparing base (591ae9a) to head (e4e8b24).
Report is 15 commits behind head on unstable.

Files with missing lines	Patch %	Lines
src/cluster_legacy.c	11.76%	15 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##           unstable    #1674      +/-   ##
============================================
+ Coverage     71.04%   71.08%   +0.04%     
============================================
  Files           121      123       +2     
  Lines         65254    65548     +294     
============================================
+ Hits          46357    46596     +239     
- Misses        18897    18952      +55

Files with missing lines	Coverage Δ
src/commands.def	`100.00% <ø> (ø)`
src/cluster_legacy.c	`85.73% <11.76%> (-0.41%)`	⬇️

... and 26 files with indirect coverage changes

src/cluster_legacy.c

hpatro

The cluster-replicate.json file should be updated; commands.def is then regenerated as part of the build. If it was accidentally left unstaged, please add it.

Also, could you run clang-format on your end to fix some of the formatting issues?

hpatro · 2025-02-07T00:02:57Z

src/cluster_legacy.c

+            if (server.primary != NULL) {
+                replicationUnsetPrimary();
+            }


None of the other invocations of replicationUnsetPrimary() are wrapped in this condition. Is it unnecessary to invoke it when server.primary is NULL?

Also, I was unsure why clusterPromoteSelfToPrimary was introduced; it seems to have the same behavior at this point, but it is good to call this abstraction in case additional steps are introduced for cluster mode in the future.

> Also, I was unsure why clusterPromoteSelfToPrimary was introduced; it seems to have the same behavior at this point, but it is good to call this abstraction in case additional steps are introduced for cluster mode in the future.

Ok.

> None of the other invocations of replicationUnsetPrimary() are wrapped in this condition. Is it unnecessary to invoke it when server.primary is NULL?

Probably not needed. It looks like a leftover from KeyDB's implementation, where the primary needed to be passed to replicationUnsetMaster.
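Why the outer check is redundant can be illustrated with a toy model. It assumes, as in Redis's replicationUnsetMaster(), that the function itself returns early when no primary host is configured; toy_server and toy_unset_primary are hypothetical stand-ins, not Valkey code.

```c
#include <stddef.h>

/* Hypothetical stand-in for the replication fields of the server struct. */
typedef struct toy_server {
    const char *primary_host; /* configured primary, NULL if none */
    void *primary_link;       /* live connection to the primary, may be NULL */
    int unset_calls;          /* how many times the body actually ran */
} toy_server;

/* Toy replicationUnsetPrimary(): safe to call unconditionally because it
 * returns early when no primary is configured (assumed behavior). */
void toy_unset_primary(toy_server *s) {
    if (s->primary_host == NULL) return; /* nothing to do */
    s->primary_host = NULL;
    s->primary_link = NULL;
    s->unset_calls++;
}
```

Calling it a second time is a no-op. In this model, a link that is already down (primary_link == NULL) does not prevent the configured primary from being cleared, whereas a caller-side guard on the link pointer would skip that cleanup.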

skolosov-snap · 2025-02-07T23:47:20Z

> The cluster-replicate.json file should be updated; commands.def is then regenerated as part of the build. If it was accidentally left unstaged, please add it.
>
> Also, could you run clang-format on your end to fix some of the formatting issues?

Updated.

zuiderkwast

This feature makes sense to me.

@valkey-io/core-team New arguments = major decision. Please approve or vote if you agree.

src/cluster_legacy.c

src/commands/cluster-replicate.json

zuiderkwast · 2025-02-10T17:12:17Z

The CI job "DCO" is failing. You need to use git commit -s. See the Details link next to the DCO job.

Why do we need it? See here: https://github.com/valkey-io/valkey/blob/unstable/CONTRIBUTING.md#developer-certificate-of-origin thanks!

src/commands/cluster-replicate.json

Signed-off-by: Sergey Kolosov <[email protected]>

skolosov-snap · 2025-02-10T17:37:35Z

> The CI job "DCO" is failing. You need to use git commit -s. See the Details link next to the DCO job.
>
> Why do we need it? See here: https://github.com/valkey-io/valkey/blob/unstable/CONTRIBUTING.md#developer-certificate-of-origin thanks!

Done

enjoy-binbin · 2025-02-11T04:25:47Z

src/cluster_legacy.c

        /* Lookup the specified node in our table. */
+        if (c->argc == 4) {
+            if (0 != strcasecmp(c->argv[2]->ptr, "NO") || 0 != strcasecmp(c->argv[3]->ptr, "ONE")) {


We don't use the 0 != xxx style in this codebase; let's keep the code consistent with the existing style.
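A standalone sketch of the same check written in the preferred style, relying on strcasecmp() returning nonzero on mismatch. The helper is_replicate_no_one is hypothetical and works on plain C strings rather than the server's argument objects; it assumes the caller has already matched CLUSTER REPLICATE in argv[0] and argv[1].

```c
#include <strings.h> /* strcasecmp (POSIX) */

/* Hypothetical helper: returns 1 if the command is "CLUSTER REPLICATE NO ONE"
 * (case-insensitive), 0 otherwise. A nonzero strcasecmp() result is used
 * directly as "mismatch" instead of writing 0 != strcasecmp(...). */
int is_replicate_no_one(int argc, const char **argv) {
    if (argc != 4) return 0;
    if (strcasecmp(argv[2], "NO") || strcasecmp(argv[3], "ONE")) return 0;
    return 1;
}
```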

enjoy-binbin · 2025-02-11T04:30:10Z

src/cluster_legacy.c

+            /* Reset manual failover state. */
+            resetManualFailover();


It becomes a primary, so do we need this reset? It has no way to enter the failover check logic. Or at least let's remove the comment, since this line is self-explanatory.

Suggested change

-            /* Reset manual failover state. */
-            resetManualFailover();
+            resetManualFailover();

enjoy-binbin · 2025-02-11T04:30:47Z

src/cluster_legacy.c

+            int empty_db_flags = server.repl_replica_lazy_flush ? EMPTYDB_ASYNC : EMPTYDB_NO_FLAGS;
+            emptyData(-1, empty_db_flags, NULL);


Suggested change

-            int empty_db_flags = server.repl_replica_lazy_flush ? EMPTYDB_ASYNC : EMPTYDB_NO_FLAGS;
-            emptyData(-1, empty_db_flags, NULL);
+            emptyData(-1, server.repl_replica_lazy_flush ? EMPTYDB_ASYNC : EMPTYDB_NO_FLAGS, NULL);
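The suggestion only inlines the conditional expression; it does not change which flags are passed. A toy check of that equivalence, using local stand-in constants and a hypothetical toy_empty_data() that simply reports the flags it receives (the flag values are assumptions, not taken from the server headers):

```c
/* Local stand-ins for the server's flag macros (values assumed here). */
#define TOY_EMPTYDB_NO_FLAGS 0
#define TOY_EMPTYDB_ASYNC (1 << 0)

/* Hypothetical stand-in for emptyData(): just returns the flags it got. */
int toy_empty_data(int dbnum, int flags) {
    (void)dbnum; /* -1 means "all databases" in the real call; unused here */
    return flags;
}

/* Original form: flags computed into a temporary variable first. */
int flush_with_temp(int lazy_flush) {
    int empty_db_flags = lazy_flush ? TOY_EMPTYDB_ASYNC : TOY_EMPTYDB_NO_FLAGS;
    return toy_empty_data(-1, empty_db_flags);
}

/* Suggested form: the conditional expression passed inline. */
int flush_inline(int lazy_flush) {
    return toy_empty_data(-1, lazy_flush ? TOY_EMPTYDB_ASYNC : TOY_EMPTYDB_NO_FLAGS);
}
```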

enjoy-binbin · 2025-02-11T04:31:40Z

src/cluster_legacy.c

+                addReply(c, shared.ok);
+                return 1;
+            }
+            serverLog(LL_NOTICE, "Stop replication and turning myself into empty primary.");


Let's also use catClientInfoShortString to include the client info in this log line, for better auditing.
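A minimal sketch of what an audit-friendly notice could look like. The toy_client struct, the toy_format_audit helper, and the "(request from client: id=... addr=...)" suffix are all hypothetical; they only approximate the idea of appending a short client description, and do not reproduce catClientInfoShortString()'s actual output format.

```c
#include <stdio.h>

/* Hypothetical stand-in for the client fields an audit line might show. */
typedef struct toy_client {
    unsigned long long id;
    const char *addr;
} toy_client;

/* Writes the PR's log message with a short client description appended.
 * Returns what snprintf() returns (length the full string would have). */
int toy_format_audit(char *buf, size_t len, const toy_client *c) {
    return snprintf(buf, len,
                    "Stop replication and turning myself into empty primary "
                    "(request from client: id=%llu addr=%s)",
                    c->id, c->addr);
}
```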

skolosov-snap force-pushed the skolosov/replicate-no-one branch from 91589d1 to ff96c0f February 5, 2025 22:51

enjoy-binbin reviewed Feb 6, 2025

src/cluster_legacy.c

hpatro reviewed Feb 7, 2025

skolosov-snap force-pushed the skolosov/replicate-no-one branch 2 times, most recently from fde1ab6 to 4abbae8 February 7, 2025 23:46

zuiderkwast added the major-decision-pending Major decision pending by TSC team label Feb 10, 2025

zuiderkwast reviewed Feb 10, 2025

src/cluster_legacy.c

src/commands/cluster-replicate.json

skolosov-snap force-pushed the skolosov/replicate-no-one branch from 4abbae8 to 85238e6 February 10, 2025 16:12

skolosov-snap force-pushed the skolosov/replicate-no-one branch from 85238e6 to 3789227 February 10, 2025 17:24

zuiderkwast reviewed Feb 10, 2025

src/commands/cluster-replicate.json (outdated)

Implement: CLUSTER REPLICATE NO ONE

e4e8b24

Signed-off-by: Sergey Kolosov <[email protected]>

skolosov-snap force-pushed the skolosov/replicate-no-one branch from 3789227 to e4e8b24 February 10, 2025 17:36

enjoy-binbin reviewed Feb 11, 2025
