kv: conditionally handleReady after tick #138357

Open
wants to merge 1 commit into
base: master

nvanbenschoten

Related to #133885.
Related to #133216.

This commit adjusts `Replica.tick` to check `RawNode.HasReady` when deciding whether the ticked replica should process a `Ready` struct (via `Replica.handleRaftReady`). Previously, every tick passed unconditionally through `Replica.handleRaftReady`, even when doing so was unnecessary and we knew ahead of time that we would hit this early return: https://github.com/cockroachdb/cockroach/blob/57aab736c34ce5dc7988bd53e0604fde48cef441/pkg/kv/kvserver/replica_raft.go#L1025.

This was mostly harmless in the past with quiescence and a cheap short-circuit in handleRaftReady, but has become more problematic recently for the following three reasons:

  1. Leader leases don't permit quiescence. We have some work planned to allow follower replicas to quiesce under a leader lease (see kv: local, StoreLiveness-based quiescence for Leader leases #133885), but even after that work, we likely won't allow quiescence for the leaseholder. This means that
  2. Replica.handleRaftReady has been getting more expensive, even in the no-op case. This is because of work done in Replica.updateProposalQuotaRaftMuLocked and work done for replica admission control v2 (RACv2).
  3. At some point in the future, we would like to revert 5e6698 and drop the raft tick interval back down to something like 200ms. This will allow us to address raft: separate out ElectionJitterTicks from ElectionTick #133576 and further reduce failover times.

Release note: None

nvanbenschoten added the A-leader-leases label (Related to the introduction of leader leases) on Jan 7, 2025
nvanbenschoten requested a review from a team as a code owner on January 7, 2025 02:03

blathers-crl bot commented Jan 7, 2025

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity

This change is Reviewable
