Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

2652 Membership::set_first_epoch #2654

Merged
merged 6 commits into from
Mar 3, 2025
Merged

2652 Membership::set_first_epoch #2654

merged 6 commits into from
Mar 3, 2025

Conversation

pls148
Copy link
Contributor

@pls148 pls148 commented Feb 21, 2025

Closes #2652

This PR:

  • Adds a fn to Membership called set_first_epoch
  • Adds impl of that fn in both hotshot and sequencer side
  • Adds testing to StaticCommittee to ensure set_first_epoch is called appropriately

This PR does not:

Key places to review:

@pls148 pls148 marked this pull request as draft February 21, 2025 01:03
@pls148 pls148 force-pushed the ps/2652 branch 7 times, most recently from 516ba06 to 2ff3c8b Compare February 24, 2025 18:52
@pls148 pls148 marked this pull request as ready for review February 24, 2025 19:35
@pls148 pls148 force-pushed the ps/2652 branch 3 times, most recently from a97d405 to 4bf4660 Compare February 24, 2025 20:55
if let Some(decided_leaf_info) = res.leaf_views.last() {
decide_epoch_root(&decided_leaf_info.leaf, epoch_height, membership).await;
} else {
tracing::info!("No decided leaf while a view has been decided.");
}
}

if let Some(ref decided_upgrade_cert) = res.decided_upgrade_cert {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we combine this if let and the one 2 levels down (i.e. if let (Some(..), Some(..)) = (.., ..) or similar)

temp_memberships
.set_first_epoch(<TYPES as NodeType>::Epoch::new(1), INITIAL_DRB_RESULT);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should do this in hotshot-testing, is there a reason we can't just do this in SystemContext::init?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ended up removing this entirely as claim_block_with_num_nodes updates the number of nodes already.

@@ -297,6 +298,16 @@ impl<TYPES: NodeType, I: NodeImplementation<TYPES>, V: Versions> SystemContext<T
config.epoch_height,
);

if let Some(epoch) = epoch {
// #2652 REVIEW NOTE: This epoch can't be right; what if a node starts up between when we upgrade to epochs
// and when we're at epoch+2? This would cause the current node to propagate the non-epochs stake table
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also not sure if this is the right way to do this

can we maybe move this to start_consensus and do this in the same place we broadcast the genesis QC (~ line 440)? that seems like it might be closer to the semantics we want

I think there the check should be on V::Base::VERSION not anything else. this won't handle (at all) the case where we're restarting with epochs, but that should be handled by @bfish713's catchup

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like we don't need to set this epoch when we start consensus. If we set it to our current epoch (based on the anchor leaf) I think that value will never be correct. I think we should only set this when the epoch upgrade succeeds and it should be set to a specific epoch (i.e the upgrade is agree to start epochs at a specific height)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we start up with epochs enabled for genesis, the Membership trait needs to know that epochs 1 & 2 have no epoch root/drb

I agree we should not use the anchor leaf, but I think we have to specifically do it for genesis unless I'm missing something

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to be handled in a general way, if epochs are enabled when we start, we need to load with an epoch root or roots. I think this should be included in the HotshotInitializer, We should also make sure we can load up again with the stored epoch, like how we do with the anchor leaf.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we can't just ignore at startup, I was mistaken. I was only considering the mainnet upgrade

Copy link
Contributor

@lukaszrzasik lukaszrzasik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really have any comments. I'll let someone else approve after the discussion with the setting the initial first epoch is resolved.

I do have a question but I'll take it offline as this more of a generic question and I don't think it will affect the PR in any way.

@pls148 pls148 force-pushed the ps/2652 branch 10 times, most recently from 681a0b5 to 8e85bb6 Compare February 27, 2025 23:17
@pls148 pls148 force-pushed the ps/2652 branch 6 times, most recently from b987c63 to 8710c56 Compare February 28, 2025 05:16
Copy link
Contributor

@ss-es ss-es left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left some comments. I think notably there aren't any methods storing the initializer info, will that be a separate PR?

otherwise I think it looks fine overall. I'll leave to @bfish713 or @lukaszrzasik to approve (since I accidentally merged the upgrade logic into this branch and I don't want to approve my own code)

@@ -243,6 +244,53 @@ impl<TYPES: NodeType, I: NodeImplementation<TYPES>, V: Versions> SystemContext<T
.await
}

async fn load_start_epoch_info(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason this is a method on SystemContext?

it seems like it should either be a method on Membership or a stand-alone function

Comment on lines 251 to 260
// If the first element in start_epoch_info does not have a block header,
// use that to set_first_epoch instead
if let Some(epoch_info) = start_epoch_info.first() {
if epoch_info.block_header.is_none() {
membership
.write()
.await
.set_first_epoch(epoch_info.epoch, INITIAL_DRB_RESULT);
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would actually suggest just calling set_first_epoch in the loop, as part of the else on if let Some(block_header)

I think doing it the way it's done here is more likely to result in unexpected behaviour, because we can silently discard parts of the initializer. and I don't think there's much to gain from second-guessing the initializer: if it's misconfigured and results in set_first_epoch being called twice, it shouldn't be load_start_epoch_info's job to second-guess it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think we should log by the way, maybe at tracing::debug!(), every call to set_first_epoch or add_drb_result

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done and done

@pls148
Copy link
Contributor Author

pls148 commented Feb 28, 2025

left some comments. I think notably there aren't any methods storing the initializer info, will that be a separate PR?

otherwise I think it looks fine overall. I'll leave to @bfish713 or @lukaszrzasik to approve (since I accidentally merged the upgrade logic into this branch and I don't want to approve my own code)

Addressed. The methods for storing it are part of a separate task

Comment on lines 510 to 512
let decided_block_number = decided_leaf_info.leaf.block_header().block_number();
let first_epoch_number =
TYPES::Epoch::new(epoch_from_block_number(decided_block_number, epoch_height));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the plan was to have the block height configured for the first epoch, but his code is getting the epoch from when we decide the upgrade. Is that change for a different PR or am I misunderstanding/misremembering?

Comment on lines +495 to +484
let epoch_height = consensus_reader.epoch_height;
drop(consensus_reader);

if with_epochs && res.new_decided_view_number.is_some() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI this function stops being called when we upgrade, decide_from_proposal_2 gets activated, any logic specific to epochs would need to be in there too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my understanding here was that once we're in epochs we don't want to set_first_epoch any more anyway. Either way this logic moved due to the next comment

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are correct once we in epochs we don't have to worry about epoch upgrade stuff

@@ -390,21 +391,44 @@ pub(crate) async fn handle_quorum_proposal_validated<
.await
};

if let Some(cert) = &task_state.staged_epoch_upgrade_certificate {
if leaf_views.last().unwrap().leaf.height() >= task_state.epoch_upgrade_block_height {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So here we are waiting to apply the upgrade until the configured height, but we already set the first epoch, I think that needs to happen here

Comment on lines 417 to 429
membership
.write()
.await
.set_first_epoch(first_epoch_number, INITIAL_DRB_RESULT);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Set this in the block below

    if let Some(cert) = decided_upgrade_cert.clone() {
        if cert.data.new_version == V::Epochs::VERSION {
            task_state.staged_epoch_upgrade_certificate = Some(cert);

Comment on lines +495 to +484
let epoch_height = consensus_reader.epoch_height;
drop(consensus_reader);

if with_epochs && res.new_decided_view_number.is_some() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are correct once we in epochs we don't have to worry about epoch upgrade stuff

@@ -390,21 +391,56 @@ pub(crate) async fn handle_quorum_proposal_validated<
.await
};

if let Some(cert) = &task_state.staged_epoch_upgrade_certificate {
Copy link
Contributor

@bfish713 bfish713 Feb 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clear this option after we apply the upgrade so we don't hit this on every proposal

@pls148 pls148 enabled auto-merge (squash) March 3, 2025 20:49
@pls148 pls148 merged commit 4771ff5 into main Mar 3, 2025
50 of 51 checks passed
@pls148 pls148 deleted the ps/2652 branch March 3, 2025 21:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Membership::set_first_epoch
4 participants