-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(topdown): pull effects up until committed finality. #887
Conversation
Fendermint was pulling parent effects only until committed_finality-1. This choice assumed that chains like Filecoin leaked deferred execution mechanics through the query mechanism we use (Eth JSON-RPC API). Therefore it was deemed that the safe choice was to lag behind by one epoch. However, we can safely assume that Eth JSON-RPC APIs will only expose blocks that have been executed and whose side-effects are available (unless "pending" is requested). Filecoin reconciles its deferred execution model with Eth client expectations of immediate execution, alas with some known rough edges [1]. Furthermore, this choice assumed that our chain_head_delay can be 0, i.e. we can consume the exposed HEAD of the parent in real time. In practice, that's not a good idea, ever. All in all, this is problematic because it's a hidden. Users inspecting the committed parent finality and seeing N now need to understand that events emitted in N are (counterintuitively) not imported yet, and they need to wait longer. This PR: - Rectifies the queried range so that we import up to, and including, the committed parent finality. - Disallows a zero chain_head_delay. Note: I'm clarifying with @aakoshh the mechanics of Fendermint's Eth JSON-RPC API; concretely, if it renconciles deferred execution or not. There are some off-by-one shifts assumed in a number of places, so unclear. However, by disallowing a chain_head_delay of 0, we are able to work with APIs that expose heights before their results are available. Note 2: when we adopt light clients to watch the parent, some assumptions may change. [1] filecoin-project/lotus#11205
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We used to have a problem where the data pre-fetched by the sycner into the cache, and the data fetched in-sync during catch-up, were different because of which height the query was executed at. It seems to make sense that if we rely on the Ethereum API then you can get the results of a final height at that height, so hopefully this is not an issue here.
Worth noting that this isn't just a fix, it's a breaking change that would cause consensus failure if applied by anyone with an existing chain. That is, anyone trying to sync with the chain from genesis would apply different top-down effects. The disciplined way to roll this out for such chains would be an upgrade that enables this behaviour from a certain height onward.
Regarding the Fendermint Ethereum API itself: the ethapi
pretends that the latest height of the chain is 1 less than what it really is in CometBFT, to make sure it only shows executed blocks. It also adjusts the query height by +1 to execute queries at the height the results are stored at.
What I'm not sure about is the event log. The eth_getLogs
API goes block by block, looking at the results of the execution, so if you go [990-1000]
you will see the the effects of block 1000. The end of the range is also explicitly restricted to the latest height (which is -1 adjusted, so results should be available). If the results still weren't available (e.g. because someone queries by block hash and it's not executed yet, for whatever reason), the results would be empty for that block, with no error returned. If someone were to use the CometBFT search API to look for transaction effects with certain topics, they would see them at however CometBFT chooses the block index, which I believe is by where the transaction was included.
It's also worth noting that if somebody just applied this on a running chain, they could accidentally skip the effects of a block:
105 ends up not being applied. |
Another reason to roll this out as an upgrade, or with a condition of which height to apply this logic from, is that unless all in-sync nodes are restarted right at the same height, they can experience consensus failure too, not just the ones syncing from genesis. |
So the way to roll this would be something like: let (execution_fr, execution_to) =
match state.block_height() {
h if h < self.finality_exec_switch_height => (prev_height, finality.height - 1),
h if h == self.finality_exec_switch_height => (prev_height, finality.height),
_ => (prev_height + 1, finality.height),
}; And to add a |
@aakoshh This is consensus breaking indeed, with the mitigating factor that that the fork would only manifest if the events being pulled top-down are in a boundary height. The fork would not manifest if the events are in the inner portion of the parent block range being pulled. Prior to submitting this PR, Fluence had already applied a height condition in their IPC fork. I am not aware of any other live networks that would be impacted by this change, so taking a pragmatic stance, I'm reluctant to add the proposed switch flag in our codebase as it'll quickly turn into technical debt without sound justification. I think it's safe to assume that —today— all IPC adopters are running forks, and reviewing incoming changes carefully before merging from upstream. Taking a step back, what's missing here is proper versioning, a changelog, and explicit breaking change signalling. We do expect a large amount of consensus-breaking changes in the next months as we refactor things and build out missing features (as IPC is really in its infancy), and it won't scale well to put each one behind a fork flag. A more manageable solution for the time being is to treat this repo as experimental (through v0 versioning), flag consensus-breaking changes in changelogs, and work directly with impacted adopters to evaluate the condition they'd need to include. This is more reactive, but right now it's a tradeoff I'd strongly advise us to accept. This situation will change when we modularise IPC, and package it behind neat modules and APIs that can be composed to build a customized client without forking the whole repo. Fluence really wants this and I think it makes a ton of sense. Furthermore, I'd like to note that we've been making contract and Solidity and Rust compiler changes without much sympathy in that they affect the genesis payload and break syncing from genesis. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be fine, LGTM.
Anyways, once the cert and commitment is onchain, then there is no need for this from
and to
anymore.
Fendermint was pulling parent effects only until committed_finality-1.
This choice assumed that chains like Filecoin leaked deferred execution mechanics through the query mechanism we use (Eth JSON-RPC API). Therefore it was deemed that the safe choice was to lag behind by one epoch. However, we can safely assume that Eth JSON-RPC APIs will only expose blocks that have been executed and whose side-effects are available (unless "pending" is requested). Filecoin reconciles its deferred execution model with Eth client expectations of immediate execution, alas with some known rough edges [1].
Furthermore, this choice assumed that our chain_head_delay can be 0, i.e. we can consume the exposed HEAD of the parent in real time. In practice, that's not a good idea, ever.
All in all, this is problematic because it's a hidden gotcha. Users inspecting the committed parent finality on-chain and seeing N now need to understand that events emitted in N are (counterintuitively) not imported yet, and they need to wait longer.
This PR:
Note: I'm clarifying with @aakoshh the mechanics of Fendermint's Eth JSON-RPC API; concretely, if it renconciles deferred execution or not. There are some off-by-one shifts assumed in a number of places, so unclear. However, by disallowing a chain_head_delay of 0, we are able to work with APIs that expose heights before their results are available.
Note 2: when we adopt light clients to watch the parent, some assumptions may change.
[1] filecoin-project/lotus#11205