fix(topdown): pull effects up until committed finality. #887

raulk · 2024-04-19T12:10:51Z

Fendermint was pulling parent effects only until committed_finality-1.

This choice assumed that chains like Filecoin leaked deferred execution mechanics through the query mechanism we use (Eth JSON-RPC API). Therefore it was deemed that the safe choice was to lag behind by one epoch. However, we can safely assume that Eth JSON-RPC APIs will only expose blocks that have been executed and whose side-effects are available (unless "pending" is requested). Filecoin reconciles its deferred execution model with Eth client expectations of immediate execution, alas with some known rough edges [1].

Furthermore, this choice assumed that our chain_head_delay can be 0, i.e. we can consume the exposed HEAD of the parent in real time. In practice, that's not a good idea, ever.

All in all, this is problematic because it's a hidden gotcha. Users inspecting the committed parent finality on-chain and seeing N now need to understand that events emitted in N are (counterintuitively) not imported yet, and they need to wait longer.

This PR:

Rectifies the queried range so that we import up to, and including, the committed parent finality.
Disallows a zero chain_head_delay.

Note: I'm clarifying with @aakoshh the mechanics of Fendermint's Eth JSON-RPC API; concretely, if it renconciles deferred execution or not. There are some off-by-one shifts assumed in a number of places, so unclear. However, by disallowing a chain_head_delay of 0, we are able to work with APIs that expose heights before their results are available.

Note 2: when we adopt light clients to watch the parent, some assumptions may change.

[1] filecoin-project/lotus#11205

@aakoshh

Fendermint was pulling parent effects only until committed_finality-1. This choice assumed that chains like Filecoin leaked deferred execution mechanics through the query mechanism we use (Eth JSON-RPC API). Therefore it was deemed that the safe choice was to lag behind by one epoch. However, we can safely assume that Eth JSON-RPC APIs will only expose blocks that have been executed and whose side-effects are available (unless "pending" is requested). Filecoin reconciles its deferred execution model with Eth client expectations of immediate execution, alas with some known rough edges [1]. Furthermore, this choice assumed that our chain_head_delay can be 0, i.e. we can consume the exposed HEAD of the parent in real time. In practice, that's not a good idea, ever. All in all, this is problematic because it's a hidden. Users inspecting the committed parent finality and seeing N now need to understand that events emitted in N are (counterintuitively) not imported yet, and they need to wait longer. This PR: - Rectifies the queried range so that we import up to, and including, the committed parent finality. - Disallows a zero chain_head_delay. Note: I'm clarifying with @aakoshh the mechanics of Fendermint's Eth JSON-RPC API; concretely, if it renconciles deferred execution or not. There are some off-by-one shifts assumed in a number of places, so unclear. However, by disallowing a chain_head_delay of 0, we are able to work with APIs that expose heights before their results are available. Note 2: when we adopt light clients to watch the parent, some assumptions may change. [1] filecoin-project/lotus#11205

aakoshh

We used to have a problem where the data pre-fetched by the sycner into the cache, and the data fetched in-sync during catch-up, were different because of which height the query was executed at. It seems to make sense that if we rely on the Ethereum API then you can get the results of a final height at that height, so hopefully this is not an issue here.

Worth noting that this isn't just a fix, it's a breaking change that would cause consensus failure if applied by anyone with an existing chain. That is, anyone trying to sync with the chain from genesis would apply different top-down effects. The disciplined way to roll this out for such chains would be an upgrade that enables this behaviour from a certain height onward.

Regarding the Fendermint Ethereum API itself: the ethapi pretends that the latest height of the chain is 1 less than what it really is in CometBFT, to make sure it only shows executed blocks. It also adjusts the query height by +1 to execute queries at the height the results are stored at.

What I'm not sure about is the event log. The eth_getLogs API goes block by block, looking at the results of the execution, so if you go [990-1000] you will see the the effects of block 1000. The end of the range is also explicitly restricted to the latest height (which is -1 adjusted, so results should be available). If the results still weren't available (e.g. because someone queries by block hash and it's not executed yet, for whatever reason), the results would be empty for that block, with no error returned. If someone were to use the CometBFT search API to look for transaction effects with certain topics, they would see them at however CometBFT chooses the block index, which I believe is by where the transaction was included.

aakoshh · 2024-04-19T12:47:11Z

It's also worth noting that if somebody just applied this on a running chain, they could accidentally skip the effects of a block:

Say we are running a fendermint without this change.
Say prev_height = 100 and finalized.height = 105.
The execution grabs the topdown effects between 100 and 104 (inclusive)
Now we restart fendermint with this change
The next block is executing, we have a new parent finality at 110, ie. prev_height = 105, finalized.height = 110
The execution grabs the topdown effects between 106 and 110

105 ends up not being applied.

aakoshh · 2024-04-19T12:48:20Z

Another reason to roll this out as an upgrade, or with a condition of which height to apply this logic from, is that unless all in-sync nodes are restarted right at the same height, they can experience consensus failure too, not just the ones syncing from genesis.

aakoshh · 2024-04-19T12:51:53Z

So the way to roll this would be something like:

let (execution_fr, execution_to) = 
  match state.block_height() {
    h if h <  self.finality_exec_switch_height => (prev_height, finality.height - 1), 
    h if h == self.finality_exec_switch_height => (prev_height, finality.height),
    _                                          => (prev_height + 1, finality.height),
  };

And to add a finality_exec_switch_height setting with a default value of 0.

raulk · 2024-05-07T15:38:48Z

@aakoshh This is consensus breaking indeed, with the mitigating factor that that the fork would only manifest if the events being pulled top-down are in a boundary height. The fork would not manifest if the events are in the inner portion of the parent block range being pulled.

Prior to submitting this PR, Fluence had already applied a height condition in their IPC fork. I am not aware of any other live networks that would be impacted by this change, so taking a pragmatic stance, I'm reluctant to add the proposed switch flag in our codebase as it'll quickly turn into technical debt without sound justification. I think it's safe to assume that —today— all IPC adopters are running forks, and reviewing incoming changes carefully before merging from upstream.

Taking a step back, what's missing here is proper versioning, a changelog, and explicit breaking change signalling. We do expect a large amount of consensus-breaking changes in the next months as we refactor things and build out missing features (as IPC is really in its infancy), and it won't scale well to put each one behind a fork flag.

A more manageable solution for the time being is to treat this repo as experimental (through v0 versioning), flag consensus-breaking changes in changelogs, and work directly with impacted adopters to evaluate the condition they'd need to include. This is more reactive, but right now it's a tradeoff I'd strongly advise us to accept.

This situation will change when we modularise IPC, and package it behind neat modules and APIs that can be composed to build a customized client without forking the whole repo. Fluence really wants this and I think it makes a ton of sense.

Furthermore, I'd like to note that we've been making contract and Solidity and Rust compiler changes without much sympathy in that they affect the genesis payload and break syncing from genesis.

cryptoAtwill

I think it should be fine, LGTM.

Anyways, once the cert and commitment is onchain, then there is no need for this from and to anymore.

raulk requested review from aakoshh and cryptoAtwill April 19, 2024 12:10

aakoshh reviewed Apr 19, 2024

View reviewed changes

raulk and others added 2 commits July 22, 2024 13:29

Merge branch 'main' into raulk/fix-top-down-finality-range-read

cd826a7

Merge branch 'main' into raulk/fix-top-down-finality-range-read

ae64b06

cryptoAtwill requested a review from a team as a code owner September 25, 2024 09:33

cryptoAtwill approved these changes Sep 25, 2024

View reviewed changes

raulk merged commit ad43416 into main Sep 25, 2024
15 checks passed

raulk deleted the raulk/fix-top-down-finality-range-read branch September 25, 2024 12:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(topdown): pull effects up until committed finality. #887

fix(topdown): pull effects up until committed finality. #887

raulk commented Apr 19, 2024 •

edited

Loading

aakoshh left a comment •

edited

Loading

aakoshh commented Apr 19, 2024 •

edited

Loading

aakoshh commented Apr 19, 2024

aakoshh commented Apr 19, 2024 •

edited

Loading

raulk commented May 7, 2024

cryptoAtwill left a comment

fix(topdown): pull effects up until committed finality. #887

fix(topdown): pull effects up until committed finality. #887

Conversation

raulk commented Apr 19, 2024 • edited Loading

aakoshh left a comment • edited Loading

Choose a reason for hiding this comment

aakoshh commented Apr 19, 2024 • edited Loading

aakoshh commented Apr 19, 2024

aakoshh commented Apr 19, 2024 • edited Loading

raulk commented May 7, 2024

cryptoAtwill left a comment

Choose a reason for hiding this comment

raulk commented Apr 19, 2024 •

edited

Loading

aakoshh left a comment •

edited

Loading

aakoshh commented Apr 19, 2024 •

edited

Loading

aakoshh commented Apr 19, 2024 •

edited

Loading