Syncing fails on certain blocks after new syncing algo merged #2309

Closed
shawn-zil opened this issue Feb 7, 2025 · 2 comments · Fixed by #2329

shawn-zil commented Feb 7, 2025

Discovered while testing out #2307 in devnet

All nodes that sync from genesis are stuck at block 505042.

{
  "jsonrpc": "2.0",
  "id": "1",
  "result": {
    "startingBlock": 505042,
    "currentBlock": 505042,
    "highestBlock": 986771,
    "status": {
      "currentPhase": "phase2",
      "peerCount": 13,
      "headerDownloads": 65100,
      "blockDownloads": 62200,
      "bufferedBlocks": 0,
      "emptyCount": 0,
      "retryCount": 22,
      "timeoutCount": 142
    }
  }
}

Checking the DB on devnet reveals two blocks at this height (a fork), only one of which is canonical:

select hex(block_hash), is_canonical from blocks where height=505042;
015A025D610D3F7878D67122BDFDF2186E2C4419E684B5B690CC7A6BCEB1B26E|1
26F53E4DF5E857C04A462416902140A3BD89FFA823F55F309FB71C0743F92013|0

Since syncing follows the parent_hash, the one synced to the local node is:

select hex(block_hash) from blocks where height=505042;
015A025D610D3F7878D67122BDFDF2186E2C4419E684B5B690CC7A6BCEB1B26E

Which is also the expected parent for block 505043, as indicated by the QC:

BlockHeader { view: 505957, number: 505043, hash: 16163caa8e396ab0147aa19ea4ee4da31d09b639c32d0f122d2fcdbf2d4d5a28, qc: QuorumCertificate { signature: BlsSignature(Basic(G2Projective { x: Fp2 { c0: Fp(0x10ee3ea072747db24ce794515b14247ed8d95b1e879ad87f71cb4198854e5d7b8df5093cc600ee58f0598f05cbd8a534), c1: Fp(0x0b7f94921543ed18f6f73965eb13340e4d14f84b9800c9a66f1fd72b60d40063facab1ef4d8edb8c8c440d8f77c4774f) }, y: Fp2 { c0: Fp(0x1083a15976c6655d319c77ee7f3ddff2b2066d12eed785e431107cb448affcde4a8149d59966ffac1f21651c1a24e00d), c1: Fp(0x03e7ef4ddfae8ef06f4261d60fabf55233f29819b0bd4c22519eebb9edeef0276fcff0b1800369ced1a49dffb11f6e37) }, z: Fp2 { c0: Fp(0x000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001), c1: Fp(0x000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000) } })), cosigned: BitArray<u8, bitvec::order::Msb0> { addr: 0x7fa1bd1fe8d8, head: 000, bits: 256 } [1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], block_hash: 015a025d610d3f7878d67122bdfdf2186e2c4419e684b5b690cc7a6bceb1b26e, view: 505955 }
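The parent check described above can be sketched as follows. This is a minimal illustration, not the node's actual code: the `Header` struct, its `qc_block_hash` field, and the `parent_of` helper are hypothetical names, and it assumes (as the reasoning above does) that the `block_hash` inside a block's QC names its parent. The comparison ignores ASCII case because the DB query prints uppercase hex while the logs print lowercase.

```rust
/// Hypothetical, simplified view of a block header (not the real zilliqa type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Header {
    pub number: u64,
    /// Hex-encoded block hash.
    pub hash: String,
    /// Hex-encoded hash of the block certified by this header's QC.
    pub qc_block_hash: String,
}

/// Among the blocks stored at one height, return the one that the child's QC points to.
/// Returns `None` if no candidate is at the preceding height with a matching hash.
pub fn parent_of<'a>(child: &Header, candidates: &'a [Header]) -> Option<&'a Header> {
    candidates.iter().find(|c| {
        c.number + 1 == child.number && c.hash.eq_ignore_ascii_case(&child.qc_block_hash)
    })
}
```

Given the two rows at height 505042 and the header of block 505043, such a check would select the canonical block `015a025d...b26e` and skip the forked one.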

Also, unlike with other regular blocks, executing block 505042 appears to add only that one block, for reasons that are not yet clear:

2025-02-07T11:58:43.153916Z DEBUG zilliqa::consensus: 2839: Executing block: BlockHeader { view: 505955, number: 505042, hash: 015a025d610d3f7878d67122bdfdf2186e2c4419e684b5b690cc7a6bceb1b26e, qc: QuorumCertificate { signature: BlsSignature(Basic(G2Proje
ctive { x: Fp2 { c0: Fp(0x156c156749fd50374d06e38facc11a37589aad69b53597822c83975c257093bdde8bbaf44bdcf2b9db4d97dc50218374), c1: Fp(0x1069816663a59316f26763ae94f127124f8d1bd2952f76b40a969289c38656e09d7c3186c162851d06eebc0984e9977d) }, y: Fp2 { c0: Fp(0x0
d9767318d8da0ebd5a4e51fa45967a9ee953d21c2be397cc6029b88b8b0c1d57e56276f836909b195a5cffe77b3dcc6), c1: Fp(0x1334d8e8b7f58c06fe9412180f79b20dda45e9bff45e9e298bfb310579a56155d379ad157bd8bfa642cda7aa1730891a) }, z: Fp2 { c0: Fp(0x0000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000001), c1: Fp(0x000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000) } })), cosigned: BitArray<u8, bitvec::order::Msb0> { addr: 0x7fa20edc78e0, h
ead: 000, bits: 256 } [0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
, 0, 0, 0, 0, 0, 0, 0, 0, 0], block_hash: 9ba5e13c22c62f9c952a5e6f670f778bb4deceb7e280052643c5e7d3daa66e24, view: 505952 }, signature: BlsSignature(Basic(G2Projective { x: Fp2 { c0: Fp(0x18d6e6a4b9fef5261e7077be00dd30dde08cf84087df846e1124b9c3dda5279ecbe
e52dae8aa607af76f7a8ff8036a0b), c1: Fp(0x0bf0ad99f958e268e25f5cf471f65097a36b4f8ce780040ec1e28e1f0d18fed9c6da6c3a08683361d83dd790f16f47f2) }, y: Fp2 { c0: Fp(0x10e31d855b05ff4b7a10b98edd4ced4bfb70c68ea448ef0cfa6995c13a37341a3810429325c2fe9a5b2adabb2e7b6d
a5), c1: Fp(0x151bc69ddd740596a37dd63736843022e5afabf28c7b81d7555e066393ccb716029e94e0ce66df55914fa4923c093a3b) }, z: Fp2 { c0: Fp(0x000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001), c1: Fp(0x0000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000000) } })), state_root_hash: 3a0ffc4f1e4c9ff40e70c4a5ef013206254ca0ee4653286e8e520d558c68bc86, transactions_root_hash: 56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb
5e363b421, receipts_root_hash: 56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421, timestamp: SystemTime { tv_sec: 1738185698, tv_nsec: 330835075 }, gas_used: EvmGas(0), gas_limit: EvmGas(84000000) }
2025-02-07T11:58:43.171926Z DEBUG zilliqa::consensus: 826: apply late rewards in view 505955
2025-02-07T11:58:43.273017Z DEBUG zilliqa::consensus: 2380: added block from=None hash=015a025d610d3f7878d67122bdfdf2186e2c4419e684b5b690cc7a6bceb1b26e block.header.view=505955 block.header.number=505042
2025-02-07T11:58:43.273474Z  WARN zilliqa::consensus: 2449: Tried to set view to lower or same value - this is incorrect. value: 505956
2025-02-07T11:58:43.273488Z DEBUG zilliqa::consensus: 727: *** setting view to proposal view... view is now 505956
2025-02-07T11:58:43.291135Z DEBUG zilliqa::consensus: 767: can't vote for block proposal, we aren't in the committee of length 12
@shawn-zil shawn-zil added the area:consensus Related to the consensus protocol label Feb 7, 2025
@DrZoltanFazekas

At this point the chain was already close to block height 1000000, i.e. one of the above blocks should have been removed when the canonical one was finalized.

@DrZoltanFazekas DrZoltanFazekas removed the area:consensus Related to the consensus protocol label Feb 7, 2025

86667 commented Feb 10, 2025

The issue here is that the block 16163caa8e396ab0147aa19ea4ee4da31d09b639c32d0f122d2fcdbf2d4d5a28 (view: 505957, number: 505043) is made up of an AggQC in which one of the QCs refers to a forked block that we have not stored, since it is not in the canonical chain.

This is fine in itself, but our function get_highest_from_agg(), whose goal is to find the QC with the largest view in the AggQC, attempts to fetch the corresponding block for every QC and returns an error as soon as one is missing:

    fn get_highest_from_agg(&self, agg: &AggregateQc) -> Result<QuorumCertificate> {
        agg.qcs
            .iter()
            .map(|qc| (qc, self.get_block(&qc.block_hash)))
            .try_fold(None, |acc, (qc, block)| {
                let block = block?.ok_or_else(|| anyhow!("missing block with hash {:?}", qc.block_hash))?;
                ... // find largest qc.view
    }

The fix, then, is to find the largest QC.view from the QC data itself, and only then attempt to fetch that QC's block.
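A minimal sketch of that approach is shown below. It is not the change merged in #2329: the `Store` type, the `[u8; 32]` hash alias, the `String` error type, and the trimmed-down structs are all simplified assumptions made so the example is self-contained. The only point it illustrates is the ordering: select the QC by `view` first, then perform a single block lookup.

```rust
use std::collections::HashMap;

/// Hypothetical 32-byte hash alias (the real code uses its own Hash type).
pub type Hash = [u8; 32];

/// Simplified QC carrying only the fields needed here.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct QuorumCertificate {
    pub block_hash: Hash,
    pub view: u64,
}

#[derive(Debug, Clone)]
pub struct AggregateQc {
    pub qcs: Vec<QuorumCertificate>,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Block {
    pub hash: Hash,
    pub number: u64,
}

/// Hypothetical block store; forked blocks may be absent from it.
pub struct Store {
    pub blocks: HashMap<Hash, Block>,
}

impl Store {
    pub fn get_block(&self, hash: &Hash) -> Option<&Block> {
        self.blocks.get(hash)
    }

    /// Choose the QC with the largest view using only the QC data,
    /// then check that the block it certifies is actually stored.
    pub fn get_highest_from_agg(&self, agg: &AggregateQc) -> Result<QuorumCertificate, String> {
        let highest = agg
            .qcs
            .iter()
            .max_by_key(|qc| qc.view)
            .ok_or_else(|| "aggregate QC contains no QCs".to_string())?;
        // Only this one lookup can fail; QCs for unstored forked blocks with
        // lower views no longer cause an error.
        self.get_block(&highest.block_hash)
            .ok_or_else(|| format!("missing block with hash {:?}", highest.block_hash))?;
        Ok(highest.clone())
    }
}
```

With this ordering, a QC for an unstored forked block is harmless as long as it does not have the largest view. If the highest-view QC itself pointed at a missing block, the function would still return an error, which seems like the right outcome for that case.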

@86667 86667 linked a pull request Feb 10, 2025 that will close this issue
@86667 86667 closed this as completed Feb 11, 2025
@86667 86667 changed the title from "Something unusual happened in consensus." to "Syncing fails on certain blocks after new syncing algo merged" Feb 17, 2025