Validators should not serve BlockRequests
#1878
When we alter the Validators so that they always reply with a 'null' response (indicating that they have no blocks), the requesting node retries the range with a different peer. This is the desired behaviour. However, it creates a potential deadlock. By definition, non-Validators generally lag behind the Validators. In the rare case where restarted Validators are trying to sync, they may be unable to if the only nodes holding the latest blocks are other Validators, because those blocks have not yet propagated to the non-Validators. This can happen, for example, when the network is restarted before the latest blocks reach the non-Validators, as in this example where a Validator requests blocks [770..772] but the non-Validators only have blocks below 771.
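A minimal, self-contained sketch of that behaviour, under stated assumptions: the types, fields, and `is_validator` flag below are stand-ins for illustration, not the actual zq2 code.

```rust
use std::collections::BTreeMap;

// Stand-in request/response types; the real zq2 types differ.
struct BlockRequest {
    from_height: u64,
    to_height: u64,
}

#[derive(Default)]
struct BlockResponse {
    // Empty means "I have none of the requested blocks".
    blocks: Vec<Vec<u8>>, // serialized proposals, as a stand-in
}

struct Node {
    is_validator: bool,
    db: BTreeMap<u64, Vec<u8>>, // height -> serialized block
}

impl Node {
    fn handle_block_request(&self, req: &BlockRequest) -> BlockResponse {
        if self.is_validator {
            // The 'null' response: a validator never serves blocks, so the
            // requester marks it as empty and retries with a different peer.
            return BlockResponse::default();
        }
        let blocks = (req.from_height..=req.to_height)
            .filter_map(|h| self.db.get(&h).cloned())
            .collect();
        BlockResponse { blocks }
    }
}
```

The deadlock described above follows directly: if every peer holding the requested range takes the `is_validator` branch, the requester cycles through 'null' responses forever.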
Possible solutions:
I've implemented the first solution for now, unless we think a different solution makes more sense.
Putting this on hold for now, since implementing it would kill consensus in the case where the only nodes holding the latest Proposals are the Validators and the other nodes are unable to sync. Also, the motivation for the change is performance, which might be improved elsewhere.
* feat: added new ExternalMessage::RequestFromHeight and ExternalMessage::RequestFromHash message types.
* feat: initial blockstore.rs skeleton.
* feat: added request/response skeleton.
* feat: hook up initial wiring of blockstore with consensus.
* feat: added blockstore.rs.
* feat: added in-flight check.
* feat: added debug/warn/trace messages.
* feat: initial requests firing.
* feat: convert config value to usize for simplicity.
* feat: added blockstore::handle_response_from_height().
* feat: checkpoint, successful RequestFromHeight-ResponseFromHeight.
* feat: direct insert into DB, without receipts/touched/state.
* feat: successfully injecting blocks/state_trie
* feat: small refactor blockstore.rs
* sec: make RequestId random, to mitigate response injections.
* feat: minor reorg.
* feat: disable speculative requests for now, until we have a better way to limit it.
* feat: re-enabled speculative fetch.
* feat: use InjectedProposals instead of ProcessProposals.
* chore: minor cleanups.
* feat: avoid single source of truth.
* fix: insufficient peers in GCP.
* feat: only inject blocks sourced from two peers - impossible to sync, atm.
* feat: sort-of working sync with multiple sources of truth.
* feat: pre-allocate enough capacity; corrected block_gap check.
* feat: replace non-corroborated blocks in cache.
* chore: clippy. checkpoint. [corroborate proposals]
* feat: [speculative fetch]
* feat: remove peer check, which allows it to proceed under circumstances where there is only 1 peer with the blocks.
* chore: clippy.
* feat: added handle_metadata_request/response().
* feat: [checkpoint - retrieve chain metadata].
* feat: added handle_multiblock_request/response().
* feat: [checkpoint - multi_block_request/response; never quite catching up.]
* chore: clippy.
* feat: sync phase#3 - zip it up. works for syncing new nodes.
* feat: rename blockstore.rs to sync.rs - makes clear that its job is to sync.
* nit: minor refactor - removing previous strategy.
* feat: request multi-blocks from original meta-data peer.
* feat: validates the chain metadata as it is retrieved.
* chore: minor cleanup.
* feat: perform checks to ensure multi-block response matches multi-block request.
* feat: allow retries of request_missing_blocks().
* feat: added ability to retry phase 1, during phase 2 error.
* feat: combined p1_cursor/p2_cursor into a self.state value.
* feat: restructure sync_proposal() to make it legible.
* checkpoint: working sync with state machine.
* Revert "sec: make RequestId random, to mitigate response injections." This reverts commit 33d45f6.
* feat: make fixed-sized queue size, configurable.
* feat: v1 sync compatibility.
* feat: use ChainMetaData as the main state variable structure.
* feat: make sync compatible with older nodes.
* feat: default to V1 peer; upgrade to V2 peer upon getting invalid response (sketched after this list).
* feat: filter V1 responses for gaps and forks.
* feat: working phase 1 with protomainnet.
* feat: removed sending BlockRequest from block_store.rs
* chore: comments, cleanup.
* fix: correct Phase 2 range, the stored value is accurate.
* feat: ensure unique peers.
* feat: output rate stats in block/s
* feat: minor reorg, logging.
* feat: added saving of metadata/segments to DB - allows continuation.
* feat: added stateful sync algorithm feature - can continue after restart.
* feat: rebuilt the algorithm to use DB for state, instead of in-memory.
* feat: added PeerVer info to DB.
* chore: post-rebase.
* feat: removed block_store.rs
* feat: added sync from timeout, not just proposals.
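A hedged sketch of the V1/V2 peer handling named in the log above; the actual zq2 implementation differs, and all names here are assumptions. A peer is assumed to speak the old sync protocol (V1) until one of its responses fails to match, at which point it is upgraded to V2 and the request is retried immediately.

```rust
// Stand-in for the PeerVer record the log says is persisted to the DB.
#[derive(Clone, Copy, PartialEq, Debug)]
enum PeerVer {
    V1,
    V2,
}

struct SyncPeer {
    ver: PeerVer,
}

impl SyncPeer {
    fn new() -> Self {
        // Default to V1 for compatibility with older nodes.
        SyncPeer { ver: PeerVer::V1 }
    }

    // Called when a response does not decode as the expected version.
    // Returns true if the caller should retry against this peer at once.
    fn on_invalid_response(&mut self) -> bool {
        match self.ver {
            PeerVer::V1 => {
                self.ver = PeerVer::V2; // upgrade, then retry immediately
                true
            }
            PeerVer::V2 => false, // already upgraded; pick another peer
        }
    }
}
```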
* feat: made the batch_size dynamic, so that it can get past a larger range.
* feat: added dynamic_batch_sizing() which is reactive, not pro-active (sketched after this list).
* feat: make dynamic_batch_sizing() work per-peer, not per sync.
* fix: wire up peers in test Network.
* fix: handle when V2 BlockResponse is late.
* feat: sync batch_size should depend on the current request_range, not peer specific.
* feat: simplified the request_missing_metadata() match selector.
* fix: improve test sync, added Network::run_until_synced();
* fix: fixed unreliable::blocks_are_produced_while_a_node_restarts() test.
* fix: staking::validators_can_join_and_become_proposer() test.
* fix: tests.
* nit: use Db::contains_block() instead of Db::get_block_by_hash().
* fix: tests.
* feat: retry sync against upgraded Peer, immediately.
* fix: checkpoints_test(), randomized add_peers() for tests.
* fix: handle_forking(), validators_can_join() test.
* fix: experiment.
* fix: checkpoints_test().
* feat: check for checkpoint block, not just history.
* fix: undoing some test timeouts.
* feat: replace ChainMetaData with BlockHeader.
* feat: changed BlockRequestV2 from 'hash'-based to 'height'-based.
* feat: simplify checkpointed check.
* nit: make sync_data temporary.
* fix: better error handling for committee_for_hash().
* feat: sets starting_block during Sync::new().
* feat: store gas_used as a proxy for block size.
* feat: reordered handle_metadata_response() to allow for micro-segmentation.
* fix: removed dynamic_batch_sizing() as it should be unnecessary until block 1.0M in protomainnet.
* feat: shifts txn verification from server side to client side.
* fix: validators_can_join..()
* fix PR comments:
  - https://github.com/Zilliqa/zq2/pull/2089/files#r1926920199
  - https://github.com/Zilliqa/zq2/pull/2089/files#r1926894987
  - https://github.com/Zilliqa/zq2/pull/2089/files#r1927243328
  - https://github.com/Zilliqa/zq2/pull/2089/files#r1927261156
* feat: moved sync db layer from sync.rs to db.rs
* feat: moved internal sync-peers to SyncPeers shared-state. https://github.com/Zilliqa/zq2/pull/2089/files#r1927206432
* fix: tests with shared-state SyncPeers.
* fix: issue of upgraded node, encountered in protomainnet where a node was recorded as V1 in Phase1, but was updated to V2 in Phase2, causing the sync to be stuck in a loop.
* nit: increase deposit_v3 boundary to 24.
* feat: use libp2p timeout instead of internal sync timeout.
* nit: change sync_data to sync_metadata table name; misc nits.
* feat: added place-holder for active/passive sync.
* fix #2227; and remove txn.verify() during Phase 2 - it is checked during Injection.
* feat: place-holder to store old ZIL txn blocks.
* nit: simplify run_until_synced();
* fix: flaw in get_next_peer().
* feat: early prototype for issue #1878.
* Delete all non-finalized blocks from database at startup. Previously we only deleted 'canonical' blocks.
* Don't fail benchmark workflows on alert
* Remove redundant config
* Hide listen addrs
* feat: store raw header blob in sync_metadata().
* feat: minor log changes; remove redundant check in handle_metadata_response().

Co-authored-by: James Hinshelwood <[email protected]>
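A hedged sketch of the reactive dynamic_batch_sizing() idea named in the log above (it was later removed); the constants and names are assumptions, not the zq2 code. "Reactive, not pro-active" is read here as: the batch size responds to outcomes rather than being predicted up front, shrinking after a failed request so sync can get past a heavy range and growing back toward the configured ceiling on success.

```rust
struct BatchSizer {
    size: usize,
    max: usize, // configured ceiling, e.g. from the node config
}

impl BatchSizer {
    fn new(max: usize) -> Self {
        BatchSizer { size: max, max }
    }

    // A request for the current range failed (e.g. the response would be
    // too large or timed out): halve the batch, but never below one block.
    fn on_failure(&mut self) {
        self.size = (self.size / 2).max(1);
    }

    // A request succeeded: grow back gradually (about +25%) toward the
    // ceiling, so throughput recovers after a heavy range is passed.
    fn on_success(&mut self) {
        self.size = (self.size + self.size / 4 + 1).min(self.max);
    }
}
```

The log also notes the batch size should depend on the current request_range rather than being per-peer; in this sketch that corresponds to keeping one `BatchSizer` per sync rather than one per `SyncPeer`.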
Syncing nodes can't distinguish the current validators from other nodes, since their latest block contains an outdated validator set. But validators can advertise that they don't have any blocks, so that nodes don't send them requests for the blocks they want to download.
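A minimal sketch of that advertisement, assuming a hypothetical `PeerInfo` record gossiped alongside peer metadata; nothing here is the actual zq2 API. A validator advertises that it serves no blocks, and a syncing node filters such peers out before choosing where to send a BlockRequest, instead of discovering it via a 'null' response.

```rust
#[derive(Clone)]
struct PeerInfo {
    peer_id: String,     // stand-in for libp2p's PeerId
    serves_blocks: bool, // advertised as false by validators
}

// Keep only peers that have advertised block availability, so validators
// never receive BlockRequests in the first place.
fn eligible_peers(peers: &[PeerInfo]) -> Vec<&PeerInfo> {
    peers.iter().filter(|p| p.serves_blocks).collect()
}
```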