History pre-merge sync #13186

Open
3 tasks
joshieDo opened this issue Dec 6, 2024 · 2 comments
Labels
C-discussion A discussion about the direction and design of the project C-enhancement New feature or request C-tracking-issue An issue that collects information about a broad development initiative M-prevent-stale Prevents old inactive issues/PRs from being closed due to inactivity

Comments

Collaborator

joshieDo commented Dec 6, 2024

History Pre-merge syncing from s3

Tracking Issues

Tasks

  1. 4 of 7
    A-db A-static-files C-enhancement C-tracking-issue
    joshieDo
  2. 1 of 4
    A-staged-sync C-enhancement C-tracking-issue
    joshieDo

References:

Table of contents:

The download logic should reside inside ethereum/ and not "sdk".

Static files pre-merge

Data: ~295.7GB in total. Each file covers a range of 500k blocks.

Offsets: ~13.6GB in total. Refresher: the offsets file provides the data range for each transaction in the data file. Each offset takes exactly 8 bytes, so it should be possible to query specific ones if necessary.
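
As a rough illustration of querying specific offsets: since every entry has a fixed size of 8 bytes, locating an entry in the offsets file is plain arithmetic. The sketch below assumes little-endian unsigned 64-bit entries, assumes that two consecutive entries delimit one transaction, and uses a hypothetical `header_len` parameter for any bytes that may precede the first entry. None of these details are stated above, so they should be checked against the actual file format.

```python
import struct

OFFSET_SIZE = 8  # from the note above: each offset takes exactly 8 bytes


def offset_entry_span(tx_index: int, header_len: int = 0) -> tuple[int, int]:
    """Byte span (start, end_exclusive) in the offsets file that holds
    entries tx_index and tx_index + 1 (tx_index is relative to the file).

    header_len is a hypothetical parameter for bytes before the first entry.
    """
    start = header_len + tx_index * OFFSET_SIZE
    return start, start + 2 * OFFSET_SIZE


def decode_data_range(raw: bytes) -> tuple[int, int]:
    """Decode two consecutive offsets into a (start, end_exclusive) range
    in the data file. Little-endian u64 is an assumption."""
    start, end = struct.unpack("<QQ", raw)
    return start, end
```

Only those 16 bytes would need to be fetched to find where a single transaction lives in the data file.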

The last static file goes beyond the merge block (15_537_393): static_file_transactions_15500000_15999999
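
To make the file ranges concrete, here is a minimal sketch that maps a block number to the transactions static file covering it, based only on the 500k-block ranges and the file names listed in this issue. The function name is hypothetical.

```python
BLOCKS_PER_FILE = 500_000  # each file covers 500k blocks
MERGE_BLOCK = 15_537_393   # merge block number as given above


def transactions_file_for(block: int) -> str:
    """Name of the transactions static file whose range contains `block`."""
    start = (block // BLOCKS_PER_FILE) * BLOCKS_PER_FILE
    end = start + BLOCKS_PER_FILE - 1
    return f"static_file_transactions_{start}_{end}"


# The merge block falls inside the last file listed in this issue.
print(transactions_file_for(MERGE_BLOCK))
```

The nominal range of that last file ends at block 15_999_999, which is 462,606 blocks past the merge block.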

Important

There's missing data that needs to be additionally provided:

  • BlockBodyIndices
  • Ommers

Currently

A range of blocks is passed as input to the BodyDownloader, which, after finding peers that can serve them, streams/buffers them to the BodyStage. The stage consumes them one by one and creates the following indices: BlockBodyIndices, TransactionBlocks.

Could be

Extend downloader without any stage changes

  • Extend BodiesDownloader with the extra logic, and let the node builder specify it (e.g. Node::body_downloader(config.body)).
  • BodiesDownloader would replicate the streaming behaviour from above by:
    • Downloading BlockBodyIndices
    • Downloading Ommers
    • Downloading offset file(s). Either for specific ranges or all straightaway.
    • Reading the data above in order to make HTTP partial range requests on the static file data itself, then building each block and buffering it to BodyStage. Could be batched.
  • Would require it to be stateful across restarts so we don't re-download indices and offsets.
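
The partial-range step in the list above could look roughly like the following sketch. It assumes the decoded offsets form an ascending list with one more entry than there are transactions (the final entry marking the end of the data), which is an assumption rather than something stated in this issue. The function names and the `max_bytes` parameter are hypothetical. No request is sent here; the code only builds the `Range` header values and groups contiguous transactions into batches.

```python
def range_header(start: int, end_exclusive: int) -> dict[str, str]:
    """HTTP Range header for bytes [start, end_exclusive).

    The header uses an inclusive last-byte position, hence the - 1.
    """
    if end_exclusive <= start:
        raise ValueError("empty or inverted byte range")
    return {"Range": f"bytes={start}-{end_exclusive - 1}"}


def batched_ranges(offsets: list[int], max_bytes: int) -> list[tuple[int, int]]:
    """Group consecutive entries into contiguous (start, end_exclusive)
    spans of at most max_bytes, so one request covers many transactions.
    A single entry larger than max_bytes gets a span of its own."""
    if len(offsets) < 2:
        return []
    spans = []
    span_start = offsets[0]
    for prev, cur in zip(offsets, offsets[1:]):
        # Close the current span before it would exceed the budget.
        if cur - span_start > max_bytes and prev > span_start:
            spans.append((span_start, prev))
            span_start = prev
    spans.append((span_start, offsets[-1]))
    return spans
```

Larger batches mean fewer requests, at the cost of buffering more data before blocks can be handed to BodyStage.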

Pros

  • Allows for easy unwinds or pre-merge sync targets from the stage perspective.
  • Stages untouched.

Cons

  • We are decoding/decompressing data we can consume/read in bulk.
  • Rebuilding offsets that we already downloaded.
  • tldr: slower

New stage and Downloader

  • Make removal/addition of custom stages possible on node-builder.
  • Add a new stage (eg. S3Stage)
  • Add a new dedicated downloader.
    • aria2c
    • some rust crate?
    • in-house
  • S3Stage would be executed after HeaderStage and before BodyStage.
  • Downloads and loads everything in bulk.
    • Could contain hashes of all the files to ensure integrity before proceeding
  • How to handle partial pre-merge sync/unwinds (aka --debug.tip PRE_MERGE_BLOCK_NUMBER - 1_000). Proposals?
    • Sync targets: S3Stage would gatekeep staged sync. Maybe it only accepts end ranges that match the static file batch ranges or the pre-merge block itself. If such a target is not passed, just hang with a warning message asking for a better range.
    • Unwinds: maybe add an "unwind threshold" to the stage logic, limiting how far back an unwind can go. This could also be useful for op-mainnet, where we import state and it doesn't make sense to unwind pre-bedrock. Similarly, it would not make sense to unwind pre-merge.
      • The command reth stage unwind -f could still ignore this limit if really necessary.
  • S3Stage could set BodyStageCheckpoint block as well as the unwind_threshold block. No changes necessary for BodyStage.
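
The sync-target and unwind proposals above can be sketched as two small checks. This is only an illustration of the proposed rules, not existing reth behaviour: the function names, the `force` flag and the error handling are all hypothetical, and the real stage would be written in Rust.

```python
BLOCKS_PER_FILE = 500_000  # each static file covers 500k blocks
MERGE_BLOCK = 15_537_393   # merge block number as given in this issue


def is_acceptable_target(target: int) -> bool:
    """Accept a pre-merge sync target only if it is the merge block or
    the last block of a static file batch (e.g. 499_999, 999_999)."""
    if target == MERGE_BLOCK:
        return True
    return 0 <= target < MERGE_BLOCK and (target + 1) % BLOCKS_PER_FILE == 0


def check_unwind(unwind_to: int, unwind_threshold: int, force: bool = False) -> None:
    """Refuse unwinds below the threshold unless forced, mirroring the
    idea that `reth stage unwind -f` could ignore the limit."""
    if unwind_to < unwind_threshold and not force:
        raise ValueError(
            f"cannot unwind to {unwind_to}: below unwind threshold {unwind_threshold}"
        )
```

Under these rules, a target such as the merge block minus 1_000 would be rejected, while 15_499_999 or the merge block itself would be accepted.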

Pros

  • Can maximize download/bootstrap speed.

Cons

  • More complexity/footprint?

Abandoned?

  • Specialized node which only serves pre-merge blocks through p2p, hardcoded as a whitelisted peer on users' nodes. Infra is hard.
  • Re-genesis - download state and not history, op-mainnet style.

Individual batch sizes

static_file_transactions_0_499999 → 54M
static_file_transactions_500000_999999 → 149M
static_file_transactions_1000000_1499999 → 291M
static_file_transactions_1500000_1999999 → 390M
static_file_transactions_2000000_2499999 → 417M
static_file_transactions_2500000_2999999 → 408M
static_file_transactions_3000000_3499999 → 603M
static_file_transactions_3500000_3999999 → 1.9G
static_file_transactions_4000000_4499999 → 5.3G
static_file_transactions_4500000_4999999 → 8.8G
static_file_transactions_5000000_5499999 → 7.7G
static_file_transactions_5500000_5999999 → 8.6G
static_file_transactions_6000000_6499999 → 7.9G
static_file_transactions_6500000_6999999 → 6.2G
static_file_transactions_7000000_7499999 → 6.9G
static_file_transactions_7500000_7999999 → 8.2G
static_file_transactions_8000000_8499999 → 7.9G
static_file_transactions_8500000_8999999 → 7.5G
static_file_transactions_9000000_9499999 → 7.5G
static_file_transactions_9500000_9999999 → 8.2G
static_file_transactions_10000000_10499999 → 11G
static_file_transactions_10500000_10999999 → 14G
static_file_transactions_11000000_11499999 → 13G
static_file_transactions_11500000_11999999 → 14G
static_file_transactions_12000000_12499999 → 16G
static_file_transactions_12500000_12999999 → 16G
static_file_transactions_13000000_13499999 → 18G
static_file_transactions_13500000_13999999 → 18G
static_file_transactions_14000000_14499999 → 19G
static_file_transactions_14500000_14999999 → 20G
static_file_transactions_15000000_15499999 → 21G
static_file_transactions_15500000_15999999 → 22G
static_file_transactions_0_499999.off → 3.9M
static_file_transactions_500000_999999.off → 9.0M
static_file_transactions_1000000_1499999.off → 21M
static_file_transactions_1500000_1999999.off → 29M
static_file_transactions_2000000_2499999.off → 29M
static_file_transactions_2500000_2999999.off → 28M
static_file_transactions_3000000_3499999.off → 41M
static_file_transactions_3500000_3999999.off → 120M
static_file_transactions_4000000_4499999.off → 319M
static_file_transactions_4500000_4999999.off → 548M
static_file_transactions_5000000_5499999.off → 458M
static_file_transactions_5500000_5999999.off → 489M
static_file_transactions_6000000_6499999.off → 383M
static_file_transactions_6500000_6999999.off → 349M
static_file_transactions_7000000_7499999.off → 377M
static_file_transactions_7500000_7999999.off → 455M
static_file_transactions_8000000_8499999.off → 438M
static_file_transactions_8500000_8999999.off → 412M
static_file_transactions_9000000_9499999.off → 382M
static_file_transactions_9500000_9999999.off → 437M
static_file_transactions_10000000_10499999.off → 563M
static_file_transactions_10500000_10999999.off → 668M
static_file_transactions_11000000_11499999.off → 657M
static_file_transactions_11500000_11999999.off → 721M
static_file_transactions_12000000_12499999.off → 820M
static_file_transactions_12500000_12999999.off → 724M
static_file_transactions_13000000_13499999.off → 715M
static_file_transactions_13500000_13999999.off → 757M
static_file_transactions_14000000_14499999.off → 690M
static_file_transactions_14500000_14999999.off → 682M
static_file_transactions_15000000_15499999.off → 691M
static_file_transactions_15500000_15999999.off → 601M

Additional context

No response

@joshieDo joshieDo added C-discussion A discussion about the direction and design of the project C-enhancement New feature or request S-needs-design This issue requires design work to think about how it would best be accomplished labels Dec 6, 2024
Contributor

This issue is stale because it has been open for 21 days with no activity.

@github-actions github-actions bot added the S-stale This issue/PR is stale and will close with no further activity label Dec 31, 2024
Contributor

github-actions bot commented Jan 7, 2025

This issue was closed because it has been inactive for 7 days since being marked as stale.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Jan 7, 2025
@github-project-automation github-project-automation bot moved this from Todo to Done in Reth Tracker Jan 7, 2025
@joshieDo joshieDo reopened this Jan 8, 2025
@github-project-automation github-project-automation bot moved this from Done to In Progress in Reth Tracker Jan 8, 2025
@joshieDo joshieDo added M-prevent-stale Prevents old inactive issues/PRs from being closed due to inactivity and removed S-stale This issue/PR is stale and will close with no further activity labels Jan 8, 2025
@joshieDo joshieDo added C-tracking-issue An issue that collects information about a broad development initiative and removed S-needs-design This issue requires design work to think about how it would best be accomplished labels Jan 27, 2025
Projects
Status: In Progress
Development

No branches or pull requests

1 participant