astrid to wait and issue warnings if heimdall process is not synced #13786

Open
taratorio opened this issue Feb 12, 2025 · 2 comments · May be fixed by #13807
Labels
imp1 High importance polygon

Comments

@taratorio

This is a follow-up to #13746 (comment).

Reduce the support burden for this scenario by detecting when a user's heimdalld process isn't synced correctly:

  • if erigon is ahead of heimdall, make erigon issue a warning and wait/poll heimdall until it catches up

We can make use of any of these endpoints:

  1. curl http://localhost:1317/status (has a catching_up boolean flag and latest_block_time, the time of heimdall's latest block)
  2. curl http://localhost:1317/checkpoints/count
  3. curl http://localhost:1317/milestone/latest
  4. curl http://localhost:1317/bor/latest-span

These checks, warnings and waits can be plugged in either:

  • in the existing bridge/Service waitForScraper, heimdall/Service SynchronizeCheckpoints, heimdall/Service SynchronizeSpans, heimdall/Service SynchronizeMilestones
  • or in some new component created for this and put at the beginning of sync/Sync Run
@taratorio

Looks like the catching_up field from curl http://localhost:1317/status isn't reliable. What happened just now: we forgot to update our local heimdall to the latest version and there was a hard fork, so it stopped syncing and broke erigon (with a trie root mismatch due to missing state sync events).

At that point (around 2025-02-13T10:00), when I ran curl http://localhost:1317/status I got:

{"height":"0","result":{"latest_block_hash":"73AAB6EEB5098883B34BA2CF03BAA5AA11E2112CACE6A6E6CDB1F934C0973F22","latest_app_hash":"DCE74AC02FA0F594BFEBC492E332CE33FC5DF5ECC1DBC5B94FEFF911875FCA26","latest_block_height":"22393782","latest_block_time":"2025-02-13T07:58:10.920921412Z","catching_up":false}}

Notice that latest_block_time is 2025-02-13T07:58:10.920921412Z, which is about 2 hours old, yet catching_up is still false, so the flag isn't reliable.

So we can use latest_block_time instead of catching_up.

@taratorio

Probably another reliable option would be to use:

  1. curl http://localhost:1317/checkpoints/count
  2. curl http://localhost:1317/milestone/latest
  3. curl http://localhost:1317/bor/latest-span

and compare the results to our latest entity id for these 3 entities (using EntityStore.LastEntityId). Note that this will come from snapshots/db thanks to our snapshot entity stores. Based on this we will know whether erigon is ahead of heimdall, in which case it needs to "warn and wait".
