fix(drand): add null HistoricalBeaconClient for old beacons #12830

Open
wants to merge 1 commit into
base: master
from

Conversation

rvagg
Member

@rvagg rvagg commented Jan 20, 2025

Fixes: #11802
Ref: #11802 (comment)

I'm pretty sure this will work since we only need to verify, not fetch, old beacon entries. So when we remove Servers from the config (like we've done for Mainnet and Incentinet), this will satisfy drand but also be a noop.

chain/beacon/drand/drand.go: two review threads, both outdated and resolved
@rjan90
Contributor

rjan90 commented Jan 20, 2025

Checked out this branch, and tried to see if I still encountered the #11802 issue.

It got a bit further than before:

lotus daemon --import-chain=/home/boost/forest_snapshot_mainnet_2025-01-20_height_4635403.forest.car.zst --halt-after-import
2025-01-20T12:42:18.434+0100	INFO	lotus	lotus/daemon.go:230	lotus repo: /mnt/lotuschain/.lotus
-------
2025-01-20T13:40:45.580+0100	INFO	chainstore	store/store.go:726	clearing block validation cache...
2025-01-20T13:40:53.733+0100	INFO	chainstore	store/store.go:765	696466 block validation entries cleared.
2025-01-20T13:40:53.733+0100	INFO	lotus	lotus/daemon.go:603	setting genesis
2025-01-20T13:40:53.746+0100	INFO	chainstore	store/store.go:685	New heaviest tipset! [bafy2bzacecnamqgqmifpluoeldx7zzglxcljo6oja4vrmtj7432rphldpdmm2] (height=0)
2025-01-20T13:40:53.750+0100	WARN	chainstore	store/store.go:713	no previous heaviest tipset found, using [bafy2bzacecnamqgqmifpluoeldx7zzglxcljo6oja4vrmtj7432rphldpdmm2]
2025-01-20T13:40:53.754+0100	INFO	drand	drand/drand.go:118	drand beacon without pubsub
2025-01-20T13:40:53.756+0100	INFO	drand	drand/drand.go:118	drand beacon without pubsub
2025-01-20T13:40:53.757+0100	INFO	drand	drand/drand.go:118	drand beacon without pubsub
2025-01-20T13:40:53.757+0100	INFO	lotus	lotus/daemon.go:621	validating imported chain...
2025-01-20T13:40:53.757+0100	INFO	drand	client/optimizing.go:194		{"optimizing_client": "endpoint down when speed tested", "client": "{}.(+verifier)", "err": "no historical randomness available", "errVerbose": "no historical randomness available:\n    github.com/filecoin-project/lotus/chain/beacon/drand.HistoricalBeaconClient.Info\n        /home/boost/lotus/chain/beacon/drand/drand.go:274"}
2025-01-20T13:40:53.757+0100	INFO	drand	client/optimizing.go:194		{"optimizing_client": "endpoint down when speed tested", "client": "{}.(+verifier)", "err": "no historical randomness available", "errVerbose": "no historical randomness available:\n    github.com/filecoin-project/lotus/chain/beacon/drand.HistoricalBeaconClient.Info\n        /home/boost/lotus/chain/beacon/drand/drand.go:274"}
2025-01-20T13:45:53.758+0100	INFO	drand	client/optimizing.go:194		{"optimizing_client": "endpoint down when speed tested", "client": "{}.(+verifier)", "err": "no historical randomness available", "errVerbose": "no historical randomness available:\n    github.com/filecoin-project/lotus/chain/beacon/drand.HistoricalBeaconClient.Info\n        /home/boost/lotus/chain/beacon/drand/drand.go:274"}
2025-01-20T13:45:53.758+0100	INFO	drand	client/optimizing.go:194		{"optimizing_client": "endpoint down when speed tested", "client": "{}.(+verifier)", "err": "no historical randomness available", "errVerbose": "no historical randomness available:\n    github.com/filecoin-project/lotus/chain/beacon/drand.HistoricalBeaconClient.Info\n        /home/boost/lotus/chain/beacon/drand/drand.go:274"}
2025-01-20T13:50:53.759+0100	INFO	drand	client/optimizing.go:194		{"optimizing_client": "endpoint down when speed tested", "client": "{}.(+verifier)", "err": "no historical randomness available", "errVerbose": "no historical randomness available:\n    github.com/filecoin-project/lotus/chain/beacon/drand.HistoricalBeaconClient.Info\n        /home/boost/lotus/chain/beacon/drand/drand.go:274"}
2025-01-20T13:50:53.759+0100	INFO	drand	client/optimizing.go:194		{"optimizing_client": "endpoint down when speed tested", "client": "{}.(+verifier)", "err": "no historical randomness available", "errVerbose": "no historical randomness available:\n    github.com/filecoin-project/lotus/chain/beacon/drand.HistoricalBeaconClient.Info\n        /home/boost/lotus/chain/beacon/drand/drand.go:274"}
2025-01-20T13:55:53.759+0100	INFO	drand	client/optimizing.go:194		{"optimizing_client": "endpoint down when speed tested", "client": "{}.(+verifier)", "err": "no historical randomness available", "errVerbose": "no historical randomness available:\n    github.com/filecoin-project/lotus/chain/beacon/drand.HistoricalBeaconClient.Info\n        /home/boost/lotus/chain/beacon/drand/drand.go:274"}
2025-01-20T13:55:53.759+0100	INFO	drand	client/optimizing.go:194		{"optimizing_client": "endpoint down when speed tested", "client": "{}.(+verifier)", "err": "no historical randomness available", "errVerbose": "no historical randomness available:\n    github.com/filecoin-project/lotus/chain/beacon/drand.HistoricalBeaconClient.Info\n        /home/boost/lotus/chain/beacon/drand/drand.go:274"}

It then looped on the "endpoint down when speed tested" / "no historical randomness available" messages for quite some time before:

2025-01-20T13:58:20.878+0100	INFO	statemgr	stmgr/stmgr.go:487	computing state (height: 0, ts=[bafy2bzacecnamqgqmifpluoeldx7zzglxcljo6oja4vrmtj7432rphldpdmm2])
2025-01-20T13:58:20.886+0100	INFO	statemgr	stmgr/stmgr.go:487	computing state (height: 1, ts=[bafy2bzacechdx6xd62lcyy7rnyc4uxcxhuwqslcxfvj77fxlwafij3nhzchpy])
2025-01-20T13:58:20.887+0100	WARN	chainstore	store/store.go:670	reorgWorker quit
2025-01-20T13:58:22.259+0100	INFO	badgerbs	[email protected]/db.go:1027	Storing value log head: {Fid:129 Len:33 Offset:576605263}
2025-01-20T13:58:23.216+0100	INFO	badgerbs	[email protected]/levels.go:1000	[Compactor: 173] Running compaction: {level:0 score:1.73 dropPrefixes:[]} for level: 0
2025-01-20T13:58:30.541+0100	INFO	badgerbs	[email protected]/levels.go:962	LOG Compact 0->1, del 8 tables, add 7 tables, took 7.324947987s
2025-01-20T13:58:30.541+0100	INFO	badgerbs	[email protected]/levels.go:1010	[Compactor: 173] Compaction for level: 0 DONE
2025-01-20T13:58:30.541+0100	INFO	badgerbs	[email protected]/db.go:550	Force compaction on level 0 done
ERROR: chain validation failed: getting block messages for tipset: failed to get messages for block: failed to load msgmeta (bafy2bzacecmwp4imjqhdg2zvc7j2s4xxahnn5jnudtrt335re24i4zim7ccfi): ipld: could not find bafy2bzacecmwp4imjqhdg2zvc7j2s4xxahnn5jnudtrt335re24i4zim7ccfi

@rvagg
Member Author

rvagg commented Jan 20, 2025

OK, well, I don't think the drand errors were your fatal problem here; it's just looping in the background and logging those entries. But unfortunately we're stuck with the optimizing client: https://github.com/drand/drand/blob/v1.5.11/client/optimizing.go#L194. So perhaps we can't take this approach at all and need it fixed inside drand; otherwise it's going to continue to speed test the null-client and spam those INFO log messages.

I think your problem here is that you can't use a snapshot file when trying to import the full "chain". At least that's my understanding from reading the code - the main difference between importing a snapshot and importing the chain is that a chain import does a full tipset validation back to genesis, which is why it wants the beacons to be available. It calls ValidateChain, which --import-snapshot won't do, and this walks back validating each tipset, executing all the messages; so it needs those messages too! I guess --import-chain is the trustless version of starting from scratch, you don't need state because it'll build it for you from genesis. But --import-snapshot will give you the historical tipsets but not messages and it'll start you from a trusted snapshot of the state, which you're just going to have to accept without being able to validate it.

I suppose a chain export is a lotus chain export without --recent-stateroots or --skip-old-msgs, while a snapshot is with both of those. Do we happen to make those available anywhere?

@rvagg
Member Author

rvagg commented Jan 20, 2025

Updated this branch with a modification that attempts to use the "watcher" mechanism in drand. Watchers don't get speed tested, so if I can convince it that my historical null-client is a watcher then maybe it won't speed test it. I haven't tested this but @rjan90 if you still have that snapshot handy, would you mind running it again to see if you get any of those INFO logs or any other errors from drand?

@rjan90
Contributor

rjan90 commented Jan 21, 2025

I haven't tested this but @rjan90 if you still have that snapshot handy, would you mind running it again to see if you get any of those INFO logs or any other errors from drand?

Reran with the latest commit, and this is what I get:

lotus daemon --import-chain=/home/boost/forest_snapshot_mainnet_2025-01-20_height_4635403.forest.car.zst --halt-after-import
2025-01-21T09:58:59.917+0100	INFO	lotus	lotus/daemon.go:230	lotus repo: /mnt/lotuschain/.lotus
-------
2025-01-21T10:56:07.051+0100	INFO	chainstore	store/store.go:726	clearing block validation cache...
2025-01-21T10:56:07.093+0100	INFO	chainstore	store/store.go:765	0 block validation entries cleared.
2025-01-21T10:56:07.093+0100	INFO	lotus	lotus/daemon.go:603	setting genesis
2025-01-21T10:56:07.097+0100	INFO	chainstore	store/store.go:685	New heaviest tipset! [bafy2bzacecnamqgqmifpluoeldx7zzglxcljo6oja4vrmtj7432rphldpdmm2] (height=0)
2025-01-21T10:56:07.100+0100	WARN	chainstore	store/store.go:713	no previous heaviest tipset found, using [bafy2bzacecnamqgqmifpluoeldx7zzglxcljo6oja4vrmtj7432rphldpdmm2]
2025-01-21T10:56:07.105+0100	INFO	drand	drand/drand.go:114	drand beacon without pubsub
2025-01-21T10:56:07.107+0100	INFO	drand	drand/drand.go:114	drand beacon without pubsub
2025-01-21T10:56:07.108+0100	INFO	drand	client/optimizing.go:194		{"optimizing_client": "endpoint down when speed tested", "client": "&{}.(+verifier)", "err": "no historical randomness available", "errVerbose": "no historical randomness available:\n    github.com/filecoin-project/lotus/chain/beacon/drand.historicalBeaconClient.Get\n        /home/boost/lotus/chain/beacon/drand/drand.go:274"}
2025-01-21T10:56:07.108+0100	INFO	drand	client/optimizing.go:194		{"optimizing_client": "endpoint down when speed tested", "client": "&{}.(+verifier)", "err": "no historical randomness available", "errVerbose": "no historical randomness available:\n    github.com/filecoin-project/lotus/chain/beacon/drand.historicalBeaconClient.Get\n        /home/boost/lotus/chain/beacon/drand/drand.go:274"}
2025-01-21T10:56:07.110+0100	INFO	drand	drand/drand.go:114	drand beacon without pubsub
2025-01-21T10:56:07.110+0100	INFO	lotus	lotus/daemon.go:621	validating imported chain...
2025-01-21T11:01:07.109+0100	INFO	drand	client/optimizing.go:194		{"optimizing_client": "endpoint down when speed tested", "client": "&{}.(+verifier)", "err": "no historical randomness available", "errVerbose": "no historical randomness available:\n    github.com/filecoin-project/lotus/chain/beacon/drand.historicalBeaconClient.Get\n        /home/boost/lotus/chain/beacon/drand/drand.go:274"}
2025-01-21T11:01:07.109+0100	INFO	drand	client/optimizing.go:194		{"optimizing_client": "endpoint down when speed tested", "client": "&{}.(+verifier)", "err": "no historical randomness available", "errVerbose": "no historical randomness available:\n    github.com/filecoin-project/lotus/chain/beacon/drand.historicalBeaconClient.Get\n        /home/boost/lotus/chain/beacon/drand/drand.go:274"}
2025-01-21T11:06:07.110+0100	INFO	drand	client/optimizing.go:194		{"optimizing_client": "endpoint down when speed tested", "client": "&{}.(+verifier)", "err": "no historical randomness available", "errVerbose": "no historical randomness available:\n    github.com/filecoin-project/lotus/chain/beacon/drand.historicalBeaconClient.Get\n        /home/boost/lotus/chain/beacon/drand/drand.go:274"}
2025-01-21T11:06:07.110+0100	INFO	drand	client/optimizing.go:194		{"optimizing_client": "endpoint down when speed tested", "client": "&{}.(+verifier)", "err": "no historical randomness available", "errVerbose": "no historical randomness available:\n    github.com/filecoin-project/lotus/chain/beacon/drand.historicalBeaconClient.Get\n        /home/boost/lotus/chain/beacon/drand/drand.go:274"}
2025-01-21T11:11:07.111+0100	INFO	drand	client/optimizing.go:194		{"optimizing_client": "endpoint down when speed tested", "client": "&{}.(+verifier)", "err": "no historical randomness available", "errVerbose": "no historical randomness available:\n    github.com/filecoin-project/lotus/chain/beacon/drand.historicalBeaconClient.Get\n        /home/boost/lotus/chain/beacon/drand/drand.go:274"}
2025-01-21T11:11:07.111+0100	INFO	drand	client/optimizing.go:194		{"optimizing_client": "endpoint down when speed tested", "client": "&{}.(+verifier)", "err": "no historical randomness available", "errVerbose": "no historical randomness available:\n    github.com/filecoin-project/lotus/chain/beacon/drand.historicalBeaconClient.Get\n        /home/boost/lotus/chain/beacon/drand/drand.go:274"}
------
2025-01-21T11:14:55.303+0100	INFO	statemgr	stmgr/stmgr.go:487	computing state (height: 0, ts=[bafy2bzacecnamqgqmifpluoeldx7zzglxcljo6oja4vrmtj7432rphldpdmm2])
2025-01-21T11:14:55.312+0100	INFO	statemgr	stmgr/stmgr.go:487	computing state (height: 1, ts=[bafy2bzacechdx6xd62lcyy7rnyc4uxcxhuwqslcxfvj77fxlwafij3nhzchpy])
2025-01-21T11:14:55.315+0100	WARN	chainstore	store/store.go:670	reorgWorker quit
2025-01-21T11:14:56.225+0100	INFO	badgerbs	[email protected]/db.go:1027	Storing value log head: {Fid:129 Len:33 Offset:578835815}
2025-01-21T11:14:57.434+0100	INFO	badgerbs	[email protected]/levels.go:1000	[Compactor: 173] Running compaction: {level:0 score:1.73 dropPrefixes:[]} for level: 0
2025-01-21T11:15:01.880+0100	INFO	badgerbs	[email protected]/levels.go:962	LOG Compact 0->1, del 9 tables, add 8 tables, took 4.446465215s
2025-01-21T11:15:01.881+0100	INFO	badgerbs	[email protected]/levels.go:1010	[Compactor: 173] Compaction for level: 0 DONE
2025-01-21T11:15:01.881+0100	INFO	badgerbs	[email protected]/db.go:550	Force compaction on level 0 done
ERROR: chain validation failed: getting block messages for tipset: failed to get messages for block: failed to load msgmeta (bafy2bzacecmwp4imjqhdg2zvc7j2s4xxahnn5jnudtrt335re24i4zim7ccfi): ipld: could not find bafy2bzacecmwp4imjqhdg2zvc7j2s4xxahnn5jnudtrt335re24i4zim7ccfi

@rvagg rvagg force-pushed the rvagg/historical-beacon-client branch from e68c3e7 to cf963a5 Compare January 21, 2025 10:20
@rvagg rvagg force-pushed the rvagg/historical-beacon-client branch from cf963a5 to 30ca496 Compare January 21, 2025 10:52
@rvagg
Member Author

rvagg commented Jan 21, 2025

OK, one tiny change and this seems to work; I grabbed my own snapshot and have been running it and the only logs are the badger ones so I believe this is solved, even if it is a hack.

I'd still like @AnomalRoil to take a look and suggest alternative approaches.

Marking @Kubuxu as reviewer since he's touched a lot of this stuff previously.

@rvagg rvagg marked this pull request as ready for review January 21, 2025 10:53
@rvagg rvagg requested a review from Kubuxu January 21, 2025 10:53
@rvagg rvagg added the skip/changelog This change does not require CHANGELOG.md update label Jan 21, 2025
Projects
Status: 🔎 Awaiting review
Development

Successfully merging this pull request may close these issues.

Lotus daemon --import-chain errors out
2 participants