Create Live Snapshot without shutting down node #2816

Open: wants to merge 3 commits into master

@ganeshvanahalli (Contributor) commented Dec 3, 2024

This PR enables creating a live snapshot of the databases (arbitrumdata, l2chaindata, ancient, and wasm) without having to shut down the node.

Supply the destination directory for the databases with the reloadable config option --snapshot-dir=<pathToDestDir>, then trigger snapshot generation by invoking the arbdebug RPC method arbdebug_createDBSnapshot.

Note: this feature is only available for archive nodes running on Pebble databases.

Sample usage:

start node => ./target/bin/nitro --conf.file=<pathToConfigWithSnapshotDir>
trigger snapshot creation => curl -X POST 127.0.0.1:8547 -H "Content-Type: application/json" --data '{"jsonrpc": "2.0","method": "arbdebug_createDBSnapshot","params": [],"id": 1}'
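For scripted use, the same call can be made from Go. A minimal sketch using only the standard library, assuming the node serves HTTP RPC on the endpoint from the curl example above; the helper names snapshotRequestBody and triggerSnapshot are hypothetical and not part of this PR:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// rpcRequest mirrors the JSON-RPC 2.0 envelope from the curl example.
type rpcRequest struct {
	JSONRPC string        `json:"jsonrpc"`
	Method  string        `json:"method"`
	Params  []interface{} `json:"params"`
	ID      int           `json:"id"`
}

// rpcResponse captures only the error member of a JSON-RPC 2.0 response.
type rpcResponse struct {
	Error *struct {
		Code    int    `json:"code"`
		Message string `json:"message"`
	} `json:"error"`
}

// snapshotRequestBody builds the request body for arbdebug_createDBSnapshot,
// which takes no parameters in the sample usage.
func snapshotRequestBody() ([]byte, error) {
	return json.Marshal(rpcRequest{
		JSONRPC: "2.0",
		Method:  "arbdebug_createDBSnapshot",
		Params:  []interface{}{},
		ID:      1,
	})
}

// triggerSnapshot POSTs the request to a node endpoint such as
// "http://127.0.0.1:8547" and reports transport, HTTP, or RPC errors.
func triggerSnapshot(endpoint string) error {
	body, err := snapshotRequestBody()
	if err != nil {
		return err
	}
	resp, err := http.Post(endpoint, "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected HTTP status: %s", resp.Status)
	}
	var out rpcResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return fmt.Errorf("decoding response: %w", err)
	}
	if out.Error != nil {
		return fmt.Errorf("rpc error %d: %s", out.Error.Code, out.Error.Message)
	}
	return nil
}
```

Usage: triggerSnapshot("http://127.0.0.1:8547"). JSON-RPC reports method errors inside an HTTP 200 response, which is why the sketch decodes the error field instead of relying on the status code alone.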

Testing

Triggered snapshot creation and successfully reused the snapshot to run a new node in multiple scenarios: small and large DB sizes, for both arb1 and arb-sepolia nodes.

Pulls in geth PR: OffchainLabs/go-ethereum#380
Resolves NIT-2658

@cla-bot added the s label (automatically added by the CLA bot if the creator of a PR is registered as having signed the CLA) Dec 3, 2024
@@ -641,6 +643,11 @@ func mainImpl() int {
deferFuncs = []func(){func() { currentNode.StopAndWait() }}
}

// Live db snapshot creation is only supported on archive nodes
if nodeConfig.Execution.Caching.Archive {
go liveDBSnapshotter(ctx, chainDb, arbDb, execNode.ExecEngine.CreateBlocksMutex(), func() string { return liveNodeConfig.Get().SnapshotDir })
Contributor:

I think that:

  • we should use the stopwaiter pattern for liveDBSnapshotter
  • it might be nice to have a config option to disable (not start) the snapshotter, e.g. when running a sequencer, to be extra safe
  • we should also be able to support full (non-archive) nodes; I describe this in more detail in the comment on liveDBSnapshotter

Member:

We want a config option to enable the snapshotter, off by default.

continue
}

createBlocksMutex.Lock()
@magicxyyz (Contributor) commented Jan 8, 2025

I think that if we rearrange the order of things a bit and patch geth slightly, we could also support non-archive nodes (which I believe would be the main use case for DB snapshotting).

  1. Instead of triggering the snapshot here, we could schedule a snapshot to be taken after the next block is created. We could call e.g. execNode.ScheduleDBSnapshot, so we wouldn't need access to createBlocksMutex or any other internals of ExecutionNode (which will probably be especially important for the execution split).
  2. In ExecutionEngine.appendBlock, if a snapshot was scheduled, we could trigger it after s.bc.WriteBlockAndSetHeadWithTime.

To support full (non-archive) nodes, we need to make sure that the state for the block written with WriteBlockAndSetHeadWithTime is committed to disk, since a full node keeps recent state in memory and only flushes it periodically. That means force-committing the state, which could be done e.g. with the ForceTriedbCommitHook that I added in the snap sync draft: https://github.com/OffchainLabs/go-ethereum/pull/280/files#diff-53d5f4b8a536ec2a8c8c92bf70b8268f1d77ad77e9f316e6f68a2bcae5303215

The hook would be set to a function created in the gethexec scope with access to ExecutionEngine, something like:

hook := func() bool {
    return execEngine.shouldForceCommitState()
}

func (s *ExecutionEngine) shouldForceCommitState() bool {
    return s.forceCommitState
}

func (s *ExecutionEngine) ScheduleDBSnapshot() {
    s.dbSnapshotScheduled.Store(true)
}

func (s *ExecutionEngine) appendBlock() error {
...
    snapshotScheduled := s.dbSnapshotScheduled.Load()
    // while true, the hook asks geth to commit this block's state to disk
    s.forceCommitState = snapshotScheduled
    status, err := s.bc.WriteBlockAndSetHeadWithTime(...)
    // reset before any early return so later blocks are not force-committed
    s.forceCommitState = false
    if err != nil {
        return err
    }
    ...
    if snapshotScheduled {
        // clear the request so only one snapshot is taken per schedule call
        s.dbSnapshotScheduled.Store(false)
        chainDb.CreateDBSnapshot(snapshotDir)
    }
...
}

Setting the hook can be done similarly to the SnapHelper draft PR: https://github.com/OffchainLabs/nitro/pull/2122/files#diff-19d6494fe5ff01c95bfdd1e4af6d31d75207d21743af80f57f0cf93848a32e3e

Having written that, I am no longer sure it's as straightforward as I thought when I started this comment 😓, but it should be doable :)
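The proposed ordering can be checked with a self-contained toy model. This is a sketch only: fakeChain stands in for geth's BlockChain and the ForceTriedbCommitHook, and engine, writeBlock, and the snapshots slice are hypothetical. No database is touched; writing a block is reduced to recording whether the hook asked for a commit:

```go
package main

import "sync/atomic"

// fakeChain stands in for geth's BlockChain: when writing a block it consults
// the force-commit hook and records whether that block's state was committed.
type fakeChain struct {
	forceCommitHook func() bool
	committed       map[uint64]bool
}

func (c *fakeChain) writeBlock(num uint64) error {
	c.committed[num] = c.forceCommitHook != nil && c.forceCommitHook()
	return nil
}

// engine is a hypothetical, stripped-down ExecutionEngine.
type engine struct {
	bc                  *fakeChain
	forceCommitState    bool
	dbSnapshotScheduled atomic.Bool
	snapshots           []uint64 // block numbers at which snapshots were taken
	head                uint64
}

func newEngine() *engine {
	e := &engine{bc: &fakeChain{committed: map[uint64]bool{}}}
	// the hook closes over the engine, as the gethexec-scoped hook would
	e.bc.forceCommitHook = func() bool { return e.forceCommitState }
	return e
}

func (e *engine) ScheduleDBSnapshot() { e.dbSnapshotScheduled.Store(true) }

func (e *engine) appendBlock() error {
	num := e.head + 1
	scheduled := e.dbSnapshotScheduled.Load()
	e.forceCommitState = scheduled
	err := e.bc.writeBlock(num)
	e.forceCommitState = false
	if err != nil {
		return err
	}
	e.head = num
	if scheduled {
		e.dbSnapshotScheduled.Store(false)
		// block num's state is on disk here, so the snapshot is usable
		e.snapshots = append(e.snapshots, num)
	}
	return nil
}
```

Scheduling before block 2 yields exactly one snapshot, at block 2, and only that block is force-committed; blocks 1 and 3 take the normal path.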

Contributor:

If we want to go that way, I can split a simplified ForceTriedbCommitHook out of my draft PRs so it can be merged earlier and used here.

3 participants