Create Live Snapshot without shutting down node #2816

Open: wants to merge 3 commits into base: master
48 changes: 48 additions & 0 deletions cmd/nitro/nitro.go
@@ -16,6 +16,7 @@ import (
	"path/filepath"
	"reflect"
	"strings"
	"sync"
	"syscall"
	"time"

@@ -36,6 +37,7 @@ import (
	_ "github.com/ethereum/go-ethereum/eth/tracers/js"
	_ "github.com/ethereum/go-ethereum/eth/tracers/native"
	"github.com/ethereum/go-ethereum/ethclient"
	"github.com/ethereum/go-ethereum/ethdb"
	"github.com/ethereum/go-ethereum/graphql"
	"github.com/ethereum/go-ethereum/log"
	"github.com/ethereum/go-ethereum/metrics"
@@ -641,6 +643,11 @@ func mainImpl() int {
		deferFuncs = []func(){func() { currentNode.StopAndWait() }}
	}

	// Live db snapshot creation is only supported on archive nodes
	if nodeConfig.Execution.Caching.Archive {
		go liveDBSnapshotter(ctx, chainDb, arbDb, execNode.ExecEngine.CreateBlocksMutex(), func() string { return liveNodeConfig.Get().SnapshotDir })
Contributor:

I think that:

  • we should use the stopwaiter pattern for liveDBSnapshotter
  • it might be nice to have a config option to disable (not start) the snapshotter, e.g. when running a sequencer, to be extra safe
  • we should also be able to support full (non-archive) nodes; I describe this in more detail in my comment on liveDBSnapshotter
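A minimal sketch of the lifecycle the first bullet asks for. It does not use nitro's actual stopwaiter package, whose API is not shown in this PR; the hypothetical `worker` type only mimics the idea: the snapshotter goroutine gets its own cancellable context, and shutdown cancels it and waits for the loop to return instead of leaving it tied only to the process-wide `ctx`.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// worker is a hypothetical stand-in for a stopwaiter-style helper:
// it owns a cancellable context and waits for its goroutine on stop.
type worker struct {
	cancel context.CancelFunc
	wg     sync.WaitGroup
}

// Start launches loop in a goroutine bound to a child context of parent.
func (w *worker) Start(parent context.Context, loop func(ctx context.Context)) {
	ctx, cancel := context.WithCancel(parent)
	w.cancel = cancel
	w.wg.Add(1)
	go func() {
		defer w.wg.Done()
		loop(ctx)
	}()
}

// StopAndWait cancels the loop's context and blocks until the loop returns.
func (w *worker) StopAndWait() {
	if w.cancel != nil {
		w.cancel()
	}
	w.wg.Wait()
}

// runDemo starts a loop that exits on cancellation and reports whether it did.
func runDemo() bool {
	var w worker
	exited := false
	w.Start(context.Background(), func(ctx context.Context) {
		// A real snapshotter would also select on its trigger channel here.
		<-ctx.Done()
		exited = true
	})
	w.StopAndWait() // wg.Wait makes the write to exited visible here
	return exited
}

func main() {
	fmt.Println("loop exited:", runDemo())
}
```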

Member:

We want a config option to enable the snapshotter, off by default.
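A sketch of an opt-in switch that is off by default, assuming a hypothetical `live-db-snapshotter.enable` option. Nitro registers options through its own koanf/pflag helpers, so the standard library `flag` package here is only a stand-in. The start condition keeps the archive-only restriction the PR currently enforces.

```go
package main

import (
	"flag"
	"fmt"
)

// LiveSnapshotConfig is hypothetical; the option name below is
// illustrative and is not an existing nitro flag.
type LiveSnapshotConfig struct {
	Enable bool
}

// addOptions registers the option with a default of false (off).
func addOptions(f *flag.FlagSet, cfg *LiveSnapshotConfig) {
	f.BoolVar(&cfg.Enable, "live-db-snapshotter.enable", false,
		"enable live database snapshots triggered by SIGUSR2")
}

// shouldStartSnapshotter requires the explicit opt-in in addition to
// the archive-only restriction from the diff.
func shouldStartSnapshotter(cfg LiveSnapshotConfig, isArchive bool) bool {
	return cfg.Enable && isArchive
}

// parse builds a config from command-line style arguments.
func parse(args []string) LiveSnapshotConfig {
	var cfg LiveSnapshotConfig
	f := flag.NewFlagSet("nitro", flag.ContinueOnError)
	addOptions(f, &cfg)
	if err := f.Parse(args); err != nil {
		panic(err)
	}
	return cfg
}

func main() {
	fmt.Println(shouldStartSnapshotter(parse(nil), true))
	fmt.Println(shouldStartSnapshotter(parse([]string{"-live-db-snapshotter.enable"}), true))
}
```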

	}

	sigint := make(chan os.Signal, 1)
	signal.Notify(sigint, os.Interrupt, syscall.SIGTERM)

@@ -674,6 +681,43 @@ func mainImpl() int {
	return 0
}

func liveDBSnapshotter(ctx context.Context, chainDb, arbDb ethdb.Database, createBlocksMutex *sync.Mutex, snapshotDirGetter func() string) {
	sigusr2 := make(chan os.Signal, 1)
	signal.Notify(sigusr2, syscall.SIGUSR2)

	for {
		select {
		case <-ctx.Done():
			return
		case <-sigusr2:
			log.Info("Live databases snapshot creation triggered by SIGUSR2")
		}

		snapshotDir := snapshotDirGetter()
		if snapshotDir == "" {
			log.Error("Aborting live databases snapshot creation as destination directory is empty, try updating --snapshot-dir in the config file")
			continue
		}

		createBlocksMutex.Lock()
Contributor @magicxyyz, Jan 8, 2025:

I think that if we rearrange the order of things a bit and patch geth slightly, we could also support non-archive nodes (which I believe would be the main use case for DB snapshotting).

  1. Instead of triggering the snapshot here, we could schedule a snapshot to be taken after the next block is created, e.g. by calling execNode.ScheduleDBSnapshot. That way we wouldn't need access to createBlocksMutex or any other internals of ExecutionNode (which will probably be especially important for the execution split).
  2. In ExecutionEngine.appendBlock, if a snapshot was scheduled, we could trigger it right after s.bc.WriteBlockAndSetHeadWithTime.

To support full (non-archive) nodes, we need to make sure that the state for the block written with WriteBlockAndSetHeadWithTime is committed to disk, which means force-committing the state. That could be done, e.g., with the ForceTriedbCommitHook that I added in the snap sync draft: https://github.com/OffchainLabs/go-ethereum/pull/280/files#diff-53d5f4b8a536ec2a8c8c92bf70b8268f1d77ad77e9f316e6f68a2bcae5303215
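A simplified model of how a force-commit hook could override a full node's normal flushing. `forceCommitHook` is a hypothetical stand-in for the ForceTriedbCommitHook in the linked draft, and the every-N-blocks rule is an assumption made for brevity; the real logic that decides when a non-archive node persists state is more involved.

```go
package main

import "fmt"

// forceCommitHook is a hypothetical stand-in for ForceTriedbCommitHook;
// nil means "no override".
var forceCommitHook func() bool

// shouldCommitState models a node that normally persists state only
// every commitInterval blocks, unless the hook forces a commit.
func shouldCommitState(blockNumber, commitInterval uint64) bool {
	if forceCommitHook != nil && forceCommitHook() {
		return true
	}
	return commitInterval > 0 && blockNumber%commitInterval == 0
}

func main() {
	fmt.Println(shouldCommitState(7, 128)) // not on the interval, no hook
	forceCommitHook = func() bool { return true }
	fmt.Println(shouldCommitState(7, 128)) // forced by the hook
}
```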

The hook would be set to a function, created in the gethexec scope, that has access to the ExecutionEngine; something like:

hook := func() bool {
    return execEngine.shouldForceCommitState()
}

func (e *ExecutionEngine) shouldForceCommitState() bool {
    return e.forceCommitState
}

func (e *ExecutionEngine) ScheduleDBSnapshot() {
    e.dbSnapshotScheduled.Store(true)
}

func (e *ExecutionEngine) appendBlock() error {
...
    // Swap consumes the request, so each ScheduleDBSnapshot call yields one snapshot
    snapshotScheduled := e.dbSnapshotScheduled.Swap(false)
    if snapshotScheduled {
        e.forceCommitState = true
    }
    status, err := e.bc.WriteBlockAndSetHeadWithTime(...)
    // reset before the error check so a failed write doesn't leave it set
    e.forceCommitState = false
    if err != nil {
        return err
    }
    ...
    if snapshotScheduled {
        if err := chainDb.CreateDBSnapshot(snapshotDir); err != nil {
            log.Error("Snapshot creation for l2chaindata, ancient and wasm databases failed", "err", err)
        }
    }
...
}
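A runnable toy model of the proposal above, assuming block production is single-threaded inside appendBlock. The schedule flag is an `atomic.Bool` (Go 1.19+) so an RPC handler can set it from another goroutine, while the force-commit flag only lives for the duration of one block write. The `engine` type and its fields are hypothetical; `writeBlock` stands in for WriteBlockAndSetHeadWithTime and the `snapshots` slice for chainDb.CreateDBSnapshot.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// engine is a toy model; none of these fields exist in nitro's ExecutionEngine.
type engine struct {
	dbSnapshotScheduled atomic.Bool
	forceCommitState    bool
	committed           []uint64 // blocks whose state was force-committed
	snapshots           []uint64 // blocks after which a snapshot was taken
}

// ScheduleDBSnapshot may be called from any goroutine (e.g. an RPC handler).
func (e *engine) ScheduleDBSnapshot() { e.dbSnapshotScheduled.Store(true) }

// shouldForceCommitState is what the commit hook would consult.
func (e *engine) shouldForceCommitState() bool { return e.forceCommitState }

// writeBlock stands in for WriteBlockAndSetHeadWithTime.
func (e *engine) writeBlock(n uint64) error {
	if e.shouldForceCommitState() {
		e.committed = append(e.committed, n)
	}
	return nil
}

func (e *engine) appendBlock(n uint64) error {
	// Swap consumes the request so each schedule yields one snapshot.
	scheduled := e.dbSnapshotScheduled.Swap(false)
	if scheduled {
		e.forceCommitState = true
	}
	err := e.writeBlock(n)
	e.forceCommitState = false
	if err != nil {
		return err
	}
	if scheduled {
		// The real code would call chainDb.CreateDBSnapshot(snapshotDir) here.
		e.snapshots = append(e.snapshots, n)
	}
	return nil
}

func main() {
	var e engine
	_ = e.appendBlock(1)
	e.ScheduleDBSnapshot()
	_ = e.appendBlock(2)
	_ = e.appendBlock(3)
	fmt.Println(e.committed, e.snapshots)
}
```

Because the snapshot is taken after the block write returns and before the next appendBlock call, no extra mutex is needed in this model; that property depends on the single-threaded assumption above.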

Setting the hook can be done similarly to the SnapHelper draft PR: https://github.com/OffchainLabs/nitro/pull/2122/files#diff-19d6494fe5ff01c95bfdd1e4af6d31d75207d21743af80f57f0cf93848a32e3e

Having written that, I am no longer sure it's as straightforward as I thought when I started this comment 😓, but it should be doable :)

Contributor:

If we want to go that way, I can split a simplified ForceTriedbCommitHook out of my draft PRs so it can be merged earlier and used here.

		log.Info("Beginning snapshot creation for l2chaindata, ancient and wasm databases")
		err := chainDb.CreateDBSnapshot(snapshotDir)
		createBlocksMutex.Unlock()
		if err != nil {
			log.Error("Snapshot creation for l2chaindata, ancient and wasm databases failed", "err", err)
			continue
		}
		log.Info("Live snapshot of l2chaindata, ancient and wasm databases were successfully created")

		log.Info("Beginning snapshot creation for arbitrumdata database")
		if err := arbDb.CreateDBSnapshot(snapshotDir); err != nil {
			log.Error("Snapshot creation for arbitrumdata database failed", "err", err)
		} else {
			log.Info("Live snapshot of arbitrumdata database was successfully created")
		}
	}
}
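The control flow of liveDBSnapshotter can be exercised without real signals or databases. In this sketch the SIGUSR2 channel is replaced by a plain trigger channel, the databases by a `snapshot` callback, and a hypothetical `results` channel reports each outcome; as in the diff, the block-creation mutex is held only around the snapshot call.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// snapshotLoop mirrors liveDBSnapshotter with test-friendly inputs.
func snapshotLoop(ctx context.Context, trigger <-chan struct{}, mu *sync.Mutex,
	dirGetter func() string, snapshot func(dir string) error, results chan<- error) {
	for {
		select {
		case <-ctx.Done():
			return
		case <-trigger:
		}
		dir := dirGetter() // read on every trigger so hot reloads apply
		if dir == "" {
			results <- errors.New("snapshot dir is empty")
			continue
		}
		mu.Lock() // pauses block creation while the snapshot is taken
		err := snapshot(dir)
		mu.Unlock()
		results <- err
	}
}

// triggerOnce starts the loop, fires one trigger, and returns its outcome.
func triggerOnce(dir string, snapshot func(string) error) error {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel() // stops the loop goroutine on return
	var mu sync.Mutex
	trigger := make(chan struct{})
	results := make(chan error)
	go snapshotLoop(ctx, trigger, &mu, func() string { return dir }, snapshot, results)
	trigger <- struct{}{}
	return <-results
}

func main() {
	var taken []string
	err := triggerOnce("/tmp/snap", func(dir string) error {
		taken = append(taken, dir)
		return nil
	})
	fmt.Println(err, taken)
	fmt.Println(triggerOnce("", func(string) error { return nil }))
}
```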

type NodeConfig struct {
	Conf genericconf.ConfConfig `koanf:"conf" reload:"hot"`
	Node arbnode.Config `koanf:"node" reload:"hot"`
@@ -697,6 +741,7 @@ type NodeConfig struct {
	Init conf.InitConfig `koanf:"init"`
	Rpc genericconf.RpcConfig `koanf:"rpc"`
	BlocksReExecutor blocksreexecutor.Config `koanf:"blocks-reexecutor"`
	SnapshotDir string `koanf:"snapshot-dir" reload:"hot"`
}

var NodeConfigDefault = NodeConfig{
@@ -722,6 +767,7 @@ var NodeConfigDefault = NodeConfig{
	PProf: false,
	PprofCfg: genericconf.PProfDefault,
	BlocksReExecutor: blocksreexecutor.DefaultConfig,
	SnapshotDir: "",
}

func NodeConfigAddOptions(f *flag.FlagSet) {
@@ -748,6 +794,8 @@ func NodeConfigAddOptions(f *flag.FlagSet) {
	conf.InitConfigAddOptions("init", f)
	genericconf.RpcConfigAddOptions("rpc", f)
	blocksreexecutor.ConfigAddOptions("blocks-reexecutor", f)

	f.String("snapshot-dir", NodeConfigDefault.SnapshotDir, "directory in which snapshot of databases would be stored")
}
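SnapshotDir is tagged `reload:"hot"`, and mainImpl passes liveDBSnapshotter a getter closure rather than a string, so each trigger reads the current value. A minimal sketch of that pattern, assuming a hypothetical `liveConfig` wrapper built on `atomic.Pointer` (Go 1.19+); it is not nitro's actual live-config implementation.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// NodeConfig keeps only the field relevant here.
type NodeConfig struct {
	SnapshotDir string
}

// liveConfig is hypothetical: readers call Get, and a reload swaps
// the whole config atomically.
type liveConfig struct {
	p atomic.Pointer[NodeConfig]
}

func (l *liveConfig) Get() *NodeConfig { return l.p.Load() }

func (l *liveConfig) Set(c *NodeConfig) { l.p.Store(c) }

func main() {
	var live liveConfig
	live.Set(&NodeConfig{SnapshotDir: ""})
	// Same shape as the getter passed to liveDBSnapshotter in the diff.
	getter := func() string { return live.Get().SnapshotDir }
	fmt.Printf("%q\n", getter())
	live.Set(&NodeConfig{SnapshotDir: "/data/snapshots"}) // simulated hot reload
	fmt.Printf("%q\n", getter())
}
```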

func (c *NodeConfig) ResolveDirectoryNames() error {
4 changes: 4 additions & 0 deletions cmd/replay/db.go
@@ -18,6 +18,10 @@ import (

type PreimageDb struct{}

func (db PreimageDb) CreateDBSnapshot(dir string) error {
	return errors.New("createDBSnapshot method is not supported by PreimageDb")
}

func (db PreimageDb) Has(key []byte) (bool, error) {
	if len(key) != 32 {
		return false, nil
13 changes: 11 additions & 2 deletions execution/gethexec/api.go
@@ -11,6 +11,7 @@ import (
 	"math/big"
 	"sync"
 	"sync/atomic"
+	"syscall"
 	"time"
 
 	"github.com/ethereum/go-ethereum/arbitrum"
@@ -40,10 +41,18 @@ type ArbDebugAPI struct {
 	blockchain *core.BlockChain
 	blockRangeBound uint64
 	timeoutQueueBound uint64
+	isArchiveNode bool
 }
 
-func NewArbDebugAPI(blockchain *core.BlockChain, blockRangeBound uint64, timeoutQueueBound uint64) *ArbDebugAPI {
-	return &ArbDebugAPI{blockchain, blockRangeBound, timeoutQueueBound}
+func NewArbDebugAPI(blockchain *core.BlockChain, blockRangeBound uint64, timeoutQueueBound uint64, isArchiveNode bool) *ArbDebugAPI {
+	return &ArbDebugAPI{blockchain, blockRangeBound, timeoutQueueBound, isArchiveNode}
 }
 
+func (api *ArbDebugAPI) CreateDBSnapshot(ctx context.Context) error {
+	if !api.isArchiveNode {
+		return errors.New("live database snapshot creation is not available for non-archive nodes")
+	}
+	return syscall.Kill(syscall.Getpid(), syscall.SIGUSR2)
+}
+
 type PricingModelHistory struct {
4 changes: 4 additions & 0 deletions execution/gethexec/executionengine.go
@@ -112,6 +112,10 @@ func NewExecutionEngine(bc *core.BlockChain) (*ExecutionEngine, error) {
	}, nil
}

func (s *ExecutionEngine) CreateBlocksMutex() *sync.Mutex {
	return &s.createBlocksMutex
}

func (s *ExecutionEngine) backlogCallDataUnits() uint64 {
	s.cachedL1PriceData.mutex.RLock()
	defer s.cachedL1PriceData.mutex.RUnlock()
1 change: 1 addition & 0 deletions execution/gethexec/node.go
@@ -277,6 +277,7 @@ func CreateExecutionNode(
			l2BlockChain,
			config.RPC.ArbDebug.BlockRangeBound,
			config.RPC.ArbDebug.TimeoutQueueBound,
			config.Caching.Archive,
		),
		Public: false,
	})