fix(damgr): failover altda->ethda should keep finalizing l2 chain #23

samlaf · 2025-01-29T04:11:54Z

Description

When altda->ethda failover happens, the op-node's L2 finalized head completely stalls.

This PR consists of 2 commits:

test to show the behavior (fails on this commit)
fix damgr so that finalizedHead can keep advancing even after failover

Tests

First commit contains test that shows the buggy behavior. To run:

git checkout <COMMIT>
go test -run ^TestAltDA_FinalizationAfterEthDAFailover$ github.com/layr-labs/optimism/op-e2e/actions/altda

Fix

The problem with the damgr as it is is that

d.finalizedHead = d.state.lastPrunedCommitment

is always run no matter what. This means that finalization is always driven by altda commitments, but after failover, damgr stops seeing the commitments (because they arent altda commitments anymore). The fix in the second commit is to let the finalization be driven by L1 finalization when there are no altda commitments managed by the damgr.

NOTE: I don't fully understand the subtleties of the damgr and derivation pipeline interactions, so might be doing something wrong here... but seems sound to me, especially since it passes the test.

Additional Context
The test is a bit ugly currently (see TODO comment below). Someone with better understanding of op-node and event processing order would surely be able to help me clean this up. I use this patch for debugging event processing, in case this is of use to anyone:

diff --git a/op-e2e/actions/altda/altda_test.go b/op-e2e/actions/altda/altda_test.go
index 9d60f2fee..c6ef361d7 100644
--- a/op-e2e/actions/altda/altda_test.go
+++ b/op-e2e/actions/altda/altda_test.go
@@ -194,7 +194,7 @@ func (a *L2AltDA) ActNewL2Tx(t helpers.Testing) {
 //
 // 17 makes sense because challengeWindow=16 and we create 1 extra block before that,
 // and 204 L2blocks = 17 L1blocks * 12 L2blocks/L1block (L1blocktime=12s, L2blocktime=1s)
-func (a *L2AltDA) ActNewL2TxFinalized(t helpers.Testing) {
+func (a *L2AltDA) ActNewL2TxFinalized(t helpers.Testing, logEvents ...bool) {
 	// Include a new l2 batcher transaction, submitting an input commitment to the l1.
 	a.ActNewL2Tx(t)
 	// Create ChallengeWindow empty blocks so the above batcher blocks can finalize (can't be challenged anymore)
@@ -203,7 +203,20 @@ func (a *L2AltDA) ActNewL2TxFinalized(t helpers.Testing) {
 	// TODO: understand why we need to drain the pipeline before AND after actL1Finalized
 	a.sequencer.ActL2PipelineFull(t)
 	a.ActL1Finalized(t)
-	a.sequencer.ActL2PipelineFull(t)
+	if logEvents == nil {
+		a.sequencer.ActL2PipelineFull(t)
+	} else {
+		// Log all events until the end of the pipeline
+		count := 0
+		a.sequencer.ActL2EventsUntil(t, func(ev event.Event) bool {
+			count++
+			a.log.Info("new event", "event", ev, "count", count)
+			if count == 100 {
+				return true
+			}
+			return false
+		}, 100, false)
+	}
 
 	// Uncomment the below code to observe the behavior described in the TODO above
 	// syncStatus := a.sequencer.SyncStatus()
@@ -680,5 +693,12 @@ func TestAltDA_FinalizationAfterEthDAFailover(gt *testing.T) {
 		ssAfter := harness.sequencer.SyncStatus()
 		// Even after failover, the finalized head should continue advancing normally
 		require.Equal(t, ssBefore.FinalizedL2.Number+diffL2Blocks, ssAfter.FinalizedL2.Number)
+
+		harness.log.Info("Sync status before",
+			"unsafeL1", ssBefore.HeadL1.Number, "safeL1", ssBefore.SafeL1.Number, "finalizedL1", ssBefore.FinalizedL1.Number,
+			"unsafeL2", ssBefore.UnsafeL2.Number, "safeL2", ssBefore.SafeL2.Number, "finalizedL2", ssBefore.FinalizedL2.Number)
+		harness.log.Info("Sync status after",
+			"unsafeL1", ssAfter.HeadL1.Number, "safeL1", ssAfter.SafeL1.Number, "finalizedL1", ssAfter.FinalizedL1.Number,
+			"unsafeL2", ssAfter.UnsafeL2.Number, "safeL2", ssAfter.SafeL2.Number, "finalizedL2", ssAfter.FinalizedL2.Number)
 	}
 }

… after failover to ethda Currently it does not, as shown by the test TestAltDA_FinalizationAfterEthDAFailover failing

Weiwei from Polymer found this bug. He proposed a solution. This is an alternative solution which seems simpler, but not 100% of its soundness.

samlaf · 2025-01-29T22:42:30Z

Tests are currently failing, but waiting for discussions in ethereum-optimism#14050 to get resolved. That PR will fix these tests and then we can just rebase on top of them.

samlaf added 4 commits January 28, 2025 23:05

test(altda): add a test to make sure altda node keeps finalizing even…

f28ab20

… after failover to ethda Currently it does not, as shown by the test TestAltDA_FinalizationAfterEthDAFailover failing

fix(damgr): ethda failover finalization stall bug

eb447fc

Weiwei from Polymer found this bug. He proposed a solution. This is an alternative solution which seems simpler, but not 100% of its soundness.

fix: damgr_test doesn't compile

2641566

ci: add op-alt-da tests to ci

a339f0d

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(damgr): failover altda->ethda should keep finalizing l2 chain #23

fix(damgr): failover altda->ethda should keep finalizing l2 chain #23

samlaf commented Jan 29, 2025 •

edited

Loading

samlaf commented Jan 29, 2025

fix(damgr): failover altda->ethda should keep finalizing l2 chain #23

Are you sure you want to change the base?

fix(damgr): failover altda->ethda should keep finalizing l2 chain #23

Conversation

samlaf commented Jan 29, 2025 • edited Loading

samlaf commented Jan 29, 2025

samlaf commented Jan 29, 2025 •

edited

Loading