Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(damgr): failover altda->ethda should keep finalizing l2 chain #23

Open
wants to merge 4 commits into
base: eigenda-develop
Choose a base branch
from

Conversation

samlaf
Copy link
Collaborator

@samlaf samlaf commented Jan 29, 2025

Description

When altda->ethda failover happens, the op-node's L2 finalized head completely stalls.

This PR consists of 2 commits:

  1. test to show the behavior (fails on this commit)
  2. fix damgr so that finalizedHead can keep advancing even after failover

Tests

First commit contains test that shows the buggy behavior. To run:

git checkout <COMMIT>
go test -run ^TestAltDA_FinalizationAfterEthDAFailover$ github.com/layr-labs/optimism/op-e2e/actions/altda

Fix

The problem with the damgr as it is is that

d.finalizedHead = d.state.lastPrunedCommitment

is always run no matter what. This means that finalization is always driven by altda commitments, but after failover, damgr stops seeing the commitments (because they arent altda commitments anymore). The fix in the second commit is to let the finalization be driven by L1 finalization when there are no altda commitments managed by the damgr.

NOTE: I don't fully understand the subtleties of the damgr and derivation pipeline interactions, so might be doing something wrong here... but seems sound to me, especially since it passes the test.

Additional Context
The test is a bit ugly currently (see TODO comment below). Someone with better understanding of op-node and event processing order would surely be able to help me clean this up. I use this patch for debugging event processing, in case this is of use to anyone:

diff --git a/op-e2e/actions/altda/altda_test.go b/op-e2e/actions/altda/altda_test.go
index 9d60f2fee..c6ef361d7 100644
--- a/op-e2e/actions/altda/altda_test.go
+++ b/op-e2e/actions/altda/altda_test.go
@@ -194,7 +194,7 @@ func (a *L2AltDA) ActNewL2Tx(t helpers.Testing) {
 //
 // 17 makes sense because challengeWindow=16 and we create 1 extra block before that,
 // and 204 L2blocks = 17 L1blocks * 12 L2blocks/L1block (L1blocktime=12s, L2blocktime=1s)
-func (a *L2AltDA) ActNewL2TxFinalized(t helpers.Testing) {
+func (a *L2AltDA) ActNewL2TxFinalized(t helpers.Testing, logEvents ...bool) {
 	// Include a new l2 batcher transaction, submitting an input commitment to the l1.
 	a.ActNewL2Tx(t)
 	// Create ChallengeWindow empty blocks so the above batcher blocks can finalize (can't be challenged anymore)
@@ -203,7 +203,20 @@ func (a *L2AltDA) ActNewL2TxFinalized(t helpers.Testing) {
 	// TODO: understand why we need to drain the pipeline before AND after actL1Finalized
 	a.sequencer.ActL2PipelineFull(t)
 	a.ActL1Finalized(t)
-	a.sequencer.ActL2PipelineFull(t)
+	if logEvents == nil {
+		a.sequencer.ActL2PipelineFull(t)
+	} else {
+		// Log all events until the end of the pipeline
+		count := 0
+		a.sequencer.ActL2EventsUntil(t, func(ev event.Event) bool {
+			count++
+			a.log.Info("new event", "event", ev, "count", count)
+			if count == 100 {
+				return true
+			}
+			return false
+		}, 100, false)
+	}
 
 	// Uncomment the below code to observe the behavior described in the TODO above
 	// syncStatus := a.sequencer.SyncStatus()
@@ -680,5 +693,12 @@ func TestAltDA_FinalizationAfterEthDAFailover(gt *testing.T) {
 		ssAfter := harness.sequencer.SyncStatus()
 		// Even after failover, the finalized head should continue advancing normally
 		require.Equal(t, ssBefore.FinalizedL2.Number+diffL2Blocks, ssAfter.FinalizedL2.Number)
+
+		harness.log.Info("Sync status before",
+			"unsafeL1", ssBefore.HeadL1.Number, "safeL1", ssBefore.SafeL1.Number, "finalizedL1", ssBefore.FinalizedL1.Number,
+			"unsafeL2", ssBefore.UnsafeL2.Number, "safeL2", ssBefore.SafeL2.Number, "finalizedL2", ssBefore.FinalizedL2.Number)
+		harness.log.Info("Sync status after",
+			"unsafeL1", ssAfter.HeadL1.Number, "safeL1", ssAfter.SafeL1.Number, "finalizedL1", ssAfter.FinalizedL1.Number,
+			"unsafeL2", ssAfter.UnsafeL2.Number, "safeL2", ssAfter.SafeL2.Number, "finalizedL2", ssAfter.FinalizedL2.Number)
 	}
 }

… after failover to ethda

Currently it does not, as shown by the test TestAltDA_FinalizationAfterEthDAFailover failing
Weiwei from Polymer found this bug. He proposed a solution. This is an alternative solution which seems simpler, but not 100% of its soundness.
@samlaf
Copy link
Collaborator Author

samlaf commented Jan 29, 2025

Tests are currently failing, but waiting for discussions in ethereum-optimism#14050 to get resolved. That PR will fix these tests and then we can just rebase on top of them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant