Exit NGF container when leader lease is lost (#1130)
Problem: If the NGF Pod loses its connection to the API server and cannot renew the leader election lease, it stops leading and cannot become the leader again. After it stops leading, it does not report any statuses until it is restarted.

Solution: Update the leader elector Start method to return an error if the leader lease is lost. This will cause the controller-runtime manager to exit, and the NGF container will restart. The new container will then attempt to become the leader again. This aligns with how the controller-runtime library handles losing a leader lease.
kate-osborn authored Oct 12, 2023
1 parent 7820c2b commit 4f40fca
Showing 1 changed file with 14 additions and 2 deletions.
16 changes: 14 additions & 2 deletions internal/mode/static/leader.go
@@ -2,6 +2,7 @@ package static

 import (
 	"context"
+	"errors"
 	"fmt"
 	"time"

@@ -43,10 +44,21 @@ type leaderElectorRunnable struct {
 	le *leaderelection.LeaderElector
 }

-// Start runs the leaderelection.LeaderElector and blocks until the context is canceled or Run returns.
+// Start runs the leaderelection.LeaderElector and blocks until the context is canceled or the leader lease is lost.
+// If the leader lease is lost, Start returns an error, and the controller-runtime manager will exit, causing the Pod
+// to restart. This is necessary otherwise components that need leader election might continue to run after the leader
+// lease was lost.
 func (l *leaderElectorRunnable) Start(ctx context.Context) error {
 	l.le.Run(ctx)
-	return nil
+
+	// Run exits if the context is canceled or the leader lease is lost. We only want to return an error if the
+	// context is not canceled.
+	select {
+	case <-ctx.Done():
+		return nil
+	default:
+		return errors.New("leader election lost")
+	}
 }

 // IsLeader returns if the Pod is the current leader.