Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

uevent: should respawn the uevent monitor #76

Merged

Conversation

Vicente-Cheng
Copy link
Collaborator

Problem:
The udev monitor will not work with any errors.

Solution:
We monitor the error and try to respawn it.

Related Issue:
harvester/harvester#4925

Test plan:
Because the original error is hard to reproduce, I added another inject error here.

  1. set the parameter inject-udev-monitor-error as true.
  2. try to add a disk and provision it should work

@Vicente-Cheng Vicente-Cheng force-pushed the resapwn-uevent-monitor branch from 3126b4e to f70cfc9 Compare January 24, 2024 03:22
@Vicente-Cheng Vicente-Cheng changed the title uevent: should resapwn the uevent monitor uevent: should respawn the uevent monitor Jan 24, 2024
Copy link
Contributor

@tserong tserong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found two small typos, other than that it looks fine (i.e. logic seems sensible).

I was going to say that the thing I don't understand is why it terminates in the first place if there's an error (our code has an infinite loop, right?) but then I went looking at

default:
_, buf, err := c.msgPeek()
if err != nil {
errs <- fmt.Errorf("Unable to check available uevent, err: %w", err)
break loop // stop iteration in case of error
}
bufToRead <- buf
}
and discovered the goroutine in Monitor will just stop if there's any kind of error.

pkg/udev/uevent.go Outdated Show resolved Hide resolved
pkg/udev/uevent.go Outdated Show resolved Hide resolved
@Vicente-Cheng
Copy link
Collaborator Author

Hi @tserong, thanks for the review!

I was going to say that the thing I don't understand is why it terminates in the first place if there's an error (our code has an infinite loop, right?) but then I went looking at ...

As you read the go-udev behavior, they will break loop with any error.
That means even if we do not stop the goroutine, we can not receive any udev event.

So that is why we need to respawn it (means creating a new monitor for incoming udev event)

    - NDM could not work w/o uevent monitor
    - add `inject-udev-monitor-error` for testing
    - Inject udev monitor error with CI

Signed-off-by: Vicente Cheng <[email protected]>
@Vicente-Cheng Vicente-Cheng force-pushed the resapwn-uevent-monitor branch from f70cfc9 to 3fb8b9f Compare January 24, 2024 06:50
@Vicente-Cheng Vicente-Cheng merged commit 3641087 into harvester:master Jan 24, 2024
8 checks passed
@Vicente-Cheng
Copy link
Collaborator Author

@Mergifyio backport v0.6.x

Copy link

mergify bot commented Jan 24, 2024

backport v0.6.x

✅ Backports have been created

@Vicente-Cheng
Copy link
Collaborator Author

@Mergifyio backport v0.5.x

Copy link

mergify bot commented Jan 24, 2024

backport v0.5.x

✅ Backports have been created

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants