Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

set atexit in mockupcamera #1169

Draft
wants to merge 1 commit into
base: develop
Choose a base branch
from
Draft

Conversation

walesch-yan
Copy link
Contributor

@walesch-yan walesch-yan commented Mar 3, 2025

There was an issue regarding the correct shutdown of video-streamer processes in the MXCuBE-Web repository (see mxcube/mxcubeweb#1577) . A patch was introduced in the latest version of video-streamer that handles a signal catching issue (see mxcube/video-streamer#34).

Until now, MXCuBE maintains a registry of spawned child processes in /tmp/mxcube.pid. However, the mockupcamera was never registered, which caused another issue for correct process termination. Additionally the temp file solution is violating ruff rule S108.

To solve this instead of registering the pid of the video-streamer process in a separate file, which will later be read by the server, this PR makes use of the atexit.register functionality directly.
Additionally, I moved the dependency of the video-streamer directly to mxcubecore, since it is actually not directly used in the web but in the camera components.

@walesch-yan
Copy link
Contributor Author

Until now, MXCuBE maintains a registry of spawned child processes in /tmp/mxcube.pid. However, the mockupcamera was never registered, which caused another issue for correct process termination. Additionally the temp file solution is violating ruff rule S108.

As of my knowledge, the only pids that are registered to this file are the video-streamer process from TangoLimaMpegVideo and the server in MXCuBE-Web. As such I would also advocate in completely removing this and use atexit manually in these locations or using another protocol for registrations. (A redis server for example, however this would need a more careful introduction).

@walesch-yan walesch-yan force-pushed the yw-fix-mockup-shutdown branch from 37aca15 to 3788388 Compare March 3, 2025 12:45
Comment on lines -100 to +108
with open("/tmp/mxcube.pid", "a") as f:
f.write("%s " % self._video_stream_process.pid)
atexit.register(self.clean_up)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I now also modified the registration for the video-streamer in tangoLimaMPEGVideo

@fabcor-maxiv
Copy link
Contributor

fabcor-maxiv commented Mar 4, 2025

Looks like I still have some video-streamer and ffmpeg processes left behind with this branch of mxcubecore and mxcube_video_streamer==1.8.1.

I do see the following message that did not seem to exist before though:

2025-03-04 08:34:20,200 |HWR |INFO | Shutting down video_stream...

@walesch-yan
Copy link
Contributor Author

Looks like I still have some video-streamer and ffmpeg processes left behind with this branch of mxcubecore and mxcube_video_streamer==1.8.1.

I do see the following message that did not seem to exist before though:

2025-03-04 08:34:20,200 |HWR |INFO | Shutting down video_stream...

Yes, I added the message to make sure that the shutdown process is called.
Could it be a process that is still running from the previous version? I.e. when you run and exit the server after a few seconds, are there new processes that are left behind or the same ones?

I do see that my changes have one issue: if the shutdown is not called when exiting the code, the processes will not be terminated (this is similar as it was before), however different than before it will not clean-up after processes that have been run on previous server runs and not terminated correctly, as there is no information left of old processes/there is no file anymore that keeps track.
This is especially a problem when the server is crashing.
I agree that I could revert back to the old tempfile functionality as it is more practical, but we probably should find another solution security-wise and ruff-wise this one is not sustainable

@walesch-yan walesch-yan marked this pull request as draft March 4, 2025 08:14
@fabcor-maxiv
Copy link
Contributor

fabcor-maxiv commented Mar 4, 2025

Could it be a process that is still running from the previous version? I.e. when you run and exit the server after a few seconds, are there new processes that are left behind or the same ones?

I checked again now:

  • No ffmpeg or video-streamer process running
  • Start MXCuBE-Web
  • There are ffmpeg and video-streamer processes
  • Kill MXCuBE-Web with CTRL+C
  • Message: "HWR |INFO | Shutting down video_stream..."
  • But still there are ffmpeg and video-streamer processes running
  • Run pkill ffmpeg
  • Message: "Received SIGTERM, shutting down gracefully..."
  • ffmpeg and video-streamer processes are gone

I agree that I could revert back to the old tempfile functionality as it is more practical, but we probably should find another solution security-wise and ruff-wise this one is not sustainable

The atexit solution seems cleaner, I would prefer it, obviously. But if really we do not manage to make it work at all, then we can choose to ignore Ruff's rule.

Is there a way to check if the os.kill succeeded?

We could also try to send SIGKILL instead of SIGTERM.

@fabcor-maxiv
Copy link
Contributor

Sending SIGKILL seems to do the trick for me:

diff --git i/mxcubecore/HardwareObjects/mockup/MDCameraMockup.py w/mxcubecore/HardwareObjects/mockup/MDCameraMockup.py
index 853e40828..7f45a1311 100644
--- i/mxcubecore/HardwareObjects/mockup/MDCameraMockup.py
+++ w/mxcubecore/HardwareObjects/mockup/MDCameraMockup.py
@@ -120,7 +120,7 @@ class MDCameraMockup(BaseHardwareObjects.HardwareObject):
 
     def clean_up(self):
         logging.getLogger("HWR").info("Shutting down video_stream...")
-        os.kill(self._video_stream_process.pid, signal.SIGTERM)
+        os.kill(self._video_stream_process.pid, signal.SIGKILL)
 
     def start_video_stream_process(self) -> None:
         if (

I am not saying that is what we should do. In principle that should not be necessary. Sending SIGTERM should be good enough. I guess it needs more investigating...

@walesch-yan
Copy link
Contributor Author

Thanks for testing it @fabcor-maxiv :)

You are right, it seems that I incorrectly tested SIGTERM handling yesterday. I also tried with SIGINT, this seems to work as well as SIGKILL. I will investigate this and try to make it work for SIGTERM as well.

Is there a way to check if the os.kill succeeded?

One could wait a few seconds, then use psutil library, otherwise one could do something like

    try:
        os.kill(pid, 0) 
        print("process was still running")
    except OSError:
        print("no process detected")

since os.kill will raise an error if there is no process with the given pid and do nothing else. Also a good way to clean up simultaniouly during testing

@walesch-yan
Copy link
Contributor Author

I am not saying that is what we should do. In principle that should not be necessary. Sending SIGTERM should be good enough. I guess it needs more investigating...

I agree, however currently SIGKILL was actually always send, I thought having SIGTERM with graceful shutdown would be a better/more elegant solution.

I will spend a little more time investigating the issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants