How to use io-watchdog with Torque? #2
Comments
It has been a while since I looked at this project, but a way to use it under Torque might be to run the process to monitor under the
|
Good to know. Is there a way to attach io-watchdog to an already running process by process ID? |
Does io-watchdog depend on LD_PRELOAD, or can either work without the other? |
Not in the current design. |
That point is clear now. Another question: normally I run an MPI program with the command mpirun -n NUM_PROCS PROGRAM. If I want to use io-watchdog to monitor just one process, I would have to launch that process under io-watchdog while the others are started by mpirun as usual. Is that possible? |
Currently if you want to only monitor one task in a parallel job, you might have to write a wrapper script, something like this (untested):
If you save this in Or, we could probably easily extend
I could create a branch with an experimental patch if you are willing to test it. I don't have a Torque system on which to test. |
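A minimal sketch of the kind of wrapper script described above, not the maintainer's original snippet. It assumes the MPI launcher exposes each task's rank in `OMPI_COMM_WORLD_RANK` (Open MPI) or `PMI_RANK` (MPICH-style launchers), and that io-watchdog is invoked as a prefix to the program's command line. The `WATCHDOG_RANK` variable and the function names are hypothetical, introduced only for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper: run exactly one MPI rank under io-watchdog and
# start every other rank unmodified. Untested against a real Torque system.

# Print this task's rank, or an empty string if no known variable is set.
# Assumption: the launcher sets OMPI_COMM_WORLD_RANK or PMI_RANK.
task_rank() {
    echo "${OMPI_COMM_WORLD_RANK:-${PMI_RANK:-}}"
}

# Succeed only for the rank chosen for monitoring (default: rank 0).
# WATCHDOG_RANK is a made-up variable name, not an io-watchdog setting.
should_monitor() {
    [ -n "$(task_rank)" ] && [ "$(task_rank)" = "${WATCHDOG_RANK:-0}" ]
}

# When called with a command line, replace this shell with the program,
# prefixed by io-watchdog only on the selected rank.
if [ "$#" -gt 0 ]; then
    if should_monitor; then
        exec io-watchdog "$@"
    else
        exec "$@"
    fi
fi
```

Saved under a name of your choosing (say `watchdog-wrapper.sh`, made executable), it would be launched as `mpirun -n NUM_PROCS ./watchdog-wrapper.sh PROGRAM`. If neither rank variable is set, no task is monitored, which avoids accidentally wrapping every process.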
If you try |
FYI, I also added an experimental patch to support |
Thanks a lot for so many suggestions. I do appreciate it a lot. Recently, I have been busy with other things. I will let you know if I have other problems. Thanks again. |
Is there a way to use io-watchdog with Torque rather than Slurm?
Also, if only one process goes a long time without writing, will io-watchdog report this as a hang?
Thanks in advance:)