Persistent hanging jobs with Stochastic Tools + Griffin, on Sawtooth #27322
-
Hi all, I've recently been using Stochastic Tools in conjunction with Griffin for sensitivity and uncertainty quantification, and to attempt to develop a surrogate, for a nuclear thermal propulsion project. I've had some success, but on Sawtooth I've consistently encountered issues during simulation initialization, during solves, and at solve completion. Jobs hang frequently, and these are relatively intensive (60-node) jobs. If there is any advice out there, I'd appreciate it. Here are the modules I've been using on Sawtooth, loading
Here are a couple of examples of the hanging jobs. Thanks in advance to anyone who can provide some guidance. Jackson
-
The last one includes an error in Stochastic Tools, but some ranks were still hanging. Have you seen anything like this, @loganharbour, @milljm?
-
This does not work with the current TIMPI algorithm. You need to pass the timpi-sync command-line parameter, or avoid MVAPICH (use OpenMPI or MPICH) when compiling.
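As a rough sketch of the first option, the launch line might look like the one built below. The executable name `griffin-opt`, the input file `stochastic_main.i`, and the rank count are placeholders, not taken from this thread. The `--timpi-sync` spelling and the `sendreceive` value are my assumptions about how the option is exposed on the MOOSE command line, so check your application's `--help` output before relying on them.

```shell
# Placeholder names: substitute your own executable, input file, and rank count.
APP=griffin-opt
INPUT=stochastic_main.i
NRANKS=4

# Assumption: --timpi-sync selects the sparse-communication algorithm in TIMPI,
# and "sendreceive" is an alternative to the default algorithm.
SYNC=sendreceive

# Build the command as a string so it can be inspected before launching.
CMD="mpiexec -n $NRANKS $APP -i $INPUT --timpi-sync $SYNC"
echo "$CMD"

# Uncomment to actually launch the job:
# $CMD
```

Printing the command first makes it easy to confirm in the job log which sync option a given run used.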