Segmentation error in Polaris for SOC-QMC calculations #5283
Comments
The "Primitive cell ion 0" message comes from the h5 file.
This looks like a problem with the inputs. If so, you would get a failure for all similar runs, including on a single CPU core or a single GPU.
This is expected output: convertpw4qmc doesn't have ion information, so this warning is expected. It is another reason it would be nice to have pw2qmcpack handle spinors.
Actually
Since it is a one-node run, is the failure reproducible? Could you try running it again?
The error that terminates the job is in this line: Yes, the QMCPACK ERROR is a false alarm. As I said, I get it in the job I ran on Baseline as well, and it does not affect the calculation.
I also ran on multiple nodes, and those jobs failed as well. I don't have the output from that job now, but I can try resubmitting with multiple nodes.
I used convert4qmc
Does it give the same error with the CPU build?
@Hyeondeok-Shin The CPU build gives no error; it executed successfully.
Describe the bug
A DMC-SOC run produces a segmentation error on Polaris and gives the user no feedback about what might be wrong with the input. The QMCPACK output gives no indication of insufficient memory. I tested the same run with reduced meshfactors to see if the job would go through, but none of those jobs succeeded. The same job with more demanding parameters runs successfully on Baseline (OLCF) with no issues.
Here are the last few lines from the output file with meshfactor=1.0, using the debug queue (1 node, 4 MPI ranks, minimal setup) on Polaris:
I have tried reducing the meshfactor of the splines, but it had no effect on the outcome.
The dmc.err file is the following:
The manual and the workshop materials say that the primitive/supercell ERROR lines are expected, because the file converted using convertpw4qmc does not contain the ionic species information. I get the same ERROR lines on Baseline, but they do not affect the calculation.
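Since the expected ERROR lines can bury whatever actually terminates the job, one way to read dmc.err is to filter them out first. The following is only a minimal sketch, not part of QMCPACK. It assumes the benign lines can be recognised by the substring "Primitive cell ion" quoted in this thread; the helper name `unexpected_lines` and the marker tuple are hypothetical and should be adjusted to the real log text.

```python
# Assumption: benign lines contain one of these substrings. The first one is
# taken from the message quoted in this thread; add others as needed.
BENIGN_MARKERS = ("Primitive cell ion",)


def unexpected_lines(lines, benign=BENIGN_MARKERS):
    """Return the non-empty lines that contain none of the benign markers."""
    kept = []
    for line in lines:
        text = line.rstrip("\n")
        if not text.strip():
            continue  # skip blank lines
        if any(marker in text for marker in benign):
            continue  # skip lines already known to be harmless
        kept.append(text)
    return kept
```

To use it, pass an open file, for example `unexpected_lines(open("dmc.err"))`, and print the returned list. Whatever remains is a candidate for the message that actually ends the run.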
To Reproduce
Steps to reproduce the behavior:
System:
soc_baseline.zip
soc_error_polaris.zip