Update actions checkout and setup-python (fixes warning). #277

Merged
fdmalone merged 12 commits into develop from update_actions on Dec 10, 2023

Conversation

fdmalone
Collaborator

@fdmalone fdmalone commented Dec 5, 2023

The MPI test failures were a red herring. The examples were failing because of #278, which sent a signal to the other tests to fail. My guess is that the error was only received on the root process, so the MPI jobs would hang. I've set a timeout on these for now, but there may be a more sensible fix.
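The timeout guard described above can be sketched in Python as follows. This is an illustration only, not the CI configuration from this PR: `run_with_timeout` is a hypothetical helper, and the child commands are stand-ins for an `mpirun` launch.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd; return its exit code, or None if it exceeded timeout_s."""
    try:
        # subprocess.run kills the child and raises TimeoutExpired on timeout.
        result = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None
    return result.returncode


# Stand-in for a hung MPI job: a child process that sleeps far too long.
hung = [sys.executable, "-c", "import time; time.sleep(30)"]
# Stand-in for a job that finishes normally.
quick = [sys.executable, "-c", "pass"]
```

Calling `run_with_timeout(hung, 1)` returns `None` instead of blocking, so a hung job is reported as a failure rather than stalling the whole run.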

@fdmalone
Collaborator Author

fdmalone commented Dec 10, 2023

I found several problems:

  1. openmpi >= 4.1.5 has an issue with bcast; see "Deadlock in mpi_bcast using openmpi 4.1.5 from fortran" (open-mpi/ompi#11478). Applying the fix suggested there to the mpirun launch resolves the deadlocks.
  2. example 02 is flaky: it fails on CI (zero walker weight), but I can't reproduce it reliably locally.
  3. mpich errors on finalize, so we can't replace openmpi with mpich in CI.

@fdmalone fdmalone merged commit e1c4854 into develop Dec 10, 2023
7 checks passed
@fdmalone fdmalone deleted the update_actions branch December 10, 2023 01:23
linusjoonho added a commit that referenced this pull request Dec 10, 2024
1. fix msd-afqmc green's function with gpu 
change `walker_batch.Ga.fill(0.0 + 0.0j)` to
`walker_batch.Ga = xp.zeros_like(walker_batch.Ga)`
since cupy does not have cupy.ndarray.fill
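
The replacement in point 1 can be illustrated with NumPy standing in for the `xp` array module. This is a minimal sketch: the `WalkerBatch` class here is a hypothetical container, not the project's own class.

```python
import numpy as xp  # stand-in; on GPU this would be `import cupy as xp`


class WalkerBatch:
    """Hypothetical minimal container holding only a Green's function array."""

    def __init__(self, nwalkers, nbasis):
        self.Ga = xp.ones((nwalkers, nbasis, nbasis), dtype=xp.complex128)


walker_batch = WalkerBatch(nwalkers=2, nbasis=3)
old_buffer = walker_batch.Ga

# Rebind the attribute to a fresh zero array with the same shape and dtype.
walker_batch.Ga = xp.zeros_like(walker_batch.Ga)
```

Note that rebinding allocates a new array rather than zeroing in place, so any other reference to the old buffer (`old_buffer` here) keeps its previous values.
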

2. fix initial walker of msd trial
from 
```
elif isinstance(trial, ParticleHole):
    initial_walker = numpy.hstack([trial.psi0a, trial.psi0b])
```
to 
```
elif isinstance(trial, ParticleHole):
    initial_walker = numpy.hstack([trial.psi0a, trial.psi0b])
    random_walker = numpy.random.random(initial_walker.shape)
    initial_walker = initial_walker + random_walker
    initial_walker, _ = numpy.linalg.qr(initial_walker)
```
Otherwise this causes issues.
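
A standalone sketch of what the perturb-then-QR step in point 2 produces, assuming small made-up orbital blocks in place of `trial.psi0a` and `trial.psi0b`. The shapes, the seed, and the use of `numpy.random.default_rng` (for reproducibility) are assumptions of this example, not part of the commit.

```python
import numpy

rng = numpy.random.default_rng(0)  # seeded generator, an assumption of this sketch

# Hypothetical stand-ins for trial.psi0a and trial.psi0b.
nbasis, nalpha, nbeta = 6, 2, 2
psi0a = numpy.eye(nbasis)[:, :nalpha]
psi0b = numpy.eye(nbasis)[:, :nbeta]

initial_walker = numpy.hstack([psi0a, psi0b])

# Add a random perturbation, then orthonormalize the columns with QR.
random_walker = rng.random(initial_walker.shape)
perturbed = initial_walker + random_walker
q, _ = numpy.linalg.qr(perturbed)

# Columns of q are orthonormal: q^H q is the identity up to rounding.
overlap = q.conj().T @ q
```

Here QR orthonormalizes all `nalpha + nbeta` columns of the stacked matrix together, which matches the snippet above it.
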

3. fix the integration. In #277, the msd example was disabled; it is now
fixed by the second point above.

4. Modify the example for running msd-afqmc with MPI / GPU