
Providing a process pool based on environment #1096

Closed · wants to merge 1 commit

Conversation

@iparask (Member) commented Jan 16, 2025

make_atomic_filterbin_map uses both MPI and a multiprocessing pool to launch processes. This can create conflicts and issues in HPC environments where process creation is not dynamic.

This PR provides a function that discovers the execution environment and returns an executor, which can be either an MPIPoolExecutor or a ProcessPoolExecutor.

This work depends on simonsobs/pixell#286
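
For illustration only, a minimal sketch of what such a discovery function could look like (the name get_executor and the exact detection logic are assumptions for this example, not the code in this PR, which also imports MPICommExecutor):

# Hypothetical sketch: choose an executor based on whether mpi4py is importable
# in the current software environment.
import os
from concurrent.futures import ProcessPoolExecutor

try:
    from mpi4py import MPI
    from mpi4py.futures import MPIPoolExecutor
    HAVE_MPI = True
except ImportError:
    HAVE_MPI = False

def get_executor(nworkers=None):
    if HAVE_MPI:
        # One worker per rank in MPI.COMM_WORLD.
        return MPIPoolExecutor(max_workers=MPI.COMM_WORLD.Get_size())
    # No MPI in the environment: fall back to a local process pool.
    return ProcessPoolExecutor(max_workers=nworkers or os.cpu_count())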

Comment on lines +1 to +3
try:
    from mpi4py.futures import MPICommExecutor, as_completed
    from mpi4py import MPI
Contributor


Under what conditions does this fail? Only when an environment is loaded that doesn't include MPI? If the MPI package is loaded but the environment is set up to use just 1 node (i.e. how we've been running so far at NERSC + Tiger), will this also fail and move on to set up the ProcessPoolExecutor?

@iparask (Member, Author) replied:


If an MPI implementation is detected, it will use the MPIPoolExecutor and assign as many workers as the size of the MPI world. If an MPI implementation is not detected, it will use the ProcessPoolExecutor. For example:

ip8725@tiger-g04c1n6:/scratch/gpfs/SIMONSOBS/users/ip8725/tmp$ python mpi_futures.py
<class 'concurrent.futures.process.ProcessPoolExecutor'>
ip8725@tiger-g04c1n6:/scratch/gpfs/SIMONSOBS/users/ip8725/tmp$ module load soconda/3.10/20250109
Loading soconda/3.10/20250109
  Loading requirement: openmpi/gcc/4.1.6 hdf5/gcc/openmpi-4.1.6/1.14.4 anaconda3/2024.2 fftw/gcc/openmpi-4.1.6/3.3.10
(/scratch/gpfs/SIMONSOBS/env/20250109/soconda_3.10) ip8725@tiger-g04c1n6:/scratch/gpfs/SIMONSOBS/users/ip8725/tmp$ srun -n 2 python mpi_futures.py
<class 'mpi4py.futures.pool.MPIPoolExecutor'>
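
(For context, a minimal mpi_futures.py along these lines would just print the class of the selected executor. The body below is an assumed reconstruction; only its output appears in this thread.)

# Hypothetical contents of mpi_futures.py: report which executor class is selected.
try:
    from mpi4py.futures import MPIPoolExecutor as Executor
except ImportError:
    from concurrent.futures import ProcessPoolExecutor as Executor

with Executor(max_workers=2) as ex:
    print(type(ex))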

Tiger3 does not allow running python mpi_futures.py while MPI is active (because of Open MPI), and I have not found a way to work around it and use the ProcessPoolExecutor yet.

I am not sure what will happen if I run python mpi_futures.py on NERSC while soconda is active.

I'm sure there are differences in how NERSC and Princeton configure SLURM.

Everything depends on the software execution environment and not on the number of nodes. That means that if MPI is enabled, even with 1 node, it will use the MPIPoolExecutor.

@mhasself (Member) commented:

I know this is a draft, but the extensive reformatting in make_atomic_filterbin_map.py is not acceptable as part of these changes. Substantive changes and formatting changes that have no effect must be in different PRs (or, if same PR, separate commits that are preserved on merge).

@chervias (Contributor) commented:

We are talking to @iparask about this. We should probably merge the new mapmaker code from #1044 first and then modify on top of that for this stuff.

@iparask (Member, Author) commented Jan 17, 2025

@mhasself the formatting changes are because everything now goes under the if rank == 0 statement. The call-formatting changes I will remove.

I will see if I can find a way to avoid if rank == 0 in the whole script.

@susannaaz susannaaz self-requested a review January 17, 2025 17:05
@iparask iparask changed the base branch from master to sat-mapmaker-updates-241127 January 17, 2025 18:17
@iparask iparask closed this Jan 17, 2025
@iparask iparask deleted the gp/fix/mpipool branch January 17, 2025 18:21