Problem with the number of chains in MCMC_Posterior with method = 'nuts' #1077

Closed
MolinAlexei opened this issue Mar 21, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@MolinAlexei

Hi,
I am running the following lines of code on the computing cluster I use for my work:

from sbi.inference import MCMCPosterior, likelihood_estimator_based_potential

# density_estimator_SNLE, prior and x_o are assumed to be defined earlier in the script.
potential_fn, parameter_transform = likelihood_estimator_based_potential(
    density_estimator_SNLE, prior, x_o
)

# Build the posterior
posterior = MCMCPosterior(
    potential_fn,
    proposal=prior,
    theta_transform=parameter_transform,
    thin=1,
    method='nuts',
    num_chains=32,
    num_workers=32,
    warmup_steps=1000,
)

samples = posterior.sample((5000, 32))

and I am launching the code through SLURM with the following bash script:

#!/bin/bash

#SBATCH --job-name=PleaseWork
#SBATCH --exclusive 
#SBATCH --partition=xifu
#SBATCH --output=MCMC_SBI_xifusim.out

source /home/sila/miniconda3/etc/profile.d/conda.sh
conda activate myenv
###export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/mola/miniconda3/envs/myenv/lib

srun -n1 -c32 python MCMC_SBI_xifusim_obs.py

And I get the following warning:
/home/mola/miniconda3/envs/myenv/lib/python3.10/site-packages/pyro/infer/mcmc/api.py:497: UserWarning: num_chains=32 is more than available_cpu=31. Chains will be drawn sequentially.

However, if I choose 'slice_np' for the method, I get the proper behavior, i.e. the chains running in parallel with 32 workers.

Am I doing something wrong, or is this expected?
Thank you for the help.

@MolinAlexei MolinAlexei added the bug Something isn't working label Mar 21, 2024
@felixp8
Contributor

felixp8 commented Mar 21, 2024

Hi!

The issue here seems to be that Pyro internally reserves one CPU for the main process and thus only allows n_cpus - 1 chains to run in parallel. This is also why MCMCPosterior._pyro_mcmc sets the number of chains to one less than the CPU count when num_chains=None. So, for Pyro samplers, I think you would need to either reduce the number of chains you request or increase the number of CPUs you allot to your program.
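
The rule described above can be sketched in a few lines of Python. This is an illustration only, not Pyro's actual code: the helper names `max_parallel_chains` and `runs_in_parallel` are hypothetical, and the sketch assumes the CPU count is the one reported by `multiprocessing.cpu_count()`.

```python
import multiprocessing


def max_parallel_chains(cpu_count: int) -> int:
    """Chains that fit if one CPU is reserved for the main process (never below 1)."""
    return max(cpu_count - 1, 1)


def runs_in_parallel(num_chains: int, cpu_count: int) -> bool:
    """True if the requested chains fit within the CPUs left after the reservation."""
    return num_chains <= max_parallel_chains(cpu_count)


# CPU count as seen by Python on the current machine (assumption: this is
# the number the sampler would compare num_chains against).
local_cpus = multiprocessing.cpu_count()
local_limit = max_parallel_chains(local_cpus)
```

Under this rule, 32 CPUs leave room for 31 parallel chains, which matches the available_cpu=31 in the warning. Requesting 31 chains, or running on 33 CPUs, would stay within the limit.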

@Baschdl
Contributor

Baschdl commented Mar 22, 2024

@felixp8 How about adding this to the docs? Something like num_chains: ... Should generally be num_workers-1
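
As a rough sketch of that rule in a SLURM job, the chain count could be derived from the allocation instead of being hard-coded. The helper `chain_settings` is hypothetical, and the sketch assumes that `SLURM_CPUS_PER_TASK` (set by SLURM when `-c`/`--cpus-per-task` is given) matches the CPU count that Pyro sees, which may not hold on every cluster.

```python
import os
from typing import Mapping


def chain_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Derive num_workers and num_chains from the SLURM allocation.

    Falls back to a single CPU when SLURM_CPUS_PER_TASK is not set.
    """
    cpus = int(env.get("SLURM_CPUS_PER_TASK", "1"))
    # Keep one CPU free for the main process, but never go below one chain.
    return {"num_workers": cpus, "num_chains": max(cpus - 1, 1)}
```

The resulting dictionary could then be passed as keyword arguments, e.g. `MCMCPosterior(..., method='nuts', **chain_settings())`.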

@felixp8
Contributor

felixp8 commented Mar 22, 2024

would make sense, I can add it to #1053

@felixp8 felixp8 closed this as completed Apr 3, 2024