Replies: 2 comments 4 replies
-
Thanks for this question @ah-dinh. Did you have a look at the Kehl et al. (2023) article, which is all about the efficiency of parcels?
Your description above suggests that most of the time spent is in reading the hydrodynamic input data into memory.
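As a rough check, you could time the FieldSet construction and the kernel execution separately; with deferred loading, most of the file reading happens inside `execute()`. A minimal sketch, assuming the Parcels v2 API, with placeholder file, variable and release-location names:

```python
import time
from datetime import timedelta

import numpy as np
from parcels import AdvectionRK4, FieldSet, JITParticle, ParticleSet

# Placeholder file, variable and dimension names; adjust to your dataset.
filenames = {"U": "velocities.nc", "V": "velocities.nc"}
variables = {"U": "uo", "V": "vo"}
dimensions = {"lon": "longitude", "lat": "latitude", "time": "time"}

t0 = time.perf_counter()
fieldset = FieldSet.from_netcdf(filenames, variables, dimensions)
print(f"FieldSet creation: {time.perf_counter() - t0:.1f} s")

# 5,000 particles released at one placeholder location.
pset = ParticleSet(fieldset=fieldset, pclass=JITParticle,
                   lon=np.full(5000, 150.0), lat=np.full(5000, -30.0))

t0 = time.perf_counter()
pset.execute(AdvectionRK4, runtime=timedelta(days=30), dt=timedelta(minutes=60))
print(f"execute() incl. deferred field reads: {time.perf_counter() - t0:.1f} s")
```

If the second timing barely changes when you vary the particle count, that points to the field reading (and output writing, if enabled) as the bottleneck rather than the particle integration.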
Currently, there is no re-partitioning in parcels. You can manually repartition by restarting your simulation after a while, but this is not built in yet. The most important reason is that we don't (yet) have a method to create new particles during execution (in the JIT loop), so we can't move particles from one processor to another.
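A minimal sketch of that restart-based workaround, assuming the Parcels v2 `ParticleSet.from_particlefile` classmethod and placeholder file names (whether the particle file is zarr or netCDF depends on your Parcels version):

```python
from datetime import timedelta

from parcels import AdvectionRK4, FieldSet, JITParticle, ParticleSet

# Same placeholder dataset as in the sketch above.
filenames = {"U": "velocities.nc", "V": "velocities.nc"}
variables = {"U": "uo", "V": "vo"}
dimensions = {"lon": "longitude", "lat": "latitude", "time": "time"}
fieldset = FieldSet.from_netcdf(filenames, variables, dimensions)

# Resume from the last positions written by an earlier leg ("leg1.zarr").
# Launching this second leg as a fresh (MPI) job means the particles are
# distributed over the processors again, based on where they are now.
pset = ParticleSet.from_particlefile(fieldset=fieldset, pclass=JITParticle,
                                     filename="leg1.zarr", restart=True)
pset.execute(AdvectionRK4, runtime=timedelta(days=30), dt=timedelta(minutes=60),
             output_file=pset.ParticleFile(name="leg2.zarr",
                                           outputdt=timedelta(days=1)))
```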
-
@ah-dinh, how are you measuring your memory usage? It appears completely outlandish. Chunks of that size should take 2555090*8 bytes, or about 20 MB, per field (e.g. per u, v, t, etc). If all your particles start in the same region, only a few chunks should need to be read in, so your memory usage seems very large. Either there is a chunk-size issue (try 225,50,90, or often better 225,50,10, assuming these numbers are the x, y and z dimensions in that order), or perhaps you are not reading memory usage properly.

On all major operating systems (Linux, macOS, Windows) the reported memory usage for the system has almost NOTHING to do with what programs are using, since much of that memory, if not used by a program, is used to buffer IO. This is especially true if you are running IO-intensive tasks. You can't just look at the system statistics; you have to look at the process in particular.

To put this in perspective, I routinely run 567,035,651-particle runs on machines with 256 and 128 GB of memory, with about 47,253,000 active particles at any one time.
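For the per-process measurement, something along these lines (a sketch using the third-party psutil package, sampled at whatever points in your run script you care about) is more informative than the system-wide numbers:

```python
import os

import psutil  # third-party package: pip install psutil

def log_rss(tag):
    """Print the resident set size of *this* process, not the system-wide figure."""
    rss = psutil.Process(os.getpid()).memory_info().rss
    print(f"[{tag}] RSS = {rss / 1024**2:.0f} MB")

log_rss("after FieldSet creation")
# ... build the ParticleSet, run pset.execute(...), then sample again ...
log_rss("after execute")
```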
-
Hi,
I have 10 million particles located at locations A, B, C, and D. They all need to use the same velocity fields, which are stored in a single (chunked) netCDF file. Is there a performance difference between running A, B, C, and D separately (each with a single processor) and running all the particles together with MPI on 4 processors?
I'm asking because there seems to be a large overhead for each run. I did a small test with 5,000 particles: it takes 6.5 hours when the netCDF file is stored on an external HDD, 6.5 hours when the file is moved to an internal SSD, and still 6.5 hours when I increase the number of particles to 15,000. Reducing the time step from 60 minutes to 5 minutes added an extra 15 minutes to the runtime, and increasing the number of particles to 150,000 while reducing the time step to 1 minute increases the runtime to 10 hours.
Side question: when running on multiple processors, how often are the particles re-partitioned? Is there a way to control when the repartitioning function is called?
Thanks for your help,
Andy Dinh