Replies: 2 comments 4 replies
-
Thanks for this question @ah-dinh. Did you have a look at the Kehl et al. (2023) article, which is all about the efficiency of parcels?
Your description above suggests that most of the time spent is in reading the hydrodynamic input data into memory.
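As a rough check, you could time the FieldSet construction and the kernel execution separately; with deferred loading, most of the file reading happens inside `execute()`. A minimal sketch, assuming the Parcels v2 API, with placeholder file, variable and release-location names:

```python
import time
from datetime import timedelta

import numpy as np
from parcels import AdvectionRK4, FieldSet, JITParticle, ParticleSet

# Placeholder file, variable and dimension names; adjust to your dataset.
filenames = {"U": "velocities.nc", "V": "velocities.nc"}
variables = {"U": "uo", "V": "vo"}
dimensions = {"lon": "longitude", "lat": "latitude", "time": "time"}

t0 = time.perf_counter()
fieldset = FieldSet.from_netcdf(filenames, variables, dimensions)
print(f"FieldSet creation: {time.perf_counter() - t0:.1f} s")

# 5,000 particles released at one placeholder location.
pset = ParticleSet(fieldset=fieldset, pclass=JITParticle,
                   lon=np.full(5000, 150.0), lat=np.full(5000, -30.0))

t0 = time.perf_counter()
pset.execute(AdvectionRK4, runtime=timedelta(days=30), dt=timedelta(minutes=60))
print(f"execute() incl. deferred field reads: {time.perf_counter() - t0:.1f} s")
```

If the second timing barely changes when you vary the particle count, that points to the field reading (and output writing, if enabled) as the bottleneck rather than the particle integration.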
Currently, there is no re-partitioning in parcels. You can manually repartition by restarting your simulation after a while, but this is not built in yet. The most important reason is that we don't (yet) have a method to create new particles during execution (in the JIT loop), so we can't move particles from one processor to another.
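A minimal sketch of that restart-based workaround, assuming the Parcels v2 `ParticleSet.from_particlefile` classmethod and placeholder file names (whether the particle file is zarr or netCDF depends on your Parcels version):

```python
from datetime import timedelta

from parcels import AdvectionRK4, FieldSet, JITParticle, ParticleSet

# Same placeholder dataset as in the sketch above.
filenames = {"U": "velocities.nc", "V": "velocities.nc"}
variables = {"U": "uo", "V": "vo"}
dimensions = {"lon": "longitude", "lat": "latitude", "time": "time"}
fieldset = FieldSet.from_netcdf(filenames, variables, dimensions)

# Resume from the last positions written by an earlier leg ("leg1.zarr").
# Launching this second leg as a fresh (MPI) job means the particles are
# distributed over the processors again, based on where they are now.
pset = ParticleSet.from_particlefile(fieldset=fieldset, pclass=JITParticle,
                                     filename="leg1.zarr", restart=True)
pset.execute(AdvectionRK4, runtime=timedelta(days=30), dt=timedelta(minutes=60),
             output_file=pset.ParticleFile(name="leg2.zarr",
                                           outputdt=timedelta(days=1)))
```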
-
@ah-dinh, how are you measuring your memory usage? It appears completely outlandish. Chunks of that size should take 2555090*8 bytes, or about 20 MB, per field (e.g. per u, v, t, etc). If all your particles start in the same region, only a few chunks should need to be read in, so your memory usage seems very large. Either there is a chunk-size issue (try 225,50,90, or often better 225,50,10, assuming these numbers are the x, y and z dimensions in that order), or perhaps you are not reading memory usage properly.

On all major operating systems (Linux, macOS, Windows) the reported memory usage for the system has almost NOTHING to do with what programs are using, since much of that memory, if not used by a program, is used to buffer IO. This is especially true if you are running IO-intensive tasks. You can't just look at the system statistics; you have to look at the process in particular.

To put this in perspective, I routinely run 567,035,651-particle runs on machines with 256 and 128 GB of memory, with about 47,253,000 active particles at any one time.
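For the per-process measurement, something along these lines (a sketch using the third-party psutil package, sampled at whatever points in your run script you care about) is more informative than the system-wide numbers:

```python
import os

import psutil  # third-party package: pip install psutil

def log_rss(tag):
    """Print the resident set size of *this* process, not the system-wide figure."""
    rss = psutil.Process(os.getpid()).memory_info().rss
    print(f"[{tag}] RSS = {rss / 1024**2:.0f} MB")

log_rss("after FieldSet creation")
# ... build the ParticleSet, run pset.execute(...), then sample again ...
log_rss("after execute")
```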
-
Hi,
I have 10 million particles located at locations A, B, C, and D. They all need to use the same velocity fields, which are stored in a single (chunked) netCDF file. Is there a performance difference between running A, B, C, and D separately (each with a single processor) and running all the particles together with MPI on 4 processors?
I'm asking because there seems to be a large overhead for each run. I did a small test with 5,000 particles: it takes 6.5 hours when the netCDF file is stored on an external HDD, 6.5 hours when the file is moved to an internal SSD, and still 6.5 hours when I increase the number of particles to 15,000. Reducing the time step from 60 minutes to 5 minutes added an extra 15 minutes to the runtime, and increasing the number of particles to 150,000 while reducing the time step to 1 minute increases the runtime to 10 hours.
Side question: when running on multiple processors, how often are the particles re-partitioned? Is there a way to control when the repartitioning function is called?
Thanks for your help,
Andy Dinh