Reading a BP5 series (groupBased) using 120+ GB memory, any fix around it? #1724
-
Hello! I have run a large simulation which in turn produced a huge data file: the `.bp` directory that is created is around 122 GB in size. I was accessing it using:

```python
import openpmd_api as io

series = io.Series(file_path, io.Access.read_only)
```

The RAM usage spikes to 120 GB, which kind of makes sense given the size of the file. Without performing any additional statistics on the data, the RAM used already reaches 120 GB: as soon as the Python code hits the `io.Series(...)` line, it stays there reading the series and reaches 120 GB of RAM without progressing to the next lines of code. Is there a way to not use this much memory? My machine is maxed out at 134 GB of RAM. Is there a method for lazy loading, so that the corresponding data is accessed only as needed? It seems that just opening the series as I did loads up everything, even though I only want to process, say, every 50th or 100th turn's data, while the data is written per turn, if that makes sense. Any suggestions will be appreciated!
Replies: 3 comments · 19 replies
-
Hello,
You can verify 2. and 3. using bpls:
In this combination (group-based iteration encoding written with the ADIOS2 BP5 engine), BP5 has previously been observed to produce metadata output whose size grows quadratically with the number of steps created, due to the assumptions it makes for its serialization. This means that if you write lots of Iterations, this exact situation may occur.

At the moment, the recommended alternative is to use file-based encoding, which creates a new file for each output step. This can be done by including an expansion pattern in the filename (see the sketch below).

For converting your existing dataset to file-based encoding, we will have to use an open mode in ADIOS2 that does not try to consume all metadata at once. There is one slight problem with that: due to the difficulty of correctly associating attributes with steps in that read mode, openPMD-api 0.16 no longer supports it on group-based files, meaning that you would have to temporarily downgrade to openPMD-api 0.15 for the conversion.
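For illustration, a minimal sketch of such a filename pattern (the path is a placeholder; in an ImpactX run the output name is controlled through ImpactX's diagnostics settings rather than by calling openPMD-api directly):

```python
import openpmd_api as io

# The %T placeholder is the expansion pattern: it is replaced by the
# iteration number, so each output step is written to its own file
# (e.g. monitor_50.bp, monitor_100.bp, ...) instead of one large
# group-based .bp directory that accumulates all metadata.
series = io.Series("diags/openPMD/monitor_%T.bp", io.Access.create)
```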
I'm sorry for the trouble that this may cause. I will add a fix in the upcoming patch release so that default configurations no longer create files that are unreadable in certain read modes.
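To illustrate what such a step-wise read mode looks like from Python, a minimal sketch (the path and the every-50th-turn filter are placeholders; per the note above, reading a group-based BP5 file this way requires openPMD-api 0.15):

```python
import openpmd_api as io

# Linear reading parses one iteration at a time instead of loading all
# metadata up front, so memory use stays bounded while stepping through
# the series.
series = io.Series("diags/openPMD/monitor.bp", io.Access.read_linear)
for it in series.read_iterations():
    if it.iteration_index % 50 != 0:
        continue  # skipped iterations are closed automatically
    # ... load and process only what is needed for this turn ...
    it.close()
```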
-
Hi @OLuckyG,

Thank you for your report! I am co-developing both ImpactX and openPMD-api and will try to add a few more details, together with those already shared by @franzpoeschel, to see if we can get this figured out. I am now tracking this as an issue report in BLAST-ImpactX/impactx#868 as well.

First of all, to fully understand your problem: Do you mind sharing your full analysis routines? Is the memory already spiking just at the `io.Series(...)` call? If you don't mind, can you share a reproducer, e.g., the ImpactX input (or a simplified version) and a demonstrator of the analysis? How many turns (outputs) and particles are in your >120 GB simulation?

My guess currently is that your memory is not spiking on […]. Otherwise, I can guide you in the meantime to change the output mode in your ImpactX file to use another format that does not have this issue.

Also, to give you quick relief: are you aware of the new […]?
@franzpoeschel FYI: we default to group-based encoding, so that might be part of the problem with BP5.
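One way to answer the question of whether the memory already spikes at the `io.Series(...)` call alone is to record the peak resident memory right after opening the series, before touching any data. A minimal sketch with a placeholder path (on Linux, `ru_maxrss` is reported in KiB):

```python
import resource

import openpmd_api as io

file_path = "diags/openPMD/monitor.bp"  # placeholder

# Open the series, then report the peak resident set size of this process.
series = io.Series(file_path, io.Access.read_only)
peak_gib = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024**2
print(f"peak RSS right after opening the series: {peak_gib:.1f} GiB")
```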
-
I have replicated a workflow similar to yours and I think I can verify that this is a metadata issue in BP5. I used PIConGPU to create 10000 output steps, once with variable-based encoding and once with group-based encoding: […]
The more than 60 GB size difference is entirely in metadata. For your upcoming runs, either: […]
For your existing data, either: […]
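As a rough sketch of one conversion route, the loop below copies the particle data of a group-based series into a file-based one, step by step. Paths are placeholders; per the note earlier in the thread, opening the group-based BP5 input in linear read mode is assumed to happen with openPMD-api 0.15. Record attributes (`unitSI`, `unit_dimension`, ...), particle patches, and mesh data are not copied here; the `openpmd-pipe` tool that ships with openPMD-api does this generically and is likely the easier route.

```python
import openpmd_api as io

src = io.Series("diags/openPMD/monitor.bp", io.Access.read_linear)
dst = io.Series("diags/openPMD/monitor_%T.bp", io.Access.create)

for in_it in src.read_iterations():
    out_it = dst.write_iterations()[in_it.iteration_index]

    queued = []  # (species, record, component, numpy buffer)
    for sp_name in in_it.particles:
        species = in_it.particles[sp_name]
        for rec_name in species:
            record = species[rec_name]
            for comp_name in record:
                # load_chunk() allocates the buffer now; it is filled when
                # this iteration is flushed/closed below.
                queued.append((sp_name, rec_name, comp_name,
                               record[comp_name].load_chunk()))
    in_it.close()  # performs the queued reads for this step only

    for sp_name, rec_name, comp_name, data in queued:
        out = out_it.particles[sp_name][rec_name][comp_name]
        out.reset_dataset(io.Dataset(data.dtype, data.shape))
        out.store_chunk(data)
    out_it.close()  # writes this step into its own file

dst.close()
src.close()
```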