The guide at https://awkward-array.org/how-to-convert-buffers.html instructs readers to use ak.to_buffers in order to write HDF5 files. However, the output files can easily become unnecessarily large.
Please consider the following example
The container, which gets saved to the file, contains an array of 1000 numbers, even though we only want 2 of them. The number 1000 is arbitrary; it can be much larger.
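To make the size problem concrete, here is a minimal NumPy-only sketch. It does not call Awkward Array; the `offsets`/`content` pair and the `container` dict are hypothetical stand-ins for the buffers of a jagged array, assumed here only for illustration. It shows how slicing the offsets leaves the full content buffer attached.

```python
import numpy as np

# Hypothetical stand-in for a jagged array's buffers (not Awkward's real layout
# objects): `content` holds every number, `offsets` marks list boundaries.
content = np.arange(1000, dtype=np.int64)
offsets = np.arange(0, 1001, 2, dtype=np.int64)  # 500 lists of 2 numbers each

# Keep only the first list: this slices the offsets, not the content.
sliced_offsets = offsets[:2]  # boundaries of one list: start 0, stop 2

# What a naive export would write: every buffer in full.
container = {"offsets": sliced_offsets, "content": content}
bytes_written = sum(buf.nbytes for buf in container.values())

# What is actually reachable from the sliced offsets.
reachable = int(sliced_offsets[-1] - sliced_offsets[0])
bytes_needed = sliced_offsets.nbytes + reachable * content.itemsize

print(f"reachable elements: {reachable}")
print(f"bytes written: {bytes_written}, bytes needed: {bytes_needed}")
```

Under these assumptions, only 2 of the 1000 content elements are reachable, yet the whole content buffer would be written.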
What I think would be very nice here is an option to have the container be restricted to only the data that is necessary. This could even be an additional function, condensing an awkward array so that it is compact in memory.
I know that flattening can have a similar effect, but it doesn't work on arrays with records. Surprisingly, doing something like ak.from_arrow(ak.to_arrow(arr)) has the desired effect on the array. However, this seems to be a very crude workaround.
This is #701, and I agree that we need a function to trim the unreachable elements from an array. When manipulating these in memory, we want to keep them to avoid duplicating data, but writing them to disk is a copy anyway and at that point, it's time to trim them down. Such an operation should be automatically applied in pickling, for instance, and it should be discussed in the tutorial you're referencing.
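As a rough illustration of what trimming unreachable elements could mean for a single list-of-numbers layout, here is a NumPy-only sketch. The helper `trim_unreachable` is a hypothetical name introduced for this example, not an Awkward Array function, and it assumes a plain offsets-plus-content representation with no records or option-types.

```python
import numpy as np

def trim_unreachable(offsets, content):
    """Copy only the content reachable from `offsets` and rebase offsets to 0.

    Hypothetical helper for illustration; assumes `offsets` is non-empty and
    monotonically non-decreasing, as in an offsets-plus-content list layout.
    """
    start, stop = int(offsets[0]), int(offsets[-1])
    return offsets - start, content[start:stop].copy()

content = np.arange(1000, dtype=np.int64)
offsets = np.array([0, 3, 3, 5, 1000], dtype=np.int64)  # 4 lists

# Select the two middle lists (an empty list and a list of 2 numbers).
sliced = offsets[1:4]

packed_offsets, packed_content = trim_unreachable(sliced, content)
print(packed_offsets.tolist(), packed_content.tolist())
```

The copy is deliberate: in memory a view avoids duplicating data, but at write time only the reachable elements need to be kept.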
This was also an issue in Awkward 0: scikit-hep/awkward-0.x#246. As indicated in #701, the ak.packed function would be a good addition. You found the same `ak.from_arrow(ak.to_arrow(arr))` workaround we discussed there, which does some wasteful computations if there are option-types around (not so much otherwise, but it's still crude).
jpivarski changed the title from "Only output minimal container in ak.to_buffers" to "We need an ak.packed function" on Feb 17, 2021