This repository has been archived by the owner on Mar 4, 2021. It is now read-only.
The protozfits.File can be used in two ways:
Slow but comfortable
The default mode gives you a very easy-to-use event.
Faster but annoying
Assuming you need faster iteration speed, you can skip the conversion to "useful" Python things completely by setting pure_protobuf=True.
Now event is a Python object, just like before, but it has no tab completion, and its array-like members are not numpy arrays but instances of AnyArray that need to be converted, for example using protozfits.any_array_to_numpy.
This means that people who know they only need access to one very specific member of the event, which also happens to be an integer, can use pure_protobuf=True and iterate faster over all events of the file.
There must be a better way!
Offering these two possibilities might sound nice at first, but it splits the users into two groups: those who need speed and those who want to keep it simple. I believe good software does not force users to choose between these two options, at least not at the very start of every project ... and this reader happens to be at the start ...
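These are the features the "comfy" option has and the "fast" option lacks:
tab completion
correct enum representation (not mentioned above)
complete auto numpy conversion
shorter string representation (using the two above)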
As I understand it, the part that costs speed is the "auto numpy conversion": all arrays are always converted from AnyArray to numpy, even when the user never accesses them.
So the solution to the problem seems to be lazy evaluation.
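To make the difference concrete, here is a minimal sketch of eager versus lazy conversion. It does not use protozfits at all: convert, EagerEvent, LazyEvent and the dictionary standing in for a raw event are all hypothetical names invented for illustration, and list(raw) is only a placeholder for the real AnyArray to numpy conversion.

```python
from functools import cached_property

# Counts conversions so the eager/lazy difference is visible.
conversions = 0


def convert(raw):
    """Hypothetical stand-in for an expensive AnyArray -> numpy conversion."""
    global conversions
    conversions += 1
    return list(raw)


class EagerEvent:
    """Converts every array member up front, like the "comfy" mode."""

    def __init__(self, raw):
        self.event_number = raw["event_number"]
        self.waveform = convert(raw["waveform"])
        self.pixel_status = convert(raw["pixel_status"])


class LazyEvent:
    """Keeps the raw event and converts a member on first access only."""

    def __init__(self, raw):
        self._raw = raw

    @property
    def event_number(self):
        return self._raw["event_number"]

    @cached_property
    def waveform(self):
        return convert(self._raw["waveform"])

    @cached_property
    def pixel_status(self):
        return convert(self._raw["pixel_status"])


raw_event = {
    "event_number": 7,
    "waveform": b"\x01\x02\x03",
    "pixel_status": b"\x00\x01",
}

EagerEvent(raw_event).event_number
eager_conversions = conversions          # both arrays converted, neither used

conversions = 0
lazy = LazyEvent(raw_event)
lazy.event_number
lazy_conversions_before = conversions    # nothing converted yet
lazy.waveform
lazy.waveform                            # second access is served from the cache
lazy_conversions_after = conversions
```

In this toy setup, a user who only reads event_number pays for two conversions in the eager class and for none in the lazy one.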
At the moment (in the "comfy" version) the whole event is converted into a collections.namedtuple and given to the user. This conversion includes converting all AnyArrays into numpy arrays.
Instead, what could be done at import time is to dynamically generate "useful" classes from the Google protobuf descriptors. Then, at iteration time, the "pure protobuf" object is hidden inside an instance of one of these new "useful" classes, which offers tab completion on its members. Only when a member is actually accessed does the instance look at its internal pure_protobuf and perform whatever conversion is needed.
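One possible shape for this idea, sketched with the standard library only. Everything here is an assumption for illustration: make_lazy_class, convert_array and identity are hypothetical helpers, the converters mapping is written by hand where a real reader would presumably derive it from the protobuf descriptor, and a SimpleNamespace stands in for the pure protobuf object.

```python
from types import SimpleNamespace


def convert_array(raw):
    """Hypothetical stand-in for the AnyArray -> numpy conversion."""
    return list(raw)


def identity(value):
    """Scalars such as integers need no conversion."""
    return value


def _lazy_member(name, converter):
    """Build a property that converts the raw member on first access and caches it."""

    def getter(self):
        cache = self._cache
        if name not in cache:
            cache[name] = converter(getattr(self._pure_protobuf, name))
        return cache[name]

    return property(getter)


def make_lazy_class(class_name, converters):
    """Create a wrapper class with one lazy property per field.

    `converters` maps field name -> conversion function. Deriving this
    mapping from a protobuf descriptor is left out of this sketch.
    """

    def __init__(self, pure_protobuf):
        self._pure_protobuf = pure_protobuf
        self._cache = {}

    namespace = {"__init__": __init__, "__slots__": ("_pure_protobuf", "_cache")}
    for field_name, converter in converters.items():
        namespace[field_name] = _lazy_member(field_name, converter)
    return type(class_name, (), namespace)


# Done once, for example at import time.
Event = make_lazy_class(
    "Event", {"event_number": identity, "waveform": convert_array}
)

# Done per event at iteration time: wrapping is cheap, nothing is converted yet.
raw = SimpleNamespace(event_number=7, waveform=b"\x01\x02\x03")
event = Event(raw)
```

Because the members are properties on the generated class, they show up in dir(event), which is what interactive tab completion relies on. The conversion itself only runs when event.waveform is first read.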
The thing is ... I have no idea (yet) how to do this, or whether initializing these instances ends up being more costly than just leaving the situation as it is.
I believe this is worth a little study, at least if users already feel that the read performance in Python should be better.
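Such a study could start with a micro-benchmark along these lines. This is only a sketch with toy stand-ins (EagerEvent, LazyEvent, convert_array and the fake raw events are hypothetical), so the numbers it produces say nothing about protozfits itself. It only shows how the wrapper initialization cost and the conversion cost could be compared for the "I only need one integer" use case.

```python
import timeit


def convert_array(raw):
    """Hypothetical stand-in for the AnyArray -> numpy conversion."""
    return list(raw)


class EagerEvent:
    """Converts the array member in __init__, whether it is used or not."""

    def __init__(self, raw):
        self.event_number = raw["event_number"]
        self.waveform = convert_array(raw["waveform"])


class LazyEvent:
    """Only stores the raw event; members are converted on access."""

    def __init__(self, raw):
        self._raw = raw

    @property
    def event_number(self):
        return self._raw["event_number"]

    @property
    def waveform(self):
        return convert_array(self._raw["waveform"])


# Fake "file": 200 events, each with a 2000-byte array member.
raw_events = [{"event_number": i, "waveform": bytes(2000)} for i in range(200)]


def read_event_numbers(event_class):
    """Wrap every raw event and read only its integer member."""
    return [event_class(raw).event_number for raw in raw_events]


# Best of 3 repeats, 20 passes over the fake file each.
timings = {
    cls.__name__: min(
        timeit.repeat(lambda: read_event_numbers(cls), number=20, repeat=3)
    )
    for cls in (EagerEvent, LazyEvent)
}

for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s")
```

The same harness could be pointed at the real reader by replacing the two toy classes with the existing modes and a lazy prototype.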