
[enhancement] Can this lib support appending other data types, like strings? #8

Open
zhuwenxing opened this issue Jul 27, 2023 · 1 comment

Comments

@zhuwenxing

Appending an item works well if the item is an array.
But if the item is an object, such as a string, an error is raised:

[2023-07-27T07:48:55.983Z] 
[2023-07-27T07:48:55.983Z] array = 'FchBBdlm2uMLLr1mVjpwtFsnnSmkEvLbWo1DXvOAHbx9JAvWu9vH45eTBRCUqAZjU0Bmz34loB84MYKjekcWTFZaGh1fbAU61G9Tu6TJbIVLkb7t0jIzs...hRIJ8LTLVOCbYqEprSz55WXhvJ8NX1LoNVFJ3kA3YUp8NNNPQjJeZLQdhrb4GYvrxOVjt1Tx7QGgPO00kYa8cOjagDwlNSKLw3Dr6XRzeu3HXyDFpTanZQ'
[2023-07-27T07:48:55.983Z] 
[2023-07-27T07:48:55.983Z]     def header_data_from_array_1_0(array):
[2023-07-27T07:48:55.983Z]         """ Get the dictionary of header metadata from a numpy.ndarray.
[2023-07-27T07:48:55.983Z]     
[2023-07-27T07:48:55.983Z]         Parameters
[2023-07-27T07:48:55.983Z]         ----------
[2023-07-27T07:48:55.983Z]         array : numpy.ndarray
[2023-07-27T07:48:55.983Z]     
[2023-07-27T07:48:55.983Z]         Returns
[2023-07-27T07:48:55.983Z]         -------
[2023-07-27T07:48:55.983Z]         d : dict
[2023-07-27T07:48:55.983Z]             This has the appropriate entries for writing its string representation
[2023-07-27T07:48:55.983Z]             to the header of the file.
[2023-07-27T07:48:55.983Z]         """
[2023-07-27T07:48:55.983Z] >       d = {'shape': array.shape}
[2023-07-27T07:48:55.983Z] E       AttributeError: 'str' object has no attribute 'shape'
[2023-07-27T07:48:55.983Z] 
[2023-07-27T07:48:55.983Z] /usr/local/lib/python3.8/dist-packages/numpy/lib/format.py:357: AttributeError
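The traceback shows a plain Python `str` reaching `numpy.lib.format.header_data_from_array_1_0`, which expects an `ndarray`. The sketch below (assuming NumPy is installed; `item`, `batch1` and `batch2` are illustrative names) reproduces that call directly and shows why wrapping the string in an array is only a partial fix: the resulting fixed-width Unicode dtype depends on the length of the longest string in each batch.

```python
import numpy as np
from numpy.lib import format as npy_format

item = "hello"  # a bare str, like the item in the traceback above
caught = False
try:
    npy_format.header_data_from_array_1_0(item)
except AttributeError as exc:
    caught = True
    print("AttributeError:", exc)

# Wrapped in an ndarray, the string gets a fixed-width Unicode dtype
batch1 = np.asarray([item])
print(npy_format.header_data_from_array_1_0(batch1))

# The width follows the longest string, so a batch with a longer string
# ends up with a different (wider) dtype
batch2 = np.asarray([item * 2])
print(batch1.dtype, batch2.dtype)
```

Since a single `.npy` file records one dtype in its header, a later batch with a wider string dtype presumably could not be appended to a file created from a narrower one without padding or truncating the strings.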
@xor2k
Owner

xor2k commented Nov 20, 2023

Sorry for the late reply!

As soon as you have arrays of dtype "object", NumPy falls back to the general Python pickling mechanism. Numpy Append Array relies on native NumPy arrays with fixed-size dtypes (objects are not fixed-size); otherwise, appending would no longer be efficient, and the data would have to be handled in memory anyway. Also note that "object" arrays cannot be memory-mapped, for the same reason.
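To illustrate the difference between "object" arrays and fixed-size dtypes, here is a small sketch (assuming NumPy; the file names are arbitrary temporary paths) that saves the same strings both ways. The object array can only be written through pickle and cannot be loaded with `mmap_mode`, while the fixed-width version can be memory-mapped.

```python
import os
import tempfile

import numpy as np

strings = ["short", "a much longer string"]
obj = np.array(strings, dtype=object)  # stores references to Python str objects
fixed = np.array(strings)              # fixed-width Unicode dtype (<U20 here)

save_refused = mmap_refused = False
with tempfile.TemporaryDirectory() as tmp:
    obj_path = os.path.join(tmp, "obj.npy")
    fixed_path = os.path.join(tmp, "fixed.npy")

    # Object arrays can only be written through pickle ...
    try:
        np.save(obj_path, obj, allow_pickle=False)
    except ValueError as exc:
        save_refused = True
        print("save refused:", exc)
    np.save(obj_path, obj, allow_pickle=True)

    # ... and cannot be memory-mapped
    try:
        np.load(obj_path, mmap_mode="r")
    except ValueError as exc:
        mmap_refused = True
        print("mmap refused:", exc)

    # The fixed-size array can be memory-mapped and indexed lazily
    np.save(fixed_path, fixed)
    mm = np.load(fixed_path, mmap_mode="r")
    second = str(mm[1])
    del mm  # release the mapping before the directory is removed

print(fixed.dtype.itemsize, second)
```

The fixed-width array pays for its mappability with padding: every element occupies as many bytes as the longest string (here 20 characters, 4 bytes each).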

The underlying problem is that variable-length string arrays are not yet well supported in NumPy, although there are some new developments:

https://numpy.org/neps/nep-0055-string_dtype.html

One possible solution is to use two arrays: one holding the actual text, stored as bytes, and one holding offsets (e.g. int64) and (redundant) lengths into that array. This is a fairly common approach and can be handled with NpyAppendArray, since both arrays have fixed-size dtypes.
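The two-array layout described above can be sketched as follows (assuming NumPy; `encode_batch` and `decode_item` are hypothetical helpers, not part of NpyAppendArray). The offsets and lengths are stored as the two columns of one int64 array, and `np.concatenate` stands in for appending to files, so the sketch runs entirely in memory.

```python
import numpy as np


def encode_batch(strings, base_offset=0):
    """Pack a batch of str into two fixed-dtype arrays (hypothetical helper).

    Returns (data, index):
      data:  uint8, the UTF-8 bytes of all strings stored back to back
      index: int64 of shape (n, 2); column 0 is the start offset of each
             string in the *whole* data array, column 1 its byte length
    base_offset is the number of data bytes already stored before this batch.
    """
    encoded = [s.encode("utf-8") for s in strings]
    joined = b"".join(encoded)
    lengths = np.array([len(b) for b in encoded], dtype=np.int64)
    # Exclusive prefix sum: each string starts where the previous one ended
    offsets = base_offset + np.cumsum(lengths) - lengths
    data = np.fromiter(joined, dtype=np.uint8, count=len(joined))
    index = np.stack([offsets, lengths], axis=1)
    return data, index


def decode_item(data, index, i):
    """Return string i by slicing its bytes out of data (hypothetical helper)."""
    start, length = (int(v) for v in index[i])
    return data[start:start + length].tobytes().decode("utf-8")


# Two batches, as they might arrive over time
batch1 = ["hello", "wörld"]
batch2 = ["a much longer string"]

data, index = encode_batch(batch1)
data2, index2 = encode_batch(batch2, base_offset=len(data))

# In-memory stand-in for appending each array to its own .npy file
data = np.concatenate([data, data2])
index = np.concatenate([index, index2])

decoded = [decode_item(data, index, i) for i in range(len(index))]
print(decoded)
```

With NpyAppendArray, `data` and `index` would each be appended to their own `.npy` file instead of being concatenated; the only extra state the writer needs is `base_offset`, the number of bytes already in the data file. Because both arrays have fixed-size dtypes, they should also be loadable with `np.load(..., mmap_mode="r")`, so a single string can be read without loading all of the text.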

Does this help you?
