Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Could not find heap data for the \'MATRIX\' variable-length array column. #288

Closed
volodymyrss opened this issue Sep 1, 2024 · 15 comments · Fixed by #287
Closed

Could not find heap data for the \'MATRIX\' variable-length array column. #288

volodymyrss opened this issue Sep 1, 2024 · 15 comments · Fixed by #287
Assignees

Comments

@volodymyrss
Copy link
Member

An issue observed in isgri-expert workflow by @ferrigno . Might have something to do with new python? Or not.

https://cdci.sentry.io/issues/5783401903/?alert_rule_id=723253&alert_timestamp=1725201418390&alert_type=email&environment=production&notification_uuid=18b8fc99-5748-4013-b594-1cf172b92a6c&project=1467382&referrer=alert_email

@ferrigno
Copy link
Contributor

ferrigno commented Sep 2, 2024

I have not this issue in my virtual environment. Which is the docker running the service?

@volodymyrss
Copy link
Member Author

I have not this issue in my virtual environment. Which is the docker running the service?

This is not happening in your service.

@burnout87
Copy link
Collaborator

Could it be something related to the astropy version?

@volodymyrss
Copy link
Member Author

I would guess possibly related to astropy, numpy, and python versions.

@burnout87
Copy link
Collaborator

I reproduced it locally, and it happens here

logger.debug(f'data: {self.data}')

@burnout87
Copy link
Collaborator

After some more debugging with @ferrigno , we think we figured out where the problem is.

It seems to be related to the decode functionality of the class NumpyDataUnit, when called inside the decode function of the NumpyDataProduct class.

We tried to go through each step, and it seems that there is a problem the moment we try to pickle.loads the binary data of the encoded object.

_data = pickle.loads(_binarys,encoding='bytes')

While researching online, it seems that "pickle isn't supported for HDUs ..." which might explains our issue.

astropy/astropy#15017

What do you think? Perhaps an options is to rewrite those functions, or use a different approach

@volodymyrss
Copy link
Member Author

It is not supported in a sense that nobody guarantees it works. Since we really need serialization of fits objects, it's our responsibility to make sure it does. In the past we made changes and contributions to public packages to make sure this functionality works.

I propose you first of all make a narrow test in oda-api to demonstrate exactly the serialization problem.

Then we see how to best fix it. Actually maybe in our case we can avoid using BufferedReader since we need the entire content at once anyway. May be a way of opening/creating the FITS file/HDU.

@andreatramacere
Copy link
Contributor

This morning I had a zoom meeting with @burnout87, I made a test with ISGRI rmf files, with the same structure as in the IBIS one, at least for the column storing the RMF matrix. The encoding/decoding workflow works as expected, indeed, if that would not work, the entire workflow tested in the oda_api would fail. This means that the issue could be in the IBIS rmf file or in the astropy.fits module. @ferrigno did you ever test IBIS rmf against oda_api? Is it possible that software writing the IBIS rmf uses a different approach or a different cfitsio version compared to the one used in ISGRI? Another possibility could be related to the size of the matrix exceed some buffer. By the way I guess IBIS and ISGRI are different instrument

Probably, the best approach, would be to write fake RMF files, increasing incrementally the size of the matrix column (both in number of rows, and element per row) and check if the problem shows up above a given size

@volodymyrss
Copy link
Member Author

This morning I had a zoom meeting with @burnout87, I made a test with ISGRI rmf files, with the same structure as in the IBIS one, at least for the column storing the RMF matrix. The encoding/decoding workflow works as expected, indeed, if that would not work, the entire workflow tested in the oda_api would fail. This means that the issue could be in the IBIS rmf file or in the astropy.fits module. @ferrigno did you ever test IBIS rmf against oda_api? Is it possible that software writing the IBIS rmf uses a different approach or a different cfitsio version compared to the one used in ISGRI? Another possibility could be related to the size of the matrix exceed some buffer. By the way I guess IBIS and ISGRI are different instrument

Probably, the best approach, would be to write fake RMF files, increasing incrementally the size of the matrix column (both in number of rows, and element per row) and check if the problem shows up above a given size

ISGRI is part of IBIS. ISGRI is the only part of IBIS we use in ODA. For us IBIS==ISGRI.

There can be differences, especially in what concerns variable size arrays. Which is why I ask @burnout87 to please add a test for the file @ferrigno had trouble with.

I agree that there must be something peculiar or problematic about this file. And indeed we were writing and passing this kind of files RMF since the very beginning.

@ferrigno
Copy link
Contributor

All concurs towards the fact that one needs to re-engineer the oda_api data_product handling with numpy>=2.0.
Now astropy is compatible with numpy>=2.0.0, but oda_api not anymore.
One needs to make a completely clean installation to find it out.

@volodymyrss
Copy link
Member Author

All concurs towards the fact that one needs to re-engineer the oda_api data_product handling with numpy>=2.0. Now astropy is compatible with numpy>=2.0.0, but oda_api not anymore. One needs to make a completely clean installation to find it out.

Completely clean installation is in the CI in the actions.
Did you already try to add the test?

@ferrigno
Copy link
Contributor

Not finished to make the test, maybe later.

@volodymyrss
Copy link
Member Author

It remains unclear to me why it's hard to explain that starting the the test is the best way to achieve clean result which can be discussed and communicated easily.

@ferrigno
Copy link
Contributor

ferrigno commented Oct 16, 2024

test added

def test_rmf():

it fails with :
astropy version 6.1.3
numpy version 1.26.4
oda_api Version: 1.2.28

@ferrigno
Copy link
Contributor

See the related PR #286
Reproduced the failure in the test on CI

@burnout87 burnout87 transferred this issue from oda-hub/dispatcher-app Oct 17, 2024
This was linked to pull requests Oct 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants