In the datasets library we are using ParquetFileFragment.to_batches() to stream batches of data while applying filters file-per-file. We create fragments from file-like objects (because files can be local or remote).
The code hangs most of the time; only occasionally, and seemingly at random, does it run to completion.
The issue appears when running the code as a Python script, but doesn't appear in Google Colab or in IPython.
The issue also appears in eltorio/ROCOv2-radiology which happens to also contain binary types. The issue doesn't seem to appear in datasets like AI-MO/NuminaMath-CoT which don't contain binary types.
In the original issue in datasets this message was also reported:
Fatal Python error: PyGILState_Release: thread state 0x7fa1f409ade0 must be current when releasing
Python runtime state: finalizing (tstate=0x0000000000ad2958)
Thread 0x00007fa33d157740 (most recent call first):
<no Python frame>
Component(s)
Parquet, Python
@AlexKoff88 reported in huggingface/datasets#7357 that for some datasets, such as phiyodr/InpaintCOCO, streaming with ParquetFileFragment.to_batches() in this way causes the code to hang.
I managed to make a reproducible example:
file info:
Environment: