Description
When scanning a Delta table, the native Parquet reader scans all the Parquet files without pruning partitions. This comes from the file_uris() call made while scanning, which receives no partition filters: https://github.com/pola-rs/polars/blob/main/py-polars/polars/io/delta.py#L393
This is not a problem when the number of partitions is small. However, when the number is large (in my case, 12 years of daily partitions), the Parquet reader scans every single file. The result is the same, but a lot of Parquet metadata has to be downloaded and checked.
Since the scan_delta function has no access to filters applied afterwards, I was wondering if an extra partition_filters argument could be added, so that it could be passed through to file_uris(), as per the Delta documentation.
I'd be happy to open a PR myself if the issue gets accepted (I'd need some guidance to not break type checks though!). I'd start with something like:
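A minimal sketch of that idea, not a tested patch. It assumes the table object exposes file_uris(partition_filters=...) the way deltalake's DeltaTable does, with filters given as (column, operator, value) tuples. The names pruned_file_uris, PartitionFilter and SupportsFileUris are hypothetical and exist in neither Polars nor deltalake.

```python
from __future__ import annotations

from typing import Any, List, Optional, Protocol, Sequence, Tuple

# One filter is a (column, operator, value) tuple, e.g. ("date", "=", "2024-01-01").
# Hypothetical alias, mirroring the shape deltalake documents for partition filters.
PartitionFilter = Tuple[str, str, Any]


class SupportsFileUris(Protocol):
    """Stand-in for deltalake.DeltaTable: only the method used below."""

    def file_uris(
        self, partition_filters: Optional[List[PartitionFilter]] = None
    ) -> List[str]: ...


def pruned_file_uris(
    table: SupportsFileUris,
    partition_filters: Optional[Sequence[PartitionFilter]] = None,
) -> List[str]:
    """List only the Parquet files whose partitions match the filters.

    With no filters (None or empty), every file of the table is listed,
    which is the current behaviour described in this issue.
    """
    filters = list(partition_filters) if partition_filters else None
    # The pruning itself is delegated to the table; nothing is read here.
    return table.file_uris(partition_filters=filters)
```

With a real table, the returned list could then be handed to the Parquet scanner, for example pl.scan_parquet(pruned_file_uris(DeltaTable(path), [("date", "=", "2024-01-01")])), assuming a Polars version whose scan_parquet accepts a list of paths.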
Thanks for the hard work on Polars!