
[Python] Support serialization of Arrow files on disk without the identifier "Feather" #38515

Open
jason-s opened this issue Oct 30, 2023 · 2 comments

jason-s commented Oct 30, 2023

Describe the enhancement requested

The documentation for the Arrow Columnar Format suggests that the separate Feather project has been subsumed into Arrow, and that Feather is really just the canonical file format for serialized Arrow tables:

We recommend the “.arrow” extension for files created with this format. Note that files created with this format are sometimes called “Feather V2” or with the “.feather” extension, the name and the extension derived from “Feather (V1)”, which was a proof of concept early in the Arrow project for language-agnostic fast data frame storage for Python (pandas) and R.

Python support for Arrow serialization still uses the identifier "feather" (see the Cookbook):

Once we have a table, it can be written to a Feather File using the functions provided by the pyarrow.feather module

import pyarrow.feather as ft

ft.write_feather(table, 'example.feather')

This functionality should be kept as is for backwards compatibility, but I wonder if the pyarrow module should just have a write() function that doesn't require importing the pyarrow.feather module or using the term "feather". This would help reduce confusion about file extensions and about the relationship between "Arrow" and "Feather".

Component(s)

Python

jason-s commented Oct 30, 2023

See also apache/arrow-cookbook#329

lidavidm (Member) commented

This came up again: apache/arrow-site#586 (comment)

We should:

  • Deprecate pyarrow.feather
  • Add write_table, etc., to pyarrow.ipc for consistency with pyarrow.csv and pyarrow.parquet
    • Additionally, why is it pyarrow.parquet.write_table but then pyarrow.csv.write_csv and pyarrow.feather.write_feather? We should be consistent.
