Support exporting to a local file via datafusion-cli #1214

Open
1 task
alamb opened this issue Nov 1, 2021 · 9 comments
Labels
enhancement New feature or request

Comments

@alamb
Contributor

alamb commented Nov 1, 2021

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
It would be handy to use datafusion-cli to transform data and write the result to another format (e.g. convert from CSV to Parquet, possibly with some transformations along the way).

Describe the solution you'd like
I propose adding an INTO clause, as ClickHouse and MySQL have:
https://clickhouse.com/docs/en/faq/integration/file-export/
https://dev.mysql.com/doc/refman/5.7/en/select-into.html

So one could export the contents of a table into a file using a command such as

SELECT * FROM table INTO OUTFILE 'file' FORMAT CSV

Like #1213, I think it should be possible to disable this feature for implementations that do not want to allow their users to write to local files.
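To make the proposal concrete, here is a rough, std-only Rust sketch of how a client might split a trailing `INTO OUTFILE '<path>' FORMAT <fmt>` clause off a statement before planning the rest, with a switch for the opt-out described above. The names `parse_into_outfile`, `ExportRequest`, `OutFormat`, and `allow_file_output` are hypothetical, the accepted format list is an assumption, and the string matching is deliberately naive; a real implementation would presumably extend the SQL parser instead.

```rust
// Hypothetical client-side helper; not an existing datafusion-cli API.

/// File formats the proposed `FORMAT` keyword would accept (assumed set).
#[derive(Debug, PartialEq)]
pub enum OutFormat {
    Csv,
    Parquet,
}

/// A statement split into the query to execute and the export target.
#[derive(Debug, PartialEq)]
pub struct ExportRequest {
    pub query: String,
    pub path: String,
    pub format: OutFormat,
}

const CLAUSE: &str = " INTO OUTFILE ";

/// Naively splits `<query> INTO OUTFILE '<path>' FORMAT <fmt>`.
/// Returns `Ok(None)` if the statement has no INTO OUTFILE clause, and an
/// error if the clause is present but file output is disabled.
pub fn parse_into_outfile(
    sql: &str,
    allow_file_output: bool,
) -> Result<Option<ExportRequest>, String> {
    // ASCII uppercasing keeps byte offsets identical, so `idx` is valid in `sql`.
    let upper = sql.to_ascii_uppercase();
    let idx = match upper.rfind(CLAUSE) {
        Some(i) => i,
        None => return Ok(None),
    };
    // The opt-out: refuse before anything touches the file system.
    if !allow_file_output {
        return Err("writing to local files is disabled".to_string());
    }
    let query = sql[..idx].trim().to_string();
    let rest = sql[idx + CLAUSE.len()..].trim_start();

    // The file name must be a single-quoted literal.
    let rest = rest.strip_prefix('\'').ok_or("expected a quoted file name")?;
    let end = rest.find('\'').ok_or("unterminated file name")?;
    let path = rest[..end].to_string();

    // What follows must be exactly `FORMAT <name>`, optionally ending in `;`.
    let tail = rest[end + 1..].trim().trim_end_matches(';');
    let words: Vec<String> = tail
        .split_whitespace()
        .map(|w| w.to_ascii_uppercase())
        .collect();
    let format = match words.as_slice() {
        [kw, f] if kw == "FORMAT" && f == "CSV" => OutFormat::Csv,
        [kw, f] if kw == "FORMAT" && f == "PARQUET" => OutFormat::Parquet,
        _ => return Err(format!("expected FORMAT CSV or FORMAT PARQUET, got '{tail}'")),
    };
    Ok(Some(ExportRequest { query, path, format }))
}
```

For example, `parse_into_outfile("SELECT * FROM t INTO OUTFILE 'out.csv' FORMAT CSV", true)` yields the query `SELECT * FROM t`, the path `out.csv`, and `OutFormat::Csv`; the same call with `allow_file_output` set to `false` returns an error instead.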

Describe alternatives you've considered
We could potentially leave such features out of DataFusion, as the functionality already exists in tools built on DataFusion, such as https://github.com/roapi/roapi/tree/main/columnq-cli#format-conversion

Note that Postgres uses the COPY command for the same purpose: https://www.postgresql.org/docs/8.1/sql-copy.html

For example, COPY foo TO 'file'

Additional context

@alamb alamb added the enhancement New feature or request label Nov 1, 2021
@alamb
Contributor Author

alamb commented Nov 1, 2021

cc @jimexist @alippai @Dandandan

@jimexist
Member

jimexist commented Nov 1, 2021

That's something I'd want, but before this I suppose we need better line editing and command execution structure for the CLI, something on par with clickhouse-cli.

@alamb
Contributor Author

alamb commented Nov 1, 2021

Autocomplete for datafusion-cli is something I would love to have, FWIW.

@jimexist
Member

jimexist commented Nov 1, 2021

related #1216

@jimexist
Member

jimexist commented Nov 1, 2021

In #1216 we could use a command to redirect the output to a file, and then I guess we don't need to modify the SQL?
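For comparison, psql's `\o [file]` meta-command sends subsequent query results to a file and, with no argument, back to stdout. The std-only Rust sketch below assumes datafusion-cli adopted a similar `\o` command; `Sink`, `parse_output_command`, and `open_sink` are hypothetical names, not existing datafusion-cli APIs, and the actual command proposed in #1216 may differ.

```rust
use std::fs::File;
use std::io::{self, Write};

// Hypothetical psql-style `\o` command; not an existing datafusion-cli API.

/// Where query results are printed.
#[derive(Debug, PartialEq)]
pub enum Sink {
    Stdout,
    File(String),
}

/// Parses `\o <path>` (redirect to a file) or a bare `\o` (back to stdout).
/// Returns `None` for any other line, e.g. ordinary SQL.
pub fn parse_output_command(line: &str) -> Option<Sink> {
    let rest = line.trim().strip_prefix("\\o")?;
    // The command name must end here, so `\other` is not treated as `\o`.
    if !rest.is_empty() && !rest.starts_with(char::is_whitespace) {
        return None;
    }
    let path = rest.trim();
    if path.is_empty() {
        Some(Sink::Stdout)
    } else {
        Some(Sink::File(path.to_string()))
    }
}

/// Opens the writer that subsequent query results are sent to.
/// Note that `File::create` truncates an existing file.
pub fn open_sink(sink: &Sink) -> io::Result<Box<dyn Write>> {
    match sink {
        Sink::Stdout => Ok(Box::new(io::stdout())),
        Sink::File(path) => Ok(Box::new(File::create(path)?)),
    }
}
```

Because the redirect lives entirely in the client's command loop, the SQL dialect itself stays unchanged, which is the point raised above.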

@jimexist
Member

jimexist commented Nov 6, 2021

upstream apache/datafusion-sqlparser-rs#364

@jimexist
Member

jimexist commented Nov 6, 2021

@alamb I think you need to clarify whether this is the COPY statement or the \copy command. The latter is executed on the client side.

@alamb
Contributor Author

alamb commented Nov 7, 2021

@jimexist I don't have a clear opinion on the matter. There are tradeoffs:

  1. Doing it client side (e.g. \copy) has the benefit of being easy to use
  2. Doing it "server" side (e.g. COPY or SELECT ... INTO) can likely be made much faster, but has security implications as well

The more I think about it, maybe we should leave such features out of datafusion (server side). The rationale is that the feature already exists in tools that use DataFusion such as https://github.com/roapi/roapi/tree/main/columnq-cli#format-conversion as @houqp pointed out.

So maybe adding it to datafusion-cli as a client side feature would be good enough?
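If the feature lands on the client side, the core step is serializing the rows a query returns. The std-only Rust sketch below writes a header and rows as CSV, quoting fields per RFC 4180; `write_csv` and `csv_field` are hypothetical names, and rows are modeled as `Vec<String>` for brevity. An actual datafusion-cli implementation would more likely pass the Arrow `RecordBatch`es it already holds to the arrow crate's CSV writer or the parquet crate's `ArrowWriter`.

```rust
use std::io::{self, Write};

// Hypothetical client-side CSV export; not an existing datafusion-cli API.

/// Quotes a field per RFC 4180 when it contains a comma, quote, or line break,
/// doubling any embedded quotes.
fn csv_field(value: &str) -> String {
    if value.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r')) {
        format!("\"{}\"", value.replace('"', "\"\""))
    } else {
        value.to_string()
    }
}

/// Writes a header row followed by the data rows as CSV to any writer
/// (a file, stdout, or an in-memory buffer).
pub fn write_csv<W: Write>(out: &mut W, header: &[&str], rows: &[Vec<String>]) -> io::Result<()> {
    let line: Vec<String> = header.iter().map(|h| csv_field(h)).collect();
    writeln!(out, "{}", line.join(","))?;
    for row in rows {
        let line: Vec<String> = row.iter().map(|v| csv_field(v)).collect();
        writeln!(out, "{}", line.join(","))?;
    }
    Ok(())
}
```

Since the writer is generic over `Write`, the same function can target a local file chosen by the user or stdout, which keeps file access entirely under the client's control.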

@jimexist
Member

jimexist commented Nov 7, 2021

I'm currently leaning towards the client side as well, for the same reasons you gave above.

Development

No branches or pull requests

2 participants