-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
support format in options of COPY command #9744
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some notes for the reviewer.
---- | ||
1 | ||
|
||
query error DataFusion error: Invalid or Unsupported Configuration: This feature is not implemented: Unknown FileType: NOTVALIDFORMAT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error feels a bit long to me compared to others. We can reduce it to:
DataFusion error: This feature is not implemented: Unknown FileType: NOTVALIDFORMAT
I didn't do this because all the other branches were wrapping downstream errors with DataFusionError::Configuration
& then returning them.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree the error could be nicer, but in this case I think we should keep the same basic pattern as the rest of the code (and perhaps update the pattern in a follow on PR)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me, thank you @tinfoil-knight!
We could perhaps add a test to validate the behavior when both stored as and format are specified in the same query. (i.e. stored as should take precedent).
@devinjdangelo I've added the test. Thank you for the review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @tinfoil-knight and @devinjdangelo -- the code looks good to me
I also tested it out locally and it worked great 🙏
DataFusion CLI v36.0.0
❯ COPY (select * from (values (1))) to '/tmp/copy.dat' OPTIONS (format parquet);
+-------+
| count |
+-------+
| 1 |
+-------+
1 row in set. Query took 0.030 seconds.
❯ \q
(venv) andrewlamb@Andrews-MacBook-Pro:~/Software/arrow-datafusion/datafusion-cli$ file /tmp/copy.dat
/tmp/copy.dat: Apache Parquet
While reviewing the PR I think it would be good to verify that the files created are actually the specified format as well as update the documentation. I'll make a follow on PR
Update: #9753
This reverts commit 40fb1b8.
Which issue does this PR close?
Closes #9713 .
Rationale for this change
Please see the description of the issue.
What changes are included in this PR?
If a
format
key is passed in the format / write options of theCOPY
command, we're using it to determine thefile_type
which is used to resolve format options for theCOPY
statement.Are these changes tested?
Yes. New tests have been added.
Are there any user-facing changes?
This PR is to improve backwards compatibility before the next release so no new changes from user's PoV.