Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support format in options of COPY command #9744

Merged
merged 3 commits into from
Mar 23, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 8 additions & 4 deletions datafusion/sql/src/statement.rs
Original file line number Diff line number Diff line change
Expand Up @@ -850,7 +850,7 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
return plan_err!("Unsupported Value in COPY statement {}", value);
}
};
if !(&key.contains('.')) {
if !(key.contains('.') || key == "format") {
// If config does not belong to any namespace, assume it is
// a format option and apply the format prefix for backwards
// compatibility.
Expand All @@ -866,12 +866,16 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
FileType::from_str(&file_type).map_err(|_| {
DataFusionError::Configuration(format!("Unknown FileType {}", file_type))
})?
} else if let Some(format) = options.remove("format") {
// try to infer file format from the "format" key in options
FileType::from_str(&format)
.map_err(|e| DataFusionError::Configuration(format!("{}", e)))?
} else {
let e = || {
DataFusionError::Configuration(
"Format not explicitly set and unable to get file extension! Use STORED AS to define file format."
.to_string(),
)
"Format not explicitly set and unable to get file extension! Use STORED AS to define file format."
.to_string(),
)
};
// try to infer file format from file extension
let extension: &str = &Path::new(&statement.target)
Expand Down
12 changes: 12 additions & 0 deletions datafusion/sql/tests/sql_integration.rs
Original file line number Diff line number Diff line change
Expand Up @@ -442,6 +442,18 @@ CopyTo: format=csv output_url=output.csv options: ()
quick_test(sql, plan);
}

#[test]
fn plan_copy_stored_as_priority() {
let sql = "COPY (select * from (values (1))) to 'output/' STORED AS CSV OPTIONS (format json)";
let plan = r#"
CopyTo: format=csv output_url=output/ options: (format json)
Projection: column1
Values: (Int64(1))
"#
.trim();
quick_test(sql, plan);
}

#[test]
fn plan_insert() {
let sql =
Expand Down
38 changes: 38 additions & 0 deletions datafusion/sqllogictest/test_files/copy.slt
Original file line number Diff line number Diff line change
Expand Up @@ -514,6 +514,44 @@ OPTIONS (
);


# Format Options Support with format in OPTIONS i.e. COPY { table_name | query } TO 'file_name' OPTIONS (format <format-name>, ...)

query I
COPY (select * from (values (1))) to 'test_files/scratch/copy/'
OPTIONS (format parquet);
----
1

query I
COPY (select * from (values (1))) to 'test_files/scratch/copy/'
OPTIONS (format parquet, compression 'zstd(10)');
----
1

query I
COPY (select * from (values (1))) to 'test_files/scratch/copy/'
OPTIONS (format json, compression gzip);
----
1

query I
COPY (select * from (values (1))) to 'test_files/scratch/copy/'
OPTIONS (
format csv,
has_header false,
compression xz,
datetime_format '%FT%H:%M:%S.%9f',
delimiter ';',
null_value 'NULLVAL'
);
----
1

query error DataFusion error: Invalid or Unsupported Configuration: This feature is not implemented: Unknown FileType: NOTVALIDFORMAT
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error feels a bit long to me compared to others. We can reduce it to:
DataFusion error: This feature is not implemented: Unknown FileType: NOTVALIDFORMAT

I didn't do this because all the other branches were wrapping downstream errors with DataFusionError::Configuration & then returning them.

WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree the error could be nicer, but in this case I think we should keep the same basic pattern as the rest of the code (and perhaps update the pattern in a follow on PR)

COPY (select * from (values (1))) to 'test_files/scratch/copy/'
OPTIONS (format notvalidformat, compression 'zstd(5)');


# Error cases:

# Copy from table with options
Expand Down
Loading