COPY fails on cli with Invalid statement #9927

hveiga · 2024-04-03T18:21:29Z

Describe the bug

I downloaded https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-37.0.0-rc2/ to test the new partition_by feature. I built datafusion-cli by running cargo build --release under datafusion-cli.

The use case is simple: load a parquet file and write it back out as multiple hive-partitioned parquet files.
When I run the COPY command documented at https://arrow.apache.org/datafusion/user-guide/sql/write_options.html, I get an error.

To Reproduce

Build datafusion-cli from https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-37.0.0-rc2/.
Run ./datafusion-cli.
Create a table from a parquet file:

CREATE EXTERNAL TABLE t1
STORED AS PARQUET
LOCATION '/tmp/file.parquet';

Execute the COPY command with partition_by:

COPY t1 TO '/tmp/hive_output/' (format parquet, partition_by 'col1');

Get an error: 🤔 Invalid statement: sql parser error: Unexpected token (

Expected behavior

Have the COPY statement generate the expected hive-partitioned parquet files.
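
For context, a hive-partitioned layout places each distinct value of the partition column in its own `col1=<value>/` directory under the output path. Below is a minimal Python sketch of that path convention only. The rows and the helper names `partition_dir` and `group_by_partition` are hypothetical, and the sketch does not reproduce the file names or parquet contents DataFusion actually writes.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def partition_dir(base, column, value):
    """Return the hive-style directory for one partition value (hypothetical helper)."""
    return PurePosixPath(base) / f"{column}={value}"


def group_by_partition(rows, column, base):
    """Group rows by the partition column, keyed by their hive-style directory."""
    groups = defaultdict(list)
    for row in rows:
        # Hive-style layouts encode the partition value in the directory name,
        # so the partition column is typically left out of the data files.
        rest = {k: v for k, v in row.items() if k != column}
        groups[str(partition_dir(base, column, row[column]))].append(rest)
    return dict(groups)


# Illustrative rows only; not taken from the issue.
rows = [
    {"col1": "a", "col2": 1},
    {"col1": "b", "col2": 2},
    {"col1": "a", "col2": 3},
]
layout = group_by_partition(rows, "col1", "/tmp/hive_output/")
for directory in sorted(layout):
    print(directory, layout[directory])
```

With these sample rows, the two directories would be `/tmp/hive_output/col1=a` and `/tmp/hive_output/col1=b`.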

Additional context

I don't know whether the problem is in my SQL statements or the COPY documentation is incorrect. Either way, I thought it was worth reporting before 37.0.0 is released. #9682

Thank you!


tinfoil-knight · 2024-04-03T19:35:22Z

The documentation is outdated.

You need to specify options in this format: OPTIONS (...)
Also, partition_by needs to be specified separately as PARTITIONED BY (<column>), not inside the options.

This is how your query would look now:

COPY t1 TO '/tmp/hive_output/' PARTITIONED BY (col1) OPTIONS (format parquet);
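
To make the clause order explicit, here is a small Python sketch that assembles a statement in the shape shown above: target first, then PARTITIONED BY, then OPTIONS. The `build_copy_statement` helper is hypothetical and not part of DataFusion. It does no quoting or escaping, and it reflects only the syntax given in this comment, which may differ in other versions.

```python
def build_copy_statement(table, target, partition_columns=(), options=None):
    """Assemble a COPY statement string in the clause order from this thread.

    Hypothetical helper for illustration: inputs are inserted verbatim,
    with no identifier quoting or string escaping.
    """
    parts = [f"COPY {table} TO '{target}'"]
    if partition_columns:
        # Partition columns go in their own clause, not inside OPTIONS.
        parts.append(f"PARTITIONED BY ({', '.join(partition_columns)})")
    if options:
        # Options are rendered as space-separated key/value pairs.
        rendered = ", ".join(f"{key} {value}" for key, value in options.items())
        parts.append(f"OPTIONS ({rendered})")
    return " ".join(parts) + ";"


stmt = build_copy_statement(
    "t1", "/tmp/hive_output/", ["col1"], {"format": "parquet"}
)
print(stmt)
```

For the inputs above this produces the same statement as the query in this comment.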

tinfoil-knight · 2024-04-03T19:45:53Z

Note to Maintainers:
#9905 will remove compatibility with the old syntax entirely, so IMO the updated docs should show only the new syntax.

hveiga · 2024-04-03T19:47:53Z

Just tested and COPY t1 TO '/tmp/hive_output/' PARTITIONED BY (col1) OPTIONS (format parquet); works as expected. Thank you!

tinfoil-knight · 2024-04-03T21:11:13Z

@hveiga I think we should keep this issue open until the documentation is updated with the latest syntax.

hveiga · 2024-04-03T21:17:34Z

> @hveiga I think we should keep this issue open until the documentation is updated with the latest syntax.

Makes sense. Re-opening.

alamb · 2024-04-03T21:30:18Z

Looks like I missed a spot in #9754, sorry about that.

The new syntax is documented here: https://arrow.apache.org/datafusion/user-guide/sql/dml.html#copy

alamb · 2024-04-03T21:30:40Z

I'll make a PR

alamb · 2024-04-03T21:43:00Z

#9931

hveiga added the bug Something isn't working label Apr 3, 2024

hveiga closed this as completed Apr 3, 2024

hveiga reopened this Apr 3, 2024

alamb mentioned this issue Apr 3, 2024

Update documentation for COPY command #9931

Merged

comphead closed this as completed in #9931 Apr 7, 2024
