Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can not handle ' characters in PARTITIONED BY clause #9714

Open
alamb opened this issue Mar 20, 2024 · 3 comments
Open

Can not handle ' characters in PARTITIONED BY clause #9714

alamb opened this issue Mar 20, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@alamb
Copy link
Contributor

alamb commented Mar 20, 2024

Describe the bug

#9604 from @metesynnada rationalied COPY option handling 🙏

However one of the tests needed to be commented out: https://github.com/apache/arrow-datafusion/blob/1d0171ab9d33fc7896861dee85804d7daf0a6390/datafusion/sqllogictest/test_files/copy.slt#L114-L142

To Reproduce

@metesynnada explained on synnada-ai#10 (comment)

Before this pull request, there was a problem with how escape characters in the PARTITIONED BY clause were processed in both the CREATE EXTERNAL TABLE and COPY TO statements. The issue stemmed from converting the Value to a string too early in the parsing process, leading to errors.

Although it appears that the test passes, in reality, during the execution of CREATE EXTERNAL TABLE, the specified field names do not match as expected, indicating a silent error. This issue should be addressed separately, as it falls outside the scope of the current PR.

I propose that we standardize the approach to handling column names in both the schema definitions and the PARTITIONED BY clauses. Once we establish a consistent method, the relevance of this test can be reassessed.

I will proceed to open an issue to address it.

Expected behavior

The test should pass

Additional context

No response

@yyy1000
Copy link
Contributor

yyy1000 commented Mar 26, 2024

I'd like to try this, from the description,

Until the partition by parsing uses ColumnDef

So maybe trying to make partition by use ColumnDef, currently, my plan is:

  1. Make table_partition_cols field in https://github.com/apache/arrow-datafusion/blob/39f4aaf5cd1abfc9204c3eb96effdb4ebcf5b882/datafusion/sql/src/parser.rs#L206 CreateExternalTable statement to Vec<ColumnRef>
  2. make parse_partitions return ColumnRef like parse_columns. https://github.com/apache/arrow-datafusion/blob/39f4aaf5cd1abfc9204c3eb96effdb4ebcf5b882/datafusion/sql/src/parser.rs#L542
  3. In external_table_to_plan, do the same thing to table_partition_cols like column_defaults https://github.com/apache/arrow-datafusion/blob/39f4aaf5cd1abfc9204c3eb96effdb4ebcf5b882/datafusion/sql/src/statement.rs#L984-L987

So this will not convert the Ident value to string too early, but I also doubt whether it would fix this. I encountered similar issue related to ' characters in my past experience and it's a bit tricky.

@alamb
Copy link
Contributor Author

alamb commented Mar 27, 2024

@metesynnada I wonder if you have any thoughts on the suggestion above from @yyy1000 ?

@metesynnada
Copy link
Contributor

I believe this is a good start.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants