Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support array_concat for Utf8View #14378

Merged
merged 4 commits into from
Feb 3, 2025
Merged

Support array_concat for Utf8View #14378

merged 4 commits into from
Feb 3, 2025

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Jan 30, 2025

Which issue does this PR close?

Rationale for this change

As part of completing #13504 it is important to supoprt Utf8View in array_concat,

It turns out the only user of get_wider_type as noted by @jayzhan211 and @Omega359 in #13370 (comment) and in fact get_wider_type is not necessary

What changes are included in this PR?

  1. Add tests showing behavior of array_concat with different string types (see first commit)
  2. Remove unnecessary get_wider_type

Are these changes tested?

Yes, new tests

Are there any user-facing changes?

Error messages change slightly

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Jan 30, 2025
/// Returns the wider type among arguments `lhs` and `rhs`.
/// The wider type is the type that can safely represent values from both types
/// without information loss. Returns an Error if types are incompatible.
pub fn get_wider_type(lhs: &DataType, rhs: &DataType) -> Result<DataType> {
Copy link
Contributor Author

@alamb alamb Jan 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was used in array_concat only to compute the return type, and array_concat errors at runtime anyways if you pass it arrays with different types.

Thus this code is unnecessary so I propose removing it

> select array_concat([arrow_cast('1', 'LargeUtf8')], ['2']);
Arrow error: Invalid argument error: It is not possible to concatenate arrays of different data types.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The best optimization .. no code

@alamb alamb marked this pull request as draft January 30, 2025 21:11
arg_type.clone()
} else if !expr_type.equals_datatype(arg_type) {
return plan_err!(
"It is not possible to concatenate arrays of different types. Expected: {}, got: {}", expr_type, arg_type
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the implementation of array_concat requires the field types of ListArray to be the same and errors at runtime if they aren't

Previously the return type was calculated only to get a runtime error. I simply moved the check to planning time

----
[1, 2, 3]

# Concatenating stringview
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these three queries work fine on main

----
[1, 2, 3]

# Concatenating Mixed types (doesn't work)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both of these queries fail on main though the errors are different for the string view one (see the first commit of this PR)

It seems to me like this display is 🤮 due to the Display impl of DataType being crap.

I'll file an upstream ticket to make this easier to understand

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@jayzhan211 jayzhan211 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏻

@alamb
Copy link
Contributor Author

alamb commented Feb 3, 2025

Thank you for the review @jayzhan211 and @Omega359

@alamb alamb merged commit edbdefe into apache:main Feb 3, 2025
25 checks passed
@alamb alamb deleted the alamb/fix_coerce branch February 4, 2025 16:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

get_wider_type in binary.rs does not support utf8view
3 participants