Fix Coalesce casting logic to follow what Postgres and DuckDB do. Introduce a signature that does non-comparison coercion #10268

Merged
merged 43 commits into from
May 26, 2024
c79156f
remove casting for coalesce
jayzhan211 Apr 27, 2024
a36e6b2
add more test
jayzhan211 Apr 27, 2024
bf16c92
add more test
jayzhan211 Apr 27, 2024
407e3c7
crate only visibility
jayzhan211 Apr 27, 2024
03b9162
polish comment
jayzhan211 Apr 27, 2024
4abf29d
improve test
jayzhan211 Apr 27, 2024
4965e8d
backup
jayzhan211 Apr 28, 2024
81f0235
introduce new signature for coalesce
jayzhan211 Apr 28, 2024
bae996c
cleanup
jayzhan211 Apr 28, 2024
ddf9b1c
cleanup
jayzhan211 Apr 28, 2024
c2799ea
ignore err msg
jayzhan211 Apr 28, 2024
2574896
fmt
jayzhan211 Apr 28, 2024
6a17e57
fix doc
jayzhan211 Apr 28, 2024
4cba8c5
cleanup
jayzhan211 Apr 28, 2024
f1cfb8d
add more test
jayzhan211 Apr 28, 2024
d2e83d3
switch to type_resolution coercion
jayzhan211 Apr 28, 2024
3a88ad7
Merge remote-tracking branch 'upstream/main' into fix-coelesce
jayzhan211 Apr 29, 2024
03880a3
fix i64 and u64 case
jayzhan211 Apr 29, 2024
481f548
add more tests
jayzhan211 Apr 29, 2024
dfc4176
cleanup
jayzhan211 Apr 29, 2024
46a9060
add null case
jayzhan211 Apr 29, 2024
d656645
fmt
jayzhan211 Apr 29, 2024
5683447
fix
jayzhan211 Apr 29, 2024
b949fae
rename to type_union_resolution
jayzhan211 Apr 29, 2024
5aaeb5b
add comment
jayzhan211 Apr 29, 2024
a968c0e
Merge remote-tracking branch 'upstream/main' into fix-coelesce
jayzhan211 May 1, 2024
cf679c5
Merge remote-tracking branch 'upstream/main' into fix-coelesce
jayzhan211 May 2, 2024
15471ab
cleanup
jayzhan211 May 2, 2024
e5cc46b
fix test
jayzhan211 May 2, 2024
a810e85
add comment
jayzhan211 May 2, 2024
cb16cda
rm test
jayzhan211 May 3, 2024
53bedda
Merge remote-tracking branch 'upstream/main' into fix-coelesce
jayzhan211 May 12, 2024
a37da2d
cleanup since rebase
jayzhan211 May 12, 2024
70239e0
add more test
jayzhan211 May 12, 2024
be116f8
add more test
jayzhan211 May 12, 2024
8f4e991
fix msg
jayzhan211 May 12, 2024
6a8fe6f
Merge remote-tracking branch 'upstream/main' into fix-coelesce
jayzhan211 May 14, 2024
20e618e
Merge remote-tracking branch 'upstream/main' into fix-coelesce
jayzhan211 May 14, 2024
4153593
fmt
jayzhan211 May 14, 2024
030a519
rm pure_string_coercion
jayzhan211 May 14, 2024
5b797d5
rm duplicate
jayzhan211 May 14, 2024
b954479
change type in select.slt
jayzhan211 May 25, 2024
829b5a2
fix slt
jayzhan211 May 25, 2024
24 changes: 22 additions & 2 deletions datafusion/expr/src/signature.rs
@@ -92,14 +92,19 @@ pub enum TypeSignature {
/// A function such as `concat` is `Variadic(vec![DataType::Utf8, DataType::LargeUtf8])`
Variadic(Vec<DataType>),
/// One or more arguments of an arbitrary but equal type.
/// DataFusion attempts to coerce all argument types to match the first argument's type
/// DataFusion attempts to coerce all argument types to a common type using comparison coercion.
Contributor

Can we please link to https://docs.rs/datafusion/latest/datafusion/logical_expr/type_coercion/binary/fn.comparison_coercion.html that explains (however limited) what comparison coercion is?

///
/// # Examples
/// Given types in signature should be coercible to the same final type.
/// A function such as `make_array` is `VariadicEqual`.
///
/// `make_array(i32, i64) -> make_array(i64, i64)`
VariadicEqual,

/// One or more arguments of an arbitrary but equal type or Null.
/// No coercion is attempted.
///
/// A function such as `coalesce` is `VariadicEqualOrNull`.
VariadicEqualOrNull,
Contributor Author

I think we need a similar signature, but with an exact number of arguments, for nullif and nvl.
We could name it UniformEqual, by analogy with VariadicEqual 🤔

Contributor

Maybe we could do something like this (basically to flavor the type signature) 🤔

pub enum TypeSignature {
...
  /// Rather than the usual coercion rules, special type union rules are applied
  Union(Box<TypeSignature>)
}

Contributor Author

Nice idea!
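
The `Union(Box<TypeSignature>)` suggestion above could look roughly like the following. This is only a sketch of the idea as discussed in the thread, not what the PR implements: the enum is a trimmed-down, hypothetical copy of `TypeSignature`, and `CoercionRules` and `coercion_rules` are names invented here for illustration.

```rust
// Hypothetical, trimmed-down copy of `TypeSignature`; the real enum has more variants.
#[derive(Debug, Clone, PartialEq)]
enum TypeSignature {
    VariadicEqual,
    VariadicAny,
    /// Rather than the usual coercion rules, special type union rules are applied.
    Union(Box<TypeSignature>),
}

// Assumed helper enum: which family of coercion rules a signature asks for.
#[derive(Debug, PartialEq)]
enum CoercionRules {
    Comparison,
    TypeUnion,
    NoCoercion,
}

// Assumed helper: wrapping a signature in `Union` only changes the rule set,
// while the inner signature would still describe the argument shape.
fn coercion_rules(sig: &TypeSignature) -> CoercionRules {
    match sig {
        TypeSignature::Union(_) => CoercionRules::TypeUnion,
        TypeSignature::VariadicEqual => CoercionRules::Comparison,
        TypeSignature::VariadicAny => CoercionRules::NoCoercion,
    }
}
```

With this shape, `coalesce` could hypothetically be declared as `Union(Box::new(VariadicEqual))` instead of needing a dedicated variant.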

/// One or more arguments with arbitrary types
VariadicAny,
/// Fixed number of arguments of an arbitrary but equal type out of a list of valid types.
@@ -193,6 +198,9 @@ impl TypeSignature {
TypeSignature::VariadicEqual => {
vec!["CoercibleT, .., CoercibleT".to_string()]
}
TypeSignature::VariadicEqualOrNull => {
vec!["T or Null, .., T or Null".to_string()]
}
TypeSignature::VariadicAny => vec!["Any, .., Any".to_string()],
TypeSignature::OneOf(sigs) => {
sigs.iter().flat_map(|s| s.to_string_repr()).collect()
@@ -226,6 +234,11 @@ impl TypeSignature {
_ => false,
}
}

/// Skip coercion to match the signature.
pub(crate) fn skip_coercion(&self) -> bool {
matches!(self, TypeSignature::VariadicEqualOrNull)
}
}

/// Defines the supported argument types ([`TypeSignature`]) and [`Volatility`] for a function.
@@ -255,13 +268,20 @@ impl Signature {
volatility,
}
}
/// An arbitrary number of arguments of the same type.
/// One or more arguments, coerced to the same type.
pub fn variadic_equal(volatility: Volatility) -> Self {
Self {
type_signature: TypeSignature::VariadicEqual,
volatility,
}
}
/// One or more arguments of the same type, or Null.
pub fn variadic_equal_or_null(volatility: Volatility) -> Self {
Self {
type_signature: TypeSignature::VariadicEqualOrNull,
volatility,
}
}
/// An arbitrary number of arguments of any type.
pub fn variadic_any(volatility: Volatility) -> Self {
Self {
49 changes: 30 additions & 19 deletions datafusion/expr/src/type_coercion/functions.rs
@@ -28,7 +28,7 @@ use arrow::{
use datafusion_common::utils::{coerced_fixed_size_list_to_list, list_ndims};
use datafusion_common::{internal_datafusion_err, internal_err, plan_err, Result};

use super::binary::{comparison_binary_numeric_coercion, comparison_coercion};
use super::binary::comparison_coercion;

/// Performs type coercion for function arguments.
///
@@ -62,11 +62,13 @@ pub fn data_types(
return Ok(current_types.to_vec());
}

// Try and coerce the argument types to match the signature, returning the
// coerced types from the first matching signature.
for valid_types in valid_types {
if let Some(types) = maybe_data_types(&valid_types, current_types) {
return Ok(types);
if !signature.type_signature.skip_coercion() {
// Try and coerce the argument types to match the signature, returning the
// coerced types from the first matching signature.
for valid_types in valid_types {
if let Some(types) = maybe_data_types(&valid_types, current_types) {
return Ok(types);
}
}
}

@@ -184,6 +186,27 @@ fn get_valid_types(
.iter()
.map(|valid_type| (0..*number).map(|_| valid_type.clone()).collect())
.collect(),
TypeSignature::VariadicEqualOrNull => {
current_types
.iter()
.find(|&t| t != &DataType::Null)
.map_or_else(
|| vec![vec![DataType::Null; current_types.len()]],
|t| {
let valid_types = current_types
.iter()
.map(|d| {
if d != &DataType::Null {
t.clone()
} else {
DataType::Null
}
})
.collect::<Vec<_>>();
vec![valid_types]
},
)
}
TypeSignature::VariadicEqual => {
let new_type = current_types.iter().skip(1).try_fold(
current_types.first().unwrap().clone(),
@@ -450,19 +473,7 @@ fn coerced_from<'a>(
{
Some(type_into.clone())
}
// More coerce rules.
// Note that not all rules in `comparison_coercion` can be reused here.
// For example, all numeric types can be coerced into Utf8 for comparison,
// but not for function arguments.
_ => comparison_binary_numeric_coercion(type_into, type_from).and_then(
Contributor Author

This logic was introduced in #9459, so I think it is safe to remove it in this PR.

Contributor

cc @viirya

|coerced_type| {
if *type_into == coerced_type {
Some(coerced_type)
} else {
None
}
},
),
_ => None,
}
}
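
To make the new `VariadicEqualOrNull` arm easier to follow, here is a minimal standalone sketch of the same logic. It assumes a hypothetical local `DataType` enum standing in for arrow's `DataType`, and the function names are invented for illustration. The second helper reflects one reading of `data_types`: since coercion is skipped for this signature, a call is accepted only when the arguments already match the expected types.

```rust
// Hypothetical stand-in for arrow's `DataType`; only a few variants are modeled.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Null,
    Int32,
    Int64,
    Utf8,
}

// Mirrors the `VariadicEqualOrNull` arm: every non-Null argument is expected to
// have the type of the first non-Null argument, and Null arguments stay Null.
fn valid_types_equal_or_null(current_types: &[DataType]) -> Vec<Vec<DataType>> {
    match current_types.iter().find(|t| **t != DataType::Null) {
        // All arguments are Null (or there are none).
        None => vec![vec![DataType::Null; current_types.len()]],
        Some(first) => {
            let expected: Vec<DataType> = current_types
                .iter()
                .map(|d| {
                    if *d != DataType::Null {
                        first.clone()
                    } else {
                        DataType::Null
                    }
                })
                .collect();
            vec![expected]
        }
    }
}

// Assumed helper: with coercion skipped, only an exact match is accepted.
fn matches_without_coercion(current_types: &[DataType]) -> bool {
    valid_types_equal_or_null(current_types)
        .iter()
        .any(|candidate| candidate.as_slice() == current_types)
}
```

Under this sketch, `[Int64, Null, Int64]` is accepted as is, while `[Int32, Int64]` is rejected because no cast is attempted.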

11 changes: 6 additions & 5 deletions datafusion/functions/src/core/coalesce.rs
@@ -23,7 +23,6 @@ use arrow::compute::{and, is_not_null, is_null};
use arrow::datatypes::DataType;

use datafusion_common::{exec_err, Result};
use datafusion_expr::type_coercion::functions::data_types;
use datafusion_expr::ColumnarValue;
use datafusion_expr::{ScalarUDFImpl, Signature, Volatility};

@@ -41,7 +40,7 @@ impl Default for CoalesceFunc {
impl CoalesceFunc {
pub fn new() -> Self {
Self {
signature: Signature::variadic_equal(Volatility::Immutable),
signature: Signature::variadic_equal_or_null(Volatility::Immutable),
}
}
}
@@ -60,9 +59,11 @@ impl ScalarUDFImpl for CoalesceFunc {
}

fn return_type(&self, arg_types: &[DataType]) -> Result<DataType> {
// COALESCE has multiple args and they might get coerced, get a preview of this
Contributor

👍 ❤️

let coerced_types = data_types(arg_types, self.signature());
coerced_types.map(|types| types[0].clone())
Ok(arg_types
.iter()
.find(|&t| t != &DataType::Null)
.unwrap_or(&DataType::Null)
.clone())
}
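
The new `return_type` body can be illustrated in isolation. The sketch below assumes a hypothetical local `DataType` enum in place of arrow's, and `coalesce_return_type` is a name made up for this example: the result is the first non-Null argument type, falling back to Null when every argument is Null.

```rust
// Hypothetical stand-in for arrow's `DataType`.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Null,
    Int32,
    Utf8,
}

// Same shape as the new `return_type`: first non-Null type, else Null.
fn coalesce_return_type(arg_types: &[DataType]) -> DataType {
    arg_types
        .iter()
        .find(|t| **t != DataType::Null)
        .cloned()
        .unwrap_or(DataType::Null)
}
```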

/// coalesce evaluates to the first value which is not NULL
2 changes: 1 addition & 1 deletion datafusion/sqllogictest/test_files/expr.slt
@@ -1897,7 +1897,7 @@ SELECT substring('alphabet' for 1);
----
a

# The 'from' and 'for' parameters don't support string types, because they should be treated as
# The 'from' and 'for' parameters don't support string types, because they should be treated as
# regular expressions, which we have not implemented yet.
query error DataFusion error: Error during planning: No function matches the given name and argument types
SELECT substring('alphabet' FROM '3')