-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce user-defined signature #10439
Conversation
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
plan_datafusion_err!( | ||
"{}", | ||
"{} and {}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
keep error for debugging
Signed-off-by: jayzhan211 <[email protected]>
@@ -833,12 +865,8 @@ mod test { | |||
signature: Signature::uniform(1, vec![DataType::Float32], Volatility::Stable), | |||
}) | |||
.call(vec![lit("Apple")]); | |||
let plan_err = Projection::try_new(vec![udf], empty) | |||
Projection::try_new(vec![udf], empty) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think matching err message is unnecessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As long as the message is tested elsewhere I think this is fine
Signed-off-by: jayzhan211 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @jayzhan211 -- this looks great to me
Prior to merge, I think it is worth considering (but not required):
- Using a single new variant for
TypeSignature
(rather than 2) - Improving the doc strings (I left some suggestions)
- Moving the semantic changes to
coalsce
into their own PR
datafusion/expr/src/signature.rs
Outdated
/// | ||
/// `make_array(i32, i64) -> make_array(i64, i64)` | ||
VariadicEqual, | ||
/// One or more arguments of an arbitrary type and coerced with user-defined coercion rules. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since both TypeSignature::VariadicCoercion
and TypeCoercion::UniformCoercion
call into ScalarUDFImpl::coerce_types
I wonder if there is a reason to have both.
Maybe instead we could add a single variant for both cases. Perhaps something like:
pub enum TypeSignature {
...
/// The acceptable signature and coercions rules to coerce arguments to this
/// signature are special for this function. If this signature is specified,
/// Datafusion will call [`ScalarUDFImpl::coerce_types`] to prepare argument types.
FunctionDefined,
...
}
} | ||
TypeSignature::VariadicCoercion | TypeSignature::UniformCoercion(_) => { | ||
return internal_err!( | ||
"Coercion signature is handled in function-specific get_valid_types." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Coercion signature is handled in function-specific get_valid_types." | |
"Coercion signature should be handled by in function-specific coerce_types." |
@@ -111,6 +111,25 @@ impl ScalarUDFImpl for MakeArray { | |||
fn aliases(&self) -> &[String] { | |||
&self.aliases | |||
} | |||
|
|||
fn coerce_types(&self, arg_types: &[DataType]) -> Result<Vec<DataType>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
love it!
Ok(arg_types[1].clone()) | ||
} | ||
|
||
fn invoke(&self, args: &[ColumnarValue]) -> Result<ColumnarValue> { | ||
nvl2_func(args) | ||
} | ||
|
||
fn coerce_types(&self, arg_types: &[DataType]) -> Result<Vec<DataType>> { | ||
let new_type = arg_types.iter().skip(1).try_fold( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this pattern to coerce all arguments to the type of the first one is repeated several times -- maybe we could make some sort of utility function to do it
As a follow on PR perhaps
@@ -833,12 +865,8 @@ mod test { | |||
signature: Signature::uniform(1, vec![DataType::Float32], Volatility::Stable), | |||
}) | |||
.call(vec![lit("Apple")]); | |||
let plan_err = Projection::try_new(vec![udf], empty) | |||
Projection::try_new(vec![udf], empty) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As long as the message is tested elsewhere I think this is fine
@@ -914,7 +942,7 @@ mod test { | |||
.err() | |||
.unwrap(); | |||
assert_eq!( | |||
"type_coercion\ncaused by\nError during planning: Coercion from [Utf8] to the signature Uniform(1, [Float64]) failed.", | |||
"type_coercion\ncaused by\nError during planning: [data_types_with_aggregate_udf] Coercion from [Utf8] to the signature Uniform(1, [Float64]) failed.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could make this error say "User defined coercsion rule failed" so it was clearer what happened
@@ -23,7 +23,7 @@ select coalesce(1, 2, 3); | |||
1 | |||
|
|||
# test with first null | |||
query IT | |||
query ?T |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would still prefer to make the change to the coalsce coercion rules as its own PR, but I am also fine if we include it in this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to keep the old signature VariadicEqual for coalesce to split the change for coalesce :(. I prefer to make the change at once.
datafusion/expr/src/udf.rs
Outdated
@@ -420,6 +425,11 @@ pub trait ScalarUDFImpl: Debug + Send + Sync { | |||
fn short_circuits(&self) -> bool { | |||
false | |||
} | |||
|
|||
/// Coerce the types of the arguments to the types that the function expects |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice -- I think it would be nice to expand this documentation to help people understand the API some more
Perhaps something like
/// Coerce the types of the arguments to the types that the function expects | |
/// Coerce arguments of a function call to types that the function can evaluate. | |
/// | |
/// This function is only called if [`Self::signature`] returns [`TypeSignature::UserDefined`]. Most | |
/// UDFs should return one of the other variants of `TypeSignature` which handle common | |
/// cases | |
/// | |
/// See the [type coercion module](datafusion_expr::type_coercion) | |
/// documentation for more details on type coercion | |
/// | |
/// For example, if your function requires a floating point arguments, but the user calls | |
/// it like `my_func(1::int)` (aka with `1` as an integer), coerce_types could return `[DataType::Float64]` | |
/// to ensure the argument was cast to `1::double` | |
/// | |
/// # Parameters | |
/// * `arg_types`: The argument types of the arguments this function with | |
/// | |
/// # Return value | |
/// A Vec the same length as `arg_types`. DataFusion will `CAST` the function call | |
/// arguments to these specific types. |
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great to me -- thanks @jayzhan211
I had one question on a comment in the docstrings but we could fix that as a follow on as well
/// Performs type coercion for function arguments. | ||
/// | ||
/// Returns the data types to which each argument must be coerced to | ||
/// match `signature`. | ||
/// | ||
/// For more details on coercion in general, please see the | ||
/// [`type_coercion`](crate::type_coercion) module. | ||
/// | ||
/// This function will be replaced with [data_types_with_scalar_udf], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure we want to replace this function over time -- I think having the basic simple Signatures that handle most common coercions makes sense to have in DataFusion core (even if it could be done purely in a udf) as it will make creating UDFs easier for users
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with you
Signed-off-by: jayzhan211 <[email protected]>
VariadicCoercion
and UniformCoercion
🎉 |
* introduce new sig Signed-off-by: jayzhan211 <[email protected]> * add udfimpl Signed-off-by: jayzhan211 <[email protected]> * replace fun Signed-off-by: jayzhan211 <[email protected]> * replace array Signed-off-by: jayzhan211 <[email protected]> * coalesce Signed-off-by: jayzhan211 <[email protected]> * nvl2 Signed-off-by: jayzhan211 <[email protected]> * rm variadic equal Signed-off-by: jayzhan211 <[email protected]> * fix test Signed-off-by: jayzhan211 <[email protected]> * rm err msg to fix ci Signed-off-by: jayzhan211 <[email protected]> * user defined sig Signed-off-by: jayzhan211 <[email protected]> * add err msg Signed-off-by: jayzhan211 <[email protected]> * fmt Signed-off-by: jayzhan211 <[email protected]> * cleanup Signed-off-by: jayzhan211 <[email protected]> * fix ci Signed-off-by: jayzhan211 <[email protected]> * fix ci Signed-off-by: jayzhan211 <[email protected]> * upd comment Signed-off-by: jayzhan211 <[email protected]> --------- Signed-off-by: jayzhan211 <[email protected]>
Which issue does this PR close?
Closes #10423 .
Rationale for this change
What changes are included in this PR?
Add
VariadicCoercion
,UniformCoercion
Remove
VariadicEqual
(coercion rule does not require the result type to be the same for all)Add the datafusion error aside from the signature error message for better debugging usage that is why many test files are updated.
Are these changes tested?
UDAF code is also included but no test is covered since no UDAF requires coercion there yet (We will cover it after moving builtin to UDAF)
Are there any user-facing changes?