-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Upgrade Guide for DataFusion 46.0.0 #14891
base: main
Are you sure you want to change the base?
Conversation
|
||
### Changes to `invoke()` and `invoke_batch()` deprecated | ||
|
||
We are migrating away from `ScalarUDFImpl::invoke()` and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this section based on the report from @shehabgamin in #14123 (comment)
@shehabgamin any chance you have an example of the kind of error you saw?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alamb Yes, users may see an error similar to something like this (depending on which UDF
they are calling .invoke()
or .invoke_batch()
with):
This feature is not implemented: Function concat does not implement invoke but called
Here is an example of resolving the error:
Old Code:
impl ScalarUDFImpl for SparkConcat {
...
fn invoke_batch(&self, args: &[ColumnarValue], number_rows: usize) -> Result<ColumnarValue> {
if args
.iter()
.any(|arg| matches!(arg.data_type(), DataType::List(_)))
{
ArrayConcat::new().invoke_batch(args, number_rows)
} else {
ConcatFunc::new().invoke_batch(args, number_rows)
}
}
}
New Code:
impl ScalarUDFImpl for SparkConcat {
...
fn invoke_with_args(&self, args: ScalarFunctionArgs) -> Result<ColumnarValue> {
if args
.args
.iter()
.any(|arg| matches!(arg.data_type(), DataType::List(_)))
{
ArrayConcat::new().invoke_with_args(args)
} else {
ConcatFunc::new().invoke_with_args(args)
}
}
}
Changes:
1.
From:
fn invoke_batch(&self, args: &[ColumnarValue], number_rows: usize)
OR
fn invoke(&self, args: &[ColumnarValue])
To:
fn invoke_with_args(&self, args: ScalarFunctionArgs)
2.
From:
.invoke_batch(args, number_rows)
OR
.invoke(args)
To:
.invoke_with_args(args)
This will be definitely helpful for the user's upgrade, thank you @alamb |
Thanks @alamb for proceeding with this. Imho we still need to modify contribution guide to let contributors to maintain the upgration guide. Leaving this to one person is might be a huge burden |
Sounds good . Maybe once we have an example of what an upgrade guide looks like it will be easier to add / point people to an example In general trying to change people's behavior with documentation is a bit diecy in my opinion |
Yeah, example is awesome. The upgrade guide would be totally voluntarily but if anyone feels their changes are breaking and wanna give a downstream user a hint how to migrate they can refer to the example as you propose. |
Which issue does this PR close?
46.0.0
#14123DataSourceExec
for provided datasources, removeParquetExec
,CsvExec
, etc #14224Rationale for this change
As we have discussed in tickets, it would be good to have help users upgrade to new DataFusion versions, especially when major / disruptive API changes like #14224 are included
What changes are included in this PR?
Are these changes tested?
I build the docs manually
Are there any user-facing changes?