Add Upgrade Guide for DataFusion 46.0.0 #14891

Draft
wants to merge 6 commits into
base: main

alamb
Contributor

@alamb alamb commented Feb 26, 2025

Which issue does this PR close?

Rationale for this change

As we have discussed in tickets, it would be good to help users upgrade to new DataFusion versions, especially when major / disruptive API changes like #14224 are included.

What changes are included in this PR?

  1. Add an upgrade guide section
  2. Add details on the upgrade to 46.0.0

Are these changes tested?

I built the docs manually

Are there any user-facing changes?

@github-actions bot added the documentation (Improvements or additions to documentation) and core (Core DataFusion crate) labels Feb 26, 2025

### Changes to `invoke()` and `invoke_batch()` deprecated

We are migrating away from `ScalarUDFImpl::invoke()` and
Contributor Author

I added this section based on the report from @shehabgamin in #14123 (comment)

@shehabgamin any chance you have an example of the kind of error you saw?

Contributor

@alamb Yes, users may see an error like this (depending on which UDF they are calling .invoke() or .invoke_batch() on):

This feature is not implemented: Function concat does not implement invoke but called

Here is an example of resolving the error:

Old Code:

impl ScalarUDFImpl for SparkConcat {
...
    fn invoke_batch(&self, args: &[ColumnarValue], number_rows: usize) -> Result<ColumnarValue> {
        if args
            .iter()
            .any(|arg| matches!(arg.data_type(), DataType::List(_)))
        {
            ArrayConcat::new().invoke_batch(args, number_rows)
        } else {
            ConcatFunc::new().invoke_batch(args, number_rows)
        }
    }
}

New Code:

impl ScalarUDFImpl for SparkConcat {
...
    fn invoke_with_args(&self, args: ScalarFunctionArgs) -> Result<ColumnarValue> {
        if args
            .args
            .iter()
            .any(|arg| matches!(arg.data_type(), DataType::List(_)))
        {
            ArrayConcat::new().invoke_with_args(args)
        } else {
            ConcatFunc::new().invoke_with_args(args)
        }
    }
}

Changes:

1. In the trait implementation, replace the method signature.
   From:
   fn invoke_batch(&self, args: &[ColumnarValue], number_rows: usize) OR
   fn invoke(&self, args: &[ColumnarValue])
   To:
   fn invoke_with_args(&self, args: ScalarFunctionArgs)

2. At each call site, replace the method call.
   From:
   .invoke_batch(args, number_rows) OR
   .invoke(args)
   To:
   .invoke_with_args(args)
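
To make the shape of this migration concrete without pulling in DataFusion, here is a minimal, self-contained sketch. Everything in it is a simplified stand-in invented for illustration: `ColumnarValue`, `ScalarFunctionArgs`, the `ScalarUdf` trait, and the `Concat`, `ScalarConcat`, `ArrayConcat`, and `call_concat` names are hypothetical and are not the DataFusion definitions. In particular, the assumption that the argument struct carries `number_rows` as a field should be checked against the DataFusion API docs for your version. The sketch only shows the pattern: a wrapper inspects `args.args`, then forwards the whole struct, and a call site bundles what used to be separate parameters.

```rust
// Simplified stand-ins for illustration only; these are NOT the DataFusion types.
#[derive(Debug, Clone, PartialEq)]
enum ColumnarValue {
    Scalar(String),
    Array(Vec<String>),
}

// Hypothetical mirror of `ScalarFunctionArgs`: one struct bundling what
// `invoke_batch` used to receive as separate parameters (assumed fields).
#[allow(dead_code)]
struct ScalarFunctionArgs {
    args: Vec<ColumnarValue>,
    number_rows: usize,
}

// Hypothetical trait with only the new-style entry point.
trait ScalarUdf {
    fn invoke_with_args(&self, args: ScalarFunctionArgs) -> Result<ColumnarValue, String>;
}

struct ScalarConcat;
struct ArrayConcat;

impl ScalarUdf for ScalarConcat {
    // Concatenates scalar arguments into one scalar; rejects arrays.
    fn invoke_with_args(&self, args: ScalarFunctionArgs) -> Result<ColumnarValue, String> {
        let mut out = String::new();
        for arg in &args.args {
            match arg {
                ColumnarValue::Scalar(s) => out.push_str(s),
                ColumnarValue::Array(_) => return Err("unexpected array argument".to_string()),
            }
        }
        Ok(ColumnarValue::Scalar(out))
    }
}

impl ScalarUdf for ArrayConcat {
    // Flattens all arguments into one array, treating scalars as single items.
    fn invoke_with_args(&self, args: ScalarFunctionArgs) -> Result<ColumnarValue, String> {
        let mut out = Vec::new();
        for arg in args.args {
            match arg {
                ColumnarValue::Array(items) => out.extend(items),
                ColumnarValue::Scalar(s) => out.push(s),
            }
        }
        Ok(ColumnarValue::Array(out))
    }
}

// Wrapper that inspects the arguments, then forwards the whole struct unchanged.
struct Concat;

impl ScalarUdf for Concat {
    fn invoke_with_args(&self, args: ScalarFunctionArgs) -> Result<ColumnarValue, String> {
        if args.args.iter().any(|a| matches!(a, ColumnarValue::Array(_))) {
            ArrayConcat.invoke_with_args(args)
        } else {
            ScalarConcat.invoke_with_args(args)
        }
    }
}

// Call-site migration: where old code passed `(args, number_rows)` directly,
// new code builds the argument struct first.
fn call_concat(args: Vec<ColumnarValue>, number_rows: usize) -> Result<ColumnarValue, String> {
    Concat.invoke_with_args(ScalarFunctionArgs { args, number_rows })
}
```

Because the wrapper only borrows `args.args` for the check, it can still move the whole struct into the delegate afterwards, which is why the forwarded call needs no clone.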

@alamb self-assigned this Feb 26, 2025
@xudong963
Member

This will definitely be helpful for users who are upgrading, thank you @alamb

@comphead
Contributor

Thanks @alamb for proceeding with this. IMHO we still need to modify the contribution guide so that contributors maintain the upgrade guide. Leaving this to one person might be a huge burden.

@alamb
Contributor Author

alamb commented Feb 26, 2025

> Thanks @alamb for proceeding with this. IMHO we still need to modify the contribution guide so that contributors maintain the upgrade guide. Leaving this to one person might be a huge burden.

Sounds good. Maybe once we have an example of what an upgrade guide looks like, it will be easier to add to it and to point people to an example.

In general, trying to change people's behavior with documentation is a bit dicey in my opinion.

@comphead
Contributor

> > Thanks @alamb for proceeding with this. IMHO we still need to modify the contribution guide so that contributors maintain the upgrade guide. Leaving this to one person might be a huge burden.
>
> Sounds good. Maybe once we have an example of what an upgrade guide looks like, it will be easier to add to it and to point people to an example.
>
> In general, trying to change people's behavior with documentation is a bit dicey in my opinion.

Yeah, an example is awesome. The upgrade guide would be totally voluntary, but if anyone feels their changes are breaking and wants to give downstream users a hint on how to migrate, they can refer to the example as you propose.
