Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SQL planner support for ROLLUP and CUBE grouping set expressions #2446

Merged
merged 10 commits into from
May 9, 2022

Conversation

andygrove
Copy link
Member

@andygrove andygrove commented May 5, 2022

Which issue does this PR close?

Closes #2378

Rationale for this change

I would like to be able to create a valid logical plan from SQL queries containing ROLLUP grouping sets.

What changes are included in this PR?

Are there any user-facing changes?

Yes, we have new Expr variants.

@github-actions github-actions bot added the datafusion Changes in the datafusion crate label May 5, 2022
@andygrove andygrove changed the title WIP: Add SQL planner support for ROLLUP grouping set expressions WIP: Add SQL planner support for ROLLUP and CUBE grouping set expressions May 6, 2022
@andygrove andygrove added the api change Changes the API exposed to users of the crate label May 6, 2022
@andygrove andygrove self-assigned this May 6, 2022
@andygrove andygrove changed the title WIP: Add SQL planner support for ROLLUP and CUBE grouping set expressions Add SQL planner support for ROLLUP and CUBE grouping set expressions May 6, 2022
@andygrove andygrove marked this pull request as ready for review May 6, 2022 17:11
@@ -138,9 +139,33 @@ pub fn create_udaf(
/// Create field meta-data from an expression, for use in a result set schema
pub fn exprlist_to_fields<'a>(
expr: impl IntoIterator<Item = &'a Expr>,
input_schema: &DFSchema,
plan: &LogicalPlan,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note that we pass in a plan now instead of a schema

@andygrove
Copy link
Member Author

@alamb Yet another SQL planner PR for review when you have time.

Also @jimexist I think you may be interested in this one.

@alamb
Copy link
Contributor

alamb commented May 8, 2022

@alamb Yet another SQL planner PR for review when you have time.

Thanks @andygrove -- I have put this on my queue for review tomorrow

Copy link
Member

@yjshen yjshen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I really like how you split features into steps.

@@ -124,6 +124,10 @@ impl ExprSchemable for Expr {
"QualifiedWildcard expressions are not valid in a logical query plan"
.to_owned(),
)),
Expr::GroupingSet(_) => {
// grouping sets do not really have a type and do not appear in projections
Ok(DataType::Null)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@alamb
Copy link
Contributor

alamb commented May 9, 2022

LGTM, I really like how you split features into steps.

I agree -- it makes the review process much easier ❤️

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -249,6 +249,24 @@ pub enum Expr {
Wildcard,
/// Represents a reference to all fields in a specific schema.
QualifiedWildcard { qualifier: String },
/// List of grouping set expressions. Only valid in the context of an aggregate
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While reviewing this PR I was wondering how you aim to plan these queries

it seems like a simple way would be to expand them to a set of UNION ALL queries, though then we will likely do the same input query many times which may be unideal.

Anyhow, I am looking forward to the next installment

@@ -198,6 +202,11 @@ impl ExprSchemable for Expr {
let data_type = expr.get_type(input_schema)?;
get_indexed_field(&data_type, key).map(|x| x.is_nullable())
}
Expr::GroupingSet(_) => {
// grouping sets do not really have the concept of nullable and do not appear
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it is worth considering representing the grouping sets in a more specific way (like maybe a specific list on LogicalPlan::Aggregrate) rather than an Expr that can appear anywhere

enum GroupingExpr {
  Expr(Expr),
  GroupingSet(GroupingSet),
}

/// Aggregates its input based on a set of grouping and aggregate
/// expressions (e.g. SUM).
#[derive(Clone)]
pub struct Aggregate {
    /// The incoming logical plan
    pub input: Arc<LogicalPlan>,
    /// Grouping expressions
    pub group_expr: Vec<AggExprs>,
    /// Aggregate expressions
    pub aggr_expr: Vec<Expr>,
    /// The schema description of the aggregate output
    pub schema: DFSchemaRef,
}

Or something?

@alamb alamb merged commit 1fe038f into apache:master May 9, 2022
@alamb
Copy link
Contributor

alamb commented May 9, 2022

I think this is good to merge; There are some comments, but we can address them as part of follow on PRs if appropriate.

@alamb
Copy link
Contributor

alamb commented May 9, 2022

Nice work @andygrove

@andygrove andygrove deleted the rollup branch January 27, 2023 18:57
waralexrom pushed a commit to cube-js/arrow-datafusion that referenced this pull request May 20, 2024
…ons (apache#2446)

* Add SQL planner support for ROLLUP and CUBE grouping sets

* prep for review

* fix more todo comments

* code cleanup

* clippy

* fmt and clippy

* revert change

* clippy
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api change Changes the API exposed to users of the crate datafusion Changes in the datafusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for ROLLUP and CUBE grouping sets in SQL query planner and logical plan
3 participants