Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MSQ: Write summary row for GROUP BY (). #16326

Merged
merged 11 commits into from
Feb 6, 2025
Merged

Conversation

gianm
Copy link
Contributor

@gianm gianm commented Apr 24, 2024

This allows us to remove "msqIncompatible()" from a couple of tests that involve GROUP BY () summary rows.

To handle the case where the SQL layer drops a dimension like GROUP BY 'constant', this patch also adds a "hasDroppedDimensions" context flag to the groupBy query.

This allows us to remove "msqIncompatible()" from a couple of tests that involve
GROUP BY () summary rows.

To handle the case where the SQL layer drops a dimension like GROUP BY 'constant',
this patch also adds a "hasDroppedDimensions" context flag to the groupBy query.
@github-actions github-actions bot added Area - Batch Ingestion Area - Querying Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 labels Apr 24, 2024
Copy link
Contributor

@LakshSingla LakshSingla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Grouping#hasGroupingDimensionsDropped accomplishes something similar like CTX_HAS_DROPPED_DIMENSIONS, however it only lives in the SQL layer. I think we can drop hasGroupingDimensionsDropped and use the newly added context parameter to avoid redundancy.

@LakshSingla
Copy link
Contributor

I think this PR also enables the following test in MSQ
testCountStarWithLongColumnFiltersOnFloatLiterals

These are already supported by MSQ, however marked as msqIncompatible. Since they seemed relevant to your changes, we could enable them in the same PR.
testGroupByNothingWithImpossibleTimeFilter
testGroupByNothingWithLiterallyFalseFilter
testGroupingWithNotNullPlusNonNullInFilter
testGroupingWithNullPlusNonNullInFilter

@@ -537,4 +518,31 @@ private Function<Result<TimeseriesResultValue>, Result<TimeseriesResultValue>> m
);
};
}

public static Object[] getNullAggregations(List<AggregatorFactory> aggregatorSpecs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Javadoc

@gianm
Copy link
Contributor Author

gianm commented Jun 26, 2024

Grouping#hasGroupingDimensionsDropped accomplishes something similar like CTX_HAS_DROPPED_DIMENSIONS, however it only lives in the SQL layer. I think we can drop hasGroupingDimensionsDropped and use the newly added context parameter to avoid redundancy.

In this patch I am using Grouping#hasGroupingDimensionsDropped to initialize CTX_HAS_DROPPED_DIMENSIONS, so I think I need to keep it. At the point the Grouping object is created, we don't have a query context yet for that specific query, so we couldn't put things in the query context. (All we have is the global query context, but we don't want CTX_HAS_DROPPED_DIMENSIONS set in the global context. It should only be set in the context for the specific query object that actually has dropped dimensions.)

@gianm
Copy link
Contributor Author

gianm commented Jun 26, 2024

I think this PR also enables the following test in MSQ testCountStarWithLongColumnFiltersOnFloatLiterals

These are already supported by MSQ, however marked as msqIncompatible. Since they seemed relevant to your changes, we could enable them in the same PR. testGroupByNothingWithImpossibleTimeFilter testGroupByNothingWithLiterallyFalseFilter testGroupingWithNotNullPlusNonNullInFilter testGroupingWithNullPlusNonNullInFilter

Making this change revealed a bug in this patch: COUNT in SQL initializes to null, not 0 as expected, because the null-agg-frame is written with the combining aggregators. The combining aggregator of count is longSum, which initializes to null. To fix this we'll either need to:

  • add a parameter to longSum like initToZero that can be set true when it's created as the combining aggregator of a count
  • or, move the null-frame-witing from post-shuffle to pre-shuffle. (pre-shuffle we use the regular aggs, not combining aggs.)

I'm leaning towards the first one since it's technically the most correct. It sounds better to me if the combining agg of count initializes to 0, not null as it does today.

@LakshSingla
Copy link
Contributor

The first one seems better than the second one to me as well.

Copy link

This pull request has been marked as stale due to 60 days of inactivity.
It will be closed in 4 weeks if no further activity occurs. If you think
that's incorrect or this pull request should instead be reviewed, please simply
write any comment. Even if closed, you can still revive the PR at any time or
discuss it on the [email protected] list.
Thank you for your contributions.

@github-actions github-actions bot added the stale label Aug 27, 2024
Copy link

This pull request/issue has been closed due to lack of activity. If you think that
is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions bot closed this Sep 24, 2024
@LakshSingla
Copy link
Contributor

Not stale

@LakshSingla LakshSingla reopened this Sep 24, 2024
@LakshSingla LakshSingla removed the stale label Sep 24, 2024
@gianm
Copy link
Contributor Author

gianm commented Oct 10, 2024

COUNT in SQL initializes to null, not 0 as expected, because the null-agg-frame is written with the combining aggregators. The combining aggregator of count is longSum, which initializes to null.

Oops, in looking into this more, I misunderstood the problem. It was actually not this. The null-agg-frame is written with the regular aggregators, not the combining ones. The real problem is an off-by-one in updating the resultRow, which is fixed in the latest push.

In the case where the query has granularity: ALL but also has intervals: [], the result row includes __time (as a result of changes in #10968). I pushed a change to account for this: check for __time in the result row signature, and if it's there:

  • Set it to 0L (it's going to be ignored anyway)
  • Copy the aggregation results into the result row starting from getResultRowAggregatorStart(), not position 0. This was the off-by-one that caused the test failure.

Copy link
Contributor

@Akshat-Jain Akshat-Jain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have added some comments, appreciate your inputs on them. Thanks!

Copy link

github-actions bot commented Jan 8, 2025

This pull request has been marked as stale due to 60 days of inactivity.
It will be closed in 4 weeks if no further activity occurs. If you think
that's incorrect or this pull request should instead be reviewed, please simply
write any comment. Even if closed, you can still revive the PR at any time or
discuss it on the [email protected] list.
Thank you for your contributions.

@github-actions github-actions bot added the stale label Jan 8, 2025
@cryptoe cryptoe added this to the 33.0.0 milestone Jan 20, 2025
@github-actions github-actions bot removed the stale label Jan 21, 2025
@cryptoe cryptoe merged commit c9b3585 into apache:master Feb 6, 2025
79 checks passed
@gianm gianm deleted the msq-empty-agg-row branch February 6, 2025 20:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area - Batch Ingestion Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 Area - Querying
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants