MSQ: Write summary row for GROUP BY (). #16326

gianm · 2024-04-24T04:29:04Z

This allows us to remove "msqIncompatible()" from a couple of tests that involve GROUP BY () summary rows.

To handle the case where the SQL layer drops a dimension like GROUP BY 'constant', this patch also adds a "hasDroppedDimensions" context flag to the groupBy query.
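For context on what a "summary row" is: a GROUP BY () query over zero input rows is still expected to return exactly one row, built from each aggregator's value over empty input. The sketch below is a self-contained illustration of that behavior only. The Agg interface and the class names are hypothetical stand-ins, not Druid's aggregator API, and it assumes SQL-style semantics where COUNT over nothing is 0 and SUM over nothing is null.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

final class EmptySummaryRowSketch {
  /** Hypothetical stand-in for an aggregator; this is not Druid's Aggregator interface. */
  interface Agg {
    void add(long value);
    Object get();
  }

  /** COUNT over no rows is 0. */
  static final class Count implements Agg {
    private long n = 0;
    public void add(long value) { n++; }
    public Object get() { return n; }
  }

  /** SQL SUM over no rows is NULL, so the total starts out null. */
  static final class Sum implements Agg {
    private Long total = null;
    public void add(long value) { total = (total == null) ? value : total + value; }
    public Object get() { return total; }
  }

  /** Evaluates GROUP BY () over the input: always exactly one row, even when the input is empty. */
  static Object[] groupByNothing(List<Long> input, List<Supplier<Agg>> specs) {
    // Create one fresh aggregator per spec.
    List<Agg> aggs = new ArrayList<>();
    for (Supplier<Agg> spec : specs) {
      aggs.add(spec.get());
    }
    // Feed every input value to every aggregator (a no-op for empty input).
    for (long value : input) {
      for (Agg agg : aggs) {
        agg.add(value);
      }
    }
    // Collect the aggregator values into a single result row.
    Object[] row = new Object[aggs.size()];
    for (int i = 0; i < row.length; i++) {
      row[i] = aggs.get(i).get();
    }
    return row;
  }
}
```

With Count and Sum as the specs, empty input yields a single row holding 0 and null rather than no rows at all.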


LakshSingla

Grouping#hasGroupingDimensionsDropped accomplishes something similar to CTX_HAS_DROPPED_DIMENSIONS; however, it lives only in the SQL layer. I think we can drop hasGroupingDimensionsDropped and use the newly added context parameter to avoid redundancy.

LakshSingla · 2024-06-12T13:31:16Z

I think this PR also enables the following test in MSQ:
testCountStarWithLongColumnFiltersOnFloatLiterals

These are already supported by MSQ but are still marked as msqIncompatible. Since they seem relevant to your changes, we could enable them in the same PR:
testGroupByNothingWithImpossibleTimeFilter
testGroupByNothingWithLiterallyFalseFilter
testGroupingWithNotNullPlusNonNullInFilter
testGroupingWithNullPlusNonNullInFilter

LakshSingla · 2024-06-12T13:36:52Z

processing/src/main/java/org/apache/druid/query/timeseries/TimeseriesQueryQueryToolChest.java

@@ -537,4 +518,31 @@ private Function<Result<TimeseriesResultValue>, Result<TimeseriesResultValue>> m
      );
    };
  }
+
+  public static Object[] getNullAggregations(List<AggregatorFactory> aggregatorSpecs)


nit: Javadoc

gianm · 2024-06-26T17:24:42Z

Grouping#hasGroupingDimensionsDropped accomplishes something similar to CTX_HAS_DROPPED_DIMENSIONS; however, it lives only in the SQL layer. I think we can drop hasGroupingDimensionsDropped and use the newly added context parameter to avoid redundancy.

In this patch I am using Grouping#hasGroupingDimensionsDropped to initialize CTX_HAS_DROPPED_DIMENSIONS, so I think I need to keep it. At the point the Grouping object is created, we don't have a query context yet for that specific query, so we couldn't put things in the query context. (All we have is the global query context, but we don't want CTX_HAS_DROPPED_DIMENSIONS set in the global context. It should only be set in the context for the specific query object that actually has dropped dimensions.)
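The distinction between the global context and a per-query context can be sketched roughly as follows. This is a minimal illustration with plain maps, not the actual SQL planner code: the class and method names are hypothetical, and only the key name "hasDroppedDimensions" comes from the PR description.

```java
import java.util.HashMap;
import java.util.Map;

final class PerQueryContextSketch {
  // Key name taken from the PR description; everything else here is hypothetical.
  static final String CTX_HAS_DROPPED_DIMENSIONS = "hasDroppedDimensions";

  /**
   * Builds the context for one specific query by copying the global context and,
   * only if that query's grouping dropped dimensions, adding the flag to the copy.
   * The global context itself is never modified.
   */
  static Map<String, Object> contextForQuery(
      Map<String, Object> globalContext,
      boolean groupingDimensionsDropped
  ) {
    Map<String, Object> queryContext = new HashMap<>(globalContext);
    if (groupingDimensionsDropped) {
      queryContext.put(CTX_HAS_DROPPED_DIMENSIONS, true);
    }
    return queryContext;
  }
}
```

Because the flag is written to a copy, other queries planned from the same global context do not inherit it.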

gianm · 2024-06-26T17:59:49Z

I think this PR also enables the following test in MSQ:
testCountStarWithLongColumnFiltersOnFloatLiterals

These are already supported by MSQ but are still marked as msqIncompatible. Since they seem relevant to your changes, we could enable them in the same PR:
testGroupByNothingWithImpossibleTimeFilter
testGroupByNothingWithLiterallyFalseFilter
testGroupingWithNotNullPlusNonNullInFilter
testGroupingWithNullPlusNonNullInFilter

Making this change revealed a bug in this patch: COUNT in SQL initializes to null, not 0 as expected, because the null-agg-frame is written with the combining aggregators. The combining aggregator of count is longSum, which initializes to null. To fix this, we'll need to either:

Add a parameter to longSum, such as initToZero, that can be set to true when it's created as the combining aggregator of a count; or
Move the null-frame writing from post-shuffle to pre-shuffle. (Pre-shuffle, we use the regular aggs, not the combining aggs.)

I'm leaning towards the first one since it's technically the most correct. It sounds better to me if the combining agg of count initializes to 0, not null as it does today.

LakshSingla · 2024-06-27T04:24:37Z

The first one seems better than the second one to me as well.

github-actions · 2024-08-27T00:19:00Z

This pull request has been marked as stale due to 60 days of inactivity.
It will be closed in 4 weeks if no further activity occurs. If you think
that's incorrect or this pull request should instead be reviewed, please simply
write any comment. Even if closed, you can still revive the PR at any time or
discuss it on the dev@druid.apache.org list.
Thank you for your contributions.

github-actions · 2024-09-24T00:20:43Z

This pull request/issue has been closed due to lack of activity. If you think that
is incorrect, or the pull request requires review, you can revive the PR at any time.

LakshSingla · 2024-09-24T04:27:26Z

Not stale


gianm · 2024-10-10T19:14:41Z

COUNT in SQL initializes to null, not 0 as expected, because the null-agg-frame is written with the combining aggregators. The combining aggregator of count is longSum, which initializes to null.

Oops, in looking into this more, I misunderstood the problem. It was actually not this. The null-agg-frame is written with the regular aggregators, not the combining ones. The real problem is an off-by-one in updating the resultRow, which is fixed in the latest push.

In the case where the query has granularity: ALL but also has intervals: [], the result row includes __time (as a result of changes in #10968). I pushed a change to account for this: check for __time in the result row signature, and if it's there:

Set it to 0L (it's going to be ignored anyway)
Copy the aggregation results into the result row starting from getResultRowAggregatorStart(), not position 0. This was the off-by-one that caused the test failure.
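The adjustment described above can be sketched as follows. This is a simplified illustration, not the code in GroupByPostShuffleFrameProcessor: the class and method names are hypothetical, the row is assumed to have no dimensions, and aggStart stands in for the value that getResultRowAggregatorStart() would provide.

```java
final class SummaryRowLayoutSketch {
  /**
   * Lays out a summary row for a query with no dimensions.
   * If the result row signature includes __time, slot 0 holds the (ignored)
   * timestamp and the aggregation values start at position 1 instead of 0.
   */
  static Object[] buildSummaryRow(boolean hasTimeColumn, Object[] emptyAggValues) {
    // Stand-in for getResultRowAggregatorStart(): aggregators come after __time, if present.
    int aggStart = hasTimeColumn ? 1 : 0;
    Object[] row = new Object[aggStart + emptyAggValues.length];
    if (hasTimeColumn) {
      row[0] = 0L; // the value is ignored downstream, so any placeholder timestamp works
    }
    // Copying to position 0 here instead of aggStart would be the off-by-one.
    System.arraycopy(emptyAggValues, 0, row, aggStart, emptyAggValues.length);
    return row;
  }
}
```

Copying from position 0 when __time is present would overwrite the timestamp slot and shift every aggregation value one column to the left, which is consistent with the failure described above.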

Akshat-Jain

I have added some comments and would appreciate your input on them. Thanks!

...ry/src/main/java/org/apache/druid/msq/querykit/groupby/GroupByPostShuffleFrameProcessor.java

...e/multi-stage-query/src/main/java/org/apache/druid/msq/querykit/groupby/GroupByQueryKit.java

processing/src/main/java/org/apache/druid/query/groupby/GroupingEngine.java

processing/src/main/java/org/apache/druid/query/timeseries/TimeseriesQueryQueryToolChest.java

github-actions · 2025-01-08T00:21:18Z

This pull request has been marked as stale due to 60 days of inactivity.
It will be closed in 4 weeks if no further activity occurs. If you think
that's incorrect or this pull request should instead be reviewed, please simply
write any comment. Even if closed, you can still revive the PR at any time or
discuss it on the dev@druid.apache.org list.
Thank you for your contributions.

github-actions bot added the Area - Batch Ingestion, Area - Querying, and Area - MSQ (For multi stage queries - https://github.com/apache/druid/issues/12262) labels Apr 24, 2024

gianm added 3 commits April 23, 2024 22:20

Update tests.

de94be7

Add test.

78de122

Fix test.

893682d

LakshSingla reviewed Jun 12, 2024


github-actions bot added the stale label Aug 27, 2024

github-actions bot closed this Sep 24, 2024

LakshSingla reopened this Sep 24, 2024

LakshSingla removed the stale label Sep 24, 2024

gianm added 2 commits October 9, 2024 23:28

Merge branch 'master' into msq-empty-agg-row

ec0e636

Adjustment in GroupByPostShuffleFrameProcessor for result rows that have an ignored __time column.

b8a0d1c

gianm added 2 commits October 10, 2024 12:34

Javadocs, method names null -> empty

a94aceb

Adjust for empty MAX result.

e111520

Akshat-Jain reviewed Nov 8, 2024


github-actions bot added the stale label Jan 8, 2025

cryptoe added this to the 33.0.0 milestone Jan 20, 2025

github-actions bot removed the stale label Jan 21, 2025

gianm added 3 commits February 3, 2025 20:43

Merge branch 'master' into msq-empty-agg-row

3b600d8

Update visibility.

f8a8336

Remove comment.

18907c0

Akshat-Jain approved these changes Feb 6, 2025


cryptoe merged commit c9b3585 into apache:master Feb 6, 2025
79 checks passed

gianm deleted the msq-empty-agg-row branch February 6, 2025 20:55
