-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add CumCountWhere()
and RollingCountWhere()
features to UpdateBy
#6566
base: main
Are you sure you want to change the base?
feat: Add CumCountWhere()
and RollingCountWhere()
features to UpdateBy
#6566
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Self-review
engine/table/src/main/java/io/deephaven/engine/table/impl/updateby/UpdateByOperatorFactory.java
Outdated
Show resolved
Hide resolved
engine/table/src/main/java/io/deephaven/engine/table/impl/updateby/countwhere/CountFilter.java
Outdated
Show resolved
Hide resolved
...ble/src/main/java/io/deephaven/engine/table/impl/updateby/countwhere/CountWhereOperator.java
Show resolved
Hide resolved
...ble/src/main/java/io/deephaven/engine/table/impl/updateby/countwhere/CountWhereOperator.java
Outdated
Show resolved
Hide resolved
@@ -341,6 +341,21 @@ public void testAggCountWhere() { | |||
assertEquals(6L, counts.get(0)); | |||
counts = ColumnVectors.ofLong(doubleCounted, "filter15"); | |||
assertEquals(6L, counts.get(0)); | |||
|
|||
// Get a static set table for use in dynamic where filters (contains 0-3) | |||
final QueryTable setTable = (QueryTable) TableTools.newTable(col("sym", 1, 2, 3)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noticed that AggCountWhere didn't test DynamicWhereFilter
in CI, corrected here.
java-client/session/src/test/java/io/deephaven/client/impl/UpdateBySpecBuilderTest.java
Outdated
Show resolved
Hide resolved
@@ -230,6 +230,7 @@ static Count AggCount(String resultColumn) { | |||
* values that pass the supplied {@code filters}. | |||
* | |||
* @param resultColumn The {@link Count#column() output column} name | |||
* @param filters The filters to apply to the input columns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Corrected missing param
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does code coverage look like?
Chunk<? extends Values>[] influencerValueChunkArr, | ||
LongChunk<OrderedRowKeys> affectedPosChunk, | ||
LongChunk<OrderedRowKeys> influencerPosChunk, | ||
IntChunk<? extends Values> pushChunk, | ||
IntChunk<? extends Values> popChunk, | ||
int len); | ||
int affectedCount, | ||
int influencerCount); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method parameters should have javadoc.
message UpdateByRollingCountWhere { | ||
UpdateByWindowScale reverse_window_scale = 1; | ||
UpdateByWindowScale forward_window_scale = 2; | ||
string column_name = 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What column name is this for?
} | ||
|
||
message UpdateByCumulativeCountWhere { | ||
string column_name = 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 and 2 instead of 3 and 4 for the first things would be consistent.
Please add comments to each of the messages and fields you are adding; it makes the protodoc better going forward.
@@ -665,7 +665,8 @@ def agg_all_by(self, agg: Aggregation, by: Union[str, List[str]]) -> Table: | |||
""" | |||
return super(Table, self).agg_all_by(agg, by) | |||
|
|||
def update_by(self, ops: Union[UpdateByOperation, List[UpdateByOperation]], by: Union[str, List[str]]) -> Table: | |||
def update_by(self, ops: Union[UpdateByOperation, List[UpdateByOperation]], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have coverage of the None case?
Returns: | ||
an UpdateByOperation | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicated
|
||
|
||
def rolling_count_where_time(ts_col: str, col: str, filters: Union[str, List[str]], | ||
rev_time: Union[int, str] = 0, fwd_time: Union[int, str] = 0) -> UpdateByOperation: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the default of zero does not match formula
filters (Union[str, Filter, List[str], List[Filter]], optional): the filter condition | ||
expression(s) or Filter object(s) | ||
rev_ticks (int): the look-behind window size (in rows/ticks) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default matches formula, but is not listed as a default in the doc.
What ist he intention for = 0; =0 to do?
...ble/src/main/java/io/deephaven/engine/table/impl/updateby/countwhere/CountWhereOperator.java
Show resolved
Hide resolved
resultsChunk); | ||
} | ||
continue; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we might just want an else instead of the continue logic.
TestHelper.assertWhereInt(actualIt, expectedIt, val -> val > 10 && val <= 50); | ||
} | ||
|
||
// Test on String column (representing all Object) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having simple tests for Instant and Boolean are worthwhile in my experience, there is stuff that can go wrong with reinterpretation. This is maybe not necessary here though, because we might be covered by where testing. I go back and forth on this, because we do need to create fake tables in some circumstances or could have problems with the ChunkFilter not matching what the aggs read.
Groovy Examples
Python Examples
Performance Notes
TL:DR Performance compares very well.
RollingCountWhere()
has near identical performance to the comparison benchmarks (can be faster depending on the complexity of the filter.CumCountWhere()
also compares well toEma()
but can't catch up to zero-keyCumSum()
, which is is remarkably fast.Comparing
CumCountWhere
toCumSum
andEma
:Comparing
RollingCountWhere
toRollingCount
andRollingSum
: