Adding support for defining filters on measures #3624

Merged
merged 26 commits into from
Dec 18, 2023
Changes from 6 commits
Commits
2f56587
Initial commit
AdityaHegde Dec 5, 2023
102ac29
Using enum for expression and adding to metricsview_aggregation
AdityaHegde Dec 6, 2023
ccfeb0e
Merge branch 'main' into adityahegde/filters-on-measures
AdityaHegde Dec 7, 2023
d439c54
Comparison in measure filters
AdityaHegde Dec 7, 2023
655ab2e
Adding to metricsview toplist
AdityaHegde Dec 7, 2023
4808486
Updated proto spec
AdityaHegde Dec 11, 2023
74bddbd
Add generic expression syntax
AdityaHegde Dec 12, 2023
3bc151a
Final proto draft
AdityaHegde Dec 13, 2023
d0d5228
Integrate and test where filter
AdityaHegde Dec 13, 2023
0f9c159
Integrate where clause everywhere
AdityaHegde Dec 13, 2023
171a7f2
Merge branch 'main' into adityahegde/filters-on-measures
AdityaHegde Dec 13, 2023
9d9b016
Adding some tests for having clauses
AdityaHegde Dec 13, 2023
aa1b379
Add backwards compatibility for filter
AdityaHegde Dec 14, 2023
e9ff93b
fix lint
AdityaHegde Dec 14, 2023
a828601
Adding alias support in metricsview_comparison_toplist
AdityaHegde Dec 14, 2023
5a7526a
Apply sum() only when having is enabled
AdityaHegde Dec 14, 2023
38db0ff
Fix lint
AdityaHegde Dec 14, 2023
0ddcba5
PR comments
AdityaHegde Dec 15, 2023
f02b422
Move filter helpers to pkg/expressionpb
AdityaHegde Dec 15, 2023
a4086af
refactor filter builder to not use map
AdityaHegde Dec 15, 2023
1e49865
Backwards compatibility for older sort types
AdityaHegde Dec 15, 2023
0d02580
PR comments
AdityaHegde Dec 15, 2023
025a62d
Testing on druid and fixes
AdityaHegde Dec 18, 2023
b5754b3
Merge branch 'main' into adityahegde/filters-on-measures
AdityaHegde Dec 18, 2023
d1186d1
Using outer query
AdityaHegde Dec 18, 2023
1832ebf
Removing unused testdata
AdityaHegde Dec 18, 2023
2,682 changes: 1,609 additions & 1,073 deletions proto/gen/rill/runtime/v1/queries.pb.go

Large diffs are not rendered by default.

671 changes: 671 additions & 0 deletions proto/gen/rill/runtime/v1/queries.pb.validate.go

Large diffs are not rendered by default.

35 changes: 35 additions & 0 deletions proto/rill/runtime/v1/expressions.proto
@@ -0,0 +1,35 @@
syntax = "proto3";
Contributor:
Question: not sure what I think, but do you think we should consider shorter type names given the nesting of expressions and how often we'll end up looking at them printed?

I.e. using terms like expr (expression), cond (condition), val (value), op (operation), args (operands), ident (identifier), eq (equal), etc.?

For example, MongoDB does that: https://www.mongodb.com/docs/manual/reference/operator/query/expr/.

Contributor:
I would also suggest looking out for prior art for expressions expressed in protobufs. Two good places to look are:

package rill.runtime.v1;

import "google/protobuf/struct.proto";

message Expression {
oneof expression {
google.protobuf.Value value = 1;
string column = 2;
Contributor:
Name suggestion: identifier and make it 1 instead of 2

ConditionExpression condition_expression = 3;
}
}

message ConditionExpression {
Contributor:
Suggestion: Just name it Condition

Operation operation = 1;
repeated Expression operands = 2;
}

enum Operation {
OPERATION_UNSPECIFIED = 0;
OPERATION_EQUALS = 1;
OPERATION_NOT_EQUALS = 2;
OPERATION_LESSER = 3;
OPERATION_LESSER_OR_EQUALS = 4;
OPERATION_GREATER = 5;
OPERATION_GREATER_OR_EQUALS = 6;
Contributor:
Is it more common to use "equals" or "equal"?

Also, should we consider the usual shorthands, like EQ, NEQ, LT, LTE, etc.?

OPERATION_OR = 7;
OPERATION_AND = 8;
OPERATION_BETWEEN = 9;
OPERATION_NOT_BETWEEN = 10;
Contributor:
Redundant – would suggest using an "and" of two expressions instead. It's also ambiguous – i.e. are they inclusive or exclusive of the start and end values?

Collaborator Author:
These are from duckdb docs: https://duckdb.org/docs/sql/expressions/comparison_operators.html#between-and-is-not-null
I think we can convert them to compound operations in UI and keep the API simple.
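The rewrite discussed in this thread can be sketched in Go as follows. This is only an illustration: `Expr`, `Op` and `expandBetween` are hypothetical stand-ins, not the generated types from this PR, and the inclusive bounds are an assumption that follows SQL's `BETWEEN` semantics.

```go
package main

import "fmt"

// Op is a hypothetical stand-in for the Operation enum in this PR.
type Op int

const (
	OpGTE Op = iota
	OpLTE
	OpAnd
	OpBetween
)

// Expr is a simplified, hypothetical expression node. A leaf carries an
// identifier or a literal value; a condition carries an operator and operands.
type Expr struct {
	Ident    string
	Value    any
	IsCond   bool
	Op       Op
	Operands []*Expr
}

// expandBetween rewrites BETWEEN(x, lo, hi) as AND(x >= lo, x <= hi).
// Both bounds are treated as inclusive (assumption, as in SQL's BETWEEN).
// Any other expression is returned unchanged.
func expandBetween(e *Expr) (*Expr, error) {
	if !e.IsCond || e.Op != OpBetween {
		return e, nil
	}
	if len(e.Operands) != 3 {
		return nil, fmt.Errorf("between expects 3 operands, got %d", len(e.Operands))
	}
	x, lo, hi := e.Operands[0], e.Operands[1], e.Operands[2]
	return &Expr{IsCond: true, Op: OpAnd, Operands: []*Expr{
		{IsCond: true, Op: OpGTE, Operands: []*Expr{x, lo}},
		{IsCond: true, Op: OpLTE, Operands: []*Expr{x, hi}},
	}}, nil
}

func main() {
	in := &Expr{IsCond: true, Op: OpBetween, Operands: []*Expr{
		{Ident: "impressions"}, {Value: 10}, {Value: 100},
	}}
	out, err := expandBetween(in)
	if err != nil {
		panic(err)
	}
	fmt.Println(out.Op == OpAnd, len(out.Operands))
}
```

Doing the expansion on the client keeps the set of operations the server must translate smaller, and it makes the inclusivity of each bound explicit in the expression itself.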

OPERATION_IN = 11;
OPERATION_NOT_IN = 12;
OPERATION_LIKE = 13;
OPERATION_NOT_LIKE = 14;
}
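To make the shape of this message concrete, here is a minimal Go sketch of how a tree like `Expression` could be rendered into a parameterized SQL fragment. The struct types below are hand-written, hypothetical mirrors of the proto (not the generated `runtimev1` code), only a few operations are covered, and the quoting rule is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// Operation is a hypothetical subset of the Operation enum above.
type Operation int

const (
	OpUnspecified Operation = iota
	OpEq
	OpGt
	OpAnd
	OpOr
)

// sqlOps maps the operations covered by this sketch to SQL operators.
var sqlOps = map[Operation]string{OpEq: "=", OpGt: ">", OpAnd: "AND", OpOr: "OR"}

// Expression mirrors the oneof: a condition, a column, or a literal value.
// A node with neither Cond nor Column set is treated as a literal.
type Expression struct {
	Value  any
	Column string
	Cond   *Condition
}

// Condition is an operator applied to operand expressions.
type Condition struct {
	Op       Operation
	Operands []*Expression
}

// toSQL renders e with "?" placeholders and returns the arguments in order.
func toSQL(e *Expression) (string, []any, error) {
	switch {
	case e.Cond != nil:
		op, ok := sqlOps[e.Cond.Op]
		if !ok {
			return "", nil, fmt.Errorf("unsupported operation %d", e.Cond.Op)
		}
		n := len(e.Cond.Operands)
		variadic := e.Cond.Op == OpAnd || e.Cond.Op == OpOr
		if n < 2 || (!variadic && n != 2) {
			return "", nil, fmt.Errorf("operation %s: unexpected operand count %d", op, n)
		}
		parts := make([]string, 0, n)
		var args []any
		for _, o := range e.Cond.Operands {
			s, a, err := toSQL(o)
			if err != nil {
				return "", nil, err
			}
			parts = append(parts, "("+s+")")
			args = append(args, a...)
		}
		return strings.Join(parts, " "+op+" "), args, nil
	case e.Column != "":
		// Double-quote the identifier and escape embedded quotes (assumption).
		return `"` + strings.ReplaceAll(e.Column, `"`, `""`) + `"`, nil, nil
	default:
		return "?", []any{e.Value}, nil
	}
}

func main() {
	filter := &Expression{Cond: &Condition{Op: OpAnd, Operands: []*Expression{
		{Cond: &Condition{Op: OpEq, Operands: []*Expression{{Column: "publisher"}, {Value: "Yahoo"}}}},
		{Cond: &Condition{Op: OpGt, Operands: []*Expression{{Column: "impressions"}, {Value: 10}}}},
	}}}
	sql, args, err := toSQL(filter)
	if err != nil {
		panic(err)
	}
	fmt.Println(sql, args)
}
```

Literals never reach the SQL string; they travel as placeholder arguments, which is the same approach the Go code later in this diff takes with `?`.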
52 changes: 41 additions & 11 deletions proto/rill/runtime/v1/queries.proto
@@ -8,6 +8,7 @@ import "rill/runtime/v1/export_format.proto";
import "rill/runtime/v1/schema.proto";
import "rill/runtime/v1/time_grain.proto";
import "validate/validate.proto";
import "expressions.proto";
Contributor:
  1. Full path like above, i.e. rill/runtime/v1/expressions.proto
  2. Singular, i.e. expression.proto


service QueryService {
// Query runs a SQL query against the instance's OLAP datastore.
@@ -279,7 +280,8 @@ message MetricsViewAggregationRequest {
TimeRange time_range = 12;
google.protobuf.Timestamp time_start = 6; // Deprecated in favor of time_range
google.protobuf.Timestamp time_end = 7; // Deprecated in favor of time_range
MetricsViewFilter filter = 8;
Expression filter = 8;
Contributor:
Need to add support for HAVING clauses (different from WHERE clauses).

Maybe we refactor to Expression having and Expression where? Or to avoid that, could keep Expression filter and add Expression having or Expression measure_filter?

Collaborator Author:
Yeah, I think separating them will keep it cleaner. The combined filter was not clean in the code.

Collaborator Author:
Also having makes more sense. We can think of using where for the main filter in the near future.
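For readers less familiar with the distinction in this thread: a WHERE condition filters rows before aggregation, while a HAVING condition filters the aggregated groups. The in-memory Go sketch below only illustrates that ordering; `Row`, `aggregate` and the sample data are hypothetical and unrelated to the runtime code.

```go
package main

import "fmt"

// Row is a hypothetical input record.
type Row struct {
	Publisher   string
	Domain      string
	Impressions int
}

// aggregate sums impressions per publisher.
// where is applied to each row before grouping (like SQL WHERE);
// having is applied to each group's total afterwards (like SQL HAVING).
func aggregate(rows []Row, where func(Row) bool, having func(total int) bool) map[string]int {
	totals := make(map[string]int)
	for _, r := range rows {
		if where(r) {
			totals[r.Publisher] += r.Impressions
		}
	}
	for publisher, total := range totals {
		if !having(total) {
			delete(totals, publisher)
		}
	}
	return totals
}

func main() {
	rows := []Row{
		{"Yahoo", "news.yahoo.com", 8},
		{"Yahoo", "sports.yahoo.com", 7},
		{"Google", "google.com", 5},
		{"Google", "spam.example", 50},
	}
	result := aggregate(rows,
		func(r Row) bool { return r.Domain != "spam.example" }, // dimension filter
		func(total int) bool { return total > 10 },             // measure filter
	)
	fmt.Println(result)
}
```

The dimension condition cannot be moved into the second step because `Domain` no longer exists once rows are grouped by publisher, which is one reason to keep the two filters as separate fields.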

repeated MetricsViewColumnAlias aliases = 13;
Contributor:
Why are aliases needed in other cases than MetricsViewComparison? For the other cases, I think it's okay to require clients to use the real names (since there's no ambiguity around base vs. comparison vs. delta values).

Collaborator Author:
The idea was to use the same measure in measures and filter. Aggregation allows for count and count distinct so I thought those could be defined in aliases and used in measures and filter fields.

int64 limit = 9 [(validate.rules).int64.gte = 0];
int64 offset = 10 [(validate.rules).int64.gte = 0];
int32 priority = 11;
@@ -302,17 +304,41 @@ message MetricsViewAggregationMeasure {
repeated google.protobuf.Value builtin_measure_args = 3;
}

message MetricsViewAggregationSort {
string name = 1;
bool desc = 2;
}

message MetricsViewColumnAlias {
string name = 1;
oneof alias { // Is this overkill to future proof this?
MetricsViewMeasureAlias measure_alias = 2;
}
}
Contributor:
Yeah I think we should just flatten it. In light of the comment above, just one type MetricsViewComparisonMeasureAlias might be enough.


message MetricsViewMeasureAlias {
enum MeasureType {
MEASURE_TYPE_UNSPECIFIED = 0;
MEASURE_TYPE_COUNT = 1;
MEASURE_TYPE_COUNT_DISTINCT = 2;
Contributor:
These can already be defined with an alias through MetricsViewAggregationMeasure

MEASURE_TYPE_BASE_VALUE = 3;
MEASURE_TYPE_COMPARISON_VALUE = 4;
MEASURE_TYPE_ABS_DELTA = 5;
MEASURE_TYPE_REL_DELTA = 6;
}

string name = 1;
MeasureType type = 2;
repeated google.protobuf.Value args = 3;
string alias = 4;
}

enum BuiltinMeasure {
BUILTIN_MEASURE_UNSPECIFIED = 0;
BUILTIN_MEASURE_COUNT = 1;
BUILTIN_MEASURE_COUNT_DISTINCT = 2;
}

message MetricsViewAggregationSort {
string name = 1;
bool desc = 2;
}

message MetricsViewToplistRequest {
string instance_id = 1;
string metrics_view_name = 2 [(validate.rules).string.min_len = 1];
@@ -339,10 +365,11 @@ message MetricsViewComparisonRequest {
string metrics_view_name = 2 [(validate.rules).string.min_len = 1];
MetricsViewAggregationDimension dimension = 3;
repeated MetricsViewAggregationMeasure measures = 4;
repeated MetricsViewComparisonSort sort = 5;
repeated MetricsViewAggregationSort sort = 5;
TimeRange time_range = 6;
TimeRange comparison_time_range = 7;
MetricsViewFilter filter = 8;
Expression filter = 8;
repeated MetricsViewColumnAlias aliases = 14;
int64 limit = 9 [(validate.rules).int64.gte = 0];
int64 offset = 10 [(validate.rules).int64.gte = 0];
int32 priority = 11;
@@ -398,7 +425,8 @@ message MetricsViewTimeSeriesRequest {
google.protobuf.Timestamp time_start = 4;
google.protobuf.Timestamp time_end = 5;
TimeGrain time_granularity = 6;
MetricsViewFilter filter = 7;
Expression filter = 7;
repeated MetricsViewColumnAlias aliases = 11;
string time_zone = 10;
int32 priority = 8;
}
@@ -415,7 +443,8 @@ message MetricsViewTotalsRequest {
repeated InlineMeasure inline_measures = 9;
google.protobuf.Timestamp time_start = 4;
google.protobuf.Timestamp time_end = 5;
MetricsViewFilter filter = 7;
Expression filter = 7;
repeated MetricsViewColumnAlias aliases = 10;
int32 priority = 8;
}

@@ -430,7 +459,8 @@ message MetricsViewRowsRequest {
google.protobuf.Timestamp time_start = 3;
google.protobuf.Timestamp time_end = 4;
TimeGrain time_granularity = 10;
MetricsViewFilter filter = 5;
Expression filter = 5;
repeated MetricsViewColumnAlias aliases = 12;
repeated MetricsViewSort sort = 6;
int32 limit = 7 [(validate.rules).int32.gte = 0];
int64 offset = 8 [(validate.rules).int64.gte = 0];
125 changes: 124 additions & 1 deletion runtime/queries/metricsview.go
@@ -292,6 +292,129 @@ func buildFilterClauseForCondition(mv *runtimev1.MetricsViewSpec, cond *runtimev
return fmt.Sprintf("AND (%s) ", condsClause), args, nil
}

func buildHavingClause(filter *runtimev1.MeasureFilter, mv *runtimev1.MetricsViewSpec, hasComparison bool) (string, []any, error) {
if filter.Expression == nil {
return "", []any{}, nil
}
return buildHavingClauseFromExpression(filter.Expression, mv, hasComparison)
}

func buildHavingClauseFromNode(node *runtimev1.MeasureFilterNode, mv *runtimev1.MetricsViewSpec, hasComparison bool) (string, []any, error) {
sql := ""
args := make([]any, 0)
switch e := node.Entry.(type) {
case *runtimev1.MeasureFilterNode_MeasureFilterMeasure:
expr, subArgs, err := buildHavingClauseFromMeasure(e.MeasureFilterMeasure, mv, hasComparison)
if err != nil {
return "", args, err
}
args = append(args, subArgs...)
sql += expr

case *runtimev1.MeasureFilterNode_MeasureFilterExpression:
expr, subArgs, err := buildHavingClauseFromExpression(e.MeasureFilterExpression, mv, hasComparison)
if err != nil {
return "", args, err
}
args = append(args, subArgs...)
sql += expr

case *runtimev1.MeasureFilterNode_Value:
arg, err := pbutil.FromValue(e.Value)
if err != nil {
return "", args, err
}
args = append(args, arg)
sql += "?"
}

return sql, args, nil
}

func buildHavingClauseFromMeasure(measure *runtimev1.MeasureFilterMeasure, mv *runtimev1.MetricsViewSpec, hasComparison bool) (string, []any, error) {
expr := ""
args := make([]any, 0)
switch measure.Measure.BuiltinMeasure {
case runtimev1.BuiltinMeasure_BUILTIN_MEASURE_UNSPECIFIED:
if measure.ColumnType != runtimev1.MeasureFilterMeasure_COLUMN_TYPE_UNSPECIFIED && !hasComparison {
return "", args, fmt.Errorf("comparison filter cannot be applied when it is not enabled/supported")
}

var ms *runtimev1.MetricsViewSpec_MeasureV2
for _, m := range mv.Measures {
if strings.EqualFold(m.Name, measure.Measure.Name) {
ms = m
break
}
}
if ms == nil {
return "", args, fmt.Errorf("measure %s not found", measure.Measure.Name)
}

switch measure.ColumnType {
case runtimev1.MeasureFilterMeasure_COLUMN_TYPE_UNSPECIFIED:
expr += ms.Name
case runtimev1.MeasureFilterMeasure_COLUMN_TYPE_PREVIOUS:
expr += ms.Name + "__previous"
case runtimev1.MeasureFilterMeasure_COLUMN_TYPE_DELTA_ABSOLUTE:
expr += ms.Name + "__delta_abs"
case runtimev1.MeasureFilterMeasure_COLUMN_TYPE_DELTA_RELATIVE:
expr += ms.Name + "__delta_rel"
}

case runtimev1.BuiltinMeasure_BUILTIN_MEASURE_COUNT:
//TODO: impl
case runtimev1.BuiltinMeasure_BUILTIN_MEASURE_COUNT_DISTINCT:
//TODO: impl
}
return expr, args, nil
}
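The column-name mapping in `buildHavingClauseFromMeasure` can be shown in isolation. The sketch below reuses the suffixes from the diff (`__previous`, `__delta_abs`, `__delta_rel`); the `ColumnType` constants and `measureColumn` are hypothetical names, and unlike the diff it rejects unknown column types instead of returning an empty expression.

```go
package main

import "fmt"

// ColumnType is a hypothetical stand-in for the column type enum in the diff.
type ColumnType int

const (
	ColumnUnspecified ColumnType = iota // the measure's base value
	ColumnPrevious
	ColumnDeltaAbsolute
	ColumnDeltaRelative
)

// columnSuffix holds the suffixes used in the diff above.
var columnSuffix = map[ColumnType]string{
	ColumnUnspecified:   "",
	ColumnPrevious:      "__previous",
	ColumnDeltaAbsolute: "__delta_abs",
	ColumnDeltaRelative: "__delta_rel",
}

// measureColumn returns the result column a measure filter refers to.
// Comparison columns are only valid when the query has a comparison range.
func measureColumn(name string, t ColumnType, hasComparison bool) (string, error) {
	suffix, ok := columnSuffix[t]
	if !ok {
		return "", fmt.Errorf("unknown column type %d", t)
	}
	if t != ColumnUnspecified && !hasComparison {
		return "", fmt.Errorf("column %q requires a comparison time range", name+suffix)
	}
	return name + suffix, nil
}

func main() {
	col, err := measureColumn("total_bids", ColumnDeltaRelative, true)
	if err != nil {
		panic(err)
	}
	fmt.Println(col)
}
```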

func buildHavingClauseFromExpression(expr *runtimev1.MeasureFilterExpression, mv *runtimev1.MetricsViewSpec, hasComparison bool) (string, []any, error) {
args := make([]any, 0)
if len(expr.Entries) != 2 {
return "", args, fmt.Errorf("exactly 2 entries should be provided")
}

leftExpr, subArgs, err := buildHavingClauseFromNode(expr.Entries[0], mv, hasComparison)
if err != nil {
return "", args, err
}
args = append(args, subArgs...)
rightExpr, subArgs, err := buildHavingClauseFromNode(expr.Entries[1], mv, hasComparison)
if err != nil {
return "", args, err
}
args = append(args, subArgs...)

sql := fmt.Sprintf("(%s) %s (%s)", leftExpr, measureFilterClauseOperation(expr), rightExpr)
return sql, args, nil
}

func measureFilterClauseOperation(e *runtimev1.MeasureFilterExpression) string {
switch e.OperationType {
case runtimev1.MeasureFilterExpression_OPERATION_TYPE_EQUALS:
return "="
case runtimev1.MeasureFilterExpression_OPERATION_TYPE_NOT_EQUALS:
return "!="
case runtimev1.MeasureFilterExpression_OPERATION_TYPE_LESSER:
return "<"
case runtimev1.MeasureFilterExpression_OPERATION_TYPE_LESSER_OR_EQUALS:
return "<="
case runtimev1.MeasureFilterExpression_OPERATION_TYPE_GREATER:
return ">"
case runtimev1.MeasureFilterExpression_OPERATION_TYPE_GREATER_OR_EQUALS:
return ">="
case runtimev1.MeasureFilterExpression_OPERATION_TYPE_OR:
return "OR"
case runtimev1.MeasureFilterExpression_OPERATION_TYPE_AND:
return "AND"
case runtimev1.MeasureFilterExpression_OPERATION_TYPE_BETWEEN:
return "BETWEEN"
}
return "=" // TODO: handle unknown operation type
AdityaHegde marked this conversation as resolved.
}
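One possible way to address the TODO about unknown operation types is a lookup table that returns an error rather than defaulting to `=`. This is a sketch with hypothetical names (`OperationType`, `binaryOperator`), not the fix that landed in the PR. It also leaves BETWEEN out on purpose, since BETWEEN takes two bounds and does not fit the `(left) op (right)` template.

```go
package main

import "fmt"

// OperationType is a hypothetical stand-in for the generated enum.
type OperationType int

const (
	OpUnspecified OperationType = iota
	OpEquals
	OpNotEquals
	OpLesser
	OpLesserOrEquals
	OpGreater
	OpGreaterOrEquals
	OpOr
	OpAnd
)

// binaryOperators lists the operations that fit "(left) op (right)".
var binaryOperators = map[OperationType]string{
	OpEquals:          "=",
	OpNotEquals:       "!=",
	OpLesser:          "<",
	OpLesserOrEquals:  "<=",
	OpGreater:         ">",
	OpGreaterOrEquals: ">=",
	OpOr:              "OR",
	OpAnd:             "AND",
}

// binaryOperator returns the SQL operator for op, or an error when the
// operation is unspecified or not supported by this sketch.
func binaryOperator(op OperationType) (string, error) {
	s, ok := binaryOperators[op]
	if !ok {
		return "", fmt.Errorf("unsupported operation type %d", op)
	}
	return s, nil
}

func main() {
	op, err := binaryOperator(OpGreaterOrEquals)
	if err != nil {
		panic(err)
	}
	fmt.Println(op)

	if _, err := binaryOperator(OpUnspecified); err != nil {
		fmt.Println("error:", err)
	}
}
```

Failing loudly here means a client that sends a newer or malformed operation gets an error back instead of a silently different filter.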

func repeatString(val string, n int) []string {
res := make([]string, n)
for i := 0; i < n; i++ {
@@ -527,7 +650,7 @@ func writeParquet(meta []*runtimev1.MetricsViewColumn, data []*structpb.Struct,
case runtimev1.Type_CODE_UINT64:
recordBuilder.Field(idx).(*array.Uint64Builder).Append(uint64(v.GetNumberValue()))
case runtimev1.Type_CODE_INT128:
recordBuilder.Field(idx).(*array.Float64Builder).Append((v.GetNumberValue()))
recordBuilder.Field(idx).(*array.Float64Builder).Append(v.GetNumberValue())
case runtimev1.Type_CODE_FLOAT32:
recordBuilder.Field(idx).(*array.Float32Builder).Append(float32(v.GetNumberValue()))
case runtimev1.Type_CODE_FLOAT64, runtimev1.Type_CODE_DECIMAL:
17 changes: 16 additions & 1 deletion runtime/queries/metricsview_aggregation.go
@@ -25,6 +25,7 @@ type MetricsViewAggregation struct {
Offset int64 `json:"offset,omitempty"`
MetricsView *runtimev1.MetricsViewSpec `json:"-"`
ResolvedMVSecurity *runtime.ResolvedMetricsViewSecurity `json:"security"`
MeasureFilter *runtimev1.MeasureFilter `json:"measure_filter,omitempty"`

Result *runtimev1.MetricsViewAggregationResponse `json:"-"`
}
@@ -222,6 +223,18 @@ func (q *MetricsViewAggregation) buildMetricsAggregationSQL(mv *runtimev1.Metric
whereClause = "WHERE 1=1" + whereClause
}

havingClause := ""
if q.MeasureFilter != nil {
var havingClauseArgs []any
var err error
havingClause, havingClauseArgs, err = buildHavingClause(q.MeasureFilter, mv, false)
if err != nil {
return "", nil, err
}
if havingClause != "" {
havingClause = "HAVING " + havingClause
}
args = append(args, havingClauseArgs...)
}

sortingCriteria := make([]string, 0, len(q.Sort))
for _, s := range q.Sort {
sortCriterion := safeName(s.Name)
@@ -246,16 +259,18 @@ func (q *MetricsViewAggregation) buildMetricsAggregationSQL(mv *runtimev1.Metric
limitClause = fmt.Sprintf("LIMIT %d", *q.Limit)
}

sql := fmt.Sprintf("SELECT %s FROM %s %s %s %s %s %s OFFSET %d",
sql := fmt.Sprintf("SELECT %s FROM %s %s %s %s %s %s %s OFFSET %d",
strings.Join(selectCols, ", "),
safeName(mv.Table),
strings.Join(unnestClauses, ""),
whereClause,
groupClause,
havingClause,
orderClause,
limitClause,
q.Offset,
)
fmt.Println(sql, args)

return sql, args, nil
}