DHIS-16705 Program Indicator Expression transformer #19964

Draft
wants to merge 25 commits into
base: master

@luciano-fiandesio (Contributor) commented Feb 18, 2025

Summary

Work in progress

This PR addresses an issue with Enrollment and Event queries whose where condition contains a Program Indicator (PI) expression with internal subqueries.
These types of queries do not run in Doris, because Doris does not support subqueries that correlate with outer layers of the parent query.

Example of an unsupported query:

SELECT   enrollment,
		 sum(1 + 1)
FROM     analytics_enrollment_ur1edk5oe2n   AS subax
WHERE    ((
		  date_part('year',age(cast(
				 (
				 SELECT   scheduleddate
				 FROM     analytics_event_ur1edk5oe2n
				 WHERE    analytics_event_ur1edk5oe2n.enrollment = subax.enrollment
				 AND      scheduleddate IS NOT NULL
				 ORDER BY occurreddate DESC
				 LIMIT    1 ) AS date), cast(coalesce(completeddate,
		  	   	 (
				   SELECT   created
				   FROM     analytics_event_ur1edk5oe2n
				   WHERE    analytics_event_ur1edk5oe2n.enrollment = subax.enrollment
				   AND      created IS NOT NULL
				   ORDER BY occurreddate DESC
				   LIMIT    1 )) AS date)))) * 12 + date_part('month',age(cast(
						(
						SELECT   scheduleddate
						FROM     analytics_event_ur1edk5oe2n
						WHERE    analytics_event_ur1edk5oe2n.enrollment = subax.enrollment
						AND      scheduleddate IS NOT NULL
						ORDER BY occurreddate DESC
						LIMIT    1 ) AS date), cast(coalesce(completeddate,
							(
							SELECT   created
							FROM     analytics_event_ur1edk5oe2n
							WHERE    analytics_event_ur1edk5oe2n.enrollment = subax.enrollment
							AND      created IS NOT NULL
							ORDER BY occurreddate DESC
							LIMIT    1 )) AS date)))) > 1

This PR introduces a new component named CteOptimizationPipeline that has the following responsibilities:

  • Extract PI-generated subqueries (most of them are generated by org.hisp.dhis.parser.expression.statement.DefaultStatementBuilder, so they are easy to identify).
  • Transform the subqueries into CTEs (Common Table Expressions).
  • Rebuild the original PI CTE: add the new CTEs, reference them through join statements, and preserve the original where condition (including the function chain).

Transformation examples

Source

with pi_hgtnuhsqbml
as (
select
	enrollment,
	sum(1+1) as value
from
	analytics_enrollment_ur1edk5oe2n as subax
where
	((
	date_part('year', age(cast(
	(
	select
		scheduleddate
	from
		analytics_event_ur1edk5oe2n
	where
		analytics_event_ur1edk5oe2n.enrollment = subax.enrollment
		and scheduleddate is not null
	order by
		occurreddate desc
	limit 1 ) as date),
	cast(coalesce(completeddate,
	(
	select
		created
	from
		analytics_event_ur1edk5oe2n
	where
		analytics_event_ur1edk5oe2n.enrollment = subax.enrollment
		and created is not null
	order by
		occurreddate desc
	limit 1 )) as date)))) * 12 ) > 1
	and "lw1SqmMlnfh" is not null
group by
	enrollment )
-- end of CTE
select
	ax.enrollment
from
	analytics_enrollment_ur1edk5oe2n as ax
where
	(((
	lastupdated >= '2018-01-01' and lastupdated < '2018-04-28')))
	and ( ax."uidlevel1" = 'ImspTQPwCqd' )

Target

with last_sched as (
select
	enrollment,
	scheduleddate
from
	(
	select
		enrollment,
		scheduleddate,
		row_number() over (partition by enrollment
	order by
		occurreddate desc) as rn
	from
		analytics_event_ur1edk5oe2n
	where
		scheduleddate is not null) t
where
	rn = 1),
last_created as (
select
	enrollment,
	created
from
	(
	select
		enrollment,
		created,
		row_number() over (partition by enrollment
	order by
		occurreddate desc) as rn
	from
		analytics_event_ur1edk5oe2n
	where
		created is not null) t
where
	rn = 1),
pi_hgtnuhsqbml as (
select
	subax.enrollment,
	sum(1 + 1) as value
from
	analytics_enrollment_ur1edk5oe2n as subax
left join last_sched as ls on
	subax.enrollment = ls.enrollment
left join last_created as lc on
	subax.enrollment = lc.enrollment
where
	((date_part('year',
	age(cast(ls.scheduleddate as date),
	cast(coalesce(completeddate,
	lc.created) as date)))) * 12) > 1
		and "lw1SqmMlnfh" is not null
	group by
		subax.enrollment)
select
	ax.enrollment
from
	analytics_enrollment_ur1edk5oe2n as ax
where
	(((lastupdated >= '2018-01-01'
		and lastupdated < '2018-04-28')))
	and (ax."uidlevel1" = 'ImspTQPwCqd')
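
The two generated CTEs in the target query share one shape: rank the event rows of each enrollment by occurreddate and keep the first row with a non-null value. A minimal sketch of how such a CTE could be rendered is shown below. The class name `LatestValueCteSketch` and the method `latestValueCte` are hypothetical and not taken from the PR; the sketch assumes the column and table names are trusted identifiers, since they are concatenated into the SQL text.

```java
final class LatestValueCteSketch {

  // Renders one "latest non-null value per enrollment" CTE, mirroring the
  // shape of last_sched and last_created in the target query.
  // Method and parameter names are hypothetical, not from the PR.
  static String latestValueCte(String cteName, String column, String eventTable) {
    return String.join(
        "\n",
        cteName + " as (",
        "  select enrollment, " + column,
        "  from (",
        "    select enrollment, " + column + ",",
        "           row_number() over (partition by enrollment order by occurreddate desc) as rn",
        "    from " + eventTable,
        "    where " + column + " is not null",
        "  ) t",
        "  where rn = 1",
        ")");
  }

  public static void main(String[] args) {
    // Prints a CTE equivalent to last_sched from the target query.
    System.out.println(
        latestValueCte("last_sched", "scheduleddate", "analytics_event_ur1edk5oe2n"));
  }
}
```

In this form the correlation on enrollment no longer lives inside the subquery; it moves to the left join condition of the rebuilt PI CTE, so the generated CTE has no reference to the outer query.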

What is not working

  • Support for multiple PI CTEs
  • Further query optimization: creating a base CTE so that child CTEs do not scan the entire table
  • Some complex expressions don't seem to work
  • Increase test coverage

}
}

private record ProcessedExpressions(List<Expression> expressions, boolean hasChanges) {}

Check notice

Code scanning / CodeQL

Unused classes and interfaces Note

Unused class: ProcessedExpressions is not referenced within this codebase. If not used as an external API it should be removed.
Contributor Author

mmm, the class is actually used...

GroupByElement groupBy = new GroupByElement();
List<Expression> groupByExpressions = new ArrayList<>();
groupByExpressions.add(new Column("enrollment"));
groupBy.setGroupByExpressions(groupByExpressions);

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation Note

Invoking GroupByElement.setGroupByExpressions should be avoided because it has been deprecated.

Copilot Autofix AI 3 days ago

To fix the problem, we need to replace the deprecated setGroupByExpressions method with its recommended alternative. According to the JSqlParser library documentation, the setGroupByExpressions method has been replaced by the addGroupByExpression method, which allows adding expressions individually to the GroupByElement.

We will modify the code to use the addGroupByExpression method instead of setGroupByExpressions. This change will be made in the dataElementCountCte method of the SubqueryTransformer class.

Suggested changeset 1
dhis-2/dhis-services/dhis-service-analytics/src/main/java/org/hisp/dhis/analytics/util/optimizer/cte/SubqueryTransformer.java

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/dhis-2/dhis-services/dhis-service-analytics/src/main/java/org/hisp/dhis/analytics/util/optimizer/cte/SubqueryTransformer.java b/dhis-2/dhis-services/dhis-service-analytics/src/main/java/org/hisp/dhis/analytics/util/optimizer/cte/SubqueryTransformer.java
--- a/dhis-2/dhis-services/dhis-service-analytics/src/main/java/org/hisp/dhis/analytics/util/optimizer/cte/SubqueryTransformer.java
+++ b/dhis-2/dhis-services/dhis-service-analytics/src/main/java/org/hisp/dhis/analytics/util/optimizer/cte/SubqueryTransformer.java
@@ -100,3 +100,5 @@
     groupByExpressions.add(new Column("enrollment"));
-    groupBy.setGroupByExpressions(groupByExpressions);
+    for (Expression expr : groupByExpressions) {
+      groupBy.addGroupByExpression(expr);
+    }
     plainSelect.setGroupByElement(groupBy);
EOF

private void handleGroupBy(PlainSelect oldSelect, String fromAlias) {
GroupByElement groupBy = oldSelect.getGroupBy();
if (groupBy != null) {
List<Expression> groupByExpressions = groupBy.getGroupByExpressions();

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation Note

Invoking GroupByElement.getGroupByExpressions should be avoided because it has been deprecated.

Copilot Autofix AI 3 days ago

To fix the problem, we need to replace the usage of the deprecated getGroupByExpressions() method with its recommended alternative. According to the JSqlParser library documentation, the getGroupByExpressions() method has been replaced by getGroupByExpressionList(). We will update the code to use this new method.

  • Replace the call to groupBy.getGroupByExpressions() with groupBy.getGroupByExpressionList().
  • Update the variable name from groupByExpressions to groupByExpressionList to reflect the new method.
Suggested changeset 1
dhis-2/dhis-services/dhis-service-analytics/src/main/java/org/hisp/dhis/analytics/util/optimizer/cte/pipeline/CteSqlRebuilder.java

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/dhis-2/dhis-services/dhis-service-analytics/src/main/java/org/hisp/dhis/analytics/util/optimizer/cte/pipeline/CteSqlRebuilder.java b/dhis-2/dhis-services/dhis-service-analytics/src/main/java/org/hisp/dhis/analytics/util/optimizer/cte/pipeline/CteSqlRebuilder.java
--- a/dhis-2/dhis-services/dhis-service-analytics/src/main/java/org/hisp/dhis/analytics/util/optimizer/cte/pipeline/CteSqlRebuilder.java
+++ b/dhis-2/dhis-services/dhis-service-analytics/src/main/java/org/hisp/dhis/analytics/util/optimizer/cte/pipeline/CteSqlRebuilder.java
@@ -200,6 +200,6 @@
     if (groupBy != null) {
-      List<Expression> groupByExpressions = groupBy.getGroupByExpressions();
-      if (groupByExpressions != null) {
+      List<Expression> groupByExpressionList = groupBy.getGroupByExpressionList();
+      if (groupByExpressionList != null) {
         List<Expression> newGroupByExpressions = new ArrayList<>();
-        for (Expression expr : groupByExpressions) {
+        for (Expression expr : groupByExpressionList) {
           if (expr instanceof Column col) {
EOF

newGroupByExpressions.add(expr); // Keep other expressions
}
}
groupBy.setGroupByExpressions(newGroupByExpressions);

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation Note

Invoking GroupByElement.setGroupByExpressions should be avoided because it has been deprecated.

Copilot Autofix AI 3 days ago

To fix the problem, we need to replace the usage of the deprecated setGroupByExpressions method with its recommended alternative. The GroupByElement class provides a method addGroupByExpression which can be used to add expressions to the group by clause individually. We will iterate over the newGroupByExpressions list and add each expression using the addGroupByExpression method.

Suggested changeset 1
dhis-2/dhis-services/dhis-service-analytics/src/main/java/org/hisp/dhis/analytics/util/optimizer/cte/pipeline/CteSqlRebuilder.java

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/dhis-2/dhis-services/dhis-service-analytics/src/main/java/org/hisp/dhis/analytics/util/optimizer/cte/pipeline/CteSqlRebuilder.java b/dhis-2/dhis-services/dhis-service-analytics/src/main/java/org/hisp/dhis/analytics/util/optimizer/cte/pipeline/CteSqlRebuilder.java
--- a/dhis-2/dhis-services/dhis-service-analytics/src/main/java/org/hisp/dhis/analytics/util/optimizer/cte/pipeline/CteSqlRebuilder.java
+++ b/dhis-2/dhis-services/dhis-service-analytics/src/main/java/org/hisp/dhis/analytics/util/optimizer/cte/pipeline/CteSqlRebuilder.java
@@ -213,3 +213,6 @@
         }
-        groupBy.setGroupByExpressions(newGroupByExpressions);
+        groupBy.getGroupByExpressions().clear();
+        for (Expression newExpr : newGroupByExpressions) {
+            groupBy.addGroupByExpression(newExpr);
+        }
       }
EOF

/**
* Validates the column from the SELECT clause.
*
* @param col the column from the SELECT clause.

Check notice

Code scanning / CodeQL

Spurious Javadoc @param tags Note

@param tag "col" does not match any actual parameter of method "validateColumn()".

@maikelarabori maikelarabori left a comment

It looks good, just minor comments/suggestions.
Cheers

@@ -474,6 +474,8 @@ public enum ErrorCode {
E7146("A {0} date was not specified in periods, dimensions, filters"),
E7147("Query failed because of a missing column: `{0}`"),
E7148("Could not create CTE SQL query, unexpected error: `{0}`"),
E7149("Could not pre-pend CTEs to Program Indicator CTE, unexpected error: `{0}`"),

Suggested change
- E7149("Could not pre-pend CTEs to Program Indicator CTE, unexpected error: `{0}`"),
+ E7149("Could not prepend CTEs to Program Indicator CTE, unexpected error: `{0}`"),

: getAggregatedEnrollmentsSql(params, maxLimit);
}
System.out.println(sql);

Forgotten debug line

* @throws CteOptimizerException if an error occurs during parsing
*/
public Statement parse(String sql) {
if (StringUtils.isEmpty(sql)) {

Suggested change
- if (StringUtils.isEmpty(sql)) {
+ if (StringUtils.isBlank(sql)) {
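
The difference shows up with whitespace-only input: an empty check lets a string such as `"   "` through to the parser, while a blank check rejects it. The sketch below illustrates this with null-safe helpers built on the JDK's `String.isEmpty()` and `String.isBlank()` (Java 11+). They are stand-ins written for this example, not the Commons Lang `StringUtils` methods themselves, and the class name is hypothetical.

```java
final class BlankCheckSketch {

  // Stand-in for an "is empty" check: true for null or a zero-length string.
  static boolean isEmpty(String s) {
    return s == null || s.isEmpty();
  }

  // Stand-in for an "is blank" check: true for null, zero-length,
  // or whitespace-only strings.
  static boolean isBlank(String s) {
    return s == null || s.isBlank();
  }

  public static void main(String[] args) {
    String whitespaceOnly = "   ";
    // The empty check accepts the whitespace-only SQL string...
    System.out.println("isEmpty: " + isEmpty(whitespaceOnly));
    // ...while the blank check flags it before it reaches the parser.
    System.out.println("isBlank: " + isBlank(whitespaceOnly));
  }
}
```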

* @param generatedCtes the collection of new CTE definitions.
*/
public static void appendExtractedCtes(Select select, Map<String, GeneratedCte> generatedCtes) {
List<WithItem> existingWithItems = select.getWithItemsList();

I believe it's old code, but maybe select.getWithItemsList() could return an empty list, instead of a null?
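
If the getter does return null when the statement has no WITH clause, one option is to normalize at the call site before merging. The following is a minimal sketch of that idea, using plain strings in place of JSqlParser's `WithItem`; the class and method names are hypothetical. It places the generated CTEs first, as in the target query in the PR description, where last_sched and last_created precede pi_hgtnuhsqbml.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class WithItemsSketch {

  // Returns a new mutable list holding the generated items followed by the
  // existing ones. A null "existing" list is treated as empty.
  static <T> List<T> prependGenerated(List<T> existing, List<T> generated) {
    List<T> merged = new ArrayList<>(generated);
    if (existing != null) {
      merged.addAll(existing);
    }
    return merged;
  }

  public static void main(String[] args) {
    List<String> generated = Arrays.asList("last_sched", "last_created");
    // No WITH clause on the original statement: existing list is null.
    System.out.println(prependGenerated(null, generated));
    // Original statement already has the PI CTE.
    System.out.println(prependGenerated(Arrays.asList("pi_hgtnuhsqbml"), generated));
  }
}
```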

from %s
where %s is not null
) t
WHERE rn = 1

Suggested change
- WHERE rn = 1
+ where rn = 1

@@ -1453,12 +1454,11 @@ protected List<String> getSelectColumnsWithCTE(EventQueryParams params, CteConte
if (queryItem.isProgramIndicator()) {
// For program indicators, use CTE reference
String piUid = queryItem.getItem().getUid();
CteDefinition cteDef = cteContext.getDefinitionByItemUid(piUid);
// COALESCE(fbyta.value, 0) as CH6wamtY9kK

Forgotten comment

return Optional.of((PlainSelect) selectBody);
}

protected Optional<Expression> hasSingleExpression(PlainSelect select) {

Should it be getSingleExpression?


protected Optional<SelectExpressionItem> extractSingleSelectExpressionItem(PlainSelect plain) {
List<SelectItem> items = plain.getSelectItems();
if (items == null || items.size() != 1) {

Suggested change
- if (items == null || items.size() != 1) {
+ if (CollectionUtils.size(items) != 1) {


protected Optional<Expression> hasSingleExpression(PlainSelect select) {
List<SelectItem> selectItems = select.getSelectItems();
if (selectItems == null || selectItems.size() != 1) {

Suggested change
- if (selectItems == null || selectItems.size() != 1) {
+ if (CollectionUtils.size(selectItems) != 1) {
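
Both suggestions rely on a size helper that treats a null collection as having size 0, which folds the null check and the size check into one comparison. The sketch below shows the equivalence with a hand-written helper. `sizeOrZero` is a hypothetical stand-in for this example and assumes the null-returns-0 behaviour; which `CollectionUtils` class the project imports should be checked before applying the change.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;

final class SingleItemCheckSketch {

  // Null-safe size: a null collection counts as empty.
  static int sizeOrZero(Collection<?> items) {
    return items == null ? 0 : items.size();
  }

  // Equivalent to: items != null && items.size() == 1
  static boolean hasExactlyOneItem(Collection<?> items) {
    return sizeOrZero(items) == 1;
  }

  public static void main(String[] args) {
    System.out.println(hasExactlyOneItem(null));
    System.out.println(hasExactlyOneItem(Collections.singletonList("enrollment")));
    System.out.println(hasExactlyOneItem(Arrays.asList("enrollment", "value")));
  }
}
```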

boolean changed = false;

for (Expression expr : expressions) {
if (expr == null) {

Do we need to add null to the list?
I have the impression that it will not affect consumers: if the expression is null, there is nothing to deal with, so should we even set null values?

@luciano-fiandesio luciano-fiandesio force-pushed the DHIS-16705_PI_TRANSFORMER branch 2 times, most recently from c5c674f to c9212dc Compare February 20, 2025 09:05
@luciano-fiandesio luciano-fiandesio force-pushed the DHIS-16705_PI_TRANSFORMER branch from c9212dc to 5c78ff5 Compare February 20, 2025 15:22

Quality Gate failed

Failed conditions
11 New issues
11 New Code Smells (required ≤ 0)
