feat(ontology): refactor to 'lazy' ontology input and filter so heavy lifting is moved to backend and we don't need to download complete ontologies #4584

Open: wants to merge 37 commits into master


mswertz (Member) commented Dec 29, 2024

What are the main changes you did

  • Selecting a root node in a large ontology currently produces very big queries, because every child is included as a separate filter. This makes everything slow. Instead, this PR aims to do the heavy lifting on the server side. It will also make the data smaller and Excel/CSV uploads easier, because users can simply include the terms they mean without listing all the sub-terms.

N.B. This is blocked by some of the changes from #4539.
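To make the size difference concrete, here is a minimal TypeScript sketch. The tiny ontology, the `tags` column and the helper `expandToDescendants` are hypothetical; the two filter shapes follow the `equals` and `match_including_children` keys used in the frontend diff further down, and it is assumed that the old client put every descendant name into one `equals` list.

```typescript
// Hypothetical ontology: each term maps to its parent (null for a root term).
type ParentMap = Record<string, string | null>;

const ontology: ParentMap = {
  colors: null,
  red: "colors",
  green: "colors",
  darkgreen: "green",
  lightgreen: "green",
  blue: "colors",
};

// Old approach: the client expands the selection to every descendant,
// repeating passes until no new child is found.
function expandToDescendants(selected: string[], parents: ParentMap): string[] {
  const result = new Set<string>(selected);
  let grew = true;
  while (grew) {
    grew = false;
    for (const [term, parent] of Object.entries(parents)) {
      if (parent !== null && result.has(parent) && !result.has(term)) {
        result.add(term);
        grew = true;
      }
    }
  }
  return Array.from(result);
}

const selected = ["colors"];

// Old shape: every term in the subtree becomes a separate filter value.
const clientSideFilter = { tags: { equals: expandToDescendants(selected, ontology) } };
// New shape: only the user's selection is sent; the server expands it.
const serverSideFilter = { tags: { match_including_children: selected } };

// Rough proxy for the URL cost of each filter once encoded as a query parameter.
const oldSize = encodeURIComponent(JSON.stringify(clientSideFilter)).length;
const newSize = encodeURIComponent(JSON.stringify(serverSideFilter)).length;
console.log(`client-side: ${oldSize} chars, server-side: ${newSize} chars`);
```

With a large ontology the expanded list grows with the size of the subtree, while the server-side filter stays the size of the user's selection.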

How to test

  • see the JUnit tests
  • check the JavaScript changes and verify that the request URLs are smaller

Checklist

  • updated docs in case of new feature
  • added/updated tests
  • created 'match_including_children'
  • created 'match_including_parents'
  • created 'contains_any', which should be faster than the current nested 'equal'
  • created 'contains_all', which should be faster than the complex 'and' in directory
  • created 'is_null' and 'is_not_null' filters so we can get root elements (parent == null)
  • use 'match_including_children' in ontology filters instead of sending all children elements
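The semantics of the new array and null operators, as read from their names, can be sketched in memory as below. This is not the server code (the operators run in SQL); `Pet`, the sample rows and all helper functions are hypothetical. The last check illustrates why 'contains_all' can stand in for an 'and' of single-value matches: both give the same answer.

```typescript
// In-memory stand-ins for the row-level semantics; on the server these run in SQL.
type Pet = { name: string; tags: string[] | null };

// contains_any: the column shares at least one value with the filter list.
function containsAny(column: string[] | null, values: string[]): boolean {
  if (column === null) return false;
  const present = new Set(column);
  return values.some((v) => present.has(v));
}

// contains_all: every filter value occurs in the column.
function containsAll(column: string[] | null, values: string[]): boolean {
  if (column === null) return false;
  const present = new Set(column);
  return values.every((v) => present.has(v));
}

// is_null / is_not_null: e.g. `parent` is null exactly for ontology root terms.
// In this sketch an empty array is not treated as null.
function isNull(value: unknown): boolean {
  return value === null || value === undefined;
}

const pets: Pet[] = [
  { name: "tom", tags: ["red", "green"] },
  { name: "spike", tags: ["red"] },
  { name: "pooky", tags: null },
];

const namesWhere = (pred: (p: Pet) => boolean): string[] => pets.filter(pred).map((p) => p.name);

const anyRedOrBlue = namesWhere((p) => containsAny(p.tags, ["red", "blue"])); // ["tom", "spike"]
const allRedAndGreen = namesWhere((p) => containsAll(p.tags, ["red", "green"])); // ["tom"]
const untagged = namesWhere((p) => isNull(p.tags)); // ["pooky"]

// contains_all(values) matches the same rows as an 'and' of single-value checks.
const viaAnd = namesWhere((p) => ["red", "green"].every((v) => containsAny(p.tags, [v])));
```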

Do later

  • lazy-load the ontology contents so that subtree data is only loaded when the user clicks
  • enable search on the lazy-loaded ontology (it should return parent paths for search matches)

@mswertz mswertz marked this pull request as draft December 29, 2024 22:06
@mswertz mswertz changed the title feat: new backend operator to do ontology filter expansion to children server side feat: new backend operator '_match_any_in_subtree' to do ontology filter expansion to children server side Dec 29, 2024
@mswertz mswertz changed the title feat: new backend operator '_match_any_in_subtree' to do ontology filter expansion to children server side feat: for ontologies, new backend operator '_match_any_in_subtree' to do ontology filter expansion to children server side Jan 12, 2025
@mswertz mswertz changed the title feat: for ontologies, new backend operator '_match_any_in_subtree' to do ontology filter expansion to children server side feat: for ontologies, new backend operator 'match_including_children' and 'match_including_parents' to do enable ontology filter expansion on server side Jan 12, 2025
@mswertz mswertz changed the title feat: for ontologies, new backend operator 'match_including_children' and 'match_including_parents' to do enable ontology filter expansion on server side feat(ontology): for ontology filter, new backend operator 'match_including_children' and 'match_including_parents' to do enable ontology filter expansion on server side Jan 12, 2025
@mswertz mswertz changed the title feat(ontology): for ontology filter, new backend operator 'match_including_children' and 'match_including_parents' to do enable ontology filter expansion on server side feat(ontology): refactor ontology input and filter so heavy lifting is moved to backend and we don't need to download complete ontologies Jan 14, 2025
@mswertz mswertz changed the title feat(ontology): refactor ontology input and filter so heavy lifting is moved to backend and we don't need to download complete ontologies feat(ontology): refactor to 'lazy' ontology input and filter so heavy lifting is moved to backend and we don't need to download complete ontologies Jan 14, 2025
@@ -1,7 +1,6 @@
package org.molgenis.emx2.sql;
Member Author

Restructured this file, so there are more changes than strictly needed, but I hope this makes this huge file easier to understand.

@@ -99,30 +99,29 @@ void testToJsonb() throws JsonProcessingException {
String trailingData = "{\"key\":\"value\"}trailing";
int invalidJavaType = 1;

assertAll(
// valid: object
Member Author

Refactored so that individual errors are shown.

@@ -638,6 +638,7 @@ the following function are available:
- textSearch(value)
- between(value)
- notBetween(value)
- _match_any_in_subtree(name) - use this to filter in ontology columns matching also when overlap exists in children of 'name' term
Member Author

TODO: update the docs with all these changes.

mswertz (Member Author) commented Jan 29, 2025

A use case not yet covered: a data entry says 'select all cancers' and I then search for 'a specific cancer'.
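The uncovered case comes down to the direction of the expansion. A minimal sketch, assuming that 'match_including_children' matches rows annotated with the search term or one of its descendants, and 'match_including_parents' matches rows annotated with the search term or one of its ancestors; the `diseases` ontology and the helper names are hypothetical.

```typescript
type ParentMap = Record<string, string | null>;

// Hypothetical disease ontology: each term maps to its parent (null for a root).
const diseases: ParentMap = {
  cancer: null,
  "lung cancer": "cancer",
  "small cell lung cancer": "lung cancer",
  "breast cancer": "cancer",
};

// All ancestors of a term, nearest first; the term itself is excluded.
function ancestorsOf(term: string, parents: ParentMap): string[] {
  const result: string[] = [];
  let current: string | null = parents[term] ?? null;
  // the includes() check stops the walk if the data ever contains a cycle
  while (current !== null && !result.includes(current)) {
    result.push(current);
    current = parents[current] ?? null;
  }
  return result;
}

// Row matches when one of its terms is the search term or lies below it.
function matchIncludingChildren(rowTerms: string[], search: string, parents: ParentMap): boolean {
  return rowTerms.some((t) => t === search || ancestorsOf(t, parents).includes(search));
}

// Row matches when one of its terms is the search term or lies above it.
function matchIncludingParents(rowTerms: string[], search: string, parents: ParentMap): boolean {
  const above = ancestorsOf(search, parents);
  return rowTerms.some((t) => t === search || above.includes(t));
}

const row = ["cancer"]; // data entry: 'select all cancers'
console.log(matchIncludingChildren(row, "lung cancer", diseases)); // false
console.log(matchIncludingParents(row, "lung cancer", diseases)); // true
```

Under these assumptions, expanding downwards from 'lung cancer' never reaches 'cancer', so only the ancestor direction finds the 'all cancers' entry.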

@mswertz mswertz requested a review from chinook25 January 29, 2025 22:51
@mswertz mswertz marked this pull request as ready for review January 30, 2025 08:50
harmbrugge (Member) left a comment

I think it would be useful to go through the SqlQuery and the actual SQL function together, because it's hard for me to fully understand what is going on.

.name("MolgenisIsNotNullEnum")
.value(IsNullOrNotNull.NULL.name(), IsNullOrNotNull.NULL)
.value(IsNullOrNotNull.NOT_NULL.name(), IsNullOrNotNull.NOT_NULL)
.build();
private static GraphQLObjectType fileDownload =
Member

Just for my knowledge: does an enum give you a certain behaviour as a GraphQL type?

filterBuilder.field(
GraphQLInputObjectField.newInputObjectField()
.name(FILTER_IS)
.type(isNullOrNotNullEnum)
Member

Ah, here we go: the enum gives you fixed options.

Member

Wouldn't it make more sense to make this a boolean operator, _is_null, that takes true or false?

value.remove(FILTER_CONTAINS_ALL);
}

if (value.size() == 0) continue;
Member

This function is getting a bit chunky

String result =
execute("{Pet(filter:{tags:{_match_any_including_children:\"colors\"}}){name}}").toString();
assertTrue(result.contains("tom"));
assertFalse(result.contains("pooky")); // poor pooky has no color
Member

sad

column,
DSL.select(
field(
"\"MOLGENIS\".get_terms_including_children({0},{1},{2})",
Member

This is nice, you delegated it to the db
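Delegating the expansion to the database means the subtree is computed once per query instead of being sent by the client. The sketch below mimics that in TypeScript under stated assumptions: `termsIncludingChildren` is a hypothetical stand-in for what `get_terms_including_children` is presumed to do (its SQL body is not shown here), and the pets and tags are made-up data loosely modelled on the tom/pooky test above.

```typescript
type ParentMap = Record<string, string | null>;

// Hypothetical tag ontology: each term maps to its parent (null for a root).
const tags: ParentMap = {
  colors: null,
  red: "colors",
  green: "colors",
  mammals: null,
  cat: "mammals",
};

// Returns the given names plus all of their descendants.
function termsIncludingChildren(names: string[], parents: ParentMap): Set<string> {
  // Invert the parent pointers once so each tree level is a direct lookup.
  const children = new Map<string, string[]>();
  for (const [term, parent] of Object.entries(parents)) {
    if (parent === null) continue;
    const list = children.get(parent) ?? [];
    list.push(term);
    children.set(parent, list);
  }
  // Breadth-first walk from the requested names down to the leaves.
  const result = new Set<string>(names);
  const queue = names.slice();
  let term: string | undefined;
  while ((term = queue.shift()) !== undefined) {
    for (const child of children.get(term) ?? []) {
      if (!result.has(child)) {
        result.add(child);
        queue.push(child);
      }
    }
  }
  return result;
}

const pets = [
  { name: "tom", tags: ["red", "cat"] },
  { name: "pooky", tags: ["cat"] }, // no color tag
];

// Expand once, then test each row for overlap with the expanded set.
const colorTerms = termsIncludingChildren(["colors"], tags);
const matched = pets.filter((p) => p.tags.some((t) => colorTerms.has(t))).map((p) => p.name);
console.log(matched); // [ 'tom' ]
```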

.map(
value ->
condition(
"0 < ( SELECT COUNT(*) FROM unnest({1}) AS v WHERE to_tsquery({0}) @@ to_tsvector(v) )",
Member

I do not fully comprehend this
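Read literally, the condition unnests the array column into one row per element (`v`), counts the elements whose text-search vector matches the query, and is true when that count is above zero, i.e. when at least one element matches. The TypeScript below mirrors that structure; `naiveTextMatch` is a deliberately crude, hypothetical stand-in for `to_tsquery(...) @@ to_tsvector(...)` (no stemming, stop words or query operators), and all names are illustrative.

```typescript
// Crude stand-in for `to_tsquery(query) @@ to_tsvector(value)`: every query
// word must occur as a word of the value (no stemming, stop words or operators).
function naiveTextMatch(query: string, value: string): boolean {
  const words = new Set(value.toLowerCase().split(/\W+/).filter((w) => w.length > 0));
  return query
    .toLowerCase()
    .split(/\W+/)
    .filter((w) => w.length > 0)
    .every((w) => words.has(w));
}

// Literal reading of the SQL: count the matching array elements, then test 0 < count.
function anyElementMatchesByCount(query: string, values: string[]): boolean {
  const count = values.filter((v) => naiveTextMatch(query, v)).length;
  return 0 < count;
}

// Same result, but stops at the first matching element.
function anyElementMatches(query: string, values: string[]): boolean {
  return values.some((v) => naiveTextMatch(query, v));
}

const diagnoses = ["Lung cancer", "Asthma"];
console.log(anyElementMatches("cancer", diagnoses)); // true
console.log(anyElementMatches("diabetes", diagnoses)); // false
```

In PostgreSQL the same intent is often written with `EXISTS`, which can stop at the first matching element instead of counting all of them.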

chinook25 (Contributor) left a comment

Haven't gone through it all yet, but here are some comments on what I've found so far.

@@ -286,6 +287,7 @@ export default {
this.key++;
},
deselect(item: string) {
console.log("haat " + item);
Contributor

Don't forget to remove this.

filter[col.id] = { equals: conditions };
} else if (col.columnType.startsWith("ONTOLOGY")) {
filter[col.id] = {
match_including_children: conditions.map((term) => term.name),
Contributor

TS error, term is supposed to be a string, so term.name is not possible
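One way to resolve the type error, assuming `conditions` is a `string[]` of term names as the comment states, is to pass the array through unchanged. The `Column` and `Filter` types are hypothetical minimal shapes, and the first branch's condition is not visible in the diff, so this sketch reduces it to a plain `else`.

```typescript
// Hypothetical minimal shapes; the real component's types are richer.
interface Column {
  id: string;
  columnType: string;
}
type Filter = Record<string, Record<string, string[]>>;

function addFilter(filter: Filter, col: Column, conditions: string[]): void {
  if (col.columnType.startsWith("ONTOLOGY")) {
    // conditions already hold term names, so no `.map((term) => term.name)` is needed
    filter[col.id] = { match_including_children: conditions };
  } else {
    filter[col.id] = { equals: conditions };
  }
}

const filter: Filter = {};
addFilter(filter, { id: "tags", columnType: "ONTOLOGY_ARRAY" }, ["colors"]);
addFilter(filter, { id: "name", columnType: "STRING" }, ["tom"]);
console.log(JSON.stringify(filter));
// {"tags":{"match_including_children":["colors"]},"name":{"equals":["tom"]}}
```

If `conditions` were instead meant to hold term objects, the alternative fix is to type it accordingly (for example `{ name: string }[]`) and keep the `.map`.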

@@ -577,6 +637,51 @@ public static FilterBean[] convertMapToFilterArray(
+ " unknown in table "
+ table.getTableName());
Column c = optional.get();
Map value = (Map) entry.getValue();
Contributor

This function is quite hard to read due to its logical complexity. I suggest moving the contents of the outer ifs (the ones starting on line 595) into separate functions to make it clearer what is going on.
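A sketch of the suggested restructuring, written in TypeScript rather than the Java of `convertMapToFilterArray`: each operator gets a small handler in a lookup table, so the outer function only dispatches. The `FilterSpec` type, the uppercase operator constants and the handler names are hypothetical, not the project's actual classes.

```typescript
// Hypothetical, simplified stand-in for the filter objects built per column.
type FilterSpec = { column: string; operator: string; values: unknown[] };
type Handler = (column: string, arg: unknown) => FilterSpec;

// Normalise a single value or a list of values to an array.
function toArray(arg: unknown): unknown[] {
  return Array.isArray(arg) ? arg : [arg];
}

// Build a handler that wraps the argument under a fixed operator name.
function simple(operator: string): Handler {
  return (column, arg) => ({ column, operator, values: toArray(arg) });
}

// One entry per supported operator; adding one does not touch the dispatch loop.
const handlers: Record<string, Handler> = {
  equals: simple("EQUALS"),
  match_including_children: simple("MATCH_INCLUDING_CHILDREN"),
  match_including_parents: simple("MATCH_INCLUDING_PARENTS"),
  contains_any: simple("CONTAINS_ANY"),
  contains_all: simple("CONTAINS_ALL"),
};

// The outer function only dispatches; unknown operators fail loudly.
function convertColumnFilter(column: string, ops: Record<string, unknown>): FilterSpec[] {
  return Object.entries(ops).map(([op, arg]) => {
    const handler = handlers[op];
    if (handler === undefined) {
      throw new Error(`unknown filter operator '${op}' on column '${column}'`);
    }
    return handler(column, arg);
  });
}
```

Operators that need more than wrapping their argument (for example the null checks) would get their own handler function instead of `simple`.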

}

@Test
void testNullAndNotNullt() throws IOException {
Contributor

Nullt?

.as(name(convertToPascalCase(select.getColumn()))));
}

// asemble final query
Contributor

assemble

Comment on lines +318 to +320
SelectConnectByStep<org.jooq.Record> from =
fromQuery(table, List.of(asterisk()), column, tableAlias, subAlias, filters, searchTerms);
from = applyLimitOffsetOrderBy(table, select, from, subAlias);
Contributor

Having a var immediately overwritten by a modified version of itself feels a bit weird. I suggest first assigning an intermediate variable so that from is not overwritten.

3 participants