Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow computing the empty transitive path #1800

Closed
wants to merge 2 commits into from

Conversation

RobinTF
Copy link
Collaborator

@RobinTF RobinTF commented Feb 12, 2025

It seems like the code was already there, so all this PR does is removing the check.

@sparql-conformance
Copy link

Conformance check passed ✅

Test Status Changes 📊

Number of Tests Previous Status Current Status
3 Failed Passed

Details: https://qlever.cs.uni-freiburg.de/sparql-conformance-ui?cur=f21e5611708a6b9e2f2ce86f217792b016173c44&prev=2fad76532e66d45fb9fd7063c6c563daa0eeea42

Copy link

codecov bot commented Feb 12, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 90.04%. Comparing base (2fad765) to head (f21e561).
Report is 7 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1800      +/-   ##
==========================================
- Coverage   90.05%   90.04%   -0.01%     
==========================================
  Files         396      396              
  Lines       37928    37921       -7     
  Branches     4262     4260       -2     
==========================================
- Hits        34156    34146      -10     
- Misses       2477     2481       +4     
+ Partials     1295     1294       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@hannahbast
Copy link
Member

@RobinTF Awesome that you are looking into this. Will this also remove false negative "This query might have to evaluate the empty path" exception, like for the following query?

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT * WHERE {
  wd:Q11629 wdt:P279/(wdt:P279* | wdt:P31*) | wdt:P31/(wdt:P279* | wdt:P31*) ?type .
}

https://qlever.cs.uni-freiburg.de/wikidata/Zb82ho

@RobinTF
Copy link
Collaborator Author

RobinTF commented Feb 12, 2025

@hannahbast Depends on the query planning, I haven't tried any more complex queries. But this is the only "This query might have to evaluate the empty path" exception there is, so you definitely won't get this exact exception any more.
All I really did is analyse the code and I came to the conclusion that everything should just work if I remove the check, and my tests all did work as expected.

Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a substantially differen implementation.

Comment on lines -158 to -164
if (minDist_ == 0 && !isBoundOrId() && lhs_.isVariable() &&
rhs_.isVariable()) {
AD_THROW(
"This query might have to evaluate the empty path, which is "
"currently "
"not supported");
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately this is not quite correct.
The empty path must in principle contain everything, not only entities that are in some way contained in the Path with the *. This means, that we have to fix this in the TurtleParser.

@hannahbast
Copy link
Member

hannahbast commented Feb 12, 2025

@RobinTF OK, I just tried this on Wikidata and see what's happening. Let's take a simpler example query (prefixes omitted for brevity):

SELECT * WHERE {
  wd:Q11629 wdt:P31/(wdt:P279*|wdt:P31*) ?class
}

With this PR, the query no longer throws the "empty path" exception, but it also never finishes. The problem is that in the current implementation, QLever evaluates this just as it would evaluate the following query:

SELECT * WHERE {
  wd:Q11629 wdt:P31 ?tmp .
  { ?tmp wdt:P279* ?class } UNION { ?tmp wdt:P31* ?class }
}

But the result for each of the two { ... } is a table with one row per subject, for all subjects, that is of size 2.2 B for Wikidata.

That would eventually give the correct result but takes forever and is obviously not how the query should be evaluated.

@RobinTF
Copy link
Collaborator Author

RobinTF commented Feb 12, 2025

@hannahbast Can you clarify? The issue you described seems just like poor query planning to me, where the query planner performs the join operation in a very inefficient position. If you replace the asterisk in your query with + you get the very same suboptimal query tree, the only difference is that it contains fewer results, so it doesn't crash. This doesn't seem to be an issue with the empty path if you ask me.

@hannahbast
Copy link
Member

@RobinTF My point is the following:

  1. Almost all of the real-world queries, for which QLever currently throws the "This query might have to evaluate the empty path" exception, are of the kind of my examples above.

  2. Queries where you actually have to evaluate the empty path are very rare in practice. An simple example with a rather useless result would be SELECT * WHERE { ?x wdt:P31* ?y }. I have to think about a simple useful example.

  3. Therefore, a PR that correctly evaluates the empty path but does not fix the query planning issue, would not make things better and rather worse (because queries that threw an exception before would then take forever without users understanding why).

  4. I therefore think that the two issues (evaluating the empty path when there is no other way and avoiding the evaluation of the empty path unless there is no other way) should either be handled by the same change or the latter before the former.

@RobinTF
Copy link
Collaborator Author

RobinTF commented Feb 12, 2025

@hannahbast Thank you for clarifying. I wanted to have a look at implementing negated paths anyways, so I spent some time today thinking about query planning of this particular case. So I might just start with the issue at hand first.
A small thought: If sort supported on-disk laziness (I talked with @joka921 briefly about this), then it would still take ages, but it would eventually finish

@RobinTF
Copy link
Collaborator Author

RobinTF commented Feb 14, 2025

This needs some other stuff to happen beforehand

@RobinTF RobinTF closed this Feb 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants