Make empty path return empty set when the graph is empty #1806

Closed

RobinTF
Collaborator

@RobinTF RobinTF commented Feb 14, 2025

When the graph is empty, the empty path should return an empty set, even when one of the two sides has a binding. For example, consider the following query on an empty graph. So far, QLever would return the binding { "?a": 1, "?b": 1 } because the left side has the binding { "?a": 1 }. But the correct result is an empty solution set because the graph is empty. This is now fixed.

SELECT * WHERE {
  VALUES ?a { 1 }
  ?a a? ?b
}

@hannahbast
Member

@RobinTF Thanks, Robin, for looking into this. I don't quite understand the description. What exactly is this PR supposed to do?

@RobinTF
Collaborator Author

RobinTF commented Feb 14, 2025

@hannahbast It's a bit hard to describe because it's a very rare case. It should fix a compliance test (which we will hopefully see once the builds finish).
Imagine this query (which is similar to the compliance test) being run on an empty graph:

SELECT * WHERE {
  VALUES ?a { 1 }
  ?a a? ?b
}

Because the query is run on an empty graph, ?a a ?b doesn't match anything, so the result should be empty too. This isn't the case currently, because ?a is "bound", so we use it as a starting point for TransitivePath without checking whether it is contained in the graph. This check is rather expensive, i.e. linear in time for every value of ?a (we could of course make it more efficient at the cost of higher memory usage), so I tried to make sure it is run only when nothing else we already know tells us that the value is in the graph.

I hope this clears things up a bit.

EDIT:
Currently QLever returns the result 1 1; with this change the result is empty.
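
The difference can be sketched in a few lines of C++. This is a simplified illustration under stated assumptions, not QLever's code: `Id`, `Triple`, `Graph`, `occursInGraph`, and `emptyPathMatches` are hypothetical names, and the graph is modeled as a plain vector of triples so that the membership check is a linear scan per bound value, as described above.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins; these are not QLever's actual types.
using Id = std::int64_t;
struct Triple {
  Id subject;
  Id predicate;
  Id object;
};
using Graph = std::vector<Triple>;

// Linear scan: does `value` occur as subject or object of any triple?
bool occursInGraph(const Graph& graph, Id value) {
  for (const Triple& t : graph) {
    if (t.subject == value || t.object == value) {
      return true;
    }
  }
  return false;
}

// Zero-length matches (x, x) for each bound start value.
// With `checkGraph == false`, every bound value is emitted unconditionally
// (the behaviour described above as the problem). With `checkGraph == true`,
// each value costs one linear scan over the graph.
std::vector<std::pair<Id, Id>> emptyPathMatches(
    const Graph& graph, const std::vector<Id>& boundStarts, bool checkGraph) {
  std::vector<std::pair<Id, Id>> result;
  for (Id start : boundStarts) {
    if (!checkGraph || occursInGraph(graph, start)) {
      result.emplace_back(start, start);
    }
  }
  return result;
}
```

With an empty graph and the single bound value 1, the unchecked variant yields the pair (1, 1), while the checked variant yields nothing.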

codecov bot commented Feb 14, 2025

Codecov Report

Attention: Patch coverage is 96.96970% with 1 line in your changes missing coverage. Please review.

Project coverage is 90.06%. Comparing base (1570033) to head (418c92e).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/engine/TransitivePathImpl.h | 93.75% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1806      +/-   ##
==========================================
+ Coverage   90.02%   90.06%   +0.04%     
==========================================
  Files         396      396              
  Lines       37974    37992      +18     
  Branches     4262     4267       +5     
==========================================
+ Hits        34185    34217      +32     
+ Misses       2493     2491       -2     
+ Partials     1296     1284      -12     


@hannahbast hannahbast changed the title from "Prevent empty path from matching when not contained in graph" to "Make empty path return empty set when the graph is empty" Feb 14, 2025
@hannahbast
Member

hannahbast commented Feb 14, 2025

@RobinTF Thanks for the clarification, I have revised the title and description accordingly. I have a follow-up question:

What if the graph is not empty but does not contain the binding (or bindings) for ?a in your example above? Should the result then still be empty? If yes (which I suppose), I should update the title and description accordingly.

And yet more specifically: If some of the bindings are in the graph and some are not, should the solution only contain those bindings that are in the graph?

@RobinTF
Collaborator Author

RobinTF commented Feb 14, 2025

> @RobinTF Thanks for the clarification, I have revised the title and description accordingly. I have a follow-up question:
>
> What if the graph is not empty but does not contain the binding (or bindings) for ?a in your example above? Should the result then still be empty? If yes (which I suppose), I should update the title and description accordingly.
>
> And yet more specifically: If some of the bindings are in the graph and some are not, should the solution only contain those bindings that are in the graph?

Yes, that's how I see it too. When there is a statement like ?s <some-iri>? ?o or ?s <some-iri>* ?o, the result can only ever contain values for ?s and ?o that are part of at least one triple with <some-iri> as predicate. As I understand it, values from unrelated triples are not supposed to be included.

EDIT: To clarify, it is not important where in the triple (subject or object position) the value is found for the empty path, as long as it is somewhere.

maxDist_));
candidates.push_back(makeTransitivePath(getExecutionContext(),
alternativeSubtree, lhs, rhs,
minDist_, maxDist_, useBinSearch));
Collaborator Author

This fixes a bug in the unit tests: both implementations are supposed to get tested, but for bound variables the implementation got reset to the configured default regardless of the test configuration.

@hannahbast
Member

hannahbast commented Feb 14, 2025

@RobinTF OK, that's a completely new spin now. Are you saying that matches for the empty path have to be a subject or object of the underlying predicate or property path? That's not how Johannes and I understood this so far, but it's quite possible that we were wrong. Can you pinpoint where exactly in the standard you are inferring this from?

@sparql-conformance

Conformance check passed ✅

Test Status Changes 📊

| Number of Tests | Previous Status | Current Status |
|---|---|---|
| 1 | Failed | Passed |

Details: https://qlever.cs.uni-freiburg.de/sparql-conformance-ui?cur=418c92eba59fd9f9c895ec83fb3ba40a5f174fdf&prev=1570033d07eb625dd3c2624c866eeb241f8639ef

@RobinTF
Collaborator Author

RobinTF commented Feb 14, 2025

@hannahbast I had a look at some of the compliance tests and you're right. The values don't have to be related to the predicates. But this means that VALUES has a somewhat arbitrary limitation, namely that it doesn't count for property paths. In this case this has to be solved via query planning, by making sure the value is joined with the index at least once.
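
A minimal sketch of what joining the value with the index could amount to for the zero-length part of the path. It assumes that a zero-length path between two variables ranges over every term that occurs as subject or object in the graph, whatever the predicate, which is how the SPARQL 1.1 definition of zero-length paths appears to read. The names `Id`, `Triple`, `nodesOf`, and `joinValuesWithZeroLengthPath` are hypothetical and not taken from QLever, and the one-step part of `p?` is left out.

```cpp
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins; these are not QLever's actual types.
using Id = std::int64_t;
struct Triple {
  Id subject;
  Id predicate;
  Id object;
};

// All terms that occur as subject or object of any triple,
// independent of the predicate.
std::set<Id> nodesOf(const std::vector<Triple>& graph) {
  std::set<Id> nodes;
  for (const Triple& t : graph) {
    nodes.insert(t.subject);
    nodes.insert(t.object);
  }
  return nodes;
}

// Zero-length part of `?a p? ?b`, joined with the VALUES bindings for ?a:
// a binding survives only if it is a node of the graph.
std::vector<std::pair<Id, Id>> joinValuesWithZeroLengthPath(
    const std::vector<Triple>& graph, const std::vector<Id>& values) {
  const std::set<Id> nodes = nodesOf(graph);
  std::vector<std::pair<Id, Id>> result;
  for (Id value : values) {
    if (nodes.count(value) > 0) {
      result.emplace_back(value, value);
    }
  }
  return result;
}
```

On an empty graph the join is empty for any VALUES clause. On a graph with the single triple (1, 99, 2), the bindings 1 and 3 produce only the pair (1, 1), even though predicate 99 has nothing to do with the path.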

@RobinTF RobinTF closed this Feb 14, 2025
@RobinTF RobinTF deleted the fix-empty-path-pattern-match branch February 14, 2025 18:27
@hannahbast
Member

@RobinTF I just added #1809 to document what needs to be done for a correct implementation of the empty path. Please have a look and let me know if you have questions or if you think that something is wrong.
