-
Notifications
You must be signed in to change notification settings - Fork 659
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-2924: Lateral - fixed injection for tables and enhanced QueryExec API for easier testing. #2925
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -18,23 +18,21 @@ | |
|
||
package org.apache.jena.sparql.algebra.table ; | ||
|
||
import java.util.Collections; | ||
import java.util.List ; | ||
|
||
import org.apache.jena.sparql.ARQException ; | ||
import org.apache.jena.sparql.core.Var ; | ||
import org.apache.jena.sparql.engine.binding.Binding ; | ||
|
||
/** Immutable table. */ | ||
public class TableData extends TableN { | ||
public TableData(List<Var> variables, List<Binding> rows) { | ||
super(variables, rows) ; | ||
super(Collections.unmodifiableList(variables), Collections.unmodifiableList(rows)) ; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. TableData appears to be intended as immutable but this was so far not enforced - one could call getRows() and modify them. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The constructor already existed. The wrappers prevent mutation via
its a step closer towards preventing incorrect use of |
||
} | ||
|
||
@Override | ||
public void addBinding(Binding binding) { | ||
throw new ARQException("Can't add bindings to an existing data table") ; | ||
} | ||
|
||
public List<Binding> getRows() { | ||
return rows ; | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -202,4 +202,19 @@ public static int hashCode(Binding bind) { | |
} | ||
return hash; | ||
} | ||
|
||
@Override | ||
public Binding detach() { | ||
Binding newParent = (parent == null) ? null : parent.detach(); | ||
Binding result = (newParent == parent) | ||
? detachWithOriginalParent() | ||
: detachWithNewParent(newParent); | ||
return result; | ||
} | ||
|
||
protected Binding detachWithOriginalParent() { | ||
return this; | ||
} | ||
|
||
protected abstract Binding detachWithNewParent(Binding newParent); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This would be nicer in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was also thinking that BindingFactory could take care of making bindings independent. But then again, why should BindingFactory have special logic for BindingTDB instances? In principle, |
||
} |
Original file line number | Diff line number | Diff line change | ||||||
---|---|---|---|---|---|---|---|---|
|
@@ -19,6 +19,8 @@ | |||||||
package org.apache.jena.sparql.engine.iterator; | ||||||||
|
||||||||
import java.util.ArrayList; | ||||||||
import java.util.Collection; | ||||||||
import java.util.HashSet; | ||||||||
import java.util.LinkedHashSet; | ||||||||
import java.util.List; | ||||||||
import java.util.Set; | ||||||||
|
@@ -27,12 +29,13 @@ | |||||||
import org.apache.jena.atlas.lib.SetUtils; | ||||||||
import org.apache.jena.graph.Node; | ||||||||
import org.apache.jena.graph.Triple; | ||||||||
import org.apache.jena.sparql.algebra.Algebra; | ||||||||
import org.apache.jena.sparql.algebra.Op; | ||||||||
import org.apache.jena.sparql.algebra.Table; | ||||||||
import org.apache.jena.sparql.algebra.TransformCopy; | ||||||||
import org.apache.jena.sparql.algebra.op.*; | ||||||||
import org.apache.jena.sparql.algebra.table.Table1; | ||||||||
import org.apache.jena.sparql.algebra.table.TableN; | ||||||||
import org.apache.jena.sparql.algebra.table.TableData; | ||||||||
import org.apache.jena.sparql.core.*; | ||||||||
import org.apache.jena.sparql.engine.ExecutionContext; | ||||||||
import org.apache.jena.sparql.engine.QueryIterator; | ||||||||
|
@@ -292,17 +295,33 @@ public Op transform(OpTable opTable) { | |||||||
// By the assignment restriction, the binding only needs to be added to each row of the table. | ||||||||
Table table = opTable.getTable(); | ||||||||
// Table vars. | ||||||||
List<Var> vars = new ArrayList<>(table.getVars()); | ||||||||
binding.vars().forEachRemaining(vars::add); | ||||||||
TableN table2 = new TableN(vars); | ||||||||
List<Var> tableVars = table.getVars(); | ||||||||
List<Var> vars = new ArrayList<>(tableVars); | ||||||||
|
||||||||
// Track variables that appear both in the table and the binding. | ||||||||
List<Var> commonVars = new ArrayList<>(); | ||||||||
|
||||||||
// Index variables in a set if there are more than a few of them. | ||||||||
Collection<Var> tableVarsIndex = tableVars.size() > 4 ? new HashSet<>(tableVars) : tableVars; | ||||||||
binding.vars().forEachRemaining(v -> { | ||||||||
if (tableVarsIndex.contains(v)) { | ||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The table vars can't contain a binding var (or am I missing something?). That case is caught by the text https://github.com/w3c/sparql-dev/blob/main/SEP/SEP-0007/sep-0007.md It's tested in (I removed the If so, it simplifies to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. (See comment on the line containing |
||||||||
commonVars.add(v); | ||||||||
} else { | ||||||||
vars.add(v); | ||||||||
} | ||||||||
}); | ||||||||
|
||||||||
List<Binding> bindings = new ArrayList<>(table.size()); | ||||||||
BindingBuilder builder = BindingFactory.builder(); | ||||||||
table.iterator(null).forEachRemaining(row->{ | ||||||||
builder.reset(); | ||||||||
builder.addAll(row); | ||||||||
builder.addAll(binding); | ||||||||
table2.addBinding(builder.build()); | ||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. addBinding checks whether the table's variables need to be updated. |
||||||||
if (Algebra.compatible(row, binding, commonVars.iterator())) { | ||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If there is a variable in common from substitution it must be the same term. So this is a contains relationship. (Given the comments above there may be a simpler way.) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There is potentially another can of worms I was about to open w.r.t. sep7: Perhaps its worthy of opening an issue and discussion on sparql-dev. The reason I added the compatibility check was because I had the following query in mind when writing the tests, which for reasons I do not yet understand is currently prohibited. SELECT * {
VALUES ?department {
<urn:dept1>
<urn:dept2>
}
LATERAL {
SELECT * {
VALUES (?department ?employee) {
( <urn:dept1> <urn:person1> )
( <urn:dept1> <urn:person2> )
( <urn:dept2> <urn:person1> )
}
} ORDER BY ?employee LIMIT 1
}
} My expectation:
Actual result: There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There must be a problem with There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Hm, depending on the viewpoint it is the effective query after substitution, but before that, it's I think the scope error is in-line with the SEP7 statement:
But I think that this is too restrictive, because it prevents several useful cases for inlining data using VALUES blocks. IMO if substitution leads to incompatible bindings then they should just be discarded as usual - no? Perhaps you have corner cases in mind where under this logic the substitution could become ambiguous and that this might be the reason for the restrictions. Removing the restrictions on syntax-lateral-bad-{04, 05, 06, 08}.arq For example, SELECT * {
?s ?p ?o
LATERAL { BIND( 123 AS ?o) }
} This is certainly not an efficient way to write a query, but I don't understand why the query should be syntactically illegal in the first place. To me it looks like it would only retain bindings where ?o = 123. So it would be the task of an optimizer to rewrite the When disabling the SyntaxVarScope check for ElementBind, each ?o gets injected into the unit table beneath There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
My bad - I was thinking about For
That would turn We have to have a restriction of some kind. The SEP-0007 design makes it a compile-time condition (test once per query). In general, some system have optimizations that on the static analysis of the query. Making the RHS of a LATERAL have runtime variance would interfere. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sorry, I have to get Jena 5.3.0 out without this. This issue is important and something that is, I hope, going to get more importance in the SPARQL community. WG sparql-query PR 177 at least gets the essential machinery into the resp with the WG contribution licnesing. A small step, but a step. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Well, the tradeoff to what extend existing implementations need to drive restrictions of new features in a standard. For the sake of closing the issue, should I modify/simplify the logic in QueryIterLateral or should I leave the slightly more general approach as it is for now? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
No worries, whenever it's ready. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Yes, please.
SHACL WG has started and SHACL Values Insertion (which is similar but different) will need SPARQL syntax level solution, which implies static rules for scoping. As will parameterized query. I hope, in spirit, these different situations have a related approach. |
||||||||
builder.reset(); | ||||||||
builder.addAll(row); | ||||||||
binding.forEach(builder::set); | ||||||||
bindings.add(builder.build()); | ||||||||
} | ||||||||
}); | ||||||||
return OpTable.create(table2); | ||||||||
return OpTable.create(new TableData(vars, bindings)); | ||||||||
} | ||||||||
|
||||||||
private Triple applyReplacement(Triple triple, Function<Var, Node> replacement) { | ||||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what going on. Could you explain this please?
This seems to duplicate the intent of
RowSet.materialize
andRowSet.rewindable
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The difference (to my understanding) is that a Table acts as a Collection and a RowSet as an Iterator.
A Table can thus be seen as a factory for RowSets via
Table.toRowSet()
- each obtained RowSet is an independent iterator over the Table.My qualms with RowSet.materialize/rewindable is that these methods operate on the iterator level; it seemed clean to me to add Table as a collection of bindings.