GH-2924: Lateral - fixed injection for tables and enhanced QueryExec API for easier testing. #2925

Open
wants to merge 1 commit into
base: main
4 changes: 4 additions & 0 deletions jena-arq/src/main/java/org/apache/jena/query/ResultSet.java
Original file line number Diff line number Diff line change
Expand Up @@ -103,4 +103,8 @@ public default ResultSet materialise() {
}

public void close();

default RowSet asRowSet() {
return RowSet.adapt(this);
}
}
19 changes: 12 additions & 7 deletions jena-arq/src/main/java/org/apache/jena/sparql/algebra/Algebra.java
Original file line number Diff line number Diff line change
Expand Up @@ -161,22 +161,27 @@ public static Binding merge(Binding bindingLeft, Binding bindingRight) {

// If compatible, merge. Iterate over variables in right but not in left.
BindingBuilder b = Binding.builder(bindingLeft);
- for ( Iterator<Var> vIter = bindingRight.vars() ; vIter.hasNext() ; ) {
- Var v = vIter.next();
- Node n = bindingRight.get(v);
+ bindingRight.forEach((v, n) -> {
  if ( !bindingLeft.contains(v) )
  b.add(v, n);
- }
+ });
  return b.build();
  }

  public static boolean compatible(Binding bindingLeft, Binding bindingRight) {
  // Test to see if compatible: Iterate over variables in left
- for ( Iterator<Var> vIter = bindingLeft.vars() ; vIter.hasNext() ; ) {
- Var v = vIter.next();
+ return compatible(bindingLeft, bindingRight, bindingLeft.vars());
+ }
+
+ /** Test to see if bindings are compatible for all variables of the provided iterator. */
+ public static boolean compatible(Binding bindingLeft, Binding bindingRight, Iterator<Var> vars) {
+ while (vars.hasNext() ) {
+ Var v = vars.next();
  Node nLeft = bindingLeft.get(v);
- Node nRight = bindingRight.get(v);
  if ( nLeft == null )
  continue;
+
+ Node nRight = bindingRight.get(v);
if ( nRight != null && !nRight.equals(nLeft) )
return false;
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@

package org.apache.jena.sparql.algebra;

import java.util.ArrayList ;
import java.util.List ;

import org.apache.jena.graph.Node ;
Expand All @@ -27,31 +28,45 @@
import org.apache.jena.sparql.algebra.table.TableUnit ;
import org.apache.jena.sparql.core.Var ;
import org.apache.jena.sparql.engine.QueryIterator ;
import org.apache.jena.sparql.engine.binding.Binding ;
import org.apache.jena.sparql.exec.RowSet ;

public class TableFactory
{
public static Table createUnit()
{ return new TableUnit() ; }

public static Table createEmpty()
{ return new TableEmpty() ; }

public static Table create()
{ return new TableN() ; }

public static Table create(List<Var> vars)
{ return new TableN(vars) ; }

public static Table create(QueryIterator queryIterator)
- {
+ {
if ( queryIterator.isJoinIdentity() ) {
queryIterator.close();
return createUnit() ;
}

return new TableN(queryIterator) ;
}

public static Table create(Var var, Node value)
{ return new Table1(var, value) ; }

/** Creates a mutable table from the detached bindings of the row set. */
public static Table create(RowSet rs)
Member

I don't understand what's going on. Could you explain this please?

This seems to duplicate the intent of RowSet.materialize and RowSet.rewindable.

Contributor Author (@Aklakan, Jan 14, 2025)

The difference (to my understanding) is that a Table acts as a Collection and a RowSet as an Iterator.
A Table can thus be seen as a factory for RowSets via Table.toRowSet() - each obtained RowSet is an independent iterator over the Table.
My qualm with RowSet.materialize/rewindable is that these methods operate on the iterator level; it seemed clean to me to add Table as a collection of bindings.
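
The collection-versus-iterator distinction described above can be sketched in plain Java. This is only an illustration: `MiniTable` and its `rows()` method are hypothetical stand-ins, not Jena's `Table` or `RowSet`, and rows are simplified to strings.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a table: a collection of rows that can be iterated repeatedly.
final class MiniTable {
    private final List<String> rows;

    MiniTable(Iterator<String> source) {
        // Drain the one-shot iterator into a list (similar in spirit to materializing).
        List<String> copy = new ArrayList<>();
        source.forEachRemaining(copy::add);
        this.rows = copy;
    }

    // Every call returns a fresh, independent iterator over the same rows.
    Iterator<String> rows() {
        return rows.iterator();
    }

    // Consumes the given iterator and reports how many rows it produced.
    static int count(Iterator<String> it) {
        int n = 0;
        while (it.hasNext()) {
            it.next();
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        Iterator<String> oneShot = List.of("row1", "row2").iterator();
        MiniTable table = new MiniTable(oneShot);
        System.out.println(count(table.rows())); // first pass
        System.out.println(count(table.rows())); // second pass sees the same rows
        System.out.println(oneShot.hasNext());   // the source iterator is exhausted
    }
}
```

The point of the sketch is that the collection owns the data, so callers never need to coordinate who rewinds or resets a shared iterator.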

{
List<Var> vars = new ArrayList<>(rs.getResultVars());
List<Binding> list = new ArrayList<>();
rs.forEach(row -> {
Binding b = row.detach();
list.add(b);
});
return new TableN(vars, list);
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -18,23 +18,21 @@

package org.apache.jena.sparql.algebra.table ;

import java.util.Collections;
import java.util.List ;

import org.apache.jena.sparql.ARQException ;
import org.apache.jena.sparql.core.Var ;
import org.apache.jena.sparql.engine.binding.Binding ;

/** Immutable table. */
public class TableData extends TableN {
public TableData(List<Var> variables, List<Binding> rows) {
- super(variables, rows) ;
+ super(Collections.unmodifiableList(variables), Collections.unmodifiableList(rows)) ;
Contributor Author

TableData appears to be intended as immutable but this was so far not enforced - one could call getRows() and modify them.

Member

Collections.unmodifiableList only puts a wrapper around the base list. The base variables and rows can be modified.

Contributor Author (@Aklakan, Jan 14, 2025)

The constructor already existed. The wrappers prevent mutation via TableData.getRows().add(), which wasn't the case before. It's not perfect, but given your comment

> It should be "immutable once built"

it's a step closer towards preventing incorrect use of TableData.
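
The difference between a wrapper and a copy, as discussed in this thread, can be checked with the JDK collections alone. The sketch below is illustrative and not part of the PR; `UnmodifiableViewDemo` and `rejectsAdd` are hypothetical names. `List.copyOf` is shown as one possible alternative; note that it rejects null elements.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class UnmodifiableViewDemo {
    // Returns true if adding through the given list is rejected.
    static boolean rejectsAdd(List<String> list) {
        try {
            list.add("x");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> base = new ArrayList<>(List.of("row1"));
        List<String> view = Collections.unmodifiableList(base); // wrapper around base
        List<String> copy = List.copyOf(base);                   // independent snapshot

        System.out.println(rejectsAdd(view)); // true: mutation via the wrapper fails
        base.add("row2");                      // mutation via the base list still works
        System.out.println(view.size());       // 2: the view reflects the change
        System.out.println(copy.size());       // 1: the snapshot does not
    }
}
```

A defensive copy costs one pass over the data per table, whereas the wrapper is free but only protects against callers that go through the table's accessors.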

}

@Override
public void addBinding(Binding binding) {
throw new ARQException("Can't add bindings to an existing data table") ;
}

- public List<Binding> getRows() {
- return rows ;
- }
}
Original file line number Diff line number Diff line change
Expand Up @@ -21,13 +21,15 @@
import java.util.ArrayList ;
import java.util.Iterator ;
import java.util.List ;
import java.util.Objects ;

import org.apache.jena.sparql.core.Var ;
import org.apache.jena.sparql.engine.ExecutionContext ;
import org.apache.jena.sparql.engine.QueryIterator ;
import org.apache.jena.sparql.engine.binding.Binding ;
import org.apache.jena.sparql.engine.iterator.QueryIterPlainWrapper ;

/** Mutable table. */
public class TableN extends TableBase {
protected List<Binding> rows = new ArrayList<>() ;
protected List<Var> vars = new ArrayList<>() ;
Expand All @@ -48,16 +50,13 @@ public TableN(QueryIterator qIter) {
materialize(qIter) ;
}

- protected TableN(List<Var> variables, List<Binding> rows) {
- this.vars = variables ;
- this.rows = rows ;
+ public TableN(List<Var> variables, List<Binding> rows) {
+ this.vars = Objects.requireNonNull(variables) ;
+ this.rows = Objects.requireNonNull(rows) ;
  }

  private void materialize(QueryIterator qIter) {
- while (qIter.hasNext()) {
- Binding binding = qIter.nextBinding() ;
- addBinding(binding) ;
- }
+ qIter.forEachRemaining(this::addBinding);
qIter.close() ;
}

Expand Down Expand Up @@ -105,4 +104,8 @@ public List<String> getVarNames() {
public List<Var> getVars() {
return vars ;
}

public List<Binding> getRows() {
return rows;
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -90,4 +90,11 @@ public default boolean contains(String varName) {

@Override
public boolean equals(Object other);

/**
* Returns a binding which is guaranteed to be independent of
* any resources such as an ongoing query execution or a disk-based dataset.
* May return itself if it is already detached.
*/
public Binding detach();
}
Original file line number Diff line number Diff line change
Expand Up @@ -51,4 +51,9 @@ protected void forEach1(BiConsumer<Var, Node> action) { }

@Override
protected Node get1(Var var) { return null; }

@Override
protected Binding detachWithNewParent(Binding newParent) {
return new Binding0(newParent);
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -70,4 +70,9 @@ protected Node get1(Var v) {
return value;
return null;
}

@Override
protected Binding detachWithNewParent(Binding newParent) {
return new Binding1(newParent, var, value);
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -77,4 +77,9 @@ protected Node get1(Var v)
return value2;
return null;
}

@Override
protected Binding detachWithNewParent(Binding newParent) {
return new Binding2(newParent, var1, value1, var2, value2);
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -132,4 +132,9 @@ protected Node get1(Var var) {

return null;
}

@Override
protected Binding detachWithNewParent(Binding newParent) {
return new Binding3(newParent, var1, value1, var2, value2, var3, value3);
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -154,4 +154,9 @@ protected Node get1(Var var) {

return null;
}

@Override
protected Binding detachWithNewParent(Binding newParent) {
return new Binding4(newParent, var1, value1, var2, value2, var3, value3, var4, value4);
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -202,4 +202,19 @@ public static int hashCode(Binding bind) {
}
return hash;
}

@Override
public Binding detach() {
Binding newParent = (parent == null) ? null : parent.detach();
Binding result = (newParent == parent)
? detachWithOriginalParent()
: detachWithNewParent(newParent);
return result;
}

protected Binding detachWithOriginalParent() {
return this;
}

protected abstract Binding detachWithNewParent(Binding newParent);
Member

This would be nicer in BindingFactory ... if we were using Java21 and could use switch patterns. Oh well, another time.

Contributor Author (@Aklakan, Jan 14, 2025)

I was also thinking that BindingFactory could take care of making bindings independent. But then again, why should BindingFactory have special logic for BindingTDB instances? In principle, BindingTDB.detach could be implemented as just filling the already present (but unused) cache - so no copy would be needed then.
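
The detach pattern in this diff (detach the parent first, then reuse the current instance when nothing above it changed) can be sketched with plain Java types. All names below (`DetachDemo`, `Row`, `ResourceRow`) are hypothetical and only mimic the shape of the PR's `detach()` / `detachWithNewParent()` split; they are not Jena classes, and `ResourceRow` merely stands in for a binding tied to external storage.

```java
import java.util.Map;

final class DetachDemo {
    // Hypothetical miniature of a binding with an optional parent.
    static class Row {
        final Row parent;
        final Map<String, String> own;

        Row(Row parent, Map<String, String> own) {
            this.parent = parent;
            this.own = own;
        }

        // Detach the parent chain first; reuse this instance if the parent is unchanged.
        Row detach() {
            Row newParent = (parent == null) ? null : parent.detach();
            return (newParent == parent) ? this : new Row(newParent, own);
        }
    }

    // Hypothetical row backed by an external resource; detaching always copies it.
    static final class ResourceRow extends Row {
        ResourceRow(Map<String, String> own) {
            super(null, own);
        }

        @Override
        Row detach() {
            return new Row(null, Map.copyOf(own));
        }
    }

    public static void main(String[] args) {
        Row plain = new Row(new Row(null, Map.of("a", "1")), Map.of("b", "2"));
        System.out.println(plain.detach() == plain); // no copy needed

        Row child = new Row(new ResourceRow(Map.of("a", "1")), Map.of("b", "2"));
        Row detached = child.detach();
        System.out.println(detached != child);                      // rebuilt with a new parent
        System.out.println(detached.parent instanceof ResourceRow); // parent is now a plain Row
    }
}
```

Keeping the decision inside each row type, rather than in a central factory, means the factory does not need to know which row types are resource-backed.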

}
Original file line number Diff line number Diff line change
Expand Up @@ -61,4 +61,9 @@ protected int size1() {
protected boolean isEmpty1() {
return map.isEmpty();
}

@Override
protected Binding detachWithNewParent(Binding newParent) {
return new BindingOverMap(newParent, map);
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -35,4 +35,17 @@ public BindingProject(Collection<Var> vars, Binding bind) {
protected boolean accept(Var var) {
return projectionVars.contains(var) ;
}

@Override
public Binding detach() {
Binding b = binding.detach();
return b == binding
? this
: new BindingProject(projectionVars, b);
}

@Override
protected Binding detachWithNewParent(Binding newParent) {
throw new UnsupportedOperationException("Should never be called.");
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -25,13 +25,13 @@
import org.apache.jena.graph.Node ;
import org.apache.jena.sparql.core.Var ;

- /** Common framework for projection;
+ /** Common framework for projection;
  * the projection policy is provided by
- * abstract method {@link #accept(Var)}
+ * abstract method {@link #accept(Var)}
  */
  public abstract class BindingProjectBase extends BindingBase {
  private List<Var> actualVars = null ;
- private final Binding binding ;
+ protected final Binding binding ;

public BindingProjectBase(Binding bind) {
super(null) ;
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -33,4 +33,17 @@ public BindingProjectNamed(Binding bind) {
protected boolean accept(Var var) {
return var.isNamedVar() ;
}

@Override
public Binding detach() {
Binding b = binding.detach();
return b == binding
? this
: new BindingProjectNamed(b);
}

@Override
protected Binding detachWithNewParent(Binding newParent) {
throw new UnsupportedOperationException("Should never be called.");
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -33,4 +33,9 @@ private BindingRoot() {
public void format1(StringBuilder sBuff) {
sBuff.append("[Root]");
}

@Override
protected Binding detachWithNewParent(Binding newParent) {
return this;
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,8 @@
package org.apache.jena.sparql.engine.iterator;

import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
Expand All @@ -27,12 +29,13 @@
import org.apache.jena.atlas.lib.SetUtils;
import org.apache.jena.graph.Node;
import org.apache.jena.graph.Triple;
import org.apache.jena.sparql.algebra.Algebra;
import org.apache.jena.sparql.algebra.Op;
import org.apache.jena.sparql.algebra.Table;
import org.apache.jena.sparql.algebra.TransformCopy;
import org.apache.jena.sparql.algebra.op.*;
import org.apache.jena.sparql.algebra.table.Table1;
import org.apache.jena.sparql.algebra.table.TableN;
import org.apache.jena.sparql.algebra.table.TableData;
import org.apache.jena.sparql.core.*;
import org.apache.jena.sparql.engine.ExecutionContext;
import org.apache.jena.sparql.engine.QueryIterator;
Expand Down Expand Up @@ -292,17 +295,33 @@ public Op transform(OpTable opTable) {
// By the assignment restriction, the binding only needs to be added to each row of the table.
Table table = opTable.getTable();
// Table vars.
- List<Var> vars = new ArrayList<>(table.getVars());
- binding.vars().forEachRemaining(vars::add);
- TableN table2 = new TableN(vars);
+ List<Var> tableVars = table.getVars();
+ List<Var> vars = new ArrayList<>(tableVars);
+
+ // Track variables that appear both in the table and the binding.
+ List<Var> commonVars = new ArrayList<>();
+
+ // Index variables in a set if there are more than a few of them.
+ Collection<Var> tableVarsIndex = tableVars.size() > 4 ? new HashSet<>(tableVars) : tableVars;
+ binding.vars().forEachRemaining(v -> {
+ if (tableVarsIndex.contains(v)) {
Member (@afs, Jan 15, 2025)

The table vars can't contain a binding var (or am I missing something?). That case is caught by the text

https://github.com/w3c/sparql-dev/blob/main/SEP/SEP-0007/sep-0007.md
"Disallow syntactic forms that set variables that may already be present in the current row."

It's tested in SyntaxVarScope.checkLATERAL.

(I removed the if, left the else and the two tests pass.)

If so, it simplifies to vars.add(v);
And hence "vars = tableVars" (copy not needed).

Contributor Author (@Aklakan, Jan 15, 2025)

(See comment on the line containing Algebra.compatible)

+ commonVars.add(v);
+ } else {
+ vars.add(v);
+ }
+ });
+
+ List<Binding> bindings = new ArrayList<>(table.size());
  BindingBuilder builder = BindingFactory.builder();
  table.iterator(null).forEachRemaining(row->{
- builder.reset();
- builder.addAll(row);
- builder.addAll(binding);
- table2.addBinding(builder.build());
Contributor Author (@Aklakan, Jan 9, 2025)

addBinding checks whether the table's variables need to be updated.
This check is not needed here, so in the revised code I just collect the bindings in a list instead and create the table afterwards.

+ if (Algebra.compatible(row, binding, commonVars.iterator())) {
Member

If there is a variable in common from substitution it must be the same term.

So this is a contains relationship. (Given the comments above there may be a simpler way.)

Contributor Author (@Aklakan, Jan 15, 2025)

There is potentially another can of worms I was about to open w.r.t. sep7:

Perhaps it's worth opening an issue and discussion on sparql-dev. The reason I added the compatibility check is that I had the following query in mind when writing the tests, which, for reasons I do not yet understand, is currently prohibited.
Removing the restrictions in SyntaxVarScope.checkLATERAL on Element_Data (and Element_Bind) and adding the check for compatible bindings (effectively filtering bindings of tables down to those compatible with the current row; as done in the PR) naturally makes the query work.

SELECT * {
  VALUES ?department {
    <urn:dept1>
    <urn:dept2>
  }
  LATERAL {
    SELECT * {
      VALUES (?department ?employee) {
        ( <urn:dept1> <urn:person1> )
        ( <urn:dept1> <urn:person2> )
        ( <urn:dept2> <urn:person1> )
      }
    } ORDER BY ?employee LIMIT 1
  }
}

My expectation:

| department | employee |
|---|---|
| urn:dept1 | urn:person1 |
| urn:dept2 | urn:person1 |

Actual result:
Scoping error due to the check in SyntaxVarScope.checkLATERAL.
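
The semantics proposed in this comment (filter the inner table down to rows compatible with the current outer row, then apply ORDER BY ?employee LIMIT 1) can be simulated in plain Java. This is a sketch of the proposal only, not of how Jena evaluates LATERAL; rows are modelled as string maps, and `LateralSketch`, `compatible`, and `lateralFirstEmployee` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class LateralSketch {
    // Two rows are compatible when every variable they share has the same value.
    static boolean compatible(Map<String, String> left, Map<String, String> right) {
        for (Map.Entry<String, String> e : left.entrySet()) {
            String other = right.get(e.getKey());
            if (other != null && !other.equals(e.getValue())) {
                return false;
            }
        }
        return true;
    }

    // For each outer row: keep compatible inner rows, take the smallest ?employee, merge.
    static List<Map<String, String>> lateralFirstEmployee(List<Map<String, String>> outer,
                                                          List<Map<String, String>> inner) {
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> row : outer) {
            inner.stream()
                 .filter(candidate -> compatible(row, candidate))
                 .min(Comparator.comparing((Map<String, String> m) -> m.get("employee")))
                 .ifPresent(first -> {
                     Map<String, String> merged = new LinkedHashMap<>(row);
                     merged.putAll(first);
                     result.add(merged);
                 });
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, String>> departments = List.of(
            Map.of("department", "urn:dept1"),
            Map.of("department", "urn:dept2"));
        List<Map<String, String>> employees = List.of(
            Map.of("department", "urn:dept1", "employee", "urn:person1"),
            Map.of("department", "urn:dept1", "employee", "urn:person2"),
            Map.of("department", "urn:dept2", "employee", "urn:person1"));
        lateralFirstEmployee(departments, employees).forEach(System.out::println);
    }
}
```

Under these assumptions each department is paired with urn:person1, which matches the expectation stated above.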

Member

There must be a problem with SyntaxVarScope.checkLATERAL because the SELECT * is effectively SELECT ?employee which works.

Contributor Author (@Aklakan, Jan 16, 2025)

> SELECT * is effectively SELECT ?employee which works.

Hm, depending on the viewpoint, it is the effective query after substitution; but before that, it's SELECT ?department ?employee, so as to make ?department in-scope such that the substitution can affect the table and discard its incompatible rows.

I think the scope error is in-line with the SEP7 statement:

"Disallow syntactic forms that set variables that may already be present in the current row."

But I think that this is too restrictive, because it prevents several useful cases for inlining data using VALUES blocks. IMO if substitution leads to incompatible bindings then they should just be discarded as usual - no? Perhaps you have corner cases in mind where under this logic the substitution could become ambiguous and that this might be the reason for the restrictions.

Removing the restrictions on Element_Bind and Element_Data in checkLATERAL in order to make the above "department-employee" query work causes the following test cases to fail, because they no longer raise a varscope/syntax error:

syntax-lateral-bad-{04, 05, 06, 08}.arq

For example, syntax-lateral-bad-04:

SELECT * {
   ?s ?p ?o
   LATERAL { BIND( 123 AS ?o) }
}

This is certainly not an efficient way to write a query, but I don't understand why the query should be syntactically illegal in the first place. To me it looks like it would only retain bindings where ?o = 123. So it would be the task of an optimizer to rewrite the Lateral/Bind to a plain FILTER(?o = 123).

When disabling the SyntaxVarScope check for ElementBind, each ?o gets injected into the unit table beneath (extend ((?o 123)) (table unit)) such as (extend ((?o 123)) (table (row (?o 456)))) and the evaluation of OpExtend discards the incompatible binding - which seems like the natural way it should be to me.
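
The behaviour described here, where extending a row with a variable it already binds keeps the row only if the values agree, can be sketched as follows. This models the comment's description with string maps; `ExtendSketch.extend` is a hypothetical helper, not Jena's OpExtend evaluation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

final class ExtendSketch {
    // Adds var=value to the row; yields empty when the row already binds var differently.
    static Optional<Map<String, String>> extend(Map<String, String> row, String var, String value) {
        String existing = row.get(var);
        if (existing != null && !existing.equals(value)) {
            return Optional.empty(); // incompatible: the row is discarded
        }
        Map<String, String> extended = new LinkedHashMap<>(row);
        extended.put(var, value);
        return Optional.of(extended);
    }

    public static void main(String[] args) {
        System.out.println(extend(Map.of(), "o", "123").isPresent());           // unit row
        System.out.println(extend(Map.of("o", "123"), "o", "123").isPresent()); // same value
        System.out.println(extend(Map.of("o", "456"), "o", "123").isPresent()); // conflict
    }
}
```

Seen this way, the lateral BIND acts like a filter on ?o, which is the equivalence the comment suggests an optimizer could exploit.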

Member

> Hm, depending on the viewpoint it is the effective query after substitution,

My bad - I was thinking about GROUP BY.

For ORDER BY, the query is illegal by the assignment restriction.

> To me it looks like it would only retain bindings where ?o = 123.

That would turn BIND into a filter. (ARQ's extension LET does that.)

We have to have a restriction of some kind. The SEP-0007 design makes it a compile-time condition (test once per query).

In general, some systems have optimizations that rely on static analysis of the query. Making the RHS of a LATERAL have runtime variance would interfere.

Member

Sorry, I have to get Jena 5.3.0 out without this.

This issue is important and something that is, I hope, going to get more importance in the SPARQL community.

WG sparql-query PR 177 at least gets the essential machinery into the repo with the WG contribution licensing. A small step, but a step.

Contributor Author (@Aklakan, Jan 16, 2025)

> In general, some systems have optimizations that rely on static analysis of the query.

Well, it's a tradeoff: to what extent do existing implementations need to drive restrictions on new features in a standard?

For the sake of closing the issue, should I modify/simplify the logic in QueryIterLateral, or should I leave the slightly more general approach as it is for now? In principle, the var scope check should prevent non-empty sets of commonVars. However, commonVars is non-empty in the case of nested laterals: the outer lateral already injects its binding into the unit tables, so subsequent inner laterals attempt the same injection. Perhaps the current approach is the safe one. The alternative is to remove the compatibility check on commonVars (and the commonVars set itself), because the injections should be compatible in the first place.

Contributor Author

> Sorry, I have to get Jena 5.3.0 out without this.

No worries, whenever it's ready.

Member

> For the sake of closing the issue, should I modify/simplify the logic

Yes, please.

> Well, it's a tradeoff: to what extent do existing implementations need to drive restrictions on new features in a standard?

The SHACL WG has started, and SHACL Values Insertion (which is similar but different) will need a SPARQL syntax-level solution, which implies static rules for scoping. As will parameterized queries.

I hope, in spirit, these different situations have a related approach.

+ builder.reset();
+ builder.addAll(row);
+ binding.forEach(builder::set);
+ bindings.add(builder.build());
+ }
  });
- return OpTable.create(table2);
+ return OpTable.create(new TableData(vars, bindings));
}

private Triple applyReplacement(Triple triple, Function<Var, Node> replacement) {
Expand Down