Merge pull request #98 from w3c/Issue95
Turn the input to Group(..) and to Aggregation(..) into solution sequences
hartig authored Jun 29, 2023
2 parents 35a4a45 + 8e30456 commit ae554d5
Showing 1 changed file with 92 additions and 72 deletions.
164 changes: 92 additions & 72 deletions spec/index.html
@@ -8745,7 +8745,11 @@ <h5>Grouping and Aggregation</h5>
<p>Step: GROUP BY</p>
<p>If the <code>GROUP BY</code> keyword is used, or there is implicit grouping due to the
use of aggregates in the projection, then grouping is performed by the
<a href="#defn_algGroup">Group</a> function. It divides the solution set into groups of one or
<a href="#defn_algGroup">Group</a> function.
In this case, before grouping, the solution set is converted into a solution
sequence by applying the <a href="#defn_algToList">ToList</a> function.
Next, the <a href="#defn_algGroup">Group</a> function
divides this solution sequence into groups of one or
more solutions, with the same overall cardinality. In case of implicit grouping, a fixed
constant (1) is used to group all solutions into a single group.</p>
<p>Step: Aggregates</p>
@@ -8765,9 +8769,9 @@ <h5>Grouping and Aggregation</h5>
Let E := [], a list of pairs of the form (variable, expression)

If Q contains GROUP BY exprlist
Let G := Group(exprlist, P)
Let G := Group(exprlist, ToList(P))
Else If Q contains an aggregate in SELECT, HAVING, ORDER BY
Let G := Group((1), P)
Let G := Group((1), ToList(P))
Else
skip the rest of the aggregate step
End
@@ -9415,10 +9419,10 @@ <h4>Aggregate Algebra</h4>
<div id="defn_algGroup">
<b>Definition: Group</b>
</div>
<p>Group evaluates a list of expressions against a solution sequence, producing a set
<p>Group evaluates a list of expressions against a solution sequence Ψ, producing a set
of partial functions from keys to solution sequences.</p>
<p>Group(exprlist, Ω) = { ListEval(exprlist, μ) → { μ' | μ' in Ω, ListEval(exprlist, μ)
= ListEval(exprlist, μ') } | μ in Ω }</p>
<p>Group(exprlist, Ψ) = { ListEval(exprlist, μ) → [ μ' | μ' in Ψ, ListEval(exprlist, μ)
= ListEval(exprlist, μ') ] | μ in Ψ }</p>
</div>
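As an informal illustration of the revised Group definition (not part of the specification text), the partitioning of a solution sequence can be sketched in Python. Solutions are modeled as dicts, `list_eval` is a simplified stand-in for ListEval that only looks up variables, and every name here is an assumption made for this sketch. The sample data mirrors the values visible in the worked example of this commit (keys 1 and 2, pairs (2, 3), (3, 4) and (5, 6)).

```python
from typing import Dict, List, Optional, Tuple

Solution = Dict[str, object]        # a solution mapping: variable name -> value
Key = Tuple[Optional[object], ...]  # result of evaluating the expression list


def list_eval(exprlist: List[str], mu: Solution) -> Key:
    """Simplified stand-in for ListEval: each expression is a variable name.
    Unbound variables yield None (the spec uses 'error' instead)."""
    return tuple(mu.get(var) for var in exprlist)


def group(exprlist: List[str], psi: List[Solution]) -> Dict[Key, List[Solution]]:
    """Partition the solution sequence psi by key. Each group is itself a
    sequence whose order follows the order of psi."""
    groups: Dict[Key, List[Solution]] = {}
    for mu in psi:
        groups.setdefault(list_eval(exprlist, mu), []).append(mu)
    return groups


psi = [
    {"x": 1, "y": 2, "z": 3},  # mu1
    {"x": 1, "y": 3, "z": 4},  # mu2
    {"x": 2, "y": 5, "z": 6},  # mu3
]
g = group(["x"], psi)
```

Because each group is built by appending in input order, the groups are sequences rather than multisets, which is the point of passing ToList(P) to Group.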
<div class="defn">
<p><b>Definition: ListEval</b></p>
@@ -9441,22 +9445,37 @@ <h4>Aggregate Algebra</h4>
</div>
<p>Let <i>exprlist</i> be a list of expressions or *, <i>func</i> a set function,
<i>scalarvals</i> a set of partial functions (possibly empty) passed from the aggregate
in the query, and let { key<sub>1</sub>→Ω<sub>1</sub>, ...,
key<sub>m</sub>→Ω<sub>m</sub> } be a multiset of partial functions from keys to
in the query, and let { key<sub>1</sub>→Ψ<sub>1</sub>, ...,
key<sub>m</sub>→Ψ<sub>m</sub> } be a set of partial functions from keys to
solution sequences as produced by the grouping step.</p>
<p>Aggregation applies the set function func to the given multiset and produces a
single value for each key and partition of solutions for that key.</p>
<p>Aggregation(exprlist, func, scalarvals, { key<sub>1</sub>→Ω<sub>1</sub>, ...,
key<sub>m</sub>→Ω<sub>m</sub> } )<br>
&nbsp;&nbsp;&nbsp;= { (key, F(Ω)) | key → Ω in { key<sub>1</sub>→Ω<sub>1</sub>, ...,
key<sub>m</sub>→Ω<sub>m</sub> } }</p>
<p>Aggregation applies the set function func to the given set and produces a
single value for each key and group of solutions for that key.</p>
<p>Aggregation(exprlist, func, scalarvals, { key<sub>1</sub>→Ψ<sub>1</sub>, ...,
key<sub>m</sub>→Ψ<sub>m</sub> } )<br>
&nbsp;&nbsp;&nbsp;= { (key, F(Ψ)) | key → Ψ in { key<sub>1</sub>→Ψ<sub>1</sub>, ...,
key<sub>m</sub>→Ψ<sub>m</sub> } }</p>
<p>where<br>
&nbsp;&nbsp;M(Ω) = { ListEval(exprlist, μ) | μ in Ω }<br>
&nbsp;&nbsp;F(Ω) = func(M(Ω), scalarvals), for non-DISTINCT<br>
&nbsp;&nbsp;F(Ω) = func(Distinct(M(Ω)), scalarvals), for DISTINCT</p>
&nbsp;&nbsp;M(Ψ) = [ ListEval(exprlist, μ) | μ in Ψ ]<br>
&nbsp;&nbsp;F(Ψ) = func(M(Ψ), scalarvals), for non-<code>DISTINCT</code><br>
&nbsp;&nbsp;F(Ψ) = func(Dedup(M(Ψ)), scalarvals), for <code>DISTINCT</code></p>
<p>with Dedup(M(Ψ)) being an order-preserving, duplicate-free version of the sequence M(Ψ); that is, Dedup(M(Ψ)) is a sequence of RDF terms that has the following four properties.</p>
<ol>
<li>Every unique element in M(Ψ) is contained in Dedup(M(Ψ)).</li>
<li>Every element in Dedup(M(Ψ)) is contained in M(Ψ).</li>
<li>Dedup(M(Ψ)) is free of duplicates. That is, the element at the |i|-th position in Dedup(M(Ψ)) is not the same term as the element at the |j|-th position in Dedup(M(Ψ)) for every two natural numbers |i| and |j| such that |i| &ne; |j|.</li>
<li>For any two elements <var>e<sub>1</sub></var> and <var>e<sub>2</sub></var> in Dedup(M(Ψ)), the relative order of their first occurrences in M(Ψ) is preserved in Dedup(M(Ψ)). That is, if <var>i<sub>1</sub></var>&nbsp;&lt;&nbsp;<var>i<sub>2</sub></var>, then <var>j<sub>1</sub></var>&nbsp;&lt;&nbsp;<var>j<sub>2</sub></var>, where
<ul>
<li><var>i<sub>1</sub></var> is the smallest natural number such that <var>e<sub>1</sub></var> is at the <var>i<sub>1</sub></var>-th position in M(Ψ),</li>
<li><var>i<sub>2</sub></var> is the smallest natural number such that <var>e<sub>2</sub></var> is at the <var>i<sub>2</sub></var>-th position in M(Ψ),</li>
<li><var>j<sub>1</sub></var> is the position of <var>e<sub>1</sub></var> in Dedup(M(Ψ)), and</li>
<li><var>j<sub>2</sub></var> is the position of <var>e<sub>2</sub></var> in Dedup(M(Ψ)).</li>
</ul>
</li>
</ol>

<p><b>Special Case:</b> when <code>COUNT</code> is used with the expression
<code>*</code> the value of F will be the cardinality of the group solution sequence,
<code>card[Ω]</code>, or <code>card[Distinct(Ω)]</code> if the <code>DISTINCT</code>
<code>card[Ψ]</code>, or <code>card[Dedup(Ψ)]</code> if the <code>DISTINCT</code>
keyword is present.</p>
</div>
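The four Dedup properties listed in the definition above amount to "keep the first occurrence of each element, in order". A minimal Python sketch follows, assuming elements are hashable and that Python equality is an acceptable stand-in for "the same term". That assumption is not exact: Python treats 1 and 1.0 as equal, whereas distinct RDF terms are not the same term. The function name `dedup` is chosen for this sketch only.

```python
from typing import Hashable, List, TypeVar

T = TypeVar("T", bound=Hashable)


def dedup(seq: List[T]) -> List[T]:
    """Return an order-preserving, duplicate-free version of seq.

    Each element is kept at its first occurrence, so the relative order of
    first occurrences in seq is preserved in the result."""
    seen = set()
    result: List[T] = []
    for item in seq:
        if item not in seen:  # stand-in for RDF term identity
            seen.add(item)
            result.append(item)
    return result


deduped = dedup(["b", "a", "b", "c", "a"])
```

Here `deduped` is `["b", "a", "c"]`: every unique element appears exactly once, and "b" still precedes "a" because its first occurrence came first.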
<p><i>scalarvals</i> are used to pass values to the underlying set function, bypassing
@@ -9466,7 +9485,7 @@ <h4>Aggregate Algebra</h4>
<p>All aggregates may have the <code>DISTINCT</code> keyword as the first token in their
argument list. If this keyword is present, then the first argument to func is Distinct(M).</p>
<p>Example</p>
<p>Given a solution multiset (Ω) with the following values:</p>
<p>Given a solution sequence Ψ with the following values:</p>
<table>
<tbody>
<tr>
@@ -9497,10 +9516,10 @@ <h4>Aggregate Algebra</h4>
</table>
<p>And the query expression SELECT (ex:agg(?y, ?z) AS ?agg) WHERE { ?x ?y ?z } GROUP BY
?x.</p>
<p>We produce G = Group((?x), Ω) = { ( (1), { μ<sub>1</sub>, μ<sub>2</sub> } ), ( (2), {
μ<sub>3</sub> } ) }</p>
<p>We produce G = Group((?x), Ψ) = { (1) → [μ<sub>1</sub>, μ<sub>2</sub>], (2) →
[μ<sub>3</sub>] }</p>
<p>And so Aggregation((?y, ?z), ex:agg, {}, G) =<br>
{ ((1), eg:agg({(2, 3), (3, 4)}, {})), ((2), eg:agg({(5, 6)}, {})) }.</p>
{ ((1), eg:agg([(2, 3), (3, 4)], {})), ((2), eg:agg([(5, 6)], {})) }.</p>
<div class="defn">
<p><b>Definition: AggregateJoin</b></p>
<p>Let S<sub>1</sub>, ..., S<sub>n</sub> be a list of sets, where each set
@@ -9511,24 +9530,24 @@ <h4>Aggregate Algebra</h4>
..., agg<sub>n</sub>→val<sub>n</sub> | key in K and key→val<sub>i</sub> in
S<sub>i</sub> for each 1 &lt;= i &lt;= n }</p>
</div>
<p>Flatten is a function which is used to collapse multisets of lists into a multiset, so
for example { (1, 2), (3, 4) } becomes { 1, 2, 3, 4 }.</p>
<p>Flatten is a function which is used to collapse a sequence of lists into a single list.
For example, [(1,&nbsp;2), (3,&nbsp;4)] becomes (1, 2, 3, 4).</p>
<div class="defn">
<p><b>Definition: Flatten</b></p>
<p>The Flatten(M) function takes a multiset of lists, M {(L<sub>1</sub>, L<sub>2</sub>,
...), ...}, and returns the multiset { x | L in M and x in L }.</p>
<p>The Flatten(S) function takes a sequence of lists, S = [(L<sub>1</sub>, L<sub>2</sub>,
...), ...], and returns the list ( x | L in S and x in L ).</p>
</div>
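For readers who prefer code, the revised Flatten (a sequence of lists collapsed into a single list, with order preserved) can be sketched in Python as follows. Tuples stand in for the spec's lists, and the function name is an assumption of this sketch.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def flatten(s: Sequence[Sequence[T]]) -> List[T]:
    """Concatenate the lists in s, keeping both the order of the lists
    and the order of the elements inside each list."""
    return [x for lst in s for x in lst]


flat = flatten([(1, 2), (3, 4)])  # [1, 2, 3, 4]
```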
<section id="setFunctions">
<h5>Set Functions</h5>
<p>The set functions which underlie SPARQL aggregates all have a common signature:
SetFunc(M), or SetFunc(M, scalarvals) where M is a multiset of lists, and scalarvals is
SetFunc(S), or SetFunc(S, scalarvals) where S is a sequence of lists, and scalarvals is
one or more scalar values that are passed to the set function indirectly via the ( ...
; key=value ) syntax for aggregates in the SPARQL grammar. The only use of this that is
supported by the built-in aggregates in SPARQL Query 1.1 is <code>GROUP_CONCAT</code>,
as in <code>GROUP_CONCAT(?x ; separator=", ")</code>.</p>
<p>Note that the name "Set Function" is somewhat historical — the arguments to set
functions are in fact multisets. The name is retained due to the commonality with SQL
Set Functions, which also operate over multisets.</p>
functions are in fact sequences. The name is retained due to the commonality with SQL
Set Functions, which operate over multisets.</p>
<p>The set functions defined in this document are Count, Sum, Min, Max, Avg,
GroupConcat, and Sample — corresponding to the aggregates <code>COUNT</code>,
<code>SUM</code>, <code>MIN</code>, <code>MAX</code>, <code>AVG</code>,
@@ -9546,10 +9565,10 @@ <h5>Count</h5>
has a bound, non-error value within the aggregate group.</p>
<div class="defn">
<p><b>Definition: <span id="defn_aggCount">Count</span></b></p>
<pre class="code nohighlight">xsd:integer Count(multiset M)</pre>
<p>N = Flatten(M)</p>
<p>remove error elements from N</p>
<p>Count(M) = card[N]</p>
<pre class="code nohighlight">xsd:integer Count(sequence S)</pre>
<p>L = Flatten(S)</p>
<p>remove error elements from L</p>
<p>Count(S) = card[L]</p>
</div>
</section>
<section id="aggSum">
@@ -9561,13 +9580,14 @@ <h5>Sum</h5>
be 6.0 (float).</p>
<div class="defn">
<p><b>Definition: <span id="defn_aggSum">Sum</span></b></p>
<pre class="code nohighlight">numeric Sum(multiset M)</pre>
<p>Sum(M) = Sum(ToList(Flatten(M))).</p>
<p>Sum(S) = op:numeric-add(S<sub>1</sub>, Sum(S<sub>2..n</sub>)) when card[S] &gt;
<pre class="code nohighlight">numeric Sum(sequence S)</pre>
<p>L = Flatten(S)</p>
<p>Sum(S) = Sum(L)</p>
<p>Sum(L) = op:numeric-add(L<sub>1</sub>, Sum(L<sub>2..n</sub>)) when card[L] &gt;
1<br>
Sum(S) = op:numeric-add(S<sub>1</sub>, 0) when card[S] = 1<br>
Sum(S) = "0"^^xsd:integer when card[S] = 0</p>
<p>In this way, Sum({1, 2, 3}) = op:numeric-add(1, op:numeric-add(2,
Sum(L) = op:numeric-add(L<sub>1</sub>, 0) when card[L] = 1<br>
Sum(L) = "0"^^xsd:integer when card[L] = 0</p>
<p>In this way, Sum( (1, 2, 3) ) = op:numeric-add(1, op:numeric-add(2,
op:numeric-add(3, 0))).</p>
</div>
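The recursive Sum definition above can be followed step by step in a short Python sketch. This is an illustration under simplifying assumptions: Python's `+` stands in for op:numeric-add (so int and float promotion follows Python rules rather than XPath rules), error values are not modeled, and the function names are hypothetical.

```python
from typing import List, Sequence, Union

Number = Union[int, float]


def numeric_add(a: Number, b: Number) -> Number:
    """Stand-in for op:numeric-add; Python promotes int + float to float."""
    return a + b


def sum_list(l: List[Number]) -> Number:
    """Sum(L) as defined recursively over the flattened list L."""
    if len(l) == 0:
        return 0                                  # "0"^^xsd:integer
    if len(l) == 1:
        return numeric_add(l[0], 0)
    return numeric_add(l[0], sum_list(l[1:]))     # L1 + Sum(L2..n)


def sparql_sum(s: Sequence[Sequence[Number]]) -> Number:
    """Sum(S) = Sum(Flatten(S)) for a sequence S of lists."""
    return sum_list([x for lst in s for x in lst])


total = sparql_sum([(1,), (2,), (3,)])
```

With the input above, the recursion expands to `numeric_add(1, numeric_add(2, numeric_add(3, 0)))`, matching the expansion given in the definition.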
</section>
@@ -9577,11 +9597,11 @@ <h5>Avg</h5>
average value for an expression over a group. It is defined in terms of Sum and Count.
<div class="defn">
<p><b>Definition: <span id="defn_aggAvg">Avg</span></b></p>
<pre class="code nohighlight">numeric Avg(multiset M)</pre>
<p>Avg(M) = "0"^^xsd:integer, where Count(M) = 0</p>
<p>Avg(M) = Sum(M) / Count(M), where Count(M) &gt; 0</p>
<pre class="code nohighlight">numeric Avg(sequence S)</pre>
<p>Avg(S) = "0"^^xsd:integer, where Count(S) = 0</p>
<p>Avg(S) = Sum(S) / Count(S), where Count(S) &gt; 0</p>
</div>
<p>For example, Avg({1, 2, 3}) = Sum({1, 2, 3})/Count({1, 2, 3}) = 6/3 = 2.</p>
<p>For example, Avg([(1), (2), (3)]) = Sum([(1), (2), (3)])/Count([(1), (2), (3)]) = 6/3 = 2.</p>
</section>
<section id="aggMin">
<h5>Min</h5>
@@ -9591,12 +9611,12 @@ <h5>Min</h5>
arbitrarily typed expressions.</p>
<div class="defn">
<p><b>Definition: <span id="defn_aggMin">Min</span></b></p>
<pre class="code nohighlight">term Min(multiset M)</pre>
<p>Min(M) = Min(ToList(Flatten(M)))</p>
<p>Min({}) = error.</p>
<p>The flattened multiset of values passed as an argument is converted to a sequence
S, this sequence is ordered as per the <code>ORDER BY ASC</code> clause.</p>
<p>Min(S) = S<sub>0</sub></p>
<pre class="code nohighlight">term Min(sequence S)</pre>
<p>L = Flatten(S)</p>
<p>Min(S) = Min(L)</p>
<p>The flattened list L of values is ordered as per the <code>ORDER BY ASC</code> clause.</p>
<p>Min(L) = L<sub>0</sub> if card[L] > 0<br>
Min(L) = error if card[L] = 0</p>
</div>
</section>
<section id="aggMax">
@@ -9607,12 +9627,12 @@ <h5>Max</h5>
arbitrarily typed expressions.</p>
<div class="defn">
<p><b>Definition: <span id="defn_aggMax">Max</span></b></p>
<pre class="code nohighlight">term Max(multiset M)</pre>
<p>Max(M) = Max(ToList(Flatten(M)))</p>
<p>Max({}) = error.</p>
<p>The multiset of values passed as an argument is converted to a sequence S, this
sequence is ordered as per the <code>ORDER BY DESC</code> clause.</p>
<p>Max(S) = S<sub>0</sub></p>
<pre class="code nohighlight">term Max(sequence S)</pre>
<p>L = Flatten(S)</p>
<p>Max(S) = Max(L)</p>
<p>The flattened list L of values is ordered as per the <code>ORDER BY DESC</code> clause.</p>
<p>Max(L) = L<sub>0</sub> if card[L] > 0<br>
Max(L) = error if card[L] = 0</p>
</div>
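The revised Min and Max definitions share one shape: flatten, order, take the first element, and raise an error on empty input. A rough Python sketch follows. It assumes all values are mutually comparable with Python's built-in ordering, which is a much simpler rule than SPARQL's ORDER BY ordering across term types. The function names and the use of `ValueError` for the spec's "error" are choices made for this sketch.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def _flatten(s: Sequence[Sequence[T]]) -> List[T]:
    return [x for lst in s for x in lst]


def sparql_min(s: Sequence[Sequence[T]]) -> T:
    """Min(S): order the flattened list ascending and return its first element."""
    ordered = sorted(_flatten(s))                 # stand-in for ORDER BY ASC
    if not ordered:
        raise ValueError("Min of an empty sequence is an error")
    return ordered[0]


def sparql_max(s: Sequence[Sequence[T]]) -> T:
    """Max(S): order the flattened list descending and return its first element."""
    ordered = sorted(_flatten(s), reverse=True)   # stand-in for ORDER BY DESC
    if not ordered:
        raise ValueError("Max of an empty sequence is an error")
    return ordered[0]


lowest = sparql_min([(3,), (1,), (2,)])
highest = sparql_max([(3,), (1,), (2,)])
```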
</section>
<section id="aggGroupConcat">
@@ -9623,33 +9643,33 @@ <h5>GroupConcat</h5>
SEPARATOR.</p>
<div class="defn">
<p><b>Definition: <span id="defn_aggGroupConcat">GroupConcat</span></b></p>
<pre class="code nohighlight">literal GroupConcat(multiset M)</pre>
<pre class="code nohighlight">literal GroupConcat(sequence S)</pre>
<p>If the "separator" scalar argument is absent from GROUP_CONCAT then it is taken to
be the "space" character, unicode codepoint U+0020.</p>
<p>The multiset of values, M passed as an argument is converted to a sequence S.</p>
<p>GroupConcat(M, scalarvals) = GroupConcat(Flatten(M), scalarvals("separator"))</p>
<p>GroupConcat(S, sep) = "", where <span style=
"font-size: 140%">|</span>S<span style="font-size: 140%">|</span> = 0</p>
<p>GroupConcat(S, sep) = CONCAT("", S<sub>0</sub>), where
<span style="font-size: 140%">|</span>S<span style="font-size: 140%">|</span> = 1</p>
<p>GroupConcat(S, sep) = CONCAT(S<sub>0</sub>, sep, GroupConcat(S<sub>1..n-1</sub>,
sep)), where <span style="font-size: 140%">|</span>S<span style="font-size: 140%">|</span> &gt; 1</p>
</div>
<p>For example, GroupConcat({"a", "b", "c"}, {"separator" → "."}) = "a.b.c".</p>
<p>L = Flatten(S)</p>
<p>GroupConcat(S, scalarvals) = GroupConcat(L, scalarvals("separator"))</p>
<p>GroupConcat(L, sep) = "", where <span style=
"font-size: 140%">|</span>L<span style="font-size: 140%">|</span> = 0</p>
<p>GroupConcat(L, sep) = CONCAT("", L<sub>0</sub>), where
<span style="font-size: 140%">|</span>L<span style="font-size: 140%">|</span> = 1</p>
<p>GroupConcat(L, sep) = CONCAT(L<sub>0</sub>, sep, GroupConcat(L<sub>1..n-1</sub>,
sep)), where <span style="font-size: 140%">|</span>L<span style="font-size: 140%">|</span> &gt; 1</p>
</div>
<p>For example, GroupConcat([("a"), ("b"), ("c")], {"separator" → "."}) = "a.b.c".</p>
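The GroupConcat recursion can be traced with a small Python sketch. It assumes the values are already plain strings (the conversion of RDF terms to strings is not modeled), uses a dict for scalarvals, and uses hypothetical function names.

```python
from typing import Dict, List, Optional, Sequence


def group_concat_list(l: List[str], sep: str) -> str:
    """GroupConcat(L, sep) over the flattened list L, following the three cases."""
    if len(l) == 0:
        return ""
    if len(l) == 1:
        return "" + l[0]                           # CONCAT("", L0)
    return l[0] + sep + group_concat_list(l[1:], sep)


def group_concat(s: Sequence[Sequence[str]],
                 scalarvals: Optional[Dict[str, str]] = None) -> str:
    """GroupConcat(S, scalarvals): flatten S, then join with the separator.
    A missing "separator" defaults to a single space (U+0020)."""
    sep = (scalarvals or {}).get("separator", " ")
    return group_concat_list([x for lst in s for x in lst], sep)


joined = group_concat([("a",), ("b",), ("c",)], {"separator": "."})  # "a.b.c"
```

GroupConcat is the built-in aggregate whose result most visibly depends on element order, so it shows why the input is now a sequence rather than a multiset.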
</section>
<section id="aggSample">
<h5>Sample</h5>
<p>Sample is a set function which returns an arbitrary value from the multiset passed
<p>Sample is a set function which returns an arbitrary value from the sequence passed
to it.</p>
<div class="defn">
<p><b>Definition: <span id="defn_aggSample">Sample</span></b></p>
<pre class="code nohighlight">RDFTerm Sample(multiset M)</pre>
<p>Sample(M) = v, where v in Flatten(M)</p>
<p>Sample({}) = error</p>
<pre class="code nohighlight">RDFTerm Sample(sequence S)</pre>
<p>Sample(S) = v, where v in Flatten(S)</p>
<p>Sample([]) = error</p>
</div>
<p>For example, given Sample({"a", "b", "c"}), "a", "b", and "c" are all valid return
<p>For example, given Sample([("a"), ("b"), ("c")]), "a", "b", and "c" are all valid return
values. Note that Sample() is not required to be deterministic for a given input, the
only restriction is that the output value must be present in the input multiset.</p>
only restriction is that the output value must be present in the input sequence.</p>
</section>
</section>
<section id="sparqlAlgebraEval">