diff --git a/spec/index.html b/spec/index.html
index a12589e..d3386a9 100644
--- a/spec/index.html
+++ b/spec/index.html
@@ -8745,7 +8745,11 @@
Grouping and Aggregation

Step: GROUP BY

If the GROUP BY keyword is used, or there is implicit grouping due to the use of aggregates in the projection, then grouping is performed by the
- Group function. It divides the solution set into groups of one or
+ Group function.
+ In this case, before grouping, the solution set is converted into a solution
+ sequence by applying the ToList function.
+ Next, the Group function
+ divides this solution sequence into groups of one or
more solutions, with the same overall cardinality. In case of implicit grouping, a fixed constant (1) is used to group all solutions into a single group.

Step: Aggregates

@@ -8765,9 +8769,9 @@
Grouping and Aggregation
Let E := [], a list of pairs of the form (variable, expression)
If Q contains GROUP BY exprlist
-   Let G := Group(exprlist, P)
+   Let G := Group(exprlist, ToList(P))
Else If Q contains an aggregate in SELECT, HAVING, ORDER BY
-   Let G := Group((1), P)
+   Let G := Group((1), ToList(P))
Else
   skip the rest of the aggregate step
End
@@ -9415,10 +9419,10 @@

Aggregate Algebra

Definition: Group
-

Group evaluates a list of expressions against a solution sequence, producing a set +

Group evaluates a list of expressions against a solution sequence Ψ, producing a set of partial functions from keys to solution sequences.

-

Group(exprlist, Ω) = { ListEval(exprlist, μ) → { μ' | μ' in Ω, ListEval(exprlist, μ) - = ListEval(exprlist, μ') } | μ in Ω }

+

Group(exprlist, Ψ) = { ListEval(exprlist, μ) → [ μ' | μ' in Ψ, ListEval(exprlist, μ) + = ListEval(exprlist, μ') ] | μ in Ψ }
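As a rough illustration of the revised definition, the following Python sketch groups a solution sequence by key while keeping the solutions of each group in their original relative order. It assumes solutions are modelled as dicts, and `list_eval` is a hypothetical stand-in for ListEval that only looks up variables; the real ListEval evaluates arbitrary expressions and has its own error handling.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Solution = Dict[str, object]            # assumption: a solution mapping modelled as a dict
Key = Tuple[Optional[object], ...]


def list_eval(exprlist: Sequence[str], mu: Solution) -> Key:
    """Hypothetical stand-in for ListEval: each expression is just a variable name."""
    return tuple(mu.get(var) for var in exprlist)


def group(exprlist: Sequence[str], psi: List[Solution]) -> Dict[Key, List[Solution]]:
    """Map each key to the list of solutions in psi that evaluate to that key."""
    groups: Dict[Key, List[Solution]] = {}
    for mu in psi:
        # Appending in iteration order keeps the relative order of psi within each group.
        groups.setdefault(list_eval(exprlist, mu), []).append(mu)
    return groups


psi = [{"x": 1, "y": 2}, {"x": 2, "y": 5}, {"x": 1, "y": 3}]
print(group(["x"], psi))
```

The point of the sketch is only that each group is a list rather than a set, so the order inherited from the input sequence survives grouping.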

Definition: ListEval

@@ -9441,22 +9445,37 @@

Aggregate Algebra

Let exprlist be a list of expressions or *, func a set function, scalarvals a set of partial functions (possibly empty) passed from the aggregate
- in the query, and let { key1→Ω1, ...,
- keym→Ωm } be a multiset of partial functions from keys to
+ in the query, and let { key1→Ψ1, ...,
+ keym→Ψm } be a set of partial functions from keys to
solution sequences as produced by the grouping step.

-

Aggregation applies the set function func to the given multiset and produces a - single value for each key and partition of solutions for that key.

-

Aggregation(exprlist, func, scalarvals, { key1→Ω1, ..., - keym→Ωm } )
-    = { (key, F(Ω)) | key → Ω in { key1→Ω1, ..., - keym→Ωm } }

+

Aggregation applies the set function func to the given set and produces a + single value for each key and group of solutions for that key.

+

Aggregation(exprlist, func, scalarvals, { key1→Ψ1, ..., + keym→Ψm } )
+    = { (key, F(Ψ)) | key → Ψ in { key1→Ψ1, ..., + keym→Ψm } }

where
-   M(Ω) = { ListEval(exprlist, μ) | μ in Ω }
-   F(Ω) = func(M(Ω), scalarvals), for non-DISTINCT
-   F(Ω) = func(Distinct(M(Ω)), scalarvals), for DISTINCT

+   M(Ψ) = [ ListEval(exprlist, μ) | μ in Ψ ]
+   F(Ψ) = func(M(Ψ), scalarvals), for non-DISTINCT
+   F(Ψ) = func(Dedup(M(Ψ)), scalarvals), for DISTINCT

+

with Dedup(M(Ψ)) being an order-preserving, duplicate-free version of the sequence M(Ψ); that is, Dedup(M(Ψ)) is a sequence of RDF terms that has the following four properties.

+
+
  1. Every unique element in M(Ψ) is contained in Dedup(M(Ψ)).
+
  2. Every element in Dedup(M(Ψ)) is contained in M(Ψ).
+
  3. Dedup(M(Ψ)) is free of duplicates. That is, the element at the |i|-th position in Dedup(M(Ψ)) is not the same term as the element at the |j|-th position in Dedup(M(Ψ)) for every two natural numbers |i| and |j| such that |i| ≠ |j|.
+
  4. For any two elements e1 and e2 in Dedup(M(Ψ)), the relative order of their first occurrences in M(Ψ) is preserved in Dedup(M(Ψ)). That is, if i1 < i2, then j1 < j2, where + +
+
+
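One way to read the four Dedup properties listed above is as "keep the first occurrence of every element, in order". The Python sketch below follows that reading. It assumes that Python equality and hashing are an acceptable stand-in for "is the same term", which is a simplification (for instance, Python treats `1` and `1.0` as equal).

```python
from typing import Hashable, List, Sequence


def dedup(seq: Sequence[Hashable]) -> List[Hashable]:
    """Return seq without duplicates, keeping the first occurrence of each element."""
    seen = set()
    result: List[Hashable] = []
    for item in seq:
        if item not in seen:      # first time this element is encountered
            seen.add(item)
            result.append(item)   # appended in order of first occurrence
    return result


print(dedup([(3,), (1,), (3,), (2,), (1,)]))
```

Every element of the output comes from the input, every distinct input element appears exactly once, and the order of first occurrences is unchanged.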

Special Case: when COUNT is used with the expression * the value of F will be the cardinality of the group solution sequence,
- card[Ω], or card[Distinct(Ω)] if the DISTINCT
+ card[Ψ], or card[Dedup(Ψ)] if the DISTINCT
keyword is present.

scalarvals are used to pass values to the underlying set function, bypassing @@ -9466,7 +9485,7 @@

Aggregate Algebra

All aggregates may have the DISTINCT keyword as the first token in their argument list. If this keyword is present then first argument to func is Distinct(M).

Example

-

Given a solution multiset (Ω) with the following values:

+

Given a solution sequence Ψ with the following values:

@@ -9497,10 +9516,10 @@

Aggregate Algebra

And the query expression SELECT (ex:agg(?y, ?z) AS ?agg) WHERE { ?x ?y ?z } GROUP BY ?x.

-

We produce G = Group((?x), Ω) = { ( (1), { μ1, μ2 } ), ( (2), { - μ3 } ) }

+

We produce G = Group((?x), Ψ) = { (1) → [μ1, μ2], (2) → + [μ3] }

And so Aggregation((?y, ?z), ex:agg, {}, G) =
- { ((1), eg:agg({(2, 3), (3, 4)}, {})), ((2), eg:agg({(5, 6)}, {})) }.

+ { ((1), ex:agg([(2, 3), (3, 4)], {})), ((2), ex:agg([(5, 6)], {})) }.
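The worked example can be replayed with a small Python sketch of Aggregation. The solution values below are assumptions chosen to be consistent with the groups and result shown in the example (the table itself is not part of this hunk), `show_args` is a hypothetical set function that just returns its first argument, and expressions are simplified to plain variable names.

```python
def aggregation(exprlist, func, scalarvals, groups, distinct=False):
    """Apply func to the evaluated rows of every group, one result per key."""
    result = {}
    for key, psi in groups.items():
        # M(Ψ): one evaluated row per solution, in sequence order.
        m = [tuple(mu.get(var) for var in exprlist) for mu in psi]
        if distinct:
            # dict.fromkeys keeps the first occurrence of each row, in order.
            m = list(dict.fromkeys(m))
        result[key] = func(m, scalarvals)
    return result


# Assumed solutions, consistent with G in the example.
mu1 = {"x": 1, "y": 2, "z": 3}
mu2 = {"x": 1, "y": 3, "z": 4}
mu3 = {"x": 2, "y": 5, "z": 6}
G = {(1,): [mu1, mu2], (2,): [mu3]}


def show_args(rows, scalarvals):
    """Hypothetical set function: expose the sequence it receives."""
    return rows


result = aggregation(["y", "z"], show_args, {}, G)
print(result)
```

With these inputs the set function receives `[(2, 3), (3, 4)]` for key `(1)` and `[(5, 6)]` for key `(2)`, matching the sequences in the example.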

Definition: AggregateJoin

Let S1, ..., Sn be a list of sets, where each set @@ -9511,24 +9530,24 @@

Aggregate Algebra

..., aggn→valn | key in K and key→vali in Si for each 1 <= i <= n }

-

Flatten is a function which is used to collapse multisets of lists into a multiset, so - for example { (1, 2), (3, 4) } becomes { 1, 2, 3, 4 }.

+

Flatten is a function which is used to collapse a sequence of lists into a single list. + For example, [(1, 2), (3, 4)] becomes (1, 2, 3, 4).

Definition: Flatten

-

The Flatten(M) function takes a multiset of lists, M {(L1, L2, - ...), ...}, and returns the multiset { x | L in M and x in L }.

+

The Flatten(S) function takes a sequence of lists, S = [(L1, L2, + ...), ...], and returns the list ( x | L in S and x in L ).
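A minimal Python sketch of Flatten, assuming the sequence of lists is modelled as a Python list of tuples:

```python
from itertools import chain
from typing import Iterable, List, Sequence


def flatten(seq: Iterable[Sequence[object]]) -> List[object]:
    """Concatenate the inner lists in order, preserving duplicates."""
    return list(chain.from_iterable(seq))


print(flatten([(1, 2), (3, 4)]))
```

Unlike the earlier multiset-based definition, the result has a defined order: elements appear in the order of their enclosing lists, then by position within each list.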

Set Functions

The set functions which underlie SPARQL aggregates all have a common signature: - SetFunc(M), or SetFunc(M, scalarvals) where M is a multiset of lists, and scalarvals is + SetFunc(S), or SetFunc(S, scalarvals) where S is a sequence of lists, and scalarvals is one or more scalar values that are passed to the set function indirectly via the ( ... ; key=value ) syntax for aggregates in the SPARQL grammar. The only use of this that is supported by the built-in aggregates in SPARQL Query 1.1 is GROUP_CONCAT, as in GROUP_CONCAT(?x ; separator=", ").

Note that the name "Set Function" is somewhat historical — the arguments to set - functions are in fact multisets. The name is retained due to the commonality with SQL - Set Functions, which also operate over multisets.

+ functions are in fact sequences. The name is retained due to the commonality with SQL + Set Functions, which operate over multisets.

The set functions defined in this document are Count, Sum, Min, Max, Avg, GroupConcat, and Sample — corresponding to the aggregates COUNT, SUM, MIN, MAX, AVG, @@ -9546,10 +9565,10 @@

Count
has a bound, non-error value within the aggregate group.

Definition: Count

-
xsd:integer Count(multiset M)
-

N = Flatten(M)

-

remove error elements from N

-

Count(M) = card[N]

+
xsd:integer Count(sequence S)
+

L = Flatten(S)

+

remove error elements from L

+

Count(S) = card[L]
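The three steps of Count can be sketched in Python as follows. The `ERROR` sentinel is a hypothetical way to represent an expression that evaluated to an error; the specification does not prescribe any such representation.

```python
ERROR = object()  # hypothetical marker for an error value


def count(seq):
    """Flatten the sequence, drop error elements, and return the cardinality."""
    flat = [value for row in seq for value in row]            # L = Flatten(S)
    non_errors = [value for value in flat if value is not ERROR]
    return len(non_errors)                                     # card[L]


print(count([(1,), (ERROR,), (3,)]))
```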

@@ -9561,13 +9580,14 @@
Sum
be 6.0 (float).

Definition: Sum

-
numeric Sum(multiset M)
-

Sum(M) = Sum(ToList(Flatten(M))).

-

Sum(S) = op:numeric-add(S1, Sum(S2..n)) when card[S] > +

numeric Sum(sequence S)
+

L = Flatten(S)

+

Sum(S) = Sum(L)

+

Sum(L) = op:numeric-add(L1, Sum(L2..n)) when card[L] > 1
- Sum(S) = op:numeric-add(S1, 0) when card[S] = 1
- Sum(S) = "0"^^xsd:integer when card[S] = 0

-

In this way, Sum({1, 2, 3}) = op:numeric-add(1, op:numeric-add(2, + Sum(L) = op:numeric-add(L1, 0) when card[L] = 1
+ Sum(L) = "0"^^xsd:integer when card[L] = 0

+

In this way, Sum( (1, 2, 3) ) = op:numeric-add(1, op:numeric-add(2, op:numeric-add(3, 0))).
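The recursive definition of Sum translates directly into a short Python sketch. Python's `+` stands in for op:numeric-add here, which is an assumption: XSD numeric type promotion differs from Python's, and error values are not modelled.

```python
def sum_list(values):
    """Sum over a flat list, following the three cases of the definition."""
    if len(values) == 0:
        return 0                                   # "0"^^xsd:integer
    if len(values) == 1:
        return values[0] + 0                       # op:numeric-add(L1, 0)
    return values[0] + sum_list(values[1:])        # op:numeric-add(L1, Sum(L2..n))


def sum_agg(seq):
    """Sum over a sequence of lists: flatten first, then sum the flat list."""
    flat = [value for row in seq for value in row]
    return sum_list(flat)


print(sum_agg([(1,), (2,), (3,)]))
```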

@@ -9577,11 +9597,11 @@
Avg
average value for an expression over a group. It is defined in terms of Sum and Count.

Definition: Avg

-
numeric Avg(multiset M)
-

Avg(M) = "0"^^xsd:integer, where Count(M) = 0

-

Avg(M) = Sum(M) / Count(M), where Count(M) > 0

+
numeric Avg(sequence S)
+

Avg(S) = "0"^^xsd:integer, where Count(S) = 0

+

Avg(S) = Sum(S) / Count(S), where Count(S) > 0

-

For example, Avg({1, 2, 3}) = Sum({1, 2, 3})/Count({1, 2, 3}) = 6/3 = 2.

+

For example, Avg([(1), (2), (3)]) = Sum([(1), (2), (3)])/Count([(1), (2), (3)]) = 6/3 = 2.
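A small Python sketch of Avg in terms of a sum and a count. It assumes error-free numeric input, and Python's `/` is only an approximation of the division used by the specification (Python returns a float where XSD arithmetic on integers would yield a decimal).

```python
def avg(seq):
    """Average over a sequence of lists; 0 when there is nothing to count."""
    flat = [value for row in seq for value in row]
    n = len(flat)                 # Count(S), assuming no error elements
    if n == 0:
        return 0                  # "0"^^xsd:integer
    return sum(flat) / n          # Sum(S) / Count(S)


print(avg([(1,), (2,), (3,)]))
```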

Min
@@ -9591,12 +9611,12 @@
Min
arbitrarily typed expressions.

Definition: Min

-
term Min(multiset M)
-

Min(M) = Min(ToList(Flatten(M)))

-

Min({}) = error.

-

The flattened multiset of values passed as an argument is converted to a sequence - S, this sequence is ordered as per the ORDER BY ASC clause.

-

Min(S) = S0

+
term Min(sequence S)
+

L = Flatten(S)

+

Min(S) = Min(L)

+

The flattened list L of values is ordered as per the ORDER BY ASC clause.

+

Min(L) = L0 if card[L] > 0
+ Min(L) = error if card[L] = 0

@@ -9607,12 +9627,12 @@
Max
arbitrarily typed expressions.

Definition: Max

-
term Max(multiset M)
-

Max(M) = Max(ToList(Flatten(M)))

-

Max({}) = error.

-

The multiset of values passed as an argument is converted to a sequence S, this - sequence is ordered as per the ORDER BY DESC clause.

-

Max(S) = S0

+
term Max(sequence S)
+

L = Flatten(S)

+

Max(S) = Max(L)

+

The flattened list L of values is ordered as per the ORDER BY DESC clause.

+

Max(L) = L0 if card[L] > 0
+ Max(L) = error if card[L] = 0

@@ -9623,33 +9643,33 @@
GroupConcat
SEPARATOR.

Definition: GroupConcat

-
literal GroupConcat(multiset M)
+
literal GroupConcat(sequence S)

If the "separator" scalar argument is absent from GROUP_CONCAT then it is taken to be the "space" character, unicode codepoint U+0020.

-

The multiset of values, M passed as an argument is converted to a sequence S.

-

GroupConcat(M, scalarvals) = GroupConcat(Flatten(M), scalarvals("separator"))

-

GroupConcat(S, sep) = "", where |S| = 0

-

GroupConcat(S, sep) = CONCAT("", S0), where - |S| = 1

-

GroupConcat(S, sep) = CONCAT(S0, sep, GroupConcat(S1..n-1, - sep)), where |S| > 1

-
-

For example, GroupConcat({"a", "b", "c"}, {"separator" → "."}) = "a.b.c".

+

L = Flatten(S)

+

GroupConcat(S, scalarvals) = GroupConcat(L, scalarvals("separator"))

+

GroupConcat(L, sep) = "", where |L| = 0

+

GroupConcat(L, sep) = CONCAT("", L0), where + |L| = 1

+

GroupConcat(L, sep) = CONCAT(L0, sep, GroupConcat(L1..n-1, + sep)), where |L| > 1

+ +

For example, GroupConcat([("a"), ("b"), ("c")], {"separator" → "."}) = "a.b.c".
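The recursive definition of GroupConcat can be sketched in Python as below. It assumes every value is already a plain string and that `scalarvals` is modelled as a dict; string conversion of arbitrary RDF terms and error handling are left out.

```python
def group_concat_list(values, sep):
    """Concatenate a flat list of strings, following the three cases of the definition."""
    if len(values) == 0:
        return ""
    if len(values) == 1:
        return "" + values[0]
    return values[0] + sep + group_concat_list(values[1:], sep)


def group_concat(seq, scalarvals=None):
    """Flatten the sequence, then join it with the separator (default U+0020)."""
    sep = (scalarvals or {}).get("separator", " ")
    flat = [value for row in seq for value in row]
    return group_concat_list(flat, sep)


print(group_concat([("a",), ("b",), ("c",)], {"separator": "."}))
```

Because the input is now a sequence rather than a multiset, the order of the concatenated values follows the order of the group's solution sequence.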

Sample
-

Sample is a set function which returns an arbitrary value from the multiset passed +

Sample is a set function which returns an arbitrary value from the sequence passed to it.

Definition: Sample

-
RDFTerm Sample(multiset M)
-

Sample(M) = v, where v in Flatten(M)

-

Sample({}) = error

+
RDFTerm Sample(sequence S)
+

Sample(S) = v, where v in Flatten(S)

+

Sample([]) = error

-

For example, given Sample({"a", "b", "c"}), "a", "b", and "c" are all valid return +

For example, given Sample([("a"), ("b"), ("c")]), "a", "b", and "c" are all valid return values. Note that Sample() is not required to be deterministic for a given input, the - only restriction is that the output value must be present in the input multiset.

+ only restriction is that the output value must be present in the input sequence.