diff --git a/spec/index.html b/spec/index.html
index a12589e..d3386a9 100644
--- a/spec/index.html
+++ b/spec/index.html
@@ -8745,7 +8745,11 @@
Step: GROUP BY
If the GROUP BY keyword is used, or there is implicit grouping due to the
use of aggregates in the projection, then grouping is performed by the
- Group function. It divides the solution set into groups of one or
+ Group function.
+ In this case, before grouping, the solution set is converted into a solution
+ sequence by applying the ToList function.
+ Next, the Group function
+ divides this solution sequence into groups of one or
more solutions, with the same overall cardinality. In case of implicit grouping, a fixed
constant (1) is used to group all solutions into a single group.
Step: Aggregates
@@ -8765,9 +8769,9 @@
-Group evaluates a list of expressions against a solution sequence, producing a set
+Group evaluates a list of expressions against a solution sequence Ψ, producing a set
of partial functions from keys to solution sequences.
-Group(exprlist, Ω) = { ListEval(exprlist, μ) → { μ' | μ' in Ω, ListEval(exprlist, μ)
- = ListEval(exprlist, μ') } | μ in Ω }
+Group(exprlist, Ψ) = { ListEval(exprlist, μ) → [ μ' | μ' in Ψ, ListEval(exprlist, μ)
+ = ListEval(exprlist, μ') ] | μ in Ψ }
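A minimal Python sketch of the revised Group definition may help when reviewing this hunk. It assumes solutions are plain dicts and that every expression is a bare variable name; `list_eval` and `group` are hypothetical stand-ins for ListEval and Group, not names taken from the specification, and expression errors are ignored.

```python
from typing import Dict, List, Optional, Tuple

Solution = Dict[str, object]  # assumption: a solution mapping modelled as a dict


def list_eval(exprlist: List[str], mu: Solution) -> Tuple[Optional[object], ...]:
    """Hypothetical ListEval: each expression is just a variable name here."""
    return tuple(mu.get(var) for var in exprlist)


def group(exprlist: List[str], psi: List[Solution]) -> Dict[tuple, List[Solution]]:
    """Map each key to the sub-sequence of psi whose solutions evaluate to it."""
    groups: Dict[tuple, List[Solution]] = {}
    for mu in psi:  # iterating in sequence order keeps each group ordered
        groups.setdefault(list_eval(exprlist, mu), []).append(mu)
    return groups


# Illustrative values only.
psi = [
    {"x": 1, "y": 2, "z": 3},
    {"x": 1, "y": 3, "z": 4},
    {"x": 2, "y": 5, "z": 6},
]
grouped = group(["x"], psi)
```

Because each group is built by appending in iteration order, solutions inside a group keep their relative order from Ψ, which is the property the switch from `{ ... }` to `[ ... ]` appears intended to express.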
Definition: ListEval
@@ -9441,22 +9445,37 @@
Let exprlist be a list of expressions or *, func a set function, scalarvals a set of partial functions (possibly empty) passed from the aggregate
- in the query, and let { key1→Ω1, ...,
- keym→Ωm } be a multiset of partial functions from keys to
+ in the query, and let { key1→Ψ1, ...,
+ keym→Ψm } be a set of partial functions from keys to
solution sequences as produced by the grouping step.
-Aggregation applies the set function func to the given multiset and produces a
- single value for each key and partition of solutions for that key.
-Aggregation(exprlist, func, scalarvals, { key1→Ω1, ...,
- keym→Ωm } )
- = { (key, F(Ω)) | key → Ω in { key1→Ω1, ...,
- keym→Ωm } }
+Aggregation applies the set function func to the given set and produces a
+ single value for each key and group of solutions for that key.
+Aggregation(exprlist, func, scalarvals, { key1→Ψ1, ...,
+ keym→Ψm } )
+ = { (key, F(Ψ)) | key → Ψ in { key1→Ψ1, ...,
+ keym→Ψm } }
where
- M(Ω) = { ListEval(exprlist, μ) | μ in Ω }
- F(Ω) = func(M(Ω), scalarvals), for non-DISTINCT
- F(Ω) = func(Distinct(M(Ω)), scalarvals), for DISTINCT
DISTINCT
DISTINCT
+ with Dedup(M(Ψ)) being an order-preserving, duplicate-free version of the sequence M(Ψ); that is, Dedup(M(Ψ)) is a sequence of RDF terms that has the following four properties.
+Special Case: when COUNT is used with the expression * the value of F will be the cardinality of the group solution sequence,
- card[Ω], or card[Distinct(Ω)] if the DISTINCT
+ card[Ψ], or card[Dedup(Ψ)] if the DISTINCT
keyword is present.
scalarvals are used to pass values to the underlying set function, bypassing
@@ -9466,7 +9485,7 @@
All aggregates may have the DISTINCT keyword as the first token in their
argument list. If this keyword is present then first argument to func is Distinct(M).
Example
-Given a solution multiset (Ω) with the following values:
+Given a solution sequence Ψ with the following values:
And the query expression SELECT (ex:agg(?y, ?z) AS ?agg) WHERE { ?x ?y ?z } GROUP BY ?x.
-We produce G = Group((?x), Ω) = { ( (1), { μ1, μ2 } ), ( (2), {
- μ3 } ) }
+We produce G = Group((?x), Ψ) = { (1) → [μ1, μ2], (2) →
+ [μ3] }
And so Aggregation((?y, ?z), ex:agg, {}, G) =
- { ((1), eg:agg({(2, 3), (3, 4)}, {})), ((2), eg:agg({(5, 6)}, {})) }.
Definition: AggregateJoin
Let S1, ..., Sn be a list of sets, where each set
@@ -9511,24 +9530,24 @@
-Flatten is a function which is used to collapse multisets of lists into a multiset, so
- for example { (1, 2), (3, 4) } becomes { 1, 2, 3, 4 }.
+Flatten is a function which is used to collapse a sequence of lists into a single list.
+ For example, [(1, 2), (3, 4)] becomes (1, 2, 3, 4).
Definition: Flatten
-The Flatten(M) function takes a multiset of lists, M {(L1, L2,
- ...), ...}, and returns the multiset { x | L in M and x in L }.
+The Flatten(S) function takes a sequence of lists, S = [(L1, L2,
+ ...), ...], and returns the list ( x | L in S and x in L ).
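The revised Flatten can be sketched in Python as follows. This is an illustration only: tuples stand in for the inner lists, a Python list stands in for the sequence, and the function name `flatten` is a hypothetical choice.

```python
from typing import List, Sequence, Tuple


def flatten(s: Sequence[Tuple[object, ...]]) -> List[object]:
    """Concatenate the lists in s, keeping both outer and inner order."""
    return [x for lst in s for x in lst]


flat = flatten([(1, 2), (3, 4)])
print(flat)  # [1, 2, 3, 4]
```

Unlike the multiset version being removed, the result here has a defined element order, which order-sensitive set functions such as GroupConcat can rely on.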
The set functions which underlie SPARQL aggregates all have a common signature:
- SetFunc(M), or SetFunc(M, scalarvals) where M is a multiset of lists, and scalarvals is
+ SetFunc(S), or SetFunc(S, scalarvals) where S is a sequence of lists, and scalarvals is
one or more scalar values that are passed to the set function indirectly via the ( ...
; key=value ) syntax for aggregates in the SPARQL grammar. The only use of this that is
supported by the built-in aggregates in SPARQL Query 1.1 is GROUP_CONCAT,
as in GROUP_CONCAT(?x ; separator=", ").
Note that the name "Set Function" is somewhat historical — the arguments to set
- functions are in fact multisets. The name is retained due to the commonality with SQL
- Set Functions, which also operate over multisets.
+ functions are in fact sequences. The name is retained due to the commonality with SQL
+ Set Functions, which operate over multisets.
The set functions defined in this document are Count, Sum, Min, Max, Avg,
GroupConcat, and Sample — corresponding to the aggregates COUNT, SUM, MIN, MAX, AVG,
@@ -9546,10 +9565,10 @@
Definition: Count
-xsd:integer Count(multiset M)
-N = Flatten(M)
-remove error elements from N
-Count(M) = card[N]
+xsd:integer Count(sequence S)
+L = Flatten(S)
+remove error elements from L
+Count(S) = card[L]
Definition: Sum
-numeric Sum(multiset M)
-Sum(M) = Sum(ToList(Flatten(M))).
-Sum(S) = op:numeric-add(S1, Sum(S2..n)) when card[S] >
+numeric Sum(sequence S)
+L = Flatten(S)
+Sum(S) = Sum(L)
+Sum(L) = op:numeric-add(L1, Sum(L2..n)) when card[L] >
1
- Sum(S) = op:numeric-add(S1, 0) when card[S] = 1
- Sum(S) = "0"^^xsd:integer when card[S] = 0
-In this way, Sum({1, 2, 3}) = op:numeric-add(1, op:numeric-add(2,
+ Sum(L) = op:numeric-add(L1, 0) when card[L] = 1
+ Sum(L) = "0"^^xsd:integer when card[L] = 0
+In this way, Sum( (1, 2, 3) ) = op:numeric-add(1, op:numeric-add(2,
op:numeric-add(3, 0))).
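The three-case recursion for Sum over the flattened list can be sketched in Python as below. The names `numeric_add`, `sum_list`, and `sparql_sum` are hypothetical, and plain Python addition stands in for op:numeric-add, so numeric type promotion and error propagation are not modelled.

```python
from typing import List, Sequence, Tuple


def flatten(s: Sequence[Tuple[int, ...]]) -> List[int]:
    """Concatenate the inner lists in order."""
    return [x for lst in s for x in lst]


def numeric_add(a: int, b: int) -> int:
    """Stand-in for op:numeric-add (assumption: plain integer addition)."""
    return a + b


def sum_list(values: List[int]) -> int:
    """Sum(L): one branch per cardinality case in the definition."""
    if len(values) == 0:
        return 0
    if len(values) == 1:
        return numeric_add(values[0], 0)
    return numeric_add(values[0], sum_list(values[1:]))


def sparql_sum(s: Sequence[Tuple[int, ...]]) -> int:
    """Sum(S) = Sum(L) with L = Flatten(S)."""
    return sum_list(flatten(s))


total = sparql_sum([(1,), (2,), (3,)])
```

The recursion unfolds exactly as in the worked example: `numeric_add(1, numeric_add(2, numeric_add(3, 0)))`.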
Definition: Avg
-numeric Avg(multiset M)
-Avg(M) = "0"^^xsd:integer, where Count(M) = 0
-Avg(M) = Sum(M) / Count(M), where Count(M) > 0
+numeric Avg(sequence S)
+Avg(S) = "0"^^xsd:integer, where Count(S) = 0
+Avg(S) = Sum(S) / Count(S), where Count(S) > 0
-For example, Avg({1, 2, 3}) = Sum({1, 2, 3})/Count({1, 2, 3}) = 6/3 = 2.
+For example, Avg([(1), (2), (3)]) = Sum([(1), (2), (3)])/Count([(1), (2), (3)]) = 6/3 = 2.
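Count and Avg over a sequence of lists might be sketched as follows. Everything here is an assumption made for illustration: `ERROR` is a hypothetical sentinel for an expression error, Python's built-in `sum` replaces the recursive Sum, and Python division replaces the numeric division used by the specification.

```python
from typing import List, Sequence, Tuple

ERROR = object()  # hypothetical sentinel standing in for an expression error


def flatten(s: Sequence[Tuple[object, ...]]) -> List[object]:
    """Concatenate the inner lists in order."""
    return [x for lst in s for x in lst]


def count(s: Sequence[Tuple[object, ...]]) -> int:
    """Count(S): flatten, drop error elements, return the cardinality."""
    return len([x for x in flatten(s) if x is not ERROR])


def avg(s: Sequence[Tuple[int, ...]]):
    """Avg(S): 0 when Count(S) is 0, otherwise Sum(S) / Count(S)."""
    n = count(s)
    if n == 0:
        return 0
    return sum(flatten(s)) / n  # assumption: built-in sum replaces Sum


result = avg([(1,), (2,), (3,)])
```

The sketch only exercises Avg on error-free input; how errors propagate through Sum is outside what it models.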
Definition: Min
-term Min(multiset M)
-Min(M) = Min(ToList(Flatten(M)))
-Min({}) = error.
-The flattened multiset of values passed as an argument is converted to a sequence
- S, this sequence is ordered as per the ORDER BY ASC clause.
-Min(S) = S0
+term Min(sequence S)
+L = Flatten(S)
+Min(S) = Min(L)
+The flattened list L of values is ordered as per the ORDER BY ASC clause.
+Min(L) = L0 if card[L] > 0
+ Min(L) = error if card[L] = 0
Definition: Max
-term Max(multiset M)
-Max(M) = Max(ToList(Flatten(M)))
-Max({}) = error.
-The multiset of values passed as an argument is converted to a sequence S, this
- sequence is ordered as per the ORDER BY DESC clause.
-Max(S) = S0
+term Max(sequence S)
+L = Flatten(S)
+Max(S) = Max(L)
+The flattened list L of values is ordered as per the ORDER BY DESC clause.
+Max(L) = L0 if card[L] > 0
+ Max(L) = error if card[L] = 0
Definition: GroupConcat
-literal GroupConcat(multiset M)
+literal GroupConcat(sequence S)
If the "separator" scalar argument is absent from GROUP_CONCAT then it is taken to be the "space" character, unicode codepoint U+0020.
-The multiset of values, M passed as an argument is converted to a sequence S.
-GroupConcat(M, scalarvals) = GroupConcat(Flatten(M), scalarvals("separator"))
-GroupConcat(S, sep) = "", where |S| = 0
-GroupConcat(S, sep) = CONCAT("", S0), where
- |S| = 1
-GroupConcat(S, sep) = CONCAT(S0, sep, GroupConcat(S1..n-1,
- sep)), where |S| > 1
-For example, GroupConcat({"a", "b", "c"}, {"separator" → "."}) = "a.b.c".
+L = Flatten(S)
+GroupConcat(S, scalarvals) = GroupConcat(L, scalarvals("separator"))
+GroupConcat(L, sep) = "", where |L| = 0
+GroupConcat(L, sep) = CONCAT("", L0), where
+ |L| = 1
+GroupConcat(L, sep) = CONCAT(L0, sep, GroupConcat(L1..n-1,
+ sep)), where |L| > 1
+
+For example, GroupConcat([("a"), ("b"), ("c")], {"separator" → "."}) = "a.b.c".
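A small Python sketch of the revised GroupConcat recursion is given below. It assumes plain Python strings for the values and a dict for scalarvals; `group_concat` and `group_concat_list` are hypothetical names, and string concatenation stands in for CONCAT.

```python
from typing import Dict, List, Sequence, Tuple


def flatten(s: Sequence[Tuple[str, ...]]) -> List[str]:
    """Concatenate the inner lists in order."""
    return [x for lst in s for x in lst]


def group_concat_list(values: List[str], sep: str) -> str:
    """GroupConcat(L, sep): one branch per case in the definition."""
    if len(values) == 0:
        return ""
    if len(values) == 1:
        return "" + values[0]
    return values[0] + sep + group_concat_list(values[1:], sep)


def group_concat(s: Sequence[Tuple[str, ...]], scalarvals: Dict[str, str]) -> str:
    """GroupConcat(S, scalarvals), defaulting the separator to U+0020."""
    sep = scalarvals.get("separator", " ")
    return group_concat_list(flatten(s), sep)


joined = group_concat([("a",), ("b",), ("c",)], {"separator": "."})
print(joined)  # a.b.c
```

Since the input is a sequence rather than a multiset, the output order follows the input order in this sketch.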
-Sample is a set function which returns an arbitrary value from the multiset passed
+Sample is a set function which returns an arbitrary value from the sequence passed
to it.
Definition: Sample
-RDFTerm Sample(multiset M)
-Sample(M) = v, where v in Flatten(M)
-Sample({}) = error
+RDFTerm Sample(sequence S)
+Sample(S) = v, where v in Flatten(S)
+Sample([]) = error
-For example, given Sample({"a", "b", "c"}), "a", "b", and "c" are all valid return
+For example, given Sample([("a"), ("b"), ("c")]), "a", "b", and "c" are all valid return
values. Note that Sample() is not required to be deterministic for a given input, the
- only restriction is that the output value must be present in the input multiset.
+ only restriction is that the output value must be present in the input sequence.