From ddf420686a437c905b3867ec9cb3c2214db7405b Mon Sep 17 00:00:00 2001
From: Olaf Hartig
Step: GROUP BY
Step: Aggregates
-Group evaluates a list of expressions against a solution sequence, producing a set
+ Group evaluates a list of expressions against a solution sequence Ψ, producing a set
of partial functions from keys to solution sequences.
-Group(exprlist, Ω) = { ListEval(exprlist, μ) → { μ' | μ' in Ω, ListEval(exprlist, μ)
-= ListEval(exprlist, μ') } | μ in Ω }
+Group(exprlist, Ψ) = { ListEval(exprlist, μ) → [ μ' | μ' in Ψ, ListEval(exprlist, μ)
+= ListEval(exprlist, μ') ] | μ in Ψ }
Definition: ListEval
Let exprlist be a list of expressions or *, func a set function,
scalarvals a set of partial functions (possibly empty) passed from the aggregate
- in the query, and let { key1→Ω1, ...,
- keym→Ωm } be a multiset of partial functions from keys to
+ in the query, and let { key1→Ψ1, ...,
+ keym→Ψm } be a set of partial functions from keys to
solution sequences as produced by the grouping step.
-Aggregation applies the set function func to the given multiset and produces a
-single value for each key and partition of solutions for that key.
-Aggregation(exprlist, func, scalarvals, { key1→Ω1, ...,
-keym→Ωm } )
+Aggregation applies the set function func to the given set and produces a
+single value for each key and group of solutions for that key.
+Aggregation(exprlist, func, scalarvals, { key1→Ψ1, ...,
+keym→Ψm } )
where
Grouping and Aggregation
If the GROUP BY keyword is used, or there is implicit grouping due to the
use of aggregates in the projection, then grouping is performed by the
- Group function. It divides the solution set into groups of one or
+ Group function.
+ In this case, before grouping, the solution set is converted into a solution
+ sequence by applying the ToList function.
+ Next, the Group function
+ divides this solution sequence into groups of one or
more solutions, with the same overall cardinality. In case of implicit grouping, a fixed
constant (1) is used to group all solutions into a single group.
Grouping and Aggregation
Let E := [], a list of pairs of the form (variable, expression)
If Q contains GROUP BY exprlist
- Let G := Group(exprlist, P)
+ Let G := Group(exprlist, ToList(P))
Else If Q contains an aggregate in SELECT, HAVING, ORDER BY
- Let G := Group((1), P)
+ Let G := Group((1), ToList(P))
Else
skip the rest of the aggregate step
End
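The grouping step can be illustrated with a minimal Python sketch. It rests on assumptions that are not part of the text above: solution mappings are plain dicts, expressions are Python callables, a failing expression yields a hypothetical `ERROR` sentinel (and all errors compare equal as keys, a simplification), and ToList is modelled as keeping the given order, although the definition leaves that order open. Since each group is built by walking Ψ in order, every group is a subsequence of Ψ, matching the `[ μ' | μ' in Ψ, ... ]` notation.

```python
from typing import Any, Callable, Dict, List, Tuple

Solution = Dict[str, Any]         # a solution mapping μ: variable name -> value
Expr = Callable[[Solution], Any]  # an expression, evaluated against μ
Key = Tuple[Any, ...]             # the list of values produced by ListEval

ERROR = object()  # hypothetical sentinel for an expression evaluation error


def list_eval(exprlist: List[Expr], mu: Solution) -> Key:
    """ListEval: evaluate each expression against mu, retaining errors."""
    values = []
    for expr in exprlist:
        try:
            values.append(expr(mu))
        except Exception:
            values.append(ERROR)
    return tuple(values)


def to_list(p: List[Solution]) -> List[Solution]:
    """ToList: this sketch simply keeps the order in which p is given."""
    return list(p)


def group(exprlist: List[Expr], psi: List[Solution]) -> Dict[Key, List[Solution]]:
    """Group(exprlist, Ψ): map each key to the solutions of Ψ with that key."""
    groups: Dict[Key, List[Solution]] = {}
    for mu in psi:  # walking Ψ in order keeps every group in sequence order
        groups.setdefault(list_eval(exprlist, mu), []).append(mu)
    return groups


P = [{"x": 1, "y": 2}, {"x": 2, "y": 5}, {"x": 1, "y": 3}]

G = group([lambda mu: mu["x"]], to_list(P))     # GROUP BY ?x
G_implicit = group([lambda mu: 1], to_list(P))  # implicit grouping with (1)
```

The groups together contain every solution exactly once, so the overall cardinality is unchanged.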
@@ -9415,10 +9419,10 @@ Aggregate Algebra
Aggregate Algebra
- = { (key, F(Ω)) | key → Ω in { key1→Ω1, ...,
- keym→Ωm } }
+ = { (key, F(Ψ)) | key → Ψ in { key1→Ψ1, ...,
+ keym→Ψm } }
- M(Ω) = { ListEval(exprlist, μ) | μ in Ω }
- F(Ω) = func(M(Ω), scalarvals), for non-DISTINCT
- F(Ω) = func(Distinct(M(Ω)), scalarvals), for DISTINCT
+ F(Ψ) = func(M(Ψ), scalarvals), for non-DISTINCT
+ F(Ψ) = func(Distinct(M(Ψ)), scalarvals), for DISTINCT
Special Case: when COUNT is used with the expression * the value of F will be the cardinality of the group solution sequence,
-card[Ω], or card[Distinct(Ω)] if the DISTINCT
+card[Ψ], or card[Distinct(Ψ)] if the DISTINCT
keyword is present.
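The following Python sketch shows how F(Ψ) can be computed per group from M(Ψ), including the DISTINCT variant and the COUNT(*) special case. The names and representations are assumptions for illustration: `exprlist=None` stands for `*`, `distinct` keeps the first occurrence of each element, ListEval omits error handling for brevity, and `count` is a toy set function rather than a full Count.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Solution = Dict[str, Any]
Expr = Callable[[Solution], Any]
Key = Tuple[Any, ...]
SetFunc = Callable[[List[Tuple[Any, ...]], Dict[str, Any]], Any]


def list_eval(exprlist: List[Expr], mu: Solution) -> Tuple[Any, ...]:
    """ListEval without error handling, to keep the sketch short."""
    return tuple(expr(mu) for expr in exprlist)


def distinct(seq: List[Any]) -> List[Any]:
    """Order-preserving duplicate removal; works for dicts as well as tuples."""
    out: List[Any] = []
    for e in seq:
        if e not in out:
            out.append(e)
    return out


def aggregation(exprlist: Optional[List[Expr]], func: SetFunc,
                scalarvals: Dict[str, Any], groups: Dict[Key, List[Solution]],
                is_distinct: bool = False) -> Dict[Key, Any]:
    """Aggregation(exprlist, func, scalarvals, {key1→Ψ1, ..., keym→Ψm}).

    exprlist=None stands for `*`; it is only meaningful with COUNT, so this
    sketch returns card[Ψ] (or card[Distinct(Ψ)]) without calling func.
    """
    result: Dict[Key, Any] = {}
    for key, psi in groups.items():
        if exprlist is None:
            result[key] = len(distinct(psi) if is_distinct else psi)
            continue
        m = [list_eval(exprlist, mu) for mu in psi]  # M(Ψ), in the order of Ψ
        result[key] = func(distinct(m) if is_distinct else m, scalarvals)
    return result


def count(m: List[Tuple[Any, ...]], scalarvals: Dict[str, Any]) -> int:
    """Toy Count: card[Flatten(M)], assuming no error values."""
    return sum(len(lst) for lst in m)


groups = {(1,): [{"y": 2}, {"y": 2}, {"y": 3}], (2,): [{"y": 5}]}
by_y = [lambda mu: mu["y"]]
```

Calling `aggregation(by_y, count, {}, groups)` counts three values in group (1); with `is_distinct=True` the duplicate (2) is removed first, giving two.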
scalarvals are used to pass values to the underlying set function, bypassing
@@ -9466,7 +9470,7 @@
All aggregates may have the DISTINCT keyword as the first token in their
argument list. If this keyword is present, then the first argument to func is Distinct(M).
Example
-Given a solution multiset (Ω) with the following values:
+Given a solution sequence Ψ with the following values:
And the query expression SELECT (ex:agg(?y, ?z) AS ?agg) WHERE { ?x ?y ?z } GROUP BY ?x.
-We produce G = Group((?x), Ω) = { ( (1), { μ1, μ2 } ), ( (2), { μ3 } ) }
+We produce G = Group((?x), Ψ) = { (1) → [μ1, μ2], (2) → [μ3] }
And so Aggregation((?y, ?z), ex:agg, {}, G) =
- { ((1), eg:agg({(2, 3), (3, 4)}, {})), ((2), eg:agg({(5, 6)}, {})) }.
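The table of values for this example is not shown here, so the Python sketch below assumes values for ?x, ?y and ?z that are consistent with the grouping and with the arguments (2, 3), (3, 4) and (5, 6) passed to the aggregate. `ex_agg` is a hypothetical stand-in that simply returns the M(Ψ) it receives, which makes the sequence passed to the aggregate visible.

```python
from typing import Any, Dict, List, Tuple

# Ψ for the example; the variable values are assumed, read off from the
# aggregate arguments (2, 3), (3, 4) and (5, 6) shown above.
psi: List[Dict[str, int]] = [
    {"x": 1, "y": 2, "z": 3},  # μ1
    {"x": 1, "y": 3, "z": 4},  # μ2
    {"x": 2, "y": 5, "z": 6},  # μ3
]


def ex_agg(m: List[Tuple[int, ...]], scalarvals: Dict[str, Any]) -> List[Tuple[int, ...]]:
    """Hypothetical stand-in for ex:agg that just returns M(Ψ)."""
    return m


# G = Group((?x), Ψ): key (x,) -> subsequence of Ψ, in order
G: Dict[Tuple[int, ...], List[Dict[str, int]]] = {}
for mu in psi:
    G.setdefault((mu["x"],), []).append(mu)

# Aggregation((?y, ?z), ex:agg, {}, G)
result = {key: ex_agg([(mu["y"], mu["z"]) for mu in grp], {})
          for key, grp in G.items()}
```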
Definition: AggregateJoin
Let S1, ..., Sn be a list of sets, where each set
From c2fe1ccc91b2687ff79f367db45b3de648ae6431 Mon Sep 17 00:00:00 2001
From: Olaf Hartig
SPARQL Algebra
Definition: Distinct
-Let Ψ be a sequence of solution mappings. We define:
-Distinct(Ψ) = [ μ | μ in Ψ ]
-card[Distinct(Ψ)](μ) = 1
-The order of Distinct(Ψ) must preserve any ordering given by OrderBy.
+Let Ψ be a sequence of elements which may be either solution mappings or lists of RDF terms. We define:
+Distinct(Ψ) = [ e | e in Ψ ]
+card[Distinct(Ψ)](e) = 1
+The order of Distinct(Ψ) must preserve any ordering given by OrderBy (if any).
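A minimal Python sketch of Distinct over a sequence, assuming solution mappings are dicts and lists of RDF terms are tuples. The definition does not say which of several duplicates survives; keeping the first occurrence is one choice that preserves the input order. Python equality stands in for RDF term equality, which is a simplification: Python treats 1 and 1.0 as equal, whereas "1"^^xsd:integer and "1.0"^^xsd:decimal are distinct RDF terms.

```python
from typing import Any, List


def distinct(psi: List[Any]) -> List[Any]:
    """Distinct(Ψ): keep the first occurrence of each element, in order.

    Membership is tested with ==, so unhashable dicts (solution mappings)
    work as well as tuples (lists of RDF terms), at O(n^2) cost.
    """
    out: List[Any] = []
    for e in psi:
        if e not in out:
            out.append(e)
    return out
```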
Definition: Reduced
From e95109b185c5c977620877df0b85a4d44e6bd880 Mon Sep 17 00:00:00 2001
From: Olaf Hartig
-Flatten is a function which is used to collapse multisets of lists into a multiset, so
-for example { (1, 2), (3, 4) } becomes { 1, 2, 3, 4 }.
+Flatten is a function which is used to collapse a sequence of lists into a single list.
+For example, [(1, 2), (3, 4)] becomes (1, 2, 3, 4).
Definition: Flatten
-The Flatten(M) function takes a multiset of lists, M {(L1, L2, ...), ...}, and returns the multiset { x | L in M and x in L }.
+The Flatten(S) function takes a sequence of lists, S = [(L1, L2, ...), ...], and returns the list ( x | L in S and x in L ).
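In Python, assuming tuples for the inner lists and a Python list for the result, Flatten reduces to a nested comprehension; iterating S in order and each inner list in order gives the ordered result the new definition requires.

```python
from typing import Any, List, Sequence


def flatten(s: Sequence[Sequence[Any]]) -> List[Any]:
    """Flatten(S): concatenate the lists in S, preserving their order."""
    return [x for lst in s for x in lst]
```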
The set functions which underlie SPARQL aggregates all have a common signature:
- SetFunc(M), or SetFunc(M, scalarvals) where M is a multiset of lists, and scalarvals is
+ SetFunc(S), or SetFunc(S, scalarvals) where S is a sequence of lists, and scalarvals is
one or more scalar values that are passed to the set function indirectly via the ( ...
; key=value ) syntax for aggregates in the SPARQL grammar. The only use of this that is
supported by the built-in aggregates in SPARQL Query 1.1 is GROUP_CONCAT, as in GROUP_CONCAT(?x ; separator=", ").
Note that the name "Set Function" is somewhat historical — the arguments to set
-functions are in fact multisets. The name is retained due to the commonality with SQL
-Set Functions, which also operate over multisets.
+functions are in fact sequences. The name is retained due to the commonality with SQL
+Set Functions, which operate over multisets.
The set functions defined in this document are Count, Sum, Min, Max, Avg,
GroupConcat, and Sample — corresponding to the aggregates COUNT, SUM, MIN, MAX, AVG,
@@ -9550,10 +9550,10 @@
Definition: Count
-xsd:integer Count(multiset M)
-N = Flatten(M)
-remove error elements from N
-Count(M) = card[N]
+xsd:integer Count(sequence S)
+L = Flatten(S)
+remove error elements from L
+Count(S) = card[L]
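A minimal Python sketch of Count, assuming that evaluation errors are represented by a hypothetical `ERROR` sentinel object inside the lists.

```python
from typing import Any, Sequence

ERROR = object()  # hypothetical sentinel for an expression evaluation error


def count(s: Sequence[Sequence[Any]]) -> int:
    """Count(S): flatten S, drop error elements, return the cardinality."""
    flat = [x for lst in s for x in lst]              # L = Flatten(S)
    return len([x for x in flat if x is not ERROR])   # card[L] without errors
```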
Definition: Sum
-numeric Sum(multiset M)
-Sum(M) = Sum(ToList(Flatten(M))).
-Sum(S) = op:numeric-add(S1, Sum(S2..n)) when card[S] >
+numeric Sum(sequence S)
+L = Flatten(S)
+Sum(S) = Sum(L)
+Sum(L) = op:numeric-add(L1, Sum(L2..n)) when card[L] >
1
-Sum(S) = op:numeric-add(S1, 0) when card[S] = 1
-Sum(S) = "0"^^xsd:integer when card[S] = 0
-In this way, Sum({1, 2, 3}) = op:numeric-add(1, op:numeric-add(2,
+Sum(L) = op:numeric-add(L1, 0) when card[L] = 1
+Sum(L) = "0"^^xsd:integer when card[L] = 0
+In this way, Sum( (1, 2, 3) ) = op:numeric-add(1, op:numeric-add(2, op:numeric-add(3, 0))).
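The recursive definition of Sum can be transcribed case by case into Python. This is a sketch under stated assumptions: `numeric_add` is a hypothetical stand-in for op:numeric-add that uses Python `+` and ignores XSD type promotion, and the definition numbers elements from 1 (L1, L2..n) while Python indexes from 0, so L1 is `l[0]`. The recursion depth grows with the list length, so very long inputs would hit Python's recursion limit.

```python
from typing import List, Sequence, Union

Number = Union[int, float]


def numeric_add(a: Number, b: Number) -> Number:
    """Hypothetical stand-in for op:numeric-add; no XSD type promotion."""
    return a + b


def sum_list(l: List[Number]) -> Number:
    """Sum(L), following the three cases of the definition."""
    if len(l) == 0:
        return 0                                   # "0"^^xsd:integer
    if len(l) == 1:
        return numeric_add(l[0], 0)                # op:numeric-add(L1, 0)
    return numeric_add(l[0], sum_list(l[1:]))      # op:numeric-add(L1, Sum(L2..n))


def sum_seq(s: Sequence[Sequence[Number]]) -> Number:
    """Sum(S) = Sum(L) with L = Flatten(S)."""
    return sum_list([x for lst in s for x in lst])
```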
Definition: Avg
-numeric Avg(multiset M)
-Avg(M) = "0"^^xsd:integer, where Count(M) = 0
-Avg(M) = Sum(M) / Count(M), where Count(M) > 0
+numeric Avg(sequence S)
+Avg(S) = "0"^^xsd:integer, where Count(S) = 0
+Avg(S) = Sum(S) / Count(S), where Count(S) > 0
-For example, Avg({1, 2, 3}) = Sum({1, 2, 3})/Count({1, 2, 3}) = 6/3 = 2.
+For example, Avg([(1), (2), (3)]) = Sum([(1), (2), (3)])/Count([(1), (2), (3)]) = 6/3 = 2.
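A short Python sketch of Avg, assuming the input contains no error values (Count removes errors but Sum does not, so with errors the two would disagree). Python's `/` returns a float, whereas op:numeric-divide applied to two xsd:integer values yields an xsd:decimal; the float is only an approximation of that result.

```python
from typing import Sequence, Union

Number = Union[int, float]


def avg(s: Sequence[Sequence[Number]]) -> Number:
    """Avg(S) = Sum(S) / Count(S), or 0 when Count(S) = 0."""
    flat = [x for lst in s for x in lst]   # Flatten(S); no error values assumed
    if len(flat) == 0:
        return 0                           # "0"^^xsd:integer
    return sum(flat) / len(flat)
```

With the input [(1), (2), (3)] this computes 6/3 and returns 2.0.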
Definition: Min
-term Min(multiset M)
-Min(M) = Min(ToList(Flatten(M)))
-Min({}) = error.
-The flattened multiset of values passed as an argument is converted to a sequence
-S, this sequence is ordered as per the ORDER BY ASC
clause.
-Min(S) = S0
+term Min(sequence S)
+L = Flatten(S)
+Min(S) = Min(L)
+The flattened list L of values is ordered as per the ORDER BY ASC
clause.
+Min(L) = L0 if card[L] > 0
+Min(L) = error if card[L] = 0
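A minimal Python sketch of Min, assuming a hypothetical `EvalError` exception in place of the SPARQL error result and Python's built-in ordering in place of the ORDER BY ASC ordering. SPARQL defines an order across mixed term types, whereas `sorted` raises `TypeError` on, for example, a mix of numbers and strings, so this only covers homogeneous inputs.

```python
from typing import Any, List, Sequence


class EvalError(Exception):
    """Hypothetical exception standing in for the SPARQL error result."""


def min_seq(s: Sequence[Sequence[Any]]) -> Any:
    """Min(S) = Min(L): first element of L sorted ascending; error if empty."""
    l: List[Any] = [x for lst in s for x in lst]   # L = Flatten(S)
    if not l:
        raise EvalError("Min of an empty list")
    return sorted(l)[0]   # Python ordering stands in for ORDER BY ASC
```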
Definition: Max
-term Max(multiset M)
-Max(M) = Max(ToList(Flatten(M)))
-Max({}) = error.
-The multiset of values passed as an argument is converted to a sequence S, this
-sequence is ordered as per the ORDER BY DESC
clause.
-Max(S) = S0
+term Max(sequence S)
+L = Flatten(S)
+Max(S) = Max(L)
+The flattened list L of values is ordered as per the ORDER BY DESC
clause.
+Max(L) = L0 if card[L] > 0
+Max(L) = error if card[L] = 0
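Max mirrors Min with a descending order. The same assumptions apply to this Python sketch: `EvalError` is a hypothetical stand-in for the error result, and Python's ordering replaces ORDER BY DESC, so only homogeneous inputs are handled.

```python
from typing import Any, List, Sequence


class EvalError(Exception):
    """Hypothetical exception standing in for the SPARQL error result."""


def max_seq(s: Sequence[Sequence[Any]]) -> Any:
    """Max(S) = Max(L): first element of L sorted descending; error if empty."""
    l: List[Any] = [x for lst in s for x in lst]   # L = Flatten(S)
    if not l:
        raise EvalError("Max of an empty list")
    return sorted(l, reverse=True)[0]   # Python ordering stands in for ORDER BY DESC
```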
Definition: GroupConcat
-literal GroupConcat(multiset M)
+literal GroupConcat(sequence S)
If the "separator" scalar argument is absent from GROUP_CONCAT then it is taken to be the "space" character, unicode codepoint U+0020.
-The multiset of values, M passed as an argument is converted to a sequence S.
-GroupConcat(M, scalarvals) = GroupConcat(Flatten(M), scalarvals("separator"))
-GroupConcat(S, sep) = "", where |S| = 0
-GroupConcat(S, sep) = CONCAT("", S0), where |S| = 1
-GroupConcat(S, sep) = CONCAT(S0, sep, GroupConcat(S1..n-1, sep)), where |S| > 1
-For example, GroupConcat({"a", "b", "c"}, {"separator" → "."}) = "a.b.c".
+L = Flatten(S)
+GroupConcat(S, scalarvals) = GroupConcat(L, scalarvals("separator"))
+GroupConcat(L, sep) = "", where |L| = 0
+GroupConcat(L, sep) = CONCAT("", L0), where |L| = 1
+GroupConcat(L, sep) = CONCAT(L0, sep, GroupConcat(L1..n-1, sep)), where |L| > 1
+
+For example, GroupConcat([("a"), ("b"), ("c")], {"separator" → "."}) = "a.b.c".
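The three recursive cases of GroupConcat translate directly into Python. In this sketch, values are assumed to be Python strings already (the conversion of RDF terms to strings is not modelled), and `scalarvals` is a plain dict whose "separator" entry defaults to a single space, as stated above. Here L0 is `l[0]` and L1..n-1 is `l[1:]`.

```python
from typing import Dict, Sequence


def group_concat_list(l: Sequence[str], sep: str) -> str:
    """GroupConcat(L, sep), following the three recursive cases."""
    if len(l) == 0:
        return ""
    if len(l) == 1:
        return "" + l[0]                           # CONCAT("", L0)
    return l[0] + sep + group_concat_list(l[1:], sep)


def group_concat(s: Sequence[Sequence[str]], scalarvals: Dict[str, str]) -> str:
    """GroupConcat(S, scalarvals); the separator defaults to U+0020."""
    flat = [x for lst in s for x in lst]           # L = Flatten(S)
    return group_concat_list(flat, scalarvals.get("separator", " "))
```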
-Sample is a set function which returns an arbitrary value from the multiset passed
+Sample is a set function which returns an arbitrary value from the sequence passed to it.
Definition: Sample
-RDFTerm Sample(multiset M)
-Sample(M) = v, where v in Flatten(M)
-Sample({}) = error
+RDFTerm Sample(sequence S)
+Sample(S) = v, where v in Flatten(S)
+Sample([]) = error
-For example, given Sample({"a", "b", "c"}), "a", "b", and "c" are all valid return
+For example, given Sample([("a"), ("b"), ("c")]), "a", "b", and "c" are all valid return
values. Note that Sample() is not required to be deterministic for a given input, the
-only restriction is that the output value must be present in the input multiset.
+only restriction is that the output value must be present in the input sequence.
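A Python sketch of Sample, assuming a hypothetical `EvalError` for the error result. Both a deterministic choice (the first flattened value) and a random one are shown, since the definition only requires that the returned value occur in Flatten(S).

```python
import random
from typing import Any, Optional, Sequence


class EvalError(Exception):
    """Hypothetical exception standing in for the SPARQL error result."""


def sample(s: Sequence[Sequence[Any]], rng: Optional[random.Random] = None) -> Any:
    """Sample(S): any value v with v in Flatten(S); error if there is none."""
    flat = [x for lst in s for x in lst]   # Flatten(S)
    if not flat:
        raise EvalError("Sample of an empty sequence")
    if rng is None:
        return flat[0]          # a deterministic choice is allowed
    return rng.choice(flat)     # so is a non-deterministic one
```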
@@ -9456,8 +9456,8 @@ where
M(Ψ) = [ ListEval(exprlist, μ) | μ in Ψ ]
- F(Ψ) = func(M(Ψ), scalarvals), for non-DISTINCT
- F(Ψ) = func(Distinct(M(Ψ)), scalarvals), for DISTINCT
Definition: Distinct
Let Ψ be a sequence of elements which may be either solution mappings or lists of RDF terms. We define:
Distinct(Ψ) = [ e | e in Ψ ]
card[Distinct(Ψ)](e) = 1
The order of Distinct(Ψ) must preserve any ordering given by OrderBy (if any).
Let Ψ be a sequence of elements which may be either solution mappings or lists of RDF terms. Distinct(Ψ) is a sequence of elements that has the following properties.
Definition: Reduced
where
with Dedup(M(Ψ)) being an order-preserving, duplicate-free version of the sequence M(Ψ); that is, Dedup(M(Ψ)) is a sequence of RDF terms that has the following four properties.
Special Case: when
scalarvals are used to pass values to the underlying set function, bypassing
From 8e30456504de274cd1f396615ad3060443a00135 Mon Sep 17 00:00:00 2001
From: Olaf Hartig
COUNT is used with the expression * the value of F will be the cardinality of the group solution sequence,
card[Ψ], or card[Distinct(Ψ)] if the DISTINCT
From ff80086d0b37ff19eff77b72b664db42d95f0326 Mon Sep 17 00:00:00 2001
From: Olaf Hartig
SPARQL Algebra
+
+
+ SPARQL Algebra
-
+
-
- Aggregate Algebra
M(Ψ) = [ ListEval(exprlist, μ) | μ in Ψ ]
F(Ψ) = func(M(Ψ), scalarvals), for non-DISTINCT
- F(Ψ) = func(Distinct(M(Ψ)), scalarvals), for DISTINCT
+
+
+
+
+ COUNT
is used with the expression * the value of F will be the cardinality of the group solution sequence,
-card[Ψ], or card[Distinct(Ψ)] if the DISTINCT
+card[Ψ], or card[Dedup(Ψ)] if the DISTINCT
keyword is present.
elements again from the algorithm.
---
spec/index.html | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/spec/index.html b/spec/index.html
index ed0e4f7..59c4643 100644
--- a/spec/index.html
+++ b/spec/index.html
@@ -8770,7 +8770,7 @@
Grouping and Aggregation
If Q contains GROUP BY exprlist
Let G := Group(exprlist, ToList(P))
-Else If Q contains an aggregate in SELECT
, HAVING
, ORDER BY
+Else If Q contains an aggregate in SELECT, HAVING, ORDER BY
Let G := Group((1), ToList(P))
Else
skip the rest of the aggregate step