Add sorting/ordering
polyfractal committed May 30, 2014
1 parent 2e192df commit 8f50da5
Showing 2 changed files with 182 additions and 1 deletion.
179 changes: 179 additions & 0 deletions 300_Aggregations/50_sorting_ordering.asciidoc
@@ -0,0 +1,179 @@

=== Sorting multi-value buckets

Multi-value buckets -- such as `terms`, `histogram` and `date_histogram` --
dynamically produce many buckets. How does Elasticsearch decide the order in
which these buckets are presented to the user?

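By default, `terms` buckets are ordered by `doc_count` in descending order,
while `histogram` and `date_histogram` buckets are ordered by their key in
ascending order. Ordering by `doc_count` is a good default because often we
want to find the buckets that maximize some criterion: price, population,
frequency.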

But sometimes you'll want to modify this sort order, and there are a few ways to
do it depending on the bucket.

==== Intrinsic sorts

These sort modes are "intrinsic" to the bucket: they operate on data that the
bucket generates, such as `doc_count`. They share the same syntax but differ
slightly depending on the bucket being used.

Let's perform a `terms` aggregation but sort by `doc_count` ascending:

[source,js]
--------------------------------------------------
GET /cars/transactions/_search?search_type=count
{
    "aggs" : {
        "colors" : {
            "terms" : {
                "field" : "color",
                "order": {
                    "_count" : "asc" <1>
                }
            }
        }
    }
}
--------------------------------------------------
// SENSE: 300_Aggregations/50_sorting_ordering.json
<1> Using the `_count` keyword, we can sort by `doc_count` ascending

We introduce an `order` object into the aggregation, which allows us to sort on
one of several values:

- `_count`: Sort by document count. Works with `terms`, `histogram`, `date_histogram`
- `_term`: Sort by the string value of a term alphabetically. Works only with `terms`
- `_key`: Sort by the numeric value of each bucket's key (conceptually similar to `_term`).
Works only with `histogram` and `date_histogram`
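
As a minimal sketch of how the same syntax carries over to another bucket type,
the request below orders a `histogram` of prices by `_key` descending, so that
the highest price range comes first. The aggregation name `price_ranges` and
the interval of `20000` are illustrative assumptions:

[source,js]
--------------------------------------------------
GET /cars/transactions/_search?search_type=count
{
    "aggs" : {
        "price_ranges" : {
            "histogram" : {
                "field" : "price",
                "interval": 20000,
                "order": {
                    "_key" : "desc"
                }
            }
        }
    }
}
--------------------------------------------------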

==== Sorting by a metric

Often, you'll find yourself wanting to sort based on a metric's calculated value.
For our car sales analytics dashboard, we may want to build a bar chart of
sales by car color, but order the bars by the average price ascending.

We can do this by adding a metric to our bucket, then referencing that
metric from the "order" parameter:

[source,js]
--------------------------------------------------
GET /cars/transactions/_search?search_type=count
{
    "aggs" : {
        "colors" : {
            "terms" : {
                "field" : "color",
                "order": {
                    "avg_price" : "asc" <2>
                }
            },
            "aggs": {
                "avg_price": {
                    "avg": {"field": "price"} <1>
                }
            }
        }
    }
}
--------------------------------------------------
// SENSE: 300_Aggregations/50_sorting_ordering.json
<1> The average price is calculated for each bucket
<2> Then the buckets are ordered by the calculated average in ascending order

This lets you override the sort order with any metric, simply by referencing
the name of the metric. Some metrics, however, emit multiple values. The
`extended_stats` metric is a good example: it provides several individual
values, such as `avg`, `variance` and `std_deviation`.

[NOTE]
.Applicable buckets
====
Metric-based sorting works with `terms`, `histogram` and `date_histogram`.
====

If you want to sort on a multi-value metric, use the fully qualified dot path,
which joins the name of the metric and the name of the value with a dot:

[source,js]
--------------------------------------------------
GET /cars/transactions/_search?search_type=count
{
    "aggs" : {
        "colors" : {
            "terms" : {
                "field" : "color",
                "order": {
                    "stats.variance" : "asc" <1>
                }
            },
            "aggs": {
                "stats": {
                    "extended_stats": {"field": "price"}
                }
            }
        }
    }
}
--------------------------------------------------
// SENSE: 300_Aggregations/50_sorting_ordering.json
<1> Using dot notation, we can sort on the metric we are interested in

In this example we are sorting on the variance of each bucket, so that colors
with the least variance in price will appear before those that have more variance.
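
Any other value emitted by the metric can be referenced in the same way. As a
minimal sketch, assuming we want the colors with the widest spread in price to
appear first, the request below sorts on `stats.std_deviation` in descending
order:

[source,js]
--------------------------------------------------
GET /cars/transactions/_search?search_type=count
{
    "aggs" : {
        "colors" : {
            "terms" : {
                "field" : "color",
                "order": {
                    "stats.std_deviation" : "desc"
                }
            },
            "aggs": {
                "stats": {
                    "extended_stats": {"field": "price"}
                }
            }
        }
    }
}
--------------------------------------------------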

==== Sorting based on "deep" metrics

In the prior examples, the metric was a direct child of the bucket. An average
price was calculated for each term. It is possible to sort on "deeper" metrics,
which are grandchildren or great-grandchildren of the bucket, with some
limitations.

You can define a path to a deeper, nested metric using the `>` character, like
so: `my_bucket>another_bucket>metric`

The caveat is that each nested bucket in the path must be a "single-value" bucket.
A `filter` bucket produces a single bucket: all documents that match the
filtering criteria. Multi-value buckets (such as `terms`) generate many
dynamic buckets, which makes it impossible to specify a deterministic path.

Currently there are only two single-value buckets: `filter` and `global`. As
a quick example, let's build a histogram of car prices, but order the buckets
by the variance in price of red and green (but not blue) cars in each price range.

[source,js]
--------------------------------------------------
GET /cars/transactions/_search?search_type=count
{
    "aggs" : {
        "colors" : {
            "histogram" : {
                "field" : "price",
                "interval": 20000,
                "order": {
                    "red_green_cars>stats.variance" : "asc" <1>
                }
            },
            "aggs": {
                "red_green_cars": {
                    "filter": { "terms": {"color": ["red", "green"]}}, <2>
                    "aggs": {
                        "stats": {"extended_stats": {"field" : "price"}} <3>
                    }
                }
            }
        }
    }
}
--------------------------------------------------
// SENSE: 300_Aggregations/50_sorting_ordering.json
<1> Sort the buckets generated by the histogram according to the variance of a nested metric
<2> Because we are using a single-value `filter`, we can use nested sorting
<3> Sort on the stats generated by this metric

In this example, you can see that we are accessing a nested metric. The `stats`
metric is a child of `red_green_cars`, which is in turn a child of `colors`. To
sort on that metric, we define the path as `"red_green_cars>stats.variance"`.
This is allowed because the `filter` bucket is a single-value bucket.



4 changes: 3 additions & 1 deletion 303_Making_Graphs.asciidoc
@@ -6,4 +6,6 @@ include::300_Aggregations/35_date_histogram.asciidoc[]

include::300_Aggregations/40_scope.asciidoc[]

include::300_Aggregations/45_filtering.asciidoc[]
include::300_Aggregations/45_filtering.asciidoc[]

include::300_Aggregations/50_sorting_ordering.asciidoc[]
