diff --git a/300_Aggregations/50_sorting_ordering.asciidoc b/300_Aggregations/50_sorting_ordering.asciidoc
new file mode 100644
index 000000000..291848e21
--- /dev/null
+++ b/300_Aggregations/50_sorting_ordering.asciidoc
@@ -0,0 +1,179 @@
+
+=== Sorting multi-value buckets
+
+Multi-value buckets -- like the `terms`, `histogram` and `date_histogram` --
+dynamically produce many buckets. How does Elasticsearch decide the order in
+which these buckets are presented to the user?
+
+By default, buckets are ordered by `doc_count` in descending order. This is a
+good default because often we want to find the documents that maximize some
+criterion: price, population, frequency.
+
+But sometimes you'll want to modify this sort order, and there are a few ways to
+do it depending on the bucket.
+
+==== Intrinsic sorts
+
+These sort modes are "intrinsic" to the bucket...they operate on data that the
+bucket generates, such as `doc_count`. They share the same syntax but differ slightly
+depending on the bucket being used.
+
+Let's perform a `terms` aggregation but sort by `doc_count` ascending:
+
+[source,js]
+--------------------------------------------------
+GET /cars/transactions/_search?search_type=count
+{
+    "aggs" : {
+        "colors" : {
+            "terms" : {
+                "field" : "color",
+                "order": {
+                    "_count" : "asc" <1>
+                }
+            }
+        }
+    }
+}
+--------------------------------------------------
+// SENSE: 300_Aggregations/50_sorting_ordering.json
+<1> Using the `_count` keyword, we can sort by `doc_count` ascending
+
+We introduce an "order" object into the aggregation, which allows us to sort on
+one of several values:
+
+- `_count`: Sort by document count. Works with `terms`, `histogram`, `date_histogram`
+- `_term`: Sort by the string value of a term alphabetically. Works only with `terms`
+- `_key`: Sort by the numeric value of each bucket's key (conceptually similar to `_term`).
+Works only with `histogram` and `date_histogram`
+
+==== Sorting by a metric
+
+Often, you'll find yourself wanting to sort based on a metric's calculated value.
+For our car sales analytics dashboard, we may want to build a bar chart of
+sales by car color, but order the bars by average price, ascending.
+
+We can do this by adding a metric to our bucket, then referencing that
+metric from the "order" parameter:
+
+[source,js]
+--------------------------------------------------
+GET /cars/transactions/_search?search_type=count
+{
+    "aggs" : {
+        "colors" : {
+            "terms" : {
+                "field" : "color",
+                "order": {
+                    "avg_price" : "asc" <2>
+                }
+            },
+            "aggs": {
+                "avg_price": {
+                    "avg": {"field": "price"} <1>
+                }
+            }
+        }
+    }
+}
+--------------------------------------------------
+// SENSE: 300_Aggregations/50_sorting_ordering.json
+<1> The average price is calculated for each bucket
+<2> Then the buckets are ordered by the calculated average in ascending order
+
+This lets you override the sort order with any metric, simply by referencing
+the name of the metric. Some metrics, however, emit multiple values. The
+`extended_stats` metric is a good example: it provides half a dozen individual
+metrics.
+
+[INFO]
+.Applicable buckets
+====
+Metric-based sorting works with `terms`, `histogram` and `date_histogram`
+====
+
+If you want to sort on a multi-value metric, you just need to use the fully-qualified
+dot path:
+
+[source,js]
+--------------------------------------------------
+GET /cars/transactions/_search?search_type=count
+{
+    "aggs" : {
+        "colors" : {
+            "terms" : {
+                "field" : "color",
+                "order": {
+                    "stats.variance" : "asc" <1>
+                }
+            },
+            "aggs": {
+                "stats": {
+                    "extended_stats": {"field": "price"}
+                }
+            }
+        }
+    }
+}
+--------------------------------------------------
+// SENSE: 300_Aggregations/50_sorting_ordering.json
+<1> Using dot notation, we can sort on the metric we are interested in
+
+In this example we are sorting on the variance of each bucket, so that colors
+with the least variance in price will appear before those that have more variance.
+
+==== Sorting based on "deep" metrics
+
+In the prior examples, the metric was a direct child of the bucket. An average
+price was calculated for each term. It is possible to sort on "deeper" metrics,
+which are grandchildren or great-grandchildren of the bucket...with some limitations.
+
+You can define a path to a deeper, nested metric using the angle bracket (`>`), like
+so: `my_bucket>another_bucket>metric`
+
+The caveat is that each nested bucket in the path must be a "single value" bucket.
+A `filter` bucket produces a single bucket: all documents that match the
+filtering criteria. Multi-value buckets (such as `terms`) generate many
+dynamic buckets, which makes it impossible to specify a deterministic path.
+
+Currently there are only two single-value buckets: `filter` and `global`. As
+a quick example, let's build a histogram of car prices, but order the buckets
+by the variance in price of red and green (but not blue) cars in each price range.
+
+[source,js]
+--------------------------------------------------
+GET /cars/transactions/_search?search_type=count
+{
+    "aggs" : {
+        "colors" : {
+            "histogram" : {
+                "field" : "price",
+                "interval": 20000,
+                "order": {
+                    "red_green_cars>stats.variance" : "asc" <1>
+                }
+            },
+            "aggs": {
+                "red_green_cars": {
+                    "filter": { "terms": {"color": ["red", "green"]}}, <2>
+                    "aggs": {
+                        "stats": {"extended_stats": {"field" : "price"}} <3>
+                    }
+                }
+            }
+        }
+    }
+}
+--------------------------------------------------
+// SENSE: 300_Aggregations/50_sorting_ordering.json
+<1> Sort the buckets generated by the histogram according to the variance of a nested metric
+<2> Because we are using a single-value `filter`, we can use nested sorting
+<3> Sort on the stats generated by this metric
+
+In this example, you can see that we are accessing a nested metric. The `stats`
+metric is a child of `red_green_cars`, which is in turn a child of `colors`. To
+sort on that metric, we define the path as `"red_green_cars>stats.variance"`.
+This is allowed because the `filter` bucket is a single-valued bucket.
+
+
+
diff --git a/303_Making_Graphs.asciidoc b/303_Making_Graphs.asciidoc
index 8f355e77b..1152f7b97 100644
--- a/303_Making_Graphs.asciidoc
+++ b/303_Making_Graphs.asciidoc
@@ -6,4 +6,6 @@ include::300_Aggregations/35_date_histogram.asciidoc[]
 
 include::300_Aggregations/40_scope.asciidoc[]
 
-include::300_Aggregations/45_filtering.asciidoc[]
\ No newline at end of file
+include::300_Aggregations/45_filtering.asciidoc[]
+
+include::300_Aggregations/50_sorting_ordering.asciidoc[]
\ No newline at end of file