diff --git a/03_Aggregations.asciidoc b/03_Aggregations.asciidoc index 950cdb9b1..a31f4c75c 100644 --- a/03_Aggregations.asciidoc +++ b/03_Aggregations.asciidoc @@ -33,9 +33,8 @@ _near real-time_, just like search. This is extremely powerful for reporting and dashboards. Instead of performing _rollups_ of your data (_that crusty Hadoop job that takes a week to run_), you can visualize your data in real time, allowing you to respond immediately. - -// Perhaps mention "not precalculated, out of date, and irrelevant"? -// Perhaps "aggs are calculated in the context of the user's search, so you're not showing them that you have 10 4 star hotels on your site, but that you have 10 4 star hotels that *match their criteria*". +Your report changes as your data changes, rather than being pre-calculated, out of +date and irrelevant. Finally, aggregations operate alongside search requests.((("aggregations", "operating alongside search requests"))) This means you can both search/filter documents _and_ perform analytics at the same time, on the diff --git a/300_Aggregations/10_facets.asciidoc b/300_Aggregations/10_facets.asciidoc deleted file mode 100644 index 9c7b8ff66..000000000 --- a/300_Aggregations/10_facets.asciidoc +++ /dev/null @@ -1,13 +0,0 @@ - -=== What about Facets? - -If you've used Elasticsearch in ((("aggregations", "facets and")))((("facets")))the past, you are probably aware of _facets_. -You can think of Aggregations as "facets on steroids". Everything you can do -with facets, you can do with aggregations. - -But there are plenty of operations that are possible in aggregations which are -simply impossible with facets. - -Facets have not been officially depreciated yet, but you can expect that to -happen eventually. We recommend migrating your facets over to aggregations when -you get the chance, and starting all new projects with aggregations instead of facets. 
\ No newline at end of file diff --git a/300_Aggregations/20_basic_example.asciidoc b/300_Aggregations/20_basic_example.asciidoc index 5d281c01a..122a371b6 100644 --- a/300_Aggregations/20_basic_example.asciidoc +++ b/300_Aggregations/20_basic_example.asciidoc @@ -6,6 +6,13 @@ and their syntax,((("aggregations", "basic example", id="ix_basicex"))) but aggr Once you learn how to think about aggregations, and how to nest them appropriately, the syntax is fairly trivial. +[NOTE] +========================= +A complete list of aggregation buckets and metrics can be found at the http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations.html[online +reference documentation]. We'll cover many of them in this chapter, but glance +over it after finishing so you are familiar with the full range of capabilities. +========================= + So let's just dive in and start with an example. We are going to build some aggregations that might be useful to a car dealer. Our data will be about car transactions: the car model, manufacturer, sale price, when it sold, and more. @@ -40,12 +47,12 @@ using a simple aggregation. We will do this using a `terms` bucket: [source,js] -------------------------------------------------- -GET /cars/transactions/_search?search_type=count <1> +GET /cars/transactions/_search?search_type=count { - "aggs" : { <2> - "colors" : { <3> + "aggs" : { <1> + "colors" : { <2> "terms" : { - "field" : "color" <4> + "field" : "color" <3> } } } @@ -53,19 +60,22 @@ GET /cars/transactions/_search?search_type=count <1> -------------------------------------------------- // SENSE: 300_Aggregations/20_basic_example.json -// Add the search_type=count thing as a sidebar, so it doesn't get in the way -<1> Because we don't care about search results, we are going to use the `count` -<>, which((("count search type"))) will be faster. 
-<2> Aggregations are placed under the ((("aggs parameter")))top-level `aggs` parameter (the longer `aggregations` +<1> Aggregations are placed under the ((("aggs parameter")))top-level `aggs` parameter (the longer `aggregations` will also work if you prefer that). -<3> We then name the aggregation whatever we want: `colors`, in this example -<4> Finally, we define a single bucket of type `terms`. +<2> We then name the aggregation whatever we want: `colors`, in this example. +<3> Finally, we define a single bucket of type `terms`. Aggregations are executed in the context of search results,((("searching", "aggregations executed in context of search results"))) which means it is just another top-level parameter in a search request (for example, using the `/_search` endpoint). Aggregations can be paired with queries, but we'll tackle that later in <<_scoping_aggregations>>. +[NOTE] +========================= +You'll notice that we used the `count` <>.((("count search type"))) +Because we don't care about search results -- just the aggregation totals -- the +`count` search_type will be faster because it omits the fetch phase. +========================= Next we define a name for our aggregation. Naming is up to you; the response will be labeled with the name you provide so that your application diff --git a/300_Aggregations/21_add_metric.asciidoc b/300_Aggregations/21_add_metric.asciidoc index 0c70f0eb5..06ac44056 100644 --- a/300_Aggregations/21_add_metric.asciidoc +++ b/300_Aggregations/21_add_metric.asciidoc @@ -82,7 +82,6 @@ and what field we want the average to be calculated on (`price`): -------------------------------------------------- <1> New `avg_price` element in response -// Would love to have a graph under each example showing how the data can be displayed (later, i know) Although the response has changed minimally, the data we get out of it has grown substantially.((("avg_price metric (example)"))) Before, we knew there were four red cars.
Now we know that the average price of red cars is $32,500. This is something that you can plug directly diff --git a/300_Aggregations/28_bucket_metric_list.asciidoc b/300_Aggregations/28_bucket_metric_list.asciidoc deleted file mode 100644 index f4320b5db..000000000 --- a/300_Aggregations/28_bucket_metric_list.asciidoc +++ /dev/null @@ -1,50 +0,0 @@ -// I'd limit this list to the metrics and rely on the obvious. You don't need to explain what min/max/avg etc are. Then say that we'll discusss these more interesting metrics in later chapters: cardinality, percentiles, significant terms. The buckets I'd mention under the relevant section, eg Histo & Range, etc - -== Available Buckets and Metrics - -Elasticsearch has various buckets and metrics.((("buckets", "available types of, reference list")))((("aggregations", "available buckets and metrics"))) The reference documentation -does a great job describing the parameters and how they affect -the component. Instead of re-describing them here, we are simply going to -link to the reference docs and provide a brief description. 
Skim the list -so you know what is available, and check the reference docs when you need -exact parameters.((("metrics", "available types of, reference list"))) - -[float] -=== Buckets - - - See http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_scoping_aggregations.html#_global_bucket[Global]: includes all documents in your index - - See http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_filter_bucket.html#_filter_bucket[Filter]: only includes documents that match - the filter - - See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-missing-aggregation.html[Missing]: all documents which _do not_ have - a particular field - - See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html[Terms]: generates a new bucket for each unique term - - See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-range-aggregation.html[Range]: creates arbitrary ranges which documents - fall into - - See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-daterange-aggregation.html[Date Range]: similar to Range, but calendar - aware - - See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-iprange-aggregation.html[IPV4 Range]: similar to Range, but can handle "IP logic" like CIDR masks, etc - - See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-geodistance-aggregation.html[Geo Distance]: similar to Range, but operates on - geo points - - See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-histogram-aggregation.html[Histogram]: equal-width, dynamic ranges - - See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-datehistogram-aggregation.html[Date Histogram]: similar 
to Histogram, but - calendar aware - - See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-nested-aggregation.html[Nested]: a special bucket for working with - nested documents (see <>) - - See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-geohashgrid-aggregation.html[Geohash Grid]: partitions documents according to - what geohash grid they fall into (see <>) - - See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-top-hits-aggregation.html[TopHits]: Return the top search results grouped by the value of a field (see <>) - -[float] -=== Metrics - - - Individual statistics: See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-min-aggregation.html[Min], See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-max-aggregation.html[Max], See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-avg-aggregation.html[Avg], See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-sum-aggregation.html[Sum] - - See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-stats-aggregation.html[Stats]: calculates min/mean/max/sum/count of documents in bucket - - See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-extendedstats-aggregation.html[Extended Stats]: Same as stats, except it also includes variance, std deviation, sum of squares - - See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-valuecount-aggregation.html[Value Count]: calculates the number of values, which may - be different from the number of documents (for example, multi-valued fields) - - See 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html[Cardinality]: calculates number of distinct/unique values (see <>) - - See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-percentile-aggregation.html[Percentiles]: calculates percentiles/quantiles for - numeric values in a bucket (see <>) - - See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-significantterms-aggregation.html#search-aggregations-bucket-significantterms-aggregation[Significant Terms]: finds "uncommonly common" terms - (see <>) - diff --git a/300_Aggregations/30_histogram.asciidoc b/300_Aggregations/30_histogram.asciidoc index ed3704717..266b0f481 100644 --- a/300_Aggregations/30_histogram.asciidoc +++ b/300_Aggregations/30_histogram.asciidoc @@ -12,9 +12,6 @@ undoubtedly had a few bar charts in it. The histogram works by specifying an int prices, you might specify an interval of 20,000. This would create a new bucket every $20,000. Documents are then sorted into buckets. -// Perhaps "demonstrate" that a car of 28,000 gets dropped into the "20,000" bucket,while a car of 15,000 gets dropped into the "0" bucket -// Delete "Just like the ...." - For our dashboard, we want a bar chart of car sale prices, but we also want to know the top-selling make per price range. This is easily accomplished using a `terms` bucket ((("terms bucket", "nested in a histogram bucket")))((("buckets", "nested in other buckets", "terms bucket nested in histogram bucket")))nested inside the `histogram`: @@ -48,11 +45,10 @@ interval that defines the bucket size. <2> A `terms` bucket is nested inside each price range, which will show us the top make per price range. -// Make the point that the upper limit is exclusive As you can see, our query is built around the `price` aggregation, which contains a `histogram` bucket. 
This bucket requires a numeric field to calculate buckets on, and an interval size. The interval defines how "wide" each bucket -is. An interval of 20000 means we will have the ranges `[0-20000, 20000-40000, ...]`. +is. An interval of 20000 means we will have the ranges `[0-19999, 20000-39999, ...]`. Next, we define a nested bucket inside the histogram. This is a `terms` bucket over the `make` field. There is also a new `size` parameter, which defines the number of terms we want to generate. A `size` of `1` means we want only the top make diff --git a/300_Aggregations/40_scope.asciidoc b/300_Aggregations/40_scope.asciidoc index ec924e670..9e61a0c76 100644 --- a/300_Aggregations/40_scope.asciidoc +++ b/300_Aggregations/40_scope.asciidoc @@ -137,8 +137,6 @@ by adding a search bar.((("dashboards", "adding a search bar"))) This allows th of the graphs (which are powered by aggregations, and thus scoped to the query) update in real time. Try that with Hadoop! -// Maybe add two screenshots of a Kibana dashboard that changes considerably - [float] === Global Bucket diff --git a/302_Example_Walkthrough.asciidoc b/302_Example_Walkthrough.asciidoc index 563cb5f96..93a3983e8 100644 --- a/302_Example_Walkthrough.asciidoc +++ b/302_Example_Walkthrough.asciidoc @@ -5,6 +5,4 @@ include::300_Aggregations/21_add_metric.asciidoc[] include::300_Aggregations/22_nested_bucket.asciidoc[] -include::300_Aggregations/23_extra_metrics.asciidoc[] - -include::300_Aggregations/28_bucket_metric_list.asciidoc[] \ No newline at end of file +include::300_Aggregations/23_extra_metrics.asciidoc[] \ No newline at end of file