
Agg comment and file cleanup
polyfractal authored and clintongormley committed Jan 5, 2015
1 parent ead31c9 commit 1731b70
Showing 8 changed files with 24 additions and 87 deletions.
5 changes: 2 additions & 3 deletions 03_Aggregations.asciidoc
@@ -33,9 +33,8 @@ _near real-time_, just like search.
This is extremely powerful for reporting and dashboards. Instead of performing
_rollups_ of your data (_that crusty Hadoop job that takes a week to run_),
you can visualize your data in real time, allowing you to respond immediately.

-// Perhaps mention "not precalculated, out of date, and irrelevant"?
-// Perhaps "aggs are calculated in the context of the user's search, so you're not showing them that you have 10 4 star hotels on your site, but that you have 10 4 star hotels that *match their criteria*".
+Your report changes as your data changes, rather than being pre-calculated, out of
+date and irrelevant.

Finally, aggregations operate alongside search requests.((("aggregations", "operating alongside search requests"))) This means you can
both search/filter documents _and_ perform analytics at the same time, on the
13 changes: 0 additions & 13 deletions 300_Aggregations/10_facets.asciidoc

This file was deleted.

30 changes: 20 additions & 10 deletions 300_Aggregations/20_basic_example.asciidoc
@@ -6,6 +6,13 @@ and their syntax,((("aggregations", "basic example", id="ix_basicex"))) but aggr
Once you learn how to think about aggregations, and how to nest them appropriately,
the syntax is fairly trivial.

+[NOTE]
+=========================
+A complete list of aggregation buckets and metrics can be found at the http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations.html[online
+reference documentation]. We'll cover many of them in this chapter, but glance
+over it after finishing so you are familiar with the full range of capabilities.
+=========================

So let's just dive in and start with an example. We are going to build some
aggregations that might be useful to a car dealer. Our data will be about car
transactions: the car model, manufacturer, sale price, when it sold, and more.
@@ -40,32 +47,35 @@ using a simple aggregation. We will do this using a `terms` bucket:

[source,js]
--------------------------------------------------
-GET /cars/transactions/_search?search_type=count <1>
+GET /cars/transactions/_search?search_type=count
{
-    "aggs" : { <2>
-        "colors" : { <3>
+    "aggs" : { <1>
+        "colors" : { <2>
            "terms" : {
-                "field" : "color" <4>
+                "field" : "color" <3>
            }
        }
    }
}
--------------------------------------------------
// SENSE: 300_Aggregations/20_basic_example.json

-// Add the search_type=count thing as a sidebar, so it doesn't get in the way
-<1> Because we don't care about search results, we are going to use the `count`
-<<search-type,search_type>>, which((("count search type"))) will be faster.
-<2> Aggregations are placed under the ((("aggs parameter")))top-level `aggs` parameter (the longer `aggregations`
+<1> Aggregations are placed under the ((("aggs parameter")))top-level `aggs` parameter (the longer `aggregations`
will also work if you prefer that).
-<3> We then name the aggregation whatever we want: `colors`, in this example
-<4> Finally, we define a single bucket of type `terms`.
+<2> We then name the aggregation whatever we want: `colors`, in this example
+<3> Finally, we define a single bucket of type `terms`.

Aggregations are executed in the context of search results,((("searching", "aggregations executed in context of search results"))) which means it is
just another top-level parameter in a search request (for example, using the `/_search`
endpoint). Aggregations can be paired with queries, but we'll tackle that later
in <<_scoping_aggregations>>.
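To build intuition for what the `terms` bucket in the request above computes, here is a client-side sketch in Python. It is illustrative only, not from the commit, and the documents are invented: a `terms` bucket is conceptually a tally of the unique values of a field, one bucket per value, each with a document count.

```python
from collections import Counter

# Hypothetical documents standing in for the indexed car transactions
docs = [
    {"color": "red"}, {"color": "blue"}, {"color": "green"},
    {"color": "blue"}, {"color": "red"}, {"color": "red"},
]

# A terms bucket over "color" is conceptually a per-value tally:
# each unique color becomes a bucket with a count of matching documents.
colors = Counter(doc["color"] for doc in docs)

for value, count in colors.most_common():
    print(value, count)
```

Elasticsearch computes this distributed across shards, of course; the sketch only mirrors the shape of the result (one bucket per unique term, each carrying a document count).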

+[NOTE]
+=========================
+You'll notice that we used the `count` <<search-type,search_type>>.((("count search type")))
+Because we don't care about search results -- just the aggregation totals -- the
+`count` search_type will be faster because it omits the fetch phase.
+=========================

Next we define a name for our aggregation. Naming is up to you;
the response will be labeled with the name you provide so that your application
1 change: 0 additions & 1 deletion 300_Aggregations/21_add_metric.asciidoc
@@ -82,7 +82,6 @@ and what field we want the average to be calculated on (`price`):
--------------------------------------------------
<1> New `avg_price` element in response

-// Would love to have a graph under each example showing how the data can be displayed (later, i know)
Although the response has changed minimally, the data we get out of it has grown
substantially.((("avg_price metric (example)"))) Before, we knew there were four red cars. Now we know that the
average price of red cars is $32,500. This is something that you can plug directly
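The arithmetic behind the `avg` metric is simply a mean over the documents in the enclosing bucket. A minimal sketch; the four prices are hypothetical, chosen only so the average matches the $32,500 figure in the text:

```python
# Four hypothetical red-car sale prices: 130000 / 4 = 32500
red_prices = [10000, 20000, 30000, 70000]

# The avg metric is the arithmetic mean of the field within the bucket
avg_price = sum(red_prices) / len(red_prices)
print(avg_price)  # 32500.0
```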
50 changes: 0 additions & 50 deletions 300_Aggregations/28_bucket_metric_list.asciidoc

This file was deleted.

6 changes: 1 addition & 5 deletions 300_Aggregations/30_histogram.asciidoc
@@ -12,9 +12,6 @@ undoubtedly had a few bar charts in it. The histogram works by specifying an interval. If we were histogramming sale
prices, you might specify an interval of 20,000. This would create a new bucket
every $20,000. Documents are then sorted into buckets.

-// Perhaps "demonstrate" that a car of 28,000 gets dropped into the "20,000" bucket, while a car of 15,000 gets dropped into the "0" bucket
-// Delete "Just like the ...."

For our dashboard, we want a bar chart of car sale prices, but we
also want to know the top-selling make per price range. This is easily accomplished
using a `terms` bucket ((("terms bucket", "nested in a histogram bucket")))((("buckets", "nested in other buckets", "terms bucket nested in histogram bucket")))nested inside the `histogram`:
@@ -48,11 +45,10 @@ interval that defines the bucket size.
<2> A `terms` bucket is nested inside each price range, which will show us the
top make per price range.

-// Make the point that the upper limit is exclusive
As you can see, our query is built around the `price` aggregation, which contains
a `histogram` bucket. This bucket requires a numeric field to calculate
buckets on, and an interval size. The interval defines how "wide" each bucket
-is. An interval of 20000 means we will have the ranges `[0-20000, 20000-40000, ...]`.
+is. An interval of 20000 means we will have the ranges `[0-19999, 20000-39999, ...]`.
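The deleted review comment above asked for a demonstration of this bucketing rule, so here is one as a Python sketch (the transactions are invented): a document's bucket key is its value rounded down to the nearest multiple of the interval, and a size-1 terms tally nested inside each bucket yields the top make per price range.

```python
from collections import Counter, defaultdict

# Hypothetical car transactions
sales = [
    {"price": 15000, "make": "honda"},
    {"price": 28000, "make": "ford"},
    {"price": 25000, "make": "ford"},
    {"price": 80000, "make": "bmw"},
]

interval = 20000
buckets = defaultdict(Counter)

for sale in sales:
    # Histogram key: price rounded down to a multiple of the interval,
    # so 28000 lands in the 20000 bucket and 15000 lands in the 0 bucket
    # (each bucket's upper limit is exclusive).
    key = (sale["price"] // interval) * interval
    buckets[key][sale["make"]] += 1

# Nested terms bucket with size=1: the single most common make per range
for key in sorted(buckets):
    make, count = buckets[key].most_common(1)[0]
    print(key, make, count)
```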

Next, we define a nested bucket inside the histogram. This is a `terms` bucket
over the `make` field. There is also a new `size` parameter, which defines the number of terms we want to generate. A `size` of `1` means we want only the top make
2 changes: 0 additions & 2 deletions 300_Aggregations/40_scope.asciidoc
@@ -137,8 +137,6 @@ by adding a search bar.((("dashboards", "adding a search bar"))) This allows th
of the graphs (which are powered by aggregations, and thus scoped to the query)
update in real time. Try that with Hadoop!

-//<TODO> Maybe add two screenshots of a Kibana dashboard that changes considerably

[float]
=== Global Bucket

4 changes: 1 addition & 3 deletions 302_Example_Walkthrough.asciidoc
@@ -5,6 +5,4 @@ include::300_Aggregations/21_add_metric.asciidoc[]

include::300_Aggregations/22_nested_bucket.asciidoc[]

-include::300_Aggregations/23_extra_metrics.asciidoc[]
-
-include::300_Aggregations/28_bucket_metric_list.asciidoc[]
+include::300_Aggregations/23_extra_metrics.asciidoc[]
