diff --git a/07_Admin.asciidoc b/07_Admin.asciidoc new file mode 100644 index 000000000..d56ef8ed7 --- /dev/null +++ b/07_Admin.asciidoc @@ -0,0 +1,21 @@ +[[administration]] += Administration, Monitoring and Deployment + +[partintro] +-- +The majority of this book has been aimed at building applications using Elasticsearch +as the backend. This section is a little different. Here, you will learn +how to manage Elasticsearch itself. Elasticsearch is a very complex piece of +software, with many moving parts. There are a large number of APIs designed +to help you manage your Elasticsearch deployment. + +In this chapter, we will cover three main topics: + +- Monitoring your cluster's vital statistics, what behaviors are normal and which +should be cause for alarm, and how to interpret various stats provided by Elasticsearch +- Deploying your cluster to production, including best-practices and important +configuration which should (or should not!) be changed +- Post-deployment logistics, such as how to perform a rolling restart or backup +your cluster +-- + diff --git a/300_Aggregations/100_circuit_breaker_fd_settings.asciidoc.orig b/300_Aggregations/100_circuit_breaker_fd_settings.asciidoc.orig new file mode 100644 index 000000000..6fa5b1115 --- /dev/null +++ b/300_Aggregations/100_circuit_breaker_fd_settings.asciidoc.orig @@ -0,0 +1,254 @@ + +=== Limiting Memory Usage + +In order for aggregations (or any operation that requires access to field +values) to be fast, access to fielddata must be fast, which is why it is +loaded into memory. But loading too much data into memory will cause slow +garbage collections as the JVM tries to find extra space in the heap, or +possibly even an OutOfMemory exception. + +It may surprise you to find that Elasticsearch does not load into fielddata +just the values for the documents which match your query. It loads the values +for *all documents in your index*, even documents with a different `_type`! + +The logic is: if you need access to documents X, Y, and Z for this query, you +will probably need access to other documents in the next query. It is cheaper +to load all values once, and to *keep them in memory*, than to have to scan +the inverted index on every request. + +The JVM heap is a limited resource which should be used wisely. A number of +mechanisms exist to limit the impact of fielddata on heap usage. These limits +are important because abuse of the heap will cause node instability (thanks to +slow garbage collections) or even node death (with an OutOfMemory exception). + +.Choosing a heap size +****************************************** + +There are two rules to apply when setting the Elasticsearch heap size, with +the `$ES_HEAP_SIZE` environment variable: + +* *No more than 50% of available RAM* ++ +Lucene makes good use of the filesystem caches, which are managed by the +kernel. Without enough filesystem cache space, performance will suffer. + +* *No more than 32GB* ++ +If the heap is less than 32GB, the JVM can use compressed pointers, which +saves a lot of memory: 4 bytes per pointer instead of 8 bytes. ++ +Increasing the heap from 32GB to 34GB would mean that you have much *less* +memory available, because all pointers are taking double the space. Also, +with bigger heaps, garbage collection becomes more costly and can result in +node instability. + +This limit has a direct impact on much memory can be devoted to fielddata. 
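+
+As a hedged illustration, here is how a node might be started with a 10gb heap
+using that environment variable (the value is purely an example, not a sizing
+recommendation for your hardware):
+
+[source,bash]
+----
+export ES_HEAP_SIZE=10g <1>
+./bin/elasticsearch
+----
+<1> Setting the variable gives the JVM identical minimum and maximum heap
+    sizes, so the heap never needs to be resized while the node is running.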
+ +****************************************** + +[[fielddata-size]] +==== Fielddata size + +The `indices.fielddata.cache.size` controls how much heap space is allocated +to fielddata. When you run a query that requires access to new field values, +it will load the values into memory and then try to add them to fielddata. If +the resulting fielddata size would exceed the specified `size`, then other +values would be evicted in order to make space. + +By default, this setting is *unbounded* -- Elasticsearch will never evict data +from fielddata. + +This default was chosen deliberately: fielddata is not a transient cache. It +is an in-memory data structure that must be accessible for fast execution, and +it is expensive to build. If you have to reload data for every request, +performance is going to be awful. + +A bounded size forces the data structure to evict data. We will look at when +to set this value below, but first a warning: + +[WARNING] +======================================= +*This setting is a safeguard, not a solution for insufficient memory.* + +If you don't have enough memory to keep your fielddata resident in memory, +Elasticsearch will constantly have to reload data from disk, and evict other +data to make space. Evictions cause heavy disk I/O and generate a large +amount of "garbage" in memory, which must be garbage collected later on. + +======================================= + +Imagine that you are indexing logs, using a new index every day. Normally you +are only interested in data from the last day or two. While you keep older +indices around, you seldom need to query them. However, with the default +settings, the fielddata from the old indices is never evicted! fielddata +will just keep on growing until you trip the fielddata circuit breaker -- see +<> below -- which will prevent you from loading any more +fielddata. + +At that point you're stuck. While you can still run queries which access +fielddata from the old indices, you can't load any new values. Instead, we +should evict old values to make space for the new values. + +To prevent this scenario, place an upper limit on the fielddata by adding this +setting to the `config/elasticsearch.yml` file: + +[source,yaml] +----------------------------- +indices.fielddata.cache.size: 40% <1> +----------------------------- +<1> Can be set to a percentage of the heap size, or a concrete + value like `5gb`. + +With this setting in place, the least recently used fielddata will be evicted +to make space for newly loaded data. + +[WARNING] +==== +There is another setting which you may see online: `indices.fielddata.cache.expire` + +We beg that you *never* use this setting! It will likely be deprecated in the +future. + +This setting tells Elasticsearch to evict values from fielddata if they are older +than `expire`, whether the values are being used or not. + +This is *terrible* for performance. Evictions are costly, and this effectively +_schedules_ evictions on purpose, for no real gain. + +There isn't a good reason to use this setting; we literally cannot theory-craft +a hypothetically useful situation. It only exists for backwards compatibility at +the moment. We only mention the setting in this book since, sadly, it has been +recommended in various articles on the internet as a good ``performance tip''. + +It is not. Never use it! 
+==== + +<<<<<<< HEAD +[[monitoring-fielddata]] +==== Monitoring fielddata + +It is important to keep a close watch on how much memory is being used by +fielddata, and whether any data is being evicted. High eviction counts can +indicate a serious resource issue and a reason for poor performance. + +Fielddata usage can be monitored: + +* per-index using the {ref}indices-stats.html[`indices-stats` API]: ++ +[source,json] +------------------------------- +GET /_stats/fielddata?fields=* +------------------------------- + +* per-node using the {ref}cluster-nodes-stats.html[`nodes-stats` API]: ++ +[source,json] +------------------------------- +GET /_nodes/stats/indices/fielddata?fields=* +------------------------------- + +* or even per-index per-node: ++ +[source,json] +------------------------------- +GET /_nodes/stats/indices/fielddata?level=indices&fields=* +------------------------------- + +By setting `?fields=*` the memory usage is broken down for each field. + + +[[circuit-breaker]] +======= +[[circuit_breaker]] +>>>>>>> manage_monitor +==== Circuit Breaker + +An astute reader might have noticed a problem with the fielddata size settings. +fielddata size is checked _after_ the data is loaded. What happens if a query +arrives which tries to load more into fielddata than available memory? The +answer is ugly: you would get an OutOfMemoryException. + +Elasticsearch includes a _fielddata circuit breaker_ which is designed to deal +with this situation. The circuit breaker estimates the memory requirements of +a query by introspecting the fields involved (their type, cardinality, size, +etc). It then checks to see whether loading the required fielddata would push +the total fielddata size over the configured percentage of the heap. + +If the estimated query size is larger than the limit, the circuit breaker is +"tripped" and the query will be aborted and return an exception. This happens +*before* data is loaded, which means that you won't hit an +OutOfMemoryException. + +*************************************** + +Elasticsearch has a family of circuit breakers, all of which work to ensure +that memory limits are not exceeded: + +`indices.breaker.fielddata.limit`:: + + The `fielddata` circuit breaker limits the size of fielddata to 60% of the + heap, by default. + +`indices.breaker.request.limit`:: + + The `request` circuit breaker estimates the size of structures required to + complete other parts of a request, such as creating aggregation buckets, + and limits them to 40% of the heap, by default. + +`indices.breaker.total.limit`:: + + The `total` circuit breaker wraps the `request` and `fielddata` circuit + breakers to ensure that the combination of the two doesn't use more than + 70% of the heap by default. + +*************************************** + +The circuit breaker limits can be specified in the `config/elasticsearch.yml` +file, or can be updated dynamically on a live cluster: + +[source,js] +---- +PUT /_cluster/settings +{ + "persistent" : { + "indices.breaker.fielddata.limit" : 40% <1> + } +} +---- +<1> The limit is a percentage of the heap. + + +It is best to configure the circuit breaker with a relatively conservative +value. Remember that fielddata needs to share the heap with the `request` +circuit breaker, the indexing memory buffer, the filter cache, Lucene data +structures for open indices, and various other transient data structures. For +this reason it defaults to a fairly conservative 60%. 
Overly optimistic +settings can cause potential OOM exceptions, which will take down an entire +node. + +On the other hand, an overly conservative value will simply return a query +exception which can be handled by your application. An exception is better +than a crash. These exceptions should also encourage you to reassess your +query: why *does* a single query need more than 60% of the heap? + +.Circuit breaker and Fielddata size +****************************************** + +In <> we spoke about adding a limit to the size of fielddata, +to ensure that old unused fielddata can be evicted. The relationship between +`indices.fielddata.cache.size` and `indices.breaker.fielddata.limit` is an +important one. If the circuit breaker limit is lower than the cache size, +then no data will ever be evicted. In order for it to work properly, the +circuit breaker limit *must* be higher than the cache size. +****************************************** + +It is important to note that the circuit breaker compares estimated query size +against the total heap size, *not* against the actual amount of heap memory +used. This is done for a variety of technical reasons (e.g. the heap may look +"full" but is actually just garbage waiting to be collected, which is hard to +estimate properly). But as the end-user, this means the setting needs to be +conservative, since it is comparing against total heap, not ``free'' heap. + + + + diff --git a/300_Aggregations/110_docvalues.asciidoc.orig b/300_Aggregations/110_docvalues.asciidoc.orig new file mode 100644 index 000000000..4059b638f --- /dev/null +++ b/300_Aggregations/110_docvalues.asciidoc.orig @@ -0,0 +1,85 @@ +<<<<<<< HEAD +[[doc-values]] +======= +[[doc_values]] +>>>>>>> manage_monitor +=== Doc Values + +In-memory fielddata is limited by the size of your heap. While this is a +problem that can be solved by scaling horizontally -- you can always add more +nodes -- you will find that heavy use of aggregations and sorting can exhaust +your heap space while other resources on the node are under-utilised. + +While fielddata defaults to loading values into memory on-the-fly, this is not +the only option. It can also be written to disk at index time in a way that +provides all of the functionality of in-memory fielddata, but without the +heap memory usage. This alternative format is called _doc values_. + +Doc values were added to Elasticsearch in version 1.0.0 but, until recently, +they were much slower than in-memory fielddata. By benchmarking and profiling +performance, various bottlenecks have been identified -- in both Elasticsearch +and Lucene -- and removed. + +Doc values are now only about 10 - 25% slower than in-memory fielddata, and +come with two major advantages: + + * They live on disk instead of in heap memory. This allows you to work with + quantities of fielddata that would normally be too large to fit into + memory. In fact, your heap space (`$ES_HEAP_SIZE`) can now be set to a + smaller size, which improves the speed of garbage collection and, + consequently, node stability. + + * Doc values are built at index time, not at search time. While in-memory + fielddata has to be built on-the-fly at search time by uninverting the + inverted index, doc values are pre-built and much faster to initialize. + +The trade-off is a larger index size and slightly slower fielddata access. Doc +values are remarkably efficient, so for many queries you might not even notice +the slightly slower speed. 
Combine that with faster garbage collections and +improved initialization times and you may notice a net gain. + +The more filesystem cache space that you have available, the better doc values +will perform. If the files holding the doc values are resident in the file +system cache, then accessing the files is almost equivalent to reading from +RAM. And the filesystem cache is managed by the kernel instead of the JVM. + +==== Enabling Doc Values + +Doc values can be enabled for numeric, date, boolean, binary, and geo-point +fields, and for `not_analyzed` string fields. They do not currently work with +`analyzed` string fields. Doc values are enabled per-field in the field +mapping, which means that you can combine in-memory fielddata with doc values. + +[source,js] +---- +PUT /music/_mapping/song +{ + "properties" : { + "tag": { + "type": "string", + "index" : "not_analyzed", + "doc_values": true <1> + } + } +} +---- +<1> Setting `doc_values` to `true` at field creation time is all + that is required to use disk-based fielddata instead of in-memory + fielddata. + +That's it! Queries, aggregations, sorting, and scripts will function as +normal... they'll just be using doc values now. There is no other +configuration necessary. + +.When to use doc values +****************************************** + +Use doc values freely. The more you use them, the less stress you place on +the heap. It is possible that doc values will become the default format in +the near future. + +****************************************** + + + + diff --git a/300_Aggregations/28_bucket_metric_list.asciidoc.orig b/300_Aggregations/28_bucket_metric_list.asciidoc.orig new file mode 100644 index 000000000..6fbf9179e --- /dev/null +++ b/300_Aggregations/28_bucket_metric_list.asciidoc.orig @@ -0,0 +1,58 @@ +// I'd limit this list to the metrics and rely on the obvious. You don't need to explain what min/max/avg etc are. Then say that we'll discusss these more interesting metrics in later chapters: cardinality, percentiles, significant terms. The buckets I'd mention under the relevant section, eg Histo & Range, etc + +== Available Buckets and Metrics + +There are a number of different buckets and metrics. The reference documentation +does a great job describing the various parameters and how they affect +the component. Instead of re-describing them here, we are simply going to +link to the reference docs and provide a brief description. Skim the list +so that you know what is available, and check the reference docs when you need +exact parameters. 
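+
+As a quick refresher on how the two halves fit together, here is a minimal
+sketch that nests a metric inside a bucket, reusing the cars dataset from the
+earlier aggregation chapters (any bucket below can wrap any metric in the same
+way):
+
+[source,js]
+----
+GET /cars/transactions/_search?search_type=count
+{
+  "aggs": {
+    "colors": {
+      "terms": { "field": "color" }, <1>
+      "aggs": {
+        "avg_price": { "avg": { "field": "price" } } <2>
+      }
+    }
+  }
+}
+----
+<1> The `terms` bucket generates one bucket per unique color.
+<2> The `avg` metric is calculated over the documents inside each of those
+    buckets.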
+ +[float] +=== Buckets + + - {ref}search-aggregations-bucket-global-aggregation.html[Global]: includes all documents in your index + - {ref}search-aggregations-bucket-filter-aggregation.html[Filter]: only includes documents that match + the filter + - {ref}search-aggregations-bucket-missing-aggregation.html[Missing]: all documents which _do not_ have + a particular field + - {ref}search-aggregations-bucket-terms-aggregation.html[Terms]: generates a new bucket for each unique term + - {ref}search-aggregations-bucket-range-aggregation.html[Range]: creates arbitrary ranges which documents + fall into + - {ref}search-aggregations-bucket-daterange-aggregation.html[Date Range]: similar to Range, but calendar + aware + - {ref}search-aggregations-bucket-iprange-aggregation.html[IPV4 Range]: similar to Range, but can handle "IP logic" like CIDR masks, etc + - {ref}search-aggregations-bucket-geodistance-aggregation.html[Geo Distance]: similar to Range, but operates on + geo points + - {ref}search-aggregations-bucket-histogram-aggregation.html[Histogram]: equal-width, dynamic ranges + - {ref}search-aggregations-bucket-datehistogram-aggregation.html[Date Histogram]: similar to Histogram, but + calendar aware + - {ref}search-aggregations-bucket-nested-aggregation.html[Nested]: a special bucket for working with + nested documents (see <>) + - {ref}search-aggregations-bucket-geohashgrid-aggregation.html[Geohash Grid]: partitions documents according to + what geohash grid they fall into (see <>) + - {ref}search-aggregations-metrics-top-hits-aggregation.html[TopHits]: Return the top search results grouped by the value of a field (see <>) + +[float] +=== Metrics + + - Individual statistics: {ref}search-aggregations-metrics-min-aggregation.html[Min], {ref}search-aggregations-metrics-max-aggregation.html[Max], {ref}search-aggregations-metrics-avg-aggregation.html[Avg], {ref}search-aggregations-metrics-sum-aggregation.html[Sum] + - {ref}search-aggregations-metrics-stats-aggregation.html[Stats]: calculates min/mean/max/sum/count of documents in bucket + - {ref}search-aggregations-metrics-extendedstats-aggregation.html[Extended Stats]: Same as stats, except it also includes variance, std deviation, sum of squares + - {ref}search-aggregations-metrics-valuecount-aggregation.html[Value Count]: calculates the number of values, which may + be different from the number of documents (e.g. 
multi-valued fields) +<<<<<<< HEAD + - {ref}search-aggregations-metrics-cardinality-aggregation.html[Cardinality]: calculates number of distinct/unique values (see <>) + - {ref}search-aggregations-metrics-percentile-aggregation.html[Percentiles]: calculates percentiles/quantiles for + numeric values in a bucket (see <>) + - {ref}search-aggregations-bucket-significantterms-aggregation.html[Significant Terms]: finds "uncommonly common" terms + (see <>) +======= + - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html[Cardinality]: calculates number of distinct/unique values (covered in more detail <>) + - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-percentile-aggregation.html[Percentiles]: calculates percentiles/quantiles for + numeric values in a bucket (covered in more detail <>) + - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-significantterms-aggregation.html[Significant Terms]: finds "uncommonly common" terms + (covered in more detail <>) +>>>>>>> manage_monitor + diff --git a/300_Aggregations/30_histogram.asciidoc b/300_Aggregations/30_histogram.asciidoc index fdabe481a..1fde824a6 100644 --- a/300_Aggregations/30_histogram.asciidoc +++ b/300_Aggregations/30_histogram.asciidoc @@ -101,4 +101,51 @@ The response is fairly self-explanatory, but it should be noted that the histogram keys correspond to the lower boundary of the interval. The key `0` means `0-20,000`, the key `20000` means `20,000-40,000`, etc. +Graphically, you could represent the above data in a histogram like this: + +[[barcharts-histo1]] +image::images/300_30_histo1.svg["Histogram of top makes per price range"] + +Of course, you can build bar charts with any aggregation which emits categories +and statistics, not just the `histogram` bucket. Let's build a bar chart of +popular makes, their average price, and then calculate the standard error +to add error bars on our chart. This will make use of the `terms` bucket +and an `extended_stats` metric: + +[source,js] +---- +GET /cars/transactions/_search?search_type=count +{ + "aggs": { + "makes": { + "terms": { + "field": "make", + "size": 10 + }, + "aggs": { + "stats": { + "extended_stats": { + "field": "price" + } + } + } + } + } +} +---- + +This will return a list of makes (sorted by popularity) and a variety of statistics +about each. In particular, we are interested in `stats.avg`, `stats.count`, +and `stats.std_deviation`. Using this information, we can calculate the standard error: + +................................ +std_err = std_deviation / count +................................ + +Which will allow us to build a chart like this: + +[[barcharts-bar1]] +image::images/300_30_bar1.svg["Barchart of average price per make, with error bars"] + + diff --git a/300_Aggregations/35_date_histogram.asciidoc b/300_Aggregations/35_date_histogram.asciidoc index b7f2823c7..2fcf608ee 100644 --- a/300_Aggregations/35_date_histogram.asciidoc +++ b/300_Aggregations/35_date_histogram.asciidoc @@ -172,14 +172,19 @@ to tell Elasticsearch that we want buckets even if they fall _before_ the minimum value or _after_ the maximum value. The `extended_bounds` parameter does just that. Once you add those two settings, -you'll get a response that is easy to plug straight into your graphing libraries. 
+you'll get a response that is easy to plug straight into your graphing libraries +and give you a graph like this: + +[[date-histo-ts1]] +image::images/300_35_ts1.svg["Line chart of cars sold per month"] === Extended Example Just like we've seen a dozen times already, buckets can be nested in buckets for more sophisticated behavior. For illustration, we'll build an aggregation -which shows the average price of the top-selling car each month. - +which shows the total sum of prices for all makes, listed by quarter. Let's also +calculate the sum of prices per individual make per quarter, so we can see +which car type is bringing in the most money to our business: [source,js] -------------------------------------------------- @@ -189,7 +194,7 @@ GET /cars/transactions/_search?search_type=count "sales": { "date_histogram": { "field": "sold", - "interval": "month", + "interval": "quarter", <1> "format": "yyyy-MM-dd", "min_doc_count" : 0, "extended_bounds" : { @@ -198,16 +203,18 @@ GET /cars/transactions/_search?search_type=count } }, "aggs": { - "top_selling": { + "per_make_sum": { "terms": { - "field": "make", - "size": 1 + "field": "make" }, "aggs": { - "avg_price": { - "avg": { "field": "price" } + "sum_price": { + "sum": { "field": "price" } <2> } } + }, + "total_sum": { + "sum": { "field": "price" } <3> } } } @@ -215,62 +222,52 @@ GET /cars/transactions/_search?search_type=count } -------------------------------------------------- // SENSE: 300_Aggregations/35_date_histogram.json +<1> Note that we changed the interval from `month` to `quarter` +<2> Calculate the sum per make +<3> And the total sum of all makes combined together Which returns a (heavily truncated) response: [source,js] -------------------------------------------------- { -... - "aggregations": { - "sales": { - "buckets": [ - { - "key_as_string": "2014-01-01", - "key": 1388534400000, - "doc_count": 1, - "top_selling": { - "buckets": [ - { - "key": "bmw", - "doc_count": 1, - "avg_price": { - "value": 80000 - } - } - ] - } +.... +"aggregations": { + "sales": { + "buckets": [ + { + "key_as_string": "2014-01-01", + "key": 1388534400000, + "doc_count": 2, + "total_sum": { + "value": 105000 }, - { - "key_as_string": "2014-02-01", - "key": 1391212800000, - "doc_count": 1, - "top_selling": { - "buckets": [ - { - "key": "ford", - "doc_count": 1, - "avg_price": { - "value": 25000 - } + "per_make_sum": { + "buckets": [ + { + "key": "bmw", + "doc_count": 1, + "sum_price": { + "value": 80000 } - ] - } - }, - { - "key_as_string": "2014-03-01", - "key": 1393632000000, - "doc_count": 0, - "top_selling": { - "buckets": []<1> - } + }, + { + "key": "ford", + "doc_count": 1, + "sum_price": { + "value": 25000 + } + } + ] } + }, ... } -------------------------------------------------- <1> Empty bucket because no cars were sold in March -As you would expect, we see a list of buckets corresponding to each month, -including months that had no car sales (e.g. March). Each month -then has bucket corresponding to the top selling make, and that -bucket contains a metric which calculates the average price for that month. 
\ No newline at end of file +We can take this response and put it into a graph, showing a line chart for +total sale price, and a bar chart for each individual make (per quarter): + +[[date-histo-ts2]] +image::images/300_35_ts2.svg["Line chart of cars sold per month"] \ No newline at end of file diff --git a/300_Aggregations/65_percentiles.asciidoc b/300_Aggregations/65_percentiles.asciidoc index aad22728d..469ecac3e 100644 --- a/300_Aggregations/65_percentiles.asciidoc +++ b/300_Aggregations/65_percentiles.asciidoc @@ -23,14 +23,14 @@ metric is easily skewed by just a single outlier. This graph visualizes the problem. If you rely on simple metrics like mean or median, you might see a graph that looks like this: [[percentile-mean-median]] -image::images/300_65_percentile1.png["Assessing website latency using mean/median"] +image::images/300_65_percentile1.svg["Assessing website latency using mean/median"] Everything looks fine. There is a slight bump, but nothing to be concerned about. But if we load up the 99th percentile (the value which accounts for the slowest 1% of latencies), we see an entirely different story: [[percentile-mean-median-percentile]] -image::images/300_65_percentile2.png["Assessing website latency using percentiles"] +image::images/300_65_percentile2.svg["Assessing website latency using percentiles"] Woah! At 9:30 AM, the mean is only 75ms. As a system administrator, you wouldn't look at this value twice. Everything normal! But the 99th percentile is telling diff --git a/305_Significant_Terms.asciidoc b/305_Significant_Terms.asciidoc index 037bbea9c..6145776b9 100644 --- a/305_Significant_Terms.asciidoc +++ b/305_Significant_Terms.asciidoc @@ -1,4 +1,4 @@ include::300_Aggregations/70_sigterms_intro.asciidoc[] -include::300_Aggregations/75_sigterms.asciidoc[] \ No newline at end of file +include::300_Aggregations/75_sigterms.asciidoc[] diff --git a/305_Significant_Terms.asciidoc.orig b/305_Significant_Terms.asciidoc.orig new file mode 100644 index 000000000..7591d0194 --- /dev/null +++ b/305_Significant_Terms.asciidoc.orig @@ -0,0 +1,9 @@ +<<<<<<< HEAD +======= +[[sig-terms]] +== Significant Terms +>>>>>>> manage_monitor + +include::300_Aggregations/70_sigterms_intro.asciidoc[] + +include::300_Aggregations/75_sigterms.asciidoc[] \ No newline at end of file diff --git a/500_Cluster_Admin.asciidoc b/500_Cluster_Admin.asciidoc index cd91db951..997313149 100644 --- a/500_Cluster_Admin.asciidoc +++ b/500_Cluster_Admin.asciidoc @@ -1,21 +1,12 @@ [[cluster-admin]] -== Cluster management and monitoring (TODO) +== Monitoring -This chapter discusses how cluster management and monitoring. +include::500_Cluster_Admin/10_intro.asciidoc[] -=== Cluster health -. +include::500_Cluster_Admin/15_marvel.asciidoc[] +include::500_Cluster_Admin/20_health.asciidoc[] -=== Cluster settings -. - - -=== Nodes stats -. - - -=== Nodes info -. - +include::500_Cluster_Admin/30_node_stats.asciidoc[] +include::500_Cluster_Admin/40_other_stats.asciidoc[] diff --git a/500_Cluster_Admin/10_intro.asciidoc b/500_Cluster_Admin/10_intro.asciidoc new file mode 100644 index 000000000..b1143976d --- /dev/null +++ b/500_Cluster_Admin/10_intro.asciidoc @@ -0,0 +1,15 @@ + +Elasticsearch is often deployed as a cluster of nodes. There are a variety of +APIs that let you manage and monitor the cluster itself, rather than interact +with the data stored within the cluster. 
+ +As with most functionality in Elasticsearch, there is an over-arching design goal +that tasks should be performed through an API rather than by modifying static +configuration files. This becomes especially important as your cluster scales. +Even with a provisioning system (puppet, chef, ansible, etc), a single HTTP API call +is often simpler than pushing new configurations to hundreds of physical machines. + +To that end, this chapter will be discussing the various APIs that allow you to +dynamically tweak, tune and configure your cluster. We will also cover a +host of APIs that provide statistics about the cluster itself so that you can +monitor for health and performance. \ No newline at end of file diff --git a/500_Cluster_Admin/15_marvel.asciidoc b/500_Cluster_Admin/15_marvel.asciidoc new file mode 100644 index 000000000..b8123cdb9 --- /dev/null +++ b/500_Cluster_Admin/15_marvel.asciidoc @@ -0,0 +1,29 @@ + +=== Marvel for Monitoring + +At the very beginning of the book (<>) we encouraged you to install +Marvel, a management monitoring tool for Elasticsearch, because it would enable +interactive code samples throughout the book. + +If you didn't install Marvel then, we encourage you to install it now. This +chapter will introduce a large number of APIs that emit an even larger number +of statistics. These stats track everything from heap memory usage and garbage +collection counts to open file descriptors. These statistics are invaluable +for debugging a misbehaving cluster. + +The problem is that these APIs provide a single data point -- the statistic +_right now_. Often you'll want to see historical data too, so that you can +plot a trend. Knowing memory usage at this instant is helpful, but knowing +memory usage _over time_ is much more useful. + +Furthermore, the output of these APIs can get truly hairy as your cluster grows. +Once you have a dozen nodes, let alone a hundred, reading through stacks of JSON +becomes very tedious. + +Marvel periodically polls these APIs and stores the data back in Elasticsearch. +This allows Marvel to query and aggregate the metrics, then provide interactive +graphs in your browser. There are no proprietary statistics that Marvel exposes; +it uses the same stats APIs that are accessible to you. But it does greatly +simplify the collection and graphing of those statistics. + +Marvel is free to use in development, so you should definitely try it out! \ No newline at end of file diff --git a/500_Cluster_Admin/20_health.asciidoc b/500_Cluster_Admin/20_health.asciidoc new file mode 100644 index 000000000..129de27d6 --- /dev/null +++ b/500_Cluster_Admin/20_health.asciidoc @@ -0,0 +1,222 @@ + +=== Cluster Health + +An Elasticsearch cluster may consist of a single node with a single index. Or it +may have a hundred data nodes, three dedicated masters, a few dozen clients nodes +...all operating on a thousand indices (and tends of thousands of shards). + +No matter the scale of the cluster, you'll want a quick way to assess the status +of your cluster. The _Cluster Health_ API fills that role. You can think of it +as a ten-thousand foot view of your cluster. It can reassure you that everything +is alright, or alert you to a problem somewhere in your cluster. + +Let's execute a Health API and see what the response looks like: + +[source,bash] +---- +GET _cluster/health +---- + +Like other APIs in Elasticsearch, Cluster Health will return a JSON response. +This makes it convenient to parse for automation and alerting. 
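+For example, here is one quick way to pull just the status field out of the
+response from a shell script -- a sketch that assumes `curl` and `jq` are
+available and that a node is listening on `localhost:9200`:
+
+[source,bash]
+----
+curl -s 'localhost:9200/_cluster/health' | jq -r '.status' <1>
+----
+<1> Prints `green`, `yellow` or `red`, which is trivial to feed into an
+    alerting check.
+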
The response +contains some critical information about your cluster: + +[source,js] +---- +{ + "cluster_name": "elasticsearch_zach", + "status": "green", + "timed_out": false, + "number_of_nodes": 1, + "number_of_data_nodes": 1, + "active_primary_shards": 10, + "active_shards": 10, + "relocating_shards": 0, + "initializing_shards": 0, + "unassigned_shards": 0 +} +---- + +The most important piece of information in the response is the `"status"` field. +The status may be one of three values: + +- *Green:* all primary and replica shards are allocated. Your cluster is 100% +operational +- *Yellow:* all primary shards are allocated, but at least one replica is missing. +No data is missing so search results will still be complete. However, your +high-availability is compromised to some degree. If _more_ shards disappear, you +might lose data. Think of Yellow as a warning which should prompt investigation. +- *Red:* at least one primary shard (and all of it's replicas) are missing. This means +that you are missing data: searches will return partial results and indexing +into that shard will return an exception. + +The Green/Yellow/Red status is a great way to glance at your cluster and understand +what's going on. The rest of the metrics give you a general summary of your cluster: + +- `number_of_nodes` and `number_of_data_nodes` are fairly self-descriptive. +- `active_primary_shards` are the number of primary shards in your cluster. This +is an aggregate total across all indices. +- `active_shards` is an aggregate total of _all_ shards across all indices, which +includes replica shards +- `relocating_shards` shows the number of shards that are currently moving from +one node to another node. This number is often zero, but can increase when +Elasticsearch decides a cluster is not properly balanced, a new node is added, +a node is taken down, etc. +- `initializing_shards` is a count of shards that are being freshly created. For +example, when you first create an index, the shards will all briefly reside in +"initializing" state. This is typically a transient event and shards shouldn't +linger in "initializing" too long. You may also see initializing shards when a +node is first restarted...as shards are loaded from disk they start as "initializing" +- `unassigned_shards` are shards that exist in the cluster state, but cannot be +found in the cluster itself. A common source of unassigned shards are unassigned +replicas. For example, an index with 5 shards and 1 replica will have 5 unassigned +replicas in a single-node cluster. Unassigned shards will also be present if your +cluster is red (since primaries are missing) + +==== Drilling deeper: finding problematic indices + +Imagine something goes wrong one day, and you notice that your cluster health +looks like this: + +[source,js] +---- +{ + "cluster_name": "elasticsearch_zach", + "status": "red", + "timed_out": false, + "number_of_nodes": 8, + "number_of_data_nodes": 8, + "active_primary_shards": 90, + "active_shards": 180, + "relocating_shards": 0, + "initializing_shards": 0, + "unassigned_shards": 20 +} +---- + +Ok, so what can we deduce from this health status? Well, our cluster is Red, +which means we are missing data (primary + replicas). We know our cluster has +ten nodes, but only see 8 data nodes listed in the health. Two of our nodes +have gone missing. We see that there are 20 unassigned shards. + +That's about all the information we can glean. The nature of those missing +shards are still a mystery. 
Are we missing 20 indices with one primary shard each? +One index with 20 primary shards? Ten indices with one primary + one replica? +Which index? + +To answer these questions, we need to ask the Cluster Health for a little more +information by using the `level` parameter. + +[source,bash] +---- +GET _cluster/health?level=indices +---- + +This parameter will make the Cluster Health API to add a list of indices in our +cluster and details about each of those indices (status, number of shards, +unassigned shards, etc): + +[source,js] +---- +{ + "cluster_name": "elasticsearch_zach", + "status": "red", + "timed_out": false, + "number_of_nodes": 8, + "number_of_data_nodes": 8, + "active_primary_shards": 90, + "active_shards": 180, + "relocating_shards": 0, + "initializing_shards": 0, + "unassigned_shards": 20 + "indices": { + "v1": { + "status": "green", + "number_of_shards": 10, + "number_of_replicas": 1, + "active_primary_shards": 10, + "active_shards": 20, + "relocating_shards": 0, + "initializing_shards": 0, + "unassigned_shards": 0 + }, + "v2": { + "status": "red", <1> + "number_of_shards": 10, + "number_of_replicas": 1, + "active_primary_shards": 0, + "active_shards": 0, + "relocating_shards": 0, + "initializing_shards": 0, + "unassigned_shards": 20 <2> + }, + "v3": { + "status": "green", + "number_of_shards": 10, + "number_of_replicas": 1, + "active_primary_shards": 10, + "active_shards": 20, + "relocating_shards": 0, + "initializing_shards": 0, + "unassigned_shards": 0 + }, + .... + } +} +---- +<1> We can now see that the `v2` index is the index which has made the cluster Red +<2> And it becomes clear that all 20 missing shards are from this index + +Once we ask for the indices output, it becomes immediately clear which index is +having problems: the `v2` index. We also see that the index has 10 primary shards +and one replica, and that all 20 shards are missing. Presumably these 20 shards +were on the two nodes that are missing from our cluster. + +The `level` parameter accepts one more option: + +[source,bash] +---- +GET _cluster/health?level=shards +---- + +The `shards` option will provide a very verbose output, which lists the status +and location of every shard inside every index. This output is sometimes useful, +but due to the verbosity can difficult to work with. Once you know the index +that is having problems, other APIs that we discuss in this chapter will tend +to be more helpful. + +==== Blocking for status changes + +The Cluster Health API has another neat trick which is very useful when building +unit and integration tests, or automated scripts that work with Elasticsearch. +You can specify a `wait_for_status` parameter, which will make the call block +until the status is satisfied. For example: + +[source,bash] +---- +GET _cluster/health?wait_for_status=green +---- + +This call will block (e.g. not return control to your program) until the cluster +health has turned green, meaning all primary + replica shards have been allocated. +This is very important for automated scripts and tests. + +If you create an index, Elasticsearch must broadcast the change in cluster state +to all nodes. Those nodes must initialize those new shards, then respond to the +master that the shards are Started. This process is very fast, but due to network +latency may take 10-20ms. + +If you have an automated script that A) creates an index and then B) immediately +attempts to index a document, this operation may fail since the index has not +been fully initialized yet. 
The time between A) and B) will likely be <1ms... +not nearly enough time to account for network latency. + +Rather than sleeping, just have your script/test call the cluster health with +a `wait_for_status` parameter. As soon as the index is fully created, the cluster +health will change to Green, the call returns control to your script, and you may +begin indexing. + +Valid options are `green`, `yellow` and `red`. The call will return when the +requested status (or one "higher") is reached. E.g. if you request `yellow`, +a status change to `yellow` or `green` will unblock the call. + diff --git a/500_Cluster_Admin/30_node_stats.asciidoc b/500_Cluster_Admin/30_node_stats.asciidoc new file mode 100644 index 000000000..aba2256d4 --- /dev/null +++ b/500_Cluster_Admin/30_node_stats.asciidoc @@ -0,0 +1,542 @@ + +=== Monitoring individual nodes + +Cluster Health is at one end of the spectrum -- a very high-level overview of +everything in your cluster. The _Node Stats_ API is at the other end. It provides +an bewildering array of statistics about each node in your cluster. + +Node Stats provides so many stats that, until you are accustomed to the output, +you may be unsure which metrics are most important to keep an eye on. We'll +highlight the most important metrics to monitor (but note: we'd encourage you to +log all the metrics provided -- or use Marvel -- because you'll never know when +you need one stat or another) + +The Node Stats API can be executed with the following: + +[source,bash] +---- +GET _nodes/stats +---- + +Starting at the top of the output, we see the cluster name and our first node: + +[source,js] +---- +{ + "cluster_name": "elasticsearch_zach", + "nodes": { + "UNr6ZMf5Qk-YCPA_L18BOQ": { + "timestamp": 1408474151742, + "name": "Zach", + "transport_address": "inet[zacharys-air/192.168.1.131:9300]", + "host": "zacharys-air", + "ip": [ + "inet[zacharys-air/192.168.1.131:9300]", + "NONE" + ], +... +---- + +The nodes are listed in a hash, with the key being the UUID of the node. Some +information about the node's network properties are displayed (transport address, +host, etc). These values are useful for debugging discovery problems, where +nodes won't join the cluster. Often you'll see that the port being used is wrong, +or the node is binding to the wrong IP address/interface. + +==== Indices section + +The indices section lists aggregate statistics for all the indices that reside +on this particular node. + +[source,js] +---- + "indices": { + "docs": { + "count": 6163666, + "deleted": 0 + }, + "store": { + "size_in_bytes": 2301398179, + "throttle_time_in_millis": 122850 + }, +---- + +- `docs` shows how many documents reside on +this node, as well as the number of deleted docs which haven't been purged +from segments yet. + +- The `store` portion indicates how much physical storage is consumed by the node. +This metric includes both primary and replica shards. If the throttle time is +large, it may be an indicator that your disk throttling is set too low +(discussed later in TODO). 
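+
+If this group of statistics is all you want to poll, the Node Stats API can be
+restricted to it -- a sketch, and the same metric filtering works for the other
+groups discussed below (`jvm`, `thread_pool`, `fs` and so on):
+
+[source,bash]
+----
+GET _nodes/stats/indices
+----
+
+Continuing through the full `indices` output: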
+ +[source,js] +---- + "indexing": { + "index_total": 803441, + "index_time_in_millis": 367654, + "index_current": 99, + "delete_total": 0, + "delete_time_in_millis": 0, + "delete_current": 0 + }, + "get": { + "total": 6, + "time_in_millis": 2, + "exists_total": 5, + "exists_time_in_millis": 2, + "missing_total": 1, + "missing_time_in_millis": 0, + "current": 0 + }, + "search": { + "open_contexts": 0, + "query_total": 123, + "query_time_in_millis": 531, + "query_current": 0, + "fetch_total": 3, + "fetch_time_in_millis": 55, + "fetch_current": 0 + }, + "merges": { + "current": 0, + "current_docs": 0, + "current_size_in_bytes": 0, + "total": 1128, + "total_time_in_millis": 21338523, + "total_docs": 7241313, + "total_size_in_bytes": 5724869463 + }, +---- + +- `indexing` shows how many docs have been indexed. This value is a monotonically +increasing counter, it doesn't decrease when docs are deleted. Also note that it +is incremented any time an _index_ operation happens internally, which includes +things like updates. ++ +Also listed are times for indexing, how many docs are currently being indexed, +and similar statistics for deletes. + +- `get` shows statistics about get-by-ID statistics. This includes GETs and +HEAD requests for a single document + +- `search` describes the number of active searches (`open_contexts`), number of +queries total and how much time has been spent on queries since the node was +started. The ratio between `query_total / query_time_in_milis` can be used as a +rough indicator for how efficient your queries are. The larger the ratio, +the more time each query is taking and you should consider tuning or optimization. ++ +The fetch statistics details the second half of the query process (the "fetch" in +query-then-fetch). If more time is spent in fetch than query, this can be an +indicator of slow disks or very large documents which are being fetched. Or +potentially search requests with too large of paginations (`size: 10000`, etc). + +- `merges` contains information about Lucene segment merges. It will tell you +how many merges are currently active, how many docs are involved, the cumulative +size of segments being merged and how much time has been spent on merges in total. ++ +Merge statistics can be important if your cluster is write-heavy. Merging consumes +a large amount of disk I/O and CPU resources. If your index is write-heavy and +you see large merge numbers, be sure to read the section on optimizing for indexing +(TODO). ++ +Note: updates and deletes will contribute to large merge numbers too, since they +cause segment "fragmentation" which needs to be merged out eventually. + +[source,js] +---- + "filter_cache": { + "memory_size_in_bytes": 48, + "evictions": 0 + }, + "id_cache": { + "memory_size_in_bytes": 0 + }, + "fielddata": { + "memory_size_in_bytes": 0, + "evictions": 0 + }, + "segments": { + "count": 319, + "memory_in_bytes": 65812120 + }, + ... +---- + +- `filter_cache` describes how much memory is used by the cached filter bitsets, +and how many times a filter has been evicted. A large number of evictions +_could_ be indicative that you need to increase the filter cache size, or that +your filters are not caching well (e.g. churn heavily due to high cardinality, +such as caching "now" date expressions). ++ +However, evictions are a difficult metric to evaluate. Filters are cached on a +per-segment basis, and evicting a filter from a small segment is much less +expensive than a filter on a large segment. 
It's possible that you have a large +number of evictions, but they all occur on small segments, which means they have +little impact on query performance. ++ +Use the eviction metric as a rough guideline. If you see a large number, investigate +your filters to make sure they are caching well. Filters that constantly evict, +even on small segments, will be much less effective than properly cached filters. + +- `id_cache` shows the memory usage by Parent/Child mappings. When you use +parent/children, the `id_cache` maintains an in-memory-join table which maintains +the relationship. This statistic will show you how much memory is being used. +There is little you can do to affect this memory usage, since it is a fairly linear +relationship with the number of parent/child docs. It is heap-resident, however, +so a good idea to keep an eye on it. + +- `field_data` displays the memory used by field data, which is used for aggregations, +sorting, etc. There is also an eviction count. Unlike `filter_cache`, the eviction +count here is very useful: it should be zero, or very close. Since field data +is not a cache, any eviction is very costly and should be avoided. If you see +evictions here, you need to re-evaluate your memory situation, field data limits, +queries or all three. + +- `segments` will tell you how many Lucene segments this node currently serves. +This can be an important number. Most indices should have around 50-150 segments, +even if they are terrabytes in size with billions of documents. Large numbers +of segments can indicate a problem with merging (e.g. merging is not keeping up +with segment creation). Note that this statistic is the aggregate total of all +indices on the node, so keep that in mind. ++ +The `memory` statistic gives you an idea how much memory is being used by the +Lucene segments themselves. This includes low-level data structures such as +posting lists, dictionaries and bloom filters. A very large number of segments +will increase the amount of overhead lost to these data structures, and the memory +usage can be a handy metric to gauge that overhead. + +==== OS and Process Sections + +The OS and Process sections are fairly self-explanatory and won't be covered +in great detail. They list basic resource statistics such as CPU and load. The +OS section describes it for the entire OS, while the Process section shows just +what the Elasticsearch JVM process is using. + +These are obviously useful metrics, but are often being measured elsewhere in your +monitoring stack. Some stats include: + +- CPU +- Load +- Memory usage +- Swap usage +- Open file descriptors + +==== JVM Section + +The JVM section contains some critical information about the JVM process which +is running Elasticsearch. Most importantly, it contains garbage collection details, +which have a large impact on the stability of your Elasticsearch cluster. + +[[garbage_collector_primer]] +.Garbage Collection Primer +********************************** +Before we describe the stats, it is useful to give a crash course in garbage +collection and it's impact on Elasticsearch. If you are familar with garbage +collection in the JVM, feel free to skip down. + +Java is a _garbage collected_ language, which means that the programmer does +not manually manage memory allocation and deallocation. The programmer simply +writes code, and the Java Virtual Machine (JVM) manages the process of allocating +memory as needed, and then later cleaning up that memory when no longer needed. 
+ +When memory is allocated to a JVM process, it is allocated in a big chunk called +the _heap_. The JVM then breaks the heap into two different groups, referred to as +"generations": + +- Young (or Eden): the space where newly instantiated objects are allocated. The +young generation space is often quite small, usually 100mb-500mb. The young-gen +also contains two "survivor" spaces +- Old: the space where older objects are stored. These objects to be long-lived +and persist for a long time. The old-gen is often much larger than then young-gen, +and Elasticsearch nodes can see old-gens as large as 30gb. + +When an object is instantiated, it is placed into young-gen. When the young +generation space is full, a young-gen GC is started. Objects that are still +"alive" are moved into one of the survivor spaces, and "dead" objects are removed. +If an object has survived several young-gen GCs, it will be "tenured" into the +old generation. + +A similar process happens in the old generation: when the space becomes full, a +garbage collection is started and "dead" objects are removed. + +Nothing comes for free, however. Both the young and old generation garbage collectors +have phases which "stop the world". During this time, the JVM literally halts +execution of the program so that it can trace the object graph and collect "dead" +objects. + +During this "stop the world" phase, nothing happens. Requests are not serviced, +pings are not responded to, shards are not relocated. The world quite literally +stops. + +This isn't a big deal for the young generation; its small size means GCs execute +quickly. But the old-gen is quite a bit larger, and a slow GC here could mean +1s or even 15s of pausing...which is unacceptable for server software. + +The garbage collectors in the JVM are _very_ sophisticated algorithms and do +a great job minimizing pauses. And Elasticsearch tries very hard to be "garbage +collection friendly", by intelligently reusing objects internally, reusing network +buffers, offering features like <>, etc. But ultimately, +GC frequency and duration is a metric that needs to be watched by you since it +is the number one culprit for cluster instability. + +A cluster which is frequently experiencing long GC will be a cluster that is under +heavy load with not enough memory. These long GCs will make nodes drop off the +cluster for brief periods. This instability causes shards to relocate frequently +as ES tries to keep the cluster balanced and enough replicas available. This in +turn increases network traffic and Disk I/O, all while your cluster is attempting +to service the normal indexing and query load. + +In short, long GCs are bad and they need to be minimized as much as possible. +********************************** + +Because garbage collection is so critical to ES, you should become intimately +familiar with this section of the Node Stats API: + +[source,js] +---- + "jvm": { + "timestamp": 1408556438203, + "uptime_in_millis": 14457, + "mem": { + "heap_used_in_bytes": 457252160, + "heap_used_percent": 44, + "heap_committed_in_bytes": 1038876672, + "heap_max_in_bytes": 1038876672, + "non_heap_used_in_bytes": 38680680, + "non_heap_committed_in_bytes": 38993920, + +---- + +- The `jvm` section first lists some general stats about heap memory usage. You +can see how much of the heap is being used, how much is committed (actually allocated +to the process), and the max size the heap is allowed to grow to. Ideally, +`heap_committed_in_bytes` should be identical to `heap_max_in_bytes`. 
If the +committed size is smaller, the JVM will have to resize the heap eventually... +and this is a very expensive process. If your numbers are not identical, see +this section <> in the next chapter to configure it correctly. ++ +The `heap_used_percent` metric is a useful number to keep an eye on. Elasticsearch +is configured to initiate GCs when the heap reaches 75% full. If your node is +consistently >= 75%, that indicates that your node is experiencing "memory pressure". +This is a warning sign that slow GCs may be in your near future. ++ +If the heap usage is consistently >=85%, you are in trouble. Heaps over 90-95% +are in risk of horrible performance with long 10-30s GCs at best, Out-of-memory +(OOM) exceptions at worst. + +[source,js] +---- + "pools": { + "young": { + "used_in_bytes": 138467752, + "max_in_bytes": 279183360, + "peak_used_in_bytes": 279183360, + "peak_max_in_bytes": 279183360 + }, + "survivor": { + "used_in_bytes": 34865152, + "max_in_bytes": 34865152, + "peak_used_in_bytes": 34865152, + "peak_max_in_bytes": 34865152 + }, + "old": { + "used_in_bytes": 283919256, + "max_in_bytes": 724828160, + "peak_used_in_bytes": 283919256, + "peak_max_in_bytes": 724828160 + } + } + }, +---- + +- The `young`, `survivor` and `old` sections will give you a breakdown of memory +usage of each generation in the GC. These stats are handy to keep an eye on +relative sizes, but are often not overly important when debugging problems. + +[source,js] +---- + "gc": { + "collectors": { + "young": { + "collection_count": 13, + "collection_time_in_millis": 923 + }, + "old": { + "collection_count": 0, + "collection_time_in_millis": 0 + } + } + } +---- + +- `gc` section shows the garbage collection counts and cumulative time for both +young and old generations. You can safely ignore the young generation counts +for the most part: this number will usually be very large. That is perfectly +normal. ++ +In contrast, the old generation collection count should remain very small, and +have a small `collection_time_in_millis`. These are cumulative counts, so it is +hard to give an exact number when you should start worrying (e.g. a node with a +1-year uptime will have a large count even if it is healthy) -- this is one of the +reasons why tools such as Marvel are so helpful. GC counts _over time_ are the +important consideration. ++ +Time spent GC'ing is also important. For example, a certain amount of garbage +is generated while indexing documents. This is normal, and causes a GC every +now-and-then. These GCs are almost always fast -- a millisecond or two -- and +do not impact the node. This is much different from 10 second GCs. ++ +Our best advice is to collect collection counts and duration periodically (or use Marvel) +and keep an eye out for frequent GCs. You can also enable slow-GC logging, +discussed in <> + +==== Threadpool Section + +Elasticsearch maintains a number of threadpools internally. These threadpools +cooperate to get work done, passing work between each other as necessary. In +general, you don't need to configure or tune the threadpools, but it is sometimes +useful to see their stats so you can gain insight into how your cluster is behaving. 
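+
+For a quick, human-readable overview, it can also be handy to print the
+per-node threadpool activity as a table. A sketch using the `_cat` interface
+(the `?v` parameter simply adds column headers):
+
+[source,bash]
+----
+GET _cat/thread_pool?v
+----
+
+This lists active, queued and rejected counts for the most important pools on
+each node, which is a convenient way to spot a node that is falling behind.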
+ +There are about a dozen threadpools, but they all share the same format: + +[source,js] +---- + "index": { + "threads": 1, + "queue": 0, + "active": 0, + "rejected": 0, + "largest": 1, + "completed": 1 + } +---- + +Each threadpool lists the number of threads that are configured (`threads`), +how many of those threads are actively processing some work (`active`) and how +many work units are sitting in a queue (`queue`). + +If the queue fills up to its limit, new workunits will begin to be rejected and +you will see that reflected in the `rejected` statistic. This is often a sign +that your cluster is starting to bottleneck on some resources, since a full +queue means your node/cluster is processing at maximum speed but unable to keep +up with the influx of work. + +.Bulk Rejections +**** +If you are going to encounter queue rejections, it will most likely be caused +by Bulk indexing requests. It is easy to send many Bulk requests to Elasticsearch +using concurrent import processes. More is better, right? + +In reality, each cluster has a certain limit at which it can not keep up with +ingestion. Once this threshold is crossed, the queue will quickly fill up and +new bulks will be rejected. + +This is a _good thing_. Queue rejections are a useful form of back-pressure. They +let you know that your cluster is at maximum capacity, which is much better than +sticking data into an in-memory queue. Increasing the queue size doesn't increase +performance, it just hides the problem. If your cluster can only process 10,000 +doc/s, it doesn't matter if the queue is 100 or 10,000,000...your cluster can +still only process 10,000 docs/s. + +The queue simply hides the performance problem and carries real risk of data-loss. +Anything sitting in a queue is by definition not processed yet. If the node +goes down, all those requests are lost forever. Furthermore, the queue eats +up a lot of memory, which is not ideal. + +It is much better to handle queuing in your application by gracefully handling +the back-pressure from a full queue. When you receive bulk rejections you should: + +1. Pause the import thread for 3-5 seconds +2. Extract the rejected actions from the bulk response, since it is probable that +many of the actions were successful. The bulk response will tell you which succeeded, +and which were rejected. +3. Send a new bulk request with just the rejected actions +4. Repeat at step 1. if rejections were encountered again + +Using this procedure, your code naturally adapts to the load of your cluster and +naturally backs off. + +Rejections are not errors: they just mean you should try again later. +**** + +There are a dozen different threadpools. Most you can safely ignore, but a few +are good to keep an eye on: + +- `indexing`: threadpool for normal indexing requests +- `bulk`: bulk requests, which are distinct from the non-bulk indexing requests +- `get`: GET-by-ID operations +- `search`: all search and query requests +- `merging`: threadpool dedicated to managing Lucene merges + +==== FS and Network sections + +Continuing down the Node Stats API, you'll see a bunch of statistics about your +filesystem: free space, data directory paths, disk IO stats, etc. If you are +not monitoring free disk space, you can get those stats here. The Disk IO stats +are also handy, but often more specialized command-line tools (`iostat`, etc) +are more useful. 
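+
+If free disk space is your main concern, one quick option is the
+`_cat/allocation` interface, which summarizes -- per node -- how many shards it
+holds and how much disk is used and available (a sketch; `?v` adds column
+headers):
+
+[source,bash]
+----
+GET _cat/allocation?v
+----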
+ +Obviously, Elasticsearch has a difficult time functioning if you run out of disk +space...so make sure you don't :) + +There are also two sections on network statistics: + +[source,js] +---- + "transport": { + "server_open": 13, + "rx_count": 11696, + "rx_size_in_bytes": 1525774, + "tx_count": 10282, + "tx_size_in_bytes": 1440101928 + }, + "http": { + "current_open": 4, + "total_opened": 23 + }, +---- + +- `transport` shows some very basic stats about the "transport address". This +relates to inter-node communication (often on port 9300) and any TransportClient +or NodeClient connections. Don't worry yourself if you see many connections here, +Elasticsearch maintains a large number of connections between nodes + +- `http` represents stats about the HTTP port (often 9200). If you see a very +large `total_opened` number that is constantly increasing, that is a sure-sign +that one of your HTTP clients is not using keep-alive connections. Persistent, +keep-alive connections are important for performance, since building up and tearing +down sockets is expensive (and wastes file descriptors). Make sure your clients +are configured appropriately. + +==== Circuit Breaker + +Finally, we come to the last section: stats about the field data circuit breaker +(introduced in <>): + +[source,js] +---- + "fielddata_breaker": { + "maximum_size_in_bytes": 623326003, + "maximum_size": "594.4mb", + "estimated_size_in_bytes": 0, + "estimated_size": "0b", + "overhead": 1.03, + "tripped": 0 + } +---- + +Here, you can determine what the maximum circuit breaker size is (e.g. at what +size the circuit breaker will trip if a query attempts to use more memory). It +will also let you know how many times the circuit breaker has been tripped, and +the currently configured "overhead". The overhead is used to pad estimates +since some queries are more difficult to estimate than others. + +The main thing to watch is the `tripped` metric. If this number is large, or +consistently increasing, it's a sign that your queries may need to be optimized +or that you may need to obtain more memory (either per box, or by adding more +nodes). + + + + diff --git a/500_Cluster_Admin/40_other_stats.asciidoc b/500_Cluster_Admin/40_other_stats.asciidoc new file mode 100644 index 000000000..c1e19dc83 --- /dev/null +++ b/500_Cluster_Admin/40_other_stats.asciidoc @@ -0,0 +1,364 @@ + +=== Cluster Stats + +The _Cluster Stats_ API provides very similar output to the Node Stats. There +is one crucial difference: Node Stats shows you statistics per-node, while +Cluster Stats will show you the sum total of all nodes in a single metric. + +This provides some useful stats to glance at. You can see that your entire cluster +is using 50% available heap, filter cache is not evicting heavily, etc. It's +main use is to provide a quick summary which is more extensive than +the Cluster Health, but less detailed than Node Stats. It is also useful for +clusters which are very large, which makes Node Stats output difficult +to read. + +The API may be invoked with: + +[source,js] +---- +GET _cluster/stats +---- + +=== Index Stats + +So far, we have been looking at _node-centric_ statistics. How much memory does +this node have? How much CPU is being used? How many searches is this node +servicing? Etc. etc. + +Sometimes it is useful to look at statistics from an _index-centric_ perspective. +How many search requests is _this index_ receiving? How much time is spent fetching +docs in _that index_, etc. 
+ +To do this, select the index (or indices) that you are interested in and +execute an Index Stats API: + +[source,js] +---- +GET my_index/_stats <1> + +GET my_index,another_index/_stats <2> + +GET _all/_stats <3> +---- +<1> Stats for `my_index` +<2> Stats for multiple indices can be requested by comma separating their names +<3> Stats indices can be requested using the special `_all` index name + +The stats returned will be familar to the Node Stats output: search, fetch, get, +index, bulk, segment counts, etc + +Index-centric stats can be useful for identifying or verifying "hot" indices +inside your cluster, or trying to determine while some indices are faster/slower +than others. + +In practice, however, node-centric statistics tend to be more useful. Entire +nodes tend to bottleneck, not individual indices. And because indices +are usually spread across multiple nodes, index-centric statistics +are usually not very helpful because it aggregates different physical machines +operating in different environments. + +Index-centric stats are a useful tool to keep in your repertoire, but are not usually +the first tool to reach for. + +=== Pending Tasks + +There are certain tasks that only the master can perform, such as creating a new +index or moving shards around the cluster. Since a cluster can only have one +master, only one node can ever process cluster-level metadata changes. In +99.9999% of the time, this is never a problem. The queue of metadata changes +remains essentially zero. + +In some _very rare_ clusters, the number of metadata changes occurs faster than +the master can process them. This leads to a build up of pending actions which +are queued. + +The _Pending Tasks_ API will show you what (if any) cluster-level metadata changes +are pending in the queue: + +[source,js] +---- +GET _cluster/pending_tasks +---- + +Usually, the response will look like this: + +[source,js] +---- +{ + "tasks": [] +} +---- + +Meaning there are no pending tasks. If you have one of the rare clusters that +bottlenecks on the master node, your pending task list may look like this: + +[source,js] +---- +{ + "tasks": [ + { + "insert_order": 101, + "priority": "URGENT", + "source": "create-index [foo_9], cause [api]", + "time_in_queue_millis": 86, + "time_in_queue": "86ms" + }, + { + "insert_order": 46, + "priority": "HIGH", + "source": "shard-started ([foo_2][1], node[tMTocMvQQgGCkj7QDHl3OA], [P], s[INITIALIZING]), reason [after recovery from gateway]", + "time_in_queue_millis": 842, + "time_in_queue": "842ms" + }, + { + "insert_order": 45, + "priority": "HIGH", + "source": "shard-started ([foo_2][0], node[tMTocMvQQgGCkj7QDHl3OA], [P], s[INITIALIZING]), reason [after recovery from gateway]", + "time_in_queue_millis": 858, + "time_in_queue": "858ms" + } + ] +} +---- + +You can see that tasks are assigned a priority (`URGENT` is processed before `HIGH`, +etc), the order it was inserted, how long the action has been queued and +what the action is trying to perform. In the above list, there is a Create Index +action and two Shard Started actions pending. + +.When should I worry about Pending Tasks? +**** +As mentioned, the master node is rarely the bottleneck for clusters. The only +time it can potentially bottleneck is if the cluster state is both very large +_and_ updated frequently. + +For example, if you allow customers to create as many dynamic fields as they wish, +and have a unique index for each customer every day, you're cluster state will grow +very large. 
The cluster state includes (among other things) a list of all indices, +their types, and the fields for each index. + +So if you have 100,000 customers, and each customer averages 1000 fields and 90 +days of retention....that's nine billion fields to keep in the cluster state. +Whenever this changes, the nodes must be notified. + +The master must process these changes which requires non-trivial CPU overhead, +plus the network overhead of pushing the updated cluster state to all nodes. + +It is these clusters which may begin to see cluster state actions queuing up. +There is no easy solution to this problem, however. You have three options: + +- Obtain a beefier master node. Vertical scaling just delays the inevitable, +unfortunately +- Restrict the dynamic nature of the documents in some way, so as to limit the +cluster state size. +- Spin up another cluster once a certain threshold has been crossed. +**** + +=== Cat API + +If you work from the command line often, the _Cat_ APIs will be very helpful +to you. Named after the linux `cat` command, these APIs are designed to be +work like *nix command line tools. + +They provide statistics that are identical to all the previously discussed APIs +(Health, Node Stats, etc), but present the output in tabular form instead of +JSON. This is _very_ convenient as a system administrator and you just want +to glance over your cluster, or find nodes with high memory usage, etc. + +Executing a plain GET against the Cat endpoint will show you all available +APIs: + +[source,shell] +---- +GET /_cat + +=^.^= +/_cat/allocation +/_cat/shards +/_cat/shards/{index} +/_cat/master +/_cat/nodes +/_cat/indices +/_cat/indices/{index} +/_cat/segments +/_cat/segments/{index} +/_cat/count +/_cat/count/{index} +/_cat/recovery +/_cat/recovery/{index} +/_cat/health +/_cat/pending_tasks +/_cat/aliases +/_cat/aliases/{alias} +/_cat/thread_pool +/_cat/plugins +/_cat/fielddata +/_cat/fielddata/{fields} +---- + +Many of these APIs should look familiar to you (and yes, that's a cat at the top +:) ). Let's take a look at the Cat Health API: + +[source,shell] +---- +GET /_cat/health + +1408723713 12:08:33 elasticsearch_zach yellow 1 1 114 114 0 0 114 +---- + +The first thing you'll notice is that the response is plain text in tabular form, +not JSON. The second thing you'll notices is that there are no column headers +enabled by default. This is designed to emulate *nix tools, since it is assumed +that once you become familiar with the output you no longer want to see +the headers. + +To enable headers, add the `?v` parameter: + +[source,shell] +---- +GET /_cat/health?v + +epoch timestamp cluster status node.total node.data shards pri relo init unassign +1408723890 12:11:30 elasticsearch_zach yellow 1 1 114 114 0 0 114 +---- + +Ah, much better. We now see the timestamp, cluster name, the status, how many +nodes are in the cluster, etc. All the same information as the Cluster Health +API. + +Let's look at Node Stats in the Cat API: + +[source,shell] +---- +GET /_cat/nodes?v + +host ip heap.percent ram.percent load node.role master name +zacharys-air 192.168.1.131 45 72 1.85 d * Zach +---- + +We see some stats about the nodes in our cluster, but it is very basic compared +to the full Node Stats output. There are many additional metrics that you can +include, but rather than consulting the documentation, let's just ask the Cat +API what is available. 
+ +You can do this by adding `?help` to any API: + +[source,shell] +---- +GET /_cat/nodes?help + +id | id,nodeId | unique node id +pid | p | process id +host | h | host name +ip | i | ip address +port | po | bound transport port +version | v | es version +build | b | es build hash +jdk | j | jdk version +disk.avail | d,disk,diskAvail | available disk space +heap.percent | hp,heapPercent | used heap ratio +heap.max | hm,heapMax | max configured heap +ram.percent | rp,ramPercent | used machine memory ratio +ram.max | rm,ramMax | total machine memory +load | l | most recent load avg +uptime | u | node uptime +node.role | r,role,dc,nodeRole | d:data node, c:client node +master | m | m:master-eligible, *:current master +... +... +---- +(Note that the output has been truncated for brevity) + +The first column shows the "fullname", the second column shows the "short name", +and the third column offers a brief description about the parameter . Now that +we know some column names, we can ask for those explicitly using the `?h` +parameter: + +[source,shell] +---- +GET /_cat/nodes?v&h=ip,port,heapPercent,heapMax + +ip port heapPercent heapMax +192.168.1.131 9300 53 990.7mb +---- + +Because the Cat API tries to behave like *nix utilities, you can pipe the output +to other tools such as sort, grep, awk, etc. For example, we can find the largest +index in our cluster by using: + +[source,shell] +---- +% curl 'localhost:9200/_cat/indices?bytes=b' | sort -rnk8 + +yellow test_names 5 1 3476004 0 376324705 376324705 +yellow .marvel-2014.08.19 1 1 263878 0 160777194 160777194 +yellow .marvel-2014.08.15 1 1 234482 0 143020770 143020770 +yellow .marvel-2014.08.09 1 1 222532 0 138177271 138177271 +yellow .marvel-2014.08.18 1 1 225921 0 138116185 138116185 +yellow .marvel-2014.07.26 1 1 173423 0 132031505 132031505 +yellow .marvel-2014.08.21 1 1 219857 0 128414798 128414798 +yellow .marvel-2014.07.27 1 1 75202 0 56320862 56320862 +yellow wavelet 5 1 5979 0 54815185 54815185 +yellow .marvel-2014.07.28 1 1 57483 0 43006141 43006141 +yellow .marvel-2014.07.21 1 1 31134 0 27558507 27558507 +yellow .marvel-2014.08.01 1 1 41100 0 27000476 27000476 +yellow kibana-int 5 1 2 0 17791 17791 +yellow t 5 1 7 0 15280 15280 +yellow website 5 1 12 0 12631 12631 +yellow agg_analysis 5 1 5 0 5804 5804 +yellow v2 5 1 2 0 5410 5410 +yellow v1 5 1 2 0 5367 5367 +yellow bank 1 1 16 0 4303 4303 +yellow v 5 1 1 0 2954 2954 +yellow p 5 1 2 0 2939 2939 +yellow b0001_072320141238 5 1 1 0 2923 2923 +yellow ipaddr 5 1 1 0 2917 2917 +yellow v2a 5 1 1 0 2895 2895 +yellow movies 5 1 1 0 2738 2738 +yellow cars 5 1 0 0 1249 1249 +yellow wavelet2 5 1 0 0 615 615 +---- + +By adding `?bytes=b` we disable the "human readable" formatting on numbers and +force them to be listed as bytes. This output is then piped into `sort` so that +our indices are ranked according to size (the 8th column). + +Unfortunately, you'll notice that the Marvel indices are clogging up the results, +and we don't really care about those indices right now. 
Let's pipe the output +through `grep` and remove anything mentioning marvel: + +[source,shell] +---- +% curl 'localhost:9200/_cat/indices?bytes=b' | sort -rnk8 | grep -v marvel + +yellow test_names 5 1 3476004 0 376324705 376324705 +yellow wavelet 5 1 5979 0 54815185 54815185 +yellow kibana-int 5 1 2 0 17791 17791 +yellow t 5 1 7 0 15280 15280 +yellow website 5 1 12 0 12631 12631 +yellow agg_analysis 5 1 5 0 5804 5804 +yellow v2 5 1 2 0 5410 5410 +yellow v1 5 1 2 0 5367 5367 +yellow bank 1 1 16 0 4303 4303 +yellow v 5 1 1 0 2954 2954 +yellow p 5 1 2 0 2939 2939 +yellow b0001_072320141238 5 1 1 0 2923 2923 +yellow ipaddr 5 1 1 0 2917 2917 +yellow v2a 5 1 1 0 2895 2895 +yellow movies 5 1 1 0 2738 2738 +yellow cars 5 1 0 0 1249 1249 +yellow wavelet2 5 1 0 0 615 615 +---- + +Voila! After piping through `grep` (with `-v` to invert the matches), we get +a sorted list of indices without marvel cluttering it up. + +This is just a simple example of the flexibility of Cat at the command line. +Once you get used to using Cat, you'll see it like any other *nix tool and start +going crazy with piping, sorting, grepping. If you are a system admin and spend +any length of time ssh'd into boxes...definitely spend some time getting familiar +with the Cat API. + + + + diff --git a/510_Deployment.asciidoc b/510_Deployment.asciidoc index 06f098578..3cecc0808 100644 --- a/510_Deployment.asciidoc +++ b/510_Deployment.asciidoc @@ -1,27 +1,17 @@ [[deploy]] -== Deploying Elasticsearch (TODO) +== Production Deployment -This chapter discusses the issues to be aware of when it comes time to deploy -Elasticsearch into production. +include::510_Deployment/10_intro.asciidoc[] -=== Node communication -. +include::510_Deployment/20_hardware.asciidoc[] +include::510_Deployment/30_other.asciidoc[] -=== How many shards do I need? -. +include::510_Deployment/40_config.asciidoc[] +include::510_Deployment/45_dont_touch.asciidoc[] -=== Memory requirements -. - - -=== Reindexing data -. - - -=== Backing up your data -. - +include::510_Deployment/50_heap.asciidoc[] +include::510_Deployment/60_file_descriptors.asciidoc[] diff --git a/510_Deployment/10_intro.asciidoc b/510_Deployment/10_intro.asciidoc new file mode 100644 index 000000000..cf0f6cb95 --- /dev/null +++ b/510_Deployment/10_intro.asciidoc @@ -0,0 +1,14 @@ +If you have made it this far in the book, hopefully you've learned a thing or +two about Elasticsearch and are ready to deploy your cluster to production. +This chapter is not meant to be an exhaustive guide to running your cluster +in production, but it will cover the key things to consider before putting +your cluster live. + +There are three main areas that will be covered: + +- Logistical considerations, such as hardware recommendations and deployment +strategies +- Configuration changes that are more suited to a production environment +- Post-deployment considerations, such as security, maximizing indexing performance +and backups + diff --git a/510_Deployment/20_hardware.asciidoc b/510_Deployment/20_hardware.asciidoc new file mode 100644 index 000000000..d94468500 --- /dev/null +++ b/510_Deployment/20_hardware.asciidoc @@ -0,0 +1,118 @@ +[[hardware]] +=== Hardware + +If you've been following the normal development path, you've probably been playing +with Elasticsearch on your laptop, or a small cluster of machines laying around. +But when it comes time to deploy Elasticsearch to production, there are a few +recommendations that you should consider. 
Nothing is a hard-and-fast rule;
+Elasticsearch is used for a wide range of tasks and on a bewildering array of
+machines. But these recommendations provide good starting points based on our
+experience with production clusters.
+
+==== Memory
+
+If there is one resource that you will run out of first, it will likely be memory.
+Sorting and aggregations can both be memory hungry, so having enough heap space
+to accommodate them is important. Even when the heap is comparatively small,
+extra memory can be given to the OS file system cache. Because many data structures
+used by Lucene are disk-based formats, Elasticsearch leverages the OS cache to
+great effect.
+
+A machine with 64gb of RAM is the ideal sweet-spot, but 32gb and 16gb machines
+are also very common. Less than 8gb tends to be counterproductive (you end up
+needing many, many small machines) and greater than 64gb has problems which we will
+discuss in <>
+
+==== CPUs
+
+Most Elasticsearch deployments tend to be rather light on CPU requirements. As
+such, the exact processor setup matters less than the other resources. You should
+choose a modern processor with multiple cores. Common clusters utilize 2-8
+core machines.
+
+If you need to choose between faster CPUs or more cores...choose more cores. The
+extra concurrency that multiple cores offer will far outweigh a slightly faster
+clock speed.
+
+==== Disks
+
+Disks are important for all clusters, and doubly so for indexing-heavy clusters
+(such as those that ingest log data). Disks are the slowest subsystem in a server,
+which means that write-heavy clusters can easily saturate their disks, which in
+turn become the bottleneck of the cluster.
+
+If you can afford SSDs, they are by far superior to any spinning media. SSD-backed
+nodes see boosts in both query and indexing performance. If you can afford it,
+SSDs are the way to go.
+
+.Check your IO Scheduler
+****
+If you are using SSDs, make sure your OS I/O Scheduler is configured correctly.
+When you write data to disk, the I/O Scheduler decides when that data is
+_actually_ sent to the disk. The default under most *nix distributions is a
+scheduler called `cfq` (Completely Fair Queuing).
+
+This scheduler allocates "time slices" to each process, and then optimizes the
+delivery of these various queues to the disk. It is optimized for spinning media:
+the nature of rotating platters means it is more efficient to write data to disk
+based on physical layout.
+
+This is very inefficient for SSD, however, since there are no spinning platters
+involved. Use `deadline` or `noop` instead. The deadline scheduler optimizes based
+on how long writes have been pending, while `noop` is just a simple FIFO queue.
+
+This simple change can have dramatic impacts. We've seen a 500x improvement
+to write throughput just by using the correct scheduler. A short example of
+checking and changing the scheduler follows at the end of this section.
+****
+
+If you use spinning media, try to obtain the fastest disks possible
+(high-performance server disks, 15k RPM drives).
+
+Using RAID 0 is an effective way to increase disk speed, for both spinning disks
+and SSD. There is no need to use mirroring or parity variants of RAID, since
+high-availability is built into Elasticsearch via replicas.
+
+Finally, avoid network-attached storage (NAS). People routinely claim their
+NAS solution is faster and more reliable than local drives. Despite these claims,
+we have never seen NAS live up to its hype. NAS is often slower, displays
+larger latencies with a wider deviation in average latency, and is a single
+point of failure.
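+
+As promised in the sidebar above, here is a quick sketch of how to inspect and
+change the I/O scheduler on most Linux distributions. The device name `sda` is
+only an example; substitute your own data disk:
+
+[source,bash]
+----
+cat /sys/block/sda/queue/scheduler        # the active scheduler is shown in brackets, e.g. [cfq]
+echo noop | sudo tee /sys/block/sda/queue/scheduler   # switch to noop (or deadline) until the next reboot
+----
+
+To make the change permanent, consult your distribution's documentation; it is
+usually done via a kernel boot parameter or a udev rule.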
+ +==== Network + +A fast and reliable network is obviously important to performance in a distributed +system. Low latency helps assure that nodes can communicate easily, while +high bandwidth helps shard movement and recovery. Modern datacenter networking +(1gigE, 10gigE) is sufficient for the vast majority of clusters. + +Avoid clusters that span multiple data-centers, even if the data-centers are +colocated in close proximity. Definitely avoid clusters that span large geographic +distances. + +Elasticsearch clusters assume that all nodes are equal...not that half the nodes +are actually 150ms distant in another datacenter. Larger latencies tend to +exacerbate problems in distributed systems and make debugging and resolution +more difficult. + +Similar to the NAS argument, everyone claims their pipe between data-centers is +robust and low latency. This is true...until it isn't (a network failure will +happen eventually, you can count on it). From our experience, the hassle of +managing cross-datacenter clusters is simply not worth the cost. + +==== General Considerations + +It is possible nowadays to obtain truly enormous machines. Hundreds of gigabytes +of RAM with dozens of CPU cores. Conversely, it is also possible to spin up +thousands of small virtual machines in cloud platforms such as EC2. Which +approach is best? + +In general, it is better to prefer "medium" to "large" boxes. Avoid small machines +because you don't want to manage a cluster with a thousand nodes, and the overhead +of simply running Elasticsearch is more apparent on such small boxes. + +At the same time, avoid the truly enormous machines. They often lead to imbalanced +resource usage (e.g. all the memory is being used, but none of the CPU) and can +add logistical complexity if you have to run multiple nodes per machine. + + diff --git a/510_Deployment/30_other.asciidoc b/510_Deployment/30_other.asciidoc new file mode 100644 index 000000000..1452c09bc --- /dev/null +++ b/510_Deployment/30_other.asciidoc @@ -0,0 +1,80 @@ + +=== Java Virtual Machine + +You should always run the most recent version of the Java Virtual Machine (JVM), +unless otherwise stated on the Elasticsearch website. Elasticsearch, and in +particular Lucene, is a very demanding piece of software. The unit and integration +tests from Lucene often expose bugs in the JVM itself. These bugs range from +mild annoyances to serious segfaults, so it is best to use the latest version +of the JVM where possible. + +Java 7 is strongly preferred over Java 6. Either Oracle or OpenJDK are acceptable +-- they are comparable in performance and stability. + +If your application is written in Java and you are using the TransportClient +or NodeClient, make sure the JVM running your application is identical to the +server JVM. There are a few locations in Elasticsearch where Java's native serialization +is used (IP addresses, exceptions, etc). Unfortunately, Oracle has been known +change the serialization format between minor releases, leading to strange errors. +This happens rarely, but it is best practice to keep the JVM versions identical +between client and server. + +.Please do not tweak JVM settings +**** +The JVM exposes dozens (hundreds even!) of settings, parameters and configurations. +They allow you to tweak and tune almost every aspect of the JVM. + +When a knob is encountered, it is human nature to want to turn it. We implore +you to squash this desire and _not_ use custom JVM settings. 
Elasticsearch is +a complex piece of software, and the current JVM settings have been tuned +over years of real-world usage. + +It is easy to start turning knobs, producing opaque effects that are hard to measure, +and eventually detune your cluster into a slow, unstable mess. When debugging +clusters, the first step is often to remove all custom configurations. About +half the time this alone restores stability and performance. +**** + +=== TransportClient vs NodeClient + +If you are using Java, you may wonder when to use the TransportClient vs the +NodeClient. As discussed at the beginning of the book, the TransportClient +acts as a communication layer between the cluster and your application. It knows +the API and can automatically round-robin between nodes, sniff the cluster for you, +etc. But it "external" to the cluster, similar to the REST clients. + +The NodeClient, on the other hand, is actually a node within the cluster (but +does not hold data, and cannot become master). Because it is a node, it knows +the entire cluster state -- where all the nodes reside, which shards live in which +nodes, etc. This means it can execute APIs with one less network-hop. + +There are uses-cases for both clients: + +- TransportClient is ideal if you want to decouple your application from the +cluster. For example, if your application quickly creates and destroys +connections to the cluster, a TransportClient is much "lighter" than a NodeClient, +since it is not part of a cluster. ++ +Similarly, if you need to create thousands of connections, you don't want to +have thousands of NodeClients join the cluster. The TC will be a better choice + +- On the flipside, if you only need a few long-lived, persistent connection +objects to the cluster, a NodeClient can be a bit more efficient since it knows +the cluster layout. But it ties your application into the cluster, so it may +pose problems from a firewall perspective, etc. + +=== Configuration Management + +If you use configuration management already (puppet, chef, ansible, etc) you can +just skip this tip. + +If you don't use configuration management tools yet...you should! Managing +a handful of servers by parallel-ssh may work now, but it will become a nightmare +as you grow your cluster. It is almost impossible to edit 30 configuration files +by hand without making a mistake. + +Configuration management tools help make your cluster consistent by automating +the process of config changes. It may take a little time to setup and learn, +but it will pay itself off handsomely over time. + + diff --git a/510_Deployment/40_config.asciidoc b/510_Deployment/40_config.asciidoc new file mode 100644 index 000000000..6bd240dd5 --- /dev/null +++ b/510_Deployment/40_config.asciidoc @@ -0,0 +1,254 @@ +=== Important Configuration Changes +Elasticsearch ships with _very good_ defaults, especially when it comes to performance- +related settings and options. When in doubt, just leave +the settings alone. We have witnessed countless dozens of clusters ruined +by errant settings because the administrator thought they could turn a knob +and gain 100x improvement. + +[IMPORTANT] +==== +Please read this entire section! All configurations presented are equally +important, and are not listed in any particular "importance" order. Please read +through all configuration options and apply them to your cluster. +==== + +Other databases may require tuning, but by-and-far, Elasticsearch does not. 
+If you are hitting performance problems, the solution is usually better data +layout or more nodes. There are very few "magic knobs" in Elasticsearch. +If there was...we'd have turned it already! + +With that said, there are some _logistical_ configurations that should be changed +for production. These changes are either to make your life easier, or because +there is no way to set a good default (e.g. it depends on your cluster layout). + + +==== Assign names + +Elasticseach by default starts a cluster named `elasticsearch`. It is wise +to rename your production cluster to something else, simply to prevent accidents +where someone's laptop joins the cluster. A simple change to `elasticsearch_production` +can save a lot of heartache. + +This can be changed in your `elasticsearch.yml` file: + +[source,yaml] +---- +cluster.name: elasticsearch_production +---- + +Similarly, it is wise to change the names of your nodes. You've probably +noticed by now, but Elasticsearch will assign a random Marvel Superhero name +to your nodes at startup. This is cute in development...less cute when it is +3am and you are trying to remember which physical machine was "Tagak the Leopard Lord". + +More importantly, since these names are generated on startup, each time you +restart your node it will get a new name. This can make logs very confusing, +since the names of all the nodes are constantly changing. + +Boring as it might be, we recommend you give each node a name that makes sense +to you - a plain, descriptive name. This is also configured in your `elasticsearch.yml`: + +[source,yaml] +---- +node.name: elasticsearch_005_data +---- + + +==== Paths + +By default, Elasticsearch will place the plugins, logs and -- +most importantly -- your data in the installation directory. This can lead to +unfortunate accidents, where the installation directory is accidentally overwritten +by a new installation of ES. If you aren't careful, you can erase all of your data. + +Don't laugh...we've seen it happen more than a few times. + +The best thing to do is relocate your data directory outside the installation +location. You can optionally move your plugin and log directories as well. + +This can be changed via: + +[source,yaml] +---- +path.data: /path/to/data1,/path/to/data2 <1> + +# Path to log files: +path.logs: /path/to/logs + +# Path to where plugins are installed: +path.plugins: /path/to/plugins +---- +<1> Notice that you can specify more than one directory for data using comma +separated lists. + +Data can be saved to multiple directories, and if each of these directories +are mounted on a different hard drive, this is a simple and effective way to +setup a "software RAID 0". Elasticsearch will automatically stripe +data between the different directories, boosting performance + +==== Minimum Master Nodes + +This setting, called `minimum_master_nodes` is _extremely_ important to the +stability of your cluster. This setting helps prevent "split brains", a situation +where two masters exist in a single cluster. + +When you have a split-brain, your cluster is at danger of losing data. Because +the master is considered the "supreme ruler" of the cluster, it decides +when new indices can be created, how shards are moved, etc. If you have _two_ +masters, data integrity becomes perilous, since you have two different nodes +that think they are in charge. + +This setting tells Elasticsearch to not elect a master unless there are enough +master-eligible nodes available. Only then will an election take place. 
+ +This setting should always be configured to a quorum (majority) of your master- +eligible nodes. A quorum is `(number of master-eligible nodes / 2) + 1`. +Some examples: + +- If you have ten regular nodes (can hold data, can become master), a quorum is +`6` +- If you have three dedicated master nodes and 100 data nodes, the quorum is `2`, +since you only need to count nodes that are master-eligible +- If you have two regular nodes...you are in a conundrum. A quorum would be +`2`, but this means a loss of one node will make your cluster inoperable. A +setting of `1` will allow your cluster to function, but doesn't protect against +split brain. It is best to have a minimum of 3 nodes in situations like this. + +This setting can be configured in your `elasticsearch.yml` file: + +[source,yaml] +---- +discovery.zen.minimum_master_nodes: 2 +---- + +But because Elasticsearch clusters are dynamic, you could easily add or remove +nodes which will change the quorum. It would be extremely irritating if you had +to push new configurations to each node and restart your whole cluster just to +change the setting. + +For this reason, `minimum_master_nodes` (and other settings) can be configured +via a dynamic API call. You can change the setting while your cluster is online +using: + +[source,js] +---- +PUT /_cluster/settings +{ + "persistent" : { + "discovery.zen.minimum_master_nodes" : 2 + } +} +---- + +This will become a persistent setting that takes precedence over whatever is +in the static configuration. You should modify this setting whenever you add or +remove master-eligible nodes. + +==== Recovery settings + +There are several settings which affect the behavior of shard recovery when +your cluster restarts. First, we need to understand what happens if nothing is +configured. + +Imagine you have 10 nodes, and each node holds a single shard -- either a primary +or a replica -- in a 5 primary / 1 replica index. You take your +entire cluster offline for maintenance (installing new drives, etc). When you +restart your cluster, it just so happens that five nodes come online before +the other five. + +Maybe the switch to the other five is being flaky and they didn't +receive the restart command right away. Whatever the reason, you have five nodes +online. These five nodes will gossip with eachother, elect a master and form a +cluster. They notice that data is no longer evenly distributed since five +nodes are missing from the cluster, and immediately start replicating new +shards between each other. + +Finally, your other five nodes turn on and join the cluster. These nodes see +that _their_ data is being replicated to other nodes, so they delete their local +data (since it is now redundant, and may be out-dated). Then the cluster starts +to rebalance even more, since the cluster size just went from five to 10. + +During this whole process, your nodes are thrashing the disk and network moving +data around...for no good reason. For large clusters with terrabytes of data, +this useless shuffling of data can take a _really long time_. If all the nodes +had simply waited for the cluster to come online, all the data would have been +local and nothing would need to move. + +Now that we know the problem, we can configure a few settings to alleviate it. +First, we need give Elasticsearch a hard limit: + +[source,yaml] +---- +gateway.recover_after_nodes: 8 +---- + +This will prevent Elasticsearch from starting a recovery until at least 8 nodes +are present. 
The value for this setting is up to personal preference: how
+many nodes do you want present before you consider your cluster functional?
+In this case we are setting it to `8`, which means the cluster is inoperable
+unless there are 8 nodes.
+
+Then we tell Elasticsearch how many nodes _should_ be in the cluster, and how
+long we want to wait for all those nodes:
+
+[source,yaml]
+----
+gateway.expected_nodes: 10
+gateway.recover_after_time: 5m
+----
+
+What this means is that Elasticsearch will:
+
+- Wait for 8 nodes to be present
+- Begin recovering after five minutes, OR after 10 nodes have joined the cluster,
+whichever comes first.
+
+These three settings allow you to avoid the excessive shard swapping that can
+occur on cluster restarts. It can literally make recovery take seconds instead
+of hours.
+
+
+==== Prefer Unicast over Multicast
+
+Elasticsearch is configured to use multicast discovery out of the box. Multicast
+works by sending UDP pings across your local network to discover nodes. Other
+Elasticsearch nodes will receive these pings and respond. A cluster is formed
+shortly after.
+
+Multicast is excellent for development, since you don't need to do anything. Turn
+a few nodes on and they automatically find each other and form a cluster.
+
+This ease of use is the exact reason you should disable it in production. The
+last thing you want is for nodes to accidentally join your production network, simply
+because they received an errant multicast ping. There is nothing wrong with
+multicast _per se_. Multicast simply leads to silly problems, and can be a bit
+more fragile (e.g. a network engineer fiddles with the network without telling
+you...and all of a sudden nodes can't find each other anymore).
+
+In production, it is recommended to use Unicast instead of Multicast. This works
+by providing Elasticsearch with a list of nodes that it should try to contact. Once
+the node contacts a member of the unicast list, it will receive a full cluster
+state which lists all nodes in the cluster. It will then proceed to contact
+the master and join.
+
+This means your unicast list does not need to hold all the nodes in your cluster.
+It just needs enough nodes that a new node can find someone to talk to. If you
+use dedicated masters, just list your three dedicated masters and call it a day.
+This setting is configured in your `elasticsearch.yml`:
+
+[source,yaml]
+----
+discovery.zen.ping.multicast.enabled: false <1>
+discovery.zen.ping.unicast.hosts: ["host1", "host2:port"]
+----
+<1> Make sure you disable multicast, since it can operate in parallel with unicast
+
+
+
+
+
+
+
+
+
+
diff --git a/510_Deployment/45_dont_touch.asciidoc b/510_Deployment/45_dont_touch.asciidoc
new file mode 100644
index 000000000..50639a896
--- /dev/null
+++ b/510_Deployment/45_dont_touch.asciidoc
@@ -0,0 +1,87 @@
+
+=== Don't touch these settings!
+
+There are a few hotspots in Elasticsearch that people just can't seem to avoid
+tweaking. We understand: knobs just beg to be turned.
+
+But of all the knobs to turn, these you should _really_ leave alone. They are
+often abused and will contribute to terrible stability or terrible performance.
+Or both.
+
+==== Garbage Collector
+
+As briefly introduced in <>, the JVM uses a garbage
+collector to free unused memory. This tip is really an extension of the last tip,
+but deserves its own section for emphasis:
+
+Do not change the default garbage collector!
+
+The default GC for Elasticsearch is Concurrent Mark-Sweep (CMS).
This GC
+runs concurrently with the execution of the application so that it can minimize
+pauses. It does, however, have two stop-the-world phases. It also has trouble
+collecting large heaps.
+
+Despite these downsides, it is currently the best GC for low-latency server software
+like Elasticsearch. The official recommendation is to use CMS.
+
+There is a newer GC called the Garbage First GC (G1GC). This newer GC is designed
+to minimize pausing even more than CMS, and to operate on large heaps. It works
+by dividing the heap into regions and predicting which regions contain the most
+reclaimable space. By collecting those regions first ("garbage first"), it can
+minimize pauses and operate on very large heaps.
+
+Sounds great! Unfortunately, G1GC is still new and fresh bugs are found routinely.
+These bugs are usually of the segfault variety, and will cause hard crashes.
+The Lucene test suite is brutal on GC algorithms, and it seems that G1GC hasn't
+had the kinks worked out yet.
+
+We would like to recommend G1GC someday, but for now, it is simply not stable
+enough to meet the demands of Elasticsearch and Lucene.
+
+==== Threadpools
+
+Everyone _loves_ to tweak threadpools. For whatever reason, it seems people
+cannot resist increasing thread counts. Indexing a lot? More threads! Searching
+a lot? More threads! Node idling 95% of the time? More threads!
+
+The default threadpool settings in Elasticsearch are very sensible. For all
+threadpools (except `search`) the thread count is set to the number of CPU cores.
+If you have 8 cores, you can only be running 8 threads simultaneously. It makes
+sense to only assign 8 threads to any particular threadpool.
+
+Search gets a larger threadpool, and is configured to `# cores * 3`.
+
+You might argue that some threads can block (such as on a disk I/O operation),
+which is why you need more threads. This is not actually a problem in Elasticsearch:
+much of the disk IO is handled by threads managed by Lucene, not Elasticsearch.
+
+Furthermore, threadpools cooperate by passing work between each other. You don't
+need to worry about a networking thread blocking because it is waiting on a disk
+write. The networking thread will have long since handed off that work unit to
+another threadpool and gotten back to networking.
+
+Finally, the compute capacity of your process is finite. More threads just force
+the processor to switch thread contexts. A processor can only run one thread
+at a time, so when it needs to switch to a different thread, it stores the current
+state (registers, etc) and loads another thread. If you are lucky, the switch
+will happen on the same core. If you are unlucky, the switch may migrate to a
+different core and require a trip across the inter-core communication bus.
+
+This context switching eats up cycles simply doing administrative housekeeping
+-- estimates can peg it as high as 30μs on modern CPUs. So unless the thread
+will be blocked for longer than 30μs, it is highly likely that the time would
+have been better spent just processing and finishing early.
+
+People routinely set threadpools to silly values. On 8-core machines, we have
+run across configs with 60, 100 or even 1000 threads. These settings will simply
+thrash the CPU more than they get real work done.
+
+So. Next time you want to tweak a threadpool...please don't. And if you
+_absolutely cannot resist_, please keep your core count in mind and perhaps set
+the count to double the number of cores. More than that is just a waste.
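+
+If you simply want to _see_ what your threadpools are doing, rather than change
+them, the Cat API discussed earlier already covers it. A minimal example; the
+exact columns vary by version and cluster:
+
+[source,shell]
+----
+GET /_cat/thread_pool?v <1>
+----
+<1> Shows active, queued and rejected counts for the bulk, index and search
+threadpools on each node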
+
+
+
+
+
+
diff --git a/510_Deployment/50_heap.asciidoc b/510_Deployment/50_heap.asciidoc
new file mode 100644
index 000000000..e411b5354
--- /dev/null
+++ b/510_Deployment/50_heap.asciidoc
@@ -0,0 +1,155 @@
+[[heap_sizing]]
+=== Heap: Sizing and Swapping
+
+The default installation of Elasticsearch is configured with a 1gb heap. For
+just about every deployment, this number is far too small. If you are using the
+default heap values, your cluster is probably configured incorrectly.
+
+There are two ways to change the heap size in Elasticsearch. The easiest is to
+set an environment variable called `ES_HEAP_SIZE`. When the server process
+starts, it will read this environment variable and set the heap accordingly.
+As an example, you can set it via the command line with:
+
+[source,bash]
+----
+export ES_HEAP_SIZE=10g
+----
+
+Alternatively, you can pass in the heap size via a command-line argument when starting
+the process, if that is easier for your setup:
+
+[source,bash]
+----
+./bin/elasticsearch -Xmx10g -Xms10g <1>
+----
+<1> Ensure that the min (`Xms`) and max (`Xmx`) sizes are the same to prevent
+the heap from resizing at runtime, a very costly process
+
+Generally, setting the `ES_HEAP_SIZE` environment variable is preferred over setting
+explicit `-Xmx` and `-Xms` values.
+
+==== Give half your memory to Lucene
+
+A common problem is configuring a heap that is _too_ large. You have a 64gb
+machine...and by golly, you want to give Elasticsearch all 64gb of memory. More
+is better!
+
+Heap is definitely important to Elasticsearch. It is used by many in-memory data
+structures to provide fast operation. But with that said, there is another major
+user of memory that is _off heap_: Lucene.
+
+Lucene is designed to leverage the underlying OS for caching in-memory data structures.
+Lucene segments are stored in individual files. Because segments are immutable,
+these files never change. This makes them very cache friendly, and the underlying
+OS will happily keep "hot" segments resident in memory for faster access.
+
+Lucene's performance relies on this interaction with the OS. But if you give all
+available memory to Elasticsearch's heap, there won't be any leftover for Lucene.
+This can seriously impact the performance of full-text search.
+
+The standard recommendation is to give 50% of the available memory to the Elasticsearch
+heap, while leaving the other 50% free. It won't go unused...Lucene will happily
+gobble up whatever is left over.
+
+[[compressed_oops]]
+==== Don't cross 32gb!
+There is another reason to not allocate enormous heaps to Elasticsearch. As it turns
+out, the JVM uses a trick to compress object pointers when heaps are less than
+~32gb.
+
+In Java, all objects are allocated on the heap and referenced by a pointer.
+Ordinary Object Pointers (oops) point at these objects, and are traditionally
+the size of the CPU's native _word_: either 32 bits or 64 bits depending on the
+processor. The pointer references the exact byte location of the value.
+
+For 32bit systems, this means the maximum heap size is 4gb. For 64bit systems,
+the heap size can get much larger, but the overhead of 64bit pointers means there
+is more "wasted" space simply because the pointer is larger. And worse than wasted
+space, the larger pointers eat up more bandwidth when moving values between
+main memory and various caches (LLC, L1, etc).
+
+Java uses a trick called "https://wikis.oracle.com/display/HotSpotInternals/CompressedOops[compressed oops]"
+to get around this problem.
Instead of pointing at exact byte locations in
+memory, the pointers reference _object offsets_. This means a 32bit pointer can
+reference 4 billion _objects_, rather than 4 billion bytes. Ultimately, this
+means the heap can grow to around 32gb of physical size while still using a 32bit
+pointer.
+
+Once you cross that magical ~30-32gb boundary, the pointers switch back to
+ordinary object pointers. The size of each pointer grows, more CPU-memory
+bandwidth is used and you effectively "lose" memory. In fact, it takes until around
+40-50gb of allocated heap before you have the same "effective" memory as a 32gb
+heap using compressed oops.
+
+The moral of the story is this: even when you have memory to spare, try to avoid
+crossing the 32gb heap boundary. It wastes memory, reduces CPU performance and
+makes the GC struggle with large heaps.
+
+.I have a machine with 1TB RAM!
+****
+The 32gb line is fairly important. So what do you do when your machine has a lot
+of memory? It is becoming increasingly common to see super-servers with 300-500gb
+of RAM.
+
+First, we would recommend avoiding such large machines (see <>).
+
+But if you already have the machines, you have two practical options:
+
+1. Are you doing mostly full-text search? Consider giving 32gb to Elasticsearch
+and just let Lucene use the rest of memory via the OS file system cache. All that
+memory will cache segments and lead to blisteringly fast full-text search.
+
+2. Are you doing a lot of sorting/aggregations? You'll likely want that memory
+in the heap then. Instead of one node with 32gb+ of RAM, consider running two or
+more nodes on a single machine. Still adhere to the 50% rule though. So if your
+machine has 128gb of RAM, run two nodes each with 32gb. This means 64gb will be
+used for heaps, and 64gb will be left over for Lucene.
++
+If you choose this option, set `cluster.routing.allocation.same_shard.host: true`
+in your config. This will prevent a primary and a replica shard from co-locating
+on the same physical machine (since this would remove the benefits of replica HA).
+****
+
+==== Swapping is the death of performance
+
+It should be obvious, but it bears spelling out clearly: swapping main memory
+to disk will _crush_ server performance. Think about it...an in-memory operation
+is one that needs to execute quickly.
+
+If memory swaps to disk, a 100 microsecond operation becomes one that takes 10
+milliseconds. Now repeat that increase in latency for all other 10us operations.
+It isn't difficult to see why swapping is terrible for performance.
+
+The best thing to do is disable swap completely on your system. This can be done
+temporarily by:
+
+[source,bash]
+----
+sudo swapoff -a
+----
+
+To disable it permanently, you'll likely need to edit your `/etc/fstab`. Consult
+the documentation for your OS.
+
+If disabling swap completely is not an option, you can try to lower swappiness.
+This is a value that controls how aggressively the OS tries to swap memory.
+A low value prevents swapping under normal circumstances, but still allows the OS
+to swap under emergency memory situations.
+
+For most linux systems, this is configured using the sysctl value:
+
+[source,bash]
+----
+vm.swappiness = 1 <1>
+----
+<1> A swappiness of `1` is better than `0`, since on some kernel versions a swappiness
+of `0` can invoke the OOM-killer.
+
+Finally, if neither approach is possible, you should enable `mlockall`. This
+allows the JVM to lock its memory and prevent
+it from being swapped by the OS.
In your `elasticsearch.yml`, set this:
+
+[source,yaml]
+----
+bootstrap.mlockall: true
+----
\ No newline at end of file
diff --git a/510_Deployment/60_file_descriptors.asciidoc b/510_Deployment/60_file_descriptors.asciidoc
new file mode 100644
index 000000000..f14a12829
--- /dev/null
+++ b/510_Deployment/60_file_descriptors.asciidoc
@@ -0,0 +1,61 @@
+
+=== File Descriptors and MMap
+
+Lucene uses a _very_ large number of files. At the same time, Elasticsearch
+uses a large number of sockets to communicate between nodes and HTTP clients.
+All of this requires available file descriptors.
+
+Sadly, many modern linux distributions ship with a paltry 1024 file descriptors
+allowed per process. This is _far_ too low for even a small Elasticsearch
+node, let alone one that is handling hundreds of indices.
+
+You should increase your file descriptor count to something very large, such as
+64,000. This process is irritatingly difficult and highly dependent on your
+particular OS and distribution. Consult the documentation for your OS to determine
+how best to change the allowed file descriptor count.
+
+Once you think you've changed it, check Elasticsearch to make sure it really does
+have enough file descriptors. You can check with:
+
+[source,js]
+----
+GET /_nodes/process
+
+{
+   "cluster_name": "elasticsearch__zach",
+   "nodes": {
+      "TGn9iO2_QQKb0kavcLbnDw": {
+         "name": "Zach",
+         "transport_address": "inet[/192.168.1.131:9300]",
+         "host": "zacharys-air",
+         "ip": "192.168.1.131",
+         "version": "2.0.0-SNAPSHOT",
+         "build": "612f461",
+         "http_address": "inet[/192.168.1.131:9200]",
+         "process": {
+            "refresh_interval_in_millis": 1000,
+            "id": 19808,
+            "max_file_descriptors": 64000, <1>
+            "mlockall": true
+         }
+      }
+   }
+}
+----
+<1> The `max_file_descriptors` field will inform you how many available descriptors
+the Elasticsearch process can access
+
+Elasticsearch also uses a mix of NioFS and MMapFS for the various files. Ensure
+that the maximum map count is set high enough that there is ample virtual memory
+available for mmapped files. This can be set temporarily with:
+
+[source,bash]
+----
+sysctl -w vm.max_map_count=262144
+----
+
+Or permanently by modifying the `vm.max_map_count` setting in your `/etc/sysctl.conf`.
+
+
+
+
diff --git a/510_Deployment/80_cluster_settings.asciidoc b/510_Deployment/80_cluster_settings.asciidoc
new file mode 100644
index 000000000..72fcb2b64
--- /dev/null
+++ b/510_Deployment/80_cluster_settings.asciidoc
@@ -0,0 +1,2 @@
+
+===
\ No newline at end of file
diff --git a/520_Post_Deployment.asciidoc b/520_Post_Deployment.asciidoc
new file mode 100644
index 000000000..c071407cd
--- /dev/null
+++ b/520_Post_Deployment.asciidoc
@@ -0,0 +1,18 @@
+[[post_deploy]]
+== Post-Deployment
+
+Once you have deployed your cluster in production, there are some tools and
+best practices to keep your cluster running in top shape. In this short
+section, we'll talk about configuring settings dynamically, how to tweak
+logging levels, indexing performance tips and how to back up your cluster.
+
+include::520_Post_Deployment/10_dynamic_settings.asciidoc[]
+
+include::520_Post_Deployment/20_logging.asciidoc[]
+
+include::520_Post_Deployment/30_indexing_perf.asciidoc[]
+
+include::520_Post_Deployment/40_rolling_restart.asciidoc[]
+
+include::520_Post_Deployment/50_backup.asciidoc[]
+
diff --git a/520_Post_Deployment/10_dynamic_settings.asciidoc b/520_Post_Deployment/10_dynamic_settings.asciidoc
new file mode 100644
index 000000000..74d48e87c
--- /dev/null
+++ b/520_Post_Deployment/10_dynamic_settings.asciidoc
@@ -0,0 +1,37 @@
+
+=== Changing settings dynamically
+
+Many settings in Elasticsearch are dynamic, and modifiable through the API.
+Configuration changes that force a node (or cluster) restart are strenuously avoided.
+And while it's possible to make the changes through the static configs, we
+recommend that you use the API instead.
+
+The _Cluster Update_ API operates in two modes:
+
+- Transient: these changes are in effect until the cluster restarts. Once
+a full cluster restart takes place, these settings are erased
+
+- Persistent: these changes are permanently in place unless explicitly changed.
+They will survive full cluster restarts and override the static configuration files.
+
+Transient vs Persistent settings are supplied in the JSON body:
+
+[source,js]
+----
+PUT /_cluster/settings
+{
+    "persistent" : {
+        "discovery.zen.minimum_master_nodes" : 2 <1>
+    },
+    "transient" : {
+        "indices.store.throttle.max_bytes_per_sec" : "50mb" <2>
+    }
+}
+----
+<1> This persistent setting will survive full cluster restarts
+<2> While this transient setting will be removed after the first full cluster
+restart
+
+A complete list of settings that are dynamically updateable can be found in the
+http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-update-settings.html[online reference docs].
+
diff --git a/520_Post_Deployment/20_logging.asciidoc b/520_Post_Deployment/20_logging.asciidoc
new file mode 100644
index 000000000..ba836e8a0
--- /dev/null
+++ b/520_Post_Deployment/20_logging.asciidoc
@@ -0,0 +1,80 @@
+
+=== Logging
+
+Elasticsearch emits a number of logs, which by default are placed in `ES_HOME/logs`.
+The default logging level is INFO. It provides a moderate amount of information,
+but is designed to be rather light so that your logs are not enormous.
+
+When debugging problems, particularly problems with node discovery (since this
+often depends on finicky network configurations), it can be helpful to bump
+up the logging level to DEBUG.
+
+You _could_ modify the `logging.yml` file and restart your nodes...but that is
+both tedious and leads to unnecessary downtime. Instead, you can update logging
+levels through the Cluster Settings API that we just learned about.
+
+To do so, take the logger you are interested in and prepend `logger.` to it.
+Let's turn up the discovery logging:
+
+[source,js]
+----
+PUT /_cluster/settings
+{
+    "transient" : {
+        "logger.discovery" : "DEBUG"
+    }
+}
+----
+
+While this setting is in effect, Elasticsearch will begin to emit DEBUG-level
+logs for the `discovery` module.
+
+TIP: Avoid TRACE. It is extremely verbose, to the point where the logs
+are no longer useful.
+
+==== Slowlog
+
+There is another log called the _Slowlog_. The purpose of this log is to catch
+queries and indexing requests that take over a certain threshold of time.
+It is useful for hunting down user-generated queries that are particularly slow.
+
+By default, the slowlog is not enabled.
It can be enabled by defining the action
+(query, fetch or index), the level that you want the event logged at (WARN, DEBUG,
+etc) and a time threshold.
+
+This is an index-level setting, which means it is applied to individual indices:
+
+[source,js]
+----
+PUT /my_index/_settings
+{
+    "index.search.slowlog.threshold.query.warn" : "10s", <1>
+    "index.search.slowlog.threshold.fetch.debug": "500ms", <2>
+    "index.indexing.slowlog.threshold.index.info": "5s" <3>
+}
+----
+<1> Emit a WARN log when queries are slower than 10s
+<2> Emit a DEBUG log when fetches are slower than 500ms
+<3> Emit an INFO log when indexing takes longer than 5s
+
+You can also define these thresholds in your `elasticsearch.yml` file. Indices
+that do not have a threshold set will inherit whatever is configured in the
+static config.
+
+Once the thresholds are set, you can toggle the logging level like any other
+logger:
+
+[source,js]
+----
+PUT /_cluster/settings
+{
+    "transient" : {
+        "logger.index.search.slowlog" : "DEBUG", <1>
+        "logger.index.indexing.slowlog" : "WARN" <2>
+    }
+}
+----
+<1> Set the search slowlog to DEBUG level
+<2> Set the indexing slowlog to WARN level
+
+
diff --git a/520_Post_Deployment/30_indexing_perf.asciidoc b/520_Post_Deployment/30_indexing_perf.asciidoc
new file mode 100644
index 000000000..4880b5e78
--- /dev/null
+++ b/520_Post_Deployment/30_indexing_perf.asciidoc
@@ -0,0 +1,188 @@
+
+=== Indexing Performance Tips
+
+If you are in an indexing-heavy environment, such as indexing infrastructure
+logs, you may be willing to sacrifice some search performance for faster indexing
+rates. In these scenarios, searches tend to be relatively rare and performed
+by people internal to your organization. They are willing to wait several
+seconds for a search, as opposed to a consumer-facing search which must
+return in milliseconds.
+
+Because of this unique position, there are certain tradeoffs that can be made
+which will increase your indexing performance.
+
+.These tips only apply to Elasticsearch 1.3+
+****
+This book is written for the most recent versions of Elasticsearch, although much
+of the content works on older versions.
+
+The tips presented in this section, however, are _explicitly_ for version 1.3+. There
+have been multiple performance improvements and bug fixes which directly impact
+indexing. In fact, some of these recommendations will _reduce_ performance on
+older versions due to the presence of bugs or performance defects.
+****
+
+==== Test performance scientifically
+
+Performance testing is always difficult, so try to be as scientific as possible
+in your approach. Randomly fiddling with knobs and turning on ingestion is not
+a good way to tune performance. If there are too many "causes", then it is impossible
+to determine which one had the best "effect".
+
+1. Test performance on a single node, with a single shard and no replicas
+2. Record performance under 100% default settings so that you have a baseline to
+measure against (a minimal timing sketch follows this list)
+3. Make sure performance tests run for a long time (30+ minutes) so that you can
+evaluate long-term performance, not short-term spikes or latencies. Some events
+(such as segment merging, GCs, etc) won't happen right away, so the performance
+profile can change over time.
+4. Begin making single changes to the baseline defaults. Test these rigorously,
+and if performance improvement is acceptable, keep the setting and move on to the
+next one.
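+
+As referenced in step 2 above, a baseline can be as simple as timing the same
+bulk request file against an untouched node. This is only a sketch --
+`actions.json` is a hypothetical file of newline-delimited bulk actions -- but it
+illustrates the idea:
+
+[source,shell]
+----
+% time curl -s -XPOST 'localhost:9200/_bulk' --data-binary @actions.json > /dev/null
+----
+
+Run the identical command after each configuration change so that the timings
+remain comparable.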
==== Using and Sizing Bulk requests

It should be fairly obvious, but use bulk indexing requests for optimal performance.
Bulk sizing is dependent on your data, analysis, and cluster configuration, but
a good starting point is 5-15mb per bulk. Note that this is physical size.
Document count is not a good metric for bulk size. For example, if you are
indexing 1000 documents per bulk:

- 1000 documents at 1kb each is 1mb
- 1000 documents at 100kb each is 100mb

Those are drastically different bulk sizes. Bulks need to be loaded into memory
at the coordinating node, so it is the physical size of the bulk that is more
important than the document count.

Start with a bulk size around 5-15mb and slowly increase it until you do not
see performance gains anymore. Then start increasing the concurrency of your
bulk ingestion (multiple threads, etc).

Monitor your nodes with Marvel and/or tools like `iostat`, `top`, and `ps` to see
when resources start to bottleneck. If you start to receive `EsRejectedExecutionException`,
then your cluster is at capacity with _some_ resource and you need to reduce
concurrency.

When ingesting data, make sure bulk requests are round-robined across all your
data nodes. Do not send all requests to a single node, since that single node
will need to store all the bulks in memory while processing.

==== Storage

- Use SSDs. As mentioned elsewhere, they are superior to spinning media.
- Use RAID 0. Striped RAID will increase disk IO, at the obvious expense of
potential failure if a drive dies. Don't use "mirrored" or "parity" RAID, since
replicas provide that functionality.
- Alternatively, use multiple drives and allow Elasticsearch to stripe data across
them via multiple `path.data` directories.
- Do not use remote-mounted storage, such as NFS or SMB/CIFS. The latency introduced
here is antithetical to performance.
- If you are on EC2, beware EBS. Even the SSD-backed EBS options are often slower
than local drives.

==== Segments and Merging

Segment merging is computationally expensive, and can eat up a lot of disk IO.
Merges are scheduled to operate in the background because they can take a long
time to finish, especially with large segments. This is normally fine, because
large segment merges are relatively rare.

But sometimes merging falls behind the ingestion rate. If this happens, Elasticsearch
will automatically throttle indexing requests to a single thread. This prevents
a "segment explosion" problem, where hundreds of segments are generated before
they can be merged. Elasticsearch will log INFO-level messages stating `now
throttling indexing` when it detects merging falling behind indexing.

Elasticsearch's defaults here are conservative: you don't want search performance
to be impacted by background merging. But sometimes (especially on SSDs, or in
logging scenarios) the throttle limit is too low.

The default is 20mb/s, which is a good setting for spinning disks. If you have
SSDs, you might consider increasing this to 100-200mb/s. Test to see what works
for your system:

[source,js]
----
PUT /_cluster/settings
{
    "persistent" : {
        "indices.store.throttle.max_bytes_per_sec" : "100mb"
    }
}
----

If you are doing a bulk import and don't care about search at all, you can disable
merge throttling entirely.
This will allow indexing to run as fast as your
disks will allow:

[source,js]
----
PUT /_cluster/settings
{
    "transient" : {
        "indices.store.throttle.type" : "none" <1>
    }
}
----
<1> Setting the throttle type to `none` disables merge throttling entirely. When
you are done importing, set it back to `merge` to re-enable throttling.

If you are using spinning media instead of SSD, you need to add this to your
`elasticsearch.yml`:

[source,yaml]
----
index.merge.scheduler.max_thread_count: 1
----

Spinning media has a harder time with concurrent IO, so we need to decrease
the number of threads that can concurrently access the disk per index. This setting
will allow `max_thread_count + 2` threads to operate on the disk at one time,
so a setting of `1` will allow three threads.

For SSDs, you can ignore this setting. The default is
`Math.min(3, Runtime.getRuntime().availableProcessors() / 2)`, which works well
for SSDs.

Finally, you can increase `index.translog.flush_threshold_size` from the default
of 200mb to something larger, such as 1gb. This allows larger segments to accumulate
in the translog before a flush occurs. By letting larger segments build, you
flush less often, and the larger segments merge less often. All of this adds up
to less disk IO overhead and better indexing rates.

==== Other

There are some other considerations to keep in mind:

- If you don't need near real-time accuracy on your search results, consider
increasing the `index.refresh_interval` of each index to `30s`. If you are doing
a large import, you can disable refreshes by setting this value to `-1` for the
duration of the import. Don't forget to re-enable it when you are done (a combined
example follows this list)!

- If you are doing a large bulk import, consider disabling replicas by setting
`index.number_of_replicas: 0`. When documents are replicated, the entire document
is sent to the replica node and the indexing process is repeated verbatim. This
means each replica will perform the analysis, indexing, and potentially merging
process.
+
In contrast, if you index with zero replicas and then enable replicas when ingestion
is finished, the recovery process is essentially a byte-for-byte network transfer.
This is much more efficient than duplicating the indexing process.

- If you don't have a natural ID for each document, use Elasticsearch's auto-ID
functionality. It is optimized to avoid version lookups, since the autogenerated
ID is unique.

- If you are using your own ID, try to pick an ID that is http://blog.mikemccandless.com/2014/05/choosing-fast-unique-identifier-uuid.html[friendly to Lucene]. Examples include zero-padded
sequential IDs, UUID-1, and nanotime; these IDs have consistent, "sequential"
patterns which compress well. In contrast, IDs such as UUID-4 are essentially
random, which offer poor compression and slow down Lucene.
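To make the first two points concrete, one possible sequence for a large bulk
import is to relax these settings before the import starts and restore them when
it finishes. The index name and the restored values below are placeholders;
adjust them for your own indices:

[source,js]
----
PUT /my_index/_settings <1>
{
    "index.refresh_interval" : "-1",
    "index.number_of_replicas" : 0
}

PUT /my_index/_settings <2>
{
    "index.refresh_interval" : "30s",
    "index.number_of_replicas" : 1
}
----
<1> Before the import: disable refresh and replicas.
<2> After the import: restore a refresh interval and re-enable replicas.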
diff --git a/520_Post_Deployment/40_rolling_restart.asciidoc b/520_Post_Deployment/40_rolling_restart.asciidoc
new file mode 100644
index 000000000..9427c51ab
--- /dev/null
+++ b/520_Post_Deployment/40_rolling_restart.asciidoc
@@ -0,0 +1,66 @@

=== Rolling Restarts

There will come a time when you need to perform a rolling restart of your
cluster -- keeping the cluster online and operational, but taking nodes offline
one at a time.

The common reason is either an Elasticsearch version upgrade, or some kind of
maintenance on the server itself (OS update, hardware, etc). Whatever the case,
there is a particular method to perform a rolling restart.

By nature, Elasticsearch wants your data to be fully replicated and evenly balanced.
If you shut down a single node for maintenance, the cluster will immediately
recognize the loss of a node and begin rebalancing. This can be irritating
if you know the node maintenance is short-term, since the rebalancing of
very large shards can take some time (think of trying to replicate 1TB...even
on fast networks this is non-trivial).

What we want to do is tell Elasticsearch to hold off on rebalancing, because
we have more knowledge about the state of the cluster due to external factors.
The procedure is as follows:

1. If possible, stop indexing new data. This is not always possible, but it will
help speed up recovery time.

2. Disable shard allocation. This prevents Elasticsearch from rebalancing
missing shards until you tell it otherwise. If you know the maintenance will be
short, this is a good idea. You can disable allocation with:
+
[source,js]
----
PUT /_cluster/settings
{
    "transient" : {
        "cluster.routing.allocation.enable" : "none"
    }
}
----

3. Shut down a single node, preferably using the _Shutdown API_ on that particular
machine:
+
[source,js]
----
POST /_cluster/nodes/_local/_shutdown
----

4. Perform the maintenance or upgrade.
5. Restart the node, and confirm that it rejoins the cluster (see the health check
after this list).
6. Repeat steps 3-5 for the rest of your nodes.
7. Re-enable shard allocation using:
+
[source,js]
----
PUT /_cluster/settings
{
    "transient" : {
        "cluster.routing.allocation.enable" : "all"
    }
}
----
+
Shard rebalancing may take some time. At this point you are safe to resume
indexing (if you had previously stopped), but waiting until the cluster is fully
balanced before resuming indexing will help speed up the process.
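For step 5, a simple way to confirm that the restarted node has rejoined is to
check the cluster health and node count:

[source,js]
----
GET /_cluster/health
----

Once `number_of_nodes` is back to the expected value, you can move on to the next
node. Note that while allocation is disabled and a node is down, the cluster will
typically report a `yellow` status, since the shards that lived on that node are
temporarily unassigned.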
diff --git a/520_Post_Deployment/50_backup.asciidoc b/520_Post_Deployment/50_backup.asciidoc
new file mode 100644
index 000000000..56dd26568
--- /dev/null
+++ b/520_Post_Deployment/50_backup.asciidoc
@@ -0,0 +1,472 @@

=== Backing up your Cluster

Like any software that stores data, it is important to routinely back up your
data. Elasticsearch replicas provide high availability at runtime; they allow
you to tolerate sporadic node loss without an interruption of service.

Replicas do not provide protection from catastrophic failure, however. For that,
you need a real backup of your cluster...a complete copy in case something goes
wrong.

To back up your cluster, you can use the _Snapshot_ API. This will take the current
state and data in your cluster and save it to a shared repository. This
backup process is "smart". Your first snapshot will be a complete copy of your data,
but all subsequent snapshots will save the _delta_ between the existing
snapshots and the new data. Data is incrementally added and removed as you snapshot
data over time. This means subsequent backups will be substantially
faster, since they are transmitting far less data.

To use this functionality, you must first create a repository to save data.
There are several repository types that you may choose from:

- Shared filesystem, such as a NAS
- Amazon S3
- HDFS
- Azure Cloud

==== Creating the repository

Let's set up a shared filesystem repository:

[source,js]
----
PUT _snapshot/my_backup <1>
{
    "type": "fs", <2>
    "settings": {
        "location": "/mount/backups/my_backup" <3>
    }
}
----
<1> We provide a name for our repository, in this case it is called `my_backup`.
<2> We specify that the type of the repository should be a shared file system.
<3> And finally, we provide a mounted drive as the destination.

IMPORTANT: The shared file system path must be accessible from all nodes in your
cluster!

This will create the repository and required metadata at the mount point. There
are also some other options which you may want to configure, depending on the
performance profile of your nodes, network, and repository location:

- `max_snapshot_bytes_per_sec`: When snapshotting data into the repo, this controls
the throttling of that process. The default is `20mb` per second.
- `max_restore_bytes_per_sec`: When restoring data from the repo, this controls
how much the restore is throttled so that your network is not saturated. The
default is `20mb` per second.

Let's assume we have a very fast network and are OK with extra traffic, so we
can increase the defaults:

[source,js]
----
POST _snapshot/my_backup/ <1>
{
    "type": "fs",
    "settings": {
        "location": "/mount/backups/my_backup",
        "max_snapshot_bytes_per_sec" : "50mb", <2>
        "max_restore_bytes_per_sec" : "50mb"
    }
}
----
<1> Note that we are using a POST instead of PUT. This will update the settings
of the existing repository.
<2> Then add our new settings.

==== Snapshotting all open indices

A repository can contain multiple snapshots. Each snapshot is associated with a
certain set of indices (all indices, some subset, a single index, etc). When
creating a snapshot, you specify which indices you are interested in and
give the snapshot a unique name.

Let's start with the most basic snapshot command:

[source,js]
----
PUT _snapshot/my_backup/snapshot_1
----

This will back up all open indices into a snapshot named `snapshot_1`, under the
`my_backup` repository. This call will return immediately and the snapshot will
proceed in the background.

.Blocking for completion
****
Usually you'll want your snapshots to proceed as a background process, but occasionally
you may want to wait for completion in your script. This can be accomplished by
adding a `wait_for_completion` flag:

[source,js]
----
PUT _snapshot/my_backup/snapshot_1?wait_for_completion=true
----

This will block the call until the snapshot has completed. Note: large snapshots
may take a long time to return!
****

==== Snapshotting particular indices

The default behavior is to back up all open indices. But say you are using Marvel,
and don't really want to back up all the diagnostic `.marvel` indices. You
just don't have enough space to back up everything.

In that case, you can specify which indices to back up when snapshotting your cluster:

[source,js]
----
PUT _snapshot/my_backup/snapshot_2
{
    "indices": "index_1,index_2"
}
----

This snapshot command will now back up only `index_1` and `index_2`.
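The `indices` field uses Elasticsearch's multi-index syntax, so wildcard patterns
should also work (and, in newer releases, exclusions with a leading `-`); check the
reference docs for the exact syntax your version supports. The snapshot name and
index pattern below are purely illustrative:

[source,js]
----
PUT _snapshot/my_backup/snapshot_logs
{
    "indices": "logstash-2014.09.*"
}
----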
==== Listing information about snapshots

Once you start accumulating snapshots in your repository, you may forget the details
relating to each...particularly when the snapshots are named based on time
demarcations (`backup_2014_10_28`, etc).

To obtain information about a single snapshot, simply issue a GET request against
the repo and snapshot name:

[source,js]
----
GET _snapshot/my_backup/snapshot_2
----

This will return a small response with various pieces of information regarding
the snapshot:

[source,js]
----
{
   "snapshots": [
      {
         "snapshot": "snapshot_2",
         "indices": [
            "index_1",
            "index_2"
         ],
         "state": "SUCCESS",
         "start_time": "2014-09-02T13:01:43.115Z",
         "start_time_in_millis": 1409662903115,
         "end_time": "2014-09-02T13:01:43.439Z",
         "end_time_in_millis": 1409662903439,
         "duration_in_millis": 324,
         "failures": [],
         "shards": {
            "total": 10,
            "failed": 0,
            "successful": 10
         }
      }
   ]
}
----

For a complete listing of all snapshots in a repository, use the `_all` placeholder
instead of a snapshot name:

[source,js]
----
GET _snapshot/my_backup/_all
----

==== Deleting Snapshots

Finally, we need a command to delete old snapshots that are no longer useful.
This is simply a DELETE HTTP call to the repo/snapshot name:

[source,js]
----
DELETE _snapshot/my_backup/snapshot_2
----

It is important to use the API to delete snapshots, and not some other mechanism
(deleting by hand, automated cleanup tools on S3, etc). Because snapshots are
incremental, it is possible that many snapshots are relying on "old" data.
The Delete API understands what data is still in use by more recent snapshots,
and will only delete unused segments.

If you do a manual file delete, however, you are at risk of seriously corrupting
your backups, because you are deleting data that is still in use.

=== Restoring

Once you've backed up some data, restoring it is easy: simply add `_restore`
to the ID of the snapshot you wish to restore into your cluster:

[source,js]
----
POST _snapshot/my_backup/snapshot_1/_restore
----

The default behavior is to restore all indices that exist in that snapshot.
If `snapshot_1` contains five different indices, all five will be restored into
our cluster. Like the Snapshot API, it is possible to select which indices
we want to restore.

There are also additional options for renaming indices. These allow you to
match index names with a pattern, and then provide a new name during the restore
process. This is useful if you want to restore old data to verify its contents,
or perform some other processing, without replacing existing data. Let's restore
a single index from the snapshot and provide a replacement name:

[source,js]
----
POST /_snapshot/my_backup/snapshot_1/_restore
{
    "indices": "index_1", <1>
    "rename_pattern": "index_(.+)", <2>
    "rename_replacement": "restored_index_$1" <3>
}
----
<1> Only restore the `index_1` index, ignoring the rest that are present in the
snapshot.
<2> Find any indices being restored which match the provided pattern...
<3> ...then rename them with the replacement pattern.

This will restore `index_1` into your cluster, but rename it to `restored_index_1`.
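Once the restore has finished, a quick sanity check is to compare the document
count of the restored index against the original (assuming the original `index_1`
is still present in the cluster):

[source,js]
----
GET /index_1/_count

GET /restored_index_1/_count
----

If the two counts match, the restored copy is at least structurally complete; any
deeper verification is up to your application.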
.Blocking for completion
****
Similar to snapshotting, the restore command will return immediately and the
restoration process will happen in the background. If you would prefer your HTTP
call to block until the restore is finished, simply add the `wait_for_completion`
flag:

[source,js]
----
POST _snapshot/my_backup/snapshot_1/_restore?wait_for_completion=true
----
****

==== Monitoring Snapshot progress

The `wait_for_completion` flag provides a rudimentary form of monitoring, but
really isn't sufficient when snapshotting or restoring even moderately sized clusters.

There are two additional APIs that will give you more detailed status about the
state of the snapshotting. First, you can execute a GET to the snapshot ID,
just like we did earlier to get information about a particular snapshot:

[source,js]
----
GET _snapshot/my_backup/snapshot_3
----

If the snapshot is still in progress when you call this, you'll see information
about when it was started, how long it has been running, etc. Note, however,
that this API uses the same thread pool as the snapshot mechanism. If you are
snapshotting very large shards, the time between status updates can be quite large,
since the API is competing for the same thread pool resources.

A better option is to poll the `_status` API:

[source,js]
----
GET _snapshot/my_backup/snapshot_3/_status
----

The Status API returns immediately and gives a much more verbose output of
statistics:

[source,js]
----
{
   "snapshots": [
      {
         "snapshot": "snapshot_3",
         "repository": "my_backup",
         "state": "IN_PROGRESS", <1>
         "shards_stats": {
            "initializing": 0,
            "started": 1, <2>
            "finalizing": 0,
            "done": 4,
            "failed": 0,
            "total": 5
         },
         "stats": {
            "number_of_files": 5,
            "processed_files": 5,
            "total_size_in_bytes": 1792,
            "processed_size_in_bytes": 1792,
            "start_time_in_millis": 1409663054859,
            "time_in_millis": 64
         },
         "indices": {
            "index_3": {
               "shards_stats": {
                  "initializing": 0,
                  "started": 0,
                  "finalizing": 0,
                  "done": 5,
                  "failed": 0,
                  "total": 5
               },
               "stats": {
                  "number_of_files": 5,
                  "processed_files": 5,
                  "total_size_in_bytes": 1792,
                  "processed_size_in_bytes": 1792,
                  "start_time_in_millis": 1409663054859,
                  "time_in_millis": 64
               },
               "shards": {
                  "0": {
                     "stage": "DONE",
                     "stats": {
                        "number_of_files": 1,
                        "processed_files": 1,
                        "total_size_in_bytes": 514,
                        "processed_size_in_bytes": 514,
                        "start_time_in_millis": 1409663054862,
                        "time_in_millis": 22
                     }
                  },
                  ...
----
<1> A snapshot that is currently running will show `IN_PROGRESS` as its status.
<2> This particular snapshot has one shard still transferring (the other four have
already completed).

The status output displays the overall state of the snapshot, but also drills down
into per-index and per-shard statistics. This gives you an incredibly detailed view
of how the snapshot is progressing. Shards can be in various states of completion:

- `INITIALIZING`: The shard is checking with the cluster state to see if it can
be snapshotted. This is usually very fast.
- `STARTED`: Data is being transferred to the repository.
- `FINALIZING`: Data transfer is complete; the shard is now sending snapshot metadata.
- `DONE`: Snapshot complete!
- `FAILED`: An error was encountered during the snapshot process, and this shard/index/snapshot
could not be completed. Check your logs for more information.

==== Monitoring Restore operations

The restoration of data from a repository piggybacks on the existing recovery
mechanisms already in place in Elasticsearch. Internally, recovering shards
from a repository is identical to recovering from another node.
If you wish to monitor the progress of a restore, you can use the Recovery
API. This is a general-purpose API which shows the status of shards moving around
your cluster.

The API can be invoked for the specific indices that you are recovering:

[source,js]
----
GET /_recovery/restored_index_3
----

Or for all indices in your cluster, which may include other shards moving around,
unrelated to your restore process:

[source,js]
----
GET /_recovery/
----

The output will look similar to this (and note, it can become very verbose
depending on the activity of your cluster!):

[source,js]
----
{
  "restored_index_3" : {
    "shards" : [ {
      "id" : 0,
      "type" : "snapshot", <1>
      "stage" : "index",
      "primary" : true,
      "start_time" : "2014-02-24T12:15:59.716",
      "stop_time" : 0,
      "total_time_in_millis" : 175576,
      "source" : { <2>
        "repository" : "my_backup",
        "snapshot" : "snapshot_3",
        "index" : "restored_index_3"
      },
      "target" : {
        "id" : "ryqJ5lO5S4-lSFbGntkEkg",
        "hostname" : "my.fqdn",
        "ip" : "10.0.1.7",
        "name" : "my_es_node"
      },
      "index" : {
        "files" : {
          "total" : 73,
          "reused" : 0,
          "recovered" : 69,
          "percent" : "94.5%" <3>
        },
        "bytes" : {
          "total" : 79063092,
          "reused" : 0,
          "recovered" : 68891939,
          "percent" : "87.1%"
        },
        "total_time_in_millis" : 0
      },
      "translog" : {
        "recovered" : 0,
        "total_time_in_millis" : 0
      },
      "start" : {
        "check_index_time" : 0,
        "total_time_in_millis" : 0
      }
    } ]
  }
}
----
<1> The `type` field tells you the nature of the recovery -- this shard is being
recovered from a snapshot.
<2> The `source` hash describes the particular snapshot and repository that is
being recovered from.
<3> And the `percent` field gives you an idea about the status of the recovery.
This particular shard has recovered 94.5% of the files so far...it is almost complete.

The output will list all indices currently undergoing a recovery, and then a
list of all shards in each of those indices. Each of these shards will have stats
about start/stop time, duration, recovery percentage, bytes transferred, etc.

==== Canceling Snapshots and Restores

Finally, you may want to cancel a snapshot or restore. Since these are long-running
processes, a typo or mistake when executing the operation could take a long time to
resolve...and use up valuable resources at the same time.

To cancel a snapshot, simply delete the snapshot while it is in progress:

[source,js]
----
DELETE _snapshot/my_backup/snapshot_3
----

This will halt the snapshot process, and then proceed to delete the half-completed
snapshot from the repository.

To cancel a restore, you need to delete the indices being restored. Because
a restore process is really just shard recovery, issuing a Delete Index API
call alters the cluster state, which will in turn halt recovery. For example:

[source,js]
----
DELETE /restored_index_3
----

If `restored_index_3` was actively being restored, this delete command would
halt the restoration as well as delete any data that had already been restored
into the cluster.
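If you have lost track of which snapshot is actually running before you cancel it,
the `_status` endpoint can also be queried without a snapshot name; it should report
only the snapshots currently in progress for the repository (verify this behavior
against the reference docs for your version):

[source,js]
----
GET _snapshot/my_backup/_status
----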
diff --git a/book.asciidoc b/book.asciidoc
index a04145ae8..911427cd8 100644
--- a/book.asciidoc
+++ b/book.asciidoc
@@ -105,14 +105,14 @@
 include::410_Scaling.asciidoc[]
 
 // Part 7
 
-[[administration]]
-
-= Administration and internals (TODO)
+include::07_Admin.asciidoc[]
 
 include::500_Cluster_Admin.asciidoc[]
 
 include::510_Deployment.asciidoc[]
 
+include::520_Post_Deployment.asciidoc[]
+
 [[TODO]]
 [appendix]
 = TODO
diff --git a/images/300_30_bar1.svg b/images/300_30_bar1.svg (new SVG image; markup omitted)
diff --git a/images/300_30_histo1.svg b/images/300_30_histo1.svg (new SVG image; markup omitted)
diff --git a/images/300_35_ts1.svg b/images/300_35_ts1.svg (new SVG image; markup omitted)
diff --git a/images/300_35_ts2.svg b/images/300_35_ts2.svg (new SVG image; markup omitted)
diff --git a/images/300_65_percentile1.png b/images/300_65_percentile1.png (binary image deleted)
diff --git a/images/300_65_percentile1.svg b/images/300_65_percentile1.svg (new SVG image; markup omitted)
diff --git a/images/300_65_percentile2.png b/images/300_65_percentile2.png (binary image deleted)
diff --git a/images/300_65_percentile2.svg b/images/300_65_percentile2.svg (new SVG image; markup omitted)