chapter22_part22:/300_Aggregations/125_Conclusion.asciidoc (elasticsearch-cn#296)

* chapter22_part22:/300_Aggregations/125_Conclusion.asciidoc

Initial translation

* chapter22_part22:/300_Aggregations/125_Conclusion.asciidoc

Revisions

* chapter22_part22:/300_Aggregations/125_Conclusion.asciidoc

Revisions

* chapter22_part22:/300_Aggregations/125_Conclusion.asciidoc

Revised per review suggestions. @medcl

* chapter22_part22:/300_Aggregations/125_Conclusion.asciidoc

First line: [[_closing_thoughts]]

* chapter22_part22:/300_Aggregations/125_Conclusion.asciidoc

Revised per review comments

* chapter22_part22:/300_Aggregations/125_Conclusion.asciidoc

Remove redundant prefix

* chapter22_part22:/300_Aggregations/125_Conclusion.asciidoc

Revised per review comments, thanks @qindongliang

* chapter22_part22:/300_Aggregations/125_Conclusion.asciidoc

Revised per review comments, thanks @smilesfc
pengqiuyuan authored and medcl committed Nov 15, 2016
1 parent dfa78a1 commit e2325a1
Showing 1 changed file with 22 additions and 39 deletions.
61 changes: 22 additions & 39 deletions 300_Aggregations/125_Conclusion.asciidoc
@@ -1,40 +1,23 @@
[[_closing_thoughts]]
== Summary

== Closing Thoughts

This section covered a lot of ground, and a lot of deeply technical issues.
Aggregations bring a power and flexibility to Elasticsearch that is hard to
overstate. The ability to nest buckets and metrics, to quickly approximate
cardinality and percentiles, to find statistical anomalies in your data, all
while operating on near-real-time data and in parallel to full-text search--these are game-changers to many organizations.

It is a feature that, once you start using it, you'll find dozens
of other candidate uses. Real-time reporting and analytics is central to many
organizations (be it over business intelligence or server logs).

Elasticsearch has made great strides in becoming more memory friendly by defaulting
to doc values for _most_ fields, but the necessity of fielddata for string fields
means you must remain vigilant.

The management of this memory can take several forms, depending on your
particular use-case:

- During the planning stage, attempt to organize your data so that aggregations are
run on `not_analyzed` strings instead of analyzed so that doc values may be leveraged.
- While testing, verify that analysis chains are not creating high cardinality
fields which are later aggregated on
- At search time, by utilizing approximate aggregations and data filtering
- At a node level, by setting hard memory and dynamic circuit-breaker limits
- At an operations level, by monitoring memory usage and controlling slow garbage-collection cycles,
potentially by adding more nodes to the cluster

Most deployments will use one or more of the preceding methods. The exact combination
is highly dependent on your particular environment.

Whatever the path you take, it is important to assess the available options and
create both a short- and long-term plan. Decide how your memory situation exists
today and what (if anything) needs to be done. Then decide what will happen in
six months or one year as your data grows. What methods will you use to continue
scaling?

It is better to plan out these life cycles of your cluster ahead of time, rather
than panicking at 3 a.m. because your cluster is at 90% heap utilization.
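This section covered a lot of fundamentals as well as many deeply technical issues. Aggregations bring Elasticsearch a power and flexibility that is hard to overstate. The ability to nest buckets and metrics, to quickly approximate cardinality and percentiles, and to find statistical anomalies in your data,
all while operating on near-real-time data and in parallel with full-text search, is a game changer for many organizations.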

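Aggregations are the kind of feature that, once you start using it, you find many other uses for. Real-time reporting and analytics are central to many organizations (whether applied to business intelligence or server logs).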

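Elasticsearch enables doc values by default for _most_ fields, which greatly reduces memory usage in many search scenarios. Note, however, that among string fields only `not_analyzed` ones can use doc values; analyzed strings still depend on fielddata, so you must remain vigilant.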

Memory can be managed in several ways, depending on your particular use case:

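- During planning, organize your data so that aggregations run on `not_analyzed` strings rather than `analyzed` ones, so that doc values can be used effectively.
- During testing, verify that analysis chains are not creating high-cardinality fields that are later aggregated on.
- At search time, make sensible use of approximate aggregations and data filtering.
- At the node level, set hard memory limits and dynamic circuit-breaker limits.
- At the operations level, monitor cluster memory usage and the frequency of full GCs to decide whether more nodes need to be added to the cluster.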

Most deployments will use one or more of the methods above. The exact combination depends heavily on your particular environment.
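As a rough illustration of three of the options listed above (planning, search time, node level), the following sketch only builds the JSON request bodies; it does not contact a cluster. It assumes the Elasticsearch 2.x string mapping syntax used elsewhere in this chapter. The field name `tag`, the helper names, and the `40%` limit are hypothetical and chosen only for the example.

```python
import json


def not_analyzed_mapping(field):
    """Mapping body for a string field that is indexed verbatim
    (not_analyzed), so aggregations on it can use doc values."""
    return {"properties": {field: {"type": "string", "index": "not_analyzed"}}}


def approximate_distinct_count(field, precision_threshold=100):
    """Search body that uses the approximate `cardinality` aggregation
    instead of collecting every distinct term."""
    return {
        "size": 0,  # aggregation results only, no hits
        "aggs": {
            "distinct_values": {
                "cardinality": {
                    "field": field,
                    "precision_threshold": precision_threshold,
                }
            }
        },
    }


def fielddata_breaker_settings(limit):
    """Body for PUT /_cluster/settings that adjusts the fielddata
    circuit breaker dynamically. The limit value is an assumption."""
    return {"persistent": {"indices.breaker.fielddata.limit": limit}}


if __name__ == "__main__":
    # "tag" is a hypothetical field name.
    print(json.dumps(not_analyzed_mapping("tag"), indent=2))
    print(json.dumps(approximate_distinct_count("tag"), indent=2))
    print(json.dumps(fielddata_breaker_settings("40%"), indent=2))
```

Which of these bodies you actually send, and with what values, depends on your data and heap size; the numbers here are placeholders rather than recommendations.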

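Whichever approach you take, it is important to assess the available options and create both a short-term and a long-term plan. First determine your current memory usage and what (if anything) needs to be done; then, based on how fast your data is growing, decide how the cluster should look in six months or a year and which methods you will use to scale.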

It is better to plan these things before building the cluster than to scramble at the last minute when the cluster's heap usage reaches 90%.
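One way to avoid discovering the 90% mark by accident is to check heap usage regularly. The sketch below assumes a response shaped like the node stats API output (`GET /_nodes/stats/jvm`), where each node reports `jvm.mem.heap_used_percent`. The function name, the 75% warning threshold, and the sample data are hypothetical.

```python
def nodes_over_heap_threshold(node_stats, threshold_percent=75):
    """Return (node name, heap_used_percent) pairs for nodes at or above
    the threshold, highest usage first."""
    flagged = []
    for node_id, node in node_stats.get("nodes", {}).items():
        used = node["jvm"]["mem"]["heap_used_percent"]
        if used >= threshold_percent:
            # Fall back to the node id when no name is reported.
            flagged.append((node.get("name", node_id), used))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical sample in the shape of a node stats response.
SAMPLE_STATS = {
    "nodes": {
        "a1": {"name": "node-1", "jvm": {"mem": {"heap_used_percent": 91}}},
        "b2": {"name": "node-2", "jvm": {"mem": {"heap_used_percent": 42}}},
        "c3": {"name": "node-3", "jvm": {"mem": {"heap_used_percent": 78}}},
    }
}

if __name__ == "__main__":
    for name, used in nodes_over_heap_threshold(SAMPLE_STATS):
        print(f"{name}: heap at {used}%")
```

A warning threshold well below 90% leaves time to act on the long-term plan, for example by adding nodes, before the cluster is under pressure.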
