Elasticsearch 8.16.x Large Increase in MMAP Counts #119652
Labels
>bug
:Performance
All issues related to Elasticsearch performance including regressions and investigations
Team:Performance
Meta label for performance team
Elasticsearch Version
8.16.1
Installed Plugins
No response
Java Version
bundled && Java 17
OS Version
Linux elasticsearch-data-hot-1 6.1.112+ #1 SMP PREEMPT_DYNAMIC Sat Oct 19 17:09:54 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Problem Description
From 8.16.x onwards, Elasticsearch requires significantly more memory-mapped regions than prior versions.
https://discuss.elastic.co/t/heap-allocation-failures-on-8-17/372211/8
https://discuss.elastic.co/t/oom-since-8-16-1-with-openjdk23
In our experience (first link), we started observing semi-frequent heap allocation failures across all our hot nodes after upgrading from 8.15.x to 8.17.x. All our hot nodes would restart due to these errors within a couple of hours of each other, and then the same would happen again 12 to 24 hours later.
After some digging, we discovered that the max mmap count we had configured, based on the recommendations, was being reached, resulting in these heap allocation failures.
We doubled the value to see where Elasticsearch would eventually top out, which in our case was in the low 400k range, and we have not observed any failures since. The number of memory regions is not something we were previously collecting. However, at the most conservative estimate, if it was right below the limit before upgrading, the numbers we saw after upgrading would represent a roughly 60% increase in the number of mmap regions in use, which does not feel like intended behaviour (or should at least be documented if so).
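For anyone who wants to collect the same number on their own nodes, here is a minimal sketch of how the region count can be sampled. It assumes a Linux host, that the limit in question is the kernel setting exposed at `/proc/sys/vm/max_map_count`, and that each line of `/proc/<pid>/maps` corresponds to one mapped region. The function names are hypothetical and are not part of Elasticsearch.

```python
from pathlib import Path


def count_map_regions(maps_text: str) -> int:
    """Count mapped regions in the text of a /proc/<pid>/maps file.

    Each non-empty line in that file describes one mapped region.
    """
    return sum(1 for line in maps_text.splitlines() if line.strip())


def usage_ratio(regions: int, limit: int) -> float:
    """Return the fraction of the configured limit currently in use."""
    return regions / limit


def read_process_regions(pid: int) -> int:
    """Read and count the regions of a running process (Linux only)."""
    return count_map_regions(Path(f"/proc/{pid}/maps").read_text())


def read_max_map_count() -> int:
    """Read the kernel-wide per-process limit on mapped regions (Linux only)."""
    return int(Path("/proc/sys/vm/max_map_count").read_text().strip())


def report(pid: int) -> str:
    """Build a one-line summary suitable for logging or a metrics scraper."""
    regions = read_process_regions(pid)
    limit = read_max_map_count()
    return f"pid={pid} regions={regions} limit={limit} used={usage_ratio(regions, limit):.1%}"
```

Calling `report(pid)` with the PID of the Elasticsearch JVM, as the same user or as root, returns a summary line. Sampling it periodically should show whether the count is approaching the limit before allocation failures start.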
The second link provided above is from another user with the same issue after upgrading to 8.16.x (which indicates the change lies somewhere in the 8.16 series).
In our case, we went from 8.15.1 to 8.17.0 without any JVM changes (using our own provided Java 21). In the other example, the upgrade was from 8.15.1 to 8.16.1 and included a change to the JVM version (presumably the bundled JVM).
Steps to Reproduce
I've not been able to find a specific behaviour that may cause the increase between versions, and the issue is difficult to reproduce in small clusters due to their relatively low indexing and search activity.
Logs (if relevant)
No response