zstd is much slower than deflate #119662

Open
hchargois opened this issue Jan 7, 2025 · 4 comments

@hchargois
Contributor

Elasticsearch Version

8.17.0

Installed Plugins

No response

Java Version

bundled

OS Version

Archlinux, kernel 6.12.1-arch1-1

Problem Description

The zstd codec introduced in ES 8.16 performs significantly worse than the old deflate codec, in terms of read (query) speed. Queries that need to read stored fields perform around 30 to 45 % worse with zstd than with deflate.

Moreover, I've found that there are no counterbalancing benefits, as the indexing speed doesn't improve, and the index size is only around 1% smaller at best, which is not significant.

So basically we're trading a <1 % improvement in index size for a ~ 40 % deterioration in query speed. For my use-case, that's not worth it.

I'm opening this as a "bug" because to me it's a huge regression, especially since there's no option to keep using deflate.

I don't know if the zstd codec can be "fixed" by optimizing it or changing its parameters, but regardless we should be able to continue using the deflate codec. Ideally we should have more precise control over the actual codec used, and we should be allowed to configure its parameters such as compression level, etc. The "default"/"best_compression" options should only be aliases to some predefined codecs and parameters that may change from release to release, but if we know we want a stable configuration, we should be able to choose "deflate" (or "zstd") and be sure that the codec doesn't change.

Steps to Reproduce

Download a sample dataset, for example the first 1M reviews of the Yelp dataset: https://www.kaggle.com/datasets/yelp-dataset/yelp-dataset

Index them in 2 indices, yelp_deflate on ES 8.15.3 and yelp_zstd on ES 8.17.0, with the same mapping:

{
  "mappings": {
  },
  "settings": {
    "index": {
      "number_of_replicas": "1",
      "number_of_shards": "1",
      "codec": "best_compression"
    }
  }
}
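For reference, the two indices can be created by PUTting the JSON above (saved as e.g. index.json) to each cluster. The index names are the ones used below; the ports are an assumption for running both ES versions side by side (only :9202 appears later in this issue):

```shell
# Create the deflate index on the 8.15.3 cluster and the zstd index on the
# 8.17.0 cluster; index.json is the mapping/settings body shown above.
curl -s -XPUT 'http://localhost:9201/yelp_deflate' \
  -H 'Content-Type: application/json' -d @index.json
curl -s -XPUT 'http://localhost:9202/yelp_zstd' \
  -H 'Content-Type: application/json' -d @index.json
```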

Make a simple search query that returns a good amount of documents:

{
	"size": 10000,
	"query": {
		"term": {
			"cool": 3
		}
	}
}

Run the query on each index multiple times (so that the indices are in the system's page cache) but with the request_cache turned off, and record the "took":

< query.json curlie -s 'http://localhost:9202/yelp_zstd/_search?request_cache=false' | jq .took
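The "run it multiple times and record the took" step can be scripted with a small awk helper; the curlie/jq loop in the comment is the same as the one-liner above, and the took values fed in at the end are made up for illustration:

```shell
# summarize: reduce a stream of per-request 'took' values (ms) to "avg X / min Y".
summarize() {
  awk '{ s += $1; if (n == 0 || $1 < min) min = $1; n++ }
       END { if (n) printf "avg %.1f / min %d\n", s / n, min }'
}

# Against a live cluster, pipe real measurements into it, e.g.:
#   for i in $(seq 10); do
#     < query.json curlie -s 'http://localhost:9202/yelp_zstd/_search?request_cache=false' | jq .took
#   done | summarize
#
# Illustrative (made-up) took values:
printf '2712\n2689\n2705\n' | summarize   # -> avg 2702.0 / min 2689
```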

The results are as follows:

  • deflate: ~ 1850 ms
  • zstd: ~ 2700 ms (~ 45 % slower)

As for the storage, after force-merging the indices in a single segment, we get:

  • deflate: 747.7 MB total index size, 316.5 MB stored_fields only
  • zstd: 743.2 MB total index size (- 0.6 %), 312.1 MB stored_fields only (- 1.4 %)
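As a sanity check, those percentages follow from the raw sizes; a tiny awk helper reproduces them (the force merge itself is a POST to /<index>/_forcemerge?max_num_segments=1):

```shell
# delta A B: percentage change going from size A to size B.
delta() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%+.1f %%\n", (b - a) / a * 100 }'; }

delta 747.7 743.2   # total index size, deflate -> zstd: -0.6 %
delta 316.5 312.1   # stored_fields only, deflate -> zstd: -1.4 %
```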

Logs (if relevant)

No response

@hchargois hchargois added >bug needs:triage Requires assignment of a team area label labels Jan 7, 2025
@elasticsearchmachine elasticsearchmachine added Team:StorageEngine and removed needs:triage Requires assignment of a team area label labels Jan 7, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@kingherc kingherc added needs:triage Requires assignment of a team area label and removed needs:triage Requires assignment of a team area label labels Jan 7, 2025
@hchargois
Contributor Author

hchargois commented Jan 7, 2025

I've just found out that in 8.17 there's an undocumented but working "legacy_best_compression" codec. Sneaky! I guess that was left in for exactly this kind of testing purpose.

I've redone the benchmarks with the legacy_best_compression codec on 8.17, and the results are exactly the same as those from the best_compression (deflate) codec on 8.15 in my post above. This shows that the performance deterioration really comes from the codec difference, not from some other difference between 8.15 and 8.17.

@martijnvg
Member

Thanks @hchargois for reporting the effects that you see in your environment because of the zstandard compression change. I have not yet replicated the experiment that you shared in the issue description, but when we experimented with zstandard a while back, we generally observed that switching from deflate to zstandard when index.codec=best_compression at worst resulted in similar performance / compression ratio to deflate, and at best gave a better compression ratio and better performance (mainly better indexing throughput).

Based on the query you shared (with size set to 10000), reading and decompressing stored fields should be a big part of where time is being spent (regardless of what index.codec is set to). Returning that many hits is typically done for reindexing or exporting purposes.

For your use case, how does index.codec=default perform? That codec is used by default and uses lz4 under the hood, and in general it should perform better when stored-field read performance is more important than how well stored fields compress on disk.
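For completeness, trying the default codec only needs a third index; a minimal sketch, assuming the same 8.17 cluster on :9202 (the index name yelp_lz4 is hypothetical):

```shell
# Create a comparison index with the default stored-fields codec (lz4).
curl -s -XPUT 'http://localhost:9202/yelp_lz4' \
  -H 'Content-Type: application/json' \
  -d '{"settings":{"index":{"number_of_shards":1,"codec":"default"}}}'
```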

I've just found out that in 8.17 there's an undocumented, but working "legacy_best_compression" codec. Sneaky! I guess that was left exactly for this kind of testing purposes.

Yes, the legacy_best_compression option was meant as a workaround for unforeseen issues like bugs.

@hchargois
Contributor Author

lz4 of course performs much better in query speed, around 150 ms on that test index, more than 10x faster than deflate. But the storage size is much larger: the index is 930 MB (25 % more than deflate). We can't really afford that much extra space. Deflate has always provided a trade-off that suited us fine. Of course we wouldn't mind better query times for the same index size, or a smaller index for the same query times... but with zstd it's significantly worse query times for no gain in index size... Not very enticing.

BTW I used a large size to better show the influence of the codec decompressing the stored fields, as you mentioned, and to have larger numbers to compare so as to avoid noise. Still, the codec's speed difference shows up at any size, even as low as the default of 10 (even though took being an integer number of milliseconds doesn't allow much precision):

size   best_compression (zstd)   legacy_best_compression (deflate)
1000   avg 278 / min 267         avg 184 / min 178
100    avg 26.9 / min 25         avg 18.2 / min 18
30     avg 11.5 / min 8          avg 7.0 / min 5
10     avg 3.1 / min 3           avg 2.1 / min 2

(all times in ms, from the took field)
