zstd is much slower than deflate #119662

Open
hchargois opened this issue Jan 7, 2025 · 4 comments

@hchargois
Contributor

Elasticsearch Version

8.17.0

Installed Plugins

No response

Java Version

bundled

OS Version

Archlinux, kernel 6.12.1-arch1-1

Problem Description

The zstd codec introduced in ES 8.16 performs significantly worse than the old deflate codec, in terms of read (query) speed. Queries that need to read stored fields perform around 30 to 45 % worse with zstd than with deflate.

Moreover, I've found that there are no counterbalancing benefits, as the indexing speed doesn't improve, and the index size is only around 1% smaller at best, which is not significant.

So basically we're trading a <1 % improvement in index size for a ~ 40 % deterioration in query speed. For my use-case, that's not worth it.

I'm opening this as a "bug" because to me it's a huge regression, especially since there's no option to keep using deflate.

I don't know if the zstd codec can be "fixed" by optimizing it or changing its parameters, but regardless we should be able to continue using the deflate codec. Ideally we should have more precise control over the actual codec used, and we should be allowed to configure its parameters such as compression level, etc. The "default"/"best_compression" options should only be aliases to some predefined codecs and parameters that may change from release to release, but if we know we want a stable configuration, we should be able to choose "deflate" (or "zstd") and be sure that the codec doesn't change.

Steps to Reproduce

Download a sample dataset, for example the first 1M reviews of the Yelp dataset: https://www.kaggle.com/datasets/yelp-dataset/yelp-dataset

Index them in 2 indices, yelp_deflate on ES 8.15.3 and yelp_zstd on ES 8.17.0, with the same mapping:

{
  "mappings": {
  },
  "settings": {
    "index": {
      "number_of_replicas": "1",
      "number_of_shards": "1",
      "codec": "best_compression"
    }
  }
}
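For reference, the two indices can be created by PUTting the JSON above (saved as e.g. index.json) to each cluster. The index names are the ones used below; the ports are an assumption for running both ES versions side by side (only :9202 appears later in this issue):

```shell
# Create the deflate index on the 8.15.3 cluster and the zstd index on the
# 8.17.0 cluster; index.json is the mapping/settings body shown above.
curl -s -XPUT 'http://localhost:9201/yelp_deflate' \
  -H 'Content-Type: application/json' -d @index.json
curl -s -XPUT 'http://localhost:9202/yelp_zstd' \
  -H 'Content-Type: application/json' -d @index.json
```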

Make a simple search query that returns a good amount of documents:

{
	"size": 10000,
	"query": {
		"term": {
			"cool": 3
		}
	}
}

Run the query on each index multiple times (so that the indices are in the system's page cache) but with the request_cache turned off, and record the "took":

< query.json curlie -s 'http://localhost:9202/yelp_zstd/_search?request_cache=false' | jq .took
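The "run it multiple times and record the took" step can be scripted with a small awk helper; the curlie/jq loop in the comment is the same as the one-liner above, and the took values fed in at the end are made up for illustration:

```shell
# summarize: reduce a stream of per-request 'took' values (ms) to "avg X / min Y".
summarize() {
  awk '{ s += $1; if (n == 0 || $1 < min) min = $1; n++ }
       END { if (n) printf "avg %.1f / min %d\n", s / n, min }'
}

# Against a live cluster, pipe real measurements into it, e.g.:
#   for i in $(seq 10); do
#     < query.json curlie -s 'http://localhost:9202/yelp_zstd/_search?request_cache=false' | jq .took
#   done | summarize
#
# Illustrative (made-up) took values:
printf '2712\n2689\n2705\n' | summarize   # -> avg 2702.0 / min 2689
```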

The results are as follows:

  • deflate: ~ 1850 ms
  • zstd: ~ 2700 ms (~ 45 % slower)

As for the storage, after force-merging the indices in a single segment, we get:

  • deflate: 747.7 MB total index size, 316.5 MB stored_fields only
  • zstd: 743.2 MB total index size (- 0.6 %), 312.1 MB stored_fields only (- 1.4 %)
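As a sanity check, those percentages follow from the raw sizes; a tiny awk helper reproduces them (the force merge itself is a POST to /<index>/_forcemerge?max_num_segments=1):

```shell
# delta A B: percentage change going from size A to size B.
delta() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%+.1f %%\n", (b - a) / a * 100 }'; }

delta 747.7 743.2   # total index size, deflate -> zstd: -0.6 %
delta 316.5 312.1   # stored_fields only, deflate -> zstd: -1.4 %
```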

Logs (if relevant)

No response

@hchargois hchargois added >bug needs:triage Requires assignment of a team area label labels Jan 7, 2025
@elasticsearchmachine elasticsearchmachine added Team:StorageEngine and removed needs:triage Requires assignment of a team area label labels Jan 7, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@kingherc kingherc added needs:triage Requires assignment of a team area label and removed needs:triage Requires assignment of a team area label labels Jan 7, 2025
@hchargois
Contributor Author

hchargois commented Jan 7, 2025

I've just found out that in 8.17 there's an undocumented but working "legacy_best_compression" codec. Sneaky! I guess that was left in for exactly this kind of testing purpose.

I've redone the benchmarks with the legacy_best_compression codec on 8.17, and the results are exactly the same as those from the best_compression (deflate) codec on 8.15 in my post above. This shows that the performance deterioration really comes from the codec difference, not from some other difference between 8.15 and 8.17.

@martijnvg
Member

Thanks @hchargois for reporting the effects that you see in your environment because of the zstandard compression change. I have not yet replicated the experiment that you shared in the issue description, but when we experimented with zstandard a while back, we generally observed that switching from deflate to zstandard when index.codec=best_compression at worst resulted in similar performance / compression ratio to deflate, and at best gave a better compression ratio and better performance (mainly better indexing throughput).

Based on the query you shared (with size set to 10000), reading and decompressing stored fields should be a big part of where time is being spent (regardless of what index.codec is set to). Returning that many hits is typically done for reindexing or exporting purposes.

For your use case, how does index.codec=default perform? That codec is used by default and uses lz4 under the hood, and in general it should perform better when stored-field read performance is more important than how well stored fields compress on disk.
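For completeness, trying the default codec only needs a third index; a minimal sketch, assuming the same 8.17 cluster on :9202 (the index name yelp_lz4 is hypothetical):

```shell
# Create a comparison index with the default stored-fields codec (lz4).
curl -s -XPUT 'http://localhost:9202/yelp_lz4' \
  -H 'Content-Type: application/json' \
  -d '{"settings":{"index":{"number_of_shards":1,"codec":"default"}}}'
```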

I've just found out that in 8.17 there's an undocumented, but working "legacy_best_compression" codec. Sneaky! I guess that was left exactly for this kind of testing purposes.

Yes, the legacy_best_compression option was meant as a workaround for unforeseen issues like bugs.

@hchargois
Contributor Author

lz4 of course performs much better in query speed, around 150 ms on that test index, more than 10x faster than deflate. But the storage size is much larger: the index is 930 MB (25 % more than deflate). We can't really afford that much extra space. Deflate has always provided a trade-off that suited us fine. Of course we wouldn't mind better query times for the same index size, or a smaller index for the same query times... but with zstd it's significantly worse query times for no gain in index size... Not very enticing.

BTW I used a large size to better show the influence of the codec decompressing the stored fields, as you mentioned, and to have larger numbers to compare so as to avoid noise. Still, the codec's speed difference shows up at any size, even as low as the default of 10 (even though took being an integer number of milliseconds doesn't allow much precision):

size   best_compression (zstd)   legacy_best_compression (deflate)
1000   avg 278 / min 267         avg 184 / min 178
100    avg 26.9 / min 25         avg 18.2 / min 18
30     avg 11.5 / min 8          avg 7.0 / min 5
10     avg 3.1 / min 3           avg 2.1 / min 2

(all times in ms, from the took field)
