Support per index cache strategy and time based caching condition #5650
Comments
To add to this, if there are per-index settings, I would want a way to define them in the index template, and more specifically to define an index pattern that determines which indexes those settings apply to. We create many indexes daily in a dynamic fashion simply by inserting data, and we don't want to hand-hold that process. For us, the typical situation would be that just one of our users has a sudden increase in ingested data that necessitates more cache capacity for their specific set of indexes, but not for the entire cluster. The problem becomes how to do that. Do we have to reapply index templates, or would there be an API to adjust certain settings? As it stands, it seems we would need multiple templates for customer-specific cache settings, which could mean hundreds to thousands of templates.
This falls under an "apply index template on dynamically created index" feature, which is out of scope for this issue. TBH, I was not aware of this feature (where is it documented?). If the cache is configurable per index, then specifying the configuration in the template will do just fine. Oh, maybe we should consider a cache per tag as well, since splits are created independently for each tag as long as tag cardinality does not exceed the threshold.
Quickwit's upcoming 0.9 release will include the index update feature. Can't wait for that release! Updating the cache eviction configuration of an index is nothing more than updating its definition. Since cached splits are immutable, I believe it is pretty straightforward to implement as well. In terms of cascading updates (index template updates triggering index updates), I have no idea. What if you want to apply changes to only some of the indexes? What if some index has been manually updated, should we override it? I believe manually calling the index update API is much simpler and more flexible.
Looks like your index configuration comes with very complicated logic :). If that is the case, externalizing index creation might be a better solution.
I am not sure if it is documented, but yes, you can specify index ID patterns on a template, as in Elasticsearch, and when documents are inserted for indexes that do not exist, those indexes are created with the first template that matches their ID. This works today, at least through the Elasticsearch-compatible _bulk endpoint, and we leverage this functionality quite heavily. Your idea sounds useful, but care should be taken that it does not become a maintenance nightmare at larger scales.
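To make the "first matching template" behavior described above concrete, here is a minimal sketch. It is not Quickwit's implementation: the field names (`template_id`, `index_id_patterns`, `cache_max_bytes`), the glob-style matching, and the per-template cache setting are all assumptions used for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical templates. The field names and the cache setting are
# illustrative assumptions, not Quickwit's actual template schema.
TEMPLATES = [
    {
        "template_id": "customer-a",
        "index_id_patterns": ["customer-a-*"],
        "cache_max_bytes": 50_000_000_000,
    },
    {
        "template_id": "default",
        "index_id_patterns": ["*"],
        "cache_max_bytes": 5_000_000_000,
    },
]


def pick_template(index_id, templates=TEMPLATES):
    """Return the first template with a pattern matching index_id, or None."""
    for template in templates:
        if any(fnmatchcase(index_id, p) for p in template["index_id_patterns"]):
            return template
    return None
```

With this shape, one customer's indexes can get a larger cache by adding a single pattern-scoped template, but every customer-specific override still needs its own template entry, which is the scaling concern raised above.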
Is your feature request related to a problem? Please describe.
Thanks for providing such an amazing piece of work. Quickwit provides (almost) everything we need for our platform.
Our workload pattern is very similar to what is described in #5445, at a much smaller scale. Currently we have 11 indexes. In terms of split size, 2 indexes range from 100 TB to 150 TB, 1 index is at about 20 TB, and the others are well below 1 TB.
The LRU cache strategy is very brittle against "big scans" that run every now and then (fewer than 10 times a day). Some work (#5469) has been done to support an LFU strategy, which might work, but it still lacks flexibility.
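The churn problem can be illustrated with a toy simulation. This is only a sketch of generic LRU behavior, not Quickwit's split cache: the capacity, key names, and scan size are made-up numbers.

```python
from collections import OrderedDict


class LruCache:
    """Toy LRU cache that only tracks keys and hit/miss counts."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return
        self.misses += 1
        self.entries[key] = True
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used


def run(with_scan):
    """Return (hits, misses) for the hot set after an optional big scan."""
    cache = LruCache(capacity=100)
    hot = [f"hot-{i}" for i in range(50)]
    for key in hot:  # warm the cache with the frequently queried splits
        cache.get(key)
    if with_scan:
        for i in range(1000):  # one big scan touching many cold splits
            cache.get(f"scan-{i}")
    cache.hits = cache.misses = 0
    for key in hot:  # replay the hot workload
        cache.get(key)
    return cache.hits, cache.misses
```

In this toy setup, `run(False)` serves the whole hot set from cache, while `run(True)` misses on every hot key because the single scan pushed all of them out.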
In our case, caching is not for performance: Quickwit with no disk cache is blazing fast, which is where Quickwit's engineering truly shines. Long-range queries with term conditions (trace_id = xxx) cannot be cached effectively anyway, and downloading all splits to local disk won't help.
The actual value of the cache for us is that S3 requests are greatly reduced for repeated data queries, which saves money and makes some patterns economically viable (100+ TPS reads on the last x days of data).
Describe the solution you'd like
To mitigate the cache churn issue, I would like Quickwit to support the following features
Describe alternatives you've considered
For feature request 1, we evaluated a potential solution that uses 2 searcher clusters to handle 2 groups of indexes (with cache / without cache). It is not hard to add some routing logic in the HTTP proxy in front of Quickwit, since we have to do authentication there anyway.
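A rough sketch of what that proxy routing could look like is below. The host names, the set of cached indexes, and the assumption that the index ID is the third path segment (as in `/api/v1/<index_id>/search`) are illustrative and would need to be checked against the actual deployment.

```python
# Hypothetical configuration: index IDs and upstream URLs are placeholders.
CACHED_INDEXES = {"recent-logs", "recent-traces"}
SEARCHERS_WITH_CACHE = "http://searcher-cached.internal:7280"
SEARCHERS_WITHOUT_CACHE = "http://searcher-nocache.internal:7280"


def extract_index_id(path):
    """Return the index ID from a path shaped like /api/v1/<index_id>/..., else None."""
    parts = path.strip("/").split("/")
    if len(parts) >= 3 and parts[:2] == ["api", "v1"]:
        return parts[2]
    return None


def pick_upstream(path):
    """Send requests for cache-worthy indexes to the cached searcher cluster."""
    index_id = extract_index_id(path)
    if index_id in CACHED_INDEXES:
        return SEARCHERS_WITH_CACHE
    return SEARCHERS_WITHOUT_CACHE
```

The downside of this workaround is that the split between cached and uncached indexes lives in proxy configuration rather than in the index definition, and it requires operating two searcher clusters.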
For feature request 2, we believe there is no alternative.
Additional context
Out of curiosity, what's the plan for this project? Are you willing to take contributions? If that is OK, we would like to try working on this issue.