We have 150k charts, a 2.3 GB bucket, and a 40 MB index file. I ran reindex in a Kubernetes pod inside AWS and it took a week to complete. Just from reading the code, I can suggest a few optimizations:
Reindex downloads each chart as it iterates through the index. Instead, sync the entire S3 bucket locally and then process it, or trust the old index and download only the objects that are missing from it.
Optimize (or expose) the S3 client configuration options, see https://stackoverflow.com/a/48114553
With the above optimizations it takes me 5 minutes to download all the objects; I'd guess it would take hours with the default settings.
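The "trust the old index" idea boils down to a set difference between the object keys in the bucket and the keys already described by index.yaml; only the delta needs downloading. A minimal sketch of that logic (the function name and inputs are hypothetical, not helm-s3 APIs):

```python
def keys_to_download(bucket_keys, index_keys):
    """Return only the chart archives not yet described by the index.

    bucket_keys: iterable of object keys found in the S3 bucket (e.g. "app-1.2.3.tgz")
    index_keys:  iterable of keys already recorded in index.yaml
    """
    return sorted(set(bucket_keys) - set(index_keys))

# With 150k charts and only a handful of new uploads, this turns
# 150k downloads into just the missing few.
missing = keys_to_download(
    ["app-1.0.0.tgz", "app-1.1.0.tgz", "new-0.1.0.tgz"],
    ["app-1.0.0.tgz", "app-1.1.0.tgz"],
)
# missing == ["new-0.1.0.tgz"]
```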
We have actually, but that is itself a challenge. Each delete (like each upload) takes a considerable amount of time, and probably 99% of the references need to be cleaned. The criteria I can think of: remove any charts older than some time period, while also honoring an allow list of chart-versions currently in use. After thinking about it for a while, forking helm-s3 is probably the easiest way for us to fine-tune reindex to add a prune option applying those criteria.
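The prune criteria described above (drop versions older than a cutoff unless they appear on an allow list of in-use chart-versions) could look roughly like this; the entry shape and the function name are assumptions for illustration, not helm-s3 code:

```python
from datetime import datetime, timedelta, timezone

def prune_entries(entries, max_age, allow_list):
    """Keep entries newer than max_age, plus anything explicitly allow-listed.

    entries:    list of dicts with "name", "version", "created" (tz-aware datetime)
    max_age:    timedelta; entries older than this are candidates for removal
    allow_list: set of "name-version" strings that must survive regardless of age
    """
    cutoff = datetime.now(timezone.utc) - max_age
    kept = []
    for e in entries:
        key = f'{e["name"]}-{e["version"]}'
        if key in allow_list or e["created"] >= cutoff:
            kept.append(e)
    return kept

now = datetime.now(timezone.utc)
entries = [
    {"name": "app", "version": "1.0.0", "created": now - timedelta(days=400)},
    {"name": "app", "version": "2.0.0", "created": now - timedelta(days=1)},
    {"name": "legacy", "version": "0.9.0", "created": now - timedelta(days=900)},
]
kept = prune_entries(entries, timedelta(days=365), allow_list={"legacy-0.9.0"})
# keeps app-2.0.0 (recent) and legacy-0.9.0 (allow-listed); drops app-1.0.0
```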
hypnoglow changed the title from "reindex is too slow" to "Reindex is too slow in repos with thousands of charts" on Dec 1, 2023.
At a guess, deletion takes a while because helm-s3 downloads, edits, and re-uploads the index on every delete. With large repos the index gets pretty beefy, so this adds a lot of time. Would it be possible to support deleting multiple versions in one go?
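Batch deletion would amortize the expensive index round-trip: download the 40 MB index once, strip all the doomed versions, upload once. A sketch over an in-memory index (the structure mirrors index.yaml's entries map; the function is hypothetical, not an existing helm-s3 feature):

```python
def delete_versions(index_entries, to_delete):
    """Remove many chart versions from an index in a single pass.

    index_entries: {chart_name: [version, ...]}, as in index.yaml's "entries"
    to_delete:     iterable of (chart_name, version) pairs to remove
    """
    doomed = set(to_delete)
    pruned = {}
    for name, versions in index_entries.items():
        kept = [v for v in versions if (name, v) not in doomed]
        if kept:  # drop the chart entirely once no versions remain
            pruned[name] = kept
    return pruned

# One index download + one upload instead of one round-trip per version:
index = {"app": ["1.0.0", "1.1.0", "2.0.0"], "old": ["0.1.0"]}
index = delete_versions(index, [("app", "1.0.0"), ("old", "0.1.0")])
# index == {"app": ["1.1.0", "2.0.0"]}
```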