We have 150k charts, a 2.3 GB bucket, and a 40 MB index file. I ran reindex in a Kubernetes pod inside AWS and it took a week to complete. Just from reading the code, I can suggest a few optimizations:
Reindex downloads each chart as it iterates through the index. Instead, sync the entire S3 bucket locally and then process it, or trust the old index and download only the objects that are missing from it.
Optimize (or expose) the S3 client configuration options, see https://stackoverflow.com/a/48114553
With the above optimizations it takes me 5 minutes to download all the objects; I'd guess it would take hours with the default settings.
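The "trust the old index" idea boils down to a set difference between the object keys in the bucket and the keys already described by index.yaml; only the delta needs downloading. A minimal sketch of that logic (the function name and inputs are hypothetical, not helm-s3 APIs):

```python
def keys_to_download(bucket_keys, index_keys):
    """Return only the chart archives not yet described by the index.

    bucket_keys: iterable of object keys found in the S3 bucket (e.g. "app-1.2.3.tgz")
    index_keys:  iterable of keys already recorded in index.yaml
    """
    return sorted(set(bucket_keys) - set(index_keys))

# With 150k charts and only a handful of new uploads, this turns
# 150k downloads into just the missing few.
missing = keys_to_download(
    ["app-1.0.0.tgz", "app-1.1.0.tgz", "new-0.1.0.tgz"],
    ["app-1.0.0.tgz", "app-1.1.0.tgz"],
)
# missing == ["new-0.1.0.tgz"]
```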
We have actually, but that is itself a challenge. Each delete (like each upload) takes a considerable amount of time, and probably 99% of the references need to be cleaned. The criteria I can think of: remove any charts older than some time period, while also honoring an allow list of chart-versions currently in use. After thinking about it for a while, forking helm-s3 is probably the easiest way for us to fine-tune reindex to add a prune option applying those criteria.
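The prune criteria described above (drop versions older than a cutoff unless they appear on an allow list of in-use chart-versions) could look roughly like this; the entry shape and the function name are assumptions for illustration, not helm-s3 code:

```python
from datetime import datetime, timedelta, timezone

def prune_entries(entries, max_age, allow_list):
    """Keep entries newer than max_age, plus anything explicitly allow-listed.

    entries:    list of dicts with "name", "version", "created" (tz-aware datetime)
    max_age:    timedelta; entries older than this are candidates for removal
    allow_list: set of "name-version" strings that must survive regardless of age
    """
    cutoff = datetime.now(timezone.utc) - max_age
    kept = []
    for e in entries:
        key = f'{e["name"]}-{e["version"]}'
        if key in allow_list or e["created"] >= cutoff:
            kept.append(e)
    return kept

now = datetime.now(timezone.utc)
entries = [
    {"name": "app", "version": "1.0.0", "created": now - timedelta(days=400)},
    {"name": "app", "version": "2.0.0", "created": now - timedelta(days=1)},
    {"name": "legacy", "version": "0.9.0", "created": now - timedelta(days=900)},
]
kept = prune_entries(entries, timedelta(days=365), allow_list={"legacy-0.9.0"})
# keeps app-2.0.0 (recent) and legacy-0.9.0 (allow-listed); drops app-1.0.0
```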
hypnoglow changed the title from "reindex is too slow" to "Reindex is too slow in repos with thousands of charts" on Dec 1, 2023.
At a guess, deletion takes a while because helm-s3 downloads, edits, and re-uploads the index on every delete. With large repos the index gets pretty beefy, so this adds a lot of time. Would it be possible to support deleting multiple versions in one go?
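Batch deletion would amortize the expensive index round-trip: download the 40 MB index once, strip all the doomed versions, upload once. A sketch over an in-memory index (the structure mirrors index.yaml's entries map; the function is hypothetical, not an existing helm-s3 feature):

```python
def delete_versions(index_entries, to_delete):
    """Remove many chart versions from an index in a single pass.

    index_entries: {chart_name: [version, ...]}, as in index.yaml's "entries"
    to_delete:     iterable of (chart_name, version) pairs to remove
    """
    doomed = set(to_delete)
    pruned = {}
    for name, versions in index_entries.items():
        kept = [v for v in versions if (name, v) not in doomed]
        if kept:  # drop the chart entirely once no versions remain
            pruned[name] = kept
    return pruned

# One index download + one upload instead of one round-trip per version:
index = {"app": ["1.0.0", "1.1.0", "2.0.0"], "old": ["0.1.0"]}
index = delete_versions(index, [("app", "1.0.0"), ("old", "0.1.0")])
# index == {"app": ["1.1.0", "2.0.0"]}
```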