`make reindex` should fail if it leaves message in fail queues #2433

hannes-ucsc · 2020-11-06T21:04:23Z

No description provided.

hannes-ucsc · 2024-12-20T01:51:30Z

I think we should lump this together with #3661. Previously, this ticket was worded differently, but the solution that wording implied would require comparing the fail queue lengths before and after the reindex. That requires persisting some state, and is complicated by the fact that the reindex job may be retried on GitLab.

If we lump in #3661, the implementation becomes much simpler: it should not be possible to start a reindex while there are messages in the fail queues, and the reindex should fail if there are messages in the fail queues when it exits. This eliminates the need to compare the lengths of the fail queues before and after.

Every reindex begins with a reset of the indexer: the lambdas are disabled, the work queues are purged, the ES indices are deleted, and the lambdas are re-enabled. Note that the fail queues are not purged. A check should be added to the reindex.py script so that it refuses to even reset the indexer while there are messages in either fail queue. The error message should suggest the remedy: running manage_queues.py to dump both fail queues into files, then retrying the reindex.
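A minimal sketch of that pre-check, assuming boto3's SQS client (`get_queue_url` and `get_queue_attributes`). The names `fail_queue_lengths`, `require_empty_fail_queues` and `FailQueuesNotEmpty` are hypothetical, not existing reindex.py code, and counting visible plus in-flight messages is an assumption about what "messages in the fail queue" should mean:

```python
from typing import Iterable, Mapping


def fail_queue_lengths(sqs, queue_names: Iterable[str]) -> dict[str, int]:
    """
    Return an approximate message count per queue. ``sqs`` is expected to be
    a boto3 SQS client, e.g. ``boto3.client('sqs')``.
    """
    lengths = {}
    for name in queue_names:
        url = sqs.get_queue_url(QueueName=name)['QueueUrl']
        attrs = sqs.get_queue_attributes(
            QueueUrl=url,
            # Visible messages plus messages currently in flight
            AttributeNames=['ApproximateNumberOfMessages',
                            'ApproximateNumberOfMessagesNotVisible']
        )['Attributes']
        # SQS reports attribute values as strings
        lengths[name] = sum(int(value) for value in attrs.values())
    return lengths


class FailQueuesNotEmpty(RuntimeError):
    pass


def require_empty_fail_queues(lengths: Mapping[str, int]) -> None:
    """
    Raise before the indexer is reset if any fail queue holds messages, and
    point the operator at the remedy.
    """
    non_empty = {name: n for name, n in lengths.items() if n > 0}
    if non_empty:
        raise FailQueuesNotEmpty(
            f'Refusing to reset the indexer, fail queues contain messages: '
            f'{non_empty}. Run manage_queues.py to dump both fail queues '
            f'into files, then retry the reindex.'
        )
```

reindex.py would call `require_empty_fail_queues(fail_queue_lengths(boto3.client('sqs'), names))` before disabling the lambdas, with `names` (hypothetical) holding the two fail queue names of the deployment.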

At the other end of the reindex, the fail queue lengths should be checked again. The script, and therefore the make target, should fail unless both lengths are zero.
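A sketch of the exit check, assuming the script's entry point can be structured as below. `main` and the injected `reindex` and `fail_queue_lengths` callables are hypothetical; the point is only that the script must exit with a non-zero status for `make` to fail the target:

```python
import sys
from typing import Callable, Mapping


def main(reindex: Callable[[], None],
         fail_queue_lengths: Callable[[], Mapping[str, int]]) -> int:
    """
    Run the reindex, then verify that the fail queues are empty. Returns the
    process exit status: 0 on success, 1 if any fail queue has messages.
    """
    reindex()
    # Check again at the end; any message left behind means failure
    lengths = fail_queue_lengths()
    non_empty = {name: n for name, n in lengths.items() if n != 0}
    if non_empty:
        print(f'Reindex left messages in fail queues: {non_empty}',
              file=sys.stderr)
        return 1
    return 0
```

The script would end with `sys.exit(main(...))`. Since make stops with an error when a recipe command exits with a non-zero status, `make reindex` fails as well.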

Determining the length of an SQS queue used to be an approximate operation. The assignee should check whether that is still the case. Receiving a message from a queue and letting the visibility timeout expire might be a more reliable check for emptiness, but it tampers with the queue and might interfere with the suggested remedy of dumping the queues. The assignee should investigate this aspect before implementing the prescribed solution. Maybe there is now a "peek" operation, or maybe the exact length is approximate but whether it is greater than zero is not.
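Pending that investigation, one non-destructive option is to read all three approximate counts that `GetQueueAttributes` reports several times, and to treat the queue as empty only if every reading is zero. This resembles the approach the SQS developer guide describes for confirming that a queue is empty, but the number of readings and the interval below are arbitrary assumptions, and `is_queue_empty` is a hypothetical helper:

```python
import time
from typing import Callable

# The three approximate counts SQS reports via GetQueueAttributes
COUNT_ATTRIBUTES = ['ApproximateNumberOfMessages',
                    'ApproximateNumberOfMessagesNotVisible',
                    'ApproximateNumberOfMessagesDelayed']


def is_queue_empty(sqs,
                   queue_url: str,
                   readings: int = 3,
                   interval: float = 20.0,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """
    Return True only if every reading of every count is zero. Unlike
    ReceiveMessage, GetQueueAttributes does not change the visibility of any
    message, so the fail queues stay intact for dumping.
    """
    for i in range(readings):
        if i > 0:
            sleep(interval)
        attrs = sqs.get_queue_attributes(QueueUrl=queue_url,
                                         AttributeNames=COUNT_ATTRIBUTES)
        counts = attrs['Attributes']
        # A single non-zero reading is treated as "not empty", erring on
        # the side of failing the reindex
        if any(int(counts[name]) > 0 for name in COUNT_ATTRIBUTES):
            return False
    return True
```

Erring towards "not empty" is the safe direction here: a false positive only blocks or fails a reindex, whereas a false negative would hide failed notifications.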

hannes-ucsc · 2024-12-21T04:36:28Z

Superseded by #6781.

hannes-ucsc added the orange label ([process] Done by the Azul team) Nov 6, 2020

hannes-ucsc added the enh label ([type] New feature or request) Nov 5, 2021

theathorn added the infra label ([subject] Project infrastructure like CI/CD, build and deployment scripts) and removed the code label ([subject] Production code) Nov 5, 2021

hannes-ucsc changed the title ~~`make reindex` should fail if it lengthens the fail queue~~ `make reindex` should fail if it leaves message in fail queues Dec 20, 2024

hannes-ucsc mentioned this issue Dec 20, 2024 in Reindex can be started with non-empty queues #3661 (Closed)

achave11-ucsc assigned hannes-ucsc Dec 20, 2024

hannes-ucsc mentioned this issue Dec 21, 2024 in Improve handling of fail queues #6781 (Open)

hannes-ucsc closed this as not planned Dec 21, 2024

hannes-ucsc added the wontfix label ([process] Issue is valid but resolution is not deemed necessary) Dec 21, 2024

hannes-ucsc removed their assignment Dec 21, 2024
