make reindex should fail if it leaves message in fail queues #2433

Closed
hannes-ucsc opened this issue Nov 6, 2020 · 2 comments
Labels
debt [type] A defect incurring continued engineering cost enh [type] New feature or request infra [subject] Project infrastructure like CI/CD, build and deployment scripts orange [process] Done by the Azul team wontfix [process] Issue is valid but resolution is not deemed necessary

Comments

@hannes-ucsc
Member

No description provided.

@hannes-ucsc hannes-ucsc added the orange [process] Done by the Azul team label Nov 6, 2020
@melainalegaspi melainalegaspi added bug [type] A defect preventing use of the system as specified code [subject] Production code debt [type] A defect incurring continued engineering cost test [subject] Unit and integration test code and removed bug [type] A defect preventing use of the system as specified test [subject] Unit and integration test code labels Nov 6, 2020
@hannes-ucsc hannes-ucsc added the enh [type] New feature or request label Nov 5, 2021
@theathorn theathorn added infra [subject] Project infrastructure like CI/CD, build and deployment scripts and removed code [subject] Production code labels Nov 5, 2021
@hannes-ucsc hannes-ucsc changed the title from "make reindex should fail if it lengthens the fail queue" to "make reindex should fail if it leaves message in fail queues" Dec 20, 2024
@hannes-ucsc
Member Author

hannes-ucsc commented Dec 20, 2024

I think we should lump this together with #3661. Previously, this ticket was worded differently (the reindex should fail if it lengthens the fail queue), but that solution would require comparing the fail queue lengths before and after the reindex. That requires persisting some state and is complicated by retries of the reindex job on GitLab.

If we lump in #3661, the implementation will be much simpler: One should not be able to start a reindex with messages in the fail queue and the reindex should fail if there are messages in the fail queue when it exits. This eliminates the need to compare the length of the fail queues before and after.

Every reindex begins with a reset of the indexer: the lambdas are disabled, the work queues are purged, the ES indices are deleted and the lambdas are re-enabled. Note that the fail queues are not purged. A check should be added to the reindex.py script so that it refuses to even reset the indexer while there are messages in either fail queue. The error message should suggest the remedy: running manage_queues.py to dump both fail queues into files, then retrying the reindex.
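A minimal sketch of what that pre-reset check could look like. The names `FailQueuesNotEmpty` and `require_empty_fail_queues` are hypothetical, as is the assumption that the caller already has a mapping from fail queue name to message count; none of this is taken from reindex.py itself.

```python
from typing import Mapping


class FailQueuesNotEmpty(RuntimeError):
    """Raised when at least one fail queue still holds messages."""


def require_empty_fail_queues(queue_lengths: Mapping[str, int]) -> None:
    # `queue_lengths` maps a fail queue name to its message count.
    # How the counts are obtained is outside the scope of this sketch.
    non_empty = {name: n for name, n in queue_lengths.items() if n > 0}
    if non_empty:
        details = ', '.join(
            f'{name} ({n})' for name, n in sorted(non_empty.items())
        )
        # The message names the remedy so the operator knows what to do next.
        raise FailQueuesNotEmpty(
            'Refusing to reset the indexer, fail queues contain messages: '
            f'{details}. Dump the fail queues to files with manage_queues.py, '
            'then retry the reindex.'
        )
```

Raising before any destructive step (disabling lambdas, purging work queues, deleting indices) means a refused reindex leaves the deployment untouched.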

On the other end of the reindex, there should be another check of the fail queue lengths. The script, and therefore the make target, should fail unless the lengths are zero.
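Put together, the two checks bracket the reindex roughly as follows. This is only a sketch: `run_reindex` and its three callables are hypothetical placeholders for whatever reindex.py actually does, and they are injected so that no state needs to be persisted between the two checks.

```python
import sys
from typing import Callable, Dict, Mapping

Lengths = Mapping[str, int]


def non_empty(lengths: Lengths) -> Dict[str, int]:
    # Keep only the queues that hold at least one message.
    return {name: n for name, n in lengths.items() if n > 0}


def run_reindex(get_fail_queue_lengths: Callable[[], Lengths],
                reset_indexer: Callable[[], None],
                reindex: Callable[[], None]) -> int:
    """Return a process exit status: 0 on success, 1 on failure."""
    leftover = non_empty(get_fail_queue_lengths())
    if leftover:
        # Check 1: refuse to reset the indexer at all.
        print(f'Fail queues not empty before reindex: {leftover}',
              file=sys.stderr)
        return 1
    reset_indexer()
    reindex()
    failed = non_empty(get_fail_queue_lengths())
    if failed:
        # Check 2: the reindex left messages behind.
        print(f'Fail queues not empty after reindex: {failed}',
              file=sys.stderr)
        return 1
    return 0
```

A script would end with something like `sys.exit(run_reindex(...))`, so a non-zero status propagates to the make target. Because the first check guarantees empty fail queues at the start, a retried GitLab job needs no before/after comparison.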

Determining the length of an SQS queue used to be an approximate operation. Assignee to check if that's still the case. Receiving a message from a queue and letting the visibility timeout expire might be a more reliable check for emptiness, but it tampers with the queue and might interfere with the suggested remedy of dumping the queues. Assignee to investigate this aspect prior to implementing the prescribed solution. Maybe there is now a "peek" operation, or maybe the specific length is approximate but whether it is > 0 is not.
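For reference, a non-destructive length check could be sketched as below. It assumes `sqs` is a boto3 SQS client (or anything with the same `get_queue_attributes` method) and sums the three `ApproximateNumberOfMessages*` attributes, which SQS returns as strings. The helper names and the repeated sampling are assumptions of this sketch; it does not settle whether a zero reading can be trusted, which remains for the assignee to investigate.

```python
import time
from typing import Any, Callable

# Visible, in-flight and delayed messages all count towards the length.
COUNT_ATTRIBUTES = [
    'ApproximateNumberOfMessages',
    'ApproximateNumberOfMessagesNotVisible',
    'ApproximateNumberOfMessagesDelayed',
]


def approximate_length(sqs: Any, queue_url: str) -> int:
    # Reads queue attributes only; does not receive or alter any message.
    response = sqs.get_queue_attributes(QueueUrl=queue_url,
                                        AttributeNames=COUNT_ATTRIBUTES)
    attributes = response['Attributes']
    return sum(int(attributes[name]) for name in COUNT_ATTRIBUTES)


def appears_empty(sqs: Any,
                  queue_url: str,
                  samples: int = 3,
                  interval: float = 1.0,
                  sleep: Callable[[float], None] = time.sleep) -> bool:
    # Hypothetical mitigation for approximate counts: require several
    # consecutive zero readings before treating the queue as empty.
    for i in range(samples):
        if approximate_length(sqs, queue_url) > 0:
            return False
        if i < samples - 1:
            sleep(interval)
    return True
```

Unlike receiving a message and letting the visibility timeout expire, this leaves the queue untouched, so it should not interfere with dumping the fail queues afterwards.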

@hannes-ucsc
Member Author

Superseded by #6781.

@hannes-ucsc hannes-ucsc closed this as not planned Dec 21, 2024
@hannes-ucsc hannes-ucsc added the wontfix [process] Issue is valid but resolution is not deemed necessary label Dec 21, 2024
@hannes-ucsc hannes-ucsc removed their assignment Dec 21, 2024
3 participants