hannes-ucsc opened this issue on Nov 6, 2020 · 2 comments
Labels
debt: [type] A defect incurring continued engineering cost
enh: [type] New feature or request
infra: [subject] Project infrastructure like CI/CD, build and deployment scripts
orange: [process] Done by the Azul team
wontfix: [process] Issue is valid but resolution is not deemed necessary
melainalegaspi added the bug ([type] A defect preventing use of the system as specified), code ([subject] Production code), debt ([type] A defect incurring continued engineering cost) and test ([subject] Unit and integration test code) labels and removed the bug and test labels on Nov 6, 2020
theathorn added the infra ([subject] Project infrastructure like CI/CD, build and deployment scripts) label and removed the code ([subject] Production code) label on Nov 5, 2021
hannes-ucsc changed the title from "make reindex should fail if it lengthens the fail queue" to "make reindex should fail if it leaves message in fail queues" on Dec 20, 2024
I think we should lump this together with #3661. Previously this ticket was worded differently but that solution would require comparing the fail queue lengths before and after the reindex. That requires persisting some state and is complicated by retries of the reindex job on GitLab.
If we lump in #3661, the implementation will be much simpler: One should not be able to start a reindex with messages in the fail queue and the reindex should fail if there are messages in the fail queue when it exits. This eliminates the need to compare the length of the fail queues before and after.
Every reindex begins with a reset of the indexer: the lambdas are disabled, the work queues are purged, the ES indices are deleted and the lambdas are re-enabled. Note that the fail queues are not purged. A check should be added to the reindex.py script so that it refuses to even reset the indexer while there are messages in either fail queue. The error message should suggest the remedy: running manage_queues.py to dump both fail queues into files and then retrying the reindex.
On the other end of the reindex, there should be another check of the fail queue lengths. The script, and therefore the make target, should fail unless the lengths are zero.
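The two checks could be sketched roughly as follows. This is only an illustration of the control flow, not the actual reindex.py code: `FailQueuesNotEmpty`, `require_empty_fail_queues`, `reindex` and the three callables passed to it are hypothetical names, and how the queue lengths are obtained is deliberately left to the caller.

```python
from typing import Callable, Mapping


class FailQueuesNotEmpty(RuntimeError):
    """Hypothetical error raised when a fail queue still holds messages."""


def require_empty_fail_queues(lengths: Mapping[str, int], when: str) -> None:
    # `lengths` maps a fail queue name to its (possibly approximate) length.
    non_empty = {name: n for name, n in lengths.items() if n > 0}
    if non_empty:
        details = ', '.join(f'{name} ({n})' for name, n in sorted(non_empty.items()))
        raise FailQueuesNotEmpty(
            f'Fail queues not empty {when} the reindex: {details}. '
            'Dump them to files with manage_queues.py and retry the reindex.'
        )


def reindex(get_fail_queue_lengths: Callable[[], Mapping[str, int]],
            reset_indexer: Callable[[], None],
            run_reindex: Callable[[], None]) -> None:
    # Refuse to even reset the indexer if either fail queue has messages.
    require_empty_fail_queues(get_fail_queue_lengths(), 'before')
    reset_indexer()
    run_reindex()
    # Fail the script, and with it the make target, if messages were left behind.
    require_empty_fail_queues(get_fail_queue_lengths(), 'after')
```

Since the exception propagates out of the script, the process would exit with a non-zero status, and no state needs to be persisted between retries of the GitLab job.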
Determining the length of an SQS queue used to be an approximate operation. Assignee to check if that's still the case. Receiving a message from a queue and letting the visibility timeout expire might be a more reliable check for emptiness, but it tampers with the queue and might interfere with the suggested remedy of dumping the queues. Assignee to investigate this aspect prior to implementing the prescribed solution. Maybe there is now a "peek" operation, or maybe the specific length is approximate but whether it is greater than zero is not.
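For reference, a non-destructive length check could look like the sketch below. It assumes `sqs` is a boto3 SQS client (or any object with a compatible `get_queue_attributes` method). The helper names and the idea of repeating the check a few times are assumptions made for illustration; they do not answer the open question of whether a reported count of zero can be trusted.

```python
import time
from typing import Callable

# All three counters are documented by AWS as approximate.
COUNT_ATTRIBUTES = [
    'ApproximateNumberOfMessages',
    'ApproximateNumberOfMessagesNotVisible',
    'ApproximateNumberOfMessagesDelayed',
]


def approximate_queue_length(sqs, queue_url: str) -> int:
    # Reads queue attributes only; no message is received, so the
    # queue is not tampered with and can still be dumped afterwards.
    response = sqs.get_queue_attributes(QueueUrl=queue_url,
                                        AttributeNames=COUNT_ATTRIBUTES)
    attributes = response['Attributes']
    return sum(int(attributes[name]) for name in COUNT_ATTRIBUTES)


def appears_empty(sqs,
                  queue_url: str,
                  checks: int = 3,
                  interval: float = 1.0,
                  sleep: Callable[[float], None] = time.sleep) -> bool:
    # Heuristic: report emptiness only if every one of several
    # consecutive checks returns zero. The values of `checks` and
    # `interval` are arbitrary placeholders.
    for i in range(checks):
        if approximate_queue_length(sqs, queue_url) > 0:
            return False
        if i < checks - 1:
            sleep(interval)
    return True
```

Summing the visible, in-flight and delayed counters avoids treating a queue as empty while some of its messages are merely invisible or delayed.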