Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for ack/nack results so middleware can manage. #415

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

jenstroeger
Copy link
Contributor

This PR is a draft and discussion. I’m still testing it, and wanted to gather feedback early.

Background

When using distributed RabbitMQ hosted by AmazonMQ’s service then connection losses are a regular problem due to Amazon’s maintenance window: hosts shut down whether messages are queued or in flight, which in turn triggers automatic message requeueing and failing message ack due to delivery tag changes after the connection reestablishes.

Proposed implementation

I think a Dramatiq middleware is able to manage these scenarios, even for stateful and non-transacted actor functions. However, the middleware needs to know whether a message was ack’ed successfully or not, which is what this PR addresses. I added that ack/nack result as a kwarg to maintain compatibility with existing middlewares.

@Bogdanp, I’m curious what you make of this change, and your experience with & thoughts on the mentioned AmazonMQ service (if any).

@jenstroeger jenstroeger force-pushed the add-ack-nack-status-support branch 3 times, most recently from 8cfd252 to b1a29bb Compare August 13, 2021 05:51
@jenstroeger jenstroeger force-pushed the add-ack-nack-status-support branch from b1a29bb to c861c47 Compare August 13, 2021 05:55
@Bogdanp
Copy link
Owner

Bogdanp commented Aug 16, 2021

I think there are two problems with this change, both related to backwards-compatibility:

  • After this change, external brokers (like dramatiq_sqs or dramatiq-pg) will incorrectly report that "no action was taken" after ack/nack. At least those two should also be updated.
  • Making the new ack/nack args keywords is not enough for backwards-compatibility. Passing in an unhandled kwarg to an existing middleware will end up raising a TypeError. To preserve backwards-compatibility, we'd have to detect whether or not the middleware handlers accept the new kwargs, which seems like it could get a little ugly.

I have no experience with AmazonMQ so I can't help there. With other managed AWS services that have a maintenance window, I've generally turned it off and handled upgrading manually precisely because I wanted to avoid these sorts of unexpected service interruptions.

@jenstroeger
Copy link
Contributor Author

jenstroeger commented Aug 16, 2021

After this change, external brokers (like dramatiq_sqs or dramatiq-pg) will incorrectly report that "no action was taken" after ack/nack. At least those two should also be updated.

Yup, I had to adjust the existing Redis, Stub, and RabbitMQ brokers. I’m happy to provide a PR for those two brokers as well.

Also, I think that success=None should be interpreted as “Not known whether message was n/acked” instead of “No action was taken”.

Making the new ack/nack args keywords is not enough for backwards-compatibility. […] it could get a little ugly.

I agree, and it’s not a route I’d want to go. This PR is a breaking change and as per conventional commits would warrant a major version bump. I have a feeling that you may not be open to it at this point 😉 Perhaps this change is better suited together with other features for a v2 release?

I have no experience with AmazonMQ so I can't help there. With other managed AWS services that have a maintenance window, I've generally turned it off and handled upgrading manually precisely because I wanted to avoid these sorts of unexpected service interruptions.

I’ve contacted AWS twice now and raised this issue, precisely because their decision of a forced maintenance breaks their commitment of high availability. But even if they do ack (pun intended) the problem and set out to fixing it, I’m not going to wait around because the problem is happening now.

The alternative to stabilizing async workers is to abandon AmazonMQ/RabbitMQ in favor of another queue implementation. @Bogdanp, what’s your experience with SQS (considering you implemented dramatiq_sqs)?

@jenstroeger
Copy link
Contributor Author

Looks like v1.12 and its redelivered flag for RabbitMQ messages helps address the initial issue as well 🤓

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants