AMQP queues die when the rabbit default timeout is reached #83
I see. At first glance this looks more like a networking issue. Since it's an older version of the client, the first step would be to update the dependency; you can do that by just bumping the explicit version if there was no change in the client API. I think the client uses an auto-reconnect mechanism by default, so this should be transparent, but it's true that connection handling could be improved in the library as well. Which version are you using?
I've seen some issues for Don't know if they solved everything? If I read them correctly, the idea might be that the consumer is not recreated, which might need manual intervention.
I've had a look, but it's not entirely clear what the best course of action is. I would suggest that we use Orleans' first-class citizens and replicate something like this: replacing the NoOpStreamDeliveryFailureHandler in here: What do you think?
So if I understand correctly, I would just enable a setter here and you could provide your own handler, since handlers like the one in the example above just store the failed entry in designated storage, where you can analyze what went wrong with the event.
Now that I've looked closer, I see what you mean. I was thinking that, rather than persisting the error, we could use this hook to recreate the connection from scratch.
I'll take a look at how to do this reconnection.
Awesome, thanks!
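A minimal sketch of that idea, written in Python for brevity rather than C#. Every name here (`ReconnectingFailureHandler`, `on_delivery_failure`, `connect`, `FakeConnection`) is hypothetical and the signature is simplified; this is not the Orleans `IStreamFailureHandler` interface or the library's actual RabbitMQ client API. It only shows the shape of a handler that, instead of persisting the failed entry, closes the broken connection and opens a fresh one.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Protocol


class Connection(Protocol):
    """Hypothetical stand-in for an AMQP connection (or channel)."""

    def close(self) -> None: ...


@dataclass
class ReconnectingFailureHandler:
    """Hypothetical delivery-failure hook that rebuilds the connection.

    Storage-backed handlers record the failed event for later analysis;
    this one assumes a delivery failure means the connection is unusable.
    """

    connect: Callable[[], Connection]  # factory that opens a fresh connection
    connection: Optional[Connection] = None
    reconnects: int = 0

    def on_delivery_failure(self, stream_id: str, sequence_token: int) -> None:
        # Drop the (possibly dead) connection instead of storing the event.
        if self.connection is not None:
            try:
                self.connection.close()
            except Exception:
                pass  # closing an already-broken connection may itself fail
        self.connection = self.connect()
        self.reconnects += 1


class FakeConnection:
    """In-memory connection used only to exercise the handler."""

    def __init__(self) -> None:
        self.is_open = True

    def close(self) -> None:
        self.is_open = False
```

For example, `ReconnectingFailureHandler(connect=FakeConnection)` replaces whatever connection it holds each time a delivery fails. A real handler would also need to re-declare consumers and probably back off between attempts, which this sketch leaves out.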
Still figuring this out. Looking more closely, the exception happened on the 3rd retry of the BasicGet command, and auto-ack is set to false. My guess is that you had a long-running task or an error in the app, so the message was never acked. Alternatively, I could add a queue option to auto-ack events, but then messages could be lost if an error occurred. Since the default timeout is 30 minutes, an unacked message can trigger this error and halt the whole stream system in Orleans. This is also because BasicGet and BasicAck have to happen in sequence, and missing one can break it. I'm adding the StreamFailureHandler, but for now it only logs warnings.
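To make the failure mode concrete, here is a toy, in-memory model in Python. It assumes the broker behaviour described in this thread: a delivery fetched with a manual-ack BasicGet must be acked within the timeout, otherwise the channel is closed and the unacked message is requeued. `FakeBroker`, `FakeChannel`, `Receiver` and `poll` are hypothetical names, the timeout is in seconds here, and nothing below is the real RabbitMQ client API.

```python
import time
from collections import deque
from typing import Callable, Iterable, Optional, Tuple


class ChannelClosed(Exception):
    """Raised when the simulated broker has closed the channel."""


class FakeBroker:
    """Toy broker: one queue plus a delivery-acknowledgement timeout (seconds)."""

    def __init__(self, messages: Iterable[str], ack_timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.queue = deque(messages)
        self.ack_timeout = ack_timeout
        self.clock = clock

    def open_channel(self) -> "FakeChannel":
        return FakeChannel(self)


class FakeChannel:
    """Manual-ack channel: each BasicGet delivery must be acked in time."""

    def __init__(self, broker: FakeBroker) -> None:
        self.broker = broker
        self.unacked = {}  # delivery tag -> (body, delivered_at)
        self.next_tag = 1
        self.is_open = True

    def _ensure_open(self) -> None:
        now = self.broker.clock()
        expired = any(now - t > self.broker.ack_timeout
                      for _, t in self.unacked.values())
        if self.is_open and expired:
            # Broker-side reaction: close the channel, requeue unacked messages.
            self.is_open = False
            self.broker.queue.extendleft(
                reversed([body for body, _ in self.unacked.values()]))
            self.unacked.clear()
        if not self.is_open:
            raise ChannelClosed("delivery acknowledgement timed out")

    def basic_get(self) -> Optional[Tuple[int, str]]:
        self._ensure_open()
        if not self.broker.queue:
            return None
        tag = self.next_tag
        self.next_tag += 1
        body = self.broker.queue.popleft()
        self.unacked[tag] = (body, self.broker.clock())
        return tag, body

    def basic_ack(self, tag: int) -> None:
        self._ensure_open()
        del self.unacked[tag]


class Receiver:
    """Hypothetical queue receiver that recreates its channel after a timeout."""

    def __init__(self, broker: FakeBroker) -> None:
        self.broker = broker
        self.channel = broker.open_channel()
        self.reopened = 0

    def poll(self, handle: Callable[[str], None]) -> Optional[str]:
        try:
            got = self.channel.basic_get()
            if got is None:
                return None
            tag, body = got
            handle(body)
            self.channel.basic_ack(tag)  # get and ack must pair up on one channel
            return body
        except ChannelClosed:
            # Without this, every later basic_get on the dead channel fails too.
            self.channel = self.broker.open_channel()
            self.reopened += 1
            return None
```

If a handler runs past the 30-minute timeout, the ack fails and the old channel stays closed for good; only the freshly opened channel can continue. The requeued message is then delivered a second time, which is the at-least-once trade-off that manual acks buy over auto-ack.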
I've noticed through experimentation that this issue only happens when I run more than one server in the Orleans cluster. Have you successfully run this over a long period with more than one?
It also happens with one silo, after the silo has been running for a long time.
We've seen situations where the queue gets clogged up and occasionally hits RabbitMQ's default 30-minute message timeout. When this happens, all queue handlers die and stop processing messages. This never seems to recover:
I'm happy to look into recovering from such a scenario and submit a PR if you point me in the right direction.
I'm sure something more fundamental in our application is causing this, but it would be nice for the library to recover and to have the ability to alter the timeout.