Consider starting the renewal of messages in DTFx.Core as soon as they are fetched #1150
Comments
I am picking this up. Will try to raise a PR in a couple of days.
@davidmrdavid How I was looking at it was:
Questions:
Thanks for your interest here, @sig5!
Some of these design decisions pre-date me, to be honest, so I don't know why the design ended up like this. I do know the
That being said, I strongly recommend that we don't introduce any such refactoring when working on this thread's feature proposal - that will only make the PR harder to merge. In general, we don't modify the dispatcher classes very often, so it's unclear to me that we'll gain much from a simplified design of them, as these files are "rarely modified" as far as I can tell. That can be a separate discussion :-) .
Other than that, your sketch of the PR seems reasonable at a high level, but we'll have to look at the details to be certain. Excited to see your PR!
@davidmrdavid, thanks for the clarification. Yeah, I get it. I was just curious whether a different school of thought is being followed here 😄 for optimization/readability. I have published a PR with contained changes and an open question here: #1168.
Closing this issue because I think fundamentally the problem is with DT.AzureStorage and not DT.Core.
This is true in the case of DurableTask.AzureStorage. However, it's not generally true with DurableTask.Core. DT.Core will only attempt to fetch a message when there is sufficient concurrency available. See WorkItemDispatcher.cs for reference. This means that there's no scenario where DT.Core fetches a message but can't process it because of concurrency limitations.
The problem with DT.AzureStorage is that it internally prefetches messages. DT.Core is actually not aware of this at all. It's therefore entirely the responsibility of DT.AzureStorage to renew its prefetched messages. We need to fix this, but the scope of the problem is different from what's described in this issue. We can use another issue to track the DT.AzureStorage problem.
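The "fetch only when concurrency is available" behaviour described above can be sketched as a toy model. This is not the code in WorkItemDispatcher.cs; `GatedDispatcher`, `fetch`, `process`, and `demo` are hypothetical names used only for illustration, and Python is used purely as a compact modelling language.

```python
import asyncio


class GatedDispatcher:
    """Toy dispatcher that takes a concurrency slot *before* fetching an item."""

    def __init__(self, fetch, process, max_concurrent):
        self._fetch = fetch        # hypothetical: async () -> item, or None when drained
        self._process = process    # hypothetical: async (item) -> None
        self._slots = asyncio.Semaphore(max_concurrent)
        self.in_flight = 0
        self.peak_in_flight = 0

    async def run(self):
        tasks = []
        while True:
            await self._slots.acquire()      # wait for capacity first...
            item = await self._fetch()       # ...and only then fetch
            if item is None:
                self._slots.release()
                break
            self.in_flight += 1
            self.peak_in_flight = max(self.peak_in_flight, self.in_flight)
            tasks.append(asyncio.create_task(self._run_one(item)))
        await asyncio.gather(*tasks)

    async def _run_one(self, item):
        try:
            await self._process(item)
        finally:
            self.in_flight -= 1
            self._slots.release()            # free the slot for the next fetch


def demo(max_concurrent=2, item_count=5):
    pending = list(range(item_count))
    processed = []

    async def fetch():
        return pending.pop(0) if pending else None

    async def process(item):
        await asyncio.sleep(0.01)            # stand-in for real work
        processed.append(item)

    async def main():
        dispatcher = GatedDispatcher(fetch, process, max_concurrent)
        await dispatcher.run()
        return dispatcher.peak_in_flight

    return asyncio.run(main()), sorted(processed)
```

Because a slot is held before each fetch, the number of fetched-but-unfinished items never exceeds `max_concurrent`. A backend that prefetches behind the `fetch` call sits outside this gate, which is the situation described for DT.AzureStorage.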
In DTFx.Core, the method RenewTaskOrchestrationWorkItemLockAsync is used to ensure a given worker maintains exclusivity over a given partition message. For example, in the Azure Storage backend, this method renews the "message visibility timeout" so that the message does not get dequeued again, at least until the extended visibility timeout expires.
This renewal flow is invoked only while the message is being processed, which has a very specific meaning: we have not exceeded the "maxConcurrentOrchestrations" / "maxConcurrentActivities" limit, and therefore have enough capacity to process more messages.
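To make the visibility-timeout mechanics concrete, here is a minimal in-memory model with a hand-advanced clock. It is a sketch, not the Azure Storage SDK: `ToyQueue` and its methods are hypothetical, and as a simplification a renewal keeps the same pop receipt.

```python
import itertools


class ToyQueue:
    """Minimal model of a queue with visibility timeouts and pop receipts."""

    def __init__(self, visibility_timeout):
        self.visibility_timeout = visibility_timeout
        self.now = 0.0                    # fake clock, advanced by the caller
        self._messages = {}               # msg_id -> [body, visible_at, pop_receipt]
        self._receipts = itertools.count(1)

    def put(self, msg_id, body):
        self._messages[msg_id] = [body, 0.0, None]

    def dequeue(self):
        """Return (msg_id, pop_receipt) for the first visible message, else None."""
        for msg_id, entry in self._messages.items():
            if entry[1] <= self.now:
                entry[1] = self.now + self.visibility_timeout  # hide the message
                entry[2] = next(self._receipts)                # each dequeue issues a new receipt
                return msg_id, entry[2]
        return None

    def renew(self, msg_id, pop_receipt):
        """Extend the visibility timeout; rejects a stale pop receipt."""
        entry = self._messages[msg_id]
        if entry[2] != pop_receipt:
            raise ValueError("stale pop receipt")
        entry[1] = self.now + self.visibility_timeout


def demo():
    q = ToyQueue(visibility_timeout=30)
    q.put("m1", "hello")
    msg_id, receipt = q.dequeue()             # t=0: hidden until t=30
    q.now = 20
    q.renew(msg_id, receipt)                  # hidden until t=50
    q.now = 40
    hidden_after_renew = q.dequeue() is None  # still invisible thanks to the renewal
    q.now = 60                                # renewal window has lapsed
    redelivered = q.dequeue()                 # same message, different receipt
    return hidden_after_renew, redelivered, receipt
```

In this model, the renewal at t=20 keeps the message hidden at t=40, but once renewals stop the message is delivered again with a new receipt, so the holder of the old receipt can no longer renew it.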
This means that a message may be received by a given worker, but not become processable for a long time if the active orchestrators/activities match their "max concurrent" settings and are long-running. In that time, since we're not actively extending the message's visibilityTimeout, it is possible for the message to become visible again (possibly being dequeued by the same worker that already has that message!), therefore changing its popReceipt, which in turn prevents us from successfully processing the copy of the message with the old popReceipt. This can lead to a cascade of errors.
I believe the framework-level fix to this is to start renewing messages as soon as they're fetched/received, not just when they're being processed. This may require some refactoring in DTFx.Core's WorkItemDispatcher class, so it needs to be done with care.