-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dt.CreateInstance stored proc is causing deadlocks (proposed ROLLBACK change that fixes the issue) #217
Comments
Hi @microrama, were you able to confirm whether adding |
Yes Chris.
I would easily reproduce the issue with 2 processes trying to read from DB
and create instances, say 200. Once the rollback was added, it was
beautiful and even 5000 worked without issues.
…On Mon, May 6, 2024 at 6:47 PM Chris Gillum ***@***.***> wrote:
Hi @microrama <https://github.com/microrama>, were you able to confirm
whether adding ROLLBACK TRANSACTION before the THROW truly resolves the
deadlocks that you encountered? I ask because I didn't expect this to be
necessary since we hadn't actually changed any data at this point in the
stored procedure.
—
Reply to this email directly, view it on GitHub
<#217 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AG3U2LVQUCUC6HVVJG7CGITZBAB77AVCNFSM6AAAAABHJRCCDGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAOJXGA2TGOBZGU>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
@microrama thanks for confirming! By the way, we accept pull requests from users. If you would like to submit this as a small pull request, we would happily accept it and give you appropriate credit in the Git history and release notes. |
Thanks so much Chris!
I will do a pull request tomorrow!!
…On Mon, May 6, 2024 at 7:26 PM Chris Gillum ***@***.***> wrote:
@microrama <https://github.com/microrama> thanks for confirming! By the
way, we accept pull requests from users. If you would like to submit this
as a small pull request, we would happily accept it and give you
appropriate credit in the Git history and release notes.
—
Reply to this email directly, view it on GitHub
<#217 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AG3U2LXDPFHYG7SRXNFMM7TZBAGUDAVCNFSM6AAAAABHJRCCDGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAOJXGA4DSNJZG4>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Chris - I submitted a pull request - #218 |
Thanks @microrama! The change looks good to me. Please sign the CLA and I'll go ahead and merge your PR. |
@cgillum any chance to get a release :) |
@bhugot yes, I can work on that. |
@cgillum i found other transaction with throw not rollbacked |
@bhugot feel free to submit a PR for the other case you found. I can include it in the next release. |
Problem:
Dt.CreateInstance store proc does not ROLLBACK the transaction (line 36) when it fails to create a new orchestration since another process/thread has already created it. This happens when there are multiple processes that are trying to create orchestration instances over the same source data. This was easily recreated when multiple processes are trying to create, say 500 orchestration instances based on the data in a SQL table.
The current stored proc code throws an exception (THROW 50001, @msg, 1; ) when it encounters that another process already created the orchestration instance but it does rollback the "Begin Transaction" which is perhaps causing the deadlocks and SQL timeout exceptions in the code.
The deadlock this caused was identified using the deadlock reports from the database and the report has been provided below along with the .NET exception stack trace.
Solution:
Dt.CreateInstance stored procedure with the ROLLBACK when it detects duplicate orchestration instance before throwing the exception that fixed the deadlock issues.
…….
BEGIN TRANSACTION
DECLARE @existingStatus varchar(30) = (
SELECT TOP 1 existing.[RuntimeStatus]
FROM Instances existing WITH (HOLDLOCK)
WHERE [TaskHub] = @taskhub AND [InstanceID] = @instanceid
)
-- Instance IDs can be overwritten only if the orchestration is in a terminal state
--IF @existingStatus IN ('Pending', 'Running')
IF @existingStatus IN (SELECT value FROM STRING_SPLIT(@DedupeStatuses, ','))
BEGIN
ROLLBACK TRANSACTION;
THROW 50001, @msg, 1;
END
ELSE IF @existingStatus IS NOT NULL
……………..
Exception StackTrace
"message": [
"TaskOrchestrationDispatcher-25904bb427294445a049682c104c412a-0: Failed to fetch a work-item: Microsoft.Data.SqlClient.SqlException (0x80131904): Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding.\r\n ---> System.ComponentModel.Win32Exception (258): The wait operation timed out.\r\n at Microsoft.Data.SqlClient.SqlCommand.<>c.b__209_0(Task
1 result)\r\n at System.Threading.Tasks.ContinuationResultTaskFromResultTask
2.InnerInvoke()\r\n at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)\r\n--- End of stack trace from previous location ---\r\n at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)\r\n--- End of stack trace from previous location ---\r\n at DurableTask.SqlServer.SqlUtils.WithRetry[T](Func1 func, SprocExecutionContext context, LogHelper traceHelper, String instanceId, Int32 maxRetries)\r\n at DurableTask.SqlServer.SqlUtils.WithRetry[T](Func
1 func, SprocExecutionContext context, LogHelper traceHelper, String instanceId, Int32 maxRetries) in /_/src/DurableTask.SqlServer/SqlUtils.cs:line 504\r\n at DurableTask.SqlServer.SqlUtils.ExecuteSprocAndTraceAsync[T](DbCommand command, LogHelper traceHelper, String instanceId, Func2 executor) in /_/src/DurableTask.SqlServer/SqlUtils.cs:line 454\r\n at DurableTask.SqlServer.SqlOrchestrationService.LockNextTaskOrchestrationWorkItemAsync(TimeSpan receiveTimeout, CancellationToken cancellationToken) in /_/src/DurableTask.SqlServer/SqlOrchestrationService.cs:line 135\r\n at DurableTask.Core.WorkItemDispatcher
1.DispatchAsync(WorkItemDispatcherContext context)\r\nClientConnectionId:22a53d4c-5250-463f-96c0-bb08165b71a7\r\nError Number:-2,State:0,Class:11"],
Deadlock Report
DELETE FROM Instances WHERE [TaskHub] = @taskhub AND [InstanceID] IN (SELECT [InstanceID] FROM @InstanceIDs
EXEC dt.PurgeInstanceStateByID @instancesToPurg
Proc [Database Id = 12 Object Id = 1922978077]
DELETE FROM Instances WHERE [TaskHub] = @taskhub AND [InstanceID] IN (SELECT [InstanceID] FROM @InstanceIDs
EXEC dt.PurgeInstanceStateByID @instancesToPurg
Proc [Database Id = 12 Object Id = 1922978077]
The text was updated successfully, but these errors were encountered: