MDC context is lost when calling asynchronous operations with executor #2142
Comments
@getaceres I'm not familiar with MDC context; I'll refer this to the Java SDK team and ask them to take a look.
Hi @getaceres,
Netty is being used here because it's the default async HTTP client the SDK uses for async service clients. The … (embedded code: lines 203 to 211 in commit dc695de)
In this case, it does not mean that the SDK will use the executor to do any of the work required to execute the request. We see that a Netty thread is submitting the task onto the …
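A dependency-free simulation of this point may help. It is a sketch under stated assumptions, not SDK code: a `ThreadLocal<Map<String, String>>` named `CTX` stands in for SLF4J's MDC (with SLF4J the equivalents would be `MDC.getCopyOfContextMap()` and `MDC.setContextMap(...)`), a single-thread executor stands in for a Netty event-loop thread, and `copying(...)` is a hypothetical MDC-copying executor of the kind used in the proof of concept. Because the hand-off to the copying executor happens on the I/O thread, the snapshot is taken there, where the context is empty, so the continuation sees no correlation ID even though the caller set one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CompletionContextDemo {
    // Stand-in for SLF4J's MDC: one context map per thread (assumption for this sketch).
    static final ThreadLocal<Map<String, String>> CTX = ThreadLocal.withInitial(HashMap::new);

    // Hypothetical "MDC-copying" executor: snapshots the context of the thread that
    // calls execute() and installs it on the worker thread while the task runs.
    static Executor copying(Executor delegate) {
        return task -> {
            Map<String, String> snapshot = new HashMap<>(CTX.get()); // taken on the submitting thread
            delegate.execute(() -> {
                Map<String, String> previous = CTX.get();
                CTX.set(snapshot);
                try {
                    task.run();
                } finally {
                    CTX.set(previous);
                }
            });
        };
    }

    /** Returns the correlation ID seen by the continuation, or null if the context was lost. */
    static String correlationIdSeenByCallback() {
        ExecutorService ioThread = Executors.newSingleThreadExecutor(); // stands in for a Netty event loop
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Executor completionExecutor = copying(pool);
            CTX.get().put("correlationId", "req-42"); // set on the calling thread

            CompletableFuture<String> response = new CompletableFuture<>();
            // Register the continuation first, so it runs on whichever thread completes the future.
            CompletableFuture<String> seen = response.thenApply(body -> CTX.get().get("correlationId"));
            // The "I/O thread" receives the response and hands completion off to the copying
            // executor; the snapshot is therefore taken on the I/O thread, where CTX is empty.
            ioThread.execute(() -> completionExecutor.execute(() -> response.complete("body")));
            return seen.join();
        } finally {
            CTX.remove();
            ioThread.shutdown();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(correlationIdSeenByCallback()); // prints "null"
    }
}
```

In other words, a copying executor can only propagate the context of whichever thread submits to it; when that thread is an I/O thread, there is nothing to copy.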
@getaceres, marking this to auto-close soon; let us know if you have more questions.
@dagnir, is there any way to propagate the MDC context to logs that are logged from within the Netty event loop? I have a Spring Boot application where I want all AWS SDK logs to have trace IDs on them, but I can't figure out any way to do this.
@debora-ito, is it possible to continue the discussion here, or is it better to open a new issue? There's a nearly identical issue, #1613, but it's also closed.
Basically, the issue is that since AWS SDK v2 uses Netty NIO, all thread-locals are lost after any async SDK method completes. That includes the Spring Security context, the MDC, servlet request attributes, or anything else the developer might have put into a thread-local. Unfortunately, there seems to be no way to tell the SDK to preserve this context, or to extend the SDK in a way that makes it possible. This makes it really hard to use AWS SDK v2 in a Spring MVC environment without resorting to migrating to Kotlin coroutines or some hacky workarounds.
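One common workaround, not taken from this thread and offered only as a hedged sketch, is to capture the context on the calling thread when the continuation is built and re-install it when the continuation runs, regardless of which thread completes the future. `CTX` is a `ThreadLocal` stand-in for SLF4J's MDC and `withCallerContext` is a hypothetical helper name. Note that this only covers your own continuation code; it does not add the context to log statements the SDK itself emits on Netty threads.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

public class CallSiteContext {
    // ThreadLocal stand-in for SLF4J's MDC (assumption for this sketch).
    static final ThreadLocal<Map<String, String>> CTX = ThreadLocal.withInitial(HashMap::new);

    /** Hypothetical helper: wraps fn so it runs with the context captured now, on the calling thread. */
    static <T, R> Function<T, R> withCallerContext(Function<T, R> fn) {
        Map<String, String> captured = new HashMap<>(CTX.get());
        return value -> {
            Map<String, String> previous = CTX.get();
            CTX.set(new HashMap<>(captured));
            try {
                return fn.apply(value);
            } finally {
                CTX.set(previous); // restore whatever the completing thread had before
            }
        };
    }

    static String correlationIdInContinuation() {
        ExecutorService ioThread = Executors.newSingleThreadExecutor(); // stands in for a Netty I/O thread
        try {
            CTX.get().put("correlationId", "req-42");
            // Wrap at call time, while the caller's context is still present.
            Function<String, String> readId = withCallerContext(body -> CTX.get().get("correlationId"));
            CompletableFuture<String> response = new CompletableFuture<>();
            // Register before completion so the continuation runs on the "I/O" thread.
            CompletableFuture<String> seen = response.thenApply(readId);
            ioThread.execute(() -> response.complete("body"));
            return seen.join();
        } finally {
            CTX.remove();
            ioThread.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(correlationIdInContinuation()); // prints "req-42"
    }
}
```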
Is there some reason why this was closed? This is still a real issue.
I went ahead and opened a new issue, #5242, as I think this one is not monitored on account of it being closed.
Describe the bug
When executing asynchronous, non-blocking operations, the MDC context is lost after receiving the result, even though an executor that copies the MDC context is used to run the operation.
A proof of concept can be found at https://github.com/getaceres/aws-mdc-test
There are two classes: one with a ThreadPool that copies the MDC context between calls, and another with the test. The test first executes some fictitious work in the executor to check that the MDC context is preserved, then executes the SQS getQueueUrl, createQueue and receiveMessages operations, blocking on each with a join, to check that the MDC context is preserved after obtaining the result.
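The proof of concept's executor is not reproduced here; the following is a hypothetical sketch of what an MDC-copying executor typically looks like. To keep it runnable without dependencies, a `ThreadLocal<Map<String, String>>` named `CTX` stands in for SLF4J's MDC (with SLF4J you would use `MDC.getCopyOfContextMap()` and `MDC.setContextMap(...)`). The snapshot is taken inside `execute()`, on the submitting thread, and installed on the worker for the duration of the task, which is why work submitted directly from the caller sees the caller's correlation ID.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical MDC-copying executor, sketched with a ThreadLocal stand-in for SLF4J's MDC. */
public class ContextCopyingExecutor implements Executor {
    static final ThreadLocal<Map<String, String>> CTX = ThreadLocal.withInitial(HashMap::new);

    private final Executor delegate;

    public ContextCopyingExecutor(Executor delegate) {
        this.delegate = delegate;
    }

    @Override
    public void execute(Runnable task) {
        // Capture the submitter's context now, on the submitting thread.
        Map<String, String> snapshot = new HashMap<>(CTX.get());
        delegate.execute(() -> {
            Map<String, String> previous = CTX.get();
            CTX.set(snapshot); // with SLF4J: MDC.setContextMap(snapshot)
            try {
                task.run();
            } finally {
                CTX.set(previous); // don't leak this task's context into the next one on a pooled thread
            }
        });
    }

    /** Runs "fictitious work" on the pool from a caller that has a correlation ID. */
    static String correlationIdSeenByTask() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Executor executor = new ContextCopyingExecutor(pool);
            CTX.get().put("correlationId", "req-42");
            // supplyAsync calls executor.execute(...) from this (the caller's) thread.
            return CompletableFuture
                    .supplyAsync(() -> CTX.get().get("correlationId"), executor)
                    .join();
        } finally {
            CTX.remove();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(correlationIdSeenByTask()); // prints "req-42"
    }
}
```

Restoring the worker's previous context in `finally` keeps one task's correlation ID from leaking into the next task that reuses the pooled thread.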
The problematic part is executing the receiveMessages operation in a non-blocking way: the MDC context is lost, so from the response onward the correlation identifier is gone for good for the rest of the processing chain.
Expected Behavior
The provided executor is used, so the MDC context is preserved.
Current Behavior
Looking at the log:
It seems that a Netty executor is used to execute the HTTP calls, even when netty-nio-client is not on the classpath. Once the result is returned, the MDC context is lost.
Steps to Reproduce
Download and run the project at https://github.com/getaceres/aws-mdc-test
Possible Solution
Right now the only workaround that I can think of is to always use blocking calls, which negates the advantage of the asynchronous API.
Another possibility is to use the provided executor to execute the HTTP calls as well.
A reactive API could also help, since it can receive an executor at publishing/subscription time and use it in a non-blocking way.
Context
I'm getting a series of SQS messages from a queue. They contain an attribute with a correlation identifier that was generated by the sending service to identify the whole transaction between the different services involved in a business operation for a user. For debugging and troubleshooting purposes, this correlation identifier is used in every subsequent log operation related to the business transaction.
This works for blocking calls to SQS, even when further work is done asynchronously using an executor similar to the one provided in the example. However, we ran into this problem when we tried to move to the asynchronous API.
Your Environment