-
Notifications
You must be signed in to change notification settings - Fork 273
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Correlate client and orchestrator functions for DF .NET Isolated Distributed Tracing #2998
base: dev
Are you sure you want to change the base?
Conversation
@@ -154,8 +155,22 @@ public override Task<Empty> Hello(Empty request, ServerCallContext context) | |||
{ | |||
try | |||
{ | |||
ActivitySource activityTraceSource = new ActivitySource("DurableTask.WebJobs"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For context, the source name is "DurableTask.WebJobs" because we load the DurableTelemetryModule
and it looks for sources that start with "DurableTask" (related code link). I'm working on updating the listeners to add a new one ("WebJobs.Extensions.DurableTask") when we initialize the module so we can remove this "DurableTask.WebJobs" that's just used to test for now.
Activity? currActivity = Activity.Current; | ||
if (options == null && options?.ParentTraceId == null && currActivity != null) | ||
{ | ||
options = new StartOrchestrationOptions(ParentTraceId: currActivity?.Id?.ToString()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the ParentTraceId
information to the StartOrchestrationOptions
, but I'm wondering if there's a better place to store this information since StartOrchestrationOptions
can be configured by the customer and we don't necessarily need customers to have access to ParentTraceId
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to avoid having a new activity source and span added. We should try and having the trace ID captured from the worker directly added to the enqueued orchestration. I think to accomplish this we will want to hydrate a shim activity which has its ID set to the trace context from the gRPC payload.
if (newActivity != null) | ||
{ | ||
typeof(Activity).GetField("_traceId", BindingFlags.Instance | BindingFlags.NonPublic)?.SetValue(newActivity, traceId); | ||
typeof(Activity).GetField("_spanId", BindingFlags.Instance | BindingFlags.NonPublic)?.SetValue(newActivity, spanId); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used reflection here to set the Activity
's trace ID and span ID, but don't set the rehydrated Activity
s parent Ids since that information isn't passed through the CreateInstanceRequest
type which ends up not correlating the client and orchestration functions in the distributed trace. I think we need to pass additional information to successfully correlate the client and orchestrator functions.
As a reference, the code below is how we rehydrated an Activity
for .NET In-Proc in DTFx:
- Get the parent context which is saved in the
ExecutionStartedEvent
and set it as theparentContext
if (!startEvent.TryGetParentTraceContext(out ActivityContext activityContext))
...
Activity? activity = ActivityTraceSource.StartActivity(
activityName,
kind: activityKind,
parentContext: activityContext,
startTime: startTime);
- Set the rehydrated
Activity
with the originalActivity
s trace ID and span ID:
if (startEvent.ParentTraceContext.Id != null && startEvent.ParentTraceContext.SpanId != null)
{
activity.SetId(startEvent.ParentTraceContext.Id!);
activity.SetSpanId(startEvent.ParentTraceContext.SpanId!);
}
In the example above, we had the following information:
- ParentTraceContext to point to the parent
- Id for the rehydrated
Activity
- spanId for the rehydrated
Activity
With the .NET Isolated approach, we only have the last two items, not the information about its parent. Would it be worth updating CreateInstanceRequest
to have the following fields?
- TraceContext parentTraceContext**
- TraceContext currentTraceContext
TraceContext
also has a spanId
field which would be another approach to help pass this information since we could create the Id using the traceId (from the parent) and spanId, but it's deprecated. Is it possible to add that field back or create a new one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we might be over complicating things a bit. Instead of reconstructing the trace activity we got from the client using reflection, can we start a new trace activity and have the incoming trace context be its parent? Something like this (this code won't compile, but is just illustrative):
Activity? activity = ActivityTraceSource.StartActivity(
"gRPC create orchestration",
kind: activityKind,
parentContext: request.ExecutionStarted.ParentTraceContext,
startTime: DateTime.UtcNow);
I feel like this would be cleaner and would help capture the fact that there is a gRPC network hop between the language worker and our host extension.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't expect you to need the parent information here. The incoming trace-context is the parent information.
The scenario is, we are about to call TaskHubClient
to schedule an orchestration. In that call, TaskHubClient
is going to create a send_orchestration
activity with kind=client. At time of this activity creation, we want to make sure it has access to whatever span was active when the customer's code asked to schedule this orchestration. This is done by ensuring Activity.Current
is set where its id (traceId, spanId) are set to what we want the parent of the mentioned activity to be. This process should not look at Activity.Current
's parent information.
Creating a new Activity.Current
with traceId and spanId set should be sufficient. We shouldn't need parent information. If this isn't working, we may need debug into TaskHubClient
and its activity creation to see what is going on.
@@ -154,8 +156,34 @@ public override Task<Empty> Hello(Empty request, ServerCallContext context) | |||
{ | |||
try | |||
{ | |||
Activity? newActivity = new Activity("gRPC start orchestration"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this span name match the convention we're using for other spans?
@@ -154,8 +156,34 @@ public override Task<Empty> Hello(Empty request, ServerCallContext context) | |||
{ | |||
try | |||
{ | |||
Activity? newActivity = new Activity("gRPC start orchestration"); | |||
|
|||
string traceParentContext = request.ParentTraceContext.TraceParent.ToString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naming: Should this variable be named differently? It doesn't seem to be the trace context. Rather, it seems to be the traceparent value.
string traceParentContext = request.ParentTraceContext.TraceParent.ToString(); | |
string traceParentString = request.ParentTraceContext.TraceParent.ToString(); |
These changes help correlate client and orchestrator functions in DF .NET Isolated.
Gantt chart (shows HTTP trigger and orchestrator function in one trace)
Pull request checklist
pending_docs.md
release_notes.md
/src/Worker.Extensions.DurableTask/AssemblyInfo.cs