Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add trace support into task #55

Open
wants to merge 8 commits into
base: main
Choose a base branch
from
Open

Add trace support into task #55

wants to merge 8 commits into from

Conversation

kaibocai
Copy link
Member

@kaibocai kaibocai commented Jan 4, 2024

This PR add support to support for injection of otel spans into tasks.
Try to resolve #31
protobuf update: microsoft/durabletask-protobuf#21

User can get current span by

span := trace.SpanFromContext(ctx.Context())

or create child span from current span

_, childSpan := tracer.Start(ctx.Context(), "activity-subwork")

The output trace looks like
image

@kaibocai kaibocai requested a review from cgillum January 4, 2024 21:47
@kaibocai
Copy link
Member Author

kaibocai commented Jan 4, 2024

tagging @ItalyPaleAle for review.

@@ -3,7 +3,7 @@

// Code generated by protoc-gen-go. DO NOT EDIT.
// versions:
// protoc-gen-go v1.31.0
// protoc-gen-go v1.28.1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note protoc-gen-go was downgraded. Not sure if it matters

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this. different versions of protoc-gen-go are used by developers to generate protobuf code, I think as long as tests run well, it should be fine. Will update my local protoc-gen-go version though.

@kaibocai kaibocai marked this pull request as draft January 5, 2024 23:31
task/activity.go Outdated
@@ -52,6 +54,10 @@ type activityContext struct {
type Activity func(ctx ActivityContext) (any, error)

func newTaskActivityContext(ctx context.Context, taskID int32, ts *protos.TaskScheduledEvent) *activityContext {
ctx, err := helpers.ContextFromTraceContext(ctx, ts.ParentTraceContext)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect that the ctx argument already has the correct trace context, being set much earlier than this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, for examples running not through the grpc we did have the trace context, but if we run it through grpc we are missing it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can actually set the context in processWorkItem itself:

func (c *TaskHubGrpcClient) processActivityWorkItem(

@@ -94,7 +94,7 @@ func (c *TaskHubGrpcClient) processActivityWorkItem(
executor backend.Executor,
req *protos.ActivityRequest,
) {
var tc *protos.TraceContext = nil // TODO: How to populate trace context?
var tc *protos.TraceContext = req.ParentTraceContext
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need more help understanding why we need to pass ParentTraceContext on this request object instead of just getting it directly from the gRPC context. Is it because work items are received on a stream and gRPC doesn't automatically populate any distributed trace context?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think

  1. the grpc context cannot directly used to pass through key-value pair (span is usually stored as key-val pair in context in go), instead it needs a bit more work https://www.hward.com/golang-grpc-context-client-server/.
  2. we obtain the stream at
    stream, err := c.client.GetWorkItems(ctx, &req)
    , at this point we don't have any trace context set up in the context, the context basically is an empty context. The stream should not have any trace info therefore.

I tried a local run to assert this
This is the context we build up on grpc server side
image

This is the context at

ctx context.Context,

image

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copying from here: #31 (comment)

Agree @kaibocai. I looked into it as well. We can easily propagate context for a normal grpc request(non-streaming) using otel interceptors(e.g. here) but for streaming requests, context is propagated at the start only i.e. when the stream is created. Once the stream creation is done, then since no context is passed during stream.send() method, and because we can't update stream context, the newer context(updated after stream creation) is never passed to client/server.

Here in DTF-Go as well at the time of stream creation, there is nothing in context(since no orchestration/acitivity are created yet). We have the updated context(with all span info) when activity/orchestration workitem needs to be sent to the app but this context is not passed while sending the workItem. So best way to do it is to encode trace context in the message itself.

Copy link
Contributor

@shivamkm07 shivamkm07 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add ParentTraceContext in OrchestratorRequest as well as well, similar to activityRequest? It would enable users to create spans/use trace informations in orchestrations as well. thoughts @kaibocai @cgillum

type OrchestratorRequest struct {

@@ -68,6 +68,8 @@ func (p *activityProcessor) ProcessWorkItem(ctx context.Context, wi WorkItem) er
}()
}

// set the parent trace context to be the newly created activity span
ts.ParentTraceContext = helpers.TraceContextFromSpan(span)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to reset the ParentTraceContext here as the newly created context above already has the span information. It can be extracted in the ExecuteActivity itself.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I still need to set the ParentTraceContext as I need to populate it here

ParentTraceContext: task.ParentTraceContext,

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can actually get span from context itself: https://pkg.go.dev/go.opentelemetry.io/otel/trace#SpanFromContext and then get traceContext from span, but i guess it's easier updating the event itself.

My issue was updating the event was that it should't cause any issue in case of errors/retries(like recursively adding a new span under activity span), but it doesn't seem anything like that happening for now, so no issues.

task/activity.go Outdated
@@ -52,6 +54,10 @@ type activityContext struct {
type Activity func(ctx ActivityContext) (any, error)

func newTaskActivityContext(ctx context.Context, taskID int32, ts *protos.TaskScheduledEvent) *activityContext {
ctx, err := helpers.ContextFromTraceContext(ctx, ts.ParentTraceContext)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can actually set the context in processWorkItem itself:

func (c *TaskHubGrpcClient) processActivityWorkItem(

@@ -207,6 +210,56 @@ func Test_SingleActivity(t *testing.T) {
)
}

func Test_SingleActivity_TaskSpan(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be a test added in https://github.com/microsoft/durabletask-go/blob/main/tests/grpc/grpc_test.go as well to validate we are correctly passing the context information over grpc stream.

@kaibocai
Copy link
Member Author

Should we add ParentTraceContext in OrchestratorRequest as well as well, similar to activityRequest? It would enable users to create spans/use trace informations in orchestrations as well.

My thought is that in orchestration users are simply calling APIs we provided to do their orchestration, it seems to me that there is not much need to create child spans inside orchestration. All the operations will be simple and the span we generated here should be enough.

ctx, span := helpers.StartNewCreateOrchestrationSpan(ctx, req.Name, req.Version.GetValue(), instanceID)

main.go Outdated
@@ -23,6 +28,17 @@ func main() {
// Parse command-line arguments
flag.Parse()

// Tracing can be configured independently of the orchestration code.
tp, err := ConfigureZipkinTracing()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be under some flag like use-zipkin or some other name? In case user is using any collector other than zipkin?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change was the wrong commit, I will revert it. Thanks for catching it. This PR will not include changes for setting up the exporter, it only includes changes for injecting trace info to activity.

@kaibocai kaibocai marked this pull request as ready for review January 12, 2024 21:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for injection of otel spans into tasks for enhancement
4 participants