Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracing and log correlation without otel #43718

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

lmolkova
Copy link
Member

@lmolkova lmolkova commented Jan 7, 2025

The next PR in the series of client core tracing+logging improvements.

In this episode, we:

  • Implement fallback tracing which:
    • generates trace context - it will be used to correlate logs between each other
    • records spans as logs (if users enable corresponding logger)
  • Introduce InstrumentationContext - strongly typed container for trace context that's used to correlate logs with or without otel
  • Merge HttpInstrumentationPolicy with HttpLoggingPolicy. This is not strictly necessary, but allows to
    • merge similar configuration like URL sanitization and allowed headers (not yet done in this PR)
    • optimize code (e.g. do sanitization once)
    • keep things consistent (e.g. span attributes and logging keys)
    • it will be even more useful when we add metrics - we'd reuse the same duration and other things that we log today
  • Improves logging in HttpRetryPolicy and HttpRedirectPolicy
    • makes it similar to HTTP requests
    • consistent attributes following OTel semantic conventions (where possible)
    • enables correlation
  • Moves logging into instrumentation package

In next episodes:

  • Relationship between HttpLogOptions and InstrumetnationOptions?
  • HTTP spans: time-to-last-byte
  • Metrics

@lmolkova lmolkova changed the title Log based tracing without otel Tracing and log correlation without otel Jan 7, 2025
@lmolkova
Copy link
Member Author

lmolkova commented Jan 7, 2025

Fallback tracing benchmarks

FallbackTracingBenchmarks.fallbackTracerDisabled         avgt   15     0.499 ±  0.005  ns/op
FallbackTracingBenchmarks.fallbackTracerEnabledNoLogs    avgt   15    47.193 ±  2.478  ns/op
FallbackTracingBenchmarks.fallbackTracerEnabledWithLogs  avgt   15  1349.738 ± 46.289  ns/op

So with fallback tracing (no logs) enabled by default, we're going to create ~2 spans for client SDK code in common case (1 public API-surface and 1 HTTP), adding ~90ns to each client call along with some allocations which seems reasonable.

This time is essentially spent on generating random Ids and we'd probably get a comparable impact from x-ms-client-id thing.

@azure-sdk
Copy link
Collaborator

azure-sdk commented Jan 7, 2025

API change check

API changes are not detected in this pull request.

@lmolkova
Copy link
Member Author

lmolkova commented Jan 7, 2025

Champion scenario:

  var httpPipeline = new HttpPipelineBuilder()
          .policies(new HttpRetryPolicy(), new HttpRedirectPolicy(), new HttpInstrumentationPolicy(null, null))
          .build();

  // this would effectively be done by codegen in client libs (when they create a span)
  // but users can also provide context explicitly and we'll use it.
  // we also support creating this context from OTel spans
  RequestOptions requestOptions = new RequestOptions().setInstrumentationContext(createRandomContext());

  HttpRequest request = new HttpRequest(HttpMethod.GET, "http://www.microsoft.com")
          .setRequestOptions(requestOptions);

  httpPipeline.send(request);

logs with simulated error (formatted, verbose)

2025-01-06 19:04:44.865 [main] [DEBUG] io.clientcore.core.http.pipeline.HttpInstrumentationPolicy - 
{
  "trace.id": "a4d931fb2eb04346b9589424370887ad",
  "http.request.method": "GET",
  "http.request.body.size": 0,
  "span.id": "f913333fbc212845",
  "http.request.resend_count": 0,
  "url.full": "http://www.microsoft.com",
  "event.name": "http.request"
}

2025-01-06 19:04:44.866 [main] [DEBUG] core.tracing - 
{
  "library.name": "core",
  "library.version": "1.0.0-beta.2",
  "span.name": "GET",
  "span.duration": 5.2217,
  "span.id": "f913333fbc212845",
  "server.address": "www.microsoft.com",
  "trace.id": "a4d931fb2eb04346b9589424370887ad",
  "error.type": "java.net.UnknownHostException",
  "http.request.method": "GET",
  "span.kind": "CLIENT",
  "server.port": 80,
  "span.parent.id": "72d954337db64062",
  "url.full": "http://www.microsoft.com",
  "event.name": "span.ended"
}

2025-01-06 19:04:44.866 [main] [WARN] io.clientcore.core.http.pipeline.HttpInstrumentationPolicy - 
{
  "trace.id": "a4d931fb2eb04346b9589424370887ad",
  "http.request.method": "GET",
  "http.request.duration": 7.4057,
  "exception.type": "java.net.UnknownHostException",
  "exception.message": null,
  "http.request.body.size": 0,
  "span.id": "f913333fbc212845",
  "http.request.resend_count": 0,
  "url.full": "http://www.microsoft.com",
  "exception.stacktrace": "java.net.UnknownHostException\r\n\tat com.test/com.test.App.lambda$main$0(App.java:32)\r\n\tat [email protected]/io.clientcore.core.http.pipeline.HttpPipelineNextPolicy...",
  "event.name": "http.response"
}

2025-01-06 19:04:44.868 [main] [DEBUG] io.clientcore.core.http.pipeline.HttpRetryPolicy - 
{
  "retry.was_last_attempt": false,
  "trace.id": "a4d931fb2eb04346b9589424370887ad",
  "retry.delay": 824,
  "exception.type": "java.io.UncheckedIOException",
  "exception.message": "java.net.UnknownHostException",
  "retry.max_attempt_count": 3,
  "span.id": "72d954337db64062",
  "http.request.resend_count": 0,
  "exception.stacktrace": "java.io.UncheckedIOException: java.net.UnknownHostException\r\n\tat com.test/com.test.App.lambda$main$0(App.java:32)\r\n\tat ....",
  "event.name": "http.retry"
}

2025-01-06 19:04:45.694 [main] [DEBUG] io.clientcore.core.http.pipeline.HttpInstrumentationPolicy - 
{
  "trace.id": "a4d931fb2eb04346b9589424370887ad",
  "http.request.method": "GET",
  "http.request.body.size": 0,
  "span.id": "7f622788d28f645f",
  "http.request.resend_count": 1,
  "url.full": "http://www.microsoft.com",
  "event.name": "http.request"
}

2025-01-06 19:04:45.803 [main] [INFO] io.clientcore.core.http.pipeline.HttpInstrumentationPolicy - {
  "http.request.time_to_response": 108.7054,
  "http.response.body.size": 201253,
  "trace.id": "a4d931fb2eb04346b9589424370887ad",
  "http.request.method": "GET",
  "http.request.duration": 108.7196,
  "http.request.body.size": 0,
  "span.id": "7f622788d28f645f",
  "http.request.resend_count": 1,
  "url.full": "http://www.microsoft.com",
  "http.response.status_code": 200,
  "event.name": "http.response"
}

2025-01-06 19:04:45.803 [main] [DEBUG] core.tracing - {
  "library.name": "core",
  "library.version": "1.0.0-beta.2",
  "server.address": "www.microsoft.com",
  "span.name": "GET",
  "trace.id": "a4d931fb2eb04346b9589424370887ad",
  "span.duration": 108.9267,
  "http.request.method": "GET",
  "span.kind": "CLIENT",
  "server.port": 80,
  "span.parent.id": "f913333fbc212845",
  "span.id": "7f622788d28f645f",
  "http.request.resend_count": 1,
  "url.full": "http://www.microsoft.com",
  "http.response.status_code": 200,
  "event.name": "span.ended"
}

or with simulated error and redirect (formatted, info)

2025-01-06 19:21:08.735 [main] [WARN] io.clientcore.core.http.pipeline.HttpInstrumentationPolicy - 
{
  "trace.id": "dfc10b5460554c87ba9dc6c5a1fdda1c",
  "http.request.method": "GET",
  "http.request.duration": 6.5101,
  "exception.type": "java.net.UnknownHostException",
  "exception.message": null,
  "http.request.body.size": 0,
  "span.id": "58fea390ed94a20b",
  "http.request.resend_count": 0,
  "url.full": "http://microsoft.com",
  "event.name": "http.response"
}

2025-01-06 19:21:09.648 [main] [INFO] io.clientcore.core.http.pipeline.HttpInstrumentationPolicy - 
{
  "http.request.time_to_response": 121.9558,
  "http.response.body.size": 0,
  "trace.id": "dfc10b5460554c87ba9dc6c5a1fdda1c",
  "http.request.method": "GET",
  "http.request.duration": 121.9866,
  "http.request.body.size": 0,
  "span.id": "0ef927d10d392086",
  "http.request.resend_count": 1,
  "url.full": "http://microsoft.com",
  "http.response.status_code": 301,
  "event.name": "http.response"
}

2025-01-06 19:21:09.885 [main] [INFO] io.clientcore.core.http.pipeline.HttpInstrumentationPolicy - 
{
  "http.request.time_to_response": 236.4778,
  "http.response.body.size": 201253,
  "trace.id": "dfc10b5460554c87ba9dc6c5a1fdda1c",
  "http.request.method": "GET",
  "http.request.duration": 236.4934,
  "http.request.body.size": 0,
  "span.id": "3016d2e453fc9874",
  "http.request.resend_count": 1,
  "url.full": "https://www.microsoft.com/",
  "http.response.status_code": 200,
  "event.name": "http.response"
}

So new RequestOptions().setInstrumentationContext(createRandomContext()); brings correlation to all HTTP logs (including retries and redirects) in this pipeline.
It works with and without otel.

We can easily extend it to support implicit context propagation (which already works with otel).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved to HttpInstrumentationLoggingTests

@@ -314,27 +311,6 @@ public void tracingIsDisabledOnInstance() throws IOException {
assertEquals(0, exporter.getFinishedSpanItems().size());
}

@Test
public void tracingIsDisabledOnRequest() throws IOException {
Copy link
Member Author

@lmolkova lmolkova Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed disabling individual requests - we didn't advertise this feature in azure-core and I don't recall anyone asking for it.
We also don't support it in other languages.

@lmolkova lmolkova force-pushed the log-based-tracing-without-otel branch from bcc6c9c to 4db1075 Compare January 11, 2025 02:37
@lmolkova lmolkova marked this pull request as ready for review January 11, 2025 02:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants