WIP: Add async task wrapper for trace propagation. #757
Conversation
It feels a little dicey, but I've been playing around with using an ETS table to store the pid of the process doing the resolution & the span context, then using:

```elixir
defmodule AbsintheUtils.OpenCensus.OcTaskContext do
  @moduledoc """
  Stores a span_ctx in an ETS table, to be looked up via $callers.

  Useful for Dataloader.
  """
  use Task

  alias __MODULE__.EtsTable

  def create_ets_table do
    :ets.new(EtsTable, [
      :named_table,
      :set,
      :public,
      write_concurrency: true
    ])
  end

  def set(span_ctx) do
    :ets.insert(EtsTable, {self(), span_ctx})
  end

  def unset do
    :ets.delete(EtsTable, self())
  end

  def get do
    Process.get(:"$callers", [])
    |> Enum.find_value(fn pid ->
      case :ets.lookup(EtsTable, pid) do
        [] -> nil
        [{_, ctx}] -> ctx
      end
    end)
  end
end
```

I have absinthe phases that run after/before … When I need to find the span_ctx, … That ETS table grows forever if …
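For context, the intended usage is roughly the sketch below: the resolving process registers its span context, a Task spawned from it reads the context back via `$callers`, and the entry is deleted afterwards so the table doesn't grow. The `:ocp` calls are from the opencensus library; the wiring around them is illustrative glue, not code from this PR.

```elixir
# Hedged usage sketch of OcTaskContext (assumptions noted above).
alias AbsintheUtils.OpenCensus.OcTaskContext

# Once, at application start:
OcTaskContext.create_ets_table()

# In the process doing the resolution:
OcTaskContext.set(:ocp.current_span_ctx())

# In a Task spawned from that process (Task sets :"$callers" for us):
task =
  Task.async(fn ->
    OcTaskContext.get() |> :ocp.with_span_ctx()
    # ... traced work here ...
  end)

Task.await(task)

# Back in the resolving process, so the table doesn't grow forever:
OcTaskContext.unset()
```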
I'd forgotten about `Process.info(pid) |> get_in([:dictionary, :oc_span_ctx_key])`. You'll get …
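If I follow, that amounts to something like the sketch below: walk `$callers` and peek at each caller's dictionary with `Process.info/2`. The `:oc_span_ctx_key` key is the one named just above; the surrounding plumbing is an illustrative assumption, not code from this thread.

```elixir
# Hedged sketch: find a caller with an OpenCensus span context in its
# process dictionary, without any ETS bookkeeping.
span_ctx =
  Process.get(:"$callers", [])
  |> Enum.find_value(fn pid ->
    case Process.info(pid, :dictionary) do
      {:dictionary, dict} ->
        case List.keyfind(dict, :oc_span_ctx_key, 0) do
          {:oc_span_ctx_key, ctx} -> ctx
          nil -> nil
        end

      nil ->
        # caller has already exited
        nil
    end
  end)
```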
Zooming back out: while I could plausibly have each resolver function rummage through its …
@garthk I went with the ETS table because I was under the impression that it was expensive to get at another process's dictionary. Specifically, I had seen: https://github.com/erlang/otp/blob/master/erts/doc/src/communication.xml#L61-L74

I was specifically trying to add some tracing around Ecto. I have not benchmarked it, though; maybe it's not much of a performance hit. With accessing the process dictionary directly, I'm also not sure whether we have to be concerned about any performance implications if one of the …
This is exactly what the Telemetry integration is for. It's in … If there are more fields needed (the field start event is a little bare right now), then we can talk about getting those passed to the Telemetry callbacks.
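For anyone following along, hooking tracing into that integration looks roughly like the sketch below. The event names, handler module, and metadata shape are assumptions about the 1.5 telemetry integration rather than anything confirmed in this thread; check the released integration for the actual events.

```elixir
# Hedged sketch: attach handlers to Absinthe's field-resolution telemetry
# events and start/finish a span in each. Event names are assumed, not
# confirmed here.
defmodule MyApp.AbsintheTracing do
  def attach do
    :telemetry.attach_many(
      "my-app-absinthe-tracing",
      [
        [:absinthe, :resolve, :field, :start],
        [:absinthe, :resolve, :field, :stop]
      ],
      &__MODULE__.handle_event/4,
      nil
    )
  end

  def handle_event([:absinthe, :resolve, :field, :start], _measurements, _metadata, _config) do
    # start a span for the field described in the metadata
    :ok
  end

  def handle_event([:absinthe, :resolve, :field, :stop], _measurements, _metadata, _config) do
    # finish the span for the field described in the metadata
    :ok
  end
end
```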
@binaryseed, I've been working with a clone of your telemetry implementation against 1.4 and haven't yet managed to solve the whole problem. I trust you'll be able to spot whatever I'm missing, set me straight, and there'll prove to be no need for this awful PR? We've got two broad challenges with tracing in 1.4:
Your …
If you can straighten me out on one part of the resolution mechanism (see below), it'll also prove sufficient for:
That'd still leave:
For propagation using the process dictionary, we need to either update every resolution function or replace Absinthe's … For propagation using …

With either, I still don't understand the Absinthe resolution mechanism well enough to be confident we can set either the pdict or trace label in the blueprint process to reflect the field we're working on at the time the resolver function starts. You know it inside out. Is that possible, given only the existing phase and middleware extension points, and the 1.5 telemetry extension points?
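Since the middleware extension point comes up here, a sketch of that route is below, reusing the OcTaskContext module from the earlier comment. The module name and where it would sit in the middleware stack are assumptions, and whether it runs at the right moment for each field is exactly the open question above.

```elixir
# Hedged sketch: an Absinthe middleware that publishes the resolving
# process's span context so spawned tasks can find it via :"$callers".
# Placement relative to other middleware is an assumption, not a claim
# about how Absinthe's resolution pipeline sequences this.
defmodule MyApp.Middleware.PublishSpanCtx do
  @behaviour Absinthe.Middleware

  alias AbsintheUtils.OpenCensus.OcTaskContext

  @impl true
  def call(%Absinthe.Resolution{} = resolution, _opts) do
    OcTaskContext.set(:ocp.current_span_ctx())
    resolution
  end
end
```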
Also, if I'm reading the results for the failing test in 65dfa8a right, the durations for asynchronous fields are for how long it took Absinthe's process to get the result, not how long it took to produce the result. I think the duration reported for an async field can blow out by up to the duration of the longest sync field present in the same query.
@garthk that's a fair point, but other than fields that use the …
The field isn't resolved when the batch function completes; it's resolved when the field finishes resolving, which requires …
I don't think targeting 1.4 makes sense. If there are things we can improve with respect to tracing in 1.5, great, let's work on that. But 1.5 is so close to being releasable that I don't want to distract the core team with significant 1.4 reworks.
I agree with @benwilson512 here; there will be no compelling reasons to stick with v1.4 once v1.5 is out (especially in light of some significant performance optimizations and the fact that v1.5 is largely backwards-compatible), and there aren't any plans to backport features, just changes relating to security.
Oh, agreed. I'm not targeting 1.4. I have 1.4 in prod, I'm describing the problems I'm having with it, and I'm being specific because I don't want to mislead anyone. I'm fine concentrating on 1.5. If we can solve the trace propagation in 1.5 with the current telemetry design, I'll be thrilled. Hence all the Q&A. Well, mostly Q. If I use an …
OK to close this now?
Yep. If time permits, I'll swing back around later with a more generic way to hook all of Absinthe's execution of user-supplied code, so all of us instrumentation authors can share the load of figuring out how to hook in. Better yet, we can upstream it to Absinthe once it's stable. Unlikely to be this month or next, unless things get hairy and I need to upgrade the introspection to lock it down.
I tried a few times to use middleware and extra phases to propagate trace information into the processes running my async resolvers, but couldn't. That leaves me with either wrapping each resolver myself, or intervening before Absinthe runs `Task.async/1`. If there's a cleaner way, I'm all ears. This method feels clumsy, and I don't yet know if it'll break with data loaders (opencensus-beam/opencensus_absinthe#21) or other parts of the ecosystem.
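As a concrete version of "intervening before Absinthe runs `Task.async/1`", a wrapper like the one below captures the caller's span context and restores it inside the spawned task. The module name is hypothetical; the `:ocp` functions are from the opencensus library, and this is a sketch of the approach rather than the code in this PR.

```elixir
# Hedged sketch: wrap Task.async/1 so the spawned process inherits the
# caller's OpenCensus span context. Resolvers would call this instead of
# Task.async/1 directly (or Absinthe's async helper would be pointed at it).
defmodule MyApp.TracedTask do
  def async(fun) when is_function(fun, 0) do
    span_ctx = :ocp.current_span_ctx()

    Task.async(fn ->
      :ocp.with_span_ctx(span_ctx)
      fun.()
    end)
  end
end
```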