
Telemetry duration of async resolvers blown out by other resolvers #758

Closed
garthk opened this issue Jul 13, 2019 · 9 comments

@garthk
Contributor

garthk commented Jul 13, 2019

Split from #757:

the durations for asynchronous fields are for how long it took Absinthe's process to get the result, not how long it took to produce the result. I think the duration reported for an async field can blow out by up to the duration of the longest sync field present in the same query.

that's a fair point, but other than fields that use the async helper it's probably the right call. Consider something like batch or dataloader:

batch({SomeModule, :my_batch_fun}, id, fn results ->
  {:ok, Map.get(results, id)}
end)

The field isn't resolved when the batch function completes; it's resolved when the field finishes resolving, which requires {:ok, Map.get(results, id)} to run. That inner part can be arbitrarily complex and take an arbitrary amount of time, which should be counted against it.
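For comparison, a dataloader-based field has the same shape, with the on_load callback as the equivalent inner part. A rough sketch, assuming a dataloader source named :db is set up in the resolution context, Absinthe.Resolution.Helpers is imported, and SomeSchema stands in for a real Ecto schema:

resolve fn _parent, %{id: id}, %{context: %{loader: loader}} ->
  loader
  |> Dataloader.load(:db, SomeSchema, id)
  |> on_load(fn loader ->
    # This callback runs when the field finishes resolving, not when the
    # batched load completes, so its time also counts against the field.
    {:ok, Dataloader.get(loader, :db, SomeSchema, id)}
  end)
end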

Agreed.

I'm still concerned people could get misled by the telemetry and any tracing built on it if the scheduling of the inner part can get delayed by other fields, and that delay is also counted against it. Does batch behave differently, somehow?

async doesn't really have an after-the-fact sync part, but at the same time it's also one of the least commonly used helpers, since batch and dataloader are also effectively async while allowing for improved efficiency.
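For reference, the async helper wraps the entire resolver body in a task. A minimal sketch, assuming Absinthe.Resolution.Helpers is imported and do_expensive_work/0 is a stand-in for the real work:

resolve fn _parent, _args, _resolution ->
  async(fn ->
    # All of the user work happens inside the task; unlike batch, there's no
    # separate inner function that runs later during resolution.
    {:ok, do_expensive_work()}
  end)
end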

If it's not fixable, and if batch doesn't suffer the same problem, should we actively encourage people to use batch?

@garthk
Contributor Author

garthk commented Jul 14, 2019

should we actively encourage people to use batch?

😂 it's literally in @benwilson512 and @bruce's book:

[screenshot of the relevant passage from the book]

… it's just not mentioned in the @moduledoc for either Absinthe.Resolution.Helpers or Absinthe.Middleware.Async. Asking for clarity in #absinthe-graphql.

@garthk
Contributor Author

garthk commented Jul 14, 2019

I've tried batch. Same problem with telemetry results getting pushed out by sync fields, unless I got the pattern wrong:

  defmodule TestSchema do
    use Absinthe.Schema

    object :field_resolution_timing do
      field :begin_ms, :integer
      field :delay_ms, :integer
      field :end_ms, :integer
      field :label, :integer
    end

    query do
# …
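      # Batch pattern under test: the batch function does the delaying, and the
      # trailing anonymous function pulls this field's result out of its output.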
      field :delay_batch, :field_resolution_timing do
        arg :delay_ms, :integer

        resolve fn _, %{delay_ms: delay_ms}, _ ->
          ref = make_ref()

          batch({__MODULE__, :batch_delay_and_report}, {ref, delay_ms}, fn results ->
            {:ok, results[ref]}
          end)
        end
      end
    end

    def batch_delay_and_report(_, args) do
      IO.inspect(args, label: "batch_delay_and_report args")
      Map.new(args, fn {ref, delay_ms} -> {ref, delay_and_report(:delay_batch, delay_ms)} end)
    end
# …
  end

I didn't pick up enough emphasis on “when the thing you're executing isn't sensibly batched”, which is the usual case when I have to hit REST APIs to get the job done.

Back to the bug, though: I think it's fair to say that for async and batch, the duration you get on [:absinthe, :resolve, :field] can get blown out by time spent on other fields, including:

  • fields that don't use async, batch, or dataloader

  • the inner part on batch fields, which can take up arbitrary time

If so, we need to document it somewhere so people trying to rely on the measurement know what it's measuring.
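For anyone wanting to see that measurement directly, here's a rough sketch of attaching a handler to the event; the :duration measurement and :resolution metadata keys are assumptions, so check the telemetry docs for the Absinthe version you're running:

:telemetry.attach(
  "log-absinthe-field-duration",
  [:absinthe, :resolve, :field],
  fn _event, measurements, metadata, _config ->
    # For async and batch fields this duration includes time spent waiting on
    # other fields, not just the time the resolver code itself took.
    ms = System.convert_time_unit(measurements.duration, :native, :millisecond)
    IO.inspect({metadata.resolution.definition.name, ms}, label: "field resolved")
  end,
  nil
)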

@binaryseed
Contributor

I'm not quite sure what alternative you are trying to suggest here...

Telemetry instrumentation tracks the time from start to finish of resolving a field. We shouldn't subtract time from that based on what kind of operation happened between start and finish.

If you need more detail about what's going on between start and finish, then that code should be traced as well. Those other resolvers will also show up as spans that should overlap in time.

I think of tracing as measuring code execution, so it's going to match what happens in Absinthe's code, not the shape of the GraphQL query or our mental representation of what "counts" as part of the duration.

@garthk
Contributor Author

garthk commented Jul 24, 2019

I'm not sure we have any alternative other than to document it: the duration measures how long it took Absinthe to get to the result, not how long it took your code to get the result, and if you want to know the latter you'll need to measure it yourself.
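A rough sketch of measuring the latter yourself, inside the resolver, reusing the delay_and_report/2 helper from the test schema above and a made-up event name:

resolve fn _, %{delay_ms: delay_ms}, _ ->
  # Time only the resolver's own work and report it separately from
  # Absinthe's [:absinthe, :resolve, :field] duration.
  {micros, result} = :timer.tc(fn -> delay_and_report(:delay_async, delay_ms) end)
  :telemetry.execute([:my_app, :resolver, :work], %{duration: micros}, %{field: :delay_async})
  {:ok, result}
end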

Given a longer horizon, more telemetry along the lines of #757 could plausibly provide the latter for the user, but I reckon it'd be so full of corner cases that any such telemetry should be left to the authors of whatever's doing the heavy lifting.

Speaking of #757: if you could straighten me out on that last question about the execution order, it'll totally let me cancel the PR and get out of your hair.

@benwilson512
Contributor

Perhaps down the line the best model would be to invert the approach: start a span when Absinthe's resolution loop starts doing work, then end it right before it calls user code. Then if you overlaid that against the other spans you'd be able to tell that other work was happening.
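To make that concrete, a purely hypothetical sketch of the inverted instrumentation; none of these event names or functions exist in Absinthe today:

defmodule HypotheticalResolutionLoop do
  def call_resolver(resolution, user_fun) do
    started = System.monotonic_time()

    # ... Absinthe's own bookkeeping for this field would happen here ...

    # Close the framework span right before user code runs, so overlaying it
    # against the field spans shows where other work was happening.
    :telemetry.execute(
      [:absinthe, :resolution_loop, :overhead],
      %{duration: System.monotonic_time() - started},
      %{resolution: resolution}
    )

    user_fun.(resolution)
  end
end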

@benwilson512
Contributor

Anyone up for documenting this?

@garthk
Contributor Author

garthk commented Aug 15, 2019

#770 is about as much as I can slap together this week, but it's a start.

@binaryseed
Contributor

OK to close this now?

@garthk
Contributor Author

garthk commented Sep 15, 2019

Yeah, between this and the docs anyone confused can find out what's going on.

@garthk garthk closed this as completed Sep 15, 2019