Fix orphaned spans on Celery workers #822
Comments
@timmc-edx: I wanted to bring your attention to Alex's note from Slack:
This might be additional info about things that are off on the spans you will be looking into. If not, you can ignore as far as this ticket is concerned, other than to report back your findings eventually. Thanks.
Numbers across resources
On prod LMS for the past 2 days, all
Key: [A] = always top-level; [S] = sometimes top-level, sometimes child
Numbers drilldown
Filtering on the most common top-level celery.apply over what I hope is a representative smaller time window in prod (no spikes):
So we have:
However, over a different time period that contained some spikes:
Here, the
Trace-level analysis
The transmit task has a recalculate span as parent. Analysis of this relationship over a recent time period:
Filed https://help.datadoghq.com/hc/en-us/requests/1877349 ("Orphaned spans on celery worker") with Datadog Support.
Answering the question "do other celery workers have this problem...
Results, filtered down to edX services:
So edxapp has multiple kinds of top-level spans, but the other workers have at most
We're seeing orphaned spans on several of our Celery workers, identifiable as service entry spans that are not operation_name:celery.run.
We noticed this because of missing code owner on root spans. We've restricted our monitors to just celery.run root spans, but we still want to fix this because about 10% of traces are broken.
Filed https://help.datadoghq.com/hc/en-us/requests/1877349 ("Orphaned spans on celery worker") with Datadog Support.
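For anyone reproducing the numbers offline, here is a rough sketch of the check described above: find service entry spans on a worker that are not celery.run, and estimate the share of traces affected. It assumes spans have been exported as plain dicts with the hypothetical keys trace_id, span_id, parent_id, service, and operation_name, and that span IDs are unique across the export. The service name example-worker is made up. This is not the Datadog API, just an illustration of the logic.

```python
EXPECTED_ROOT_OPERATION = "celery.run"  # per the issue: the expected entry span


def service_entry_spans(spans):
    """Yield spans whose parent is missing or belongs to a different service."""
    by_id = {s["span_id"]: s for s in spans}
    for span in spans:
        parent = by_id.get(span["parent_id"])
        if parent is None or parent["service"] != span["service"]:
            yield span


def orphan_report(spans, worker_service):
    """Return (suspect orphan spans, fraction of the worker's traces affected)."""
    entry = [s for s in service_entry_spans(spans) if s["service"] == worker_service]
    orphans = [s for s in entry if s["operation_name"] != EXPECTED_ROOT_OPERATION]
    all_traces = {s["trace_id"] for s in spans if s["service"] == worker_service}
    broken = {s["trace_id"] for s in orphans}
    fraction = len(broken) / len(all_traces) if all_traces else 0.0
    return orphans, fraction


# Hypothetical sample: trace 1 is healthy, trace 2 has a span whose parent is absent.
sample = [
    {"trace_id": 1, "span_id": 10, "parent_id": None,
     "service": "example-worker", "operation_name": "celery.run"},
    {"trace_id": 1, "span_id": 11, "parent_id": 10,
     "service": "example-worker", "operation_name": "celery.apply"},
    {"trace_id": 2, "span_id": 20, "parent_id": 99,
     "service": "example-worker", "operation_name": "celery.apply"},
]
orphans, fraction = orphan_report(sample, "example-worker")
```

Treating a span with an absent parent as a service entry span mirrors how the orphans show up in the query above. A real export may need to account for sampling or partial traces, which this sketch ignores.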