fix(celery): close celery.apply spans even without after_task_publish, when using apply_async [backport 2.12]
#10891
Merged
…sh, when using apply_async (#10676)

The instrumentation for the Celery integration relies on several [Celery signals](https://docs.celeryq.dev/en/stable/userguide/signals.html) to start and end the span when calling `apply_async`. The integration can fail if the expected signals don't fire, which can lead to broken context propagation (and unexpected traces).

**Example:**
- dd-trace-py expects the `before_task_publish` signal to start the span and the `after_task_publish` signal to close it. If `after_task_publish` is never sent (which can happen if a Celery exception occurs while processing the app), the span never finishes.
- The same thing can also happen with `task_prerun` and `task_postrun`.

**Solution**

This PR patches `apply_async` so that it checks whether a span is still lingering and closes it when `apply_task` is called. If an internal exception happens, the error is marked on the `celery.apply` span.

To track this, I added new logs in debug mode:

> The after_task_publish signal was not called, so manually closing span

and

> The task_postrun signal was not called, so manually closing span

There's a related PR #10848 that improves how we extract information based on the protocols, which also affects whether spans get closed.

Special Thanks:
- Thanks to @tabgok for going through this with me in great detail!
- @timmc-edx for helping us track it down!
[APMS-13158]

## Checklist
- [x] PR author has checked that all the criteria below are met
  - The PR description includes an overview of the change
  - The PR description articulates the motivation for the change
  - The change includes tests OR the PR description describes a testing strategy
  - The PR description notes risks associated with the change, if any
  - Newly-added code is easy to change
  - The change follows the [library release note guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
  - The change includes or references documentation updates if necessary
  - Backport labels are set (if [applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist
- [x] Reviewer has checked that all the criteria below are met
  - Title is accurate
  - All changes are related to the pull request's stated goal
  - Avoids breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes
  - Testing strategy adequately addresses listed risks
  - Newly-added code is easy to change
  - Release note makes sense to a user of the library
  - If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
  - Backport labels are set in a manner that is consistent with the [release branch maintenance policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)

APMS-13158

[APMS-13158]: https://datadoghq.atlassian.net/browse/APMS-13158?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ

---------

Co-authored-by: Emmett Butler <[email protected]>
(cherry picked from commit 0d28e08)
wantsui approved these changes on Oct 1, 2024
LGTM
Datadog Report
Branch report: ✅ 0 Failed, 59982 Passed, 2267 Skipped, 11h 17m 17.45s Total duration (6h 6m 20.71s time saved)
Backport 0d28e08 from #10676 to 2.12.