Investigate nightly benchmarks 0 events/s issue #13738
Comments
Status update
The first thing I looked at was what the failing benchmark runs reported. Here are 2 links to the benchmark runs:
Both of these show
I have tried reproducing the errors locally but haven't succeeded. (Note that the expvar metrics collection is designed for benchtimes in minutes, so if testing locally make sure the benchtime is long enough for the expvar metrics to work correctly.) I did see some special handling in the expvar metric collection, but nothing that explains this bug. I have also created a PR to log errors from the expvar endpoint, which was not done before. I am not sure how helpful it will be, though. |
Is this still happening? |
@simitt I had this happen to me in a run on GH Actions last week, see Slack Thread: https://elastic.slack.com/archives/C95SB62AG/p1729263104854879 |
moving this out of iteration, to backlog. If it happens again more frequently we can reprioritize. |
@raultorrecilla it happens very infrequently. For reference, I've probably run 300+ benchmarks since Oct, and have only observed this 3 or 4 times. |
edit: there seems to be a real issue with main |
resolved in #15338 |
Reopening as it is happening quite frequently lately. |
moving it as a candidate for next iteration (it-107) |
IMO we should treat this with urgency, ideally one of the on-support duty engineers could look into it. I believe @1pkg started investigation on it as part of support duty last week and could hand over. |
Let's wait for #15360 (comment) to be resolved and see if this still happens afterwards. |
See #15360 (comment). There might be more to this. The expvar endpoint is working fine, and when running the benchmarks locally the numbers are reported correctly (events is not 0). |
started a benchmark targeting main without the last 3 commits and it seems to work: https://github.com/elastic/apm-server/actions/runs/13012315722 tip at 1e021db |
started another benchmark targeting main without the last commit and it works too: https://github.com/elastic/apm-server/actions/runs/13012675235/job/36293952533 tip at 8aac9e3 |
OK, we're running the main branch with and without moxy. With moxy, everything works and events are not 0. The commit/tip is exactly the same. |
I can reproduce this. Specifically, in the case of 0 events/s (without moxy), the ESS ES has all apm documents indexed. There is no error. It is the (truncated output)
|
@kruskall and I were trying to narrow down the root cause. Here's the evidence of why I believe the regression was caused by #15360 :
|
should be fixed by #15439 benchmarks using the pr branch are working fine |
PR merged. Closing this |
Nightly benchmarks occasionally report 0 events/s. Investigate the root cause of it.