Tracking issue: intermittent test failures #200
Comments
Regarding the weird pypy nightly freeze in test_local: I downloaded the pypy-c-jit-91601-609a3cdf9cf7-linux64 nightly and have let it loop running the trio testsuite for the last few hours on my laptop, and I haven't been able to reproduce the problem so far. (Though I did get two failures in |
#119 is now fixed, I think / hope. Demoting it to "on the radar". |
Got fed up and fixed #140 :-) |
Freeze in |
Here's a weird one: https://travis-ci.org/python-trio/trio/jobs/298164123 It looks like in our test run for CPython 3.6.2 on macOS, one of our calls to the synchronous stdlib function |
Here's a new one, I guess somehow introduced by #358: a timeout test failing on windows because a 1 second sleep is measured to take just a tinnnnny bit less than 1 second. |
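One common way to keep this kind of duration check from tripping over timer or clock granularity is to allow a small slack below the nominal sleep time. A minimal sketch only: the helper name `slept_long_enough` and the 5% default slack are made up for illustration, and this is not necessarily how the trio test suite handles it.

```python
def slept_long_enough(measured: float, expected: float, slack: float = 0.05) -> bool:
    """Return True if `measured` is at most `slack` (relative) short of `expected`.

    Hypothetical helper: both the name and the default slack are assumptions.
    """
    return measured >= expected * (1.0 - slack)
```

With this check, a 1 second sleep that is measured as 0.9995 seconds still passes, while a sleep that returns after 0.5 seconds is still reported as a failure.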
The weird SSL failure happened again: https://travis-ci.org/python-trio/trio/jobs/311618077 Filed upstream as bpo-32219. Possibly for now we should ignore Edit: #365 is the PR for ignoring it. |
Another freeze on PyPy nightly in Same thing happened on Sept. 7, above: #200 (comment) Filed a bug: #379 |
Sigh: #447 |
There was a mysterious appveyor build failure here: #535 |
Strange PyPy nightly failure: https://travis-ci.org/python-trio/trio/jobs/391945139 Since it happened on master, I can't close/reopen a PR, but restarting the job produced the same effects. (I think someone restarted the job above and it finally worked: the job is now green.) |
Jenkins keeps eating pull requests with a segfault (but only sometimes). Looks like a bug in the immutables library, but I can't reproduce it locally, and I don't know how to get the core dump. |
Here's a log with the segfault on Jenkins: https://ci.cryptography.io/blue/rest/organizations/jenkins/pipelines/python-trio/pipelines/trio/branches/PR-575/runs/2/nodes/6/steps/33/log/?start=0 The crash-handler traceback shows it as happening on line 27 of self._data = immutables.Map() And Filed a bug upstream here: MagicStack/immutables#7 |
A segfault in the PyPy 3.6 nightly, apparently related to the faulthandler timeout firing in Reported it on the #pypy IRC channel anyway, though there's not much to go on yet. |
Another strange pypy 3.6 nightly faulthandler traceback: https://travis-ci.org/python-trio/trio/jobs/436962955 I don't really understand this one at all. |
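For anyone unfamiliar with the faulthandler timeouts mentioned in the last two reports: the stdlib `faulthandler` module can arm a watchdog that dumps every thread's traceback if the process is still running after a given number of seconds. The sketch below only illustrates that stdlib mechanism. The function name `dump_after_timeout` is made up, and how this project's CI actually arms its timeout is not shown in this thread.

```python
import faulthandler
import tempfile
import time


def dump_after_timeout(timeout: float, work_seconds: float) -> str:
    """Arm a faulthandler watchdog, "hang" for a while, and return what it dumped.

    Hypothetical helper for illustration only.
    """
    with tempfile.TemporaryFile() as log:
        # If `timeout` seconds pass before the cancel call below, faulthandler
        # writes the tracebacks of all threads to `log`. With exit=False the
        # process keeps running afterwards.
        faulthandler.dump_traceback_later(timeout, exit=False, file=log)
        try:
            time.sleep(work_seconds)  # stand-in for a test that hangs
        finally:
            faulthandler.cancel_dump_traceback_later()
        log.seek(0)
        return log.read().decode("utf-8", errors="replace")


# The "work" outlasts the watchdog, so a traceback dump is produced.
report = dump_after_timeout(timeout=0.2, work_seconds=1.0)
```

The dump only tells you where each thread was when the timeout fired, which is why a report like the one above can be hard to interpret on its own.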
What if we change it to
|
Sure, that might work too. We're just flailing in the dark here, so it's probably too optimistic to hope we can make rational fine-grained choices between similar options :-) |
Recently hit on GitHub Actions,
|
Here's an interesting example of a flaky test:
https://github.com/python-trio/trio/pull/1551/checks?check_run_id=720114354 |
Prompted by this random test failure: python-trio#200 (comment)
Mysterious hang in There aren't really any useful details in the log. I think the only place where we could hang is at the end of the nursery block waiting for the task to exit, and we do call |
I think I've seen this one a few times already. https://github.com/python-trio/trio/pull/1574/checks?check_run_id=739572585
|
Pytest has better warnings tools these days, so we could probably fix that
test pretty easily using some combination of pytest.warns and (IIRC) the .pop
method that takes a type of warning to search for.
…On Thu, Jun 4, 2020, 11:12 Quentin Pradet ***@***.***> wrote:
I think I've seen this one a few times already.
https://github.com/python-trio/trio/pull/1574/checks?check_run_id=739572585
___________________ test_warn_deprecated_no_instead_or_issue ___________________

recwarn_always = WarningsRecorder(record=True)

    def test_warn_deprecated_no_instead_or_issue(recwarn_always):
        # Explicitly no instead or issue
        warn_deprecated("water", "1.3", issue=None, instead=None)
>       assert len(recwarn_always) == 1
E       assert 5 == 1
E         +5
E         -1

/Users/runner/hostedtoolcache/Python/3.7.7/x64/lib/python3.7/site-packages/trio/tests/test_deprecate.py:46: AssertionError
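The assertion above counts every recorded warning, so it breaks as soon as anything unrelated warns during the call (presumably what the "5 == 1" is showing). A minimal stdlib-only sketch of the "search by warning type" idea follows. `DemoDeprecationWarning`, `noisy_deprecated_call` and `record_warnings_of` are hypothetical names, and the real fix would more likely use `pytest.warns(...)` or `recwarn.pop(...)` as suggested.

```python
import warnings


class DemoDeprecationWarning(FutureWarning):
    """Hypothetical stand-in for the project's own deprecation warning class."""


def noisy_deprecated_call():
    # An unrelated warning, e.g. from another library or a leaked resource.
    warnings.warn("unclosed something", ResourceWarning)
    # The warning the test actually cares about.
    warnings.warn("water is deprecated since 1.3", DemoDeprecationWarning)


def record_warnings_of(category, func):
    """Call func() and return only the recorded warnings of the given category."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record everything, even repeats
        func()
    return [w for w in caught if issubclass(w.category, category)]


matching = record_warnings_of(DemoDeprecationWarning, noisy_deprecated_call)
```

Asserting on `len(matching)` instead of the total count keeps the test focused on the deprecation warning and indifferent to whatever else happens to warn on a given CI run.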
|
test_open_tcp_listeners_backlog failed again on macOS 3.8: https://github.com/python-trio/trio/pull/1575/checks?check_run_id=738751745 |
New failure in the thread cache tests: #1604 |
Transient failure in |
test_pipes failed again on macOS 3.8: #1713 (comment) |
This failure is super weird, and different from the issue in #1170. The test essentially does:

    with move_on_after(1.0) as scope:
        async with await open_process(SLEEP(3600)) as proc:
            proc.terminate()
    assert not scope.cancelled_caught
    assert proc.returncode == -SIGTERM

...and then the test fails on the last line because

So it seems like somehow, the child process is dying from a SIGKILL. How could that be? The

    try:
        await self.wait()
    finally:
        if self._proc.returncode is None:
            self.kill()
            with trio.CancelScope(shield=True):
                await self.wait()

I don't think

So this seems to be one of those "that can't happen" errors... I don't know where this SIGKILL could be coming from.
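For readers wondering how the returncode check distinguishes the two signals: on POSIX, a child killed by a signal reports a return code equal to the negated signal number. Here is a small sketch using the synchronous stdlib `subprocess` module rather than trio's process API. The helper name `run_and_signal` is made up for illustration, and the behavior shown is POSIX-only.

```python
import signal
import subprocess
import sys


def run_and_signal(send):
    """Start a long-sleeping child, apply `send(proc)`, and return its exit status.

    Hypothetical helper; POSIX-only behavior is assumed.
    """
    proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(3600)"])
    try:
        send(proc)
        return proc.wait(timeout=30)
    finally:
        # Make sure no child is left behind if anything above went wrong.
        if proc.poll() is None:
            proc.kill()
            proc.wait()


terminated = run_and_signal(lambda p: p.terminate())  # sends SIGTERM on POSIX
killed = run_and_signal(lambda p: p.kill())  # sends SIGKILL on POSIX
```

On a POSIX system, `terminated` should equal `-signal.SIGTERM` and `killed` should equal `-signal.SIGKILL`, so seeing the SIGKILL value after only calling `terminate()` means something else delivered the kill.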
I think this one is just a too-aggressive timeout: #1715 |
|
macOS 3.8 |
Another macOS 3.8 |
Yet another macOS 3.8 I think Python 3.8 changed subprocess handling for macOS (and Linux), but I don't know the details nor why those changes would affect Trio. |
macOS 3.7 interactive failure: https://github.com/python-trio/trio/pull/1747/checks?check_run_id=1208098695 So at least we know that this isn't specific to 3.8 |
timeout on Windows builds occurring often again |
"Windows (3.8, x64, with IFS LSP)" test is often running endlessly with this loop:
|
It's the nature of I/O libraries like Trio that their test suites are prone to weird intermittent failures. But they're often hard to track down, and usually the way you encounter them is that you're trying to land some other unrelated feature and the CI randomly fails, so the temptation is to click "re-run build" and worry about it later.
This temptation must be resisted. If left unchecked, you eventually end up with tests that fail all the time for unknown reasons and no-one trusts them and it's this constant drag on development. Flaky tests must be eradicated.
But to make things extra fun, there's another problem: CI services genuinely are a bit flaky, so when you see a weird failure or lock-up in the tests then it's often unclear whether this is a bug in our code, or just some cloud provider having indigestion. And you don't want to waste hours trying to reproduce indigestion. Which means we need to compare notes across multiple failures. Which is tricky when I see one failure, and you see another, and neither of us realizes that we're seeing the same thing. Hence: this issue.
What to do if you see a weird test failure that makes no sense:
Visit this bug; it's #200, so it's hopefully easy to remember.
Check to see if anyone else has reported the same failure.
Either way, add a note recording what you saw. Make sure to link to the failed test log.
Special notes for specific CI services:
Issues we're monitoring currently
test_open_tcp_listeners_backlog: #200 (comment) (last seen: all the time; should be fixed by "Give up on measuring TCP socket backlog", #1601)