-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Retry flaky unified tests #1565
base: main
Are you sure you want to change the base?
Conversation
6282ce7
to
8e8d7a3
Compare
8e8d7a3
to
6f3b56d
Compare
d24e003
to
6b53427
Compare
public void shouldPassAllOutcomes( | ||
final String testName, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method contains the try-catch with the new retry logic.
86681ad
to
3b2c915
Compare
|
||
// TODO reasons for retry | ||
// Exception in encryption library: ChangeCipherSpec message sequence violation | ||
def.retry("TODO reason") | ||
.whenFailureContains("ChangeCipherSpec message sequence violation") | ||
.test("client-side-encryption", "namedKMS-createDataKey", "create datakey with named KMIP KMS provider"); | ||
|
||
def.retry("TODO reason") | ||
.whenFailureContains("Number of checked out connections must match expected") | ||
.test("load-balancers", "cursors are correctly pinned to connections for load-balanced clusters", "pinned connections are returned after a network error during a killCursors request"); | ||
|
||
def.retry("TODO reason") | ||
.test("client-side-encryption", "namedKMS-rewrapManyDataKey", "rewrap to kmip:name1"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are flaky failures from evergreen. Reasons should be specified, or these 3 retries can be removed and added in a separate PR, after examining the failures.
} | ||
if (!testDef.matchesThrowable(e)) { | ||
// if the throwable is not matched, test definitions were not intended to apply; rethrow it | ||
throw e; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested this by failing an assertion within the try of shouldPassAllOutcomes
. When the assertion message matches what is specified in a def, it will ignore, ignore, and then fail with that assertion failure. When it does not match, the first instance will fail, and the remaining will be ignored.
ce1918c
to
784dc51
Compare
784dc51
to
f62539e
Compare
@@ -156,32 +164,47 @@ public Entities getEntities() { | |||
} | |||
|
|||
@NonNull | |||
protected static Collection<Arguments> getTestData(final String directory) throws URISyntaxException, IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Below should be a straightforward inline, along with the changes to the test parameters.
@@ -101,6 +106,9 @@ public abstract class UnifiedTest { | |||
private static final Set<String> PRESTART_POOL_ASYNC_WORK_MANAGER_FILE_DESCRIPTIONS = Collections.singleton( | |||
"wait queue timeout errors include details about checked out connections"); | |||
|
|||
public static final int ATTEMPTS = 3; | |||
private static Set<String> ignoreRemaining = new HashSet<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] Should we name it succeededRetryableTests
? This would provide a bit more context about what kind of tests are being ignored.
if (!forceFlaky) { | ||
boolean ignoreThisTest = ignoreRemaining.contains(testName); | ||
assumeFalse(ignoreThisTest, "Skipping a retryable test that already succeeded"); | ||
ignoreRemaining.add(testName); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest adding to ignoreRemaining/succeededRetryableTests
only when a test succeeds. This could make the logic a bit easier to follow when making future changes or referring to the code, rather than adding it first and then removing it upon catching an exception.
For example, we could add this line before the catch block:
ignoreRemaining/succededRetryableTests.add(testName);
And place this at the beginning of the method:
if (!forceFlaky) { | |
boolean ignoreThisTest = ignoreRemaining.contains(testName); | |
assumeFalse(ignoreThisTest, "Skipping a retryable test that already succeeded"); | |
ignoreRemaining.add(testName); | |
} | |
boolean ignoreThisTest = succededRetryableTests.contains(testName) && !forceFlaky; | |
assumeFalse(ignoreThisTest, "Skipping a retryable test that already succeeded"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I remember considering this during implementation, but it has been a while. I might be mistaken about my original reasoning, but maybe the following makes sense.)
If we add it at the end then the implication is that all "exits" during execution are failures. For example, imagine we start a test, and during execution, it exits with an assumption failure. So there has been a successful attempt, even though there has been no successful test. Here, we should not retry the test (practically, we might waste setup time). Since it is a failure that triggers the retry, we change state when the failure happens.
I added a catch for assertion failures, and renamed the variable. Let me know what you think.
* message will be checked. Otherwise, the throwable will be checked. | ||
*/ | ||
public TestApplicator whenFailureContains(final String messageFragment) { | ||
this.matchesThrowable = (final Throwable e) -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, whenFailuteContains can be chained with skips that do not have the RETRY modifier.
To reduce the potential for misconfigurations, should we add an assertion?
assertTrue(this.modifiersToApply.contains(RETRY),
format("Modifier %s was not specified before calling whenFailureContains", RETRY));
For assertTrue
method with the message we would need to add it to com.mongodb.assertions.Assertions
.
JAVA-5795
See WRITING-28651
This creates N copies (attempts) of the unified tests, and places a try-catch around shouldPassAllOutcomes. If an attempt passes, ensuing attempts are ignored. Failures are ignored, except the last attempt.
This is not intended to fully resolve the issue, but is a low-cost fix that should reduce failing tests.