Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry flaky unified tests #1565

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

katcharov
Copy link
Contributor

@katcharov katcharov commented Nov 20, 2024

JAVA-5795

See WRITING-28651

This creates N copies (attempts) of the unified tests, and places a try-catch around shouldPassAllOutcomes. If an attempt passes, ensuing attempts are ignored. Failures are ignored, except the last attempt.

This is not intended to fully resolve the issue, but is a low-cost fix that should reduce failing tests.

@katcharov katcharov requested review from a team and vbabanin and removed request for a team November 20, 2024 22:30
@katcharov katcharov force-pushed the JAVA-5393-unified-retries branch from 6282ce7 to 8e8d7a3 Compare November 21, 2024 18:17
@katcharov katcharov marked this pull request as draft November 21, 2024 18:17
@katcharov katcharov force-pushed the JAVA-5393-unified-retries branch from 8e8d7a3 to 6f3b56d Compare November 21, 2024 21:10
@katcharov katcharov force-pushed the JAVA-5393-unified-retries branch from d24e003 to 6b53427 Compare December 2, 2024 22:27
Comment on lines 330 to +338
public void shouldPassAllOutcomes(
final String testName,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method contains the try-catch with the new retry logic.

@katcharov katcharov force-pushed the JAVA-5393-unified-retries branch from 86681ad to 3b2c915 Compare December 5, 2024 23:48
Comment on lines +43 to +55

// TODO reasons for retry
// Exception in encryption library: ChangeCipherSpec message sequence violation
def.retry("TODO reason")
.whenFailureContains("ChangeCipherSpec message sequence violation")
.test("client-side-encryption", "namedKMS-createDataKey", "create datakey with named KMIP KMS provider");

def.retry("TODO reason")
.whenFailureContains("Number of checked out connections must match expected")
.test("load-balancers", "cursors are correctly pinned to connections for load-balanced clusters", "pinned connections are returned after a network error during a killCursors request");

def.retry("TODO reason")
.test("client-side-encryption", "namedKMS-rewrapManyDataKey", "rewrap to kmip:name1");
Copy link
Contributor Author

@katcharov katcharov Dec 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are flaky failures from evergreen. Reasons should be specified, or these 3 retries can be removed and added in a separate PR, after examining the failures.

}
if (!testDef.matchesThrowable(e)) {
// if the throwable is not matched, test definitions were not intended to apply; rethrow it
throw e;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested this by failing an assertion within the try of shouldPassAllOutcomes. When the assertion message matches what is specified in a def, it will ignore, ignore, and then fail with that assertion failure. When it does not match, the first instance will fail, and the remaining will be ignored.

@katcharov katcharov force-pushed the JAVA-5393-unified-retries branch from ce1918c to 784dc51 Compare December 6, 2024 00:24
@katcharov katcharov force-pushed the JAVA-5393-unified-retries branch from 784dc51 to f62539e Compare February 4, 2025 22:02
@@ -156,32 +164,47 @@ public Entities getEntities() {
}

@NonNull
protected static Collection<Arguments> getTestData(final String directory) throws URISyntaxException, IOException {
Copy link
Contributor Author

@katcharov katcharov Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Below should be a straightforward inline, along with the changes to the test parameters.

@katcharov katcharov marked this pull request as ready for review February 5, 2025 14:11
@@ -101,6 +106,9 @@ public abstract class UnifiedTest {
private static final Set<String> PRESTART_POOL_ASYNC_WORK_MANAGER_FILE_DESCRIPTIONS = Collections.singleton(
"wait queue timeout errors include details about checked out connections");

public static final int ATTEMPTS = 3;
private static Set<String> ignoreRemaining = new HashSet<>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] Should we name it succeededRetryableTests? This would provide a bit more context about what kind of tests are being ignored.

Comment on lines 350 to 354
if (!forceFlaky) {
boolean ignoreThisTest = ignoreRemaining.contains(testName);
assumeFalse(ignoreThisTest, "Skipping a retryable test that already succeeded");
ignoreRemaining.add(testName);
}
Copy link
Member

@vbabanin vbabanin Feb 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest adding to ignoreRemaining/succeededRetryableTests only when a test succeeds. This could make the logic a bit easier to follow when making future changes or referring to the code, rather than adding it first and then removing it upon catching an exception.

For example, we could add this line before the catch block:
ignoreRemaining/succededRetryableTests.add(testName);

And place this at the beginning of the method:

Suggested change
if (!forceFlaky) {
boolean ignoreThisTest = ignoreRemaining.contains(testName);
assumeFalse(ignoreThisTest, "Skipping a retryable test that already succeeded");
ignoreRemaining.add(testName);
}
boolean ignoreThisTest = succededRetryableTests.contains(testName) && !forceFlaky;
assumeFalse(ignoreThisTest, "Skipping a retryable test that already succeeded");

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I remember considering this during implementation, but it has been a while. I might be mistaken about my original reasoning, but maybe the following makes sense.)

If we add it at the end then the implication is that all "exits" during execution are failures. For example, imagine we start a test, and during execution, it exits with an assumption failure. So there has been a successful attempt, even though there has been no successful test. Here, we should not retry the test (practically, we might waste setup time). Since it is a failure that triggers the retry, we change state when the failure happens.

I added a catch for assertion failures, and renamed the variable. Let me know what you think.

* message will be checked. Otherwise, the throwable will be checked.
*/
public TestApplicator whenFailureContains(final String messageFragment) {
this.matchesThrowable = (final Throwable e) -> {
Copy link
Member

@vbabanin vbabanin Feb 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, whenFailuteContains can be chained with skips that do not have the RETRY modifier.

To reduce the potential for misconfigurations, should we add an assertion?

     assertTrue(this.modifiersToApply.contains(RETRY),
                    format("Modifier %s was not specified before calling whenFailureContains", RETRY));

For assertTrue method with the message we would need to add it to com.mongodb.assertions.Assertions.

@katcharov katcharov requested a review from vbabanin February 26, 2025 16:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants