-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JPERF-811: Fix flaky shouldTolerateEarlyFinish test by waiting until … #23
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -56,7 +56,8 @@ class SshTest { | |
SshContainer().useSsh { sshHost -> | ||
installPing(sshHost) | ||
|
||
val fail = sshHost.runInBackground("nonexistent-command") | ||
val fail = sshHost.runInBackground("nonexistant-command") | ||
sshHost.waitForAllProcessesToFinish("nonexistant-command", Duration.ofMillis(100)) | ||
val failResult = fail.stop(Duration.ofMillis(20)) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So this fails due to We can see that There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. BTW. It seems arbitrary to have both There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I traced it down to ancient times behind the wall: https://bulldog.internal.atlassian.com/browse/JPT-292 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmm, I already had the fixes, but I was checking the effect of each commit via rebase (to populate the CHANGELOG correctly). And it turns out that |
||
|
||
Assert.assertEquals(127, failResult.exitStatus) | ||
|
@@ -66,4 +67,11 @@ class SshTest { | |
private fun installPing(sshHost: Ssh) { | ||
sshHost.newConnection().use { it.execute("apt-get update -qq && apt-get install iputils-ping -y") } | ||
} | ||
|
||
private fun Ssh.waitForAllProcessesToFinish( | ||
processCommand: String, | ||
timeout: Duration | ||
) = this.newConnection().use { | ||
it.execute("wait `pgrep '$processCommand'`", timeout) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Probably a bit of an overkill, but I don't have a better idea how we can wait for the process to finish. It's also a shotgun, because we are waiting not only for the one process we started, but also all of other processes that might be named the same. I think we don't expect there to be any other processes like that anyway. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think the chance of other processes with the same name is minimal, and this solution should be sufficient. |
||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now the test shows that the
runInBackground
API cannot be used easily. We should fix the problem in the API rather than in tests.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could wait for the process to start before returning from
runInBackground
similarly to how my solution proposal does it in this test, however this approach is very system specific (usage ofwait
andpgrep
). I don't know how we could do that in more portable way.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another approach would be to just return some specific
SshConnection.SshResult
as part ofBackgroundProcess.stop
when we don't really get theexitStatus
, however I don't know what would that be and we need to return something if we want to maintain the current API ofBackgroundProcess
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dagguh I'd like to fix the flakiness of the test. This is my main goal.
My understanding of possible implementations of your suggestion is to either make it less portable or break the API (see my 2 previous comments). I don't like any of those options and I'm a bit stuck with this PR. Do you have any other ideas how this could be fixed? If not then maybe you have opinion about which of those 2 is better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pczuj why are we afraid to break the API and properly apply semantic versioning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we can just break the API, it's a tool in our belt.
It can require some extra effort. Let's say we release
ssh:3.0.0
. Theinfrastructure
is assh
consumer:It will have a choice:
ssh:[2.3.0,3.0.0)
- stay on oldssh
, this is the default state. It takes no effort, but what was the point ofssh:3.0.0
, if no consumer needs it?ssh:[2.3.0,4.0.0)
- use the new, without dropping the old. Only possible if the removed API was not used.ssh:[3.0.0,4.0.0)
- use the new, drop the old. This means that all consumers ofinfrastructure
, which are still onssh:[2.0.0,3.0.0)
, will no longer be compatible. Therefore that bump is breakage ofinfrastructure
(its POM contract), which is a another major release. This bump would cascade to multiple libs:aws-infrastructure
andjira-performance-tests
, even if they don't usessh
directly.We used scenario 3 many times in the past. It's a bit annoying, but not that scary.
We successfully used scenario 2 as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't mean we have to break API. I meant: the tests found a real flakiness pain, let's fix it for the consumers as well.
I would avoid it too. Not only does it lose generality, but also more moving parts: brittle and complex.