-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JPERF-716 background process results #13
JPERF-716 background process results #13
Conversation
Pros: * can get exit code, stdout, stderr * can `stop` even if the process died earlier * doesn't assume or require `screen` to be installed * conveys the need to release an external resource via `AutoCloseable` * doesn't use unreliable Kotlin `internal` to hide members Cons: * maintains a connection since start to stop * could accumulate unnecessarily big stdout - but it can be fixed later
@@ -11,6 +11,7 @@ import java.util.concurrent.TimeUnit | |||
* | |||
* @see [SshConnection.stopProcess] | |||
*/ | |||
@Deprecated(message = "Use BackgroundProcess instead") | |||
class DetachedProcess private constructor( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to do it in DetachedProcess
, but none of the approaches seemed to satisfy all requirements. So I wrote the tests, which guard the requirements. Maybe someone will come in the future and will be able to make them work with DetachedProcess
.
But for now, I had to resign from the requirement of starting/stopping on different Session.Command
s.
This PR is easier to read per-commit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dagguh, great work! Sorry for reviewing with such a delay.
No worries mate! |
installPing(sshHost) | ||
|
||
val fail = sshHost.runInBackground("nonexistent-command") | ||
val failResult = fail.stop(Duration.ofMillis(20)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line flakes when sometimes it doesn't get the process' exit status. I'm not sure what's the condition.
java.lang.IllegalStateException: command.exitStatus must not be null
at com.atlassian.performance.tools.ssh.WaitingCommand.waitForResult(WaitingCommand.kt:24)
at com.atlassian.performance.tools.ssh.SshjBackgroundProcess.stop(SshjBackgroundProcess.kt:20)
at com.atlassian.performance.tools.ssh.api.SshTest$shouldTolerateEarlyFinish$1.invoke(SshTest.kt:61)
at com.atlassian.performance.tools.ssh.api.SshTest$shouldTolerateEarlyFinish$1.invoke(SshTest.kt:8)
at com.atlassian.performance.tools.ssh.api.SshContainer.useSsh$ssh(SshContainer.kt:19)
at com.atlassian.performance.tools.ssh.api.SshTest.shouldTolerateEarlyFinish(SshTest.kt:56)
I managed to make the problem frequently reproducible by wrapping the "when->then" part of the test in a repeat(100) { ... }
block.
@Test
fun shouldTolerateEarlyFinish() {
SshContainer().useSsh { sshHost ->
installPing(sshHost)
repeat(100) {
val fail = sshHost.runInBackground("nonexistent-command")
val failResult = fail.stop(Duration.ofMillis(20))
Assert.assertEquals(127, failResult.exitStatus)
}
}
}
Our current API of BackgroundProcess#stop
(which is failing) says:
Throws if getting the [SshConnection.SshResult] fails.
So the problem is not really a contract bug.
==== Start of my thoughts on the topic ====
I personally think that if it's possible to not handle exception then by default it shouldn't break your application, so it shouldn't really be thrown. Because of this I think this is an API design bug and IMHO we should fix it by introducing API that is returning nullable process result.
If we want to let the consumer distinguish reason of failures (missing result) we can do that by allowing the overwrite of problem handlers - by default returning null
, but can throw if consumer wishes to do that. This way we don't leak the implementation specific exception layer into the interface, which should be polymorphic.
==== End of my thoughts on the topic ====
Nevertheless with the current API it's not a bug, yet it's a test failure. I'm trying to figure out how we can resolve it.
Possibly we could add try
-catch
block?
@dagguh do you remember if this test could work as expected if it would accept the process to not return exit status? Maybe it would be fine to repeat the runInBackground
+ stop
couple of times? It doesn't sound as the best solution for me, but I struggle to find any better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The expectation is clear: assertEquals(127, failResult.exitStatus)
. That's what happens in a shell for a nonexistent command. That doesn't happen - why? We shouldn't try to hide that dysfunction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hypothesis 1: this would also flake for SshConnection.execute
Hypothesis 2: it's caused by the BackgroundProcess.stop
timeout
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hypothesis 3: missed close
, the getExitStatus
Javadoc asks to "Always call close() first before inspecting the exit status."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hypothesis 3: missed close
Noticed the javadoc as well. I tried to add close and run the test with repeat(100)
, but it didn't help (unless I checked something wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
20 milliseconds is the time where we wait for the command to react to our interrupt. We are sending this interrupt immediately after starting the process.
I made an experiment by changing the order of operations, so that we read stdout and log it before failing with the exit status being null. (original)
fun waitForResult(): SshConnection.SshResult {
command.waitForCompletion(timeout)
command.close()
val output = command.inputStream.readAndLog(stdout)
val errorOutput = command.errorStream.readAndLog(stderr)
return SshConnection.SshResult(
exitStatus = command.exitStatus,
output = output,
errorOutput = errorOutput
)
}
The output suggests that it indeed doesn't get into the "command not found" part, because the first process that fails to read exit code has no "bash: nonexistent-command: command not found" output and only "^C".
2022-06-22T15:12:14,728Z DEBUG Test worker [] [com.atlassian.performance.tools.ssh.WaitingCommand.Companion] ^Cbash: nonexistent-command: command not found
2022-06-22T15:12:14,773Z DEBUG Test worker [] [com.atlassian.performance.tools.ssh.WaitingCommand.Companion] ^C
command.exitStatus must not be null
java.lang.IllegalStateException: command.exitStatus must not be null
at com.atlassian.performance.tools.ssh.WaitingCommand.waitForResult(WaitingCommand.kt:26)
at com.atlassian.performance.tools.ssh.SshjBackgroundProcess.stop(SshjBackgroundProcess.kt:20)
at com.atlassian.performance.tools.ssh.api.SshTest$shouldTolerateEarlyFinish$1.invoke(SshTest.kt:61)
at com.atlassian.performance.tools.ssh.api.SshTest$shouldTolerateEarlyFinish$1.invoke(SshTest.kt:8)
at com.atlassian.performance.tools.ssh.api.SshContainer.useSsh$ssh(SshContainer.kt:19)
at com.atlassian.performance.tools.ssh.api.SshTest.shouldTolerateEarlyFinish(SshTest.kt:56)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yet this proves nothing really, it's only a clue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I run it with Thread.sleep(20)
before interrupt, then it succeeds
@Test
fun shouldTolerateEarlyFinish() {
SshContainer().useSsh { sshHost ->
installPing(sshHost)
repeat(100) {
val fail = sshHost.runInBackground("nonexistant-command")
Thread.sleep(20)
val failResult = fail.stop(Duration.ofMillis(20))
Assert.assertEquals(127, failResult.exitStatus)
}
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even with repeat(1000)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This blocks investigating flaky https://github.com/atlassian/infrastructure integration tests.
And those block my ShipIt project 😉
I plan to continue this stream in Innovation Time and next ShipIt.