Stop agents before exiting test cases, and reliably close all_*.log when agents exit
#624
src/test/java/com/cloudbees/jenkins/support/CheckFilterTest.java
Apparently did not suffice. Maybe this file is not actually closed when the agent terminates? Or the current attempt to terminate the agent is not effective because it just closes the connection and does not exit the JVM?
support-core-plugin/src/main/java/com/cloudbees/jenkins/support/SupportPlugin.java Line 879 in 7fcc00b
support-core-plugin/src/main/java/com/cloudbees/jenkins/support/SupportLogHandler.java Line 242 in 7fcc00b
Incorrect.
Almost, but not quite. Weird because the log does show the agent process having been killed before the test cleanup begins.
This reverts commit 06190e8. It seemed to mostly work, but not in one case.
Thanks very much for the investigation!
So this failure is puzzling me.
would seem to indicate that something recreated this file after its containing directory had been deleted.
Two tracks now: jenkinsci/jenkins-test-harness#922; and trying to see if I can get
No luck so far on the latter approach:
diff --git src/main/java/com/cloudbees/jenkins/support/SupportPlugin.java src/main/java/com/cloudbees/jenkins/support/SupportPlugin.java
index 3e57d01..c09f9d6 100644
--- src/main/java/com/cloudbees/jenkins/support/SupportPlugin.java
+++ src/main/java/com/cloudbees/jenkins/support/SupportPlugin.java
@@ -57,6 +57,7 @@ import hudson.model.Descriptor;
 import hudson.model.Node;
 import hudson.model.PeriodicWork;
 import hudson.model.TaskListener;
+import hudson.remoting.Channel;
 import hudson.remoting.ChannelClosedException;
 import hudson.remoting.Future;
 import hudson.remoting.VirtualChannel;
@@ -877,6 +878,12 @@ public class SupportPlugin extends Plugin {
             LogHolder.AGENT_LOG_HANDLER.setLevel(level);
             LogHolder.AGENT_LOG_HANDLER.setDirectory(new File(rootPath.getRemote(), SUPPORT_DIRECTORY_NAME), "all");
             ROOT_LOGGER.addHandler(LogHolder.AGENT_LOG_HANDLER);
+            Channel.currentOrFail().addListener(new Channel.Listener() {
+                @Override
+                public void onClosed(Channel channel, IOException cause) {
+                    LogHolder.AGENT_LOG_HANDLER.close();
+                }
+            });
             return null;
         }
     }
So far as I can tell,
Still failing so I am running out of hypotheses.
With better logs we can see that the agent process is terminated (exit code zero), and yet 59ms later the attempt to delete its working directory fails. This remark
suggests that jenkinsci/jenkins-test-harness#922 (while probably a step in the right direction) cannot suffice. Either I figure out how to make the agent release its locks before it exits, or the deletion code needs to just retry for a while.
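The "retry for a while" option could look roughly like the following. This is only a sketch under assumptions: the class name RetryingDelete, the method deleteWithRetries, and the attempt count and pause are all hypothetical, and it is not the deletion code actually used by the test harness.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of jenkins-test-harness or this plugin.
final class RetryingDelete {

    /** Deletes root recursively, retrying until it succeeds or the attempts run out. */
    static boolean deleteWithRetries(Path root, int maxAttempts, long pauseMillis)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                deleteRecursively(root);
                return true;
            } catch (IOException e) {
                // A handle held by a just-exited process may still block deletion for a moment.
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMillis);
                }
            }
        }
        return false;
    }

    private static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order so children are deleted before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.deleteIfExists(p);
        }
    }
}
```

A bounded number of attempts keeps a genuinely stuck lock from hanging the test run forever; the caller can decide whether a false return should fail the test or only log a warning.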
…lying on channel events
@@ -70,6 +70,8 @@
    <!-- https://www.jenkins.io/doc/developer/plugin-development/choosing-jenkins-baseline/ -->
    <jenkins.baseline>2.479</jenkins.baseline>
    <jenkins.version>${jenkins.baseline}.1</jenkins.version>
    <!-- TODO until in parent -->
    <jenkins-test-harness.version>2403.v256947ecb_c8a_</jenkins-test-harness.version>
jenkinsci/jenkins-test-harness#922
Previously, the agent JVM launched by SimpleCommandLauncher (JenkinsRule.createSlave / .createOnlineSlave) would be terminated when the Computer was disconnected as part of Jenkins.cleanUp, but asynchronously and possibly slightly later, when TemporaryDirectoryAllocator was already trying to clean up.
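To illustrate the difference between asynchronous and synchronous termination, here is a minimal sketch of stopping a child process and blocking until it has really exited before any cleanup starts. The class StopProcess and the method stopAndWait are hypothetical names; this uses only java.lang.Process and is not the change made in jenkins-test-harness#922.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper; shows the general technique only.
final class StopProcess {

    /**
     * Asks the process to exit, waits for it, and escalates to a forcible
     * kill if it is still alive after the timeout.
     * Returns true once the process is known to have exited.
     */
    static boolean stopAndWait(Process process, long timeoutSeconds) throws InterruptedException {
        process.destroy();
        if (process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            return true;
        }
        process.destroyForcibly();
        return process.waitFor(timeoutSeconds, TimeUnit.SECONDS);
    }
}
```

Only after stopAndWait returns true would a caller go on to delete the agent's working directory.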
@@ -234,6 +234,7 @@ private void setFile(File file) throws FileNotFoundException {
            parentFile.mkdirs();
        }

        StreamUtils.closeQuietly(null); // ensure class is loaded so close() can succeed
Does nothing but load this class. Otherwise there can occasionally be a NoClassDefFoundError from close during agent shutdown, at which time it is too late to load new classes from RemoteClassLoader because the connection has already been closed.
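The same idea, loading a class eagerly while its class loader can still fetch it, can be sketched generically with Class.forName. This is an illustration under assumptions: Preload and preload are hypothetical names, and the change itself uses the no-op call StreamUtils.closeQuietly(null) rather than this helper.

```java
// Hypothetical helper showing eager class loading; not code from the plugin.
final class Preload {

    /**
     * Forces the named class to be loaded and initialized now, while the
     * loader can still resolve it. Returns false if it cannot be loaded.
     */
    static boolean preload(String className, ClassLoader loader) {
        try {
            Class.forName(className, true, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }
}
```

Calling this during setup means a later shutdown path does not need to resolve the class for the first time after the connection is gone.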
@@ -862,6 +862,15 @@ public Void call() {
            // avoid double installation of the handler. JNLP agents can reconnect to the controller multiple times
            // and each connection gets a different RemoteClassLoader, so we need to evict them by class name,
            // not by their identity.
            closeAll();
            Runtime.getRuntime().addShutdownHook(new Thread(LogInitializer::closeAll, "close log handlers"));
Ensures that the file handle is closed before the JVM exits. Likely irrelevant on Linux, but seems to matter on Windows, where the OS will release mandatory file locks “sometime” after the process exits but not necessarily quickly enough for a test.
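A self-contained sketch of the pattern, using plain java.util.logging.FileHandler in place of SupportLogHandler: handlers are tracked in a registry and closed from a JVM shutdown hook. HandlerCloser, open, closeAll, and installShutdownHook are hypothetical names chosen for illustration and do not mirror the plugin's LogInitializer.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.logging.FileHandler;
import java.util.logging.Handler;

// Hypothetical registry; stands in for the plugin's own handler bookkeeping.
final class HandlerCloser {
    private static final List<Handler> OPEN = new CopyOnWriteArrayList<>();

    /** Opens a log file in append mode and remembers the handler so it can be closed later. */
    static FileHandler open(Path logFile) throws IOException {
        FileHandler handler = new FileHandler(logFile.toString(), true);
        OPEN.add(handler);
        return handler;
    }

    /** Closes every tracked handler, releasing its file handle; returns how many were closed. */
    static int closeAll() {
        int closed = 0;
        for (Handler handler : OPEN) {
            handler.close();
            OPEN.remove(handler);
            closed++;
        }
        return closed;
    }

    /** Registers a hook so handles are released before the JVM exits normally. */
    static void installShutdownHook() {
        Runtime.getRuntime().addShutdownHook(new Thread(HandlerCloser::closeAll, "close log handlers"));
    }
}
```

Shutdown hooks run on normal exit and on typical termination signals, but not if the process is killed forcibly, so this reduces rather than eliminates the window in which a lock can outlive the agent.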
Thanks very, very much!
…diate disconnection
As noted in #617 (comment) there are mostly consistent test failures on Windows. Not sure why these are just appearing now (and I cannot reproduce locally on Windows 10), but at any rate SupportLogHandler opens a file handle on e.g. agent-work-dirs\agent0\support\all_2025-02-14_13.33.29.log which will not be closed unless the agent process exits. (CloudBees internal issue)