Stop job blocks shutdown #298

Open
zbjornson opened this issue Mar 25, 2021 · 6 comments

@zbjornson

Hi,

I'm running agent version 20210217.00-g1 on Ubuntu 20.04, and when a patch job is deployed, shutdown appears to be blocked by the OSConfig Agent for 90s. Below is the serial terminal output:

Mar 25 17:19:46 prod-arbiter GCEMetadataScripts[9187]: 2021/03/25 17:19:46 GCEMetadataScripts: Starting shutdown scripts (version 20201217.02-0ubuntu1~20.04.0).
[  OK  ] Stopped Google Compute Engine Shutdown Scripts.
         Stopping System Logging Service...
         Stopping Snap Daemon...
Mar 25 17:19:46 prod-arbiter finalrd[9366]: run-parts: executing /usr/share/finalrd/open-iscsi.finalrd setup
Mar 25 17:19:46 prod-arbiter systemd[1]: fstrim.timer: Succeeded.
Mar 25 17:19:46 prod-arbiter GCEMetadataScripts[9187]: 2021/03/25 17:19:46 GCEMetadataScripts: No shutdown scripts to run.
Mar 25 17:19:46 prod-arbiter systemd[1]: Stopped Discard unused blocks once a week.
Mar 25 17:19:46 prod-arbiter systemd[1]: fwupd-refresh.timer: Succeeded.
Mar 25 17:19:46 prod-arbiter systemd[1]: Stopped Refresh fwupd metadata regularly.
[  OK  ] Stopped Snap Daemon.
[  OK  ] Stopped System Logging Service.
[  OK  ] Stopped Create final runtime dir for shutdown pivot root.
[*     ] A stop job is running for Google OSConfig Agent (5s / 1min 30s)
[**    ] A stop job is running for Google OSConfig Agent (6s / 1min 30s)
[***   ] A stop job is running for Google OSConfig Agent (6s / 1min 30s)
[ ***  ] A stop job is running for Google OSConfig Agent (7s / 1min 30s)
[  *** ] A stop job is running for Google OSConfig Agent (7s / 1min 30s)
[   ***] A stop job is running for Google OSConfig Agent (8s / 1min 30s)
[    **] A stop job is running for Google OSConfig Agent (8s / 1min 30s)
[     *] A stop job is running for Google OSConfig Agent (9s / 1min 30s)
[    **] A stop job is running for Google OSConfig Agent (9s / 1min 30s)
[   ***] A stop job is running for Google OSConfig Agent (10s / 1min 30s)
[  *** ] A stop job is running for Google OSConfig Agent (10s / 1min 30s)
<snip>
[    **] A stop job is running for Google OSConfig Agent (1min 25s / 1min 30s)
[     *] A stop job is running for Google OSConfig Agent (1min 26s / 1min 30s)
[    **] A stop job is running for Google OSConfig Agent (1min 26s / 1min 30s)
[   ***] A stop job is running for Google OSConfig Agent (1min 27s / 1min 30s)
[  *** ] A stop job is running for Google OSConfig Agent (1min 27s / 1min 30s)
[ ***  ] A stop job is running for Google OSConfig Agent (1min 28s / 1min 30s)
[***   ] A stop job is running for Google OSConfig Agent (1min 28s / 1min 30s)
[**    ] A stop job is running for Google OSConfig Agent (1min 29s / 1min 30s)
[*     ] A stop job is running for Google OSConfig Agent (1min 29s / 1min 30s)
[**    ] A stop job is running for Google OSConfig Agent (1min 30s / 1min 30s)
[  OK  ] Stopped Google OSConfig Agent.
[  OK  ] Stopped target Network is Online.
[  OK  ] Stopped target Network.
[  OK  ] Stopped Network Manager Wait Online.
         Stopping Network Manager...
         Stopping Network Name Resolution...

Thank you.

@adjackura
Contributor

Are you running a patch job and then initiating a shutdown?

@adjackura adjackura self-assigned this Apr 2, 2021
@zbjornson
Author

Yes, sort of. That's the (or a) shutdown that the patch job triggered.

It might specifically be that a restart was already required before the patch job could run though. I'd need to hunt down a server that currently needs a restart to test that theory. Let me know if that would be useful.

@adjackura
Contributor

Hmm, if the agent itself initiated the reboot, that should not hang. The only time I know of that it will hang right now is if the agent is executing another task, such as apt-get; it currently won't cancel those processes when systemd issues a shutdown command. I'll see if I can reproduce this.

Just to double check, you aren't trying to reboot from a pre or post script, are you?

@adjackura
Contributor

Also, if you have the agent logs from the patch job, that would be helpful.

@zbjornson
Author

> you aren't trying to reboot from a pre or post script, are you?

I'm not.

Here's an example log:

insertId | jsonPayload.localTimestamp | jsonPayload.message | resource.type | resource.labels.instance_id | resource.labels.zone | timestamp | severity | labels.patch_job | labels.agent_version | labels.instance_name | sourceLocation.file | sourceLocation.line | sourceLocation.function | receiveTimestamp
s12zfpg3xew7j4 | 2021-03-25T17:15:26.6357Z | Beginning patch task | gce_instance | 7859635178538648756 | us-central1-c | 2021-03-25T17:15:26.636762362Z | INFO | projects/623160494257/patchJobs/a4a64118-8ff4-45f9-b6e2-2ebce70e00da | 20210217.00-g1 | prod-mongodb-arbiter | patch_task.go | 220 | github.com/GoogleCloudPlatform/osconfig/agentendpoint.(*patchTask).run | 2021-03-25T17:15:27.353447814Z
s12zfpg3xew7j5 | 2021-03-25T17:15:27.0020Z | System indicates a reboot is required. | gce_instance | 7859635178538648756 | us-central1-c | 2021-03-25T17:15:27.002594755Z | INFO | projects/623160494257/patchJobs/a4a64118-8ff4-45f9-b6e2-2ebce70e00da | 20210217.00-g1 | prod-mongodb-arbiter | patch_task.go | 179 | github.com/GoogleCloudPlatform/osconfig/agentendpoint.(*patchTask).rebootIfNeeded | 2021-03-25T17:15:27.353447814Z
15cx4c8g1emq8ak | 2021-03-25T17:16:54.5660Z | Beginning patch task | gce_instance | 7859635178538648756 | us-central1-c | 2021-03-25T17:16:54.566303516Z | INFO | projects/623160494257/patchJobs/a4a64118-8ff4-45f9-b6e2-2ebce70e00da | 20210217.00-g1 | prod-mongodb-arbiter | patch_task.go | 220 | github.com/GoogleCloudPlatform/osconfig/agentendpoint.(*patchTask).run | 2021-03-25T17:16:55.458226481Z
uj9bqag1dv1fxk | 2021-03-25T17:17:20.2150Z | Updating 24 packages: [{libnss-systemd x86_64 245.4-4ubuntu3.5} {udev x86_64 245.4-4ubuntu3.5} {libudev1 x86_64 245.4-4ubuntu3.5} {systemd-sysv x86_64 245.4-4ubuntu3.5} {libpam-systemd x86_64 245.4-4ubuntu3.5} {systemd x86_64 245.4-4ubuntu3.5} {libsystemd0 x86_64 245.4-4ubuntu3.5} {update-notifier-common all 3.192.30.6} {isc-dhcp-client x86_64 4.4.1-2.1ubuntu5.20.04.1} {isc-dhcp-common x86_64 4.4.1-2.1ubuntu5.20.04.1} {libssl1.1 x86_64 1.1.1f-1ubuntu2.3} {openssl x86_64 1.1.1f-1ubuntu2.3} {gnome-shell x86_64 3.36.7-0ubuntu0.20.04.1} {gnome-shell-common all 3.36.7-0ubuntu0.20.04.1} {initramfs-tools all 0.136ubuntu6.4} {initramfs-tools-core all 0.136ubuntu6.4} {initramfs-tools-bin x86_64 0.136ubuntu6.4} {zfs-initramfs x86_64 0.8.3-1ubuntu12.7} {zfsutils-linux x86_64 0.8.3-1ubuntu12.7} {libuutil1linux x86_64 0.8.3-1ubuntu12.7} {libzfs2linux x86_64 0.8.3-1ubuntu12.7} {libzpool2linux x86_64 0.8.3-1ubuntu12.7} {libnvpair1linux x86_64 0.8.3-1ubuntu12.7} {zfs-zed x86_64 0.8.3-1ubuntu12.7}] | gce_instance | 7859635178538648756 | us-central1-c | 2021-03-25T17:17:20.352523687Z | INFO | projects/623160494257/patchJobs/a4a64118-8ff4-45f9-b6e2-2ebce70e00da | 20210217.00-g1 | prod-mongodb-arbiter | apt_upgrade.go | 100 | github.com/GoogleCloudPlatform/osconfig/ospatch.RunAptGetUpgrade | 2021-03-25T17:17:22.007320109Z
uhrb0g2zsv67l | 2021-03-25T17:19:12.9278Z | System indicates a reboot is required. | gce_instance | 7859635178538648756 | us-central1-c | 2021-03-25T17:19:13.023385600Z | INFO | projects/623160494257/patchJobs/a4a64118-8ff4-45f9-b6e2-2ebce70e00da | 20210217.00-g1 | prod-mongodb-arbiter | patch_task.go | 179 | github.com/GoogleCloudPlatform/osconfig/agentendpoint.(*patchTask).rebootIfNeeded | 2021-03-25T17:19:14.293973867Z
k9ywxg3cj3t8r | 2021-03-25T17:22:07.5024Z | Beginning patch task | gce_instance | 7859635178538648756 | us-central1-c | 2021-03-25T17:22:07.502838003Z | INFO | projects/623160494257/patchJobs/a4a64118-8ff4-45f9-b6e2-2ebce70e00da | 20210217.00-g1 | prod-mongodb-arbiter | patch_task.go | 220 | github.com/GoogleCloudPlatform/osconfig/agentendpoint.(*patchTask).run | 2021-03-25T17:22:08.508435139Z
3w0bltg3551a7z | 2021-03-25T17:22:28.6273Z | No packages to update. | gce_instance | 7859635178538648756 | us-central1-c | 2021-03-25T17:22:28.657819554Z | INFO | projects/623160494257/patchJobs/a4a64118-8ff4-45f9-b6e2-2ebce70e00da | 20210217.00-g1 | prod-mongodb-arbiter | apt_upgrade.go | 86 | github.com/GoogleCloudPlatform/osconfig/ospatch.RunAptGetUpgrade | 2021-03-25T17:22:29.774953522Z
3w0bltg3551a80 | 2021-03-25T17:22:28.7144Z | System indicates a reboot is not required. | gce_instance | 7859635178538648756 | us-central1-c | 2021-03-25T17:22:28.714615929Z | INFO | projects/623160494257/patchJobs/a4a64118-8ff4-45f9-b6e2-2ebce70e00da | 20210217.00-g1 | prod-mongodb-arbiter | patch_task.go | 181 | github.com/GoogleCloudPlatform/osconfig/agentendpoint.(*patchTask).rebootIfNeeded | 2021-03-25T17:22:29.774953522Z
3w0bltg3551a81 | 2021-03-25T17:22:28.8446Z | Successfully completed patch task | gce_instance | 7859635178538648756 | us-central1-c | 2021-03-25T17:22:28.845177039Z | INFO | projects/623160494257/patchJobs/a4a64118-8ff4-45f9-b6e2-2ebce70e00da | 20210217.00-g1 | prod-mongodb-arbiter | patch_task.go | 282 | github.com/GoogleCloudPlatform/osconfig/agentendpoint.(*patchTask).run | 2021-03-25T17:22:29.774953522Z

@adjackura
Contributor

OK, so it looks like the job works fine, but the reboot takes an extra 90s because the agent won't stop.
I'll double-check why this might be hanging.

ilc-fg pushed a commit to ilc-fg/osconfig that referenced this issue Oct 3, 2023
With this flag customers can choose to disable the configuration
management for OS Login cert based authentication.