From 5f43470da1f80675ac6144136ec8e60f23f9356b Mon Sep 17 00:00:00 2001
From: Michele Baldessari
Date: Sat, 15 Sep 2018 15:19:26 +0200
Subject: [PATCH] Make sure rhel-push-plugin.service is stopped after pacemaker stops

When issuing a normal reboot command on an overcloud node the following
stop sequence can take place:

  -------------    -----------------------------
  | Pacemaker |    | paunch-container-shutdown |
  -------------    -----------------------------
        |                        |
         \                      /
          \                    /
               ----------
               | docker |
               ----------

If docker plugins are allowed to stop before docker and also before
pacemaker, stopping them while pacemaker is shutting down can cause a
bunch of timeouts and a failure to stop containers:

Sep 13 17:53:00.821030 controller-0.localdomain pacemakerd[6147]: notice: Shutting down Pacemaker
Sep 13 17:54:15.798026 controller-0.localdomain lrmd[6284]: warning: galera-bundle-docker-0_monitor_60000 process (PID 226329) timed out
Sep 13 17:54:15.799004 controller-0.localdomain lrmd[6284]: warning: galera-bundle-docker-0_monitor_60000:226329 - timed out after 20000ms

One of these plugins is 'rhel-push-plugin.service'. It seems that when
this plugin is free to stop before docker on shutdown, docker commands
are likely to start timing out.

Before:
Before adding the symlink we would need 15 minutes to reboot a node and
we would get a bunch of timeouts on shutdown and some failed actions on
boot.

After:
A reboot will take a reasonable couple of minutes to complete, with no
failed actions at boot and no timeouts during shutdown.

NB: We add the symlink unconditionally as systemd will ignore it if the
service is not installed.

Closes-Bug: #1792701

Change-Id: I6f6d27f2457efcc49d9edd8a2f98484c5f7c0933
(cherry picked from commit e288dbd8252765020816639b9b53f8212292cfaf)
---
 manifests/profile/base/pacemaker.pp        | 5 +++++
 manifests/profile/base/pacemaker_remote.pp | 5 +++++
 2 files changed, 10 insertions(+)

diff --git a/manifests/profile/base/pacemaker.pp b/manifests/profile/base/pacemaker.pp
index 30213b726..738fbfeeb 100644
--- a/manifests/profile/base/pacemaker.pp
+++ b/manifests/profile/base/pacemaker.pp
@@ -144,6 +144,11 @@ class tripleo::profile::base::pacemaker (
       target => '/usr/lib/systemd/system/docker.service',
       before => Class['pacemaker'],
     }
+    -> systemd::unit_file { 'rhel-push-plugin.service':
+      path   => '/etc/systemd/system/resource-agents-deps.target.wants',
+      target => '/usr/lib/systemd/system/rhel-push-plugin.service',
+      before => Class['pacemaker'],
+    }
     ~> Class['systemd::systemctl::daemon_reload']
   }

diff --git a/manifests/profile/base/pacemaker_remote.pp b/manifests/profile/base/pacemaker_remote.pp
index a143be049..4aa28fd51 100644
--- a/manifests/profile/base/pacemaker_remote.pp
+++ b/manifests/profile/base/pacemaker_remote.pp
@@ -56,6 +56,11 @@ class tripleo::profile::base::pacemaker_remote (
       target => '/usr/lib/systemd/system/docker.service',
       before => Class['pacemaker::remote'],
     }
+    -> systemd::unit_file { 'rhel-push-plugin.service':
+      path   => '/etc/systemd/system/resource-agents-deps.target.wants',
+      target => '/usr/lib/systemd/system/rhel-push-plugin.service',
+      before => Class['pacemaker::remote'],
+    }
     ~> Class['systemd::systemctl::daemon_reload']
   }
   $enable_fencing_real = str2bool($enable_fencing) and $step >= 5