minor update: move VIP before stopping pacemaker on a node

When pacemaker stops on a node, it first stops all pacemaker
resources managed on that node, then stops the pacemaker daemon.

If the node being stopped is hosting VIP resources, those must be
restarted elsewhere as soon as possible to avoid a long service
disruption, but there is currently no constraint defined to
enforce that behaviour.

What can happen is that the VIP resources are stopped first, then the
other resources on the host are stopped (e.g. rabbit, galera), and only
once no resources remain does pacemaker restart the VIPs elsewhere,
which can lead to a long OpenStack service disruption.

To avoid an unexpectedly long outage due to VIP unavailability,
force-move the VIPs away from the node before stopping pacemaker.

Closes-Bug: #1815204
Change-Id: I9cbbf9e66b804f00fd19b2b3641f10bb472a94c7
(cherry picked from commit 38fb412ac0)
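
The ordering argued for above can be sketched as a dry run. This is only an illustration, not the task added by this commit: `plan_node_stop` is a hypothetical helper, the node and VIP names are made up, and the steps are printed rather than executed so that no cluster is needed.

```shell
# Hypothetical dry-run helper: prints the intended steps instead of running them.
plan_node_stop() {
    node="$1"
    shift
    # 1. Move every VIP off the node first, so the VIPs are restarted
    #    elsewhere before the other resources on the node are stopped.
    for vip in "$@"; do
        echo "pcs resource move ${vip} --wait=300"
    done
    # 2. Remove the location ban that each move left behind on the node.
    for vip in "$@"; do
        echo "pcs constraint remove <ban of ${vip} on ${node}>"
    done
    # 3. Only then stop pacemaker on the node.
    echo "stop pacemaker on ${node}"
}

# Example with made-up names.
plan_node_stop controller-0 ip-192.168.24.7 ip-10.0.0.5
```

The two separate loops mirror the commit's approach: all VIPs are moved before any ban is removed.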
Damien Ciabrini 2019-02-07 11:25:31 +00:00 committed by Michele Baldessari
parent 8fd137f0a6
commit ded38b7440
1 changed file with 20 additions and 0 deletions

@@ -166,6 +166,26 @@ outputs:
  pacemaker_cluster: state=online check_and_fail=true
  async: 30
  poll: 4
- name: Move virtual IPs to another node before stopping pacemaker
  when: step|int == 1
  shell: |
    CLUSTER_NODE=$(crm_node -n)
    echo "Retrieving all the VIPs which are hosted on this node"
    VIPS_TO_MOVE=$(crm_mon --as-xml | xmllint --xpath '//resource[@resource_agent = "ocf::heartbeat:IPaddr2" and @role = "Started" and @managed = "true" and ./node[@name = "'${CLUSTER_NODE}'"]]/@id' - | sed -e 's/id=//g' -e 's/"//g')
    for v in ${VIPS_TO_MOVE}; do
        echo "Moving VIP $v on another node"
        pcs resource move $v --wait=300
    done
    echo "Removing the location constraints that were created to move the VIPs"
    for v in ${VIPS_TO_MOVE}; do
        echo "Removing location ban for VIP $v"
        ban_id=$(cibadmin --query | xmllint --xpath 'string(//rsc_location[@rsc="'${v}'" and @node="'${CLUSTER_NODE}'" and @score="-INFINITY"]/@id)' -)
        if [ -n "$ban_id" ]; then
            pcs constraint remove ${ban_id}
        else
            # Report the failure on stderr
            echo "Could not retrieve and clear location constraint for VIP $v" >&2
        fi
    done
- name: Stop pacemaker cluster
  when: step|int == 1
  pacemaker_cluster: state=offline
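
The `sed` step in the task above turns the attribute nodes returned by `xmllint` into a plain list of resource names. The following is a minimal sketch of that one step. It assumes `xmllint` prints the matches as `id="..."` tokens, which is what the two `sed` expressions imply. The sample text, the resource names, and the `strip_id_attrs` function name are made up for illustration.

```shell
# Sample text shaped like the attribute nodes assumed to come out of
# xmllint for an '/@id' query. The resource names are invented.
sample=' id="ip-192.168.24.7" id="ip-10.0.0.5"'

# Same two sed expressions as in the task: drop 'id=' and the quotes.
strip_id_attrs() {
    sed -e 's/id=//g' -e 's/"//g'
}

vips=$(printf '%s\n' "$sample" | strip_id_attrs)

# Left unquoted on purpose: the shell splits the list on whitespace,
# so it does not matter whether the names arrive on one line or several.
for v in $vips; do
    echo "would move: $v"
done
```

Because the loop relies on word splitting, this approach only suits resource names without whitespace.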