Improve deep_compare detection

In the initial implementation of resource updates in puppet-pacemaker,
we detected whether a resource had changed as follows:
1) dump the CIB and make a copy
2) remove the resource from the dump
3) re-add the resource with its current parameters
4) via crm_diff --cib, detect any changes related to the resource by
   checking a specific xpath on the diff (see the sketch right after
   this list)
5) if 4) matched the xpath, push the newly created CIB back to the
   cluster
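
For reference, step 4) boiled down to a pipeline along these lines (the
resource name and file names here are illustrative placeholders):

  crm_diff --cib -o cib-orig.xml -n cib-new.xml | \
    xmllint --xpath '/diff/change[@operation and contains(@path, "ip-192.168.24.6")]/change-result' -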

This approach has a number of drawbacks:
A) In the case of bundles it would remove the resource running inside
the bundle for a very brief amount of time
B) pcs would automatically remove all constraints related to the
resource, so we had to recreate them forcefully during the update
operation and then update the CIB
C) The xpath in 4) did not always match, so we sometimes failed to
trigger a resource update.

We now change the approach slightly to make it more robust. Instead of
removing a resource from the offline CIB, we just update it via pcs
update commands on the offline CIB. In the case of bundles we first
need to remove all currently existing storage maps and then add the
newly defined set, because pcs does not allow updating a resource in a
declarative way [1]. We also extend the xpath from 4) so that it
matches most changes, while being careful not to match the empty meta
attributes that pcs adds due to a bug [2].
This new approach solves A) and B), and since it is also more precise
it solves C) fully.
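
As an illustrative sketch (the bundle name, image and storage-map ids
are placeholders), the offline update of a bundle now amounts to
running something like the following against the dumped CIB file:

  # prune the storage maps currently defined on the bundle
  pcs -f cib-backup.xml resource bundle update rabbitmq-bundle \
      storage-map remove rabbitmq-cfg storage-map remove rabbitmq-lib
  # re-add the puppet-defined storage maps along with the other options
  pcs -f cib-backup.xml resource bundle update rabbitmq-bundle \
      container image=rabbitmq:pcmklatest replicas=3 \
      storage-map add id=rabbitmq-cfg source-dir=/etc/rabbitmq target-dir=/etc/rabbitmq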

Given the complexity of the changes, here is how I tested them:
1) 1 mount point added
Oct 25 14:46:31 controller-0 journal: Debug: pcmk_resource_has_changed (ng version) returned true for resource rabbitmq-bundle
and rabbitmq-bundle was correctly restarted with the correct bind mounts

2) 1 mount point removed
Oct 25 15:56:38 controller-0 journal: Debug: pcmk_resource_has_changed (ng version) returned true for resource rabbitmq-bundle
and rabbitmq-bundle was correctly restarted with the correct bind mounts

3) 3 mount points added
Oct 26 01:46:42 controller-0 journal: Debug: pcmk_resource_has_changed (ng version) returned true for resource rabbitmq-bundle
and rabbitmq-bundle was correctly restarted with the correct bind mounts

4) 2 mount points removed
Oct 26 02:42:07 controller-0 journal: Debug: pcmk_resource_has_changed (ng version) returned true for resource rabbitmq-bundle
and rabbitmq-bundle was correctly restarted with the correct bind mounts

5) 1 mount point added plus replica count changed
Correctly got a reduced replica count and the correct mount point
 Docker container set: rabbitmq-bundle [192.168.24.1:8787/rhosp13/openstack-rabbitmq:pcmklatest]
   rabbitmq-bundle-0    (ocf::heartbeat:rabbitmq-cluster):      Started controller-2
   rabbitmq-bundle-1    (ocf::heartbeat:rabbitmq-cluster):      Started controller-0

6) VIP netmask changed (from /32 to /24)
Oct 26 03:53:04 controller-0 journal: Debug: pcmk_resource_has_changed (ng version) returned true for resource ip-192.168.24.10
Oct 26 03:53:20 controller-0 journal: Debug: pcmk_resource_has_changed (ng version) returned true for resource ip-10.0.0.101
Oct 26 03:53:36 controller-0 journal: Debug: pcmk_resource_has_changed (ng version) returned true for resource ip-172.17.1.12
Oct 26 03:53:52 controller-0 journal: Debug: pcmk_resource_has_changed (ng version) returned true for resource ip-172.17.1.10
Oct 26 03:54:07 controller-0 journal: Debug: pcmk_resource_has_changed (ng version) returned true for resource ip-172.17.3.17
Oct 26 03:54:24 controller-0 journal: Debug: pcmk_resource_has_changed (ng version) returned true for resource ip-172.17.4.13
[root@controller-0 10]# ip -o a |grep '/32'
[root@controller-0 10]#

7) VIP netmask changed (from /24 to /32) and bundle replica count changed
rabbitmq-bundle went from two replicas (see 5.) to three:
 Docker container set: rabbitmq-bundle [192.168.24.1:8787/rhosp13/openstack-rabbitmq:pcmklatest]
   rabbitmq-bundle-0    (ocf::heartbeat:rabbitmq-cluster):      Started controller-2
   rabbitmq-bundle-1    (ocf::heartbeat:rabbitmq-cluster):      Started controller-0
   rabbitmq-bundle-2    (ocf::heartbeat:rabbitmq-cluster):      Started controller-1

VIP netmasks correctly moved from /24 to /32

8) Redeploy with no changes
No resources were restarted

9) Changing the mount point order in puppet does not restart the service
Oct 26 04:59:36 controller-0 journal: Debug: pcmk_resource_has_changed (ng version) returned false for resource rabbitmq-bundle

10) Change a single option of a mount point
Oct 26 05:32:13 controller-0 journal: Debug: pcmk_resource_has_changed (ng version) returned true for resource rabbitmq-bundle
and rabbitmq-bundle was correctly restarted with the correct bind mounts

11) Change a parameter of a pacemaker remote resource
Added the following hiera keys:
  pacemaker::resource::remote::deep_compare: true
  pacemaker_remote_reconnect_interval: 105

and correctly got:
[root@controller-0 ~]# pcs resource show compute-0 |grep reconnect
  Attributes: reconnect_interval=105 server=172.17.1.14

12) Change 1 mount point on three bundles
Oct 26 10:40:30 controller-0 dockerd-current[19339]: Debug: pcmk_resource_has_changed (ng version) returned true for resource rabbitmq-bundle

13) Try and change a meta attribute
Added an 'ordered=true' to the rabbitmq bundle
[root@controller-0 ~]# pcs resource show rabbitmq-bundle |grep ordered
   Meta Attrs: container-attribute-target=host notify=true ordered=true

14) Delete a single location constraint and redeploy
[root@controller-0 ~]# pcs constraint remove location-rabbitmq-bundle
..redeployed..
[root@controller-0 ~]# pcs constraint |grep rabbit
  Resource: rabbitmq-bundle
    Constraint: location-rabbitmq-bundle (resource-discovery=exclusive)
        Expression: rabbitmq-role eq true

Other things verified during the tests above:
- Constraints stayed the same
- Properties stayed the same

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1598197
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1568353
Closes-Bug: #1797542

Change-Id: I048fbb6b0319bac33c82ad2d7639d0f594fb73bb
Author: Michele Baldessari, 2018-10-26 15:37:17 +02:00
Commit: 90033be653 (parent: ce81a99d08)

4 changed files with 150 additions and 56 deletions

@@ -3,7 +3,27 @@ require_relative '../pcmk_common'
 Puppet::Type.type(:pcmk_bundle).provide(:default) do
   desc 'A bundle resource definition for pacemaker'
 
-  def build_pcs_bundle_cmd
+  def _storage_maps_cmd(storage_maps, update=false)
+    return '' if storage_maps == nil or storage_maps.empty?
+    cmd = ''
+    if update
+      add = ' add '
+    else
+      add = ' '
+    end
+    storage_maps.each do | key, value |
+      cmd += ' storage-map' + add + 'id=' + key + \
+             ' source-dir=' + value['source-dir'] + \
+             ' target-dir=' + value['target-dir']
+      options = value['options']
+      if not_empty_string(options)
+        cmd += ' options=' + options
+      end
+    end
+    cmd
+  end
+
+  def build_pcs_bundle_cmd(update=false)
     image = @resource[:image]
     replicas = @resource[:replicas]
     masters = @resource[:masters]
@@ -15,8 +35,16 @@ Puppet::Type.type(:pcmk_bundle).provide(:default) do
     location_rule = @resource[:location_rule]
     container_backend = @resource[:container_backend]
 
+    if update
+      create_cmd = 'update'
+      docker_cmd = ''
+    else
+      create_cmd = 'create'
+      docker_cmd = 'docker'
+    end
+
     # Build the 'pcs resource create' command. Check out the pcs man page :-)
-    cmd = 'resource bundle create ' + @resource[:name]+' container ' + container_backend + ' image=' + @resource[:image]
+    cmd = 'resource bundle ' + create_cmd + ' ' + @resource[:name] + ' container ' + docker_cmd + ' image=' + @resource[:image]
     if replicas
       cmd += " replicas=#{replicas}"
     end
@@ -33,26 +61,30 @@ Puppet::Type.type(:pcmk_bundle).provide(:default) do
       cmd += ' ' + container_options
     end
 
     # storage_maps[id] = {'source' => value, 'target' => value, 'options' => value}
     # FIXME: need to do proper error checking here
-    if storage_maps and !storage_maps.empty?
-      storage_maps.each do | key, value |
-        cmd += ' storage-map id=' + key + \
-               ' source-dir=' + value['source-dir'] + \
-               ' target-dir=' + value['target-dir']
-        options = value['options']
-        if not_empty_string(options)
-          cmd += ' options=' + options
-        end
-      end
-    end
+    # When we're updating a bundle we first dump the CIB, then
+    # we remove all the *current* storage-maps for the resource
+    # and then we add back the storage-maps passed to us
+    cmd += _storage_maps_cmd(storage_maps, update)
     if network
       cmd += ' network ' + network
     end
     cmd
   end
 
+  def build_pcs_bundle_pruning
+    cmd = 'resource bundle update ' + @resource[:name]
+    # In case of updates due to how pcs manages storage, we need to first remove all
+    # the *current* existing storage maps and then readd the puppet defined ones
+    live_storage_maps = pcmk_get_bundle_storage_map(@resource[:name])
+    if live_storage_maps and !live_storage_maps.empty?
+      live_storage_maps.each do | key, value |
+        cmd += ' storage-map remove ' + value['id']
+      end
+      return cmd
+    end
+    return ''
+  end
+
   ### overloaded methods
   def initialize(*args)
     super(*args)
@@ -63,10 +95,11 @@ Puppet::Type.type(:pcmk_bundle).provide(:default) do
   end
 
   def create_bundle_and_location(location_rule, needs_update=false)
-    cmd = build_pcs_bundle_cmd()
     if needs_update then
-      pcmk_update_resource(@resource, cmd, @resource[:update_settle_secs])
+      cmd = build_pcs_bundle_cmd(update=true)
+      pcmk_update_resource(@resource, cmd, build_pcs_bundle_pruning(), @resource[:update_settle_secs])
     else
+      cmd = build_pcs_bundle_cmd()
       if location_rule then
         pcs('create', @resource[:name], "#{cmd} --disabled", @resource[:tries],
             @resource[:try_sleep], @resource[:verify_on_create], @resource[:post_success_sleep])
@@ -139,7 +172,7 @@ Puppet::Type.type(:pcmk_bundle).provide(:default) do
     if ret == false then
       return PCMK_NOTEXISTS
     end
-    if @resource[:deep_compare] and pcmk_resource_has_changed?(@resource, build_pcs_bundle_cmd(), true) then
+    if @resource[:deep_compare] and pcmk_resource_has_changed?(@resource, build_pcs_bundle_cmd(update=true), build_pcs_bundle_pruning(), true) then
       return PCMK_CHANGENEEDED
     end
     return PCMK_NOCHANGENEEDED


@@ -230,6 +230,25 @@ def push_cib_offline(cib, tries=1, try_sleep=0, post_success_sleep=0)
   end
 end
 
+# returns the storage map for the resource as a dictionary
+def pcmk_get_bundle_storage_map(resource)
+  storage_xpath = "/cib/configuration/resources/bundle[@id='#{resource}']/storage/storage-mapping"
+  cib = backup_cib()
+  cibxml = File.read(cib)
+  storage_doc = REXML::Document.new cibxml
+  ret = {}
+  REXML::XPath.each(storage_doc, storage_xpath) do |element|
+    attrs = {}
+    element.attributes.each do |key, value|
+      attrs[key] = value
+    end
+    ret[attrs['id']] = attrs
+  end
+  delete_cib(cib)
+  Puppet.debug("pcmk_get_bundle_storage_map #{resource} returned #{ret}")
+  ret
+end
+
 # The following function will take a resource_name and an xml graph file as generated by crm_simulate and
 # will return true if the resource_name is contained in the transition graph (i.e. the cluster would
 # restart the resource) and false if not (i.e. the cluster would not restart the resource)
@@ -298,6 +317,48 @@ def is_crm_diff_buggy?
   raise Puppet::Error, "#{cmd} failed with (#{ret}): #{cmd_out}"
 end
 
+# This function will return true when a CIB diff xml has an empty meta_attribute change (either
+# an addition or a removal). It does so by verifying that the diff has an empty meta_attribute node
+# and, when that is the case, it verifies that the corresponding meta_attributes node
+# for the resource in the CIB is indeed either non-existing or has no children
+def has_empty_meta_attributes?(cibfile, element)
+  # First we verify that the cib diff does contain an empty meta_attributes node, like this:
+  # <change operation='create' path='/cib/configuration/resources/primitive[@id=&apos;ip-172.16.11.97&apos;]' position='2'>
+  #   <meta_attributes id='ip-172.16.11.97-meta_attributes'/>
+  # </change>
+  if element.attributes.has_key?('operation') and \
+     ['delete', 'create'].include? element.attributes['operation'] and \
+     element.attributes.has_key?('path')
+    path = element.attributes['path']
+    element.each_element('//meta_attributes') do |meta|
+      # If the meta_attributes node was an empty set we verify that it is so in the CIB as well
+      # and if that is the case we return true
+      if not meta.has_elements?
+        begin
+          meta_id = meta.attributes['id']
+          orig_cib = File.read(cibfile)
+          meta_doc = REXML::Document.new orig_cib
+          meta_xpath = "//meta_attributes[@id='#{meta_id}']"
+          meta_search = meta_doc.get_elements(meta_xpath)
+          # If there are no meta_attributes at all, or if the tag exists but has no
+          # children, then we return true
+          if meta_search.empty? or (meta_search.length() > 0 and not meta_search[0].has_elements?)
+            Puppet.debug("has_empty_meta_attributes? detected an empty meta_attribute change and an empty meta_attribute in the CIB, skipping: #{meta_id}")
+            return true
+          end
+        rescue
+          # Should there be any kind of exception in the code above we take
+          # the slightly safer path and simply return false, which implies
+          # updating the CIB and pushing it to the live cluster
+          return false
+        end
+      end
+    end
+  end
+  return false
+end
+
 # same as pcmk_restart_resource? but using crm_diff
 def pcmk_restart_resource_ng?(resource_name, cib)
   cmd = "#{CRMDIFF_BIN} --cib -o #{cib}.orig -n #{cib}"
@@ -317,8 +378,11 @@ def pcmk_restart_resource_ng?(resource_name, cib)
   graph_doc = REXML::Document.new cmd_out
   # crm_diff --cib -o cib-orig.xml -n cib-vip-update.xml | \
   #   xmllint --xpath '/diff/change[@operation and contains(@path, "ip-192.168.24.6")]/change-result' -
-  xpath_query = "/diff/change[@operation and contains(@path, \"#{resource_name}\")]/change-result"
+  xpath_query = "/diff/change[@operation and @operation != 'move' and contains(@path, \"@id='#{resource_name}'\")]"
   REXML::XPath.each(graph_doc, xpath_query) do |element|
+    # We need to check for removals of empty meta_attribute tags and ignore those
+    # See https://bugzilla.redhat.com/show_bug.cgi?id=1568353 for pcs creating those spurious empty tags
+    next if has_empty_meta_attributes?(cib, element)
     return true
   end
   return false
@@ -348,18 +412,19 @@ end
 # 1. Deletes the resource from the offline CIB
 # 2. Recreates the resource on the offline CIB
 # 3. Verifies if the pacemaker will restart the resource and returns true if the answer is a yes
-def pcmk_resource_has_changed?(resource, cmd_create, is_bundle=false)
+def pcmk_resource_has_changed?(resource, cmd_update, cmd_pruning='', is_bundle=false)
   cib = backup_cib()
-  cmd_delete = "resource delete #{resource[:name]}"
-  ret = pcs_offline(cmd_delete, cib)
-  if ret == false
-    delete_cib(cib)
-    raise Puppet::Error, "pcmk_resource_has_changed? #{cmd_delete} returned error on #{resource[:name]}. This should never happen."
+  if not_empty_string(cmd_pruning)
+    ret = pcs_offline(cmd_pruning, cib)
+    if ret == false
+      delete_cib(cib)
+      raise Puppet::Error, "pcmk_update_resource #{cmd_pruning} returned error on #{resource[:name]}. This should never happen."
+    end
   end
-  ret = pcs_offline(cmd_create, cib)
+  ret = pcs_offline(cmd_update, cib)
   if ret == false
     delete_cib(cib)
-    raise Puppet::Error, "pcmk_resource_has_changed? #{cmd_create} returned error #{resource[:name]}. This should never happen."
+    raise Puppet::Error, "pcmk_resource_has_changed? #{cmd_update} returned error #{resource[:name]}. This should never happen."
   end
   if is_crm_diff_buggy?
     ret = pcmk_restart_resource?(resource[:name], cib, is_bundle)
@@ -372,33 +437,23 @@ def pcmk_resource_has_changed?(resource, cmd_create, is_bundle=false)
   return ret
 end
 
-# This function will update a resource by making a cib backup
-# removing the resource and readding it and then push the CIB
-# to the cluster
-def pcmk_update_resource(resource, cmd_create, settle_timeout_secs=600)
+# This function will update a resource by making a cib backup,
+# running a pruning command first and then running the update command.
+# Finally it pushes the CIB back to the cluster.
+def pcmk_update_resource(resource, cmd_update, cmd_pruning='', settle_timeout_secs=600)
   cib = backup_cib()
-  cmd_delete = "resource delete #{resource[:name]}"
-  ret = pcs_offline(cmd_delete, cib)
-  if ret == false
-    delete_cib(cib)
-    raise Puppet::Error, "pcmk_update_resource #{cmd_delete} returned error on #{resource[:name]}. This should never happen."
-  end
-  ret = pcs_offline(cmd_create, cib)
-  if ret == false
-    delete_cib(cib)
-    raise Puppet::Error, "pcmk_update_resource #{cmd_create} returned error on #{resource[:name]}. This should never happen."
-  end
-  if resource[:location_rule] then
-    # Some versions of pcs do not automatically remove the location rule associated to a
-    # bundle. So we might end up here with the location_rule still existing
-    # Let's just force its creation and ignore the fact that it might already be there
-    cmd_location = build_pcs_location_rule_cmd(resource, force=true)
-    ret = pcs_offline(cmd_location, cib)
+  if not_empty_string(cmd_pruning)
+    ret = pcs_offline(cmd_pruning, cib)
     if ret == false
       delete_cib(cib)
-      raise Puppet::Error, "pcmk_update_resource #{cmd_location} returned error on #{resource[:location_rule]}. This should never happen."
+      raise Puppet::Error, "pcmk_update_resource #{cmd_pruning} returned error on #{resource[:name]}. This should never happen."
     end
   end
+  ret = pcs_offline(cmd_update, cib)
+  if ret == false
+    delete_cib(cib)
+    raise Puppet::Error, "pcmk_update_resource #{cmd_update} returned error on #{resource[:name]}. This should never happen."
+  end
   push_cib_offline(cib, resource[:tries], resource[:try_sleep], resource[:post_success_sleep])
   cmd = "#{TIMEOUT_BIN} #{settle_timeout_secs} #{CRMRESOURCE_BIN} --wait"
   cmd_out = `#{cmd}`


@@ -3,7 +3,7 @@ require_relative '../pcmk_common'
 Puppet::Type.type(:pcmk_resource).provide(:default) do
   desc 'A base resource definition for a pacemaker resource'
 
-  def build_pcs_resource_cmd
+  def build_pcs_resource_cmd(update=false)
     resource_params = @resource[:resource_params]
     meta_params = @resource[:meta_params]
     op_params = @resource[:op_params]
@@ -21,9 +21,14 @@ Puppet::Type.type(:pcmk_resource).provide(:default) do
       raise(Puppet::Error, "May only define one of clone_params, "+
             "master_params and group_params")
     end
 
+    if update
+      create_cmd = ' update '
+    else
+      create_cmd = ' create '
+    end
     # Build the 'pcs resource create' command. Check out the pcs man page :-)
-    cmd = 'resource create ' + @resource[:name]+' ' +@resource[:resource_type]
+    cmd = 'resource' + create_cmd + @resource[:name] + ' ' + @resource[:resource_type]
     if @resource[:resource_type] == 'remote'
       if not_empty_string(@resource[:remote_address])
         cmd += ' server=' + @resource[:remote_address]
@@ -73,10 +78,11 @@ Puppet::Type.type(:pcmk_resource).provide(:default) do
   end
 
   def create_resource_and_location(location_rule, needs_update=false)
-    cmd = build_pcs_resource_cmd()
     if needs_update then
-      pcmk_update_resource(@resource, cmd, @resource[:update_settle_secs])
+      cmd = build_pcs_resource_cmd(update=true)
+      pcmk_update_resource(@resource, cmd, '', @resource[:update_settle_secs])
     else
+      cmd = build_pcs_resource_cmd()
       if location_rule then
         pcs('create', @resource[:name], "#{cmd} --disabled", @resource[:tries],
             @resource[:try_sleep], @resource[:verify_on_create], @resource[:post_success_sleep])
@@ -149,7 +155,7 @@ Puppet::Type.type(:pcmk_resource).provide(:default) do
     if ret == false then
       return PCMK_NOTEXISTS
     end
-    if @resource[:deep_compare] and pcmk_resource_has_changed?(@resource, build_pcs_resource_cmd()) then
+    if @resource[:deep_compare] and pcmk_resource_has_changed?(@resource, build_pcs_resource_cmd(update=true), '') then
      return PCMK_CHANGENEEDED
     end
     return PCMK_NOCHANGENEEDED


@@ -107,7 +107,7 @@ define pacemaker::resource::remote(
   $try_sleep = 0,
   $verify_on_create = false,
   $location_rule = undef,
-  $deep_compare = false,
+  $deep_compare = hiera('pacemaker::resource::remote::deep_compare', false),
   $update_settle_secs = hiera('pacemaker::resource::remote::update_settle_secs', 600),
 ) {
   pcmk_resource { $name: