Improve deep_compare detection
In the initial implementation of resource updates in puppet-pacemaker, we detected whether a resource had changed as follows:

1) Dump the CIB and make a copy.
2) Remove the resource from the dump.
3) Re-add the resource with the current parameters.
4) Via crm_diff --cib, detect whether there were any changes related to the resource by checking a specific xpath on the diff.
5) If 4) matched the xpath, push the newly created CIB back to the cluster.

This approach has a number of drawbacks:

A) In the case of bundles it would remove the resource running inside the bundle for a very brief amount of time.
B) pcs would automatically remove all constraints related to the resource, so we had to forcefully recreate them during the update operation and then update the CIB.
C) We sometimes failed to trigger a resource update because the xpath in 4) did not always match, so we missed updating a resource from time to time.

We now change the approach slightly to make it more robust. Instead of removing a resource from the offline CIB, we just update it via pcs update commands on the offline CIB. In the case of bundles we need to remove all currently existing storage maps and then add the newly defined set, because pcs does not allow updating a resource declaratively [1]. We also extend the xpath from C) to match most changes, while being careful not to match the empty meta attributes that pcs adds due to a bug [2]. This new approach solves A) and B), is more precise, and fully solves C) as well.
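The widened xpath check from the new detection step can be sketched in isolation. The snippet below is an illustrative, self-contained example (the diff XML, resource ids and variable names are fabricated for the illustration), not the provider code itself:

```ruby
require 'rexml/document'

# Hypothetical sample of `crm_diff --cib` output: one real modification of a
# VIP resource plus a spurious 'move' change that must be ignored.
diff_xml = <<-XML
<diff>
  <change operation='modify' path="/cib/configuration/resources/primitive[@id='ip-192.168.24.6']"/>
  <change operation='move' path="/cib/configuration/resources/primitive[@id='ip-10.0.0.101']"/>
</diff>
XML

resource_name = 'ip-192.168.24.6'
doc = REXML::Document.new(diff_xml)
# The widened xpath: any non-move operation whose path contains the exact
# @id of the resource counts as a change that needs a CIB push.
xpath = "/diff/change[@operation and @operation != 'move' and contains(@path, \"@id='#{resource_name}'\")]"
needs_push = !REXML::XPath.first(doc, xpath).nil?
puts needs_push
```

Matching on `@id='…'` rather than the bare resource name avoids false positives on resources whose ids merely contain the name as a substring, and excluding `move` operations skips reorderings that do not restart anything.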
Given the complexity of the changes, here is how I tested this change:

1) 1 mount point added
   Oct 25 14:46:31 controller-0 journal: Debug: pcmk_resource_has_changed (ng version) returned true for resource rabbitmq-bundle
   and rabbitmq-bundle was correctly restarted with the correct bind mounts

2) 1 mount point removed
   Oct 25 15:56:38 controller-0 journal: Debug: pcmk_resource_has_changed (ng version) returned true for resource rabbitmq-bundle
   and rabbitmq-bundle was correctly restarted with the correct bind mounts

3) 3 mount points added
   Oct 26 01:46:42 controller-0 journal: Debug: pcmk_resource_has_changed (ng version) returned true for resource rabbitmq-bundle
   and rabbitmq-bundle was correctly restarted with the correct bind mounts

4) 2 mount points removed
   Oct 26 02:42:07 controller-0 journal: Debug: pcmk_resource_has_changed (ng version) returned true for resource rabbitmq-bundle
   and rabbitmq-bundle was correctly restarted with the correct bind mounts

5) 1 mount point added plus replicas number changed
   Correctly got a reduced replicas number and the correct mount point:
   Docker container set: rabbitmq-bundle [192.168.24.1:8787/rhosp13/openstack-rabbitmq:pcmklatest]
     rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): Started controller-2
     rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): Started controller-0

6) VIP netmask changed (from /32 to /24)
   Oct 26 03:53:04 controller-0 journal: Debug: pcmk_resource_has_changed (ng version) returned true for resource ip-192.168.24.10
   Oct 26 03:53:20 controller-0 journal: Debug: pcmk_resource_has_changed (ng version) returned true for resource ip-10.0.0.101
   Oct 26 03:53:36 controller-0 journal: Debug: pcmk_resource_has_changed (ng version) returned true for resource ip-172.17.1.12
   Oct 26 03:53:52 controller-0 journal: Debug: pcmk_resource_has_changed (ng version) returned true for resource ip-172.17.1.10
   Oct 26 03:54:07 controller-0 journal: Debug: pcmk_resource_has_changed (ng version) returned true for resource ip-172.17.3.17
   Oct 26 03:54:24 controller-0 journal: Debug: pcmk_resource_has_changed (ng version) returned true for resource ip-172.17.4.13
   [root@controller-0 10]# ip -o a | grep '/32'
   [root@controller-0 10]#

7) VIP netmask changed (from /24 to /32) and bundle replicas number changed
   rabbitmq-bundle went from two replicas (see 5) to three:
   Docker container set: rabbitmq-bundle [192.168.24.1:8787/rhosp13/openstack-rabbitmq:pcmklatest]
     rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): Started controller-2
     rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): Started controller-0
     rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): Started controller-1
   VIP netmasks correctly moved from /24 to /32

8) Redeploy with no changes
   No resources were restarted

9) Changing the mount point order in puppet does not restart the service
   Oct 26 04:59:36 controller-0 journal: Debug: pcmk_resource_has_changed (ng version) returned false for resource rabbitmq-bundle

10) Change a single option of a mount point
    Oct 26 05:32:13 controller-0 journal: Debug: pcmk_resource_has_changed (ng version) returned true for resource rabbitmq-bundle
    and rabbitmq-bundle was correctly restarted with the correct bind mounts

11) Change a parameter of a pacemaker remote resource
    Added the following hiera keys:
      pacemaker::resource::remote::deep_compare: true
      pacemaker_remote_reconnect_interval: 105
    and correctly got:
    [root@controller-0 ~]# pcs resource show compute-0 | grep reconnect
    Attributes: reconnect_interval=105 server=172.17.1.14

12) Change 1 mount point on three bundles
    Oct 26 10:40:30 controller-0 dockerd-current[19339]: Debug: pcmk_resource_has_changed (ng version) returned true for resource rabbitmq-bundle

13) Try and change a meta attribute
    Added an 'ordered=true' to the rabbitmq bundle:
    [root@controller-0 ~]# pcs resource show rabbitmq-bundle | grep ordered
    Meta Attrs: container-attribute-target=host notify=true ordered=true

14) Delete a single location constraint and redeploy
    [root@controller-0 ~]# pcs constraint remove location-rabbitmq-bundle
    ..redeployed..
    [root@controller-0 ~]# pcs constraint | grep rabbit
    Resource: rabbitmq-bundle
      Constraint: location-rabbitmq-bundle (resource-discovery=exclusive)
        Expression: rabbitmq-role eq true

Other things verified during the tests above:
- Constraints stayed the same
- Properties stayed the same

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1598197
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1568353

Closes-Bug: #1797542
Change-Id: I048fbb6b0319bac33c82ad2d7639d0f594fb73bb
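The empty-meta_attributes guard motivated by [2] can be illustrated on its own. The change fragment below is fabricated, modeled on the example quoted in the code comments; it only demonstrates the "node with no children" check, not the full CIB cross-check:

```ruby
require 'rexml/document'

# Fabricated cib-diff fragment: pcs sometimes emits an empty
# <meta_attributes/> node (rhbz#1568353), which must not count as a change.
change_xml = <<-XML
<change operation='create' path="/cib/configuration/resources/primitive[@id='ip-172.16.11.97']" position='2'>
  <meta_attributes id='ip-172.16.11.97-meta_attributes'/>
</change>
XML

change = REXML::Document.new(change_xml).root
empty_meta = false
change.each_element('.//meta_attributes') do |meta|
  # A node with no child elements is one of the spurious empty sets;
  # such a change can be skipped instead of triggering a CIB push.
  empty_meta = true unless meta.has_elements?
end
puts empty_meta
```

In the real provider this check is followed by a lookup of the same meta_attributes id in the CIB, and the change is only skipped when the CIB copy is also empty or absent.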
This commit is contained in:
parent ce81a99d08
commit 90033be653
@@ -3,7 +3,27 @@ require_relative '../pcmk_common'
 Puppet::Type.type(:pcmk_bundle).provide(:default) do
   desc 'A bundle resource definition for pacemaker'
 
-  def build_pcs_bundle_cmd
+  def _storage_maps_cmd(storage_maps, update=false)
+    return '' if storage_maps == nil or storage_maps.empty?
+    cmd = ''
+    if update
+      add = ' add '
+    else
+      add = ' '
+    end
+    storage_maps.each do | key, value |
+      cmd += ' storage-map' + add + 'id=' + key + \
+             ' source-dir=' + value['source-dir'] + \
+             ' target-dir=' + value['target-dir']
+      options = value['options']
+      if not_empty_string(options)
+        cmd += ' options=' + options
+      end
+    end
+    cmd
+  end
+
+  def build_pcs_bundle_cmd(update=false)
     image = @resource[:image]
     replicas = @resource[:replicas]
     masters = @resource[:masters]
@@ -15,8 +35,16 @@ Puppet::Type.type(:pcmk_bundle).provide(:default) do
     location_rule = @resource[:location_rule]
     container_backend = @resource[:container_backend]
 
+    if update
+      create_cmd = 'update'
+      docker_cmd = ''
+    else
+      create_cmd = 'create'
+      docker_cmd = 'docker'
+    end
+
     # Build the 'pcs resource create' command. Check out the pcs man page :-)
-    cmd = 'resource bundle create ' + @resource[:name]+' container ' + container_backend + ' image=' + @resource[:image]
+    cmd = 'resource bundle ' + create_cmd + ' ' + @resource[:name] + ' container ' + docker_cmd + ' image=' + @resource[:image]
     if replicas
       cmd += " replicas=#{replicas}"
     end
@@ -33,26 +61,30 @@ Puppet::Type.type(:pcmk_bundle).provide(:default) do
       cmd += ' ' + container_options
     end
 
-    # storage_maps[id] = {'source' => value, 'target' => value, 'options' => value}
-    # FIXME: need to do proper error checking here
-    if storage_maps and !storage_maps.empty?
-      storage_maps.each do | key, value |
-        cmd += ' storage-map id=' + key + \
-               ' source-dir=' + value['source-dir'] + \
-               ' target-dir=' + value['target-dir']
-        options = value['options']
-        if not_empty_string(options)
-          cmd += ' options=' + options
-        end
-      end
-    end
+    # When we're updating a bundle we first dump the CIB, then
+    # we remove all the *current* storage-maps for the resource
+    # and then we add back the storage-maps passed to us
+    cmd += _storage_maps_cmd(storage_maps, update)
     if network
       cmd += ' network ' + network
     end
     cmd
   end
 
+  def build_pcs_bundle_pruning
+    cmd = 'resource bundle update ' + @resource[:name]
+    # In case of updates due to how pcs manages storage, we need to first remove all
+    # the *current* existing storage maps and then readd the puppet defined ones
+    live_storage_maps = pcmk_get_bundle_storage_map(@resource[:name])
+    if live_storage_maps and !live_storage_maps.empty?
+      live_storage_maps.each do | key, value |
+        cmd += ' storage-map remove ' + value['id']
+      end
+      return cmd
+    end
+    return ''
+  end
+
   ### overloaded methods
   def initialize(*args)
     super(*args)
@@ -63,10 +95,11 @@ Puppet::Type.type(:pcmk_bundle).provide(:default) do
   end
 
   def create_bundle_and_location(location_rule, needs_update=false)
-    cmd = build_pcs_bundle_cmd()
     if needs_update then
-      pcmk_update_resource(@resource, cmd, @resource[:update_settle_secs])
+      cmd = build_pcs_bundle_cmd(update=true)
+      pcmk_update_resource(@resource, cmd, build_pcs_bundle_pruning(), @resource[:update_settle_secs])
     else
+      cmd = build_pcs_bundle_cmd()
       if location_rule then
         pcs('create', @resource[:name], "#{cmd} --disabled", @resource[:tries],
             @resource[:try_sleep], @resource[:verify_on_create], @resource[:post_success_sleep])
@@ -139,7 +172,7 @@ Puppet::Type.type(:pcmk_bundle).provide(:default) do
     if ret == false then
       return PCMK_NOTEXISTS
     end
-    if @resource[:deep_compare] and pcmk_resource_has_changed?(@resource, build_pcs_bundle_cmd(), true) then
+    if @resource[:deep_compare] and pcmk_resource_has_changed?(@resource, build_pcs_bundle_cmd(update=true), build_pcs_bundle_pruning(), true) then
       return PCMK_CHANGENEEDED
     end
     return PCMK_NOCHANGENEEDED
@@ -230,6 +230,25 @@ def push_cib_offline(cib, tries=1, try_sleep=0, post_success_sleep=0)
   end
 end
 
+# returns the storage map for the resource as a dictionary
+def pcmk_get_bundle_storage_map(resource)
+  storage_xpath = "/cib/configuration/resources/bundle[@id='#{resource}']/storage/storage-mapping"
+  cib = backup_cib()
+  cibxml = File.read(cib)
+  storage_doc = REXML::Document.new cibxml
+  ret = {}
+  REXML::XPath.each(storage_doc, storage_xpath) do |element|
+    attrs = {}
+    element.attributes.each do |key, value|
+      attrs[key] = value
+    end
+    ret[attrs['id']] = attrs
+  end
+  delete_cib(cib)
+  Puppet.debug("pcmk_get_bundle_storage_map #{resource} returned #{ret}")
+  ret
+end
+
 # The following function will take a resource_name an xml graph file as generated by crm_simulate and
 # will return true if the resource_name is contained in the transition graph (i.e. the cluster would
 # restart the resource) and false if not (i.e. the cluster would not restart the resource)
@@ -298,6 +317,48 @@ def is_crm_diff_buggy?
   raise Puppet::Error, "#{cmd} failed with (#{ret}): #{cmd_out}"
 end
 
+# This function will return true when a CIB diff xml has an empty meta_attribute change (either
+# addition or removal). It does so by verifying that the diff has an empty meta_attribute node
+# and when that is the case it verifies that the corresponding meta_attributes
+# for the resource in the CIB is indeed either non-existing or has no children
+def has_empty_meta_attributes?(cibfile, element)
+  # First we verify that the cib diff does contain an empty meta_attributes node, like this:
+  # <change operation='create' path='/cib/configuration/resources/primitive[@id='ip-172.16.11.97']' position='2'>
+  #   <meta_attributes id='ip-172.16.11.97-meta_attributes'/>
+  # </change>
+  if element.attributes.has_key?('operation') and \
+     ['delete', 'create'].include? element.attributes['operation'] and \
+     element.attributes.has_key?('path')
+    path = element.attributes['path']
+    element.each_element('//meta_attributes') do |meta|
+      # If the meta_attributes was an empty set we verify that it is so in the CIB as well
+      # and if that is the case we return true
+      if not meta.has_elements?
+        begin
+          meta_id = meta.attributes['id']
+          orig_cib = File.read(cibfile)
+          meta_doc = REXML::Document.new orig_cib
+          meta_xpath = "//meta_attributes[@id='#{meta_id}']"
+          meta_search = meta_doc.get_elements(meta_xpath)
+          # If there are no meta_attributes at all or if we have a tag but it has no elements
+          # then we return true
+          if meta_search.empty? or (meta_search.length() > 0 and not meta_search[0].has_elements?)
+            Puppet.debug("has_empty_meta_attributes? detected an empty meta_attribute change and an empty meta_attribute in the CIB, skipping: #{meta_id}")
+            return true
+          end
+        rescue
+          # Should there be any kind of exception in the code above we take
+          # the slightly safer path and we simply return false which implies
+          # updating the CIB and pushing it to the live cluster
+          return false
+        end
+      end
+    end
+  end
+  return false
+end
+
+
 # same as pcmk_restart_resource? but using crm_diff
 def pcmk_restart_resource_ng?(resource_name, cib)
   cmd = "#{CRMDIFF_BIN} --cib -o #{cib}.orig -n #{cib}"
@@ -317,8 +378,11 @@ def pcmk_restart_resource_ng?(resource_name, cib)
   graph_doc = REXML::Document.new cmd_out
   # crm_diff --cib -o cib-orig.xml -n cib-vip-update.xml | \
   #   xmllint --xpath '/diff/change[@operation and contains(@path, "ip-192.168.24.6")]/change-result' -
-  xpath_query = "/diff/change[@operation and contains(@path, \"#{resource_name}\")]/change-result"
+  xpath_query = "/diff/change[@operation and @operation != 'move' and contains(@path, \"@id='#{resource_name}'\")]"
   REXML::XPath.each(graph_doc, xpath_query) do |element|
+    # We need to check for removals of empty meta_attribute tags and ignore those
+    # See https://bugzilla.redhat.com/show_bug.cgi?id=1568353 for pcs creating those spurious empty tags
+    next if has_empty_meta_attributes?(cib, element)
     return true
   end
   return false
@@ -348,18 +412,19 @@ end
 # 1. Deletes the resource from the offline CIB
 # 2. Recreates the resource on the offline CIB
 # 3. Verifies if the pacemaker will restart the resource and returns true if the answer is a yes
-def pcmk_resource_has_changed?(resource, cmd_create, is_bundle=false)
+def pcmk_resource_has_changed?(resource, cmd_update, cmd_pruning='', is_bundle=false)
   cib = backup_cib()
-  cmd_delete = "resource delete #{resource[:name]}"
-  ret = pcs_offline(cmd_delete, cib)
-  if ret == false
-    delete_cib(cib)
-    raise Puppet::Error, "pcmk_resource_has_changed? #{cmd_delete} returned error on #{resource[:name]}. This should never happen."
+  if not_empty_string(cmd_pruning)
+    ret = pcs_offline(cmd_pruning, cib)
+    if ret == false
+      delete_cib(cib)
+      raise Puppet::Error, "pcmk_update_resource #{cmd_pruning} returned error on #{resource[:name]}. This should never happen."
+    end
   end
-  ret = pcs_offline(cmd_create, cib)
+  ret = pcs_offline(cmd_update, cib)
   if ret == false
     delete_cib(cib)
-    raise Puppet::Error, "pcmk_resource_has_changed? #{cmd_create} returned error #{resource[:name]}. This should never happen."
+    raise Puppet::Error, "pcmk_resource_has_changed? #{cmd_update} returned error #{resource[:name]}. This should never happen."
   end
   if is_crm_diff_buggy?
     ret = pcmk_restart_resource?(resource[:name], cib, is_bundle)
@@ -372,33 +437,23 @@ def pcmk_resource_has_changed?(resource, cmd_create, is_bundle=false)
   return ret
 end
 
-# This function will update a resource by making a cib backup
-# removing the resource and readding it and then push the CIB
-# to the cluster
-def pcmk_update_resource(resource, cmd_create, settle_timeout_secs=600)
+# This function will update a resource by making a cib backup,
+# running a pruning command first and then running the update command.
+# Finally it pushes the CIB back to the cluster.
+def pcmk_update_resource(resource, cmd_update, cmd_pruning='', settle_timeout_secs=600)
   cib = backup_cib()
-  cmd_delete = "resource delete #{resource[:name]}"
-  ret = pcs_offline(cmd_delete, cib)
-  if ret == false
-    delete_cib(cib)
-    raise Puppet::Error, "pcmk_update_resource #{cmd_delete} returned error on #{resource[:name]}. This should never happen."
-  end
-  ret = pcs_offline(cmd_create, cib)
-  if ret == false
-    delete_cib(cib)
-    raise Puppet::Error, "pcmk_update_resource #{cmd_create} returned error on #{resource[:name]}. This should never happen."
-  end
-  if resource[:location_rule] then
-    # Some versions of pcs do not automatically remove the location rule associated to a
-    # bundle. So we might end up here with the location_rule still existing
-    # Let's just force its creation and ignore the fact that it might already be there
-    cmd_location = build_pcs_location_rule_cmd(resource, force=true)
-    ret = pcs_offline(cmd_location, cib)
+  if not_empty_string(cmd_pruning)
+    ret = pcs_offline(cmd_pruning, cib)
     if ret == false
       delete_cib(cib)
-      raise Puppet::Error, "pcmk_update_resource #{cmd_location} returned error on #{resource[:location_rule]}. This should never happen."
+      raise Puppet::Error, "pcmk_update_resource #{cmd_pruning} returned error on #{resource[:name]}. This should never happen."
     end
   end
+  ret = pcs_offline(cmd_update, cib)
+  if ret == false
+    delete_cib(cib)
+    raise Puppet::Error, "pcmk_update_resource #{cmd_update} returned error on #{resource[:name]}. This should never happen."
+  end
   push_cib_offline(cib, resource[:tries], resource[:try_sleep], resource[:post_success_sleep])
   cmd = "#{TIMEOUT_BIN} #{settle_timeout_secs} #{CRMRESOURCE_BIN} --wait"
   cmd_out = `#{cmd}`
@@ -3,7 +3,7 @@ require_relative '../pcmk_common'
 Puppet::Type.type(:pcmk_resource).provide(:default) do
   desc 'A base resource definition for a pacemaker resource'
 
-  def build_pcs_resource_cmd
+  def build_pcs_resource_cmd(update=false)
     resource_params = @resource[:resource_params]
     meta_params = @resource[:meta_params]
     op_params = @resource[:op_params]
@@ -21,9 +21,14 @@ Puppet::Type.type(:pcmk_resource).provide(:default) do
       raise(Puppet::Error, "May only define one of clone_params, "+
             "master_params and group_params")
     end
+    if update
+      create_cmd = ' update '
+    else
+      create_cmd = ' create '
+    end
 
     # Build the 'pcs resource create' command. Check out the pcs man page :-)
-    cmd = 'resource create ' + @resource[:name]+' ' +@resource[:resource_type]
+    cmd = 'resource' + create_cmd + @resource[:name] + ' ' + @resource[:resource_type]
     if @resource[:resource_type] == 'remote'
       if not_empty_string(@resource[:remote_address])
         cmd += ' server=' + @resource[:remote_address]
@@ -73,10 +78,11 @@ Puppet::Type.type(:pcmk_resource).provide(:default) do
   end
 
   def create_resource_and_location(location_rule, needs_update=false)
-    cmd = build_pcs_resource_cmd()
     if needs_update then
-      pcmk_update_resource(@resource, cmd, @resource[:update_settle_secs])
+      cmd = build_pcs_resource_cmd(update=true)
+      pcmk_update_resource(@resource, cmd, '', @resource[:update_settle_secs])
     else
+      cmd = build_pcs_resource_cmd()
       if location_rule then
         pcs('create', @resource[:name], "#{cmd} --disabled", @resource[:tries],
             @resource[:try_sleep], @resource[:verify_on_create], @resource[:post_success_sleep])
@@ -149,7 +155,7 @@ Puppet::Type.type(:pcmk_resource).provide(:default) do
     if ret == false then
       return PCMK_NOTEXISTS
     end
-    if @resource[:deep_compare] and pcmk_resource_has_changed?(@resource, build_pcs_resource_cmd()) then
+    if @resource[:deep_compare] and pcmk_resource_has_changed?(@resource, build_pcs_resource_cmd(update=true), '') then
      return PCMK_CHANGENEEDED
    end
    return PCMK_NOCHANGENEEDED
@@ -107,7 +107,7 @@ define pacemaker::resource::remote(
   $try_sleep = 0,
   $verify_on_create = false,
   $location_rule = undef,
-  $deep_compare = false,
+  $deep_compare = hiera('pacemaker::resource::remote::deep_compare', false),
   $update_settle_secs = hiera('pacemaker::resource::remote::update_settle_secs', 600),
 ) {
   pcmk_resource { $name: