We deprecated setting target-role values via the meta attribute, in
favor of :start/:stop actions on the resource. So this should not be
relied upon anymore, and it's safe to drop this if we want to.
There is one racy case where this matters:
- node1 and node2 try to create a primitive with the chef resource; on
initial creation, we set target-role='Stopped' because we do not
want to autostart primitives.
- because they can't create it at the same time, node2 will fail on
creation. If the chef resource is configured to retry, then node2
will then try to update the primitive (since it now exists); but the
chef resource is not reloaded so still has target-role='Stopped'.
- if node1 had also started the primitive before node2 retries the
:create, then the target-role will be changed from 'Started' to
'Stopped' with the update.
This can result in a primitive not being started with [:create, :start].
Therefore, we just delete this deprecated bit from meta to avoid any issue.
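The removal can be sketched in a few lines. This is a minimal sketch, assuming the meta attributes are held in a plain Hash keyed by string; the helper name strip_deprecated_target_role is hypothetical and not taken from the cookbook:

```ruby
# Hypothetical helper: returns a copy of the meta hash without the
# deprecated target-role key, leaving the caller's hash untouched.
def strip_deprecated_target_role(meta)
  meta.reject { |key, _value| key.to_s == 'target-role' }
end

meta = { 'target-role' => 'Stopped', 'is-managed' => 'true' }
cleaned = strip_deprecated_target_role(meta)
# cleaned now holds only the 'is-managed' entry; meta is unchanged
```

With the key gone, a retried :create has no stale target-role left to push back to the cluster.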
[Thu, 31 Jul 2014 15:11:55 +0200] INFO: Processing pacemaker_primitive[vip-public-cluster-proposal-1] action create (crowbar-pacemaker::haproxy line 15)
[Thu, 31 Jul 2014 15:11:56 +0200] INFO: Creating new primitive resource 'vip-public-cluster-proposal-1'
[Thu, 31 Jul 2014 15:11:56 +0200] INFO: Processing execute[crm configure primitive vip-public-cluster-proposal-1 ocf:heartbeat:IPaddr2 params ip="192.168.126.5" meta target-role="Stopped" ] action run (/var/chef/cache/cookbooks/pacemaker/libraries/chef/mixin/pacemaker/standard_cib_object.rb line 91)
================================================================================
Error executing action `run` on resource 'execute[crm configure primitive vip-public-cluster-proposal-1 ocf:heartbeat:IPaddr2 params ip="192.168.126.5" meta target-role="Stopped" ]'
================================================================================
Mixlib::ShellOut::ShellCommandFailed
------------------------------------
Expected process to exit with [0], but received '1'
---- Begin output of crm configure primitive vip-public-cluster-proposal-1 ocf:heartbeat:IPaddr2 params ip="192.168.126.5" meta target-role="Stopped" ----
STDOUT:
STDERR: ERROR: vip-public-cluster-proposal-1: id is already in use
---- End output of crm configure primitive vip-public-cluster-proposal-1 ocf:heartbeat:IPaddr2 params ip="192.168.126.5" meta target-role="Stopped" ----
Ran crm configure primitive vip-public-cluster-proposal-1 ocf:heartbeat:IPaddr2 params ip="192.168.126.5" meta target-role="Stopped" returned 1
[...]
[Thu, 31 Jul 2014 15:11:56 +0200] INFO: Retrying execution of pacemaker_primitive[vip-public-cluster-proposal-1], 0 attempt(s) left
[Thu, 31 Jul 2014 15:12:02 +0200] INFO: Checking existing primitive resource 'vip-public-cluster-proposal-1' for modifications
[Thu, 31 Jul 2014 15:12:02 +0200] INFO: vip-public-cluster-proposal-1's ip params didn't change
[Thu, 31 Jul 2014 15:12:02 +0200] INFO: vip-public-cluster-proposal-1's target-role meta changed from to Stopped
[Thu, 31 Jul 2014 15:12:02 +0200] INFO: Processing execute[crm_resource --resource vip-public-cluster-proposal-1 --set-parameter "target-role" --parameter-value "Stopped" --meta] action run (/var/chef/cache/cookbooks/pacemaker/providers/primitive.rb line 88)
[Thu, 31 Jul 2014 15:12:02 +0200] INFO: execute[crm_resource --resource vip-public-cluster-proposal-1 --set-parameter "target-role" --parameter-value "Stopped" --meta] ran successfully
[Thu, 31 Jul 2014 15:12:02 +0200] INFO: Processing execute[crm configure primitive vip-public-cluster-proposal-1 ocf:heartbeat:IPaddr2 params ip="192.168.126.5" meta target-role="Stopped" ] action nothing (/var/chef/cache/cookbooks/pacemaker/libraries/chef/mixin/pacemaker/standard_cib_object.rb line 91)
[Thu, 31 Jul 2014 15:12:02 +0200] INFO: Processing execute[crm_resource --resource vip-public-cluster-proposal-1 --set-parameter "target-role" --parameter-value "Stopped" --meta] action nothing (/var/chef/cache/cookbooks/pacemaker/providers/primitive.rb line 88)
For some reason, sometimes, I see them not running after the cluster
creation. Since we're supposed to explicitly start all primitives, this
should work better.
Pacemaker::Resource.extract_hash() is written to extract things from a complete
CIB object definition. Therefore it rightfully expects whitespace before the
data_type ("params" in this case). So we just add those here to make the string
parsable by extract_hash.
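To illustrate why the leading whitespace matters, here is a simplified stand-in. It is not the cookbook's Pacemaker::Resource.extract_hash; the function name and the regular expression are assumptions chosen only to show a pattern that requires whitespace before the data_type:

```ruby
require 'shellwords'

# Simplified stand-in, NOT the real extract_hash: the pattern below
# requires at least one whitespace character before data_type.
def extract_hash_sketch(definition, data_type)
  match = definition.match(/\s+#{Regexp.escape(data_type)} (.+?)\s*$/)
  return {} unless match

  # Shellwords strips the quoting around values such as ip="..."
  Shellwords.split(match[1]).each_with_object({}) do |pair, hash|
    key, value = pair.split('=', 2)
    hash[key] = value
  end
end

# No whitespace before "params": nothing is extracted.
without_space = extract_hash_sketch('params ip="192.168.126.5"', 'params')
# Leading whitespace added: the ip parameter is extracted.
with_space = extract_hash_sketch(' params ip="192.168.126.5"', 'params')
```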
https://bugzilla.novell.com/show_bug.cgi?id=887244
Stonith + SBD deployment can fail due to SBD not being created on the
shared disk. The message does not really tell what went wrong and what
should be done to fix it. Use a more descriptive one.
SBD creation is a destructive operation and a typo could wipe user
data, which is why sbd create is not run by default.
An alternative would be to run sbd create only after the user types
the volume name into the input field in the UI twice, for confirmation.
Travis started failing with:
uninitialized constant RSpec::Matchers::BuiltIn::RaiseError::MatchAliases (NameError)
due to bundler installing rspec-expectations-2.99 which is a
pre-release:
https://github.com/rubygems/rubygems/issues/853
Every time we start corosync, we need to wait for the cluster to be up.
And when we use sbd, we restart corosync to start sbd, so we need to
wait there too.
10 minutes is way too long and people have time to think that something
is just totally broken. 1 minute is more reasonable. And in the worst
case, an attribute can be set to change this now.
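A bounded wait of this kind could look roughly like the following. This is a sketch only: the helper name wait_for_cluster, its keyword arguments, and the polling interval are assumptions, and the readiness check is left to the caller as a block.

```ruby
# Hypothetical helper: polls the block until it returns a truthy value or
# the timeout (in seconds) elapses. Returns true on success, false on timeout.
def wait_for_cluster(timeout: 60, interval: 2)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    return true if yield
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep interval
  end
end

# Stand-in readiness check that succeeds on the third poll.
attempts = 0
ready = wait_for_cluster(timeout: 5, interval: 0.01) { (attempts += 1) >= 3 }
```

In a recipe, the timeout would be read from the node attribute and the block would run whatever check tells us the cluster is up.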
While we cannot fully deploy SBD fencing (the user needs to provide the
devices and to make sure there's a watchdog for this hardware), we can
help a bit by doing the last few bits.
In Pacemaker, target-role defaults to 'Started', but we want to allow
consumers of the LWRPs the choice whether their newly created resource
gets started or not, and we also want to adhere to the Principle of
Least Surprise. Therefore we stick to the intuitive semantics that
action :create
creates the resource with target-role="Stopped" in order to prevent it
from starting immediately, whereas
action [:create, :start]
creates the resource and then starts it.
Since we are honouring :start / :stop actions to determine the
target-role value, if target-role is specified via meta, it will just be
overridden anyway. So we also deprecate direct use of target-role meta
parameter in recipes.
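The intended end state for a given action list can be expressed as a small function. This is a sketch under assumptions: the helper name expected_target_role is hypothetical, and it models only the resulting target-role, not how the providers apply it.

```ruby
# Hypothetical helper: the target-role a resource should end up with,
# given the actions requested on it. The last :start or :stop wins;
# with neither, a created resource stays 'Stopped'.
def expected_target_role(actions)
  last = Array(actions).select { |a| [:start, :stop].include?(a) }.last
  last == :start ? 'Started' : 'Stopped'
end

create_only      = expected_target_role(:create)
create_and_start = expected_target_role([:create, :start])
```

Here create_only is 'Stopped' and create_and_start is 'Started', regardless of any target-role passed via meta.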
We want to allow a resource X to be created without starting it, and
then for a group/clone Y to be created referencing X, which can
subsequently be started. This ensures that child resources are only
started via their parent, respecting the constraints their parent needs
to impose (e.g. ordering if the parent is a group).
However in this case, changing target-role on Y causes a "Do you want to
override target-role for child resource ..." interactive prompt. When
STDIN is not a tty, --force is required to ensure that the target-role
on the child resource is overridden. This only works with newer versions
of crm which are patched according to:
https://bugzilla.novell.com/show_bug.cgi?id=868697
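Building the command could be sketched as below. The helper name crm_start_command is hypothetical, and placing --force as a global option before the subcommand is an assumption that should be checked against the installed crm version:

```ruby
# Hypothetical helper: builds the argument list for starting a resource.
# Assumption: crm accepts --force before the subcommand.
def crm_start_command(resource, interactive: $stdin.tty?)
  cmd = ['crm']
  # Without a tty nobody can answer the override prompt, so force it.
  cmd << '--force' unless interactive
  cmd + ['resource', 'start', resource]
end

forced = crm_start_command('Y', interactive: false)
plain  = crm_start_command('Y', interactive: true)
```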
The pacemaker_vip_primitive definition only works in a Crowbar
environment (due to retrieving the VIP name/address from Crowbar's
data bag for the relevant network), so it belongs in the
crowbar-pacemaker cookbook, not the pacemaker cookbook.
Currently we are seeing pacemaker_vip_primitive occasionally
generate NoMethodError exceptions like:
undefined method `[]' for nil:NilClass
Clearly we need more helpful error messages when things go wrong.
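One way to get there is to look up each level explicitly and say which one was missing. This is a sketch only: the helper name vip_address and the nested Hash layout are assumptions made for illustration, not the structure of Crowbar's data bag.

```ruby
# Hypothetical lookup over an assumed layout: { network => { vip_name => address } }.
# Each missing level raises an error naming what was not found, instead of
# letting a later nil[...] fail with NoMethodError.
def vip_address(networks, network, vip_name)
  net = networks.fetch(network) do
    raise ArgumentError,
          "No '#{network}' network found; known networks: #{networks.keys.join(', ')}"
  end
  net.fetch(vip_name) do
    raise ArgumentError, "No VIP named '#{vip_name}' on network '#{network}'"
  end
end

networks = { 'public' => { 'cluster-proposal-1' => '192.168.126.5' } }
address = vip_address(networks, 'public', 'cluster-proposal-1')
```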
Some STONITH plugins take hostname as a param, while some take hostlist.
And we can't really guess that. So for now, we check if there's a
hostname or hostlist key in the params, and if one of them is there, we
don't do anything. If neither is there, we try with hostname.
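The check amounts to the following. A minimal sketch, assuming the params are a Hash with string keys; the helper name stonith_params_for is hypothetical:

```ruby
# Hypothetical helper: leaves params alone when the caller already chose
# hostname or hostlist, and otherwise falls back to hostname.
def stonith_params_for(params, node_name)
  return params if params.key?('hostname') || params.key?('hostlist')
  params.merge('hostname' => node_name)
end

defaulted = stonith_params_for({ 'ipaddr' => '10.0.0.1' }, 'node1')
untouched = stonith_params_for({ 'hostlist' => 'node1 node2' }, 'node1')
```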
We have some races when creating STONITH resources because nodes are
creating resources at the same time. It turns out that, in the case of
Crowbar, we can have each node configure only its own resources, because
we know all nodes will run the code.
On the initial run, it takes 20-30 seconds before corosync is ready to
accept changes with "crm configure". So wait for that before we continue
(and apply crm-initial.conf, for instance).
By default, STONITH is marked as disabled. It can be enabled and
configured:
- manually (by the user)
- with a stonith plugin that will be used with a clone resource
- with a stonith plugin that will have per-node parameters
There is no point in loading it if it's the same as before, since we
already imported these settings.
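A comparison of this kind could be sketched with a content digest. The helper name load_needed? and the idea of keeping the previous digest around are assumptions for illustration; the commit does not say how the comparison is done:

```ruby
require 'digest'

# Hypothetical helper: true when the new content differs from what was
# loaded previously, or when nothing has been loaded yet (nil digest).
def load_needed?(new_content, previous_digest)
  Digest::SHA256.hexdigest(new_content) != previous_digest
end

content = "property stonith-enabled=false\n"
previous = Digest::SHA256.hexdigest(content)
unchanged = load_needed?(content, previous)
first_run = load_needed?(content, nil)
```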
We load it immediately because we know that corosync is running already,
and it's a critical file that we do not want to fail loading because of
a later crash in chef-client.
This is an alternative provider for Chef's "service" platform resource.
It allows us to make an existing service resource Pacemaker-aware by
adding a single line to the service block.