author     Bogdan Dobrelya <bdobrelia@mirantis.com> 2015-04-09 10:17:24 +0200
committer  Bogdan Dobrelya <bdobrelia@mirantis.com> 2015-04-09 12:23:14 +0200
commit     1a3849bd98e94c750883efe4ec6290b4359efb89
tree       924e4e05c539d1da33431219a41f2005d124008a
parent     e49522c66a0dc2ea2748d0b0bbd08612d01a232f
Update docs
Update example and README docs:
* Quotes are important for 'off' as YAML treats off w/o quotes as a false
* Updated info about recommended cluster configuration for 'suicide' no quorum policy.
* Updated details about 'reboot' and 'poweroff' policy values
* Provided example provision/deploy commands
* Update known issues

Change-Id: I4ce2c6641d221c8b37fe275029973b5968d27cb1
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Notes (review):
Verified+2: Jenkins
Code-Review+2: Bogdan Dobrelya <bdobrelia@mirantis.com>
Workflow+1: Bogdan Dobrelya <bdobrelia@mirantis.com>
Code-Review+1: Timur Nurlygayanov <tnurlygayanov@mirantis.com>
Submitted-by: Jenkins
Submitted-at: Thu, 09 Apr 2015 11:19:58 +0000
Reviewed-on: https://review.openstack.org/171964
Project: stackforge/fuel-plugin-ha-fencing
Branch: refs/heads/master
-rw-r--r-- README.md | 63
-rw-r--r-- deployment_scripts/puppet/modules/pcs_fencing/examples/pcs_fencing.yaml | 6
-rw-r--r-- deployment_scripts/puppet/modules/pcs_fencing/examples/pcs_fencing_virsh.yaml | 6
3 files changed, 63 insertions, 12 deletions
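The README changes below replace the bare "use Fuel CLI" instruction with `fuel --env <environment_id> node --provision --node <nodes_list>` (and `--deploy` for the final step), where the node list is comma-separated like `1,2,3,4`. Here is a minimal Python sketch that assembles that argument list, for example before handing it to `subprocess.run` on the Fuel master. The helper name `fuel_node_cmd` is hypothetical, and it uses only the flags that appear in the README.

```python
from typing import List, Sequence


def fuel_node_cmd(env_id: int, node_ids: Sequence[int], action: str) -> List[str]:
    """Build argv for `fuel --env <id> node --<action> --node <list>`.

    Hypothetical helper: it only assembles the arguments shown in the README
    and does not talk to Fuel itself.
    """
    if action not in ("provision", "deploy"):
        raise ValueError("action must be 'provision' or 'deploy'")
    if not node_ids:
        raise ValueError("at least one node id is required")
    # The README expects a comma-separated list without spaces, e.g. 1,2,3,4.
    node_list = ",".join(str(node_id) for node_id in node_ids)
    return ["fuel", "--env", str(env_id), "node", f"--{action}", "--node", node_list]


if __name__ == "__main__":
    nodes = [1, 2, 3, 4]
    # Step 1: provision every node in the environment.
    print(" ".join(fuel_node_cmd(1, nodes, "provision")))
    # Step 2: once /etc/pcs_fencing.yaml is in place on the controllers, deploy.
    print(" ".join(fuel_node_cmd(1, nodes, "deploy")))
```

The helper returns a list rather than one string, so the arguments can go straight to `subprocess.run(argv, check=True)` without involving a shell.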
diff --git a/README.md b/README.md
index df1ec08..178909e 100644
--- a/README.md
+++ b/README.md
@@ -56,14 +56,27 @@ Note that in order to build this plugin the following tools must present:
 
 * Create an HA environment and select the fencing policy (reboot, poweroff or
  disabled) at the settings tab.
+ Note, that there is no difference between the 'reboot' and 'poweroff' policy for
+ this version of the plugin. The 'reboot' or 'poweroff' value just enables the
+ fencing feature, while the 'disabled' value - disables it. The difference may
+ present for future versions, when creation of the YAML configuration files for
+ nodes will be automated.
 
 * Assign roles to the nodes as always, but use Fuel CLI instead of Deploy button
  to provision all nodes in the environment. Please note, that the power management
- devices should be reachable from the management network via TCP protocol.
+ devices should be reachable from the management network via TCP protocol:
+
+ ```
+ fuel --env <environment_id> node --provision --node <nodes_list>
+ ```
+
+ (node list should be comma-separated like 1,2,3,4)
 
 * Define YAML configuration files for controller nodes and existing power management
  (PM aka STONITH) devices. See an example in
- ``deployment_scripts/puppet/modules/pcs_fencing/examples/pcs_fencing.yaml``.
+ [deployment_scripts/puppet/modules/pcs_fencing/examples/pcs_fencing.yaml](https://github.com/stackforge/fuel-plugin-ha-fencing/blob/master/deployment_scripts/puppet/modules/pcs_fencing/examples/pcs_fencing.yaml).
+ Note, that quotes for the 'off' and 'reboot' values are important as just an ``off``
+ would be equal to ``false``, which is wrong.
 
  In the given example we assume 'reboot' policy, which is a hard resetting of
  the failed nodes in Pacemaker cluster. We define IPMI reset action and PSU OFF/ON
@@ -116,12 +129,19 @@ Note that in order to build this plugin the following tools must present:
 * Put created fencing configuration YAML files as ``/etc/pcs_fencing.yaml``
  for corresponding controller nodes.
 
-* Deploy HA environment either by CLI command or Deploy button
+* Deploy HA environment either by Deploy button in UI or by CLI command:
+
+ ```
+ fuel --env <environment_id> node --deploy --node <nodes_list>
+ ```
+
+ (node list should be comma-separated like 1,2,3,4)
 
 TODO(bogdando) finish the guide, add agents and devices verification commands
 
-Please also note that the recommended value for the ``no-quorum-policy`` cluster property
-should be changed manually (after deployment is done) from ignore/stopped to suicide.
+Please also note that for clusters containing 3,5,7 or more controllers the recommended
+value for the ``no-quorum-policy`` cluster property should be changed manually
+(after deployment is done) from ignore/stopped to suicide.
 For more information on no-quorum policy, see the [Cluster Options](http://clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/s-cluster-options.html)
 section in the official Pacemaker documentation. You can set this property by the command
 ```
@@ -184,7 +204,7 @@ Plugin :: Fuel version
 Known Issues
 ------------
 
-[LP1411603](https://bugs.launchpad.net/fuel/+bug/1411603)
+### Concurrent nodes deployment issue [LP1411603](https://bugs.launchpad.net/fuel/+bug/1411603)
 
 After the deployment is finished, please make sure all of the controller nodes have
 corresponding ``stonith__*`` primitives and the stonith verification command gives
@@ -208,6 +228,37 @@ one "allow" location shown by the ref command.
 If some of the controller nodes does not have corresponding stonith primitives
 or locations for them, please follow the workaround provided at the LP bug.
 
+### Timer expired responses
+
+There is also possible that fencing actions are timed out with the errors like:
+
+```
+error: remote_op_done: Operation reboot of node-8 by node-7 for
+crmd.7932@node-7.d3cb0ebd: Timer expired
+```
+
+or some nodes configured with 'reboot' policy may enter the reboot loop caused by
+the fencing action.
+
+All of this means that the given values for timeouts should be verified and adjusted
+as appropriate.
+
+### Node stucks in pending state after was powered on
+
+There is a known bug in pacemaker 1.1.10 when the fenced node returns back too fast
+(see this [mail thread](http://oss.clusterlabs.org/pipermail/pacemaker/2014-April/021564.html) for details):
+
+Essentially the node is returning "too fast" (specifically, before the fencing
+notification arrives) causing pacemaker to forget the node is up and healthy.
+The fix for this is https://github.com/beekhof/pacemaker/commit/e777b17 and is
+present in 1.1.11
+
+As a workaround you should not bring the failed node back within few minutes after
+it had been STONITHed. And if it still stucks in pending state, you can restart its
+corosync service. And if corosync service hangs on stop and have to be killed and
+restarted - make it fast, otherwise another STONITH action triggered by dead corosync
+process would arrive.
+
 Release Notes
 -------------
 
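The README now limits the `suicide` recommendation for `no-quorum-policy` to clusters with 3, 5, 7 or more controllers. The arithmetic below is one way to see why. It assumes a simple majority rule: quorum = floor(n/2) + 1 votes, one vote per node, and no special two-node handling. Under that rule, with two controllers a single failure already leaves the survivor without quorum, so `suicide` would make it stop itself too. With three or more controllers, a single failure still leaves a quorate majority. The function names are hypothetical. The threshold of 3 is this sketch's reading of the README. The property itself is still changed with the command the README refers to.

```python
from typing import Optional


def quorum_size(nodes: int) -> int:
    """Votes needed for a simple majority, assuming one vote per node."""
    return nodes // 2 + 1


def partition_has_quorum(nodes: int, failed: int) -> bool:
    """True if the nodes left running after `failed` drop out still hold a majority."""
    return nodes - failed >= quorum_size(nodes)


def recommended_no_quorum_policy(controllers: int) -> Optional[str]:
    """Hypothetical helper following the README's recommendation.

    Returns 'suicide' for 3 or more controllers; None means "keep the value
    set at deployment (ignore/stopped)".
    """
    return "suicide" if controllers >= 3 else None


if __name__ == "__main__":
    for n in (2, 3, 5):
        print(
            f"{n} controllers: quorum={quorum_size(n)}, "
            f"one failure keeps quorum={partition_has_quorum(n, 1)}, "
            f"policy={recommended_no_quorum_policy(n)}"
        )
```

Odd controller counts are the usual choice because adding a fourth controller raises the quorum size without letting the cluster survive more failures than three controllers do.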
diff --git a/deployment_scripts/puppet/modules/pcs_fencing/examples/pcs_fencing.yaml b/deployment_scripts/puppet/modules/pcs_fencing/examples/pcs_fencing.yaml
index ea34149..2dfc584 100644
--- a/deployment_scripts/puppet/modules/pcs_fencing/examples/pcs_fencing.yaml
+++ b/deployment_scripts/puppet/modules/pcs_fencing/examples/pcs_fencing.yaml
@@ -41,9 +41,9 @@ fence_primitives:
  auth: password
  power_wait: '15'
  delay: '300'
- action: reboot
- pcmk_reboot_action: reboot
- pcmk_off_action: reboot
+ action: 'reboot'
+ pcmk_reboot_action: 'reboot'
+ pcmk_off_action: 'reboot'
  pcmk_host_list: node-10.test.local
  psu_off:
  agent_type: fence_apc_snmp
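The quoting change in this file matches the note added to the README and the commit message. Under YAML 1.1 rules, which loaders such as PyYAML apply, an unquoted `off` is read as a boolean, as are `on`, `yes` and `no`. A quoted `'off'` stays a string. `reboot` is not one of those boolean spellings, so it resolves to a string with or without quotes. Below is a simplified Python sketch of just that resolution step. `resolve_scalar` and `parse_line` are hypothetical helpers, not a YAML parser.

```python
from typing import Tuple, Union

# Plain-scalar spellings that YAML 1.1 loaders (PyYAML, for example) turn into
# booleans. The YAML 1.1 spec also lists y/Y/n/N; they are omitted here.
YAML11_TRUE = frozenset({"yes", "Yes", "YES", "true", "True", "TRUE", "on", "On", "ON"})
YAML11_FALSE = frozenset({"no", "No", "NO", "false", "False", "FALSE", "off", "Off", "OFF"})


def resolve_scalar(raw: str) -> Union[bool, str]:
    """Resolve one scalar the way a YAML 1.1 loader treats booleans (sketch only)."""
    value = raw.strip()
    # A quoted scalar is always a string, so 'off' stays the text "off".
    # Escape sequences inside the quotes are not handled in this sketch.
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
        return value[1:-1]
    if value in YAML11_TRUE:
        return True
    if value in YAML11_FALSE:
        return False
    return value


def parse_line(line: str) -> Tuple[str, Union[bool, str]]:
    """Split a flat `key: value` line; nesting and comments are not supported."""
    key, _, raw = line.partition(":")
    return key.strip(), resolve_scalar(raw)


if __name__ == "__main__":
    for line in ("pcmk_off_action: off", "pcmk_off_action: 'off'", "action: 'reboot'"):
        print(parse_line(line))
```

`parse_line` splits only on the first colon, so a quoted value that contains colons, such as `pcmk_host_map: 'node-7:env60_slave-07'`, stays intact.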
diff --git a/deployment_scripts/puppet/modules/pcs_fencing/examples/pcs_fencing_virsh.yaml b/deployment_scripts/puppet/modules/pcs_fencing/examples/pcs_fencing_virsh.yaml
index a2a3f19..47406d7 100644
--- a/deployment_scripts/puppet/modules/pcs_fencing/examples/pcs_fencing_virsh.yaml
+++ b/deployment_scripts/puppet/modules/pcs_fencing/examples/pcs_fencing_virsh.yaml
@@ -37,7 +37,7 @@ fence_primitives:
  login_timeout: '5'
  secure: true
  delay: '300'
- action: reboot
- pcmk_reboot_action: reboot
- pcmk_off_action: reboot
+ action: 'reboot'
+ pcmk_reboot_action: 'reboot'
+ pcmk_off_action: 'reboot'
  pcmk_host_map: 'node-7:env60_slave-07'
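The README's "Timer expired" known issue says the timeout values should be verified and adjusted. Below is a rough budget check. It assumes the whole fencing operation has to fit within Pacemaker's `stonith-timeout` cluster option, which defaults to 60s in Pacemaker 1.1. It also assumes the agent's wait options simply add up. Under that model, with `delay: '300'` as in both example files, the agent would still be in its initial delay when a 60s timeout fires. The names `fencing_wait_seconds` and `timeout_warnings` are hypothetical.

```python
from typing import List, Mapping

# Fence-agent options that add waiting time: `delay` (before fencing starts),
# `power_wait` (after issuing ON/OFF) and `login_timeout` (waiting for a prompt).
# Adding them up is a simplification; real runs also spend time on the power
# action itself and on retries.
WAIT_OPTIONS = ("delay", "power_wait", "login_timeout")


def fencing_wait_seconds(params: Mapping[str, str]) -> int:
    """Sum the wait-related options of one fence primitive (values are strings, as in the YAML)."""
    return sum(int(params.get(name, "0")) for name in WAIT_OPTIONS)


def timeout_warnings(params: Mapping[str, str], stonith_timeout: int = 60) -> List[str]:
    """Warn when the summed waits alone reach the cluster-side timeout (in seconds)."""
    total = fencing_wait_seconds(params)
    if total >= stonith_timeout:
        return [f"waits add up to {total}s, which is not below the {stonith_timeout}s timeout"]
    return []


if __name__ == "__main__":
    # Wait-related values visible in the virsh example diff above.
    virsh_primitive = {"login_timeout": "5", "delay": "300", "action": "reboot"}
    print(timeout_warnings(virsh_primitive))                       # default 60s timeout
    print(timeout_warnings(virsh_primitive, stonith_timeout=400))  # a raised timeout
```

Under this model, either a smaller `delay` or a larger cluster-side timeout clears the warning. The sketch does not decide which of the two is appropriate for a given cluster.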