Documentation update

This commit unifies and updates the documentation of the various roles
inside the repo.

Change-Id: Ib08e0206992bcc454bd632fa85e2c16ce1111fdf
Raoul Scarazzini 2017-03-27 15:01:50 +02:00
parent 79560a3287
commit 5f4aa533f6
5 changed files with 159 additions and 118 deletions

View File

@ -1,15 +1,9 @@
Utility roles and docs for tripleo-quickstart
=============================================
[![Team and repository tags](http://governance.openstack.org/badges/tripleo-quickstart-extras.svg)](http://governance.openstack.org/reference/tags/index.html)
<!-- Change things from this point on -->
These Ansible roles are a set of useful tools to be used on top of TripleO
deployments. They can be used together with tripleo-quickstart (and
[tripleo-quickstart-extras](https://github.com/openstack/tripleo-quickstart-extras)).
The documentation of each role is located in the individual role folders, and
general usage information is in the [tripleo-quickstart

View File

@ -12,20 +12,20 @@ Requirements
**Physical switches**
The switch(es) must support VLAN tagging and all the ports must be configured
in trunk mode, so that the dedicated network interface on the physical host (in
the examples the secondary interface, eth1) is able to offer PXE and DHCP to all
the overcloud machines via the undercloud virtual machine's bridged interface.
**Host hardware**
The main requirement to make this kind of setup work is to have a host
powerful enough to run virtual machines with at least 16GB of RAM and 8 CPUs.
The more power you have, the more undercloud machines you can spawn without
impacting performance.
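As a quick, purely illustrative sanity check (not part of any role), you can
verify what the candidate host has available before starting:

```bash
# Illustrative capacity check on the candidate VIRTHOST
nproc            # number of CPUs available for the undercloud virtual machines
free -g          # total and free memory, in GiB
virsh nodeinfo   # libvirt's own view of CPUs and memory, if libvirt is installed
```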
**Host Network topology**
The host must be reachable via ssh from the machine launching quickstart and
configured with two main network interfaces:
- **eth0**: bridged on **br0**, pointing to LAN (underclouds will own an IP to
@ -33,14 +33,14 @@ with two main network interfaces:
- **eth1**: connected to the dedicated switch that supports all the VLANs that
will be used in the deployment;
Over eth1, for each undercloud virtual machine two VLAN interfaces are created,
with associated bridges:
- **Control plane network bridge** (e.g. br2100) built over a VLAN interface
(e.g. eth1.2100) that will be eth1 on the undercloud virtual machine, used by
TripleO as br-ctlplane;
- **External network bridge** (e.g. br2105) built over a VLAN interface (e.g.
eth1.2105) that will be eth2 on the undercloud virtual machine, used by
TripleO as the external network device;
![network-topology](./multi-virtual-undercloud_network-topology.png "Multi Virtual Undercloud - Network Topology")
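As an illustration of this topology (a hedged sketch, not part of the role:
interface names and VLAN IDs are simply the ones used in the examples above,
and your host may use ifcfg files or other tooling instead), the control plane
bridge could be created on the host like this:

```bash
# Create the VLAN interface on top of eth1 and enslave it to a bridge (VLAN 2100)
ip link add link eth1 name eth1.2100 type vlan id 2100
ip link add name br2100 type bridge
ip link set eth1.2100 master br2100
ip link set eth1.2100 up
ip link set br2100 up
```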
@ -48,14 +48,14 @@ with associated bridges:
Quickstart configuration
------------------------
The virtual undercloud machine is treated as a baremetal one, and the Quickstart
command relies on the baremetal undercloud role and its playbook.
This means that any playbook similar to [baremetal-undercloud.yml](https://github.com/openstack/tripleo-quickstart-extras/blob/master/playbooks/baremetal-undercloud.yml "Baremetal undercloud playbook") should be okay.
The configuration file has two specific sections that need attention:
- Additional interface for external network to route overcloud traffic:
```yaml
undercloud_networks:
external:
@ -64,30 +64,30 @@ The configuration file has two specific sections that needs attention:
device_type: ethernet
device_name: eth2
```
**NOTE:** in this configuration eth2 also acts as the default router for
the external network.
- Baremetal provision script, which will be a helper for the
[multi-virtual-undercloud.sh](./multi-virtual-undercloud.sh) script on the <VIRTHOST>:
```yaml
baremetal_provisioning_script: "/path/to/multi-virtual-undercloud-provisioner.sh <VIRTHOST> <DISTRO> <UNDERCLOUD-NAME> <UNDERCLOUD IP> <UNDERCLOUD NETMASK> <UNDERCLOUD GATEWAY> <CTLPLANE VLAN> <EXTERNAL NETWORK VLAN>"
```
The supported parameters, with the exception of VIRTHOST, are the same ones
that are passed to the script that lives (and runs) on the VIRTHOST,
*multi-virtual-undercloud.sh*.
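Filled in, the line might look like this (purely illustrative values for the
host name, distro, addressing and VLAN IDs; adapt them to your environment):

```yaml
baremetal_provisioning_script: "/path/to/multi-virtual-undercloud-provisioner.sh virthost.example.com centos-7 undercloud-0 192.168.201.10 255.255.255.0 192.168.201.254 2100 2105"
```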
This helper script launches the remote command on the VIRTHOST and ensures
that the machine becomes reachable via ssh before proceeding.
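Conceptually the wait is nothing more than a retry loop like the following
(an illustrative sketch only, not the actual helper script; the
`UNDERCLOUD_IP` variable and the `root` user are placeholders):

```bash
# Retry until the freshly created undercloud VM answers on ssh
until ssh -o ConnectTimeout=5 -o StrictHostKeyChecking=no root@"$UNDERCLOUD_IP" true 2>/dev/null; do
    sleep 10
done
```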
The multi virtual undercloud script
-----------------------------------
The [multi-virtual-undercloud.sh](./multi-virtual-undercloud.sh) script is
placed on the VIRTHOST and needs these parameters:
1. **DISTRO**: this must be the name (without extension) of one of the images
present inside the */images* dir on the VIRTHOST;
2. **VMNAME**: the name of the undercloud virtual machine (the name that will be
used by libvirt);
@ -107,7 +107,7 @@ The script's actions are basically:
installation, it fixes size, network interfaces, packages and ssh keys;
3. Create and launch the virtual undercloud machine;
**Note**: on the VIRTHOST there must exist an */images* directory containing
images suitable for the deploy.
Having this directory structure:
@ -142,8 +142,8 @@ this:
$VIRTHOST
```
So there is nothing different from a normal quickstart deploy command line; the
difference here is made by the config.yml described above, with its provision
script.
Conclusions
@ -152,12 +152,12 @@ Conclusions
This approach can be considered useful for testing multiple environments with
TripleO, for three reasons:
* It is *fast*: it takes the same time to install the undercloud but less to
provision it, since you don't have to wait for the physical undercloud
provisioning;
* It is *isolated*: using VLANs to separate the traffic keeps each environment
completely isolated from the others;
* It is *reliable*: you can have the undercloud on shared storage and think
about putting the undercloud vm in HA, live migrating it with libvirt,
pacemaker, whatever...
There are no macroscopic cons, except for the initial configuration on the

View File

@ -9,7 +9,7 @@ Requirements
------------
This role must be used with a deployed TripleO environment, so you'll need a
working directory of tripleo-quickstart or, in any case, the following files
available:
- **hosts**: which will contain all the hosts used in the deployment;
- **ssh.config.ansible**: which will have all the ssh data to connect to the
@ -24,9 +24,9 @@ Instance HA
-----------
Instance HA is a feature that gives a certain degree of high-availability to the
instances spawned by an OpenStack deployment. Namely, if a compute node on which
an instance is running breaks for whatever reason, this configuration will spawn
the instances that were running on the broken node onto a functioning one.
This role automates all the necessary steps needed to configure the Pacemaker
cluster to support this functionality. A typical cluster configuration on a
clean stock **newton** (or **osp10**) deployment is something like this:
@ -156,7 +156,8 @@ Where:
[defaults]
roles_path = /path/to/tripleo-quickstart-utils/roles
The **hosts** file must be configured with two sections, *controller* and
*compute*, like these:
undercloud ansible_host=undercloud ansible_user=stack ansible_private_key_file=/path/to/id_rsa_undercloud
overcloud-novacompute-1 ansible_host=overcloud-novacompute-1 ansible_user=heat-admin ansible_private_key_file=/path/to/id_rsa_overcloud
@ -184,7 +185,8 @@ Where:
overcloud-controller-1
overcloud-controller-0
**ssh.config.ansible** can *optionally* contain specific per-host connection
options, like these:
...
...
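For reference, a per-host entry might look roughly like this (a hedged sketch:
the host name, IP address and key paths are placeholders, not values taken from
this repo; the undercloud is assumed to act as the ssh jump host):

```
Host overcloud-novacompute-0
    Hostname 192.168.24.10
    User heat-admin
    IdentityFile /path/to/id_rsa_overcloud
    ProxyCommand ssh -F /path/to/ssh.config.ansible undercloud -W %h:%p
    StrictHostKeyChecking no
    UserKnownHostsFile /dev/null
```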

View File

@ -1,16 +1,62 @@
stonith-config
==============
This role acts on an already deployed TripleO environment, setting up STONITH
(Shoot The Other Node In The Head) inside the Pacemaker configuration for all
the hosts that are part of the overcloud.
Requirements
------------
This role must be used with a deployed TripleO environment, so you'll need a
working directory of tripleo-quickstart or, in any case, the following files
available:
- **hosts**: which will contain all the hosts used in the deployment;
- **ssh.config.ansible**: which will have all the ssh data to connect to the
undercloud and all the overcloud nodes;
- **instackenv.json**: which must be present on the undercloud workdir. This
should be created by the installer;
STONITH
-------
STONITH is the way a Pacemaker cluster makes certain that a node is powered
off. STONITH is the only way to use a shared storage environment without
worrying about concurrent writes on disks. Inside TripleO environments STONITH
is also a prerequisite for activating features like Instance HA because, before
moving any machine, the system needs to be sure that the machine it is moving
from is off.
STONITH configuration relies on the **instackenv.json** file, which TripleO
also uses to configure Ironic and the whole provisioning process.
Basically this role enables STONITH on the Pacemaker cluster and takes all the
information from the mentioned file, creating a STONITH resource for each host
on the overcloud.
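The resources it creates are plain `fence_ipmilan` STONITH resources, roughly
equivalent to a manual command like the following (an illustration only, with a
placeholder IPMI address and credentials; not the exact command issued by the
role):

```bash
sudo pcs stonith create ipmilan-overcloud-compute-0 fence_ipmilan \
  pcmk_host_list="overcloud-compute-0" \
  ipaddr="192.168.24.101" login="admin" passwd="password" lanplus="true" \
  op monitor interval=60s
```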
After running this playbook, the cluster configuration will have these properties:
$ sudo pcs property
Cluster Properties:
cluster-infrastructure: corosync
cluster-name: tripleo_cluster
...
...
**stonith-enabled: true**
And something like this, depending on how many nodes there are in the overcloud:
sudo pcs stonith
ipmilan-overcloud-compute-0 (stonith:fence_ipmilan): Started overcloud-controller-1
ipmilan-overcloud-controller-2 (stonith:fence_ipmilan): Started overcloud-controller-0
ipmilan-overcloud-controller-0 (stonith:fence_ipmilan): Started overcloud-controller-0
ipmilan-overcloud-controller-1 (stonith:fence_ipmilan): Started overcloud-controller-1
ipmilan-overcloud-compute-1 (stonith:fence_ipmilan): Started overcloud-controller-1
Having all this in place is a requirement for a reliable HA solution and for
configuring special OpenStack features like [Instance HA](https://github.com/redhat-openstack/tripleo-quickstart-utils/tree/master/roles/instance-ha).
**Note**: by default this role configures STONITH for all the overcloud nodes,
but it is possible to limit it to just the controllers, or just the computes, by
setting the **stonith_devices** variable, which defaults to "all" but can also
be "*controllers*" or "*computes*".
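For example, to restrict the STONITH configuration to the controllers you could
set the variable like this (shown as a plain extra variable; how you pass it,
e.g. via `--extra-vars` or your config file, depends on your invocation):

```yaml
stonith_devices: "controllers"
```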
Quickstart invocation
---------------------
@ -37,38 +83,11 @@ Basically this command:
**Important note**
You might need to export *ANSIBLE_SSH_ARGS* with the path of the
*ssh.config.ansible* file to make the command work, like this:
export ANSIBLE_SSH_ARGS="-F /path/to/quickstart/workdir/ssh.config.ansible"
Limitations
-----------
@ -86,7 +105,8 @@ The main playbook couldn't be simpler:
roles:
- stonith-config
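Spelled out as a standalone playbook it might look like this (a minimal sketch:
the `undercloud` host group and the `gather_facts` setting are assumptions for
illustration; only the role name and the `stonith_devices` variable come from
this README):

```yaml
---
- name: Configure STONITH for the overcloud nodes
  hosts: undercloud
  gather_facts: false
  vars:
    stonith_devices: all    # or "controllers" / "computes"
  roles:
    - stonith-config
```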
But it could also be used at the end of a deployment, like the validate-ha role
is used in [baremetal-undercloud-validate-ha.yml](https://github.com/redhat-openstack/tripleo-quickstart-utils/blob/master/playbooks/baremetal-undercloud-validate-ha.yml).
License
-------

View File

@ -1,22 +1,76 @@
overcloud-validate-ha
=====================
This role acts on an already deployed TripleO environment, testing all the
HA-related functionalities of the installation.
Requirements
------------
This role must be used with a deployed TripleO environment, so you'll need a
working directory of tripleo-quickstart or, in any case, the following files
available:
- **hosts**: which will contain all the hosts used in the deployment;
- **ssh.config.ansible**: which will have all the ssh data to connect to the
undercloud and all the overcloud nodes;
- A **config file** with a definition for the floating network (which will be
used to test HA instances), like this one:
public_physical_network: "floating"
floating_ip_cidr: "10.0.0.0/24"
public_net_pool_start: "10.0.0.191"
public_net_pool_end: "10.0.0.198"
public_net_gateway: "10.0.0.254"
HA tests
--------
HA tests are meant to check the behavior of the environment in the face of
circumstances that involve service interruption, loss of a node, and in general
actions that stress the OpenStack installation with unexpected failures.
Each test is associated with a global variable that, if true, enables the
test.
Tests are grouped and performed by default depending on the OpenStack release.
This is the list of the supported variables, with a description of each test and
the name of the release(s) on which it is performed:
- **test_ha_failed_actions**: Look for failed actions (**all**)
- **test_ha_master_slave**: Stop master slave resources (galera and redis), all
the resources should come down (**all**)
- **test_ha_keystone_constraint_removal**: Stop keystone resource (by stopping
httpd), check no other resource is stopped (**mitaka**)
- Next generation cluster checks (**newton**, **ocata**, **master**):
  - **test_ha_ng_a**: Stop every systemd resource, stop Galera and RabbitMQ,
    start every systemd resource
  - **test_ha_ng_b**: Stop Galera and RabbitMQ, stop every systemd resource,
    start every systemd resource
  - **test_ha_ng_c**: Stop Galera and RabbitMQ, wait 20 minutes to see if
    something fails
- **test_ha_instance**: Instance deployment (**all**)
It is also possible to omit (or add) tests not meant for the specific release,
using the above variables, as in this example:
./quickstart.sh \
--retain-inventory \
--ansible-debug \
--no-clone \
--playbook overcloud-validate-ha.yml \
--working-dir /path/to/workdir/ \
--config /path/to/config.yml \
--extra-vars test_ha_failed_actions=false \
--extra-vars test_ha_ng_a=true \
--release mitaka \
--tags all \
<VIRTHOST>
In this case we will not check for failed actions (a test that would otherwise
be run on mitaka) and we will force the execution of the "ng_a" test described
earlier, which is normally executed only on newton versions or above.
All tests are performed using an external application named
[tripleo-director-ha-test-suite](https://github.com/rscarazz/tripleo-director-ha-test-suite).
Quickstart invocation
---------------------
@ -43,44 +97,14 @@ Basically this command:
**Important note**
If the role is called by itself, so not in the same playbook that already
deploys the environment (see
[baremetal-undercloud-validate-ha.yml](https://github.com/openstack/tripleo-quickstart-extras/blob/master/playbooks/baremetal-undercloud-validate-ha.yml)),
you need to export *ANSIBLE_SSH_ARGS* with the path of the *ssh.config.ansible*
file, like this:
export ANSIBLE_SSH_ARGS="-F /path/to/quickstart/workdir/ssh.config.ansible"
Example Playbook
----------------
@ -93,12 +117,13 @@ The main playbook couldn't be simpler:
roles:
- tripleo-overcloud-validate-ha
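A standalone version might look roughly like this (a minimal sketch: the
`undercloud` host group and `gather_facts` setting are assumptions for
illustration; the test variable names come from the list above):

```yaml
---
- name: Validate overcloud HA
  hosts: undercloud
  gather_facts: false
  vars:
    test_ha_failed_actions: true
    test_ha_instance: true
  roles:
    - tripleo-overcloud-validate-ha
```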
But it could also be used at the end of a deployment, like in this file
[baremetal-undercloud-validate-ha.yml](https://github.com/openstack/tripleo-quickstart-extras/blob/master/playbooks/baremetal-undercloud-validate-ha.yml).
License
-------
GPL
Author Information
------------------