Documentation update

This commit unifies and updates the documentation of the various roles
inside the repo.

Change-Id: Ib08e0206992bcc454bd632fa85e2c16ce1111fdf
Raoul Scarazzini 2017-03-27 15:01:50 +02:00
parent 79560a3287
commit 5f4aa533f6
5 changed files with 159 additions and 118 deletions

View File

@ -1,15 +1,9 @@
Utility roles and docs for tripleo-quickstart
=============================================
[![Team and repository tags](http://governance.openstack.org/badges/tripleo-quickstart-extras.svg)](http://governance.openstack.org/reference/tags/index.html)
<!-- Change things from this point on -->
These Ansible roles are a set of useful tools to be used on top of TripleO
deployments. They can be used together with tripleo-quickstart (and
[tripleo-quickstart-extras](https://github.com/openstack/tripleo-quickstart-extras)).
The documentation of each role is located in the individual role folders, and
general usage information is in the [tripleo-quickstart

View File

@ -12,20 +12,20 @@ Requirements
**Physical switches**
The switch(es) must support VLAN tagging and all the ports must be configured
in trunk mode, so that the dedicated network interface on the physical host (in
the examples the secondary interface, eth1) is able to offer PXE and DHCP to all
the overcloud machines via the undercloud virtual machine's bridged interface.
**Host hardware**
The main requirement to make this kind of setup work is to have a host
powerful enough to run virtual machines with at least 16GB of RAM and 8 CPUs.
The more power you have, the more undercloud machines you can spawn without
impacting performance.
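As a quick, purely illustrative sanity check (not part of any role), you can
verify what the candidate host has available before starting:

```bash
# Illustrative capacity check on the candidate VIRTHOST
nproc            # number of CPUs available for the undercloud virtual machines
free -g          # total and free memory, in GiB
virsh nodeinfo   # libvirt's own view of CPUs and memory, if libvirt is installed
```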
**Host Network topology**
The host must be reachable via ssh from the machine launching quickstart and
configured with two main network interfaces:
- **eth0**: bridged on **br0**, pointing to LAN (underclouds will own an IP to
@ -33,14 +33,14 @@ with two main network interfaces:
- **eth1**: connected to the dedicated switch that supports all the VLANs that
will be used in the deployment;
Over eth1, for each undercloud virtual machine two VLAN interfaces are created,
with associated bridges:
- **Control plane network bridge** (e.g. br2100) built over a VLAN interface
(e.g. eth1.2100) that will be eth1 on the undercloud virtual machine, used by
TripleO as br-ctlplane;
- **External network bridge** (e.g. br2105) built over a VLAN interface (e.g.
eth1.2105) that will be eth2 on the undercloud virtual machine, used by
TripleO as the external network device;
![network-topology](./multi-virtual-undercloud_network-topology.png "Multi Virtual Undercloud - Network Topology")
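As an illustration of this topology (a hedged sketch, not part of the role:
interface names and VLAN IDs are simply the ones used in the examples above,
and your host may use ifcfg files or other tooling instead), the control plane
bridge could be created on the host like this:

```bash
# Create the VLAN interface on top of eth1 and enslave it to a bridge (VLAN 2100)
ip link add link eth1 name eth1.2100 type vlan id 2100
ip link add name br2100 type bridge
ip link set eth1.2100 master br2100
ip link set eth1.2100 up
ip link set br2100 up
```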
@ -48,14 +48,14 @@ with associated bridges:
Quickstart configuration
------------------------
The virtual undercloud machine is treated as a baremetal one, and the Quickstart
command relies on the baremetal undercloud role and its playbook.
This means that any playbook similar to [baremetal-undercloud.yml](https://github.com/openstack/tripleo-quickstart-extras/blob/master/playbooks/baremetal-undercloud.yml "Baremetal undercloud playbook") should be okay.
The configuration file has two specific sections that need attention:
- Additional interface for external network to route overcloud traffic:
```yaml
undercloud_networks:
external:
@ -64,30 +64,30 @@ The configuration file has two specific sections that needs attention:
device_type: ethernet
device_name: eth2
```
**NOTE:** in this configuration eth2 also acts as the default router for
the external network.
- Baremetal provision script, which will be a helper for the
[multi-virtual-undercloud.sh](./multi-virtual-undercloud.sh) script on the <VIRTHOST>:
```yaml
baremetal_provisioning_script: "/path/to/multi-virtual-undercloud-provisioner.sh <VIRTHOST> <DISTRO> <UNDERCLOUD-NAME> <UNDERCLOUD IP> <UNDERCLOUD NETMASK> <UNDERCLOUD GATEWAY> <CTLPLANE VLAN> <EXTERNAL NETWORK VLAN>"
```
The supported parameters, with the exception of VIRTHOST, are the same ones
that are passed to the script that lives (and runs) on the VIRTHOST,
*multi-virtual-undercloud.sh*.
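Filled in, the line might look like this (purely illustrative values for the
host name, distro, addressing and VLAN IDs; adapt them to your environment):

```yaml
baremetal_provisioning_script: "/path/to/multi-virtual-undercloud-provisioner.sh virthost.example.com centos-7 undercloud-0 192.168.201.10 255.255.255.0 192.168.201.254 2100 2105"
```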
This helper script launches the remote command on the VIRTHOST and ensures
that the machine becomes reachable via ssh before proceeding.
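Conceptually the wait is nothing more than a retry loop like the following
(an illustrative sketch only, not the actual helper script; the
`UNDERCLOUD_IP` variable and the `root` user are placeholders):

```bash
# Retry until the freshly created undercloud VM answers on ssh
until ssh -o ConnectTimeout=5 -o StrictHostKeyChecking=no root@"$UNDERCLOUD_IP" true 2>/dev/null; do
    sleep 10
done
```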
The multi virtual undercloud script
-----------------------------------
The [multi-virtual-undercloud.sh](./multi-virtual-undercloud.sh) script is
placed on the VIRTHOST and needs these parameters:
1. **DISTRO**: this must be the name (without extension) of one of the images
present inside the */images* dir on the VIRTHOST;
2. **VMNAME**: the name of the undercloud virtual machine (the name that will be
used by libvirt);
@ -107,7 +107,7 @@ The script's actions are basically:
installation, it fixes size, network interfaces, packages and ssh keys;
3. Create and launch the virtual undercloud machine;
**Note**: on the VIRTHOST there must exist an */images* directory containing
images suitable for the deploy.
Having this directory structure:
@ -142,8 +142,8 @@ this:
$VIRTHOST
```
So there is nothing different from a normal quickstart deploy command line; the
difference here is made by the config.yml described above, with its provision
script.
Conclusions
@ -152,12 +152,12 @@ Conclusions
This approach can be considered useful for testing multiple environments with
TripleO, for three reasons:
* It is *fast*: it takes the same time to install the undercloud but less to
provision it, since you don't have to wait for the physical undercloud
provisioning;
* It is *isolated*: using VLANs to separate the traffic keeps each environment
completely isolated from the others;
* It is *reliable*: you can have the undercloud on shared storage and think
about putting the undercloud vm in HA, live migrating it with libvirt,
pacemaker, whatever...
There are no macroscopic cons, except for the initial configuration on the

View File

@ -9,7 +9,7 @@ Requirements
------------
This role must be used with a deployed TripleO environment, so you'll need a
working directory of tripleo-quickstart or, in any case, the following files
available:
- **hosts**: which will contain all the hosts used in the deployment;
- **ssh.config.ansible**: which will have all the ssh data to connect to the
@ -24,9 +24,9 @@ Instance HA
-----------
Instance HA is a feature that gives a certain degree of high-availability to the
instances spawned by an OpenStack deployment. Namely, if a compute node on which
an instance is running breaks for whatever reason, this configuration will spawn
the instances that were running on the broken node onto a functioning one.
This role automates all the necessary steps needed to configure the Pacemaker
cluster to support this functionality. A typical cluster configuration on a
clean stock **newton** (or **osp10**) deployment is something like this:
@ -156,7 +156,8 @@ Where:
[defaults]
roles_path = /path/to/tripleo-quickstart-utils/roles
The **hosts** file must be configured with two sections, *controller* and
*compute*, like these:
undercloud ansible_host=undercloud ansible_user=stack ansible_private_key_file=/path/to/id_rsa_undercloud
overcloud-novacompute-1 ansible_host=overcloud-novacompute-1 ansible_user=heat-admin ansible_private_key_file=/path/to/id_rsa_overcloud
@ -184,7 +185,8 @@ Where:
overcloud-controller-1
overcloud-controller-0
**ssh.config.ansible** can *optionally* contain specific per-host connection
options, like these:
...
...
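For reference, a per-host entry might look roughly like this (a hedged sketch:
the host name, IP address and key paths are placeholders, not values taken from
this repo; the undercloud is assumed to act as the ssh jump host):

```
Host overcloud-novacompute-0
    Hostname 192.168.24.10
    User heat-admin
    IdentityFile /path/to/id_rsa_overcloud
    ProxyCommand ssh -F /path/to/ssh.config.ansible undercloud -W %h:%p
    StrictHostKeyChecking no
    UserKnownHostsFile /dev/null
```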

View File

@ -1,16 +1,62 @@
stonith-config
==============
This role acts on an already deployed TripleO environment, setting up STONITH
(Shoot The Other Node In The Head) inside the Pacemaker configuration for all
the hosts that are part of the overcloud.
Requirements
------------
This role must be used with a deployed TripleO environment, so you'll need a
working directory of tripleo-quickstart or, in any case, the following files
available:
- **hosts**: which will contain all the hosts used in the deployment;
- **ssh.config.ansible**: which will have all the ssh data to connect to the
undercloud and all the overcloud nodes;
- **instackenv.json**: which must be present on the undercloud workdir. This
should be created by the installer;
STONITH
-------
STONITH is the way a Pacemaker cluster makes certain that a node is powered
off. STONITH is the only way to use a shared storage environment without
worrying about concurrent writes on disks. Inside TripleO environments STONITH
is also a prerequisite for activating features like Instance HA because, before
moving any machine, the system needs to be sure that the machine it is moving
from is off.
STONITH configuration relies on the **instackenv.json** file, which TripleO
also uses to configure Ironic and the whole provisioning process.
Basically this role enables STONITH on the Pacemaker cluster and takes all the
information from the mentioned file, creating a STONITH resource for each host
on the overcloud.
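The resources it creates are plain `fence_ipmilan` STONITH resources, roughly
equivalent to a manual command like the following (an illustration only, with a
placeholder IPMI address and credentials; not the exact command issued by the
role):

```bash
sudo pcs stonith create ipmilan-overcloud-compute-0 fence_ipmilan \
  pcmk_host_list="overcloud-compute-0" \
  ipaddr="192.168.24.101" login="admin" passwd="password" lanplus="true" \
  op monitor interval=60s
```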
After running this playbook, the cluster configuration will have these properties:
$ sudo pcs property
Cluster Properties:
cluster-infrastructure: corosync
cluster-name: tripleo_cluster
...
...
**stonith-enabled: true**
And something like this, depending on how many nodes there are in the overcloud:
sudo pcs stonith
ipmilan-overcloud-compute-0 (stonith:fence_ipmilan): Started overcloud-controller-1
ipmilan-overcloud-controller-2 (stonith:fence_ipmilan): Started overcloud-controller-0
ipmilan-overcloud-controller-0 (stonith:fence_ipmilan): Started overcloud-controller-0
ipmilan-overcloud-controller-1 (stonith:fence_ipmilan): Started overcloud-controller-1
ipmilan-overcloud-compute-1 (stonith:fence_ipmilan): Started overcloud-controller-1
Having all this in place is a requirement for a reliable HA solution and for
configuring special OpenStack features like [Instance HA](https://github.com/redhat-openstack/tripleo-quickstart-utils/tree/master/roles/instance-ha).
**Note**: by default this role configures STONITH for all the overcloud nodes,
but it is possible to limit it to just the controllers, or just the computes, by
setting the **stonith_devices** variable, which defaults to "all" but can also
be "*controllers*" or "*computes*".
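For example, to restrict the STONITH configuration to the controllers you could
set the variable like this (shown as a plain extra variable; how you pass it,
e.g. via `--extra-vars` or your config file, depends on your invocation):

```yaml
stonith_devices: "controllers"
```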
Quickstart invocation
---------------------
@ -37,38 +83,11 @@ Basically this command:
**Important note**
You might need to export *ANSIBLE_SSH_ARGS* with the path of the
*ssh.config.ansible* file to make the command work, like this:
export ANSIBLE_SSH_ARGS="-F /path/to/quickstart/workdir/ssh.config.ansible"
Limitations
-----------
@ -86,7 +105,8 @@ The main playbook couldn't be simpler:
roles:
- stonith-config
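Spelled out as a standalone playbook it might look like this (a minimal sketch:
the `undercloud` host group and the `gather_facts` setting are assumptions for
illustration; only the role name and the `stonith_devices` variable come from
this README):

```yaml
---
- name: Configure STONITH for the overcloud nodes
  hosts: undercloud
  gather_facts: false
  vars:
    stonith_devices: all    # or "controllers" / "computes"
  roles:
    - stonith-config
```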
But it could also be used at the end of a deployment, like the validate-ha role
is used in [baremetal-undercloud-validate-ha.yml](https://github.com/redhat-openstack/tripleo-quickstart-utils/blob/master/playbooks/baremetal-undercloud-validate-ha.yml).
License
-------

View File

@ -1,22 +1,76 @@
overcloud-validate-ha
=====================
This role acts on an already deployed TripleO environment, testing all the
HA-related functionalities of the installation.
Requirements
------------
This role must be used with a deployed TripleO environment, so you'll need a
working directory of tripleo-quickstart or, in any case, the following files
available:
- **hosts**: which will contain all the hosts used in the deployment;
- **ssh.config.ansible**: which will have all the ssh data to connect to the
undercloud and all the overcloud nodes;
- A **config file** with a definition for the floating network (which will be
used to test HA instances), like this one:
public_physical_network: "floating"
floating_ip_cidr: "10.0.0.0/24"
public_net_pool_start: "10.0.0.191"
public_net_pool_end: "10.0.0.198"
public_net_gateway: "10.0.0.254"
HA tests
--------
HA tests are meant to check the behavior of the environment in the face of
circumstances that involve service interruption, loss of a node, and in general
actions that stress the OpenStack installation with unexpected failures.
Each test is associated with a global variable that, if true, enables the
test.
Tests are grouped and performed by default depending on the OpenStack release.
This is the list of the supported variables, with a description of each test and
the name of the release(s) on which it is performed:
- **test_ha_failed_actions**: Look for failed actions (**all**)
- **test_ha_master_slave**: Stop master slave resources (galera and redis), all
the resources should come down (**all**)
- **test_ha_keystone_constraint_removal**: Stop keystone resource (by stopping
httpd), check no other resource is stopped (**mitaka**)
- Next generation cluster checks (**newton**, **ocata**, **master**):
  - **test_ha_ng_a**: Stop every systemd resource, stop Galera and RabbitMQ,
    start every systemd resource
  - **test_ha_ng_b**: Stop Galera and RabbitMQ, stop every systemd resource,
    start every systemd resource
  - **test_ha_ng_c**: Stop Galera and RabbitMQ, wait 20 minutes to see if
    something fails
- **test_ha_instance**: Instance deployment (**all**)
It is also possible to omit (or add) tests not meant for the specific release,
using the above variables, as in this example:
./quickstart.sh \
--retain-inventory \
--ansible-debug \
--no-clone \
--playbook overcloud-validate-ha.yml \
--working-dir /path/to/workdir/ \
--config /path/to/config.yml \
--extra-vars test_ha_failed_actions=false \
--extra-vars test_ha_ng_a=true \
--release mitaka \
--tags all \
<VIRTHOST>
In this case we will not check for failed actions (a test that would otherwise
be run on mitaka) and we will force the execution of the "ng_a" test described
earlier, which is normally executed only on newton versions or above.
All tests are performed using an external application named
[tripleo-director-ha-test-suite](https://github.com/rscarazz/tripleo-director-ha-test-suite).
Quickstart invocation
---------------------
@ -43,44 +97,14 @@ Basically this command:
**Important note**
If the role is called by itself, so not in the same playbook that already
deploys the environment (see
[baremetal-undercloud-validate-ha.yml](https://github.com/openstack/tripleo-quickstart-extras/blob/master/playbooks/baremetal-undercloud-validate-ha.yml)),
you need to export *ANSIBLE_SSH_ARGS* with the path of the *ssh.config.ansible*
file, like this:
export ANSIBLE_SSH_ARGS="-F /path/to/quickstart/workdir/ssh.config.ansible"
Example Playbook
----------------
@ -93,12 +117,13 @@ The main playbook couldn't be simpler:
roles:
- tripleo-overcloud-validate-ha
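A standalone version might look roughly like this (a minimal sketch: the
`undercloud` host group and `gather_facts` setting are assumptions for
illustration; the test variable names come from the list above):

```yaml
---
- name: Validate overcloud HA
  hosts: undercloud
  gather_facts: false
  vars:
    test_ha_failed_actions: true
    test_ha_instance: true
  roles:
    - tripleo-overcloud-validate-ha
```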
But it could also be used at the end of a deployment, like in this file
[baremetal-undercloud-validate-ha.yml](https://github.com/openstack/tripleo-quickstart-extras/blob/master/playbooks/baremetal-undercloud-validate-ha.yml).
License
-------
GPL
Author Information
------------------