Commit Graph

260 Commits

Author SHA1 Message Date
Monty Taylor 5730c2993a Retire stackforge/tripleo-ansible 2015-10-17 16:05:04 -04:00
Jenkins bd0346b55b Merge "Add basic test script" 2015-02-04 20:21:32 +00:00
Julia Kreger 50fbbca362 Add basic test script
Initial version of a test script that can be utilized to help
execute and validate the update_cloud.yml playbook repeatedly.

Change-Id: I23948bb28a26c545ff2d5236a3efcd44bbf7f79e
2015-01-29 14:14:11 -08:00
Julia Kreger 0175c378eb Update online update for newer Ansible and base playbooks
Converted the previously working string checks to boolean checks
so Ansible will properly act upon the plays instead of skip them.

Added code to set the instance_rebuilt fact which is checked in
the steps to execute os-collect-config to prevent harm to a
running system.

Change-Id: I91e1fa822655056ceb88a860367ca40183d1db58
2015-01-29 12:17:49 -05:00
Julia Kreger 2f1e7ee4e1 Correct syntax issues in glance_download module
Correct syntax issues in the glance download module that caused
the module to fail initial validation.

Change-Id: Id69a07fafd7dee6125e193a3bb0ccb2c585bbfe9
2015-01-29 12:17:49 -05:00
Julia Kreger 4aeb966792 Fix bootstrap node service stop
The bootstrap node service stop configuration was previously
hardcoded to work only on Helion.

Changes the configuration and process list for upstream use.

Change-Id: I9d57d391f91967ba8c1daf696a354af49c6a2371
2015-01-27 21:22:57 +00:00
Julia Kreger 1e5810fa14 Prevent unintended MySQL exit
If the playbook is run without nova metadata tags having been set,
the playbook would automatically disable os-collect-config on all
known, but not unknown, nodes.  The downside is that this can cause
MySQL to exit erroneously for no apparent reason.

Changed to list the explicit node types instead of all, and changed
the command statement to utilize a shell as that appears to be more
reliable.

Change-Id: Iccb949ac3490980da17e80fc0b0704558fa51f8a
2015-01-27 21:20:25 +00:00
Julia Kreger 16a93580d2 Remove references to VSA nodes
Removing references to VSA node types as they are Helion-specific
node types that do not exist upstream.

Change-Id: I93a2b9b7c259c60dfa7c82ded7fe5895cac19a75
2015-01-27 21:18:15 +00:00
Julia Kreger 0304253fe2 Enable non-local IP binds for HAProxy
The configuration utilized for HAProxy involves it binding to a
virtual IP address that is shared amongst the controller nodes.

As such, the ability to bind sockets to non-local IP addresses
is required so HAProxy can be ready to take traffic should the
virtual IP address move.

Changed playbook to explicitly set sysctl net.ipv4.ip_nonlocal_bind
to a value of 1 on the bootstrap node.

Change-Id: Idf92ae6593141343b070803e40cbce6960accc60
2015-01-27 21:16:25 +00:00
Stephanie Miller 01872dda66 Check that disk variables are set
If scripts/populate_image_vars is not run or if appropriate
variables are not passed on the command line to ansible-playbook,
the upgrade can fail in an unrecoverable state. To avoid this,
check that the variables are set as part of the pre-flight check
and also at the start of update_cloud.yml.

(Also fix a minor typo in scripts/populate_image_vars.)

Change-Id: I14e588fd83cc36d58ffdea64a43b7d19b7c709e2
2015-01-27 13:11:41 -08:00
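The check described in the commit above can be sketched roughly as follows. This is a minimal illustration only, not the code from the repository: the function name and the variable names in the usage note are hypothetical, since the log does not list the actual disk variables.

```python
def missing_variables(variables, required):
    """Return the required names that are absent or empty in `variables`.

    Hypothetical helper: the real pre-flight check lives in the playbooks,
    and the real variable names are not given in this log.
    """
    return [name for name in required if not variables.get(name)]
```

For example, calling it with a made-up name such as `example_image_id` returns that name when it was never set, so a caller could abort before any node is touched.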
Julia Kreger c062995b00 Revise details on use of populate_image_vars
Revised description on the use of populate_image_vars, added
corresponding glance image-list and populate_image_vars output,
and added additional detail while cleaning up whitespace.

Change-Id: I3ea31e2e3e5562ef14161e9de5bab59bbb60fde2
2015-01-27 12:56:18 -08:00
Julia Kreger 892efc1461 Remount ephemeral disk if not rebuilt
Setting the ephemeral disk to be remounted in the event that a
rebuild would not have occurred, in order to allow the upgrade
sequence to bring the host back into a running state.

Additionally record if the instance was rebuilt, and based upon
that skip re-applying configuration files.

Change-Id: I9c97f8c4d0a093c7129796de32eac3da65522319
2015-01-27 12:56:18 -08:00
Julia Kreger 13d2472fa4 Updating documentation for environment setup
Updated to remove outdated list of specific reviews and replaced
the block of text with a more generic list detailing required disk
image elements.

Change-Id: I0a7083a48441c8a140327e5f944664e4e0fc26bc
2015-01-27 14:12:02 -05:00
Julia Kreger 12644c84bb Correct incorrect syntax on alternate logic path
Corrected incorrect syntax in the automatic rabbitmq restart portion
of the pre-flight check, where status was listed instead of the
required state.

Change-Id: Icea7dfa605a58498d85724da172b70f488e07c6d
2015-01-27 18:07:59 +00:00
Julia Kreger 59ed3a48a5 Correct RabbitMQ check/start logic
Corrected RabbitMQ start logic so an attempt at starting RabbitMQ
will be made by default.

Change-Id: I4825566b26a24d4a2f35f391c390b3d42d250864
2015-01-27 18:07:32 +00:00
Julia Kreger 4f0b5ea593 Retry rebuild status check once
In a VM testing environment, there is a possibility for transient
network connectivity errors to cause an exception for the rebuild
status check, where in reality it will continue without issues.

We will now attempt to retry once in the event of an exception
occurring.

Change-Id: I3e4d841a17aa53362710b32e86ee24f0c32a27da
2015-01-27 09:43:32 -08:00
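The retry-once behaviour described in the commit above might look something like the sketch below. It assumes the status check is a plain callable; `call_with_one_retry` is a hypothetical name and is not taken from the repository.

```python
def call_with_one_retry(check, *args, **kwargs):
    """Call `check`; if it raises, call it exactly one more time.

    Hypothetical helper: the first exception is assumed to be transient
    and is swallowed. A second exception propagates to the caller.
    """
    try:
        return check(*args, **kwargs)
    except Exception:
        return check(*args, **kwargs)
```

Retrying only once keeps a genuine failure visible quickly while still tolerating a single network blip.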
Jenkins 5a3ed47d42 Merge "Reinstate multiple execution attempts of os-collect-config" 2015-01-27 01:45:07 +00:00
Julia Kreger 90064d5471 Reinstate multiple execution attempts of os-collect-config
Downstream opted to only ever attempt to execute os-collect-config
once; however, upstream naturally has slight differences that may
necessitate more than one os-collect-config run to complete node
configuration.

This change reinstates the logic to attempt os-collect-config
a second time if it fails the first time which was first introduced
in I74ff31b52367fc0279ffa43a3724e44a2578ced3.

Change-Id: I7470ca3a5b0dffa844008e5550281d98f1b137a6
2015-01-26 10:01:09 -05:00
Julia Kreger 706138e36d Remove invalid os-collect-config call
Removed invalid direct call to os-collect-config that appears
to have made it past a previous merge conflict as os-collect-config
cannot be invoked again until the sentinel file is removed which is
handled later on in the sequence.

Change-Id: I41a98a4ec7a9bc3c8821121162fa4b971ac8ae01
2015-01-26 14:43:34 +00:00
Julia Kreger 45326b5fd7 Make os-collect-config log location a variable
Updated playbook configuration to make location of the
os-collect-config log file a variable so it can be configured
easily and is in sync with current upstream TripleO logging configuration.

Change-Id: I6399aa13d5a7dfc1a6da28e67fc39560fdd51374
2015-01-26 09:41:57 -05:00
Julia Kreger 7667340292 Add configuration variable for os-config-refresh folder
As a result of a change in upstream TripleO where the location of
os-config-refresh files changed from /opt/stack/os-config-refresh
to /usr/libexec/os-refresh-config, we have changed this to be
an option that can be set by the user.

Change-Id: I1e2d756753ae1f4362db4da9463c6346caeaa725
2015-01-26 09:41:42 -05:00
Julia Kreger b245c1c8b0 Cleanup error reporting for heat.py
Cleaned up error reporting to return the single exception message
instead of a stack trace to help users more clearly identify what
the issue may be.

Change-Id: I79b6b248d706d74936abf83ffa073ac4770e3ef3
2015-01-24 12:32:51 -08:00
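As a rough illustration of the idea in the commit above (reporting one message instead of a stack trace), a helper along these lines could be used. It is a sketch under assumptions, not the actual heat.py change; `summarize_error` is a hypothetical name.

```python
def summarize_error(exc):
    """Return a one-line description of an exception, without a traceback.

    Hypothetical helper: falls back to a placeholder when the exception
    carries no message of its own.
    """
    message = str(exc) or "no details provided"
    return "{}: {}".format(type(exc).__name__, message)
```

A module could pass this string to its failure handler so the user sees the cause directly.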
Julia Kreger 88efc04db6 Extend timeout to cope with systems under load
Extending the Ansible and rebuild timeouts in order to better cope
with systems under load, primarily in virtual testing environments.

Change-Id: I258ab59c8fd7b2acbede43c71c1d5c7184b93c91
2015-01-24 12:28:51 -08:00
Julia Kreger 679797cb1d Inject firewall rules so RabbitMQ cluster starts
Adding firewall rules to make ports 4356, 5672, and 61000 reachable so
a new RabbitMQ cluster node can join the cluster when the on-boot
firewall rules lack these ports and thus block RabbitMQ from starting.

Change-Id: I83846fe9fd4488abc3aa28b73c1666b1d2aa2781
2015-01-24 12:27:47 -08:00
Julia Kreger 64e687fd0e Ensure /var/run/rabbitmq is present
Ensure the /var/run/rabbitmq folder is present prior to proceeding.

Change-Id: I623e91aac27a2c841e19ffd7891ab0775e268d03
2015-01-24 12:26:20 -08:00
Julia Kreger df6ae40cbe Assert firewall rules mysql to rejoin cluster
Adding iptables rules to permit access to TCP ports 4444, 4567, and
4568, which are presumably added by the o-r-c scripts, but are
required for controllers to receive state transfers and restart.

Change-Id: If3c74d6d52975d8f352a85c728317804407e914c
2015-01-24 12:24:34 -08:00
Julia Kreger b0ee862931 Preserve/Restore iscsi initiatorname file
Added logic to preserve and restore the iscsi initiator name file
as without it open-iscsi will automatically generate a new file
upon restarting which it will utilize to identify itself.

Change-Id: I2bba352ca990c4b96e390103afc684261e745eac
2015-01-24 12:19:19 -08:00
Cian O'Driscoll 35459afc5d Shutdown ephemeral-ca before rebuild
Ephemeral-CA service has been added to the undercloud and
overcloud nodes. Disable this service during shutdown and
start post rebuild.

Change-Id: I9e320776749f56520819398cf9aab31be9f88613
2015-01-24 12:17:46 -08:00
Julia Kreger 2069cc5899 Utilize ansible module for rabbitmq start
Converts a rabbitmq-server start command to utilize the ansible
service module.  This is so if it is already running, there is not
an erroneous failure.

Change-Id: Ie68a2b7bcd2ae18d4d1058340ed580fa57a13499
2015-01-24 12:16:02 -08:00
Julia Kreger dbce4d3afc Use virt module to abort if vms are running
Added check to call shutdown a second time, but at the same time
utilize the collected fact to help ensure that virtual machines
have been stopped.

If the ansible module indicates that it had to send a shutdown
command after sixty seconds, then the upgrade is aborted and a
message is reported directing the user to shut down their virtual
machines and retry.

Change-Id: I27f32765cc08197e91a2fd740d18289ee0fd342e
2015-01-24 12:14:08 -08:00
Julia Kreger 35fd0a5fa7 Fix to os-collect-config run logic
Rebased previous state to merge in behavior in change
I2dd21f115c3425e68fe37e2235ab6a8ff56fa43a coupled with logic fix
to prevent unnecessary execution of os-collect-config.

Corrected the grep command with a simpler search as the more complex
search was not returning results.  Corrected logic utilizing fact set
as part of the grep command as the string was being converted to a
boolean value and thus could only be compared as such.

Change-Id: I8c88d5b5a20bca5be1dd66495fd708871c6a684e
2015-01-24 12:12:32 -08:00
Stephanie Miller 29067f3a8a Only check number of controllers where appropriate
The pre-flight check was broken because we attempted to run a
check for the number of controller nodes in environments where we
don't have controller nodes. Adjust the hosts we run the check on
to avoid failures. Also fix a few incidental typos.

Change-Id: I1786243df0b481f9922b7c59c6ec4711f2e41cb2
2015-01-24 12:08:37 -08:00
Stephanie Miller 4c04e6dbd7 Fix reporting of image classes for undercloud
The current populate_image_vars script fails when run on the undercloud
because the undercloud instance does not have a subclass (e.g. compute)
in its name. Modify the script to accept this, while still preserving
its ability to deal with build numbers in the name.

Change-Id: I4a19c7c00594e8d6224649ad6b1374e4f91a4910
2015-01-24 12:01:20 -08:00
Cian O'Driscoll 74ea04fad8 Disable o-c-c on start-up after reimage
O-C-C was running multiple times on bootup after a
re-image. This causes services to die during startup breaking
services like neutron. This change disables o-c-c after it
has done the key restoration. We then start it when we want it.

Change-Id: I2dd21f115c3425e68fe37e2235ab6a8ff56fa43a
2015-01-23 23:15:49 +00:00
Cian O'Driscoll 1398f892f5 Support multiple libvirt daemon names
With the latest roll of the hLinux repo the name of the libvirt
daemon has changed to libvirtd. Support this new name as well as
the old libvirt-bin.

Change-Id: I8b8a5bb987d93e3f62d11d6fa2e82e6e351d80b2
2015-01-13 16:51:10 -05:00
Julia Kreger 2178dc867f Stop neutron-ns-metadata-proxy processes
Neutron metadata proxy processes can also appear on controller nodes,
and as such need to be stopped as well before file systems can be unmounted.

Change-Id: Ib262924c902ced4846b373d75bb25e8aa6397d1f
2015-01-13 16:50:13 -05:00
Julia Kreger af21dc63b2 Delete _ssh_host_key folder should it already exist
Delete the ssh host key folder should it already exist for some
unknown reason.

Change-Id: I36b2a1a7f201eee1a5296ea4f1e697c1caa82a32
2015-01-13 16:45:55 -05:00
Julia Kreger 487795e64b Make os-collect-config executions kinder
Added logic to only execute os-collect-config IF necessary, and
allow extra timing for compute nodes since they are sensitive to
service restarts.

Change-Id: I74ff31b52367fc0279ffa43a3724e44a2578ced3
2015-01-13 16:45:55 -05:00
Julia Kreger fe330650ff Minor bugfix to remove unnecessary fact collection
Removed fact collection from hooks as it is an unnecessary action.

Change-Id: Ic40d1b85bf19717a92d1db24109f049f3fca0a7f
2015-01-13 16:45:54 -05:00
Cian O'Driscoll 82eb9d86ec Re-activate vg and lvs on rebuild
Re-creates the loopback devices on rebuild/update.
Re-enables volume groups and also logical volumes.

Change-Id: Ib319eebecfd29c71bc346c48ed47525552d54e94
2015-01-13 16:45:54 -05:00
Jenkins a60e01c0da Merge "Fix populate_image_vars for bootstrap node" 2015-01-13 21:33:18 +00:00
Julia Kreger 2c7ec6fd02 Fix populate_image_vars for bootstrap node
The bootstrap node requires special logic but the same disk image
variable as normal controller nodes.  Changed logic to create symlink if
the controller variable file is present and a symlink is not
already present.

Change-Id: I3388cf5070e7f0a747feadc860d444eda42c78fe
2015-01-13 16:25:00 -05:00
Jenkins d80e259cb6 Merge "Tag Helion specific MySQL action as such" 2015-01-13 20:41:27 +00:00
Jenkins e959950f9a Merge "Exclude unknown host" 2015-01-13 20:40:12 +00:00
Jenkins 43645ea579 Merge "Fixing Galera Status checks for Upstream" 2015-01-13 20:39:49 +00:00
Jeremy Stanley 3c1648af74 Workflow documentation is now in infra-manual
Replace URLs for workflow documentation to appropriate parts of the
OpenStack Project Infrastructure Manual.

Change-Id: Id1f84178e4fee087c50c220643181d4365d0d013
2014-12-05 03:30:47 +00:00
Julia Kreger ee6e6af78e Tag Helion specific MySQL action as such
One of the MySQL steps is specific to Helion and cannot be
run on an Ubuntu based upstream deployment.

Change-Id: Ia793b6b5734df05dd6d86febcbff948031deed44
2014-11-20 17:37:59 -05:00
Julia Kreger 3d4999b761 Exclude unknown host
As part of the metadata tagging, we tag unknown hosts as unknown.
We should not attempt to act upon them.

Change-Id: I795f7a65ae4b8a8c5444fc0bba1408cf9b19640e
2014-11-20 17:26:33 -05:00
Julia Kreger 83d26aa930 Fixing Galera Status checks for Upstream
We had a defect based on logic from the original development of the
playbook that needed to be corrected. Because register replaces
variables even if the step does not run, set_fact must be used to
store the information when it is needed for later conditional
checks.

Change-Id: Id93667ffd8a2546ed767cf89a03549b1c491835c
2014-11-20 17:11:11 -05:00
Julia Kreger 9bda4589a0 Minor conditional fix for disk space check
Localhost does not have an instance_status variable, so it does
not make sense to have a conditional for it when checking disk
space.

Change-Id: I9fe8305b0b206fb5a78ab2350f6183a6fb74d5d0
2014-11-19 16:51:10 -05:00