Series Upgrade Documentation
Change-Id: I0cd9c2e96d24e9d357da43e71955505b9069423d
This commit is contained in:
parent
d7cb32c135
commit
11ab6dfc0c
|
@ -0,0 +1,476 @@
|
|||
Appendix F: Series Upgrade
|
||||
==============================
|
||||
|
||||
Introduction
|
||||
++++++++++++
|
||||
|
||||
Juju and OpenStack charms provide the primitives to prepare for and
|
||||
respond to an upgrade from one Ubuntu LTS series to another.
|
||||
|
||||
|
||||
Warnings
|
||||
++++++++
|
||||
|
||||
Upgrading a single machine from one LTS to another is a complex task.
|
||||
Doing so on a running OpenStack cloud is an order of magnitude more
|
||||
complex.
|
||||
|
||||
Please read through this document thoroughly before attempting a series
|
||||
upgrade. Please pay particular attention to the Assumptions section and
|
||||
the order of operations.
|
||||
|
||||
The series upgrade should be executed by an administrator or team of
|
||||
administrators who are intimately familiar with the cloud undergoing
|
||||
upgrade, OpenStack in general, working with Juju and OpenStack charms.
|
||||
|
||||
The tasks of preparing stateful OpenStack services for series upgrade is
|
||||
not automated and is the responsibility of the administrator. For
|
||||
example: evacuating a compute node, switching HA routers to a network
|
||||
node, any storage rebalancing that may be required.
|
||||
|
||||
The actual task of executing the do-release-upgrade on an individual
|
||||
machine is not automated. It will be performed by the administrator. Any
|
||||
bespoke preparation for or cleanup after the do-release-upgrade is the
|
||||
responsibility of the administrator.
|
||||
|
||||
The series upgrade process requires API downtime. Although the goal is
|
||||
minimal downtime, it is necessary to pause services to avoid race
|
||||
condition errors. Therefore, the API undergoing upgrade will require
|
||||
downtime.
|
||||
|
||||
Stateful services which OpenStack depends on such as percona-cluster and
|
||||
rabbitmq will affect all APIs during series upgrade and therefore
|
||||
require downtime.
|
||||
|
||||
Third party charms may not have implemented series upgrade yet. Please
|
||||
pay particular attention to SDN and storage charms which may affect
|
||||
cloud operation.
|
||||
|
||||
If the architecture and layout of charms does not match the assumptions
|
||||
section of this document, great care needs to be taken to avoid problems
|
||||
with application leadership across machines. In other words, if most
|
||||
services are not in LXD containers, it is possible to have the leader of
|
||||
percona-cluster on one host and the leader of rabbit on another causing
|
||||
complication's in the procure for series upgrade.
|
||||
|
||||
Test, test, test! The series upgrade process should be tested on a
|
||||
non-production cloud that closely resembles the eventual production
|
||||
environment. Not only does this validate the software involved but it
|
||||
prepares the administrator for the complex task ahead.
|
||||
|
||||
|
||||
Juju
|
||||
++++
|
||||
|
||||
Please read all Juju documentation on the series upgrade feature.
|
||||
|
||||
https://docs.jujucharms.com/devel/en/getting-started
|
||||
|
||||
.. note::
|
||||
The Juju upgrade-series command operates on the machine level. This
|
||||
document will be focused on applications as many require pausing their
|
||||
peers and some subordinates. But it is important to remember the whole
|
||||
machine is upgraded.
|
||||
|
||||
Applications deployed in a LXD container are considered a machine apart
|
||||
from the physical host machine the container is hosted on.
|
||||
|
||||
Upgrading the host machine will not upgrade the LXD contained machines.
|
||||
However, when the required post-upgrade reboot of the host machine
|
||||
occurs all the services contained in LXD containers will be unavailable
|
||||
during the reboot.
|
||||
|
||||
For example a physical host with nova-compute, neutron-openvswitch and
|
||||
ceph-osd colocated as well as hosting a keystone unit in a LXD. When
|
||||
the juju upgrade-series prepare command is executed on the machine,
|
||||
nova-compute, neutron-openvswitch and ceph-osd will execute their
|
||||
pre-series-upgrade hooks but keystone will not. Nor will the LXD
|
||||
operating system be affected by the do-release-upgrade on the host. At
|
||||
reboot however, the keystone unit will be unavailable during the
|
||||
duration of the reboot. Please plan accordingly.
|
||||
|
||||
|
||||
Assumptions
|
||||
+++++++++++
|
||||
|
||||
This document makes a number of assumptions about the architecture and
|
||||
preparation of the cloud undergoing series upgrade. Please review these
|
||||
and compare to the running cloud before performing the series upgrade.
|
||||
|
||||
|
||||
Preparations
|
||||
~~~~~~~~~~~~
|
||||
|
||||
Charms are upgraded to the latest release.
|
||||
|
||||
OpenStack is upgraded to the highest version the current LTS supports.
|
||||
Mitaka for Trusty and Queens for Xenial.
|
||||
|
||||
The current Ubuntu operating system is up to date prior to do-release-upgrade.
|
||||
|
||||
Stateful services have been backed up. Percona-cluster and mongodb
|
||||
should be backed up prior to upgrading.
|
||||
|
||||
General cloud health. Confirm the cloud is fully operational before
|
||||
beginning a series upgrade.
|
||||
|
||||
OpenStack charms health. No charms are in hook error. Confirm the health
|
||||
of the juju environment before beginning series upgrade.
|
||||
|
||||
Per machine preparations. Individual compute nodes are evacuated prior
|
||||
to series upgrade. HA routers are moved to network nodes not undergoing
|
||||
series upgrade.
|
||||
|
||||
|
||||
Hyper-Converged Architecture
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Compute, storage and their subordinates may be colocated.
|
||||
|
||||
API Services are deployed in LXD containers.
|
||||
|
||||
Percona-cluster is deployed in a LXD container.
|
||||
|
||||
Rabbitmq is deployed in a LXD container.
|
||||
|
||||
Third party charms either do not exist or have been thoroughly tested
|
||||
for series upgrade.
|
||||
|
||||
No other non-subordinate charms are colocated on the same machine.
|
||||
|
||||
|
||||
Overview
|
||||
++++++++
|
||||
|
||||
.. note::
|
||||
This overview is not a substitute for understanding the
|
||||
entirety of this document. It is the general case but the individual
|
||||
details matter. Read "where appropriate" at the end of each step.
|
||||
|
||||
Evacuate or otherwise prepare the machine
|
||||
|
||||
Pause hacluster for non-leader units not undergoing upgrade
|
||||
|
||||
Pause non-leader peer units not undergoing upgrade
|
||||
|
||||
Juju upgrade-series prepare the leader's machine
|
||||
|
||||
Execute do-release-upgrade and any post-upgrade operating system tasks
|
||||
|
||||
Reboot
|
||||
|
||||
Set openstack-origin or source for new operating system ("distro")
|
||||
|
||||
Juju upgrade-series complete the machine
|
||||
|
||||
Repeat the steps from prepare to complete for the non-leader machines
|
||||
|
||||
Perform any cluster completed upgrade tasks after all units of
|
||||
application have been upgraded.
|
||||
|
||||
Juju set-series to the new series for all future units of an application.
|
||||
|
||||
Exceptions
|
||||
~~~~~~~~~~
|
||||
|
||||
This overview describes the general case that includes the API charms,
|
||||
percona culster and rabbitmq.
|
||||
|
||||
The notable exceptions are nova-compute, ceph-mon and ceph-osd which
|
||||
do not require pausing of any units and unit leadership is irrelevant.
|
||||
|
||||
|
||||
Example as code
|
||||
~~~~~~~~~~~~~~~
|
||||
|
||||
Attempting an automated series upgrade on a running production cloud is
|
||||
not recommended. The following example-as-code encapsulates the
|
||||
processes described in this document, and are provided solely to
|
||||
illustrate the methods used to develop and test the series upgrade
|
||||
primitives. The example code should not be consumed in an automation
|
||||
outside of its intended use case (charm dev/test gate automation).
|
||||
|
||||
https://github.com/openstack-charmers/zaza/blob/master/zaza/charm_tests/series_upgrade/tests.py
|
||||
|
||||
https://github.com/openstack-charmers/zaza/blob/master/zaza/utilities/generic.py#L173
|
||||
|
||||
|
||||
Procedures
|
||||
++++++++++
|
||||
|
||||
The following procures are broken up into categories of charms that
|
||||
follow the same procedure.
|
||||
|
||||
.. note::
|
||||
Example commands used in this documentation assume a Trusty to Xenial
|
||||
series upgrade, the same approach is used for Xenial to Bionic
|
||||
series upgrades. Unit and machine numbers are examples only they will
|
||||
differ from site to site. For example the machine number 0 is reused
|
||||
purely for example purposes.
|
||||
|
||||
|
||||
Physical Host Nodes
|
||||
~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Procedure for the physical host nodes which may include nova-compute,
|
||||
neutron-openvswitch and ceph-osd as well as neutron-gateway. Though
|
||||
ceph-mon is most often deployed in LXD containers it follows this
|
||||
procedure.
|
||||
|
||||
.. note::
|
||||
Nova-compute and ceph-osd are used in the commands below for
|
||||
example purposes. In this example, physical host where
|
||||
nova-compute/0 and ceph-osd/0 are deployed is machine 0.
|
||||
|
||||
Evacuate or otherwise prepare the machine
|
||||
For compute nodes move all running VMs off the physical host.
|
||||
For network nodes force HA routers off of the current node.
|
||||
Any storage related tasks that may be required.
|
||||
Any site specific tasks that may be required.
|
||||
|
||||
|
||||
Juju upgrade-series prepare the machine
|
||||
.. code:: bash
|
||||
|
||||
juju upgrade-series prepare 0 xenial
|
||||
|
||||
.. note::
|
||||
The upgrade-series prepare command causes all the charms on the given
|
||||
machine to run their pre-series-upgrade hook. For most cases with the
|
||||
OpenStack charms this pauses the unit. At the completion of the
|
||||
pre-series-upgrade hook the workload status should be "blocked" with
|
||||
the message "Ready for do-release-upgrade and reboot."
|
||||
|
||||
Execute do-release-upgrade and any post-upgrade operating system tasks
|
||||
The do-release-upgrade process is performed by the administrator. Any
|
||||
post do-release-upgrade tasks are also the responsibility of the
|
||||
administrator.
|
||||
|
||||
Reboot
|
||||
Post do-release-upgrade reboot executed by the administrator.
|
||||
|
||||
Set openstack-origin or source for new operating system ("distro")
|
||||
This step is required and should occur before the first node is
|
||||
completed.
|
||||
|
||||
.. code:: bash
|
||||
|
||||
juju config nova-compute openstack-origin=distro
|
||||
juju config ceph-osd source=distro
|
||||
|
||||
|
||||
Juju upgrade-series complete the machine
|
||||
.. code:: bash
|
||||
|
||||
juju upgrade-series complete 0
|
||||
|
||||
.. note::
|
||||
|
||||
The upgrade-series complete command causes all the charms on the given
|
||||
machine to run their post-series-upgrade hook. For most cases with the
|
||||
OpenStack charms this re-writes configuration files and resumes the unit.
|
||||
At the completion of the post-series-upgrade hook the workload status
|
||||
should be "active" with the message "Unit is ready."
|
||||
|
||||
Juju set-series to the new series for all future units of an application.
|
||||
To guarantee that any future unit-add commands create new
|
||||
instantiations of the application on the correct series it is necessary
|
||||
to set the series on the application.
|
||||
|
||||
.. code:: bash
|
||||
|
||||
juju set-series nova-compute xenial
|
||||
juju set-series neutron-openvswitch xenial
|
||||
juju set-series ceph-osd xenial
|
||||
|
||||
|
||||
Repeat the procedure for all physical host nodes.
|
||||
It is not necessary to repeat the set openstack-origin step.
|
||||
|
||||
|
||||
|
||||
Stateful Services
|
||||
~~~~~~~~~~~~~~~~~
|
||||
|
||||
Procedure for the stateful services deployed on LXD containers.
|
||||
These include percona-cluster and rabbitmq.
|
||||
|
||||
|
||||
.. note::
|
||||
While percona-cluster is often deployed with hacluster for HA,
|
||||
rabbitmq is not. Ignore the hacluster steps for rabbitmq.
|
||||
Likewise no backup is required of rabbitmq. Percona-cluster is used
|
||||
below for example purposes. In this example, the LXD container the
|
||||
leader node of percona-cluster/0 is deployed on is machine 0.
|
||||
|
||||
|
||||
Prepare the machine
|
||||
Perform backups of percona-cluster and scp the backup to a secure
|
||||
location.
|
||||
|
||||
.. code:: bash
|
||||
|
||||
juju run-action percona-cluster/0 backup
|
||||
juju scp -- -r percona-cluster/0:/opt/backups/mysql /path/to/local/backup/dir
|
||||
|
||||
|
||||
Pause hacluster for non-leader units not undergoing upgrade
|
||||
.. code:: bash
|
||||
|
||||
juju run-action percona-cluster-hacluster/1 pause
|
||||
juju run-action percona-cluster-hacluster/2 pause
|
||||
|
||||
|
||||
Pause non-leader peer units not undergoing upgrade
|
||||
.. code:: bash
|
||||
|
||||
juju run-action percona-cluster/1 pause
|
||||
juju run-action percona-cluster/2 pause
|
||||
|
||||
|
||||
Juju upgrade-series prepare the leader's machine
|
||||
.. code:: bash
|
||||
|
||||
juju upgrade-series prepare 0 xenial
|
||||
|
||||
.. note::
|
||||
The upgrade-series prepare command causes all the charms on the given
|
||||
machine to run their pre-series-upgrade hook. For most cases with the
|
||||
OpenStack charms this pauses the unit. At the completion of the
|
||||
pre-series-upgrade hook the workload status should be "blocked" with
|
||||
the message "Ready for do-release-upgrade and reboot."
|
||||
|
||||
Execute do-release-upgrade and any post-upgrade operating system tasks
|
||||
The do-release-upgrade process is performed by the administrator. Any
|
||||
post do-release-upgrade tasks are also the responsibility of the
|
||||
administrator.
|
||||
|
||||
Reboot
|
||||
Post do-release-upgrade reboot executed by the administrator.
|
||||
|
||||
Set openstack-origin or source for new operating system ("distro")
|
||||
This step is required and should occur before the first node is
|
||||
completed but after the other units are paused.
|
||||
|
||||
.. code:: bash
|
||||
|
||||
juju config percona-cluster source=distro
|
||||
|
||||
|
||||
Juju upgrade-series complete the machine
|
||||
.. code:: bash
|
||||
|
||||
juju upgrade-series complete 0
|
||||
|
||||
.. note::
|
||||
|
||||
The upgrade-series complete command causes all the charms on the given
|
||||
machine to run their post-series-upgrade hook. For most cases with the
|
||||
OpenStack charms this re-writes configuration files and resumes the unit.
|
||||
At the completion of the post-series-upgrade hook the workload status
|
||||
should be "active" with the message "Unit is ready."
|
||||
|
||||
Repeat the procedure for non-leader nodes
|
||||
It is not necessary to repeat the set openstack-origin step.
|
||||
|
||||
Perform any cluster completed upgrade tasks after all units of application have been upgraded.
|
||||
Run the complete-cluster-series-upgrade action on the leader node. This
|
||||
action informs each node of the cluster the upgrade process is complete
|
||||
cluster wide. This also updates mysql configuration with all peers in
|
||||
the cluster.
|
||||
|
||||
.. code:: bash
|
||||
|
||||
juju run-action percona-cluster/0 complete-cluster-series-upgrade
|
||||
|
||||
Juju set-series to the new series for all future units of an application.
|
||||
To guarantee that any future unit-add commands create new
|
||||
instantiations of the application on the correct series it is necessary
|
||||
to set the series on the application.
|
||||
|
||||
.. code:: bash
|
||||
|
||||
juju set-series percona-cluster xenial
|
||||
|
||||
|
||||
API Services
|
||||
~~~~~~~~~~~~
|
||||
|
||||
Procedure for the API services in LXD containers. These include but are
|
||||
not limited to keystone, glance, cinder, neutron-api and
|
||||
nova-cloud-controller. Any subordinates deployed with these applications
|
||||
will be upgraded at the same time.
|
||||
|
||||
.. note::
|
||||
Keystone is used in the commands below for example purposes. In this
|
||||
example, the LXD container the leader node of keystone/0 is deployed
|
||||
on is machine 0.
|
||||
|
||||
|
||||
Pause hacluster for non-leader units not undergoing upgrade
|
||||
.. code:: bash
|
||||
|
||||
juju run-action keystone-hacluster/1 pause
|
||||
juju run-action keystone-hacluster/2 pause
|
||||
|
||||
|
||||
Pause non-leader peer units not undergoing upgrade
|
||||
.. code:: bash
|
||||
|
||||
juju run-action keystone/1 pause
|
||||
juju run-action keystone/2 pause
|
||||
|
||||
|
||||
Juju upgrade-series prepare the leader's machine
|
||||
.. code:: bash
|
||||
|
||||
juju upgrade-series prepare 0 xenial
|
||||
|
||||
.. note::
|
||||
The upgrade-series prepare command causes all the charms on the given
|
||||
machine to run their pre-series-upgrade hook. For most cases with the
|
||||
OpenStack charms this pauses the unit. At the completion of the
|
||||
pre-series-upgrade hook the workload status should be "blocked" with
|
||||
the message "Ready for do-release-upgrade and reboot."
|
||||
|
||||
Execute do-release-upgrade and any post-upgrade operating system tasks
|
||||
The do-release-upgrade process is performed by the administrator. Any
|
||||
post do-release-upgrade tasks are also the responsibility of the
|
||||
administrator.
|
||||
|
||||
Reboot
|
||||
Post do-release-upgrade reboot executed by the administrator.
|
||||
|
||||
Set openstack-origin or source for new operating system ("distro")
|
||||
This step is required and should occur before the first node is
|
||||
completed but after the other units are paused.
|
||||
|
||||
.. code:: bash
|
||||
|
||||
juju config keystone source=distro
|
||||
|
||||
|
||||
Juju upgrade-series complete the machine
|
||||
.. code:: bash
|
||||
|
||||
juju upgrade-series complete 0
|
||||
|
||||
.. note::
|
||||
|
||||
The upgrade-series complete command causes all the charms on the given
|
||||
machine to run their post-series-upgrade hook. For most cases with the
|
||||
OpenStack charms this re-writes configuration files and resumes the unit.
|
||||
At the completion of the post-series-upgrade hook the workload status
|
||||
should be "active" with the message "Unit is ready."
|
||||
|
||||
Repeat the procedure for non-leader nodes
|
||||
It is not necessary to repeat the set openstack-origin step.
|
||||
|
||||
Juju set-series to the new series for all future units of an application.
|
||||
To guarantee that any future unit-add commands create new
|
||||
instantiations of the application on the correct series it
|
||||
is necessary to set the series on the application.
|
||||
|
||||
.. code:: bash
|
||||
|
||||
juju set-series keystone xenial
|
|
@ -10,3 +10,4 @@ Appendices
|
|||
app-vault.rst
|
||||
app-encryption-at-rest.rst
|
||||
app-certificate-management.rst
|
||||
app-series-upgrade.rst
|
||||
|
|
Loading…
Reference in New Issue