diff --git a/deploy-guide/source/app-series-upgrade.rst b/deploy-guide/source/app-series-upgrade.rst
new file mode 100644
index 0000000..0cbf6b4
--- /dev/null
+++ b/deploy-guide/source/app-series-upgrade.rst
@@ -0,0 +1,476 @@
+Appendix F: Series Upgrade
+==============================
+
+Introduction
+++++++++++++
+
+Juju and OpenStack charms provide the primitives to prepare for and
+respond to an upgrade from one Ubuntu LTS series to another.
+
+
+Warnings
+++++++++
+
+Upgrading a single machine from one LTS to another is a complex task.
+Doing so on a running OpenStack cloud is an order of magnitude more
+complex.
+
+Please read through this document thoroughly before attempting a series
+upgrade. Pay particular attention to the Assumptions section and the
+order of operations.
+
+The series upgrade should be executed by an administrator or team of
+administrators who are intimately familiar with the cloud undergoing
+upgrade, with OpenStack in general, and with working with Juju and
+OpenStack charms.
+
+The task of preparing stateful OpenStack services for series upgrade is
+not automated and is the responsibility of the administrator. Examples
+include evacuating a compute node, switching HA routers to another
+network node, and any storage rebalancing that may be required.
+
+The actual task of executing do-release-upgrade on an individual
+machine is not automated; it is performed by the administrator. Any
+bespoke preparation for, or cleanup after, the do-release-upgrade is the
+responsibility of the administrator.
+
+The series upgrade process requires API downtime. Although the goal is
+minimal downtime, it is necessary to pause services to avoid race
+condition errors. Therefore, the API undergoing upgrade will require
+downtime.
+
+Stateful services which OpenStack depends on, such as percona-cluster
+and rabbitmq, will affect all APIs during series upgrade and therefore
+require downtime.
+
+Third party charms may not have implemented series upgrade yet. Pay
+particular attention to SDN and storage charms, which may affect cloud
+operation.
+
+If the architecture and layout of charms do not match the Assumptions
+section of this document, great care needs to be taken to avoid problems
+with application leadership across machines. In other words, if most
+services are not in LXD containers, it is possible to have the leader of
+percona-cluster on one host and the leader of rabbitmq on another,
+complicating the series upgrade procedure.
+
+Test, test, test! The series upgrade process should be tested on a
+non-production cloud that closely resembles the eventual production
+environment. Not only does this validate the software involved, but it
+also prepares the administrator for the complex task ahead.
+
+
+Juju
+++++
+
+Please read all Juju documentation on the series upgrade feature.
+
+https://docs.jujucharms.com/devel/en/getting-started
+
+.. note::
+    The Juju upgrade-series command operates at the machine level. This
+    document focuses on applications, as many require pausing their
+    peers and some subordinates. However, it is important to remember
+    that the whole machine is upgraded.
+
+    Applications deployed in a LXD container are considered a machine
+    apart from the physical host machine the container is hosted on.
+
+    Upgrading the host machine will not upgrade the LXD contained
+    machines. However, when the required post-upgrade reboot of the host
+    machine occurs, all the services contained in LXD containers will be
+    unavailable during the reboot.
+
+    For example, consider a physical host with nova-compute,
+    neutron-openvswitch and ceph-osd colocated, which also hosts a
+    keystone unit in a LXD container. When the juju upgrade-series
+    prepare command is executed on the machine, nova-compute,
+    neutron-openvswitch and ceph-osd will execute their
+    pre-series-upgrade hooks but keystone will not. Nor will the LXD
+    operating system be affected by the do-release-upgrade on the host.
+    At reboot, however, the keystone unit will be unavailable for the
+    duration of the reboot. Plan accordingly.
+
+
+Assumptions
++++++++++++
+
+This document makes a number of assumptions about the architecture and
+preparation of the cloud undergoing series upgrade. Please review these
+and compare them to the running cloud before performing the series
+upgrade.
+
+
+Preparations
+~~~~~~~~~~~~
+
+Charms are upgraded to the latest release.
+
+OpenStack is upgraded to the highest version the current LTS supports:
+Mitaka for Trusty and Queens for Xenial.
+
+The current Ubuntu operating system is up to date prior to
+do-release-upgrade.
+
+Stateful services have been backed up. Percona-cluster and mongodb
+should be backed up prior to upgrading.
+
+General cloud health. Confirm the cloud is fully operational before
+beginning a series upgrade.
+
+OpenStack charms health. No charms are in hook error. Confirm the health
+of the Juju environment before beginning the series upgrade.
+
+Per-machine preparations. Individual compute nodes are evacuated prior
+to series upgrade. HA routers are moved to network nodes not undergoing
+series upgrade.
+
+
+Hyper-Converged Architecture
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Compute, storage and their subordinates may be colocated.
+
+API services are deployed in LXD containers.
+
+Percona-cluster is deployed in a LXD container.
+
+Rabbitmq is deployed in a LXD container.
+
+Third party charms either do not exist or have been thoroughly tested
+for series upgrade.
+
+No other non-subordinate charms are colocated on the same machine.
+
+
+Overview
+++++++++
+
+.. note::
+    This overview is not a substitute for understanding the entirety of
+    this document. It describes the general case, but the individual
+    details matter. Read "where appropriate" at the end of each step.
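+
+The ordering listed below can be illustrated with a short dry-run
+sketch. It only prints commands for a single application and executes
+nothing against the model, and it is not part of the charm tooling.
+It assumes three units of an API charm such as keystone that uses
+``openstack-origin``, and an hacluster subordinate named
+``<application>-hacluster`` whose unit numbers match those of the
+principal, as in the examples in this document. The function name
+``print_plan`` and all unit and machine numbers are hypothetical.
+Charms that use ``source`` (percona-cluster, rabbitmq) or that have no
+hacluster subordinate would need the corresponding lines adjusted.
+Manual steps such as evacuation or cluster completion are not shown.
+
+.. code:: bash
+
+    #!/usr/bin/env bash
+    # Dry-run sketch: print the general-case command order for one
+    # application. Nothing is executed against the Juju model.
+    set -euo pipefail
+
+    # print_plan APP SERIES LEADER_MACHINE "PEER_UNITS" "PEER_MACHINES"
+    #   PEER_UNITS     space-separated unit numbers of non-leader peers
+    #   PEER_MACHINES  space-separated machines hosting those peers
+    print_plan() {
+        local app=$1 series=$2 leader=$3 peer_units=$4 peer_machines=$5
+        local n m
+
+        # Pause the hacluster subordinates first, then the peers.
+        # (The unquoted lists are split on spaces on purpose.)
+        for n in $peer_units; do
+            echo "juju run-action ${app}-hacluster/${n} pause"
+        done
+        for n in $peer_units; do
+            echo "juju run-action ${app}/${n} pause"
+        done
+
+        # Leader's machine first; set the origin before the first complete.
+        echo "juju upgrade-series prepare ${leader} ${series}"
+        echo "# manual: do-release-upgrade, then reboot machine ${leader}"
+        echo "juju config ${app} openstack-origin=distro"
+        echo "juju upgrade-series complete ${leader}"
+
+        # Then each non-leader machine; the origin is not set again.
+        for m in $peer_machines; do
+            echo "juju upgrade-series prepare ${m} ${series}"
+            echo "# manual: do-release-upgrade, then reboot machine ${m}"
+            echo "juju upgrade-series complete ${m}"
+        done
+
+        # Make future units of the application use the new series.
+        echo "juju set-series ${app} ${series}"
+    }
+
+    print_plan keystone xenial 0 "1 2" "3 4"
+
+Printing rather than executing keeps the sketch consistent with the
+manual, step-by-step approach of this document. Each command is still
+run by hand, and its result is checked before moving on.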
+
+Evacuate or otherwise prepare the machine
+
+Pause hacluster for non-leader units not undergoing upgrade
+
+Pause non-leader peer units not undergoing upgrade
+
+Juju upgrade-series prepare the leader's machine
+
+Execute do-release-upgrade and any post-upgrade operating system tasks
+
+Reboot
+
+Set openstack-origin or source for new operating system ("distro")
+
+Juju upgrade-series complete the machine
+
+Repeat the steps from prepare to complete for the non-leader machines
+
+Perform any cluster completion tasks after all units of the
+application have been upgraded.
+
+Juju set-series to the new series for all future units of an application.
+
+Exceptions
+~~~~~~~~~~
+
+This overview describes the general case, which includes the API charms,
+percona-cluster and rabbitmq.
+
+The notable exceptions are nova-compute, ceph-mon and ceph-osd. These
+do not require pausing any units, and unit leadership is irrelevant
+for them.
+
+
+Example as code
+~~~~~~~~~~~~~~~
+
+Attempting an automated series upgrade on a running production cloud is
+not recommended. The following example code encapsulates the processes
+described in this document. It is provided solely to illustrate the
+methods used to develop and test the series upgrade primitives. The
+example code should not be used for automation outside of its intended
+use case (charm dev/test gate automation).
+
+https://github.com/openstack-charmers/zaza/blob/master/zaza/charm_tests/series_upgrade/tests.py
+
+https://github.com/openstack-charmers/zaza/blob/master/zaza/utilities/generic.py#L173
+
+
+Procedures
+++++++++++
+
+The following procedures are broken up into categories of charms that
+follow the same procedure.
+
+.. note::
+    Example commands used in this documentation assume a Trusty to Xenial
+    series upgrade; the same approach is used for Xenial to Bionic series
+    upgrades. Unit and machine numbers are examples only; they will
+    differ from site to site. For example, machine number 0 is reused
+    purely for illustration.
+
+
+Physical Host Nodes
+~~~~~~~~~~~~~~~~~~~
+
+Procedure for the physical host nodes, which may include nova-compute,
+neutron-openvswitch and ceph-osd as well as neutron-gateway. Though
+ceph-mon is most often deployed in LXD containers, it follows this
+procedure.
+
+.. note::
+    Nova-compute and ceph-osd are used in the commands below for
+    example purposes. In this example, the physical host where
+    nova-compute/0 and ceph-osd/0 are deployed is machine 0.
+
+Evacuate or otherwise prepare the machine
+    For compute nodes, move all running VMs off the physical host. For
+    network nodes, force HA routers off the current node. Perform any
+    storage related or site specific tasks that may be required.
+
+
+Juju upgrade-series prepare the machine
+    .. code:: bash
+
+        juju upgrade-series prepare 0 xenial
+
+    .. note::
+        The upgrade-series prepare command causes all the charms on the
+        given machine to run their pre-series-upgrade hook. In most cases
+        with the OpenStack charms, this pauses the unit. At the completion
+        of the pre-series-upgrade hook, the workload status should be
+        "blocked" with the message "Ready for do-release-upgrade and
+        reboot."
+
+Execute do-release-upgrade and any post-upgrade operating system tasks
+    The do-release-upgrade process is performed by the administrator. Any
+    post do-release-upgrade tasks are also the responsibility of the
+    administrator.
+
+Reboot
+    The post do-release-upgrade reboot is performed by the administrator.
+
+Set openstack-origin or source for new operating system ("distro")
+    This step is required and should occur before the first node is
+    completed.
+
+    .. code:: bash
+
+        juju config nova-compute openstack-origin=distro
+        juju config ceph-osd source=distro
+
+
+Juju upgrade-series complete the machine
+    .. code:: bash
+
+        juju upgrade-series complete 0
+
+    .. note::
+
+        The upgrade-series complete command causes all the charms on the
+        given machine to run their post-series-upgrade hook. In most cases
+        with the OpenStack charms, this rewrites configuration files and
+        resumes the unit. At the completion of the post-series-upgrade
+        hook, the workload status should be "active" with the message
+        "Unit is ready."
+
+Juju set-series to the new series for all future units of an application.
+    To guarantee that any future add-unit commands create new
+    instantiations of the application on the correct series, it is
+    necessary to set the series on the application.
+
+    .. code:: bash
+
+        juju set-series nova-compute xenial
+        juju set-series neutron-openvswitch xenial
+        juju set-series ceph-osd xenial
+
+
+Repeat the procedure for all physical host nodes.
+    It is not necessary to repeat the set openstack-origin or source step.
+
+
+
+Stateful Services
+~~~~~~~~~~~~~~~~~
+
+Procedure for the stateful services deployed in LXD containers. These
+include percona-cluster and rabbitmq.
+
+
+.. note::
+    While percona-cluster is often deployed with hacluster for HA,
+    rabbitmq is not. Ignore the hacluster steps for rabbitmq. Likewise,
+    no backup of rabbitmq is required. Percona-cluster is used below for
+    example purposes. In this example, the leader unit percona-cluster/0
+    is deployed in the LXD container that is machine 0.
+
+
+Prepare the machine
+    Perform a backup of percona-cluster and copy it with scp to a secure
+    location. Wait for the backup action to complete before copying.
+
+    .. code:: bash
+
+        juju run-action percona-cluster/0 backup
+        juju scp -- -r percona-cluster/0:/opt/backups/mysql /path/to/local/backup/dir
+
+
+Pause hacluster for non-leader units not undergoing upgrade
+    .. code:: bash
+
+        juju run-action percona-cluster-hacluster/1 pause
+        juju run-action percona-cluster-hacluster/2 pause
+
+
+Pause non-leader peer units not undergoing upgrade
+    .. code:: bash
+
+        juju run-action percona-cluster/1 pause
+        juju run-action percona-cluster/2 pause
+
+
+Juju upgrade-series prepare the leader's machine
+    .. code:: bash
+
+        juju upgrade-series prepare 0 xenial
+
+    .. note::
+        The upgrade-series prepare command causes all the charms on the
+        given machine to run their pre-series-upgrade hook. In most cases
+        with the OpenStack charms, this pauses the unit. At the completion
+        of the pre-series-upgrade hook, the workload status should be
+        "blocked" with the message "Ready for do-release-upgrade and
+        reboot."
+
+Execute do-release-upgrade and any post-upgrade operating system tasks
+    The do-release-upgrade process is performed by the administrator. Any
+    post do-release-upgrade tasks are also the responsibility of the
+    administrator.
+
+Reboot
+    The post do-release-upgrade reboot is performed by the administrator.
+
+Set openstack-origin or source for new operating system ("distro")
+    This step is required and should occur before the first node is
+    completed but after the other units are paused.
+
+    .. code:: bash
+
+        juju config percona-cluster source=distro
+
+
+Juju upgrade-series complete the machine
+    .. code:: bash
+
+        juju upgrade-series complete 0
+
+    .. note::
+
+        The upgrade-series complete command causes all the charms on the
+        given machine to run their post-series-upgrade hook. In most cases
+        with the OpenStack charms, this rewrites configuration files and
+        resumes the unit. At the completion of the post-series-upgrade
+        hook, the workload status should be "active" with the message
+        "Unit is ready."
+
+Repeat the procedure for non-leader nodes
+    It is not necessary to repeat the set openstack-origin or source step.
+
+Perform any cluster completion tasks after all units of the application have been upgraded.
+    Run the complete-cluster-series-upgrade action on the leader node.
+    This action informs each node of the cluster that the upgrade process
+    is complete cluster-wide. It also updates the MySQL configuration with
+    all peers in the cluster.
+
+    .. code:: bash
+
+        juju run-action percona-cluster/0 complete-cluster-series-upgrade
+
+Juju set-series to the new series for all future units of an application.
+    To guarantee that any future add-unit commands create new
+    instantiations of the application on the correct series, it is
+    necessary to set the series on the application.
+
+    .. code:: bash
+
+        juju set-series percona-cluster xenial
+
+
+API Services
+~~~~~~~~~~~~
+
+Procedure for the API services in LXD containers. These include, but
+are not limited to, keystone, glance, cinder, neutron-api and
+nova-cloud-controller. Any subordinates deployed with these
+applications will be upgraded at the same time.
+
+.. note::
+    Keystone is used in the commands below for example purposes. In this
+    example, the leader unit keystone/0 is deployed in the LXD container
+    that is machine 0.
+
+
+Pause hacluster for non-leader units not undergoing upgrade
+    .. code:: bash
+
+        juju run-action keystone-hacluster/1 pause
+        juju run-action keystone-hacluster/2 pause
+
+
+Pause non-leader peer units not undergoing upgrade
+    .. code:: bash
+
+        juju run-action keystone/1 pause
+        juju run-action keystone/2 pause
+
+
+Juju upgrade-series prepare the leader's machine
+    .. code:: bash
+
+        juju upgrade-series prepare 0 xenial
+
+    .. note::
+        The upgrade-series prepare command causes all the charms on the
+        given machine to run their pre-series-upgrade hook. In most cases
+        with the OpenStack charms, this pauses the unit. At the completion
+        of the pre-series-upgrade hook, the workload status should be
+        "blocked" with the message "Ready for do-release-upgrade and
+        reboot."
+
+Execute do-release-upgrade and any post-upgrade operating system tasks
+    The do-release-upgrade process is performed by the administrator. Any
+    post do-release-upgrade tasks are also the responsibility of the
+    administrator.
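+
+    The following is one possible way to run this step, as a minimal
+    sketch. It assumes that the administrator works from the Juju client
+    over ``juju ssh`` and that machine 0 is the leader's machine from the
+    example above. The non-interactive frontend
+    (``-f DistUpgradeViewNonInteractive``) suppresses the upgrade prompts,
+    so review the outcome, including any changed configuration files,
+    before the reboot. To answer the prompts interactively instead, omit
+    the ``DEBIAN_FRONTEND`` setting and the ``-f`` option.
+
+    .. code:: bash
+
+        # Bring the current release fully up to date first (see Assumptions).
+        juju ssh 0 sudo apt-get update
+        juju ssh 0 sudo apt-get --assume-yes dist-upgrade
+        # Upgrade to the next release without interactive prompts.
+        juju ssh 0 sudo DEBIAN_FRONTEND=noninteractive do-release-upgrade -f DistUpgradeViewNonInteractive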
+
+Reboot
+    The post do-release-upgrade reboot is performed by the administrator.
+
+Set openstack-origin or source for new operating system ("distro")
+    This step is required and should occur before the first node is
+    completed but after the other units are paused.
+
+    .. code:: bash
+
+        juju config keystone openstack-origin=distro
+
+
+Juju upgrade-series complete the machine
+    .. code:: bash
+
+        juju upgrade-series complete 0
+
+    .. note::
+
+        The upgrade-series complete command causes all the charms on the
+        given machine to run their post-series-upgrade hook. In most cases
+        with the OpenStack charms, this rewrites configuration files and
+        resumes the unit. At the completion of the post-series-upgrade
+        hook, the workload status should be "active" with the message
+        "Unit is ready."
+
+Repeat the procedure for non-leader nodes
+    It is not necessary to repeat the set openstack-origin step.
+
+Juju set-series to the new series for all future units of an application.
+    To guarantee that any future add-unit commands create new
+    instantiations of the application on the correct series, it is
+    necessary to set the series on the application.
+
+    .. code:: bash
+
+        juju set-series keystone xenial
diff --git a/deploy-guide/source/app.rst b/deploy-guide/source/app.rst
index fa82371..53a4208 100644
--- a/deploy-guide/source/app.rst
+++ b/deploy-guide/source/app.rst
@@ -10,3 +10,4 @@ Appendices
    app-vault.rst
    app-encryption-at-rest.rst
    app-certificate-management.rst
+   app-series-upgrade.rst