Support volume-backed server rebuild
The spec is describing the operation of rebuild the volume-backed server with a new image. Spec for blueprint volume-backed-server-rebuild Change-Id: I646041c872c4172219df2527820a672a0a2cb736
This commit is contained in:
parent
6da301304c
commit
967737966f
|
@ -0,0 +1,240 @@
|
|||
..
|
||||
This work is licensed under a Creative Commons Attribution 3.0 Unported
|
||||
License.
|
||||
|
||||
http://creativecommons.org/licenses/by/3.0/legalcode
|
||||
|
||||
==========================================
|
||||
volume-backed server rebuild
|
||||
==========================================
|
||||
https://blueprints.launchpad.net/nova/+spec/volume-backed-server-rebuild
|
||||
|
||||
Currently, the compute API will `fail`_ if a user tries to rebuild
|
||||
a volume-backed server with a new image. This spec proposes to add
|
||||
support for rebuilding a volume-backed server with a new image.
|
||||
|
||||
.. _fail: https://github.com/openstack/nova/blob/62245235b/nova/compute/api.py#L3318
|
||||
|
||||
Problem description
|
||||
===================
|
||||
|
||||
Currently Nova rebuild (with a new image) only supports instances which are
|
||||
booted from images. The volume-backed instance cannot be rebuilt when a new
|
||||
image is supplied. Trying to rebuild a volume-backed instance will raise a
|
||||
HTTPBadRequest exception.
|
||||
|
||||
Use Cases
|
||||
---------
|
||||
|
||||
* As a user, I would like to rebuild my volume-backed server with a new image.
|
||||
|
||||
* As a nova developer, I would like to have feature parity in the compute API
|
||||
for volume-backed and image-backed servers.
|
||||
|
||||
Proposed change
|
||||
===============
|
||||
|
||||
First, change the existing API for rebuilding a volume-backed server.
|
||||
Then the API flow would be:
|
||||
|
||||
#. Has the new API microversion been requested?
|
||||
#. Is the instance.host service version new enough to support
|
||||
volume-backed rebuild with a new image?
|
||||
|
||||
If these are true, proceed. If not, fail in the API with a 409 error.
|
||||
|
||||
Note that when rebuilding with a new image, the request will be run through
|
||||
the scheduler against the current host to be consistent with image-backed
|
||||
rebuild with a new image. See `bug 1664931`_ for details.
|
||||
|
||||
.. _bug 1664931: https://bugs.launchpad.net/nova/+bug/1664931
|
||||
|
||||
Then the nova-compute will perform the following steps:
|
||||
|
||||
#. Create an empty (no connector) volume attachment for the volume and
|
||||
server. This ensures the volume remains ``reserved`` through the next
|
||||
step.
|
||||
#. Delete the existing volume attachment (the old one).
|
||||
#. Call the new ``os-reimage`` cinder API.
|
||||
#. Poll the volume status for completion (either success or failure).
|
||||
#. Upon successful completion of the re-image operation, update the empty
|
||||
volume attchment in Cinder, and then do the attachment on the Nova host
|
||||
when spawning the (rebuilt) guest VM and "complete" the attachment
|
||||
which will make the volume ``in-use`` again.
|
||||
|
||||
In this process, there are some conditions that we could hit:
|
||||
|
||||
* If we failed to re-image the volume and the volume is in 'error' status
|
||||
then we should set the instance status as "error". Since users can rebuild
|
||||
instances in error status, the user has a way to retry the rebuild once
|
||||
the cause of the cinder side failure is resolved. Note that nova-compute
|
||||
will *not* attempt to update the volume attachment records with the host
|
||||
connector again on the volume in error status.
|
||||
* If the cinder API itself returns a >=400 error, nothing changed about the
|
||||
root volume and in that case the migration status can be 'failed' but the
|
||||
instance status should go back to what it was (we can see how
|
||||
_error_out_instance_on_exception is used).
|
||||
|
||||
|
||||
Alternatives
|
||||
------------
|
||||
|
||||
The main alternative is that nova would perform the rebuild like an initial
|
||||
boot from volume where nova-compute would create a new volume from the new
|
||||
image and then "swap" the root volume on the instance during rebuild.
|
||||
|
||||
There are issues with this, however, like what to do about the old volume:
|
||||
|
||||
* Regarding 'delete_on_termination' flag in the BDM,
|
||||
delete_on_termination=True means: don't delete the volume when we kill
|
||||
the instance. Rebuild means: re-initialize this instance in place. The
|
||||
rebuild flow would have to determine what to do if the old root volume
|
||||
BDM was marked with delete_on_termination=True - ignore that and preserve
|
||||
the old root volume or delete it.
|
||||
|
||||
* We could pass a new flag to the rebuild API telling nova what to do about the
|
||||
old volume (delete it or not).
|
||||
If the flag is true to delete the old volume but the old volume has
|
||||
snapshots, Nova won't be deleting the volume snapshots just to delete
|
||||
the volume during a rebuild.
|
||||
|
||||
But there are several issues with that as mentioned above like quota and
|
||||
the questions about what nova should do about the old volume, you can
|
||||
see more detailed information in `References`_.
|
||||
|
||||
Data model impact
|
||||
-----------------
|
||||
|
||||
None
|
||||
|
||||
REST API impact
|
||||
---------------
|
||||
|
||||
Change the rebuild request response code from 400 to 202 if the conditions
|
||||
described in the `Proposed change`_ section are met.
|
||||
The API microversion and compute RPC version will also be incremented to
|
||||
indicate the new support.
|
||||
|
||||
Security impact
|
||||
---------------
|
||||
|
||||
None
|
||||
|
||||
Notifications impact
|
||||
--------------------
|
||||
|
||||
None
|
||||
|
||||
Other end user impact
|
||||
---------------------
|
||||
|
||||
The python-novaclient and python-openstackclient will be updated
|
||||
to support the new microversion.
|
||||
|
||||
Performance Impact
|
||||
------------------
|
||||
|
||||
The operation will take longer because of the orchestration
|
||||
involved and the work that needs to happen in Cinder.
|
||||
|
||||
Other deployer impact
|
||||
---------------------
|
||||
|
||||
If the cinder volume ``reimage`` API operation fails and the volume goes to
|
||||
``error`` status, an admin will likely need to investigate and resolve the
|
||||
issue in cinder and then reset the volume status to ``reserved``.
|
||||
|
||||
Developer impact
|
||||
----------------
|
||||
|
||||
None
|
||||
|
||||
Upgrade impact
|
||||
--------------
|
||||
|
||||
The API microversion and compute RPC version will also be incremented
|
||||
to indicate the new support, therefore users will not be able to leverage
|
||||
the feature until the nova-compute service hosting a volume-backed instance
|
||||
is upgraded.
|
||||
|
||||
Implementation
|
||||
==============
|
||||
|
||||
Assignee(s)
|
||||
-----------
|
||||
|
||||
Primary assignee:
|
||||
Jie Li <lijie@unitedstack.com> (ramboman)
|
||||
|
||||
Work Items
|
||||
----------
|
||||
|
||||
* Change the existing rebuild API.
|
||||
* Create an empty attachment for the root volume so the volume
|
||||
remains in-use during rebuild (we do this today already).
|
||||
* Delete the old volume attachment.
|
||||
* Call the cinder API to re-image the volume.
|
||||
* Update and complete the volume attachment once re-imaged.
|
||||
* Adopt the new compute version.
|
||||
* Adopt the new microversion in python-novaclient.
|
||||
* Adopt the new microversion in python-openstackclient.
|
||||
* Change the nova API documents.
|
||||
|
||||
Dependencies
|
||||
============
|
||||
|
||||
Depends on the cinder blueprint for re-imaging a volume, see
|
||||
more detail information in References.
|
||||
|
||||
|
||||
Testing
|
||||
=======
|
||||
|
||||
The following tests are added.
|
||||
|
||||
* Nova unit tests for negative scenarios
|
||||
* Nova functional tests for "happy path" testing
|
||||
* Tempest integration tests to make sure the nova/cinder integration
|
||||
works properly
|
||||
|
||||
Documentation Impact
|
||||
====================
|
||||
|
||||
We will replace the `note in the API reference`_ with
|
||||
a note about the required minimum microversion for rebuilding a
|
||||
volume-backed server with a new image.
|
||||
|
||||
The following document will be updated:
|
||||
|
||||
* API Reference
|
||||
|
||||
.. _note in the API reference: https://developer.openstack.org/api-ref/compute/?expanded=#rebuild-server-rebuild-action
|
||||
|
||||
References
|
||||
==========
|
||||
|
||||
* Stein PTG etherpad: https://etherpad.openstack.org/p/nova-ptg-stein
|
||||
|
||||
* This is the discussion about rebuild the volume-backed server:
|
||||
|
||||
http://lists.openstack.org/pipermail/openstack-dev/2017-October/123255.html
|
||||
|
||||
* This is the discussion about what we should do about the root volume
|
||||
during a rebuild:
|
||||
|
||||
http://lists.openstack.org/pipermail/openstack-operators/2018-March/014952.html
|
||||
|
||||
* The cinder blueprint for re-imaging a volume:
|
||||
|
||||
https://blueprints.launchpad.net/cinder/+spec/add-volume-re-image-api
|
||||
|
||||
History
|
||||
=======
|
||||
|
||||
.. list-table:: Revisions
|
||||
:header-rows: 1
|
||||
|
||||
* - Release Name
|
||||
- Description
|
||||
* - Stein
|
||||
- Proposed
|
Loading…
Reference in New Issue