From e1c22713faca5f4a992f897ee0605feade49e2ed Mon Sep 17 00:00:00 2001 From: Theodoros Tsioutsias Date: Mon, 19 Mar 2018 09:54:21 +0000 Subject: [PATCH] Add PENDING vm state This feature adds support for the PENDING server state. Instead of setting a server to the ERROR state when the Placement returns no valid hosts for the requested server, the PENDING state is used if the operator wishes so. This will allow the execution of subsequent actions transparently to the end user. Change-Id: Ie2989dee31a8d7242bf5b78019f016b3bc7e42a3 Implements: blueprint introduce-pending-vm-state --- .../approved/introduce-pending-vm-state.rst | 263 ++++++++++++++++++ 1 file changed, 263 insertions(+) create mode 100644 specs/stein/approved/introduce-pending-vm-state.rst diff --git a/specs/stein/approved/introduce-pending-vm-state.rst b/specs/stein/approved/introduce-pending-vm-state.rst new file mode 100644 index 000000000..f4fe71424 --- /dev/null +++ b/specs/stein/approved/introduce-pending-vm-state.rst @@ -0,0 +1,263 @@ +.. + This work is licensed under a Creative Commons Attribution 3.0 Unported + License. + + http://creativecommons.org/licenses/by/3.0/legalcode + +========================== +Introduce Pending VM state +========================== +https://blueprints.launchpad.net/nova/+spec/introduce-pending-vm-state + +This feature adds support for the PENDING server state. When the scheduler +determines there is no capacity available for the given request, and so the +instance is about to be routed into cell0, the server should be set into the +PENDING state instead of ERROR if the operator wishes so. This will allow the +execution of subsequent actions transparently to the end user. + +Problem description +=================== + +Use Cases +--------- + +As an operator, I want to enable an external -to Nova- service, triggered as +soon as a server's build request fails due to NoValidHosts, and try to free up +the requested resources. + +If the outcome of the follow up actions is: + +#. success, the external service will try to rebuild the instance:: + + POST /servers/{server_id}/action + { + "rebuild": { + "description": null, + "imageRef": {image_id} + } + } + + NOTE:: + The rebuild api needs to be adapted to take care of instances that fail + while building and are mapped to cell0. This change is considered out + of scope for this spec and is being addressed by another spec [1] + +#. failure, the external service will set the state of the instance to ERROR + (using reset-state):: + + POST /servers/{server_id}/action + { + "os-resetState": { + "state": "error" + } + } + +In order to achieve that, transparently to the user, the instance should not +be set to the ERROR state but to the new PENDING state. + +We need to clarify here that, as for all the other VM states, the end user +will be able to delete instances set to the PENDING state. Failures to the +follow up actions, caused by the deletion of instances in the new state, have +to be handled by the external service. + +Proposed change +=============== + +#. Add the PENDING state in the InstanceState object. + +#. Add the PENDING state in compute vm_states. + +#. Add the PENDING state in the server ViewBuilder as a new progress status. + +#. Add a configuration option that defaults to ``False`` in the DEFAULT group + to enable the use of PENDING vm_state on NoValidHost events:: + + CONF.use_pending_state + +#. Add the following code in the conductor manager ``_bury_in_cell0`` method + to make sure that the a vm is set to PENDING only when the operator has + chosen so and the failure reported by the scheduler is a NoValidHost:: + + verify = isinstance(exc, exception.NoValidHost) + + if CONF.use_pending_state and verify: + vm_state = vm_states.PENDING + else: + vm_state = vm_states.ERROR + + updates = {'vm_state': vm_state, 'task_state': None} + +#. Add a new API microversion and Map the PENDING state to ERROR for requests + to previous microversions. See `REST API impact`_. + +Alternatives +------------ + +Follow the vendor data example and perform an asynchronous REST API call from +the Nova Conductor to the external service when enabled by the operator. But +having an asynchronous REST API call from the conductor would potentially have +performance impact. + +Data model impact +----------------- + +None. + +REST API impact +--------------- + +A new API microversion is needed for this change. For the older microversions +the PENDING state will be mapped to the ERROR state. + +Example responses for a server set to PENDING would be:: + + GET /servers/detail (new microversion) + { + "servers":[ + { + ...: ..., + "name": "test", + "id":"2dd26c1e-bc6f-45f6-83b3-2cb72ea026eb", + "OS-EXT-STS:vm_state":"pending", + "status":"PENDING", + ...: ... + } + ] + } + + GET /servers/detail (previous microversions) + { + "servers":[ + { + ...: ..., + "name": "test", + "id":"2dd26c1e-bc6f-45f6-83b3-2cb72ea026eb", + "OS-EXT-STS:vm_state":"error", + "status":"ERROR", + ...: ... + } + ] + } + + +Security impact +--------------- + +None. + +Notifications impact +-------------------- + +Firstly, the external third party service has to be notified when a server is +set to PENDING state. For this, the already existing versioned notification +instance.update. + +For the second part, a notification is needed in order to inform the external +service about a server's build procedure outcome. The plan is to use this +notification in order to enable the external Reaper service, to know where the +requested resources have to be freed up. + +The transformation of the existing ``select_destinations`` notification from +unversioned to a versioned one, is out of scope of this spec. There is an +existing change that tries to address the transformation of the notification +and can be picked up [2]. + +Other end user impact +--------------------- + +The end user will have servers in PENDING state if the operator chooses to +enable the configuration option ``CONF.use_pending_state``. + +Performance Impact +------------------ + +None. + +Other deployer impact +--------------------- + +There will be a new config option specifying if the PENDING state will be used +or not. It seems that the most appropriate place for this option is the DEFAULT +section. + +Developer impact +---------------- + +None. + +Upgrade impact +-------------- + +None. + +Implementation +============== + +Assignee(s) +----------- + +Primary assignee: + + +Other contributors: + + + + +Work Items +---------- + +See `Proposed change`_. + +Dependencies +============ + +None. + +Testing +======= + +Updating existing unit and functional tests should be enough to verify the +use of the new state. +New unit and functional tests have to be added to verify the new notification. + +Documentation Impact +==================== + +#. The new configuration option as well as the meaning of the PENDING state + should be documented. + +#. Update the allowed state transitions documentation to include:: + + BUILD to PENDING + PENDING to BUILD + PENDING to ERROR + +#. Document that the responsibility of managing the instance's lifecycle is + transferred to the external service as soon as the instance is set to the + PENDING state. + +References +========== + +[1] Enable rebuild for instances in cell0 + https://review.openstack.org/#/c/554218/ + +[2] WIP: Transform scheduler.select_destinations notification + https://review.openstack.org/#/c/508506/ + +As discussed in the Dublin PTG: +https://etherpad.openstack.org/p/nova-ptg-rocky L472 + +History +======= + +.. list-table:: Revisions + :header-rows: 1 + + * - Release Name + - Description + * - Rocky + - Introduced + * - Stein + - Re-proposed