vm moves for host notification

The original design saved instance evacuation information for host
failure notifications. Later features will add instance migration
information for host restored notifications. It is more compatible
to use a vm move (vmove) object, which includes a 'type' field to
show whether the move is an evacuation or a migration.

Blueprint: vm-evacuations-for-host-recovery
Change-Id: Ic0d1283bebbc562cfd20e004cb9b6ed309d0fd28
suzhengwei 2022-11-30 15:58:34 +08:00
parent 1975f2f177
commit 53c0119d08
1 changed files with 76 additions and 67 deletions


@@ -14,40 +14,33 @@ https://blueprints.launchpad.net/masakari/+spec/vm-evacuations-for-host-recovery
 Problem description
 ===================
-If one compute node failed, Masakari will evacuate the instances
-from the failed host.
+If one compute node failed, Masakari will evacuate the instances from
+the failed host.
-Generally, the resources of computing nodes are gradually reduced.
-If a large number of hosts fail at the same time and there are not
-enough computing node resources, the operator needs to set the
-priority of instance evacuation in advance to ensure that the
-evacuation can be carried out in the order of priority.
+If a large number of hosts fail at the same time, the resources of
+computing nodes are dramatically reduced. There would not be enough
+resources for all instances to recovery. So it is reasonable that
+the very important instances to be firstly evacuated, and evacuations
+can be aborted once the cloud environment encounters an irreversible
+condition.
-When the failed compute node recovers, in order to make full
-use of the computing node resources and due to some instances
-needing to run on a specific computing node, the operator wants
-to migrate the instance back to the original node.
+When the failed hosts come back, the restored resources may be lying
+idle. In order to make full use of the restored resources, It needs
+to move instances to the restored hosts. Sometimes there may be a
+distribution on purpose. The vm moves automatically such as DRS
+or manually could mess up the distribution. So it is a good idea to
+save the evaucations when the host is failed, and move instances back
+when the host is restored according to the previous evaucations.
 Proposed change
 ===============
-This spec is mainly to record instance evacuation information in
-the database, provide two interfaces to support obtaining all
-evacuation information lists and specific evacuation information
-details, and prepare relevant information for supporting the
-migration of the instances back to previously failed hosts.
+This spec is mainly to record vm moves information in
+the database, mainly including instance_uuid, notification_uuid,
+source_host, dest_host, type, status, start_time and end_time.
-* Record instance evacuation information in the database, mainly
-  including instance_id, notification_id, source_host, dest_host,
-  status. The status is pending.
-* User can get evacuation information about a specific masakari
-  notification by ``GET /notifications/<notification_id>/evacuations``
-  API.
-* User can get detailed information about a specific evacuation record
-  of a particular masakari notification by
-  ``GET /notifications/<notification_id>/evacuations/<evacuation_id>``
-  API.
+User can get vm moves information of a 'COMPUTE_HOST' type
+notification by vmove API.
 Alternatives
 ------------
@@ -57,64 +50,85 @@ None
 Data model impact
 -----------------
-The table ``evacuation`` will be added into the Masakari database.
+The table ``vmoves`` will be added into the Masakari database.
 * created_at: Datetime.
 * updated_at: Datetime.
 * deleted_at: Datetime.
 * deleted: Boolean.
-* uuid: UUID. uuid of evacuation record.
-* notification_uuid: UUID. uuid of notification.
-* instance_uuid: UUID. uuid of instance.
-* source_host_name: String. The source compute node before the instance
-  evacuated.
-* dest_host_name: String. The destination compute node after the instance is
-  evacuated.
-* status: String. Represents possible statuses for notifications, such as
-  pending, ongoing, ignored, failed and succeeded.
-* status_details: String. Store the details reason of evacuate failed/ignored.
-* priority: Numeric. Set the evacuation priority and support the
-  evacuation of instances in order. The default value is 1.
+* uuid: UUID. UUID of the vmove.
+* notification_uuid: UUID. UUID of notification the vmove belong to.
+* instance_uuid: UUID. UUID of instance.
+* instance_uuid: String. Name of instance.
+* source_host: String. Source host name of the vmove.
+* dest_host: String. Destination host name of the vmove.
+* start_time: Datetime. Start time of the vmove.
+* end_time: Datetime. End time of the vmove.
+* type: String. Represents possible types for the vmove, such as
+  migration, live_migration or evacuation.
+* status: String. Represents possible statuses for the vmove, such as
+  pending, ongoing, ignored, failed or succeeded.
+* message: String. Display some meaningful information if the vmove is
+  failed or ignored.
 REST API impact
 ---------------
-Following changes will be introduced in a new API micro-version.
+Following vmove API will be introduced in a new API micro-version.
-* GET /notifications/<notification_id>/evacuations
+* GET /notifications/<notification_id>/vmoves
   response example::
     {
-        "evacuations": [
+        "vmoves": [
             {
                 "uuid": "239f95ca-fd46-44d2-8ff8-35e8a9c94f69",
                 "instance_uuid": "33826ebd-af0f-445d-833f-e06340f7ae1c",
+                "instance_name": "vm-1",
                 "notification_uuid": "c0fa1a39-c150-4b86-ae97-8fae31700c67",
-                "source_host_name": "node01",
-                "dest_host_name": "node02",
-                "status": "pending",
-                "status_details": "",
-                "priority": "1"
+                "source_host": "node01",
+                "dest_host": "node02",
+                "start_time": "2022-11-22 14:50:22",
+                "end_time": "2022-11-22 14:50:35",
+                "type": "evacuation",
+                "status": "succeeded",
+                "message": null
+            },
+            {
+                "uuid": "65a5da84-5819-4aea-8278-a28d2b489028",
+                "instance_uuid": "e1a5a45b-f251-47cf-9c5f-fa1e66e1286a",
+                "instance_name": "vm-2",
+                "notification_uuid": "c0fa1a39-c150-4b86-ae97-8fae31700c67",
+                "source_host": "node01",
+                "dest_host": "node02",
+                "start_time": "2022-11-22 14:50:23",
+                "end_time": "2022-11-22 14:50:38",
+                "type": "evacuation",
+                "status": "succeeded",
+                "message": null
             }
         ]
     }
-* GET /notifications/<notification_id>/evacuations/<evacuation_id>
+* GET /notifications/<notification_id>/vmoves/<vmove_id>
   response example::
     {
-        "evacuation":
+        "vmove":
         {
             "uuid": "239f95ca-fd46-44d2-8ff8-35e8a9c94f69",
             "instance_uuid": "33826ebd-af0f-445d-833f-e06340f7ae1c",
+            "instance_name": "vm-1",
             "notification_uuid": "c0fa1a39-c150-4b86-ae97-8fae31700c67",
-            "source_host_name": "node01",
-            "dest_host_name": "node02",
-            "status": "pending",
-            "status_details": "",
-            "priority": "1"
+            "source_host": "node01",
+            "dest_host": "node02",
+            "start_time": "2022-11-22 14:50:22",
+            "end_time": "2022-11-22 14:50:38",
+            "type": "evacuation",
+            "status": "succeeded",
+            "message": null
         }
     }
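As an illustration of how a client might consume the new list response, here is a minimal Python sketch that parses the ``vmoves`` example above into typed records. The ``VMove`` dataclass, the ``parse_vmoves`` helper and the timestamp format are assumptions made for this example (the format is inferred from the sample values). None of them are part of Masakari or openstacksdk.

```python
import json
from dataclasses import dataclass, fields
from datetime import datetime
from typing import List, Optional

# Assumed from the timestamps shown in the response example.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


@dataclass
class VMove:
    """Hypothetical client-side view of one vmove record."""
    uuid: str
    instance_uuid: str
    instance_name: str
    notification_uuid: str
    source_host: str
    dest_host: str
    start_time: Optional[datetime]
    end_time: Optional[datetime]
    type: str
    status: str
    message: Optional[str]


def _parse_time(value: Optional[str]) -> Optional[datetime]:
    # A vmove that has not started or finished may carry null here.
    return datetime.strptime(value, TIME_FORMAT) if value else None


def parse_vmoves(body: str) -> List[VMove]:
    """Turn a JSON list response into VMove records, ignoring unknown keys."""
    names = [f.name for f in fields(VMove)]
    result = []
    for record in json.loads(body)["vmoves"]:
        data = {name: record.get(name) for name in names}
        for key in ("start_time", "end_time"):
            data[key] = _parse_time(data[key])
        result.append(VMove(**data))
    return result


# Response body copied from the example in the spec.
SAMPLE = """
{"vmoves": [
  {"uuid": "239f95ca-fd46-44d2-8ff8-35e8a9c94f69",
   "instance_uuid": "33826ebd-af0f-445d-833f-e06340f7ae1c",
   "instance_name": "vm-1",
   "notification_uuid": "c0fa1a39-c150-4b86-ae97-8fae31700c67",
   "source_host": "node01", "dest_host": "node02",
   "start_time": "2022-11-22 14:50:22",
   "end_time": "2022-11-22 14:50:35",
   "type": "evacuation", "status": "succeeded", "message": null},
  {"uuid": "65a5da84-5819-4aea-8278-a28d2b489028",
   "instance_uuid": "e1a5a45b-f251-47cf-9c5f-fa1e66e1286a",
   "instance_name": "vm-2",
   "notification_uuid": "c0fa1a39-c150-4b86-ae97-8fae31700c67",
   "source_host": "node01", "dest_host": "node02",
   "start_time": "2022-11-22 14:50:23",
   "end_time": "2022-11-22 14:50:38",
   "type": "evacuation", "status": "succeeded", "message": null}
]}
"""

moves = parse_vmoves(SAMPLE)
for move in moves:
    seconds = (move.end_time - move.start_time).total_seconds()
    print(move.instance_name, move.type, move.status, seconds)
```

Selecting fields by name rather than unpacking the whole record keeps the sketch tolerant of extra keys a real response might carry.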
@@ -131,8 +145,8 @@ None
 Other end user impact
 ---------------------
-The python-masakariclient, masakari-dashboard and openstacksdk will be updated
-to support instance evacuations for host recovery in a new micro-version.
+The masakari-dashboard and openstacksdk will be updated to support
+vm moves for host type notification in a new micro-version.
 Performance Impact
 ------------------
@@ -157,11 +171,7 @@ Assignee(s)
 Primary assignee:
-* suzhengwei <sugar-2008@163.com>
+* suzhengwei <suzhengwei@inspur.com>
-Historical assignee (pre-Yoga):
-* shenxinxin <shenxinxin@inspur.com>
 Work Items
 ----------
@@ -169,13 +179,12 @@ Work Items
 * Create the object definition, database schema, updating
   engine to handle this.
-* Create a new API microversion to get information for all evacuations
-  and get detailed information about a particular evacuation.
+* Create a new API microversion to get information for all vmoves
+  and get detailed information about a particular vmove.
-* Update docs for instance evacuations for host recovery
+* Update docs about vm moves for host recovery
-* Update python-masakariclient, masakari-dashboard and openstacksdk to
-  manage instance evacuations for host recovery.
+* Update masakari-dashboard and openstacksdk to manage vm moves.
 * Add unit and functional tests.
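The problem description in this change suggests moving instances back to a restored host according to the previously saved evacuations. The spec does not define how the engine would do that, so the following Python sketch is only one possible reading: it selects succeeded evacuation records whose ``source_host`` is the restored host and proposes the reverse move. The ``plan_move_back`` function and the sample records are hypothetical. A real implementation would also need to confirm where each instance currently runs.

```python
from typing import Dict, Iterable, List, Tuple


def plan_move_back(
    vmoves: Iterable[Dict[str, str]], restored_host: str
) -> List[Tuple[str, str, str]]:
    """Return (instance_uuid, current_host, target_host) for each candidate.

    Assumption: an instance is a candidate only if its recorded evacuation
    away from the restored host succeeded, so it is presumed to be on
    dest_host now.
    """
    plan = []
    for move in vmoves:
        if (
            move["type"] == "evacuation"
            and move["status"] == "succeeded"
            and move["source_host"] == restored_host
        ):
            plan.append((move["instance_uuid"], move["dest_host"], restored_host))
    return plan


# Hypothetical records using the field names from the data model.
records = [
    {"instance_uuid": "33826ebd-af0f-445d-833f-e06340f7ae1c",
     "source_host": "node01", "dest_host": "node02",
     "type": "evacuation", "status": "succeeded"},
    {"instance_uuid": "e1a5a45b-f251-47cf-9c5f-fa1e66e1286a",
     "source_host": "node01", "dest_host": "node02",
     "type": "evacuation", "status": "failed"},
    {"instance_uuid": "239f95ca-fd46-44d2-8ff8-35e8a9c94f69",
     "source_host": "node03", "dest_host": "node02",
     "type": "evacuation", "status": "succeeded"},
]

plan = plan_move_back(records, "node01")
for instance_uuid, current_host, target_host in plan:
    print(f"{instance_uuid}: {current_host} -> {target_host}")
```

Only the first record qualifies here: the second evacuation failed, and the third left a different host.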