vm moves for host notification

The original design saved instance evacuation information for host
failure notifications. Later features will add instance migration
information for host restored notifications. It is more compatible
to use a vm move (vmove) object, which includes a 'type' field to
show whether it is an evacuation or a migration.

Blueprint: vm-evacuations-for-host-recovery
Change-Id: Ic0d1283bebbc562cfd20e004cb9b6ed309d0fd28
suzhengwei 2022-11-30 15:58:34 +08:00
parent 1975f2f177
commit 53c0119d08
1 changed files with 76 additions and 67 deletions


@ -14,40 +14,33 @@ https://blueprints.launchpad.net/masakari/+spec/vm-evacuations-for-host-recovery
Problem description
===================
If one compute node failed, Masakari will evacuate the instances
from the failed host.
If a compute node fails, Masakari evacuates the instances from the
failed host.
Generally, the resources of computing nodes are gradually reduced.
If a large number of hosts fail at the same time and there are not
enough computing node resources, the operator needs to set the
priority of instance evacuation in advance to ensure that the
evacuation can be carried out in the order of priority.
If a large number of hosts fail at the same time, the resources of
computing nodes are dramatically reduced. There would not be enough
resources for all instances to recover. So it is reasonable for the
most important instances to be evacuated first, and for evacuations
to be aborted once the cloud environment encounters an irreversible
condition.
When the failed compute node recovers, in order to make full
use of the computing node resources and due to some instances
needing to run on a specific computing node, the operator wants
to migrate the instance back to the original node.
When the failed hosts come back, the restored resources may lie
idle. To make full use of the restored resources, instances need to
be moved to the restored hosts. Sometimes instances are distributed
across hosts on purpose. Automatic vm moves (such as DRS) or manual
ones could disrupt that distribution. So it is a good idea to save
the evacuations when the host fails, and to move instances back
according to those evacuations when the host is restored.
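As a rough illustration of moving instances back based on saved
evacuations, the sketch below derives a list of planned moves from
stored records. The function name ``plan_moves_back`` and the choice
to consider only succeeded evacuations are assumptions made for this
example; the field names follow the vmove fields described in this
spec. It is not the Masakari engine implementation.

```python
def plan_moves_back(vmoves, restored_host):
    """Build pending moves that reverse earlier evacuations.

    Hypothetical helper: only evacuations that succeeded and that
    originally left ``restored_host`` are reversed (an assumption).
    """
    plan = []
    for vmove in vmoves:
        if (vmove["type"] == "evacuation"
                and vmove["status"] == "succeeded"
                and vmove["source_host"] == restored_host):
            plan.append({
                "instance_uuid": vmove["instance_uuid"],
                # The instance currently runs where it was evacuated to.
                "source_host": vmove["dest_host"],
                "dest_host": restored_host,
                "type": "migration",
                "status": "pending",
            })
    return plan


saved = [
    {"instance_uuid": "33826ebd-af0f-445d-833f-e06340f7ae1c",
     "source_host": "node01", "dest_host": "node02",
     "type": "evacuation", "status": "succeeded"},
    {"instance_uuid": "e1a5a45b-f251-47cf-9c5f-fa1e66e1286a",
     "source_host": "node01", "dest_host": "node02",
     "type": "evacuation", "status": "failed"},
]
moves_back = plan_moves_back(saved, "node01")
```

Here only the first instance would be planned for a move from
``node02`` back to ``node01``, because the second evacuation did not
succeed.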
Proposed change
===============
This spec is mainly to record instance evacuation information in
the database, provide two interfaces to support obtaining all
evacuation information lists and specific evacuation information
details, and prepare relevant information for supporting the
migration of the instances back to previously failed hosts.
* Record instance evacuation information in the database, mainly
including instance_id, notification_id, source_host, dest_host,
status. The status is pending.
* User can get evacuation information about a specific masakari
notification by ``GET /notifications/<notification_id>/evacuations``
API.
* User can get detailed information about a specific evacuation record
of a particular masakari notification by
``GET /notifications/<notification_id>/evacuations/<evacuation_id>``
API.
This spec mainly records vm move information in the database,
including instance_uuid, notification_uuid, source_host, dest_host,
type, status, start_time and end_time.
Users can get the vm move information of a 'COMPUTE_HOST' type
notification through the vmove API.
Alternatives
------------
@ -57,64 +50,85 @@ None
Data model impact
-----------------
The table ``evacuation`` will be added into the Masakari database.
The table ``vmoves`` will be added into the Masakari database.
* created_at: Datetime.
* updated_at: Datetime.
* deleted_at: Datetime.
* deleted: Boolean.
* uuid: UUID. uuid of evacuation record.
* notification_uuid: UUID. uuid of notification.
* instance_uuid: UUID. uuid of instance.
* source_host_name: String. The source compute node before the instance
evacuated.
* dest_host_name: String. The destination compute node after the instance is
evacuated.
* status: String. Represents possible statuses for notifications, such as
pending, ongoing, ignored, failed and succeeded.
* status_details: String. Store the details reason of evacuate failed/ignored.
* priority: Numeric. Set the evacuation priority and support the
evacuation of instances in order. The default value is 1.
* uuid: UUID. UUID of the vmove.
* notification_uuid: UUID. UUID of the notification the vmove belongs to.
* instance_uuid: UUID. UUID of instance.
* instance_name: String. Name of instance.
* source_host: String. Source host name of the vmove.
* dest_host: String. Destination host name of the vmove.
* start_time: Datetime. Start time of the vmove.
* end_time: Datetime. End time of the vmove.
* type: String. Represents possible types for the vmove, such as
migration, live_migration or evacuation.
* status: String. Represents possible statuses for the vmove, such as
pending, ongoing, ignored, failed or succeeded.
* message: String. Meaningful information shown if the vmove
failed or was ignored.
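The column list above can be sketched as a table definition. The
example below uses an in-memory SQLite database purely for
illustration; the SQL types, the constraints, the helper name
``create_vmove`` and the initial ``pending`` status are assumptions of
this sketch, not the actual Masakari migration.

```python
import sqlite3
import uuid

# Column names follow the spec; types and constraints are assumed.
VMOVES_DDL = """
CREATE TABLE vmoves (
    created_at TEXT,
    updated_at TEXT,
    deleted_at TEXT,
    deleted INTEGER DEFAULT 0,
    uuid TEXT NOT NULL UNIQUE,
    notification_uuid TEXT NOT NULL,
    instance_uuid TEXT NOT NULL,
    instance_name TEXT,
    source_host TEXT,
    dest_host TEXT,
    start_time TEXT,
    end_time TEXT,
    type TEXT,
    status TEXT,
    message TEXT
)
"""

VMOVE_TYPES = ("migration", "live_migration", "evacuation")


def create_vmove(conn, notification_uuid, instance_uuid,
                 instance_name, source_host, vmove_type):
    """Insert a new vmove row and return its uuid (hypothetical helper)."""
    if vmove_type not in VMOVE_TYPES:
        raise ValueError("unknown vmove type: %s" % vmove_type)
    vmove_uuid = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO vmoves (uuid, notification_uuid, instance_uuid, "
        "instance_name, source_host, type, status) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (vmove_uuid, notification_uuid, instance_uuid,
         instance_name, source_host, vmove_type, "pending"),
    )
    return vmove_uuid


conn = sqlite3.connect(":memory:")
conn.execute(VMOVES_DDL)
```

A record created this way has no dest_host, start_time or end_time
yet; in this sketch those would be filled in as the move progresses.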
REST API impact
---------------
Following changes will be introduced in a new API micro-version.
The following vmove APIs will be introduced in a new API micro-version.
* GET /notifications/<notification_id>/evacuations
* GET /notifications/<notification_id>/vmoves
response example::
{
"evacuations": [
"vmoves": [
{
"uuid": "239f95ca-fd46-44d2-8ff8-35e8a9c94f69",
"instance_uuid": "33826ebd-af0f-445d-833f-e06340f7ae1c",
"instance_name": "vm-1",
"notification_uuid": "c0fa1a39-c150-4b86-ae97-8fae31700c67",
"source_host_name": "node01",
"dest_host_name": "node02",
"status": "pending",
"status_details": "",
"priority": "1"
"source_host": "node01",
"dest_host": "node02",
"start_time": "2022-11-22 14:50:22",
"end_time": "2022-11-22 14:50:35",
"type": "evacuation",
"status": "succeeded",
"message": null
},
{
"uuid": "65a5da84-5819-4aea-8278-a28d2b489028",
"instance_uuid": "e1a5a45b-f251-47cf-9c5f-fa1e66e1286a",
"instance_name": "vm-2",
"notification_uuid": "c0fa1a39-c150-4b86-ae97-8fae31700c67",
"source_host": "node01",
"dest_host": "node02",
"start_time": "2022-11-22 14:50:23",
"end_time": "2022-11-22 14:50:38",
"type": "evacuation",
"status": "succeeded",
"message": null
}
]
}
* GET /notifications/<notification_id>/evacuations/<evacuation_id>
* GET /notifications/<notification_id>/vmoves/<vmove_id>
response example::
{
"evacuation":
"vmove":
{
"uuid": "239f95ca-fd46-44d2-8ff8-35e8a9c94f69",
"instance_uuid": "33826ebd-af0f-445d-833f-e06340f7ae1c",
"instance_name": "vm-1",
"notification_uuid": "c0fa1a39-c150-4b86-ae97-8fae31700c67",
"source_host_name": "node01",
"dest_host_name": "node02",
"status": "pending",
"status_details": "",
"priority": "1"
"source_host": "node01",
"dest_host": "node02",
"start_time": "2022-11-22 14:50:22",
"end_time": "2022-11-22 14:50:35",
"type": "evacuation",
"status": "succeeded",
"message": null
}
}
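A client consuming these responses might look roughly like the sketch
below. It only builds the request paths and parses a response body;
the helper names and the timestamp format string are assumptions
inferred from the examples above, and no real client library call is
implied.

```python
import json
from datetime import datetime

# Assumed from the example timestamps such as "2022-11-22 14:50:22".
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def vmoves_url(notification_id, vmove_id=None):
    """Return the list path, or the detail path when vmove_id is given."""
    path = "/notifications/%s/vmoves" % notification_id
    return path if vmove_id is None else "%s/%s" % (path, vmove_id)


def vmove_duration_seconds(vmove):
    """Seconds between start_time and end_time, or None if unfinished."""
    if not vmove.get("start_time") or not vmove.get("end_time"):
        return None
    start = datetime.strptime(vmove["start_time"], TIME_FORMAT)
    end = datetime.strptime(vmove["end_time"], TIME_FORMAT)
    return (end - start).total_seconds()


def summarize(body):
    """Map instance_name to move duration for a list response body."""
    return {v["instance_name"]: vmove_duration_seconds(v)
            for v in body["vmoves"]}


# Trimmed copy of the list response example.
sample = json.loads("""
{
  "vmoves": [
    {"instance_name": "vm-1", "type": "evacuation",
     "status": "succeeded",
     "start_time": "2022-11-22 14:50:22",
     "end_time": "2022-11-22 14:50:35"},
    {"instance_name": "vm-2", "type": "evacuation",
     "status": "succeeded",
     "start_time": "2022-11-22 14:50:23",
     "end_time": "2022-11-22 14:50:38"}
  ]
}
""")
durations = summarize(sample)
```

With the example data, ``durations`` maps ``vm-1`` to 13 seconds and
``vm-2`` to 15 seconds.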
@ -131,8 +145,8 @@ None
Other end user impact
---------------------
The python-masakariclient, masakari-dashboard and openstacksdk will be updated
to support instance evacuations for host recovery in a new micro-version.
The masakari-dashboard and openstacksdk will be updated to support
vm moves for host type notifications in a new micro-version.
Performance Impact
------------------
@ -157,11 +171,7 @@ Assignee(s)
Primary assignee:
* suzhengwei <sugar-2008@163.com>
Historical assignee (pre-Yoga):
* shenxinxin <shenxinxin@inspur.com>
* suzhengwei <suzhengwei@inspur.com>
Work Items
----------
@ -169,13 +179,12 @@ Work Items
* Create the object definition, database schema, updating
engine to handle this.
* Create a new API microversion to get information for all evacuations
and get detailed information about a particular evacuation.
* Create a new API microversion to get information for all vmoves
and get detailed information about a particular vmove.
* Update docs for instance evacuations for host recovery
* Update docs about vm moves for host recovery
* Update python-masakariclient, masakari-dashboard and openstacksdk to
manage instance evacuations for host recovery.
* Update masakari-dashboard and openstacksdk to manage vm moves.
* Add unit and functional tests.