From 53c0119d083a49e50ca5eca769ef2d57ad7dc8a9 Mon Sep 17 00:00:00 2001
From: suzhengwei
Date: Wed, 30 Nov 2022 15:58:34 +0800
Subject: [PATCH] vm moves for host notification

The original design saves instance evacuation information for host
failure notifications. Later features will add instance migration
information for host restored notifications. It is more compatible to
use a vm move (vmove) object, which includes a 'type' field to show
whether it is an evacuation or a migration.

Blueprint: vm-evacuations-for-host-recovery
Change-Id: Ic0d1283bebbc562cfd20e004cb9b6ed309d0fd28
---
 .../vm-evacuations-for-host-recovery.rst      | 143 ++++++++++--------
 1 file changed, 76 insertions(+), 67 deletions(-)

diff --git a/specs/xena/approved/vm-evacuations-for-host-recovery.rst b/specs/xena/approved/vm-evacuations-for-host-recovery.rst
index 6bb1282..b72ca9c 100644
--- a/specs/xena/approved/vm-evacuations-for-host-recovery.rst
+++ b/specs/xena/approved/vm-evacuations-for-host-recovery.rst
@@ -14,40 +14,33 @@ https://blueprints.launchpad.net/masakari/+spec/vm-evacuations-for-host-recovery
 Problem description
 ===================
 
-If one compute node failed, Masakari will evacuate the instances
-from the failed host.
+If a compute node fails, Masakari will evacuate the instances from
+the failed host.
 
-Generally, the resources of computing nodes are gradually reduced.
-If a large number of hosts fail at the same time and there are not
-enough computing node resources, the operator needs to set the
-priority of instance evacuation in advance to ensure that the
-evacuation can be carried out in the order of priority.
+If a large number of hosts fail at the same time, the resources of
+compute nodes are dramatically reduced. There may not be enough
+resources for all instances to recover. So it is reasonable that the
+most important instances are evacuated first, and that evacuations
+can be aborted once the cloud environment encounters an irreversible
+condition.

-When the failed compute node recovers, in order to make full
-use of the computing node resources and due to some instances
-needing to run on a specific computing node, the operator wants
-to migrate the instance back to the original node.
+When the failed hosts come back, the restored resources may be lying
+idle. To make full use of the restored resources, instances need to
+be moved to the restored hosts. Sometimes instances are distributed
+on purpose, and vm moves made automatically (such as by DRS) or
+manually could mess up that distribution. So it is a good idea to
+save the evacuations when the host fails, and to move instances back
+according to those evacuations when the host is restored.
 
 Proposed change
 ===============
 
-This spec is mainly to record instance evacuation information in
-the database, provide two interfaces to support obtaining all
-evacuation information lists and specific evacuation information
-details, and prepare relevant information for supporting the
-migration of the instances back to previously failed hosts.
-
-* Record instance evacuation information in the database, mainly
-  including instance_id, notification_id, source_host, dest_host,
-  status. The status is pending.
-* User can get evacuation information about a specific masakari
-  notification by ``GET /notifications//evacuations``
-  API.
-* User can get detailed information about a specific evacuation record
-  of a particular masakari notification by
-  ``GET /notifications//evacuations/``
-  API.
+This spec mainly records vm move information in the database,
+including instance_uuid, notification_uuid, source_host, dest_host,
+type, status, start_time and end_time. Users can get the vm move
+information of a 'COMPUTE_HOST' type notification through the
+vmove API.
 
 Alternatives
 ------------
@@ -57,64 +50,85 @@ None
 Data model impact
 -----------------
 
-The table ``evacuation`` will be added into the Masakari database.
+The table ``vmoves`` will be added to the Masakari database.
 
 * created_at: Datetime.
 * updated_at: Datetime.
 * deleted_at: Datetime.
 * deleted: Boolean.
-* uuid: UUID. uuid of evacuation record.
-* notification_uuid: UUID. uuid of notification.
-* instance_uuid: UUID. uuid of instance.
-* source_host_name: String. The source compute node before the instance
-  evacuated.
-* dest_host_name: String. The destination compute node after the instance is
-  evacuated.
-* status: String. Represents possible statuses for notifications, such as
-  pending, ongoing, ignored, failed and succeeded.
-* status_details: String. Store the details reason of evacuate failed/ignored.
-* priority: Numeric. Set the evacuation priority and support the
-  evacuation of instances in order. The default value is 1.
+* uuid: UUID. UUID of the vmove.
+* notification_uuid: UUID. UUID of the notification the vmove belongs to.
+* instance_uuid: UUID. UUID of the instance.
+* instance_name: String. Name of the instance.
+* source_host: String. Source host name of the vmove.
+* dest_host: String. Destination host name of the vmove.
+* start_time: Datetime. Start time of the vmove.
+* end_time: Datetime. End time of the vmove.
+* type: String. Type of the vmove, such as
+  migration, live_migration or evacuation.
+* status: String. Status of the vmove, such as
+  pending, ongoing, ignored, failed or succeeded.
+* message: String. Meaningful information displayed if the vmove
+  failed or was ignored.
 
 REST API impact
 ---------------
 
-Following changes will be introduced in a new API micro-version.
-* GET /notifications//evacuations +* GET /notifications//vmoves response example:: { - "evacuations": [ + "vmoves": [ { "uuid": "239f95ca-fd46-44d2-8ff8-35e8a9c94f69", "instance_uuid": "33826ebd-af0f-445d-833f-e06340f7ae1c", + "instance_name": "vm-1", "notification_uuid": "c0fa1a39-c150-4b86-ae97-8fae31700c67", - "source_host_name": "node01", - "dest_host_name": "node02", - "status": "pending", - "status_details": "", - "priority": "1" + "source_host": "node01", + "dest_host": "node02", + "start_time": "2022-11-22 14:50:22", + "end_time": "2022-11-22 14:50:35", + "type": "evacuation", + "status": "succeeded", + "message": null + }, + { + "uuid": "65a5da84-5819-4aea-8278-a28d2b489028", + "instance_uuid": "e1a5a45b-f251-47cf-9c5f-fa1e66e1286a", + "instance_name": "vm-2", + "notification_uuid": "c0fa1a39-c150-4b86-ae97-8fae31700c67", + "source_host": "node01", + "dest_host": "node02", + "start_time": "2022-11-22 14:50:23", + "end_time": "2022-11-22 14:50:38", + "type": "evacuation", + "status": "succeeded", + "message": null } ] } -* GET /notifications//evacuations/ +* GET /notifications//vmoves/ response example:: { - "evacuation": + "vmove": { "uuid": "239f95ca-fd46-44d2-8ff8-35e8a9c94f69", "instance_uuid": "33826ebd-af0f-445d-833f-e06340f7ae1c", + "instance_name": "vm-1", "notification_uuid": "c0fa1a39-c150-4b86-ae97-8fae31700c67", - "source_host_name": "node01", - "dest_host_name": "node02", - "status": "pending", - "status_details": "", - "priority": "1" + "source_host": "node01", + "dest_host": "node02", + "start_time": "2022-11-22 14:50:22", + "end_time": "2022-11-22 14:50:38", + "type": "evacuation", + "status": "succeeded", + "message": null } } @@ -131,8 +145,8 @@ None Other end user impact --------------------- -The python-masakariclient, masakari-dashboard and openstacksdk will be updated -to support instance evacuations for host recovery in a new micro-version. 
+The masakari-dashboard and openstacksdk will be updated to support
+vm moves for host type notifications in a new micro-version.
 
 Performance Impact
 ------------------
@@ -157,11 +171,7 @@ Assignee(s)
 
 Primary assignee:
 
-* suzhengwei
-
-Historical assignee (pre-Yoga):
-
-* shenxinxin
+* suzhengwei
 
 Work Items
 ----------
@@ -169,13 +179,12 @@ Work Items
 * Create the object definition, database schema, updating engine to handle
   this.
 
-* Create a new API microversion to get information for all evacuations
-  and get detailed information about a particular evacuation.
+* Create a new API microversion to get information for all vmoves
+  and get detailed information about a particular vmove.
 
-* Update docs for instance evacuations for host recovery
+* Update docs about vm moves for host recovery
 
-* Update python-masakariclient, masakari-dashboard and openstacksdk to
-  manage instance evacuations for host recovery.
+* Update masakari-dashboard and openstacksdk to manage vm moves.
 
 * Add unit and functional tests.

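
The vmove list response described in the spec above could be consumed on the client side roughly as sketched here. This is a minimal Python illustration and not part of the patch: the ``VMove`` dataclass and the ``parse_vmoves`` and ``by_status`` helpers are hypothetical names, the nullable time fields are an assumption, and the sample body is copied from the first response example.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VMove:
    """Hypothetical client-side view of one vmove record from the spec."""
    uuid: str
    instance_uuid: str
    instance_name: str
    notification_uuid: str
    source_host: str
    dest_host: str
    start_time: Optional[str]   # Optional is an assumption, not from the spec
    end_time: Optional[str]     # Optional is an assumption, not from the spec
    type: str                   # e.g. migration, live_migration or evacuation
    status: str                 # e.g. pending, ongoing, ignored, failed, succeeded
    message: Optional[str]      # null in the spec's example for succeeded moves


def parse_vmoves(body: str) -> List[VMove]:
    """Parse the JSON body of the vmove list response into VMove objects."""
    return [VMove(**item) for item in json.loads(body)["vmoves"]]


def by_status(vmoves: List[VMove], status: str) -> List[VMove]:
    """Return only the vmoves whose status equals the given value."""
    return [v for v in vmoves if v.status == status]


# Sample body copied from the list response example in the spec.
SAMPLE_BODY = """
{
    "vmoves": [
        {
            "uuid": "239f95ca-fd46-44d2-8ff8-35e8a9c94f69",
            "instance_uuid": "33826ebd-af0f-445d-833f-e06340f7ae1c",
            "instance_name": "vm-1",
            "notification_uuid": "c0fa1a39-c150-4b86-ae97-8fae31700c67",
            "source_host": "node01",
            "dest_host": "node02",
            "start_time": "2022-11-22 14:50:22",
            "end_time": "2022-11-22 14:50:35",
            "type": "evacuation",
            "status": "succeeded",
            "message": null
        },
        {
            "uuid": "65a5da84-5819-4aea-8278-a28d2b489028",
            "instance_uuid": "e1a5a45b-f251-47cf-9c5f-fa1e66e1286a",
            "instance_name": "vm-2",
            "notification_uuid": "c0fa1a39-c150-4b86-ae97-8fae31700c67",
            "source_host": "node01",
            "dest_host": "node02",
            "start_time": "2022-11-22 14:50:23",
            "end_time": "2022-11-22 14:50:38",
            "type": "evacuation",
            "status": "succeeded",
            "message": null
        }
    ]
}
"""

if __name__ == "__main__":
    moves = parse_vmoves(SAMPLE_BODY)
    for move in by_status(moves, "succeeded"):
        print(move.instance_name, move.source_host, "->", move.dest_host)
```

Keeping ``type`` and ``status`` as plain strings mirrors the String columns in the data model. A real client might prefer enumerations once the set of allowed values is fixed.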