Commit Graph

18 Commits

Author SHA1 Message Date
suzhengwei 23c6c31409 vm moves for host notification
This feature persists the VM move information of a host notification
in the database. It also provides a new 'VMove' API, which gives users
insight into the progress or result of the host recovery workflow:
which VMs were evacuated successfully, which failed, and which are
still evacuating.

Implements: BP vm-evacuations-for-host-recovery
Change-Id: I334af06ef526bf11dfe5030e8cba210b98f1ceea
2023-01-28 11:01:24 +00:00
Mitya_Eremeev c861437b52 Set "disabled reason" for compute service.
Masakari never set the reason why a compute service was disabled.
A "disabled reason" was added to the config.

Closes-Bug: 1936181
Change-Id: I998f7884195b93927773c7186d61c13670a53662
2021-08-17 13:11:53 +03:00
suzhengwei fe88eae9cb add enabled to segment
Sometimes operators want to temporarily turn off the instance-HA function.
This patch adds 'enabled' to the segment. If the segment's 'enabled' value
is set to False, all notifications for this segment will be ignored
and no recovery methods will execute.

Change-Id: I561a2519626fa1beae1e3033a6de510cea8f3fac
Implements: BP enable-to-segment
2021-03-03 10:42:35 +08:00
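
The behaviour described in this commit can be illustrated with a minimal sketch. The `Segment` class and the `handle_notification` function are hypothetical names chosen for illustration and are not taken from the Masakari code base; only the 'enabled' flag and the "ignore when False" rule come from the commit message.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical stand-in for a failover segment record.
    name: str
    enabled: bool = True


def handle_notification(segment: Segment, payload: dict) -> str:
    """Decide what to do with a notification for the given segment."""
    # Notifications for a disabled segment are ignored, so no
    # recovery method runs for them.
    if not segment.enabled:
        return "ignored"
    # Placeholder for dispatching the real recovery workflow.
    return "recovering"
```

Under these assumptions, a notification for `Segment("s1", enabled=False)` is ignored, while one for a segment left at the default is passed on to recovery.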
Radosław Piliszek 4397088da7 Add ha_enabled_instance_metadata_key config option
Previously Masakari enforced a common instance metadata key that
controlled how the HA protection applied to the instance:
"HA_Enabled".
Moreover, it was the same whether considering the instance or host
failure.
There are cases where users (operators) would like to independently
configure it per failure class and in general customize its name.

This patch implements the requested behaviour.
New tests are included.

Change-Id: I9dddf784da422f239ec304a2c0b8f24625914fc5
Implements: blueprint customisable-ha-enabled-instance-metadata-key
2020-08-16 10:15:00 +02:00
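
A rough sketch of what a configurable metadata key could look like follows. The function name, the two per-failure-class key values, and the rule that the string "true" (case-insensitive) means enabled are assumptions made for illustration; only the default key name "HA_Enabled" comes from the commit message.

```python
DEFAULT_KEY = "HA_Enabled"  # key name quoted in the commit message


def is_ha_enabled(metadata: dict, key: str = DEFAULT_KEY) -> bool:
    """Return True when the instance metadata opts in under `key`.

    Treating the case-insensitive string "true" as enabled is an
    assumption of this sketch.
    """
    return str(metadata.get(key, "")).lower() == "true"


# Hypothetical per-failure-class key names, as an operator might set them.
INSTANCE_FAILURE_KEY = "HA_On_Instance_Failure"
HOST_FAILURE_KEY = "HA_On_Host_Failure"

metadata = {"HA_On_Host_Failure": "True"}
protect_on_host_failure = is_ha_enabled(metadata, HOST_FAILURE_KEY)
protect_on_instance_failure = is_ha_enabled(metadata, INSTANCE_FAILURE_KEY)
```

With separate keys, the same instance can be protected against host failure while being left alone on instance failure.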
Shilpa 895899f11b Check if host belongs to a segment
If you query a host passing an existing failover segment but not
the one that is assigned to the host, it still returns the host
successfully. In this case, it should fail with 404 error.

This patch checks if the host belongs to the segment that is
present in the URI. If not, it will return 404 error.

Change-Id: I16256cc2a01696a1d54cb9326aed17b723b87727
Closes-Bug: #1854323
2020-05-20 11:25:35 +05:30
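
The check described above can be sketched as follows. The in-memory `hosts` mapping, the `segment_uuid` field name, and the `HostNotFound` exception are hypothetical; the sketch only shows the rule that a host is returned when it belongs to the segment named in the URI and a 404 is raised otherwise.

```python
class HostNotFound(Exception):
    """Hypothetical exception that an API layer would map to HTTP 404."""

    status_code = 404


def get_host(hosts: dict, segment_uuid: str, host_uuid: str) -> dict:
    """Return the host only if it belongs to the requested segment."""
    host = hosts.get(host_uuid)
    # An existing host under a different segment is treated exactly
    # like a missing host.
    if host is None or host["segment_uuid"] != segment_uuid:
        raise HostNotFound(f"host {host_uuid} not found in segment {segment_uuid}")
    return host
```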
shilpa f2343ef09c Handle KeyError: 'progress_details'
If a user tries to get notification details before the masakari-engine
taskflow driver adds progress details for the first task of the recovery
workflow, the request fails with KeyError: 'progress_details'.

Fixed the issue by checking whether progress_details is available in the
meta. If it is not present, getting notification details returns an empty
list of recovery_workflow_details.

Change-Id: I79af240d7715718b424695253ee452dc9552607d
Closes-Bug: #1819422
2019-03-13 14:50:00 +05:30
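
As a minimal sketch of this kind of guard, assuming the task meta is a plain dictionary (the function name is hypothetical):

```python
def recovery_workflow_details(meta: dict) -> list:
    """Return the recorded progress details, or an empty list if none exist yet."""
    # Indexing meta["progress_details"] directly raises KeyError when the
    # first task has not recorded any progress; .get() avoids that.
    return list(meta.get("progress_details", []))
```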
Kengo Takahara c3d12e37b0 Recover resized instance(power_state=SHUTDOWN) with vm_state=stopped
An instance with vm_state=resized and power_state=SHUTDOWN
should not be recovered with vm_state=error. It should be recovered
with vm_state=stopped. The same applies to instances such as paused or
suspended ones (those with a vm_state other than active, stopped or error).

This patch changes masakari-engine so that it recovers these instances
with vm_state=stopped.

Change-Id: Ibdea705f69d139834c9b2294ae11bc9344c259bd
Closes-Bug: #1782518
2019-02-25 05:30:38 +00:00
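
One possible reading of the state rules in this commit, combined with the 2017-08-08 commit's note about resized instances, is sketched below. The function name is hypothetical, and the rule that a resized instance with a power_state other than SHUTDOWN comes back as active is taken from the older commit message and may not match the current code.

```python
SHUTDOWN = 4  # power_state value given in the 2017-08-08 commit message


def target_vm_state(vm_state: str, power_state: int) -> str:
    """Pick the vm_state an instance should have after recovery."""
    # active, stopped and error instances keep their original state.
    if vm_state in ("active", "stopped", "error"):
        return vm_state
    # Assumption: a resized instance that was running comes back active.
    if vm_state == "resized" and power_state != SHUTDOWN:
        return "active"
    # Resized with SHUTDOWN, paused, suspended and the like are stopped.
    return "stopped"
```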
binhong.hua fc3c689912 change nova.services.disable use service_uuid
nova_api_version was bumped to 2.53 in bug/1800073.
With this version, nova-client only takes service_uuid to disable a
service; the host_name and binary arguments are no longer supported.

Change-Id: I6ab942f657f8983a22c9e16747090399c01fc3f8
Closes-bug: 1811742
2019-01-16 17:42:50 +08:00
Kengo Takahara d621267402 Evacuates instances which task_state is not none
This patch adds an implementation so that Masakari can evacuate
instances whose task_state is not None.
After an instance is evacuated, it is recovered with its original
vm_state. So if the instance's vm_state was 'stopped', it is
recovered as 'stopped', and if it was 'error', it is recovered
as 'error'.

Change-Id: I7af8552de0ee77b948a071b7f787514a81ccebc3
Closes-Bug: #1721742
2017-12-11 18:44:21 +09:00
dineshbhor 4173aaf039 Make provision to evacuate all instances
Until now, the host failure workflow evacuated only instances with a
vm_state of active, stopped, error or resized. It ignored
other vm_states such as shelved, rescued, paused and suspended. Made
provision to evacuate instances in vm_states such as
shelved, rescued, paused and suspended by changing their vm_state to
error; after evacuation, those instances are stopped.

NOTE:
On master, an instance in the error or resized state became active
after recovery. With this patch, error instances
will be stopped and then set to error after recovery. For a resized
instance, if its previous power_state is 4 (SHUTDOWN), then
before the failure the instance was in the stopped state and was then
resized, so Masakari will stop that instance to maintain
consistency of instance states, as the instance was not fully
resized (the resize operation was not confirmed). A resized instance that
was in the active state before the failure will become active again after
recovery.

Closes-Bug: #1693731
Closes-Bug: #1692435
Closes-Bug: #1690995
Closes-Bug: #1690768
Change-Id: I134e8b6ee7315935bd8ce418ef6241be0b9450b3
2017-08-08 16:51:25 +05:30
Dinesh Bhor 212d254da1 Remove 'on_shared_storage' parameter from nova evacuate
Starting with version 2.14, Nova automatically detects whether the
server data is on shared storage or not.

Removed the 'on_shared_storage' parameter from the nova evacuate call and
bumped the nova api version from 2.9 to 2.14 so that a shared storage
deployment can be detected by nova. Also added a related note to
README.rst pointing out that operators should configure shared storage
to use Masakari; otherwise instance data will be lost after evacuation.

Change-Id: I0b0581a5c84143fc91c9fc6e2c440096013c7438
2017-07-21 05:44:49 +00:00
dineshbhor 25d33d2cb1 Fix race condition between evacuation and its confirmation
Masakari can face a race condition where, after an instance is
evacuated to another host, a user performs some action on that
instance, which gives the wrong instance vm_state to
ConfirmEvacuationTask and results in notification failure.

To fix this issue, this patch proposes to lock the instance before
evacuation until its confirmation so that a normal user will not
be able to perform any actions on it. To achieve this, the
ConfirmEvacuationTask is completely removed and the confirmation is
done per instance in the EvacuateInstancesTask itself.
Evacuating an instance and confirming its evacuation immediately
can reduce performance, so this patch uses
eventlet.greenpool.GreenPool, which executes the complete evacuation
and confirmation of an instance in a separate thread.
To check whether the server is already locked, novaclient's
NOVA_API_VERSION is upgraded from 2.1 to 2.9, as the 'locked'
property is available in nova api_version 2.9 and above.

This patch introduces a new config option
'host_failure_recovery_threads' which will be the number of threads
to be used for evacuating and confirming the instances evacuation.
The default value for this config option is 3.

Closes-Bug: #1693728
Change-Id: Ib5145878633fd424bca5bcbd5cfed13d20362f94
2017-06-23 13:08:03 +05:30
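
The per-instance "lock, evacuate, confirm" pattern run on a bounded pool can be sketched as below. This is not the Masakari implementation: the commit uses eventlet.greenpool.GreenPool, while this sketch swaps in the standard library's ThreadPoolExecutor so that it runs without extra dependencies. The function names are hypothetical, the steps are placeholders rather than Nova calls, and the final "unlock" step is an assumption, since the commit only says the instance stays locked until confirmation.

```python
from concurrent.futures import ThreadPoolExecutor

# Mirrors the default of 'host_failure_recovery_threads' from the commit message.
HOST_FAILURE_RECOVERY_THREADS = 3


def evacuate_and_confirm(instance_id: str) -> tuple:
    """Run every recovery step for one instance in a single worker."""
    steps = []
    steps.append("lock")      # block user actions during recovery
    steps.append("evacuate")  # placeholder for the evacuate call
    steps.append("confirm")   # confirmation happens in the same worker
    steps.append("unlock")    # assumption: lock is released afterwards
    return instance_id, steps


def recover(instance_ids: list) -> dict:
    """Evacuate and confirm instances with a bounded number of workers."""
    with ThreadPoolExecutor(max_workers=HOST_FAILURE_RECOVERY_THREADS) as pool:
        return dict(pool.map(evacuate_and_confirm, instance_ids))
```

Keeping evacuation and confirmation in one worker means no other task can observe the instance between the two steps, which is the point of removing the separate confirmation task.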
Kengo Takahara 7415951c46 Prevent 404 error when adding reserved_host to aggregate
When a host failure occurs, masakari-engine adds the reserved_host
to an aggregate.
However, when masakari-engine adds the reserved_host,
it passes an aggregate_name to novaclient.
This patch changes masakari-engine so that it passes
aggregate_id instead of aggregate_name to novaclient.

Change-Id: I669b19dea04c8ebb3a27a8ae746ae4c3f88d66f0
Closes-Bug: #1667246
2017-02-27 17:00:32 +09:00
Dinesh Bhor d45f754cbb Add reserved_host to failed_host's aggregate
Reserved hosts can be shared between multiple host_aggregates. So
before evacuating the instances from the failed_host to a reserved_host,
the target reserved_host should be added to the same aggregate that
the failed_host is in.

This patch adds the reserved_host to failed_host's aggregate.
Adding reserved_host to aggregate is optional and can be configured
by operators with the help of new configuration parameter
'add_reserved_host_to_aggregate' which is added under the 'host_failure'
section. This config option defaults to 'False'.

Change-Id: I7478e0f24ecd6fd6385dd67e7f0cad5ca3460526
2017-02-16 11:57:40 +05:30
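
A small sketch of the optional behaviour follows. The aggregate dictionaries and the function name are hypothetical and no Nova client is involved; the sketch only shows that nothing happens unless 'add_reserved_host_to_aggregate' is enabled, and that the aggregate is chosen by looking for the failed host. It returns aggregate ids rather than names, in line with the later 2017-02-27 fix.

```python
def aggregates_to_join(aggregates: list, failed_host: str, reserved_host: str,
                       add_reserved_host_to_aggregate: bool = False) -> list:
    """Return ids of aggregates the reserved host should be added to."""
    # The option defaults to False, so by default nothing is changed.
    if not add_reserved_host_to_aggregate:
        return []
    return [
        agg["id"]
        for agg in aggregates
        # Only aggregates that contain the failed host and do not
        # already contain the reserved host are relevant.
        if failed_host in agg["hosts"] and reserved_host not in agg["hosts"]
    ]
```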
dineshbhor 0968920087 Add unit tests for notification business rules
Change-Id: I5f50e56c4500224fd82b51ac4ec6999636b502fe
2017-01-13 16:05:25 +05:30
Abhishek Kekane 51adf721a8 Add unit tests for process failure flow
Change-Id: Iba6e33b4db7a76e9342f30d664486a5accf75866
2017-01-10 12:13:01 +05:30
Abhishek Kekane f2a0ad343a Add unit tests for instance failure flow
Change-Id: I45b4a4ffa2ecb9f7cf56a3c5d16d3708fe584fc0
2016-11-30 13:42:40 +05:30
Abhishek Kekane 02a925465b Add unit tests for host failure workflow
Change-Id: If39e88223caa5a512ee1fc0e7da9173b48dac981
2016-11-30 13:30:34 +05:30