Support Vitrage resources in Heat

The purpose is to automate the auto-healing process that
involves external monitoring, Vitrage alarm deduction and
Mistral workflow execution.

Story: 2002684
Task: 22527

Depends-On: Ie28ba2087c6d87aec57198afe9c328542a4c25ca
Change-Id: If66248e07a662a225799a2bd3fc88a31d1539021
Ifat Afek 2018-06-28 12:37:46 +00:00
parent 8e865c99e9
commit 12921071ec

..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

=========================
Support Vitrage Resources
=========================

https://storyboard.openstack.org/#!/story/2002684

This Blueprint proposes to add support for Vitrage resources in Heat.
The purpose is to automate the auto-healing process that involves external
monitoring, Vitrage alarm deduction and Mistral workflow execution.

Problem description
===================

Auto-healing a Heat stack when an instance is down is extremely important.
This use case is already handled: when Nova sends a notification about the
instance state, Aodh raises an event alarm and, as a result, a Mistral healing
workflow is executed.

However, there are cases where Nova is not aware of the real state of the
instance. One example is a network failure: a NIC that is down can result in
no network connectivity to certain instances, while their state in Nova
remains 'Active'. We would like to support auto-healing in such cases as well.

Proposed change
===============

An OS::Vitrage::Template resource will be added to Heat, under
heat/engine/resources/openstack/vitrage.

Its role will be to create, based on the properties given in the HOT template,
a Vitrage template with a condition->action scenario that handles the healing.

The VitrageTemplate resource will support the following use case:

#. An external monitor detects a network failure.
#. Vitrage is notified and, based on its topology graph, identifies all
   affected resources.
#. If an instance that belongs to a Heat stack is affected, Vitrage executes
   a Mistral healing workflow.
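
Steps 2 and 3 can be illustrated with a plain breadth-first search over a
topology graph. The Python sketch below walks outward from a failed entity to
collect every affected resource, then keeps only the instances that belong to
a Heat stack. The entity names and the ``TOPOLOGY`` and ``STACK_OF``
dictionaries are hypothetical, as are both functions. This is not Vitrage's
actual data model or evaluator, which are considerably richer.

.. code-block:: python

  from collections import deque

  # Hypothetical topology: each entity maps to the entities that depend on it,
  # e.g. a NIC -> the host it belongs to -> the instances running on that host.
  TOPOLOGY = {
      "nic-1": ["compute-1"],
      "compute-1": ["vm-a", "vm-b"],
      "compute-2": ["vm-c"],
  }

  # Hypothetical mapping from instance to the Heat stack that owns it.
  # vm-b is not part of any Heat stack.
  STACK_OF = {"vm-a": "stack-1", "vm-c": "stack-2"}


  def affected_resources(topology, failed):
      """Step 2: return all entities reachable from the failed one (BFS order)."""
      seen = {failed}
      queue = deque([failed])
      affected = []
      while queue:
          node = queue.popleft()
          for child in topology.get(node, []):
              if child not in seen:
                  seen.add(child)
                  affected.append(child)
                  queue.append(child)
      return affected


  def instances_to_heal(topology, stack_of, failed):
      """Step 3: keep only affected instances that belong to a Heat stack."""
      return [(res, stack_of[res])
              for res in affected_resources(topology, failed)
              if res in stack_of]


  print(instances_to_heal(TOPOLOGY, STACK_OF, "nic-1"))
  # [('vm-a', 'stack-1')]

Here the NIC failure reaches both vm-a and vm-b through compute-1. Only vm-a
is selected for healing, because vm-b does not belong to a Heat stack.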

The implementation will be done in three phases.

**Phase 1:** A simple Vitrage template will be created that supports (only)
the following scenario:

  If a specific alarm is raised on a specific instance -> execute the Mistral
  healing workflow

**Phase 2:** Enable the creation of more complex Vitrage templates, like the
one needed for the network failure use case. This will require additional
development in Vitrage, so that it provides "template skeletons" for
different scenarios.

**Phase 3:** Enable referencing a complete Vitrage template that is written in
a separate YAML file. This will allow using all the capabilities that Vitrage
provides.

**VitrageTemplate definition**

.. code-block:: yaml

  resources:
    name:
      type: OS::Vitrage::Template
      properties:
        type: String  # Phase 1 - only 'instance_auto_healing' is supported
        description: String
        input:
          alarm_name: String
          resource_type: String  # Phase 1 - only 'nova.instance' is supported
          resource_id: String
        actions:
          - action:
              type: String  # Phase 1 - only 'execute_mistral' is supported
              properties:
                workflow: String
                workflow_input:
                  {...}

Properties:

- type - Type of the Vitrage template. In phase 1 only a single template type
  will be supported: if there is an alarm on an instance, execute the workflow.
- description - Description of the Vitrage template
- alarm_name - The name of the alarm that should trigger the workflow
  execution. This can be an alarm from an external monitor (like Zabbix,
  Nagios, Collectd, or Prometheus), or a deduced alarm that was raised by
  Vitrage.
- resource_type - Type of the resource that the alarm is raised on. In phase 1
  only 'nova.instance' will be supported.
- resource_id - ID of the resource that the alarm is raised on
- actions - A list of actions to execute as a result. In phase 1 the only
  supported action will be execute_mistral.
- workflow - ID of the Mistral workflow to be executed
- workflow_input - Values to be passed as inputs to the workflow
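
The Phase 1 restrictions listed above are simple enough to check up front.
The Python sketch below validates a properties dictionary shaped like the
VitrageTemplate definition: ``input.alarm_name``, ``input.resource_type``,
``input.resource_id`` and ``actions[].action``. The ``validate_phase1``
function and its error messages are hypothetical. Inside Heat, such
restrictions would more naturally be expressed as property-schema constraints
(for example, allowed values) rather than as a hand-written function.

.. code-block:: python

  # Phase 1 restrictions, taken from the property descriptions in this spec.
  PHASE1_TEMPLATE_TYPES = {"instance_auto_healing"}
  PHASE1_RESOURCE_TYPES = {"nova.instance"}
  PHASE1_ACTION_TYPES = {"execute_mistral"}


  def validate_phase1(props):
      """Hypothetical validator: return error messages, empty if valid."""
      errors = []
      if props.get("type") not in PHASE1_TEMPLATE_TYPES:
          errors.append("unsupported template type: %r" % (props.get("type"),))

      inp = props.get("input") or {}
      for key in ("alarm_name", "resource_type", "resource_id"):
          if not inp.get(key):
              errors.append("missing input.%s" % key)
      rtype = inp.get("resource_type")
      if rtype and rtype not in PHASE1_RESOURCE_TYPES:
          errors.append("unsupported resource_type: %r" % (rtype,))

      actions = props.get("actions") or []
      if not actions:
          errors.append("at least one action is required")
      for i, item in enumerate(actions):
          action = item.get("action") or {}
          if action.get("type") not in PHASE1_ACTION_TYPES:
              errors.append("actions[%d]: unsupported type %r"
                            % (i, action.get("type")))
          if not (action.get("properties") or {}).get("workflow"):
              errors.append("actions[%d]: missing properties.workflow" % i)
      return errors


  # Hypothetical, already-resolved property values.
  props = {
      "type": "instance_auto_healing",
      "input": {
          "alarm_name": "Instance down",
          "resource_type": "nova.instance",
          "resource_id": "server-uuid",
      },
      "actions": [
          {"action": {"type": "execute_mistral",
                      "properties": {"workflow": "autoheal"}}},
      ],
  }
  print(validate_phase1(props))  # []

Returning every error at once, instead of failing on the first one, lets a
template author fix all the problems in a single pass.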

**Phase 1 example**

If there is an 'Instance down' alarm on an instance, execute a Mistral healing
workflow on that instance.

.. code-block:: yaml

  resources:
    execute_healing:
      type: OS::Vitrage::Template
      properties:
        type: instance_auto_healing
        description: Execute Mistral healing workflow if the instance is down
        input:
          alarm:
            name: Instance down
          resource_type: nova.instance
          resource_id: {get_resource: server}
        actions:
          - action:
              type: execute_mistral
              properties:
                workflow: {get_resource: autoheal}
                input:
                  instance_id: {get_resource: server}
                  heat_stack_id: {get_param: "OS::stack_id"}
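
Conceptually, the resource turns a HOT snippet like the one above into a
Vitrage template with a single condition->action scenario. The Python sketch
below builds such a template as a dictionary. It assumes the layout of the
Vitrage template format version 2: metadata, definitions with entities and
relationships, and scenarios. The ``build_instance_auto_healing`` function,
the ``template_id`` labels and the resolved IDs are all hypothetical. The exact
keys should be checked against the Vitrage template format documentation.

.. code-block:: python

  import json


  def build_instance_auto_healing(name, description, alarm_name, resource_id,
                                  workflow, workflow_input):
      """Hypothetical builder for the Phase 1 scenario.

      Assumes the Vitrage template format version 2 layout; the template_id
      labels ('instance_alarm', 'instance', 'alarm_on_instance') are arbitrary.
      """
      return {
          "metadata": {
              "version": 2,
              "type": "standard",
              "name": name,
              "description": description,
          },
          "definitions": {
              "entities": [
                  {"entity": {"category": "ALARM",
                              "name": alarm_name,
                              "template_id": "instance_alarm"}},
                  {"entity": {"category": "RESOURCE",
                              "type": "nova.instance",
                              "id": resource_id,
                              "template_id": "instance"}},
              ],
              "relationships": [
                  # The alarm is raised 'on' the instance.
                  {"relationship": {"source": "instance_alarm",
                                    "target": "instance",
                                    "relationship_type": "on",
                                    "template_id": "alarm_on_instance"}},
              ],
          },
          "scenarios": [
              {"scenario": {
                  "condition": "alarm_on_instance",
                  "actions": [
                      {"action": {"action_type": "execute_mistral",
                                  "properties": {"workflow": workflow,
                                                 "input": workflow_input}}},
                  ],
              }},
          ],
      }


  template = build_instance_auto_healing(
      name="execute_healing",
      description="Execute Mistral healing workflow if the instance is down",
      alarm_name="Instance down",
      resource_id="server-uuid",  # hypothetical resolved {get_resource: server}
      workflow="autoheal",        # hypothetical resolved {get_resource: autoheal}
      workflow_input={"instance_id": "server-uuid",
                      "heat_stack_id": "stack-uuid"},
  )
  print(json.dumps(template, indent=2))

The scenario fires only when an alarm with the given name is attached to the
instance with the given ID, which matches the Phase 1 rule.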

**Phase 2 example**

If there is a 'Host down' alarm on a host, and the host contains the instance
that is defined in this template, execute a Mistral healing workflow on that
instance.

The differences between the first example and this one are:

- The template type: a template of this type will internally include the
  host->instance relation.
- The resource_type.
- An additional 'instance' parameter.

.. code-block:: yaml

  resources:
    execute_healing:
      type: OS::Vitrage::Template
      properties:
        type: host_down_auto_healing
        description: Execute Mistral healing workflow if the instance is down
        input:
          alarm:
            name: Host down
          resource_type: nova.host
          resource_id: 'compute-1'
          instance: {get_resource: server}
        actions:
          - action:
              type: execute_mistral
              properties:
                workflow: {get_resource: autoheal}
                input:
                  instance_id: {get_resource: server}
                  heat_stack_id: {get_param: "OS::stack_id"}
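
A 'host_down_auto_healing' skeleton mainly differs from the Phase 1 template
in its graph pattern. A host entity is added, along with a host->instance
'contains' relationship, and both relationships must hold for the scenario to
fire. The Python sketch below shows that pattern. It assumes the Vitrage
template format version 2 layout, and the function name, ``template_id``
labels and IDs are hypothetical. Since Phase 2 depends on future Vitrage
development, the real skeletons may look quite different.

.. code-block:: python

  import json


  def build_host_down_auto_healing(name, description, alarm_name, host_id,
                                   instance_id, workflow, workflow_input):
      """Hypothetical builder: alarm on host AND host contains instance.

      Assumes the Vitrage template format version 2 layout; template_id
      labels are arbitrary.
      """
      entities = [
          {"entity": {"category": "ALARM", "name": alarm_name,
                      "template_id": "host_alarm"}},
          {"entity": {"category": "RESOURCE", "type": "nova.host",
                      "id": host_id, "template_id": "host"}},
          {"entity": {"category": "RESOURCE", "type": "nova.instance",
                      "id": instance_id, "template_id": "instance"}},
      ]
      relationships = [
          # The alarm is raised on the host ...
          {"relationship": {"source": "host_alarm", "target": "host",
                            "relationship_type": "on",
                            "template_id": "alarm_on_host"}},
          # ... and the host contains the instance defined in the HOT template.
          {"relationship": {"source": "host", "target": "instance",
                            "relationship_type": "contains",
                            "template_id": "host_contains_instance"}},
      ]
      action = {"action": {"action_type": "execute_mistral",
                           "properties": {"workflow": workflow,
                                          "input": workflow_input}}}
      return {
          "metadata": {"version": 2, "type": "standard",
                       "name": name, "description": description},
          "definitions": {"entities": entities,
                          "relationships": relationships},
          "scenarios": [{"scenario": {
              # Both relationships must match for the action to run.
              "condition": "alarm_on_host and host_contains_instance",
              "actions": [action],
          }}],
      }


  template = build_host_down_auto_healing(
      name="execute_healing",
      description="Execute Mistral healing workflow if the host is down",
      alarm_name="Host down",
      host_id="compute-1",
      instance_id="server-uuid",  # hypothetical resolved {get_resource: server}
      workflow="autoheal",        # hypothetical resolved {get_resource: autoheal}
      workflow_input={"instance_id": "server-uuid",
                      "heat_stack_id": "stack-uuid"},
  )
  print(json.dumps(template, indent=2))

Keeping the host->instance relation inside the skeleton means the HOT author
only supplies the host, the instance and the workflow, as in the example above.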

Alternatives
------------

None

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  ifat_afek

Milestones
----------

Target Milestone for completion:
  rocky-3

Work Items
----------

Phase 1:

- Implement a Vitrage client plugin
- Implement the VitrageTemplate resource
- Add unit tests and tempest tests
- Add a HOT template example to heat-templates

Phase 2 (future):

- Create different types of Vitrage templates

Dependencies
============

No dependencies for phase 1.

Phase 2 depends on future Vitrage development.