Merge "Add alarm-based monitoring driver to Tacker"

2016-07-19 06:27:48 +00:00 · 2016-07-19 06:27:48 +00:00 · d81173f0b3
parent 37ca50bdd5 7ac9ce4e68
commit d81173f0b3
1 changed files with 399 additions and 0 deletions
--- a/specs/newton/alarm-based-monitoring-driver.rst
+++ b/specs/newton/alarm-based-monitoring-driver.rst
@ -0,0 +1,399 @@
+
+===========================================
+Add alarm-based monitoring driver to Tacker
+===========================================
+https://blueprints.launchpad.net/tacker/+spec/alarm-based-monitoring-driver
+
+This spec describes an alarm-based monitoring driver in Tacker
+
+Problem description
+===================
+
+ETSI MANO architecture describes to monitor the VNF to take appropriate action
+such as fault management, performance management. Monitoring became an
+important aspect in MANO architecture.
+Currently, Tacker provides a very minimal support for checking the liveliness
+of VNF elements by means of ping or curl which helps to recover the element
+in case it is unreachable. But Tacker does not support monitoring of
+the CPU/memory usage of VNF elements. Further, it is necessary for Tacker to monitor all
+VNF resources as well. The reason is that the failure of VNFs happen too diversely.
+
+Proposed change
+===============
+
+The scope of this spec focused on:
+
+* designing a generic monitoring framework. Whereby, an alarm-based monitoring driver
+  in Tacker is designed to collect alarms/events triggered by the low-level designs
+  (Ceilometer, Monasca, custom driver). In this spec, the alarm-based monitoring
+  driver can completely monitor any resources in OpenStack that Ceilometer can support.
+  In real implementation, this spec aims to leverage Ceilometer to monitor CPU/memory
+  usage inside VNF.
+
+* defining Monitoring Policy using the TOSCA Policy format. The monitoring policy
+  can apply to a single VDU or multiple VDUs.
+
+* adding support for inserting Ceilometer Alarms into the HOT template to allow
+  Ceilometer to trigger scaling in Heat resource groups.
+
+
+::
+
+    The alarm-based monitoring framework:
+            +-----------------------------------+
+            |                                   |
+            |                                   |
+            |      +-----------------+          |
+            |      | VNFM / TOSCA    |          |
+            |      |                 |          |
+            |      +--------+--------+          |
+            |               |                   |
+            |      +--------v--------+          |
+            |      |                 |          |
+            |      | alarm-framework <-----+    |
+            |      |                 +---+ |    |
+            |      +-+-^-------+-^---+   | |    |
+            |        | |       | |       | |    |
+            | +------v-++  +---v-+-+  +--v-+-+  |
+            | |         |  |       |  |      |  |
+            | |         |  |       |  |      |  |
+            | |Ceilometer  |Monasca|  |Custom|  |
+            | |         |  |       |  |      |  |
+            | |         |  |       |  |      |  |
+            | |         |  |       |  |      |  |
+            | +---------+  +-------+  +------+  |
+            +-----------------------------------+
+
+The TOSCA scheme could be defined as the following:
+
+**tosca.policies.tacker.Monitoring**
+
+.. code-block:: yaml
+
+  tosca.policies.tacker.Monitoring:
+    derived_from: tosca.policies.Monitoring
+    targets:
+      type: list
+      entry_schema:
+        type: string
+        required: true
+      description: List of monitored VDUs
+    triggers:
+      resize_compute:
+        event_type:
+          type: map
+          entry_schema:
+            type: string
+          required: true
+        metrics:
+          type: string
+          required: true
+        condition:
+          type: map
+          entry_schema:
+            type: string
+          required: false
+        action:
+          type: map
+          entry_schema:
+            type: string
+          required: true
+
+
+TOSCA template referred to [1]_ could be modeled with the below details in term of auto-scaling:
+
+.. code-block:: ini
+
+ tosca_definitions_version: tosca_simple_profile_for_nfv_1_0_0
+ description: Demo example
+
+ metadata:
+ template_name: sample-tosca-vnfd
+
+ topology_template:
+ node_templates:
+    vdu1:
+      type: tosca.nodes.nfv.VDU.Tacker
+      capabilities:
+        nfv_compute:
+          properties:
+            disk_size: 1 GB
+            mem_size: 512 MB
+            num_cpus: 2
+      properties:
+        image: cirros-0.3.4-x86_64-uec
+        mgmt_driver: noop
+        availability_zone: nova
+
+    vdu1_cpu_usage_monitoring_policy:
+        type: tosca.policies.tacker.Monitoring
+        targets: [vdu1]
+        triggers:
+            resize_compute:
+                event_type:
+                    type: tosca.events.resource.utilization
+                    implementation: Ceilometer
+                metrics: cpu_util
+                condition: utilization greater_than 70%
+                    threshold: 70
+                    period: 60
+                    evaluations: 1
+                    method: average
+                    comparison: gt
+                action:
+                    resize: vdu1_scaling_policy
+
+In the above template, event type is described in [3]_ and used in [4]_.
+
+alarm_url will be created by webhook in Tacker as the following:
+
+.. code-block:: ini
+
+    v1.0/vnfs/<vnf-uuid>/<monitoring-policy-name>/<action-name>/<params>
+
+Where:
+monitoring-policy is the name of monitoring policy which is described in VNFD.
+
+action-name is the name of action which is described in VNFD as well. Multiple actions
+could be supported in monitoring policy. By changing action-name, the appropriate action
+will be invoked and then the alarm-based monitoring driver will process this action.
+In above example, action-name is 'vdu1_scaling_policy'. Whereby, when the monitoring driver
+receives triggers from Ceilometer, it will invoke scaling action and trigger scaling
+automatically. The detailed scaling mechanism using the monitoring driver is defined by
+the scaling spec [2]_.
+
+params contains the information related to alarm-actions. For example,
+it can be used for user authentication. Whereby, Webhook handler will generate
+randomly a key. This helps to make sure that we have a unique url for each alarm.
+Alarm url will be stored in Tacker db and only these unique callbacks will be
+used. The expression showm below is an example of alarm url which contains user authentication
+
+.. code-block:: ini
+
+    v1.0/vnfs/<vnf-uuid>/<monitoring-policy-name>/<action-name>/2w3r40-34c2d2
+
+Here, monitoring-policy-name is the name of  monitoring policy and threshold is a value
+which user wants to update.
+
+Based on the different types of callbacks, we have the appropriate actions as following:
+
+#1. if action is "Log", the monitoring driver will restore alarms into database.
+We have two options to display these information:
+
+ * Use CLI. The status of alarm could be defined in the existing CLI as the following:
+
+   tacker vnf-show [vnf-id]
+
+ * Modify Tacker-Horizon. Add "Alarms" tab to tacker-horizon where user can know what
+   is happening with VNF. This tab need to have some information like:
+   [VDU-ID]-----[Alarms (CPU, MEMORY, PORT,...)]--- [Status (HIGH, LOW, DELETED,..)].
+
+#2. If action is "Scaling", we can call API to trigger scaling. The detailed scaling
+    mechanism could be found in scaling spec [2]_.
+
+#3. If action is "respawn", this action is the same in case of ping driver.
+
+
+
+In order to translate the monitoring policy into HOT template, we can use heat ceilometer
+resource type. In this approach, Tacker will create OS::Ceilometer::Alarm resource by
+making use of either the same template used for scale-group or separate template.
+
+create a ceilometer resource as below with required alarm criteria:
+
+.. code-block:: ini
+
+    vdu_scale_up_alarm:
+
+        type: OS::Ceilometer::Alarm
+        properties:
+
+          meter_name: cpu_util
+          statistic: avg
+          period: 60
+          evaluation_periods: 1
+          threshold: 50
+          comparison_operator: gt
+          action:
+            - {get_attr: tacker_alarm_url}
+    vdu_scale_down_alarm:
+
+        type: OS::Ceilometer::Alarm
+        properties:
+
+          meter_name: cpu_util
+          statistic: avg
+          period: 600
+          evaluation_periods: 1
+          threshold: 15
+          comparison_operator: lt
+          action:
+             - {get_attr: tacker_alarm_url}
+
+
+Future considerations:
+----------------------
+
+1. Indeed, it is necessary so that the monitoring driver could monitor beyond VDU resources.
+CP resources should be monitored as well. Especially, it is necessary when we have SFC
+in the future. The reason is that each CP will need to assign to a Neutron port.
+SFC is created based on the connection of Neutron ports, therefore port monitoring is
+necessary for high availability in SFC. The below example show port monitoring which
+could be done by the alarm-based monitoring driver:
+
+
+.. code-block:: ini
+
+    tosca_definitions_version: tosca_simple_profile_for_nfv_1_0_0
+
+    description: Demo example
+
+    metadata:
+    template_name: sample-tosca-vnfd
+
+    topology_template:
+    node_templates:
+       VDU1:
+         type: tosca.nodes.nfv.VDU.Tacker
+         properties:
+           image: cirros-0.3.4-x86_64-uec
+           flavor: m1.tiny
+           availability_zone: nova
+           mgmt_driver: noop
+           config: |
+             param0: key1
+             param1: key2
+
+       CP1:
+        type: tosca.nodes.nfv.CP.Tacker
+        properties:
+           management: true
+           anti_spoofing_protection: false
+        requirements:
+           - virtualLink:
+              node: VL1
+           - virtualBinding:
+              node: VDU1
+       CP_monitoring_policy:
+         type: tosca.policies.tacker.Monitoring
+         targets: [CP1]
+          triggers:
+            port_monitoring:
+              event:
+                type: tosca.events.resource.utilization
+                implementation: Ceilometer
+                metrics: port_bandwidth
+                condition: load greater_than 80%
+                period: 60
+                evaluations: 1
+                statistics: average
+                action:
+                 trigger: vnffg1-ha-policy
+
+2. In the future, Tacker users could want to update monitoring parameters like threshold.
+The problem is when VNF instances sustain heavy load and CPU usage reaches to the
+pre-defined threshold value. Alarms will be triggered to Tacker, but actually it not really
+necessary because the VNF instances still have the ability to work well. Tacker users now want
+to increase the threshold value. This could be done as the following:
+
+.. code-block:: ini
+
+    tacker vnf-update --vnf-id <vnf-id> --monitoring-policy-name <monitoring policy>
+                                        --threshold [threshold-value]
+
+NOTE: The threshold need to be be parameterized in the template.
+
+Alternatives
+------------
+
+None
+
+Data model impact
+------------------
+
+None
+
+REST API impact
+------------------
+
+**POST on  /v1.0/vnfs/<vnf-uuid>/<monitoring-policy>/<action-name>/<params>**
+
+
+Security
+------------------
+
+Need security between OpenStack Ceilometer and Tacker [5]_.
+
+Notifications impact
+--------------------
+Ceilometer triggers alarms to the alarm-based monitoring driver in Tacker.
+
+Other end user impact
+---------------------
+
+None
+
+Performance impact
+------------------
+
+None
+
+Other deployer impact
+---------------------
+
+None
+
+Developer impact
+------------------
+
+None
+
+Implementation
+===============
+
+Assignee(s)
+------------------
+
+Primary assignee:
+  Tung Doan <tungdoan@dcn.ssu.ac.kr>
+
+  Kanagaraj Manickam <mkr1481@gmail.com>
+
+Work Items
+------------------
+
+#. Tosca monitoring elment model to Heat ceilometer monitoring element
+   translation
+#. Enable the new convention in vnfd for mentioning to the alarm based
+   monitoring parameters
+#. create a sample TOSCA template
+#. Create a new monitoring driver for alarm based monitoring with configurable
+   parameter to use either of the approach mentioned above.
+#. Enable to log Ceilometer alarms and report to users.
+#. Enhance the horizon to show the live monitoring parameters.
+
+
+Dependencies
+============
+In case we use heat ceilometer to describe the monitoring policy, make sure
+that monitoring strategy is supported by Ceilometer.
+Testing
+========
+
+1. Monitoring in case of high CPU usage
+
+- Create vnfd from the alarm-based VNFD template
+- Create vnf from the vnfd
+- Stress VM which VNF is running on. The purpose is to make CPU usage reach
+  threshold.
+- Use CLI/Horizon to show alarms/events related to VNF VM.
+
+
+Reference
+=========
+
+.. [1] http://docs.oasis-open.org/tosca/tosca-nfv/v1.0/tosca-nfv-v1.0.pdf
+.. [2] https://review.openstack.org/#/c/318577/
+.. [3] https://www.oasis-open.org/committees/download.php/56812/2015-10-27%20OpenStack%20Tokyo%20-%20Senlin-TOSCA%20vBrownBag-final.pdf
+.. [4] https://github.com/openstack/tosca-parser/blob/master/toscaparser/tests/data/policies/tosca_policy_template.yaml#L60
+.. [5] https://github.com/openstack/ceilometer/blob/stable/liberty/ceilometer/alarm/notifier/rest.py#L84