From 9c49c01a418f6347903b4d7ab096008a79d9fecb Mon Sep 17 00:00:00 2001
From: Toure Dunnon <toure@redhat.com>
Date: Wed, 8 Mar 2017 11:15:05 -0500
Subject: [PATCH] Workflow Error Analysis

Workflow error analysis specification to add new functionality
to mistral.

Change-Id: I64a64cc421e87eb6f92787314b7f05e3c7ab94b1
---
 .../pike/approved/workflow-error-analysis.rst | 197 ++++++++++++++++++
 1 file changed, 197 insertions(+)
 create mode 100644 specs/pike/approved/workflow-error-analysis.rst
diff --git a/specs/pike/approved/workflow-error-analysis.rst b/specs/pike/approved/workflow-error-analysis.rst
new file mode 100644
index 0000000..4c04250
--- /dev/null
+++ b/specs/pike/approved/workflow-error-analysis.rst
@@ -0,0 +1,197 @@
+..
+ This work is licensed under a Creative Commons Attribution 3.0 Unported
+ License.
+
+ http://creativecommons.org/licenses/by/3.0/legalcode
+
+=======================
+Workflow Error Analysis
+=======================
+
+Include the URL of your launchpad blueprint:
+
+https://blueprints.launchpad.net/mistral/+spec/mistral-error-analysis
+
+This specification will outline the need for error analysis command or method
+within Mistral.
+
+
+Problem description
+===================
+
+Currently there is not a central way or single command which can be issued to
+determine the root cause of an error that has occurred upon failure of a
+mistral workflow. The proposed functionality would give the developer and
+operator a method which can help debug errors which may stem from syntax errors
+within the workbook or reveal actual bugs, by reporting the necessary
+information from the execution to the client.
+
+
+Use Cases
+---------
+
+The main uses for this feature would involve post workflow runs which involve
+but not limited to OpenStack post deployment and workflow run investigation.
+
+
+Proposed change
+===============
+
+Provide a command line interface and public API which the operator can use to
+trigger the analysis of errors.
+
+The table below is a draft example and subject to change once reviews are
+complete.
+
+* 'mistral report-generate <workflow id>'
+
++-------------------------------------------------------------------+
+|Field                 |   Value                                    |
++======================+============================================+
+|Workflow_name         | my_workflow                                |
++----------------------+--------------------------------------------+
+|Workflow_ID           | xxxxx-xxxx-xxx-xxxxxxx                     |
++----------------------+--------------------------------------------+
+|Workflow_State        | [Error | Success ]                         |
++----------------------+--------------------------------------------+
+|\**Workflow_State_info| \***<task_name: cause>                     |
++----------------------+--------------------------------------------+
+|Task_name             | my_task                                    |
++----------------------+--------------------------------------------+
+|Task_ID               | xxxxx-xxxx-xxx-xxxxxxx                     |
++----------------------+--------------------------------------------+
+|Task_State            | [Error | Success]                          |
++----------------------+--------------------------------------------+
+|Task_State_info       | <cause>                                    |
++-------------------------------------------------------------------+
+
+* 'mistral report-generate --include-trace <workflow id>'
+
++---------------------------------------------------------------------+
+|Field                   |   Value                                    |
++========================+============================================+
+|Workflow_name           | my_workflow                                |
++------------------------+--------------------------------------------+
+|Workflow_ID             | xxxxx-xxxx-xxx-xxxxxxx                     |
++------------------------+--------------------------------------------+
+|Workflow_State          | [Error | Success ]                         |
++------------------------+--------------------------------------------+
+|\**Workflow_State_info  | \***<task_name: cause>                     |
++------------------------+--------------------------------------------+
+|Task_name               | my_task                                    |
++------------------------+--------------------------------------------+
+|Task_ID                 | xxxxx-xxxx-xxx-xxxxxxx                     |
++------------------------+--------------------------------------------+
+|Task_State              | [Error | Success]                          |
++------------------------+--------------------------------------------+
+|Task_State_info         | <cause>                                    |
++------------------------+--------------------------------------------+
+|\****Workflow_traceback |    my_workflow ERROR                       |
+|                        |      task_2 ERROR                          |
+|                        |        workflow: my_other_workflow         |
+|                        |          task_b: Error                     |
+|                        |            action: somethingbroken         |
++---------------------------------------------------------------------+
+
+\** State info would report <None> in the case where no error is generated.
+
+\*** Task name and cause, the cause would be evaluated from an enum value.
+
+\**** Workflow traceback would report a more verbose output of errors this
+output could be controlled with a cli switch --include-trace. Without the
+flag, the operator would just receive the enum value with a brief description.
+
+example:
+ * E101 -- task <task name> contains syntax error
+ * E120 -- task <task name> missing input
+ * E201 -- action failed to complete
+
+
+
+
+Alternatives
+------------
+
+The current method of determining a error would involve looking through the
+workflow execution id list to determine what is in an error state.
+
+* 'mistral task-list <workflow execution id>' and see what are in ERROR
+* for each failed task execution run:
+   - 'mistral action-execution-list' and see what are in ERROR
+* for each failed action run:
+   - 'mistral action-execution-get-output <id>' to see the description of the
+     error
+* for each failed task execution of type Workflow, find the sub-workflow
+  execution ID, and go back to the first bullet.
+
+Data model impact
+-----------------
+
+None.
+
+REST API impact
+---------------
+
+This is still in discussion.
+
+* A separate REST API endpoint to build reports on the current status of
+  execution and/or error analysis
+
+End user impact
+---------------
+
+The end user would have a newly documented method/function to call to start the
+error analysis.
+
+
+Performance Impact
+------------------
+
+If this is implemented on the server side the performance impact should be
+greatly reduced as the need for ReST calls would be drastically reduced.
+
+Deployer impact
+---------------
+
+This would provide additional information to help the operator correct errors
+in the deployment, or it will provide enough information which can be attached
+to a bug report to help development correct the offending source.
+
+
+Implementation
+==============
+
+Assignee(s)
+-----------
+
+Primary assignee:
+  toure
+
+Other contributors:
+  rakhmerov
+
+Work Items
+----------
+
+* Create new Mistral engine error analysis functionality.
+* Update python-mistralclient to include new API changes.
+* Update documentation to explain usage.
+* Create CI scripts/jobs to mimic error in workflows.
+
+
+Dependencies
+============
+
+None.
+
+Testing
+=======
+
+Functional tests that imitate workflow failures and make sure that we
+get the right report.
+
+
+References
+==========
+
+None.