Workflow Error Analysis

Workflow error analysis specification to add new functionality to mistral. Change-Id: I64a64cc421e87eb6f92787314b7f05e3c7ab94b1
2017-03-08 11:15:05 -05:00 · 2017-03-08 11:15:05 -05:00 · 9c49c01a41
parent 352669e570
commit 9c49c01a41
1 changed files with 197 additions and 0 deletions
--- a/specs/pike/approved/workflow-error-analysis.rst
+++ b/specs/pike/approved/workflow-error-analysis.rst
@ -0,0 +1,197 @@
+..
+ This work is licensed under a Creative Commons Attribution 3.0 Unported
+ License.
+
+ http://creativecommons.org/licenses/by/3.0/legalcode
+
+=======================
+Workflow Error Analysis
+=======================
+
+Include the URL of your launchpad blueprint:
+
+https://blueprints.launchpad.net/mistral/+spec/mistral-error-analysis
+
+This specification will outline the need for error analysis command or method
+within Mistral.
+
+
+Problem description
+===================
+
+Currently there is not a central way or single command which can be issued to
+determine the root cause of an error that has occurred upon failure of a
+mistral workflow. The proposed functionality would give the developer and
+operator a method which can help debug errors which may stem from syntax errors
+within the workbook or reveal actual bugs, by reporting the necessary
+information from the execution to the client.
+
+
+Use Cases
+---------
+
+The main uses for this feature would involve post workflow runs which involve
+but not limited to OpenStack post deployment and workflow run investigation.
+
+
+Proposed change
+===============
+
+Provide a command line interface and public API which the operator can use to
+trigger the analysis of errors.
+
+The table below is a draft example and subject to change once reviews are
+complete.
+
+* 'mistral report-generate <workflow id>'
+
+-------------------------------------------------------------------+
+|Field                 |   Value                                    |
+======================+============================================+
+|Workflow_name         | my_workflow                                |
+----------------------+--------------------------------------------+
+|Workflow_ID           | xxxxx-xxxx-xxx-xxxxxxx                     |
+----------------------+--------------------------------------------+
+|Workflow_State        | [Error | Success ]                         |
+----------------------+--------------------------------------------+
+|\**Workflow_State_info| \***<task_name: cause>                     |
+----------------------+--------------------------------------------+
+|Task_name             | my_task                                    |
+----------------------+--------------------------------------------+
+|Task_ID               | xxxxx-xxxx-xxx-xxxxxxx                     |
+----------------------+--------------------------------------------+
+|Task_State            | [Error | Success]                          |
+----------------------+--------------------------------------------+
+|Task_State_info       | <cause>                                    |
+-------------------------------------------------------------------+
+
+* 'mistral report-generate --include-trace <workflow id>'
+
+---------------------------------------------------------------------+
+|Field                   |   Value                                    |
+========================+============================================+
+|Workflow_name           | my_workflow                                |
+------------------------+--------------------------------------------+
+|Workflow_ID             | xxxxx-xxxx-xxx-xxxxxxx                     |
+------------------------+--------------------------------------------+
+|Workflow_State          | [Error | Success ]                         |
+------------------------+--------------------------------------------+
+|\**Workflow_State_info  | \***<task_name: cause>                     |
+------------------------+--------------------------------------------+
+|Task_name               | my_task                                    |
+------------------------+--------------------------------------------+
+|Task_ID                 | xxxxx-xxxx-xxx-xxxxxxx                     |
+------------------------+--------------------------------------------+
+|Task_State              | [Error | Success]                          |
+------------------------+--------------------------------------------+
+|Task_State_info         | <cause>                                    |
+------------------------+--------------------------------------------+
+|\****Workflow_traceback |    my_workflow ERROR                       |
+|                        |      task_2 ERROR                          |
+|                        |        workflow: my_other_workflow         |
+|                        |          task_b: Error                     |
+|                        |            action: somethingbroken         |
+---------------------------------------------------------------------+
+
+\** State info would report <None> in the case where no error is generated.
+
+\*** Task name and cause, the cause would be evaluated from an enum value.
+
+\**** Workflow traceback would report a more verbose output of errors this
+output could be controlled with a cli switch --include-trace. Without the
+flag, the operator would just receive the enum value with a brief description.
+
+example:
+ * E101 -- task <task name> contains syntax error
+ * E120 -- task <task name> missing input
+ * E201 -- action failed to complete
+
+
+
+
+Alternatives
+------------
+
+The current method of determining a error would involve looking through the
+workflow execution id list to determine what is in an error state.
+
+* 'mistral task-list <workflow execution id>' and see what are in ERROR
+* for each failed task execution run:
+   - 'mistral action-execution-list' and see what are in ERROR
+* for each failed action run:
+   - 'mistral action-execution-get-output <id>' to see the description of the
+     error
+* for each failed task execution of type Workflow, find the sub-workflow
+  execution ID, and go back to the first bullet.
+
+Data model impact
+-----------------
+
+None.
+
+REST API impact
+---------------
+
+This is still in discussion.
+
+* A separate REST API endpoint to build reports on the current status of
+  execution and/or error analysis
+
+End user impact
+---------------
+
+The end user would have a newly documented method/function to call to start the
+error analysis.
+
+
+Performance Impact
+------------------
+
+If this is implemented on the server side the performance impact should be
+greatly reduced as the need for ReST calls would be drastically reduced.
+
+Deployer impact
+---------------
+
+This would provide additional information to help the operator correct errors
+in the deployment, or it will provide enough information which can be attached
+to a bug report to help development correct the offending source.
+
+
+Implementation
+==============
+
+Assignee(s)
+-----------
+
+Primary assignee:
+  toure
+
+Other contributors:
+  rakhmerov
+
+Work Items
+----------
+
+* Create new Mistral engine error analysis functionality.
+* Update python-mistralclient to include new API changes.
+* Update documentation to explain usage.
+* Create CI scripts/jobs to mimic error in workflows.
+
+
+Dependencies
+============
+
+None.
+
+Testing
+=======
+
+Functional tests that imitate workflow failures and make sure that we
+get the right report.
+
+
+References
+==========
+
+None.