mistral-specs/specs/pike/approved/workflow-error-analysis.rst

..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

=======================
Workflow Error Analysis
=======================

Include the URL of your launchpad blueprint:

https://blueprints.launchpad.net/mistral/+spec/mistral-error-analysis

This specification will outline the need for error analysis command or method
within Mistral.


Problem description
===================

Currently there is not a central way or single command which can be issued to
determine the root cause of an error that has occurred upon failure of a
mistral workflow. The proposed functionality would give the developer and
operator a method which can help debug errors which may stem from syntax errors
within the workbook or reveal actual bugs, by reporting the necessary
information from the execution to the client.


Use Cases
---------

The main uses for this feature would involve post workflow runs which involve
but not limited to OpenStack post deployment and workflow run investigation.


Proposed change
===============

Provide a command line interface and public API which the operator can use to
trigger the analysis of errors.

The table below is a draft example and subject to change once reviews are
complete.

* 'mistral report-generate <workflow id>'

+-------------------------------------------------------------------+
|Field                 |   Value                                    |
+======================+============================================+
|Workflow_name         | my_workflow                                |
+----------------------+--------------------------------------------+
|Workflow_ID           | xxxxx-xxxx-xxx-xxxxxxx                     |
+----------------------+--------------------------------------------+
|Workflow_State        | [Error | Success ]                         |
+----------------------+--------------------------------------------+
|\**Workflow_State_info| \***<task_name: cause>                     |
+----------------------+--------------------------------------------+
|Task_name             | my_task                                    |
+----------------------+--------------------------------------------+
|Task_ID               | xxxxx-xxxx-xxx-xxxxxxx                     |
+----------------------+--------------------------------------------+
|Task_State            | [Error | Success]                          |
+----------------------+--------------------------------------------+
|Task_State_info       | <cause>                                    |
+-------------------------------------------------------------------+

* 'mistral report-generate --include-trace <workflow id>'

+---------------------------------------------------------------------+
|Field                   |   Value                                    |
+========================+============================================+
|Workflow_name           | my_workflow                                |
+------------------------+--------------------------------------------+
|Workflow_ID             | xxxxx-xxxx-xxx-xxxxxxx                     |
+------------------------+--------------------------------------------+
|Workflow_State          | [Error | Success ]                         |
+------------------------+--------------------------------------------+
|\**Workflow_State_info  | \***<task_name: cause>                     |
+------------------------+--------------------------------------------+
|Task_name               | my_task                                    |
+------------------------+--------------------------------------------+
|Task_ID                 | xxxxx-xxxx-xxx-xxxxxxx                     |
+------------------------+--------------------------------------------+
|Task_State              | [Error | Success]                          |
+------------------------+--------------------------------------------+
|Task_State_info         | <cause>                                    |
+------------------------+--------------------------------------------+
|\****Workflow_traceback |    my_workflow ERROR                       |
|                        |      task_2 ERROR                          |
|                        |        workflow: my_other_workflow         |
|                        |          task_b: Error                     |
|                        |            action: somethingbroken         |
+---------------------------------------------------------------------+

\** State info would report <None> in the case where no error is generated.

\*** Task name and cause, the cause would be evaluated from an enum value.

\**** Workflow traceback would report a more verbose output of errors this
output could be controlled with a cli switch --include-trace. Without the
flag, the operator would just receive the enum value with a brief description.

example:
 * E101 -- task <task name> contains syntax error
 * E120 -- task <task name> missing input
 * E201 -- action failed to complete


Alternatives
------------

The current method of determining a error would involve looking through the
workflow execution id list to determine what is in an error state.

* 'mistral task-list <workflow execution id>' and see what are in ERROR
* for each failed task execution run:
   - 'mistral action-execution-list' and see what are in ERROR
* for each failed action run:
   - 'mistral action-execution-get-output <id>' to see the description of the
     error
* for each failed task execution of type Workflow, find the sub-workflow
  execution ID, and go back to the first bullet.

Data model impact
-----------------

None.

REST API impact
---------------

This is still in discussion.

* A separate REST API endpoint to build reports on the current status of
  execution and/or error analysis

End user impact
---------------

The end user would have a newly documented method/function to call to start the
error analysis.


Performance Impact
------------------

If this is implemented on the server side the performance impact should be
greatly reduced as the need for ReST calls would be drastically reduced.

Deployer impact
---------------

This would provide additional information to help the operator correct errors
in the deployment, or it will provide enough information which can be attached
to a bug report to help development correct the offending source.


Implementation
==============

Assignee(s)
-----------

Primary assignee:
  toure

Other contributors:
  rakhmerov

Work Items
----------

* Create new Mistral engine error analysis functionality.
* Update python-mistralclient to include new API changes.
* Update documentation to explain usage.
* Create CI scripts/jobs to mimic error in workflows.


Dependencies
============

None.

Testing
=======

Functional tests that imitate workflow failures and make sure that we
get the right report.


References
==========

None.