Commit Graph

41 Commits

Author SHA1 Message Date
Q.hongtao 4bc6162515 Remove six library
Remove the six library and replace the following items with Python 3 style code:
- six.integer_types
- six.itervalues
- six.text_type
- six.string_types
- six.StringIO
- six.next
- six.b
- six.PY3

Change-Id: I299c90d5cbeb41be0132691265b8dcbeae65520e
2020-09-23 10:27:12 +08:00
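For reference, a rough sketch of the usual Python 3 equivalents of the six helpers listed in this commit. This is not the actual diff; the function and variable names are made up for illustration.

```python
import io


# six.integer_types -> int; six.string_types / six.text_type -> str
def describe(value):
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, str):
        return "text"
    return "other"


# six.itervalues(d) -> d.values()
totals = {"a": 1, "b": 2}
total = sum(totals.values())

# six.next(it) -> next(it)
first = next(iter([10, 20, 30]))

# six.b("data") -> b"data"
payload = b"data"

# six.StringIO -> io.StringIO
buffer = io.StringIO()
buffer.write("hello")
text = buffer.getvalue()

# Branches guarded by six.PY3 can simply keep the Python 3 path.
```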
Renat Akhmerov 7dec19ae19 Fix calculating task execution result for "with-items"
* The logic of calculating a task result in case of "with-items" was
  overcomplicated and broke encapsulation of a "with-items" task.
  This patch makes it simpler, so that the method doesn't need to
  peek into the internals of a "with-items" task (e.g. runtime_context).

Change-Id: I036193cbae15d7f3c3414b123525ceafa91fdeb1
2020-06-02 16:28:42 +07:00
Renat Akhmerov ddf9577785 Refactor task policies
* The purpose of this patch is to improve encapsulation of task
  execution state management. We already have the class Task
  (engine.tasks.Task) that represents an engine task and it is
  supposed to be responsible for everything related to managing
  persistent state of the corresponding task execution object.
  However, we break this encapsulation in many places and various
  modules manipulate task execution state directly. This
  leads to what is called "spaghetti code" because important
  things are often spread out across the system and are hard to
  maintain. It also leads to a lot of duplication. So this patch
  refactors policies so that they manipulate a task execution
  through an instance of Task, which hides low-level aspects.

Change-Id: Ie728bf950c4244db3fec0f3dadd5e195ad42081d
2020-06-01 14:05:49 +07:00
Zuul 1c7e242975 Merge "Reformat rerun logic for tasks with join" 2019-11-11 14:42:37 +00:00
Oleg Ovcharuk 4e926a1f13 Fail-on policy
The fail-on policy makes it possible to fail successful tasks by condition.
It is useful when a task has to fail because its result is unacceptable,
and it makes the workflow definition more readable.

Change-Id: I57b4f3d1533982d3b9b7063925f8d70f044aefea
Implements: blueprint fail-on-policy
Signed-off-by: Oleg Ovcharuk <vgvoleg@gmail.com>
2019-08-11 07:21:57 +00:00
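A minimal sketch of the idea behind fail-on, assuming the condition is available as a plain Python predicate over the task result. The state constants and the function name are hypothetical and do not come from the Mistral code base.

```python
SUCCESS = "SUCCESS"
ERROR = "ERROR"


def apply_fail_on(state, result, fail_on):
    """Return the final task state after applying a fail-on condition.

    Only a successful task can be turned into a failed one; any other
    state is returned unchanged.
    """
    if state == SUCCESS and fail_on(result):
        return ERROR
    return state


# A task that succeeded technically but produced an unacceptable result.
final_state = apply_fail_on(SUCCESS, {"status": 500}, lambda r: r["status"] >= 400)
```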
Oleg Ovcharuk bdbfb82301 Reformat rerun logic for tasks with join
Change-Id: I055bc2d5a4bdf839f1e262e49563616d8deff92f
Closes-Bug: #1833262
Signed-off-by: Oleg Ovcharuk <vgvoleg@gmail.com>
2019-07-11 17:11:13 +03:00
Renat Akhmerov 43d23c0e25 Create needed infrastructure to switch scheduler implementations
* After this patch we can switch scheduler implementations in the
  configuration. All functionality related to scheduling jobs is
  now expressed via the internal API classes Scheduler and
  SchedulerJob. Patch also adds another entry point into setup.cfg
  where we can register a new scheduler implementation.
* The new scheduler (which is now called DefaultScheduler) still
  should be considered experimental and requires a lot of testing
  and optimisations.
* Fixed and refactored "with-items" tests. Before the patch they
  were breaking the "black box" testing principle and relied
  on either pure implementation details or volatile data (e.g.
  checks of the internal 'capacity' property)
* Fixed all other relevant tests.

Change-Id: I340f886615d416a1db08e4516f825d200f76860d
2019-06-24 11:25:57 +03:00
Oleg Ovcharuk 475b82c532 Delete delayed calls for deleted entities
Delayed calls for nonexistent entities should not fail; they should do
nothing and be deleted in the normal way.

Change-Id: I1b818d671468b95ce8ae06416b57fd4a22cc6eb2
Signed-off-by: Oleg Ovcharuk <vgvoleg@gmail.com>
2019-06-10 11:14:31 +03:00
Oleg Ovcharuk 88e5af4148 Reformat retry logic for tasks with join
Change-Id: Ie31f08a20265a59bcaa63dd6480834eb6918f349
Signed-off-by: Oleg Ovcharuk <vgvoleg@gmail.com>
2019-04-26 16:21:49 +03:00
Oleg Ovcharuk 99ebc1b5f7 Retries shouldn't execute if join task failed because of child task
Change-Id: Ideaa9938497f74335af633044cb6e98fbb1522d8
Closes-Bug: #1819418
Signed-off-by: Oleg Ovcharuk <vgvoleg@gmail.com>
2019-03-11 22:29:08 +03:00
Renat Akhmerov 80a1bed67b Simplify workflow and join completion logic
* action_queue module is replaced with the more generic
  post_tx_queue module that allows registering operations that must
  run after the main DB transaction associated with processing a
  workflow event such as completing an action.
* Instead of calling the workflow completion check from all places
  where a task may possibly complete, Mistral now registers a
  post-transactional operation that runs after the main DB transaction
  (to guarantee at least one consistent DB read) right
  inside the task completion logic. This reduces clutter significantly.
* Workflow completion check is now registered only if the just
  completed task may lead to workflow completion, i.e. if it's the
  last one in a workflow branch.
* Join now checks delayed calls to reduce the number of join
  completion checks created with the scheduler and also uses the
  post-transactional queue for that.

Closes-Bug: #1801872
Change-Id: I90741d4121c48c42606dfa850cfe824557b095d0
2018-11-09 14:17:20 +07:00
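The post-transactional queue described in this commit can be pictured roughly as follows. This is a simplified sketch, not the post_tx_queue module itself: the function names are hypothetical and the "main DB transaction" is modelled as a plain callable.

```python
import threading

_local = threading.local()


def _ops():
    # One queue per thread, created lazily.
    if not hasattr(_local, "ops"):
        _local.ops = []
    return _local.ops


def register_post_tx(operation):
    """Remember an operation to run after the main transaction."""
    _ops().append(operation)


def run_with_post_tx(main_tx):
    """Run main_tx, then run whatever it registered, in order."""
    _ops().clear()
    try:
        main_tx()  # stands in for the main DB transaction
    except Exception:
        _ops().clear()  # nothing runs if the main transaction failed
        raise
    pending = list(_ops())
    _ops().clear()
    for op in pending:
        op()


events = []


def complete_task():
    events.append("main transaction")
    register_post_tx(lambda: events.append("check workflow completion"))


run_with_post_tx(complete_task)
```

The completion check is registered from inside the task completion logic but only executes once the main transaction is over, which is the property the commit message relies on.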
Renat Akhmerov 3d7acd3957 Improve workflow completion logic by removing periodic jobs
* The workflow completion algorithm uses periodic scheduled jobs to
  poll the DB and determine when a workflow is finished. The problem
  with this approach is that if Mistral runs the next iteration
  of such a job too soon, these jobs create a heavy
  load on the system. If it runs too late, a workflow may stay in the
  RUNNING state for too long after all its tasks are completed.
  The current implementation tries to predict the delay after which
  the next job should run, based on the number of incomplete tasks.
  This approach was initially taken because we switched to a
  non-blocking transactional model (previously we locked the entire
  workflow execution graph in order to change a state of anything)
  and in this architecture, when we have parallel branches, i.e.
  parallel DB transactions, we can't make a consistent read from
  the DB from any of these transactions to make a reliable decision
  about whether the workflow is completed or not. Using periodic
  jobs was a solution. However, this approach has proven
  unreliable because such a prediction of the delay before the
  next job iteration doesn't work well across the variety of use cases
  that we have.
  This patch removes periodic jobs in favor of the
  "two transactions" approach: in the first transaction we
  handle the action completion event (and task completion if it causes
  it), and in the second transaction, if a task is completed, we
  check whether the workflow is completed. This approach guarantees
  that at least one of the "second" transactions in parallel
  branches will make the needed consistent read from the DB (i.e. will
  see the actual state of all needed objects) to make the right
  decision.

Closes-Bug: #1799382
Change-Id: I2333507503b3b8226c184beb0bd783e1dcfa397f
2018-11-07 04:00:04 +00:00
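The guarantee claimed for the "two transactions" approach can be checked on a toy model. The sketch below assumes two parallel branches and treats every transaction as an atomic step that sees all previously committed steps; that is a simplification of real DB isolation, and all names are illustrative.

```python
from itertools import permutations


def run(order):
    """Replay one interleaving of the four transactions."""
    tasks = {"a": "RUNNING", "b": "RUNNING"}
    workflow = {"state": "RUNNING"}
    for branch, step in order:
        if step == 1:
            # First transaction: complete the task of this branch.
            tasks[branch] = "SUCCESS"
        elif all(state == "SUCCESS" for state in tasks.values()):
            # Second transaction: check workflow completion.
            workflow["state"] = "SUCCESS"
    return workflow["state"]


steps = [("a", 1), ("a", 2), ("b", 1), ("b", 2)]

# Keep only interleavings where each branch runs its first
# transaction before its second one.
valid_orders = [
    order
    for order in permutations(steps)
    if order.index(("a", 1)) < order.index(("a", 2))
    and order.index(("b", 1)) < order.index(("b", 2))
]

results = {run(order) for order in valid_orders}
```

In every valid interleaving the last step is one of the "second" transactions, and by then both tasks are completed, so the workflow is always detected as finished.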
Vitalii Solodilov 78b542c4c5 Refresh a number of retry a task when task was rerun
Change-Id: If0a8219bb54ee0d01084dbaf5c9ed5b2041c2bc4
Closes-Bug: #1772265
Signed-off-by: Vitalii Solodilov <mcdkr@yandex.ru>
2018-06-24 05:34:33 +00:00
Renat Akhmerov 6b7b58ed6c Add '__task_execution' structure to task execution context on the fly
* Previously we stored the data structure describing the current
  task execution (id and name) in the inbound task execution context
  directly so that it'd be saved to DB. This was needed to evaluate
  YAQL/Jinja function task() without parameters properly. However,
  this is not needed: we can just build a context view on the fly
  just before evaluating an expression.

Change-Id: If523039446ab3e2ccc9542617de2a170168f6e20
Closes-Bug: #1764704
2018-04-17 18:13:35 +07:00
Renat Akhmerov 9726189c43 Fix 'pause' engine command
* Commands going after 'pause' in 'on-XXX' clauses
  were never processed after workflow resume. The
  solution is to introduce a notion of a workflow
  execution backlog where we can save these commands
  in a serialized form so that the engine dispatcher
  could see and process them after resume.
* Other minor changes

Change-Id: I963b5660daf528d1caf6a785311de4fb272cafd0
Closes-Bug: #1714054
2018-03-24 11:10:08 +00:00
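A rough sketch of the backlog idea from this commit, assuming commands are plain dictionaries and the workflow execution is a dictionary with a "backlog" field. None of these names are taken from the engine code.

```python
import json


def dispatch(wf_ex, commands, run):
    """Run commands in order; on 'pause', stash the rest in the backlog."""
    for index, cmd in enumerate(commands):
        if cmd["name"] == "pause":
            wf_ex["state"] = "PAUSED"
            # Serialize whatever follows 'pause' so it survives until resume.
            wf_ex["backlog"] = json.dumps(commands[index + 1:])
            return
        run(cmd)


def resume(wf_ex, run):
    """Resume the workflow and process the commands saved in the backlog."""
    wf_ex["state"] = "RUNNING"
    backlog = json.loads(wf_ex.pop("backlog", "[]"))
    dispatch(wf_ex, backlog, run)


executed = []
wf_ex = {"state": "RUNNING"}
commands = [{"name": "task1"}, {"name": "pause"}, {"name": "task2"}]

dispatch(wf_ex, commands, lambda cmd: executed.append(cmd["name"]))
executed_before_resume = list(executed)

resume(wf_ex, lambda cmd: executed.append(cmd["name"]))
```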
Vitalii Solodilov e8d6c382c5 Correction of comments for the #539039 review
Change-Id: Id76c20d2da20c362ca94727e5c5dea2e19ed6b6d
Signed-off-by: Vitalii Solodilov <mcdkr@yandex.ru>
2018-02-12 04:09:12 +04:00
Vitalii Solodilov b79f91e9ec Propagated a task timeout to an action execution
It should be possible to specify a timeout for Mistral actions in order
to cancel long-running actions and thus provide a predictable
execution time for the client service.
Currently Mistral allows configuring a timeout on a task and automatically
changes the task status to error. However, Mistral doesn't interrupt the
action execution.
We need Mistral to terminate a timed-out action execution, because otherwise
the following issues may arise:
* several executions of the same action can run at the same time, breaking
data consistency
* stale action executions may lead to massive resource
consumption (memory, CPU, etc.)

Change-Id: I2a960110663627a54b8150917fd01eec68e8933d
Signed-off-by: Vitalii Solodilov <mcdkr@yandex.ru>
2018-01-31 17:40:52 +04:00
Vitalii Solodilov 2ffbc412c4 Fix break_on calculation in before_task_start
RetryPolicy: prevent break_on from being evaluated before task execution.
Sometimes expressions in break_on require the task execution to exist
(see the example in the updated test). But if break_on is evaluated before
the first execution of the task, it may end with an exception.

Change-Id: Ia836c0330dbed62954d79059df1bef3758f7c5e5
Signed-off-by: Anton Kazakov <ton.kazakov@gmail.com>
Signed-off-by: Vitalii Solodilov <mcdkr@yandex.ru>
2018-01-23 13:21:17 +00:00
Andras Kovi 7184596443 Gracefully handle DB disconnected connect errors
When the DB is disconnected, the Mistral API should retry the
operation for a predefined amount of time, at least for GET
requests, as this error is highly likely to be caused
by temporary failures. The handling of Operational errors
was already implemented.

Change-Id: I3adb94dd695aeaa40d37956beae088d5618422c3
2017-12-28 16:50:19 +07:00
Mike Fedosin 4283998694 Fix inconsistencies when setting policy values
This patch fixes inconsistencies between two ways of setting policy
values: as a variable and directly in workflow definition as a constant.

Inconsistency #1:
For policies 'wait-before', 'wait-after', 'retry', 'timeout', 'concurrency'
there is a difference in how they are executed if the value is 0.
If the value is hardcoded in workflow, the policies are omitted [1],
but if a user defines them as a variable, then the policies are applied.

Inconsistency #2:
Policy values in workflow definitions cannot be negative numbers (validated
by schema) [2], but if a user sets them as variables it's okay [3]. It
happens because the schemas are different for both cases.

[1] https://github.com/openstack/mistral/blob/master/mistral/engine/policies.py#L83
[2] https://github.com/openstack/mistral/blob/master/mistral/lang/v2/policies.py#L27
[3] https://github.com/openstack/mistral/blob/master/mistral/engine/policies.py#L161

Change-Id: I660ec2fe00e9f524292957560548447e517332fc
Closes-bug: #1731100
2017-12-13 11:13:12 +00:00
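One way to avoid both inconsistencies is to route every policy value through the same check, regardless of whether it was a constant or the result of an expression. The sketch below assumes that zero disables a policy in both cases; which direction the actual patch chose is not stated in this message, and the function names are hypothetical.

```python
def validate_policy_value(name, value):
    """Apply one rule to a policy value, whatever its origin."""
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError("%s must be an integer, got %r" % (name, value))
    if value < 0:
        raise ValueError("%s must not be negative, got %d" % (name, value))
    return value


def policy_enabled(name, value):
    # Assumption for this sketch: zero means the policy is omitted.
    return validate_policy_value(name, value) > 0


# A hardcoded constant and a value coming from a variable behave the same.
constant_case = policy_enabled("wait-before", 0)
variable_case = policy_enabled("wait-before", int("0"))

try:
    validate_policy_value("timeout", -5)
    negative_rejected = False
except ValueError:
    negative_rejected = True
```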
Renat Akhmerov 397a562788 Fix deletion of delayed calls
* Deletion of delayed calls is incorrect. A list of delayed calls
  gets deleted within one DB transaction and if at least one object
  is not deleted because of a DBDeadlock exception (on MySQL) then
  the entire transaction fails and, what's more important, the
  exception is swallowed by the try-finally block without reraising
  it so that it could be handled by the "retry_on_deadlock" decorator.
  This patch fixes this problem by reraising the initial exception.
* Added "retry_on_deadlock" decorator to all methods that
  open DB transactions and where we have a risk of hitting a deadlock.

Change-Id: I816c8c2a940e38cf1698d76e1019671249238598
2017-10-23 13:56:57 +07:00
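To illustrate why reraising matters, here is a small sketch of a retrying decorator together with a function that cleans up and then reraises. The decorator below is a stand-in written for this example (including its max_attempts parameter), not Mistral's "retry_on_deadlock", and DeadlockError is a hypothetical substitute for the DB driver's exception.

```python
import functools


class DeadlockError(Exception):
    """Stand-in for a DB deadlock exception (hypothetical)."""


def retrying_on_deadlock(max_attempts=3):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except DeadlockError:
                    # Give up only after the last allowed attempt.
                    if attempt == max_attempts:
                        raise
        return wrapper
    return decorator


attempts = []


@retrying_on_deadlock(max_attempts=3)
def delete_calls():
    attempts.append(1)
    try:
        if len(attempts) < 3:
            raise DeadlockError("simulated deadlock")
        return "deleted"
    except DeadlockError:
        # Clean-up would go here. Reraising is essential: if the error
        # were swallowed, the decorator would never see it and never retry.
        raise


result = delete_calls()
```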
Winson Chan 16b54d8766 Cascade pause from pause-before in subworkflows
When a workflow is paused by pause-before, the state will cascade down
to other subworkflows and up to parent workflow.

Change-Id: Ied178fe08f8308455bf05b3168635a3b69799cec
Closes-Bug: #1700196
2017-08-07 21:02:15 +00:00
Jenkins cbe881ef7e Merge "Updated the retries_remain statement" 2016-11-29 09:07:34 +00:00
Sharat Sharma 2842415f09 Updated the retries_remain statement
If a task specifies the number of retries as 1, it is
not retried on error. So this patch changes the retries_remain
condition to treat 1 as a valid retry count.

Change-Id: Ib0ede7a119bb57108141e50722928d53dd904d5f
Closes-Bug: #1631140
2016-11-23 11:21:26 +00:00
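The actual expression in policies.py is not shown in this log, so both functions below are hypothetical. They only illustrate the kind of off-by-one that makes a retry count of 1 behave like no retry at all.

```python
def retries_remain_buggy(retries_done, retry_count):
    # Hypothetical off-by-one: with retry_count == 1 this is never true.
    return retries_done + 1 < retry_count


def retries_remain_fixed(retries_done, retry_count):
    # A count of 1 allows exactly one retry after the first failure.
    return retries_done < retry_count
```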
Winson Chan ab2c23acc3 Add cancelled state to action executions
Allow action executions to be cancelled, specifically for async actions, and
handle the cancellation for task and with-items task appropriately. For
with-items tasks, if one of the action executions is cancelled, then the
task is cancelled. Previously, if there was a mix of errors and cancellations,
the task was marked with error. But this led to on-complete being processed,
which it shouldn't be, since the with-items task is incomplete because it was
partially cancelled.

Change-Id: Iafc2263735f75fe06ae5f03a885cda8f965a7cc4
Implements: blueprint mistral-cancel-state
2016-11-16 19:34:04 +07:00
Renat Akhmerov 6b229d360a Run actions without Scheduler
Change-Id: If6bc0d62851f0ef73ce0c56f770883ddfd4a40bf
Implements: blueprint mistral-run-actions-without-scheduler
2016-11-03 17:29:30 +07:00
Renat Akhmerov a4287a5e63 Avoid storing workflow input in task inbound context
* 'in_context' field of task executions changed its semantics to
  not store workflow input and other data stored in the initial
  workflow context such as openstack security context and workflow
  variables, therefore task executions occupy less space in DB
* Introduced ContextView class to avoid having to merge
  dictionaries every time we need to evaluate YAQL functions
  against some context. This class is a composite structure
  built on top of regular dictionaries that provides priority
  based lookup algorithm over these dictionaries. For example,
  if we need to evaluate an expression against a task inbound
  context we just need to build a context view including
  task 'in_context', workflow initial context (wf_ex.context)
  and workflow input dictionary (wf_ex.input). Using this
  class is a significant performance boost
* Fixed unit tests
* Other minor changes

Change-Id: I7fe90533e260e7d78818b69a087fb5175b9d5199
2016-09-21 13:32:44 +03:00
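The priority-based lookup described for ContextView resembles what the standard library's collections.ChainMap does. The sketch below uses ChainMap as a stand-in to show the idea; ContextView is Mistral's own class and its interface may differ. The dictionary contents are made up.

```python
from collections import ChainMap

# Highest priority first: task inbound context, then the initial
# workflow context, then the workflow input.
task_in_context = {"x": 1}
wf_context = {"x": 0, "env": "prod"}
wf_input = {"name": "demo", "env": "dev"}

# No dictionaries are merged or copied; lookups walk the chain in order.
view = ChainMap(task_in_context, wf_context, wf_input)

x = view["x"]        # found in task_in_context
env = view["env"]    # found in wf_context before wf_input
name = view["name"]  # found only in wf_input
```

Because the view only holds references to the underlying dictionaries, building it is cheap compared to merging them for every expression evaluation.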
Renat Akhmerov c7aa89e03d Splitting executions into different tables
* Having different types of execution objects in different
  tables will give less contention on DB tables and hence better
  performance so DB schema was changed accordingly
* Fixed all unit tests and places in the code where we assumed
  polymorphic access to execution objects
* Other minor fixes

TODO(in upcoming patches):
* DB migration script

Change-Id: Ibc8408e12dd85e143302d7fdddace32954551ac5
2016-08-02 11:47:25 +07:00
Renat Akhmerov 633eb0fe6d Add proper error handling for task continuation
* If a task needs to be continued, e.g. in case of the 'wait-before'
  policy, which inserts a delay into the normal task execution flow (between
  creation of task policy and scheduling actions), possible exceptions
  also need to be handled properly (move task and workflow into ERROR).
  This patch adds error handling and a test to check this.
* Other minor changes related to addressing a few TODO's across engine
  code.

Change-Id: I525f193a149e3b0341aa8d0ffa0858ded96ba94f
2016-07-08 15:08:51 +07:00
Renat Akhmerov 3641b46d15 Remove unnecessary database transaction from Scheduler
Change-Id: I08f0fcd67e0cd0e40e76ed6cfc7bb214096a2c16
Closes-Bug: #1484521
2016-06-01 08:21:39 +00:00
Renat Akhmerov 816bfd9dcc Refactor Mistral Engine
* Introduced class hierarchies Task and Action used by Mistral engine.
  Note: Action here is different from the executor Action and rather
  represents actions of different types: regular python action, ad-hoc
  action and workflow action (since for a task, action and workflow are
  polymorphic)
* Refactored task_handler.py and action_handler.py with Task and Action
  hierarchies
* Rebuilt a chain call so that the entire action processing would look
  like a chain of calls Action -> Task -> Workflow where each level
  knows only about the next level and can influence it (e.g. if adhoc
  action has failed due to YAQL error in 'output' transformer action
  itself fails its task)
* Refactored policies according to new object model
* Fixed some of the tests to match the idea of having two types of
  exceptions, MistralException and MistralError, where the latter
  is considered either a harsh environmental problem or a logical
  issue in the system itself so that it must not be handled anywhere
  in the code

TODO(in subsequent patches):
 * Refactor WithItemsTask w/o using with_items.py
 * Remove DB transaction in Scheduler when making a delayed call,
   helper policy methods like 'continue_workflow'
 * Refactor policies test so that workflow definitions live right
   in test methods
 * Refactor workflow_handler with Workflow abstraction
 * Get rid of RunExistingTask workflow command, it should be just
   one command with various properties
 * Refactor resume and rerun with Task abstraction (same way as
   other methods, e.g. on_action_complete())
 * Add error handling to all required places such as
   task_handler.continue_task()
 * More tests for error handling

P.S. This patch is very big but it was nearly impossible to split
it into multiple smaller patches because of how entangled everything
was in Mistral Engine.

Partially implements: blueprint mistral-engine-error-handling
Implements: blueprint mistral-action-result-processing-pipeline
Implements: blueprint mistral-refactor-task-handler
Closes-Bug: #1568909

Change-Id: I0668e695c60dde31efc690563fc891387d44d6ba
2016-05-31 14:08:36 +00:00
Limor Stotland c4a614273d If task fails on timeout - there is no clear message of failure
* Adding state_info to fail_task_if_incomplete solves it
 * Unskip test TaskDefaultsReverseWorkflowEngineTest#test_task_defaults_timeout_policy

Closes-Bug: #1527976
Change-Id: I1f44f648ea71d2dcf8bdca77e6bcca0023963be0
2015-12-23 13:29:22 +00:00
hparekh cc13c4ecd7 Added Unit test when policy input is variable.
Also code has been changed for py34 compatibility.

Change-Id: I7e7675265807dbcf4637bbb56c927a2da86e7c5d
2015-11-30 08:50:07 +00:00
hparekh 4109f1cc9a Comparison operator has been changed.
When creating a policy, the '>' operator is used,
which in py34 raises an exception when
the variable is provided from an input parameter.

The exception was:
TypeError: unorderable types: str() > int()

TODO: Add more unit tests to catch such scenarios.

Partially-Implements: blueprint mistral-py3
Change-Id: I2c652812ae4a04cd7610f2a6684da76c582a4e32
2015-11-03 10:22:46 +05:30
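The failure is easy to reproduce in any Python 3 interpreter. The sketch below shows the raw comparison failing and one possible guard (explicit conversion); it does not claim to be the change made in this commit, and the function names are illustrative. The exact wording of the TypeError message differs between Python versions.

```python
def compare_raw(value):
    # Works for ints, raises TypeError for str under Python 3.
    return value > 0


def compare_converted(value):
    # Convert first so that "3" coming from workflow input is accepted.
    return int(value) > 0


try:
    compare_raw("3")
    raised = False
except TypeError:
    raised = True
```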
Winson Chan 61ec312160 Remove the transaction scope from task executions API
Since the task execution API get_all method is in a transaction block,
a large number of reads against the task execution API GET method
will lead to unnecessary DB locks that can result in deadlocks and
consequently WF execution failures.

Change-Id: I5a6b7829176178bb6e06768e9d52e94202cf4347
Closes-Bug: #1501433
2015-10-06 17:17:24 +00:00
Renat Akhmerov 3707062d67 Renaming state DELAYED to RUNNING_DELAYED
* As discussed in the mailing list it's better to rename DELAYED
  to RUNNING_DELAYED so that we semantically express it as a substate
  of RUNNING whereas WAITING is not.

Closes-Bug: #1470369
Change-Id: I3b7033d894d29fe755d4d0262c1029c4576421cd
2015-09-25 15:21:24 +06:00
Nikolay Mahotkin 920b25ffc7 Fixing working concurrency when value is YAQL
Closes-Bug: #1487651

Change-Id: I45abd0c064f4e1be9aee86f4488281ecb9830a62
2015-08-27 12:44:20 +00:00
LingxianKong 40e98c4c5e Fix inappropriate condition for retry policy
When retry policy is used without continue-on clause, the retry iteration
will still be scheduled even if the task succeeds.

This patch fixes the problem by returning when the task succeeds with a retry policy.

Change-Id: I9f07ed3565fe7169f2831a435e4e76a49af34f6c
Closes-Bug: #1469330
2015-07-01 10:38:24 +08:00
Nikolay Mahotkin b42a5b86a2 Implementing 'continue-on' retry policy property
Implements blueprint mistral-retry-continue-on

Change-Id: Idf893fdb201a05521ab2503ad756bafda95ae7d5
2015-06-05 19:30:36 +03:00
bhavenst 5da2791126 Allow pause-before to override wait-before policy
* Prevent wait-before from scheduling task for paused workflow

Closes-bug: #1449948

Change-Id: I411d961d0bee7d874e737e05357a7c8c183add51
2015-05-13 12:26:47 -04:00
Renat Akhmerov 235a863c1b Renaming "engine1" to "engine"
Change-Id: I106704c26b4598f9f3dc2ed04213b1f565d00010
2015-04-09 17:47:36 +06:00