mistral

Commit Graph

Author	SHA1	Message	Date
Q.hongtao	4bc6162515	Remove six library Remove six-library Replace the following items with Python 3 style code. - six.integer_types - six.itervalues - six.text_type - six.string_types - six.StringIO - six.next - six.b - six.PY3 Change-Id: I299c90d5cbeb41be0132691265b8dcbeae65520e	2020-09-23 10:27:12 +08:00
Renat Akhmerov	7dec19ae19	Fix calculating task execution result for "with-items" * The logic of calculating a task result in case of "with-items" was overcomplicated and broke encapsulation of a "with-items" task. This patch makes it simpler, so that the method doesn't need to peek into the internals of a "with-items" task (e.g. runtime_context). Change-Id: I036193cbae15d7f3c3414b123525ceafa91fdeb1	2020-06-02 16:28:42 +07:00
Renat Akhmerov	ddf9577785	Refactor task policies * The purpose of this patch is to improve encapsulation of task execution state management. We already have the class Task (engine.tasks.Task) that represents an engine task and it is supposed to be responsible for everything related to managing persistent state of the corresponding task execution object. However, we break this encapsulation in many places and various modules manipulate task execution state directly. This fact leads to what is called "spaghetti code" because important things are often spread out across the system and it's hard to maintain. It also leads to lots of duplications. So this patch refactors policies so that they manipulate a task execution through an instance of Task which hides low level aspects. Change-Id: Ie728bf950c4244db3fec0f3dadd5e195ad42081d	2020-06-01 14:05:49 +07:00
Zuul	1c7e242975	Merge "Reformat rerun logic for tasks with join"	2019-11-11 14:42:37 +00:00
Oleg Ovcharuk	4e926a1f13	Fail-on policy Fail-on policy allows to fail success tasks by condition. It is useful in cases we have to fail task if its result is unacceptable and it makes workflow definition more readable. Change-Id: I57b4f3d1533982d3b9b7063925f8d70f044aefea Implements: blueprint fail-on-policy Signed-off-by: Oleg Ovcharuk <vgvoleg@gmail.com>	2019-08-11 07:21:57 +00:00
Oleg Ovcharuk	bdbfb82301	Reformat rerun logic for tasks with join Change-Id: I055bc2d5a4bdf839f1e262e49563616d8deff92f Closes-Bug: #1833262 Signed-off-by: Oleg Ovcharuk <vgvoleg@gmail.com>	2019-07-11 17:11:13 +03:00
Renat Akhmerov	43d23c0e25	Create needed infrastructure to switch scheduler implementations * After this patch we can switch scheduler implementations in the configuration. All functionality related to scheduling jobs is now expressed via the internal API classes Scheduler and SchedulerJob. Patch also adds another entry point into setup.cfg where we can register a new scheduler implementation. * The new scheduler (which is now called DefaultScheduler) still should be considered experimental and requires a lot of testing and optimisations. * Fixed and refactored "with-items" tests. Before the patch they were breaking the "black box" testing principle and relied on either pure implementation details or volatile data (e.g. checks of the internal 'capacity' property) * Fixed all other relevant tests. Change-Id: I340f886615d416a1db08e4516f825d200f76860d	2019-06-24 11:25:57 +03:00
Oleg Ovcharuk	475b82c532	Delete delayed calls for deleted entities Delayed calls for nonexistent entities should not fail; they should do nothing and be deleted in normal way. Change-Id: I1b818d671468b95ce8ae06416b57fd4a22cc6eb2 Signed-off-by: Oleg Ovcharuk <vgvoleg@gmail.com>	2019-06-10 11:14:31 +03:00
Oleg Ovcharuk	88e5af4148	Reformat retry logic for tasks with join Change-Id: Ie31f08a20265a59bcaa63dd6480834eb6918f349 Signed-off-by: Oleg Ovcharuk <vgvoleg@gmail.com>	2019-04-26 16:21:49 +03:00
Oleg Ovcharuk	99ebc1b5f7	Retries shouldn't execute if join task failed because of child task Change-Id: Ideaa9938497f74335af633044cb6e98fbb1522d8 Closes-Bug: #1819418 Signed-off-by: Oleg Ovcharuk <vgvoleg@gmail.com>	2019-03-11 22:29:08 +03:00
Renat Akhmerov	80a1bed67b	Simplify workflow and join completion logic * action_queue module is replaced with the more generic post_tx_queue module that allows to register operations that must run after the main DB transaction associated with processing a workflow event such as completing action. * Instead of calling workflow completion check from all places where task may possibly complete, Mistral now registers a post transactional operation that runs after the main DB transaction (to make sure at least one needed consistent DB read) right inside the task completion logic. It reduces clutter significantly. * Workflow completion check is now registered only if the just completed task may lead to workflow completion, i.e. if it's the last one in a workflow branch. * Join now checks delayed calls to reduce a number of join completion checks created with scheduler and also uses post transactional queue for that. Closes-Bug: #1801872 Change-Id: I90741d4121c48c42606dfa850cfe824557b095d0	2018-11-09 14:17:20 +07:00
Renat Akhmerov	3d7acd3957	Improve workflow completion logic by removing periodic jobs * Workflow completion algorithm uses periodic scheduled jobs to poll DB and determine when a workflow is finished. The problem with this approach is that if Mistral runs another iteration of such job too soon then running such jobs will create a big load on the system. If too late, then a workflow may be in RUNNING state for too long after all its tasks are completed. The current implementation tries to predict a delay with which the next job should run, based on a number of incomplete tasks. This approach was initially taken because we switched to a non-blocking transactional model (previously we locked the entire workflow execution graph in order to change a state of anything) and in this architecture, when we have parallel branches, i.e. parallel DB transactions, we can't make a consistent read from DB from either of these transactions to make a reliable decision about whether the workflow is completed or not. Using periodic jobs was a solution. However, this approach has been proven to work unreliably because such a prediction about delay before the next job iteration doesn't work well on all variety of use cases that we have. This patch removes using periodic jobs in favor of using the "two transactions" approach when in the first transaction we handle action completion event (and task completion if it causes it) and in the second transaction, if a task is completed, we check if the workflow is completed. This approach guarantees that at least one of the "second" transactions in parallel branches will make needed consistent read from DB (i.e. will see the actual state of all needed objects) to make the right decision. Closes-Bug: #1799382 Change-Id: I2333507503b3b8226c184beb0bd783e1dcfa397f	2018-11-07 04:00:04 +00:00
Vitalii Solodilov	78b542c4c5	Refresh a number of retry a task when task was rerun Change-Id: If0a8219bb54ee0d01084dbaf5c9ed5b2041c2bc4 Closes-Bug: #1772265 Signed-off-by: Vitalii Solodilov <mcdkr@yandex.ru>	2018-06-24 05:34:33 +00:00
Renat Akhmerov	6b7b58ed6c	Add '__task_execution' structure to task execution context on the fly * Previously we stored the data structure describing the current task execution (id and name) in the inbound task execution context directly so that it'd be saved to DB. This was needed to evaluate YAQL/Jinja function task() without parameters properly. However, it's not needed, we can just build a context view on the fly just before evaluating an expression. Change-Id: If523039446ab3e2ccc9542617de2a170168f6e20 Closes-Bug: #1764704	2018-04-17 18:13:35 +07:00
Renat Akhmerov	9726189c43	Fix 'pause' engine command * Commands going after 'pause' in 'on-XXX' clauses were never processed after workflow resume. The solution is to introduce a notion of a workflow execution backlog where we can save these commands in a serialized form so that the engine dispatcher could see and process them after resume. * Other minor changes Change-Id: I963b5660daf528d1caf6a785311de4fb272cafd0 Closes-Bug: #1714054	2018-03-24 11:10:08 +00:00
Vitalii Solodilov	e8d6c382c5	Correction of comments for the #539039 review Change-Id: Id76c20d2da20c362ca94727e5c5dea2e19ed6b6d Signed-off-by: Vitalii Solodilov <mcdkr@yandex.ru>	2018-02-12 04:09:12 +04:00
Vitalii Solodilov	b79f91e9ec	Propagated a task timeout to an action execution It shall be possible to specify a timeout for Mistral actions in order to cancel long-running actions and provide predictable execution time for the client service. Currently Mistral allows configuring a timeout on a task and automatically changes the task status to error. However, Mistral doesn't interrupt the action execution. We need Mistral to terminate timed out action executions, because there might be the following issues: * several identical action executions can run at the same time, breaking data consistency * stale action executions may lead to massive resource consumption (memory, cpu..) Change-Id: I2a960110663627a54b8150917fd01eec68e8933d Signed-off-by: Vitalii Solodilov <mcdkr@yandex.ru>	2018-01-31 17:40:52 +04:00
Vitalii Solodilov	2ffbc412c4	Fix break_on calculation in before_task_start RetryPolicy: prevent break_on from evaluation before task execution. Sometimes expressions in break_on require existence of task execution (see example in updated test). But if break_on is evaluated before first execution of task, it may end up with exception. Change-Id: Ia836c0330dbed62954d79059df1bef3758f7c5e5 Signed-off-by: Anton Kazakov <ton.kazakov@gmail.com> Signed-off-by: Vitalii Solodilov <mcdkr@yandex.ru>	2018-01-23 13:21:17 +00:00
Andras Kovi	7184596443	Gracefully handle DB disconnected connect errors When the DB is disconnected, the Mistral API should retry the operation for a predefined amount of time at least for GET type requests as this error is highly probable to be caused by temporary failures. The handling of Operational errors was already implemented. Change-Id: I3adb94dd695aeaa40d37956beae088d5618422c3	2017-12-28 16:50:19 +07:00
Mike Fedosin	4283998694	Fix inconsistencies when setting policy values This patch fixes inconsistencies between two ways of setting policy values: as a variable and directly in workflow definition as a constant. Inconsistency #1: For policies 'wait-before', 'wait-after', 'retry', 'timeout', 'concurrency' there is a difference on how they are executed if value is 0. If the value is hardcoded in workflow, the policies are omitted [1], but if a user defines them as a variable, then the policies are applied. Inconsistency #2: Policy values in workflow definitions cannot be negative numbers (validated by schema) [2], but if a user sets them as variables it's okay [3]. It happens because the schemas are different for both cases. [1] https://github.com/openstack/mistral/blob/master/mistral/engine/policies.py#L83 [2] https://github.com/openstack/mistral/blob/master/mistral/lang/v2/policies.py#L27 [3] https://github.com/openstack/mistral/blob/master/mistral/engine/policies.py#L161 Change-Id: I660ec2fe00e9f524292957560548447e517332fc Closes-bug: #1731100	2017-12-13 11:13:12 +00:00
Renat Akhmerov	397a562788	Fix deletion of delayed calls * Deletion of delayed calls is incorrect. A list of delayed calls gets deleted within one DB transaction and if at least one object is not deleted because of a DBDeadlock exception (on MySQL) then the entire transaction fails and, what's more important, the exception is swallowed by the try-finally block without reraising it so that it could be handled by the "retry_on_deadlock" decorator. This patch fixes this problem by reraising the initial exception. * Added "retry_on_deadlock" decorator to all methods that open DB transactions and where we have a risk of hitting a deadlock. Change-Id: I816c8c2a940e38cf1698d76e1019671249238598	2017-10-23 13:56:57 +07:00
Winson Chan	16b54d8766	Cascade pause from pause-before in subworkflows When a workflow is paused by pause-before, the state will cascade down to other subworkflows and up to parent workflow. Change-Id: Ied178fe08f8308455bf05b3168635a3b69799cec Closes-Bug: #1700196	2017-08-07 21:02:15 +00:00
Jenkins	cbe881ef7e	Merge "Updated the retries_remain statement"	2016-11-29 09:07:34 +00:00
Sharat Sharma	2842415f09	Updated the retries_remain statement If the task is specified with number of retries as 1, then it is not retried on error. So, this patch changes the statement of retries_remain to consider 1 as a value for retry. Change-Id: Ib0ede7a119bb57108141e50722928d53dd904d5f Closes-Bug: #1631140	2016-11-23 11:21:26 +00:00
Winson Chan	ab2c23acc3	Add cancelled state to action executions Allow action executions to be cancelled, specifically for async actions, and handle the cancellation for task and with-items task appropriately. For with-items tasks, if one of the action executions is cancelled, then the task is cancelled. Previously, if there is a mix of error and cancels, the task is marked with error. But this leads to on-complete being processed which shouldn't since the with-items task is incomplete due to partially cancelled. Change-Id: Iafc2263735f75fe06ae5f03a885cda8f965a7cc4 Implements: blueprint mistral-cancel-state	2016-11-16 19:34:04 +07:00
Renat Akhmerov	6b229d360a	Run actions without Scheduler Change-Id: If6bc0d62851f0ef73ce0c56f770883ddfd4a40bf Implements: blueprint mistral-run-actions-without-scheduler	2016-11-03 17:29:30 +07:00
Renat Akhmerov	a4287a5e63	Avoid storing workflow input in task inbound context * 'in_context' field of task executions changed its semantics to not store workflow input and other data stored in the initial workflow context such as openstack security context and workflow variables, therefore task executions occupy less space in DB * Introduced ContextView class to avoid having to merge dictionaries every time we need to evaluate YAQL functions against some context. This class is a composite structure built on top of regular dictionaries that provides priority based lookup algorithm over these dictionaries. For example, if we need to evaluate an expression against a task inbound context we just need to build a context view including task 'in_context', workflow initial context (wf_ex.context) and workflow input dictionary (wf_ex.input). Using this class is a significant performance boost * Fixed unit tests * Other minor changes Change-Id: I7fe90533e260e7d78818b69a087fb5175b9d5199	2016-09-21 13:32:44 +03:00
Renat Akhmerov	c7aa89e03d	Splitting executions into different tables * Having different types of execution objects in different tables will give less contention on DB tables and hence better performance so DB schema was changed accordingly * Fixed all unit tests and places in the code where we assumed polymorphic access to execution objects * Other minor fixes TODO(in upcoming patches): * DB migration script Change-Id: Ibc8408e12dd85e143302d7fdddace32954551ac5	2016-08-02 11:47:25 +07:00
Renat Akhmerov	633eb0fe6d	Add proper error handling for task continuation * In case if task needs to be continued, e.g. in case of 'wait-before' policy which inserts a delay into normal task execution flow (between creation of task policy and scheduling actions), possible exceptions also need to be handled properly (move task and workflow into ERROR). This patch adds error handling and the test to check this. * Other minor changes related to addressing a few TODO's across engine code. Change-Id: I525f193a149e3b0341aa8d0ffa0858ded96ba94f	2016-07-08 15:08:51 +07:00
Renat Akhmerov	3641b46d15	Remove unnecessary database transaction from Scheduler Change-Id: I08f0fcd67e0cd0e40e76ed6cfc7bb214096a2c16 Closes-Bug: #1484521	2016-06-01 08:21:39 +00:00
Renat Akhmerov	816bfd9dcc	Refactor Mistral Engine * Introduced class hierarchies Task and Action used by Mistral engine. Note: Action here is different from the executor Action and represents rather actions of different types: regular python action, ad-hoc action and workflow action (since for task action and workflow are polymorphic) * Refactored task_handler.py and action_handler.py with Task and Action hierarchies * Rebuilt a chain call so that the entire action processing would look like a chain of calls Action -> Task -> Workflow where each level knows only about the next level and can influence it (e.g. if adhoc action has failed due to YAQL error in 'output' transformer action itself fails its task) * Refactored policies according to new object model * Fixed some of the tests to match the idea of having two types of exceptions, MistralException and MistralError, where the latter is considered either a harsh environmental problem or a logical issue in the system itself so that it must not be handled anywhere in the code TODO(in subsequent patches): * Refactor WithItemsTask w/o using with_items.py * Remove DB transaction in Scheduler when making a delayed call, helper policy methods like 'continue_workflow' * Refactor policies test so that workflow definitions live right in test methods * Refactor workflow_handler with Workflow abstraction * Get rid of RunExistingTask workflow command, it should be just one command with various properties * Refactor resume and rerun with Task abstraction (same way as other methods, e.g. on_action_complete()) * Add error handling to all required places such as task_handler.continue_task() * More tests for error handling P.S. This patch is very big but it was nearly impossible to split it into multiple smaller patches just because how entangled everything was in Mistral Engine. Partially implements: blueprint mistral-engine-error-handling Implements: blueprint mistral-action-result-processing-pipeline Implements: blueprint mistral-refactor-task-handler Closes-Bug: #1568909 Change-Id: I0668e695c60dde31efc690563fc891387d44d6ba	2016-05-31 14:08:36 +00:00
Limor Stotland	c4a614273d	If task fails on timeout - there is no clear message of failure * Adding state_info to fail_task_if_incomplete solve it * Unskip test TaskDefaultsReverseWorkflowEngineTest#test_task_defaults_timeout_policy Closes-Bug: #1527976 Change-Id: I1f44f648ea71d2dcf8bdca77e6bcca0023963be0	2015-12-23 13:29:22 +00:00
hparekh	cc13c4ecd7	Added Unit test when policy input is variable. Also code has been changed for py34 compatibility. Change-Id: I7e7675265807dbcf4637bbb56c927a2da86e7c5d	2015-11-30 08:50:07 +00:00
hparekh	4109f1cc9a	Comparison operator has been changed. While creating a policy the '>' operator is used, due to which in py34 an exception occurred when the variable is provided from an input parameter. Exception was TypeError: unorderable types: str() > int() TODO: Add more unit tests to catch such scenarios. Partially-Implements: blueprint mistral-py3 Change-Id: I2c652812ae4a04cd7610f2a6684da76c582a4e32	2015-11-03 10:22:46 +05:30
Winson Chan	61ec312160	Remove the transaction scope from task executions API Since the task execution API get_all method is in a transaction block, if there are a lot of reads against the task execution API GET method, it will lead to unnecessary DB locks that can result in deadlocks and consequently WF execution failures. Change-Id: I5a6b7829176178bb6e06768e9d52e94202cf4347 Closes-Bug: #1501433	2015-10-06 17:17:24 +00:00
Renat Akhmerov	3707062d67	Renaming state DELAYED to RUNNING_DELAYED * As discussed in the mailing list it's better to rename DELAYED to RUNNING_DELAYED so that we semantically express it as a substate of RUNNING whereas WAITING is not. Closes-Bug: #1470369 Change-Id: I3b7033d894d29fe755d4d0262c1029c4576421cd	2015-09-25 15:21:24 +06:00
Nikolay Mahotkin	920b25ffc7	Fixing working concurrency when value is YAQL Closes-Bug: #1487651 Change-Id: I45abd0c064f4e1be9aee86f4488281ecb9830a62	2015-08-27 12:44:20 +00:00
LingxianKong	40e98c4c5e	Fix inappropriate condition for retry policy When retry policy is used without continue-on clause, the retry iteration will still be scheduled even if the task succeeds. This patch fixes the problem by return when task succeeds with retry policy. Change-Id: I9f07ed3565fe7169f2831a435e4e76a49af34f6c Closes-Bug: #1469330	2015-07-01 10:38:24 +08:00
Nikolay Mahotkin	b42a5b86a2	Implementing 'continue-on' retry policy property Implements blueprint mistral-retry-continue-on Change-Id: Idf893fdb201a05521ab2503ad756bafda95ae7d5	2015-06-05 19:30:36 +03:00
bhavenst	5da2791126	Allow pause-before to override wait-before policy * Prevent wait-before from scheduling task for paused workflow Closes-bug: #1449948 Change-Id: I411d961d0bee7d874e737e05357a7c8c183add51	2015-05-13 12:26:47 -04:00
Renat Akhmerov	235a863c1b	Renaming "engine1" to "engine" Change-Id: I106704c26b4598f9f3dc2ed04213b1f565d00010	2015-04-09 17:47:36 +06:00

41 Commits
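
For readers unfamiliar with the six helpers named in commit 4bc6162515, the following is a minimal sketch of the usual Python 3 stand-ins for each listed item. It is not taken from the Mistral patch itself; the function name py3_equivalents and the sample values are hypothetical and exist only for illustration. six.PY3 has no counterpart here because on Python 3 it is always true, so code that branches on it can simply keep the Python 3 branch.

```python
import io


def py3_equivalents():
    """Exercise the plain Python 3 replacement for each six helper.

    Hypothetical illustration only; not code from the Mistral repository.
    """
    counts = {"a": 1, "b": 2}

    buffer = io.StringIO()              # was: six.StringIO()
    buffer.write("done")

    return {
        # was: isinstance(x, six.integer_types)
        "integer_types": isinstance(10 ** 30, int),
        # was: six.itervalues(counts)
        "itervalues": sorted(counts.values()),
        # was: six.text_type(42)
        "text_type": str(42),
        # was: isinstance(x, six.string_types)
        "string_types": isinstance("x", str),
        "StringIO": buffer.getvalue(),
        # was: six.next(iterator)
        "next": next(iter([7, 8])),
        # was: six.b("abc"); six encodes with latin-1 on Python 3
        "b": "abc".encode("latin-1"),
    }
```

Calling py3_equivalents() returns a dictionary keyed by the six helper name, so each replacement can be checked on its own.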