Commit Graph

283 Commits

Author SHA1 Message Date
Zuul 6e190e65ea Merge "Context versioning feature" 2024-03-14 13:03:26 +00:00
Oleg Ovcharuk f2cbe1c59d Context versioning feature
With complex parallel joins, Mistral had no mechanism to choose which
publish (left or right in terms of merge) it should use. A common
case is when one branch updates an existing value, but after the merge
we see the old version.
This patch introduces the context versioning feature, where every existing
key of the Mistral context has its own version, and this version is used in
the context merge stage.

Change-Id: I604a9a8391150ac4801115b9892f781c33ecfdcb
Signed-off-by: Oleg Ovcharuk <vgvoleg@gmail.com>
2024-03-14 09:25:17 +00:00
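The versioned merge described in the commit above can be illustrated with a minimal sketch. It is not Mistral's implementation: the `(version, value)` pair layout, the function name `merge_versioned`, and the tie-breaking rule are all assumptions made for illustration.

```python
def merge_versioned(left, right):
    """Merge two contexts whose values are (version, value) pairs.

    Hypothetical layout: for a key present on both sides, the entry with
    the higher version wins. Ties keep the left entry, which is an
    arbitrary choice made for this sketch.
    """
    merged = dict(left)
    for key, (version, value) in right.items():
        if key not in merged or version > merged[key][0]:
            merged[key] = (version, value)
    return merged


# Both branches started from {"x": (1, "old")}; only the left branch updated "x".
left = {"x": (2, "new")}
right = {"x": (1, "old"), "y": (1, "extra")}
result = merge_versioned(left, right)
```

With this rule the updated value survives regardless of which side of the merge it arrives on, which is the problem the commit message describes.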
Takashi Kajinami 44cd95684b Bump hacking
hacking 3.0.x is too old.

Also remove the note about pip's behavior which was already fixed in
recent versions.

Change-Id: I65d350943649c3346ed5741631c01724ddd256ef
2024-02-19 02:23:53 +09:00
Oleg Ovcharuk 6d3018ea01 Fix join task not refreshing inbound context
In error cases join task could lose context of some branches

Change-Id: I58a94c4ebc5d860473c9b48df326f6ea29cba9fa
Closes-Bug: #2020370
Signed-off-by: Oleg Ovcharuk <vgvoleg@gmail.com>
2023-05-24 18:16:59 +03:00
Zuul 4289317a91 Merge "Task skipping feature" 2023-02-14 13:22:17 +00:00
Vasudeo Nimbekar 7c35734300 Merge mistral tasks data to execution context
After this patch, the user can choose whether task data replaces or is merged into the execution context.
Example: merge_strategy: replace/merge

Implements: blueprint merge-mistral-tasks-data
Change-Id: I3c96bab9953c4995f2b718ac48dff0f153872026
2023-01-31 17:27:17 +05:30
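The difference between the two strategies named in the commit above might look like the following sketch. The commit message does not spell out the merge semantics, so the recursive dictionary merge here is an assumption, and `apply_published` is a hypothetical helper, not a Mistral function.

```python
import copy


def apply_published(context, published, merge_strategy="replace"):
    """Return a new context with the published task data applied.

    "replace" overwrites a key with the published value as a whole.
    "merge" (assumed semantics) merges nested dictionaries recursively.
    """
    result = copy.deepcopy(context)
    for key, value in published.items():
        both_dicts = isinstance(value, dict) and isinstance(result.get(key), dict)
        if merge_strategy == "merge" and both_dicts:
            result[key] = apply_published(result[key], value, "merge")
        else:
            result[key] = copy.deepcopy(value)
    return result


context = {"vm": {"name": "a", "cpu": 2}}
published = {"vm": {"cpu": 4}}
replaced = apply_published(context, published, "replace")
merged = apply_published(context, published, "merge")
```

Under these assumptions, "replace" drops the `name` key because the whole `vm` value is overwritten, while "merge" keeps it and only updates `cpu`.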
Oleg Ovcharuk e72a4e9a70 Task skipping feature
This patch adds the ability to rerun a failed workflow by
skipping failed tasks. Workflow behavior in the skip case can
be configured with new fields in the task definition:
* on-skip
* publish-on-skip

Change-Id: Ib802a1b54e69c29b4d0361f048c2b9c076a4c176
Implements: blueprint mistral-task-skipping-feature
Signed-off-by: Oleg Ovcharuk <vgvoleg@gmail.com>
2022-12-01 01:47:30 +03:00
Renat Akhmerov 06a0f33476 Refactor Mistral with Action Providers
* This patch refactors Mistral with the action provider concept
  that is responsible for delivering actions to the system. So
  it takes all the burden of managing action definitions w/o
  having to spread that across multiple subsystems like Engine
  and API and w/o having to assume that action definitions are
  always stored in DB.
* Added LegacyActionProvider that represents the old way of
  delivering action definitions to the system. It pretty much just
  analyses what entries are configured in the entry point
  "mistral.actions" in setup.cfg and builds a collection of
  corresponding Python action classes in memory accessible by names.
* The module mistral/services/actions.py is now renamed to
  adhoc_actions.py because it's effectively responsible only for
  ad-hoc actions (those defined in YAML).
* Added the new entry point in setup.cfg "mistral.action.providers"
  to register action provider classes
* Added the module mistral/services/actions.py that will be a facade
  for action providers. Engine and other subsystems will need to
  work with it.
* Other small code changes.

Depends-On: I13033253d5098655a001135c8702d1b1d13e76d4
Depends-On: Ic9108c9293731b3576081c75f2786e1156ba0ccd
Change-Id: I8e826657acb12bbd705668180f7a3305e1e597e2
2020-09-24 11:10:33 +00:00
Renat Akhmerov 7dec19ae19 Fix calculating task execution result for "with-items"
* The logic of calculating a task result in case of "with-items" was
  overcomplicated and broke encapsulation of a "with-items" task.
  This patch makes it simpler, so that the method doesn't need to
  peek into the internals of a "with-items" task (e.g. runtime_context).

Change-Id: I036193cbae15d7f3c3414b123525ceafa91fdeb1
2020-06-02 16:28:42 +07:00
Renat Akhmerov ddf9577785 Refactor task policies
* The purpose of this patch is to improve encapsulation of task
  execution state management. We already have the class Task
  (engine.tasks.Task) that represents an engine task and it is
  supposed to be responsible for everything related to managing
  persistent state of the corresponding task execution object.
  However, we break this encapsulation in many places and various
  modules manipulate task execution state directly. This
  leads to what is called "spaghetti code" because important
  things are often spread out across the system and it's hard to
  maintain. It also leads to lots of duplication. So this patch
  refactors policies so that they manipulate a task execution
  through an instance of Task which hides low level aspects.

Change-Id: Ie728bf950c4244db3fec0f3dadd5e195ad42081d
2020-06-01 14:05:49 +07:00
Renat Akhmerov 019cffb3ab Fix ContextView JSON serialization
* With disabled YAQL data output conversion, YAQL may return
  instances of ContextView which can't be properly saved into
  DB. This happens because Mistral serialization code doesn't
  turn on JSON conversion of custom objects, and they are just
  ignored by the "json" lib when it encounters them.
* Fixed how Mistral serializes context for Javascript evaluation
  to address the same problem.
* Implemented __repr__ method of ContextView.
* Removed logging of "data_context" from YAQL evaluation because
  previously it was always empty (because the string representation
  of ContextView was always "{}") and now it may be very big, like
  megabytes, and the log gets populated too fast. It makes sense to
  log YAQL data context only when an error happened. In this case
  it helps to investigate an issue.
* Added all required unit tests.
* Fixed the tests for disabled YAQL conversion. In fact, they
  didn't test it properly because data conversion wasn't disabled.

Closes-Bug: #1867899
Change-Id: I12b4d0c5f1f49990d8ae09b72f73c0da96254a86
2020-03-19 17:07:42 +07:00
Zuul dbfc0bea22 Merge "Fix incorrect in-depth search of affected tasks" 2020-03-07 16:51:17 +00:00
Oleg Ovcharuk de633d5d48 Fix incorrect in-depth search of affected tasks
If the join task does not exist (every upstream task has the
ERROR state), the workflow gets stuck in the RUNNING state because of
a bug in the affected tasks search. This patch fixes this bug.

Change-Id: If9f0c9bea587b486998af1c18e282bedba453499
Closes-Bug: #1862161
Signed-off-by: Oleg Ovcharuk <vgvoleg@gmail.com>
2020-03-05 14:00:22 +03:00
Eyal 8bdf341af7 Remove OpenStack actions from mistral
Depends-on: https://review.opendev.org/#/c/703296/
Depends-On: https://review.opendev.org/#/c/704280/
Change-Id: Id62fdabe7699e7c3b2977166e253cfc77779e467
2020-02-26 10:12:01 +02:00
Renat Akhmerov 829e822581 Init profiler for a new thread in post_tx_queue.py
* Initialization of profiler was also missing for a thread
  spawned within post_tx_queue.py so we were losing important
  profiling info
* Changed the profiler test since its logic was already obsolete.
  Now we initialize profiler in every thread so the only reason to
  not get any profiler traces when a workflow completed is
  "enabled = False" in the "profiler" group in the configuration.
* Added more profiler traces
* Small readability changes in the workflow language spec

Change-Id: I35e6711f8e10bb08d7e842f4bca8753b929328fd
2020-02-07 13:42:55 +07:00
Renat Akhmerov 5b5576dd04 Set the delayed call "key" field to the right value
* The field "key" of the DelayedCall class must be set to the right
  value. This makes it possible to significantly optimize performance
  of workflows with "join" tasks. However, it was broken during the
  refactoring in the summer of 2019. This patch fixes it.
* Added another profiler trace decorator.

Closes-Bug: #1861988
Change-Id: I247b674d8a358795871cfa87bcdf29f4857ca2d8
2020-02-05 09:09:50 +00:00
Zuul 66d1776f1b Merge "Extend capabilities to clean up old executions" 2019-11-12 11:31:39 +00:00
Zuul c11d8eade9 Merge "Refactor rerun of joins" 2019-11-12 09:32:56 +00:00
ali 6b862e625e Extend capabilities to clean up old executions
* Added a configuration option to the expiration policy
  to filter out workflow states.

Closes-Bug: #1796627
Change-Id: Ife49e6da1d7d52a3f50f1628d808d4c65a22cad9
2019-11-12 07:12:45 +00:00
Renat Akhmerov 59bf2509eb Refactor rerun of joins
* This patch moves logic that schedules a task state refreshing
  periodic job in case of rerun from the Task class to
  task_handler.run_task() so that Task doesn't have to know any
  language specific details and call task handler back. It is
  more architecturally clean.

Change-Id: If7a054bbf77f9ed761d8f3ac36b6d329544f5ff5
2019-11-11 17:10:16 +07:00
Renat Akhmerov 7a6aac0f5f Fix "root_execution" lazy loading issue and refactor execution.py
* There's an issue with a lazily loaded field of the WorkflowExecution
  model occurring on GET /v2/execution/<id> because the logic
  that calculates "published_global" of the execution REST resource
  hits the "root_execution" field out of transaction scope indirectly
  within the "data_flow.get_workflow_environment_dict" method.
  This patch refactors this logic and calculates
  globally published variables of the workflow execution simply
  as its context that doesn't contain all internal data like
  "__execution" and "openstack".
* Other style change.

Closes-Bug: #1846152
Change-Id: Ic8609e55930e2ed13653e79e8ca7a31c951d9030
2019-10-02 11:02:52 +00:00
ali 7e7f1cb92b moved generic util functions from mistral to mistral-lib
Depends-On: I780c270e4b1a184d7d4dcc580d23697ba75edab1
Closes-bug: #1815183
Change-Id: I5a1d402baa3f69c37f9347c8b3d02a83b8f60423
2019-09-13 04:06:27 +00:00
Mike Fedosin b0fb101c47 Optimize finding upstream task executions
This commit introduces the following optimizations:
1. No need to send a request if the list of inbound tasks is empty.
2. Instead of checking already downloaded tasks, it is more reasonable
to add filters to the request.
3. Generation of task execution cache is meaningless here, because
we only check the nearest tasks without going up the graph.

Change-Id: I5cb144903cd2abb6eecfe32a13da4a2ebf7db3dc
2019-06-16 20:11:33 +02:00
Mike Fedosin c215c0520f Direct workflow code cleanup and refactoring
When Ib3a684f63f05a7cfe846782774d2c68be78bebab is merged a lot of
code becomes useless and can be removed.

Change-Id: Ic9bf73736392e91ce02faa0b3a73b5b3ce40fd14
2019-06-12 10:51:54 +02:00
Renat Akhmerov b694770561 Store next task names in DB
* In many places Mistral has to calculate next task names of the
  given task execution. It may be expensive in case of large graphs
  and/or large expressions. Instead of evaluating next tasks again
  and again we can evaluate next task names just once when a task
  completes (it happens already anyway) and store them in DB. They
  can be reused later whenever it's needed.
* Style changes
* Removed a large amount of unused variables in
  test_subworkflows_pause_resume.

Change-Id: Ib3a684f63f05a7cfe846782774d2c68be78bebab
2019-06-11 05:20:07 +00:00
Mike Fedosin eb59328556 Limit max search depth
Now, to find an induced join state for a task we prepare a set of
all task's parent names and then perform a db request to fetch all
the tasks in one request.

But sometimes the depth of the search can be great, and even searching
for parent task names can be time-consuming.
To prevent this situation, this patch introduces a limited depth search,
which finds parent tasks at a limited depth first, and loads the rest
only if necessary.

Change-Id: I7126b7c9652f190ec5c8423fad60d18729840a3a
2019-06-11 05:19:54 +00:00
Zuul 5131444a62 Merge "Use get_task_executions_count for any_cancels method" 2019-06-08 01:07:59 +00:00
Mike Fedosin 15355ea5e1 Use get_task_executions_count for any_cancels method
To get the result we need only a number of tasks, so there is no
need to fetch full task executions from the database.

Change-Id: I34ea985e6de08f46e1c40c64996357c8e6f3d79c
2019-06-06 15:46:34 +02:00
Zuul f85caa02c9 Merge "Skip context evaluation for non-conditional transitions" 2019-06-06 10:09:45 +00:00
Mike Fedosin f09c8ebec1 Skip context evaluation for non-conditional transitions
Context evaluation is not required for non-conditional transitions,
because the list of next tasks is known in advance.
For this reason we can skip this operation, which can be quite
time-consuming, and generate a list of tasks directly.

Also a little cleanup was done to remove unnecessary methods.
Change-Id: Ia419c47a7d71db46a5cae557fe8bc7512390715e
2019-06-05 18:31:56 +02:00
Mike Fedosin c1e4fd8d48 Remove _get_next_clauses
A little code clean-up: this commit replaces _get_next_clauses method
from direct_workflow.py with a built-in find_outbound_task_names

Change-Id: Ic5ea956ee45b8c455d42669c0f0e386d53e22425
2019-06-05 04:15:42 +00:00
Mike Fedosin 58b714eb16 Prepare cache for _is_upstream_task_execution
This trick is similar to what we do in _get_join_logical_state.
Instead of traversing the entire workflow graph and downloading
task executions one by one, we prepare a lightweight cache with
a single database query and then analyze it on the spot.

Change-Id: If1adb4ea408248e766bff449637de2b1b1356738
2019-06-02 23:18:07 +02:00
Mike Fedosin cd19e48697 Remove _find_task_execution_by_name
This method was used to find a task execution by its name in
_is_upstream_task_execution only. But this operation can be
simplified, because we already have the task execution,
and therefore there is no need to search for it again in the
database.

Change-Id: I45a175970bc2567f5ef63e161fa431c8e9245ec1
2019-06-02 23:05:52 +02:00
Mike Fedosin c52688523c Rework finding indirectly affected created joins
Current algorithm of finding the joins requires a recursive search
through the execution graph, which leads to a large number of calls
to the database.

To optimize it we introduce a new algorithm that requires only one db
request. It downloads all potential join task ids and names, and then
analyzes them without any additional db calls.

Change-Id: Ic73f2112406e681ae8a2aa67bcbccebe488fc03c
2019-05-29 23:58:00 +02:00
Mike Fedosin 84b8e92acc Get rid of lookup utils
With the new joining mechanism lookup utils for task executions
become useless and can be removed from the codebase.
Now all requests to the database will be performed directly from
a workflow controller.

Action definition cache was moved to mistral/engine/actions.py,
because it's needed only there.

Change-Id: If0d4403f5c61883ecfec4cfa14b98cc39aae5618
2019-05-22 17:11:49 +02:00
Mike Fedosin 8549aeaf66 Optimize searching of upstream task executions
Now to find upstream task executions mistral does a lookup for each
inbound task spec. This is not optimal, because it leads to a significant
number of db requests.

To optimize this behavior we prepare a list of task names in advance,
and then search executions using 'in' filter.

Change-Id: Ia7bf62c45b889f753671bdda048f91c46af41039
2019-05-20 18:09:55 +02:00
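The idea in the commit above, one query with an 'in' filter instead of one lookup per task name, can be sketched with the standard library's sqlite3 module. The table layout and the helper `find_by_names` are hypothetical and are not Mistral's DB API.

```python
import sqlite3

# Hypothetical, simplified table standing in for task executions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_executions (id INTEGER PRIMARY KEY, name TEXT, state TEXT)"
)
conn.executemany(
    "INSERT INTO task_executions (name, state) VALUES (?, ?)",
    [("a", "SUCCESS"), ("b", "ERROR"), ("c", "SUCCESS")],
)


def find_by_names(conn, names):
    """Fetch all executions for the given task names with a single query."""
    if not names:
        # Nothing to look up, so no request is sent at all.
        return []
    placeholders = ", ".join("?" for _ in names)
    query = (
        "SELECT name, state FROM task_executions "
        f"WHERE name IN ({placeholders}) ORDER BY name"
    )
    return conn.execute(query, list(names)).fetchall()


upstream = find_by_names(conn, ["a", "b"])
```

The number of database round trips stays at one no matter how many inbound task names are collected in advance.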
Mike Fedosin ff00c9c778 Rework joining mechanism
Current joining mechanism in some cases can be expensive because
it uses a multi-step recursive search, which leads to a huge amount
of db requests.

This work changes this behavior by precaching required task
executions to prevent hammering the database during the lookup.

Change-Id: I2d1b7e72c728a14c85b015dfdb0f8800b95f3749
2019-05-17 12:24:12 +02:00
Renat Akhmerov 83c541acbf Reduce the number of "on-xxx" evaluations
* Mistral evaluates expressions under "on-xxx" clauses more than
  once during the processing of a workflow. It's now been fixed by
  storing crucial information about the result of those expressions
  after it was first obtained in the DB.
* Added two boolean fields "has_next_tasks" and "error_handled" in
  the TaskExecution class and the required migration. These fields
  make it unnecessary to evaluate expressions under "on-xxx" clauses
  many times, which reduces execution time in case of heavy
  expressions and/or their data contexts.
* Minor style changes.

Closes-Bug: #1824121
Change-Id: Ib236ba7a72d8e578f9c52460d2a7d8d4540f9c37
2019-05-15 07:39:05 +00:00
Renat Akhmerov b0829f943b Fix an expression context for all_errors_handled()
* "__task_execution" wasn't included in this case into the
  expression data context so the function task() didn't work
  properly

Change-Id: I3cacae90f9031d09a5e6d8153d728ddc01e1bb21
Closes-Bug: #1823875
2019-04-10 11:49:48 +07:00
Renat Akhmerov 32c96b1b6c Add "root_execution" mapped property to WorkflowExecution model
* We need to avoid using direct DB queries w/o using a mapped
  model where possible for performance sake. Using a mapped entity
  property is always more efficient because it's cached in an
  SQLAlchemy session.

Change-Id: I2d7652ea0cff8f2db7259d285ac98c582bf15b62
2019-03-18 11:19:44 +07:00
Andras Kovi 81af1b4838 Process all task batches in wf output evaluation
All batches must be processed in workflow output evaluation. An
empty batch means only that no tasks were end tasks in the queried
slice.

Closes-Bug: 1811775
Change-Id: I0ed4e690f67966ba2d145ad6430b517bd896ced6
2019-01-15 13:54:41 +01:00
Oleg Ovcharuk ea7fa0e4a6 Add started_at and finished_at to task execution.
Sometimes it is very important to know the exact time of task execution, but using the fields
created_at and updated_at is incorrect, because in this case the duration will include
more than just the running time.
The new fields solve this problem.

Change-Id: I15be0648a0346f5b3dc9ef4a1b330a6c0e818385
Implements: blueprint mistral-add-started-finished-at
Signed-off-by: Oleg Ovcharuk <vgvoleg@gmail.com>
2018-11-19 11:25:06 +03:00
Renat Akhmerov c9e08a8839 Fix "join" when the last indirect inbound task failed
* See bug description for the example that didn't work. It was
  caused by a simple mistake in a python expression of type
  "my_set = my_set or set()" that didn't work as expected, i.e.
  it created a new set even if my_set is already an empty set.
  So, the proper expression that's needed is
  "my_set = set() if my_set is None else my_set"

Change-Id: I2a787921449fecf3301013a770ffe712e9606baf
Closes-Bug: #1803677
2018-11-16 15:35:18 +07:00
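The pitfall described in the commit above is easy to reproduce: an empty set is falsy, so `my_set or set()` silently replaces a caller's empty set with a new one. The function names below are hypothetical and only illustrate the two expressions quoted in the message.

```python
def collect_buggy(name, seen=None):
    # An empty set is falsy, so a caller-provided empty set is discarded here.
    seen = seen or set()
    seen.add(name)
    return seen


def collect_fixed(name, seen=None):
    # Only substitute a new set when no set was passed at all.
    seen = set() if seen is None else seen
    seen.add(name)
    return seen


shared_buggy = set()
collect_buggy("task1", shared_buggy)   # the addition goes to a throwaway set

shared_fixed = set()
collect_fixed("task1", shared_fixed)   # the caller's set is updated in place
```

After these calls `shared_buggy` is still empty while `shared_fixed` contains "task1", which matters whenever the set is shared across recursive calls.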
Renat Akhmerov 90ddf442ee Clone cached action definitions
* Once in a while we get DetachedInstanceError for action definitions
  and it happens when they are fetched from cache. We must always
  clone persistent objects before caching them.

Change-Id: I1d0cffea6775eb258dcefc0dbb8a6ee18effe597
Closes-Bug: #1803528
2018-11-15 18:39:21 +07:00
Renat Akhmerov 80a1bed67b Simplify workflow and join completion logic
* action_queue module is replaced with the more generic
  post_tx_queue module that allows registering operations that must
  run after the main DB transaction associated with processing a
  workflow event such as completing an action.
* Instead of calling workflow completion check from all places
  where a task may possibly complete, Mistral now registers a post
  transactional operation that runs after the main DB transaction
  (to guarantee at least one consistent DB read) right
  inside the task completion logic. It reduces clutter significantly.
* Workflow completion check is now registered only if the just
  completed task may lead to workflow completion, i.e. if it's the
  last one in a workflow branch.
* Join now checks delayed calls to reduce the number of join
  completion checks created with scheduler and also uses post
  transactional queue for that.

Closes-Bug: #1801872
Change-Id: I90741d4121c48c42606dfa850cfe824557b095d0
2018-11-09 14:17:20 +07:00
Renat Akhmerov c39842b849 Fix usage of cachetools in lookup_utils
* In the latest version of cachetools lib (3.0.0) the previously
  deprecated argument "missing" of cache classes has been removed.
* Disabled test_generator failing due to the changes in the
  senlin client until it's fixed by https://review.openstack.org/614211

Change-Id: Iac42f592834734a6fddb743e947860b3bb7e1aba
2018-11-06 15:36:43 +07:00
Renat Akhmerov 1a4c599a4d Improve join by removing periodic jobs
* This patch removes the approach with DB polling needed to
  determine if a "join" task is ready to run. Instead of running
  a periodic scheduled job, each task completion now runs the
  algorithm that finds all potentially affected join tasks
  and schedules just one job (instead of a periodic job) to check
  their readiness.
  This solves a problem of system cascaded overloading in case of
  having many very large joins (when a workflow has many joins with
  many dependencies each). Previously, in such cases Mistral created
  too many periodic jobs that just didn't let the workflow progress
  well, i.e. most CPU was used by scheduler to run those periodic
  jobs that very rarely switched "join" tasks to the RUNNING state.

Change-Id: I5ebc44c7a3f95c868d653689dc5cea689c788cd0
Closes-Bug: #1799356
2018-10-23 14:01:39 +07:00
Andras Kovi c08e44f17b Allow engine commands as task name
A change for disabling some task names has introduced a massively
backward incompatible behavior. E.g. even though there is a 'noop' engine
command, the usual way of handling noop in many cases is to create a
task with the same name. The other commands are not used that often but
noop is massively present in currently deployed workflows and it is
not possible to mitigate the error if the workflows are coming from
3rd parties.

This change re-enables the usage of the engine commands as task names
and adds documentation on why this is a useful feature.

Change-Id: If90ee5f787e4587a25c156d12c7750407081bf0d
Related-Change: https://review.openstack.org/#/c/535297
2018-07-19 14:23:18 +00:00
amassalh 259d8a8099 add docs for states.
Add docs explaining what each state means.

Change-Id: I0092473c3be7f5ef5a28532984ebbe753434becb
2018-07-12 08:50:52 +00:00
Renat Akhmerov f2a9bd45ab Do not copy workflow environment into subworkflows
* We previously always copied a workflow environment of a parent
  workflow into a subworkflow when starting it. However, this is
  redundant because we now have the 'root_execution_id' field in
  the workflow execution model so that we can always get an
  environment of a subworkflow just by accessing the root execution.
  It saves a lot of space in DB and increases performance in cases
  when we have a large workflow environment and many subworkflows.

Related-Bug: #1757966
Change-Id: I15077240ba53663a6267b886ab7b081a7dde2710
2018-04-27 20:08:56 +07:00