Commit Graph

540 Commits

Author SHA1 Message Date
Takashi Kajinami 44cd95684b Bump hacking
hacking 3.0.x is too old.

Also remove the note about pip's behavior which was already fixed in
recent versions.

Change-Id: I65d350943649c3346ed5741631c01724ddd256ef
2024-02-19 02:23:53 +09:00
Vadim Zelenevsky 7cc007415b Partial Workflow Failure Handling
This feature introduces an enhanced error-handling
mechanism for workflows, allowing them to gracefully
handle issues within individual tasks without
causing a complete workflow failure. Previously,
when using subworkflow and passing an incomplete set
of parameters, the entire workflow would terminate.
With this feature, the workflow continues execution,
isolating errors at the task level. Consequently,
partial issues in one task no longer impact other
branches of workflow execution.

Implements blueprint partial-workflow-failure-handling

Change-Id: Id6a910c85c1d6953408682a2a724c4826333422f
2023-11-29 07:55:38 +00:00
Oleg Ovcharuk 6d3018ea01 Fix join task not refreshing inbound context
In error cases, a join task could lose the context of some branches.

Change-Id: I58a94c4ebc5d860473c9b48df326f6ea29cba9fa
Closes-Bug: #2020370
Signed-off-by: Oleg Ovcharuk <vgvoleg@gmail.com>
2023-05-24 18:16:59 +03:00
Vasudeo Nimbekar 9f52e2b6f3 Starting tasks via RPC
After this patch, Mistral runs tasks via RPC, which distributes them among the available engine threads. This improves performance for large executions containing many tasks.

Implements: blueprint distribute-mistral-operations

Change-Id: I0b7202589eee68ba5560bf2aa60fbbd6118f3719
2023-02-16 13:19:42 +05:30
Zuul 4289317a91 Merge "Task skipping feature" 2023-02-14 13:22:17 +00:00
Vasudeo Nimbekar 88e7e7ceee Adding root_execution_id parameter to mistral loggers
After this patch, users can update the logging format to include root_execution_id in logs, which helps to find and debug logs related to a specific workflow execution.

  - Logs about creation and status changes of Mistral entities (execution,
    task, action execution, etc.) are changed to INFO log level.
  - Users can update logging_context_format_string to include root_execution_id in logs.

Implements: blueprint improve-mistral-loggers

Change-Id: I54fe058e5451abba6ea7f69d03d498d78a90993e
2023-02-13 05:01:39 +00:00
Oleg Ovcharuk e72a4e9a70 Task skipping feature
This patch adds the ability to rerun a failed workflow by
skipping failed tasks. Workflow behavior in the skip case can
be configured with new fields in the task definition:
* on-skip
* publish-on-skip

Change-Id: Ib802a1b54e69c29b4d0361f048c2b9c076a4c176
Implements: blueprint mistral-task-skipping-feature
Signed-off-by: Oleg Ovcharuk <vgvoleg@gmail.com>
2022-12-01 01:47:30 +03:00
Takashi Kajinami 20aa42b75b Replace deprecated import of ABCs from collections
ABCs in collections should be imported from collections.abc and direct
import from collections is deprecated since Python 3.3.

Closes-Bug: #1936667
Change-Id: Ide8aa0323d9713c1c2ea0abf3b671ca4dab95ef0
2022-03-02 09:30:29 +00:00
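The import change described in the commit above can be illustrated with a minimal sketch. The helper name `describe` is hypothetical and is not taken from the Mistral code base; it only shows ABCs being imported from `collections.abc` rather than from `collections`.

```python
# ABCs live in collections.abc; importing them from collections is deprecated.
from collections.abc import Mapping, Sequence


def describe(value):
    """Classify a value using ABCs imported from collections.abc."""
    if isinstance(value, Mapping):
        return "mapping"
    # str is itself a Sequence, so it is excluded explicitly.
    if isinstance(value, Sequence) and not isinstance(value, str):
        return "sequence"
    return "scalar"
```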
Renat Akhmerov 87c63f4206 Remove a TODO comment about saving an action spec
* It's clear now that we don't have to store an action specification
  as part of the corresponding action execution object because
  the notion of action specification itself is specific for a certain
  type of action. In our case, ad-hoc actions.
  All changes recently made in the Mistral layers above the engine
  prove the correctness of this thought. The comment can be safely
  deleted.

Change-Id: I45b97b08184c8d5a88bcc537fb5b1e538f105554
2020-10-05 17:22:28 +07:00
Renat Akhmerov 06a0f33476 Refactor Mistral with Action Providers
* This patch refactors Mistral with the action provider concept
  that is responsible for delivering actions to the system. So
  it takes all the burden of managing action definitions w/o
  having to spread that across multiple subsystems like Engine
  and API and w/o having to assume that action definitions are
  always stored in DB.
* Added LegacyActionProvider that represents the old way of
  delivering action definitions to the system. It pretty much just
  analyses what entries are configured in the entry point
  "mistral.actions" in setup.cfg and builds a collection of
  corresponding Python action classes in memory accessible by names.
* The module mistral/services/actions.py is now renamed to
  adhoc_actions.py because it's effectively responsible only for
  ad-hoc actions (those defined in YAML).
* Added the new entry point in setup.cfg "mistral.action.providers"
  to register action provider classes
* Added the module mistral/services/actions.py that will be a facade
  for action providers. Engine and other subsystems will need to
  work with it.
* Other small code changes.

Depends-On: I13033253d5098655a001135c8702d1b1d13e76d4
Depends-On: Ic9108c9293731b3576081c75f2786e1156ba0ccd
Change-Id: I8e826657acb12bbd705668180f7a3305e1e597e2
2020-09-24 11:10:33 +00:00
Q.hongtao 4bc6162515 Remove six library
Remove the six library. Replace the following items with Python 3 style code.
- six.integer_types
- six.itervalues
- six.text_type
- six.string_types
- six.StringIO
- six.next
- six.b
- six.PY3

Change-Id: I299c90d5cbeb41be0132691265b8dcbeae65520e
2020-09-23 10:27:12 +08:00
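As a rough illustration of the replacements listed in the commit above, the sketch below shows plain Python 3 equivalents for each six helper. The variable names are made up for the example and do not come from the actual patch.

```python
import io

data = {"a": 1, "b": 2}

total = sum(data.values())           # instead of six.itervalues(data)
first = next(iter(data.values()))    # instead of six.next(...)
is_int = isinstance(total, int)      # instead of six.integer_types
is_str = isinstance("name", str)     # instead of six.string_types / six.text_type
raw = b"payload"                     # instead of six.b("payload")

buf = io.StringIO()                  # instead of six.StringIO()
buf.write("done")

# six.PY3 checks simply disappear: the code only runs on Python 3.
```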
Q.hongtao da5ac25415 Remove six.moves
Remove six.moves. Replace the following items with Python 3 style code.
- six.moves.urllib
- six.moves.queue
- six.moves.range
- six.moves.http_client

Subsequent patches will replace other six usages.

Change-Id: I80c713546fcc97391c64e95ef708830632e1ef32
2020-09-22 08:34:20 +08:00
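The six.moves replacements named in the commit above map onto standard library modules. The following is a small sketch of those equivalents; the values used are illustrative only.

```python
import http.client            # instead of six.moves.http_client
import queue                  # instead of six.moves.queue
from urllib import parse      # instead of six.moves.urllib.parse

q = queue.Queue()
for i in range(3):            # built-in range instead of six.moves.range
    q.put(i)

query = parse.urlencode({"limit": 10})
ok_status = http.client.OK
```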
Q.hongtao aba14934e7 Remove usage of six.add_metaclass
With python 3.x, classes can use the metaclass= logic
to not require usage of the six library.

Subsequent patches will replace other six usages.

Change-Id: Iefdc99c338c7aaea18d535426c4676dbedb44f32
2020-09-19 11:37:24 +08:00
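A minimal sketch of the metaclass change described in the commit above. The class names `Action` and `Echo` are hypothetical; only the `metaclass=` keyword replacing the `six.add_metaclass` decorator is the point.

```python
import abc


# Python 3 syntax; previously written as @six.add_metaclass(abc.ABCMeta).
class Action(metaclass=abc.ABCMeta):
    @abc.abstractmethod
    def run(self):
        """Execute the action."""


class Echo(Action):
    def run(self):
        return "echo"
```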
Renat Akhmerov 2b7a2bba01 Remove one more self.notify() call from class Task
* Getting rid of another self.notify() call in the Task class
  that's not inside of the set_state() method. That also gave
  an opportunity to not manage started_at timestamp out of
  the method set_state().

Change-Id: Ib8f61481a606fe4fc9f37112ef625b8e3c6d5cd3
2020-06-04 15:31:41 +07:00
Renat Akhmerov 90d1f1ba8e Refactor workflow notifications
* Moved all notification management for workflows into the method
  Workflow.set_state(). It's now in one place. Workflow events are
  now also identified in one method similar to how it works for
  tasks based on state transitions.
* Other style changes.

Change-Id: I40941ecca3eb4b46a06a2f7dc2fd5d909d5d087a
2020-06-03 18:44:18 +07:00
Renat Akhmerov b55dbdea68 Refactor task notifications
* All calls to a notifier within the Task class have now been
  moved into the method set_state() so that the relation between
  a state change and a notification is now straightforward and the
  notification calls don't have to be spread out across different
  modules.

Change-Id: I9c0647235e1439049d3e7db13f19bef542f10508
2020-06-03 17:51:41 +07:00
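The idea in the commit above, tying every notification to a state change inside set_state(), can be sketched as follows. This is a simplified stand-in, not Mistral's engine.tasks.Task: the constructor arguments, the state names and the notifier signature are all assumptions.

```python
class Task:
    """Hypothetical stand-in for an engine task."""

    def __init__(self, name, notifier):
        self.name = name
        self.state = "IDLE"
        self._notifier = notifier  # assumed callable(name, old_state, new_state)

    def set_state(self, new_state):
        old_state = self.state
        self.state = new_state
        # The only place a notification is sent: on an actual state change.
        if new_state != old_state:
            self._notifier(self.name, old_state, new_state)


events = []
task = Task("t1", lambda *args: events.append(args))
task.set_state("RUNNING")
task.set_state("RUNNING")  # no transition, so no notification
task.set_state("SUCCESS")
```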
Renat Akhmerov a620dabb78 Simplify setting task "started_at" and "finished_at"
* Moving the responsibility to manage values of these timestamps
  into the method Task.set_state() so that this logic is now
  fully associated with how task execution state changes.

Change-Id: I13a5a5921dea06cee7f3efd53af5c327fe89a180
2020-06-02 16:53:51 +07:00
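A sketch of how timestamp handling might be folded into a set_state() method, as the commit above describes. The class, the set of terminal states and the optional `now` parameter are assumptions made for illustration.

```python
from datetime import datetime, timezone

TERMINAL_STATES = {"SUCCESS", "ERROR", "CANCELLED"}  # assumed state names


class TaskExecutionSketch:
    """Hypothetical stand-in for a task execution object."""

    def __init__(self):
        self.state = "IDLE"
        self.started_at = None
        self.finished_at = None

    def set_state(self, state, now=None):
        now = now or datetime.now(timezone.utc)
        # started_at is set once, on the first transition to RUNNING.
        if state == "RUNNING" and self.started_at is None:
            self.started_at = now
        # finished_at follows the transition to a terminal state.
        if state in TERMINAL_STATES:
            self.finished_at = now
        self.state = state
```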
Renat Akhmerov 7dec19ae19 Fix calculating task execution result for "with-items"
* The logic of calculating a task result in case of "with-items" was
  overcomplicated and broke encapsulation of a "with-items" task.
  This patch makes it simpler, so that the method doesn't need to
  peek into the internals of a "with-items" task (e.g. runtime_context).

Change-Id: I036193cbae15d7f3c3414b123525ceafa91fdeb1
2020-06-02 16:28:42 +07:00
Renat Akhmerov ddf9577785 Refactor task policies
* The purpose of this patch is to improve encapsulation of task
  execution state management. We already have the class Task
  (engine.tasks.Task) that represents an engine task and it is
  supposed to be responsible for everything related to managing
  persistent state of the corresponding task execution object.
  However, we break this encapsulation in many places and various
  modules manipulate task execution state directly. This fact
  leads to what is called "spaghetti code" because important
  things are often spread out across the system and it's hard to
  maintain. It also leads to lots of duplication. So this patch
  refactors policies so that they manipulate a task execution
  through an instance of Task which hides low level aspects.

Change-Id: Ie728bf950c4244db3fec0f3dadd5e195ad42081d
2020-06-01 14:05:49 +07:00
Zuul 0237898d59 Merge "Remove OpenStack actions from mistral" 2020-03-06 09:08:24 +00:00
Eyal 8bdf341af7 Remove OpenStack actions from mistral
Depends-on: https://review.opendev.org/#/c/703296/
Depends-On: https://review.opendev.org/#/c/704280/
Change-Id: Id62fdabe7699e7c3b2977166e253cfc77779e467
2020-02-26 10:12:01 +02:00
Renat Akhmerov 592981f487 Refactor expressions
* This patch moves code related to YAQL and Jinja into their
  specific modules so that there isn't any module that works with
  both. It makes it easier to understand how code related to one
  of these technologies works.
* Custom built-in functions for YAQL and Jinja are now in a
  separate module. It's now easier to see what belongs to
  the expression framework and what belongs to the integration part,
  i.e. the functions themselves.
* Renamed the base module of expressions similar to other packages.
* Other style changes.

Change-Id: I94f57a6534b9c10e202205dfae4d039296c26407
2020-02-26 12:36:34 +07:00
Oleg Ovcharuk 95d9f899db Extend task and workflow notification data
Change-Id: I93c1e9ed166847aea07531f98a9924a728efbab3
Signed-off-by: Oleg Ovcharuk <vgvoleg@gmail.com>
2020-02-20 10:55:46 +00:00
Renat Akhmerov 6dc0c05f04 Fix adhoc actions
* Method _create_action_execution() for AdHocAction didn't have
  the right signature. It was missing the argument "namespace" and
  failed under some conditions. This patch does some refactoring
  to preserve the target namespace during action init time. For
  regular Python actions it's just taken from its action definition
  object. For ad-hoc actions it is taken from its definition also
  but it has to do it separately because it extends the class
  PythonAction passing a base action definition into it as a parameter
  of the initializer (so that extracts the namespace of the base action).
  The benefit of preserving a namespace value during init time is that
  it becomes available for the entire instance life-span, not only for
  the method _create_action_execution().
* Style changes (blank lines, indentation, formatting).

Change-Id: I84d1cd0fb4a746197ad890276f654cd12455603e
2020-02-19 14:48:35 +07:00
Renat Akhmerov 8d75784356 Add "convert_output_data" config property for YAQL
* Adding "convert_output_data" config property gives an opportunity
  to increase overall performance. If YAQL always converts an expression
  result, it often takes significant CPU time and overall workflow
  execution time increases. It is especially important when a workflow
  publishes lots of data into the context and uses big workflow
  environments. It's been tested on a very big workflow (~7k tasks)
  with a big workflow environment (~2.5mb) that often uses the YAQL
  function "<% env() %>". This function basically just returns the
  workflow environment.
* Created all necessary unit tests.
* Other style fixes.

Change-Id: Ie3169ec884ec9a0e7e50327dd03cd78dcda0a39b
2020-02-13 17:31:41 +07:00
Renat Akhmerov 829e822581 Init profiler for a new thread in post_tx_queue.py
* Initialization of profiler was also missing for a thread
  spawned within post_tx_queue.py so we were losing important
  profiling info.
* Changed the profiler test since its logic was already obsolete.
  Now we initialize profiler in every thread so the only reason to
  not get any profiler traces when a workflow completed is
  "enabled = False" in the "profiler" group in the configuration.
* Added more profiler traces
* Small readability changes in the workflow language spec

Change-Id: I35e6711f8e10bb08d7e842f4bca8753b929328fd
2020-02-07 13:42:55 +07:00
Zuul f1be1dd955 Merge "Update hacking and fix warnings" 2020-01-09 17:23:55 +00:00
ali 20c3408692 Add namespaces to Ad-Hoc actions
Added namespaces for actions. Actions can have the same name if they
are not in the same namespace. When executing an action, if an action
with that name is not found in the workflow namespace or the given
namespace, Mistral will look for that action in the default namespace.

  * An action base can only be in the same namespace or in the
    default namespace.
  * Namespaces are not part of the Mistral DSL.
  * The default namespace is an empty string ''.
  * All actions will be in a namespace; if not specified, they will be
    under the default namespace.

Depends-On: I61acaed1658d291798e10229e81136259fcdb627
Change-Id: I07862e30adf28404ec70a473571a9213e53d8a08
Partially-Implements: blueprint create-and-run-workflows-within-a-namespace
Signed-off-by: ali <ali.abdelal@nokia.com>
2020-01-07 08:10:53 +00:00
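The lookup rule from the commit above (try the given namespace first, then fall back to the default namespace '') can be sketched like this. The function `find_action` and the dictionary-based registry are hypothetical and do not reflect Mistral's actual DB API.

```python
DEFAULT_NAMESPACE = ""  # the default namespace is an empty string


def find_action(registry, name, namespace=DEFAULT_NAMESPACE):
    """Look up an action; registry maps (namespace, name) to a definition."""
    action = registry.get((namespace, name))
    # Fall back to the default namespace when nothing is found.
    if action is None and namespace != DEFAULT_NAMESPACE:
        action = registry.get((DEFAULT_NAMESPACE, name))
    return action


registry = {
    ("", "greet"): "default greet",
    ("team_a", "greet"): "team_a greet",
    ("", "ping"): "default ping",
}
```

With this registry, `find_action(registry, "greet", "team_a")` resolves to the namespaced definition, while `find_action(registry, "ping", "team_a")` falls back to the default one.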
Eyal a0663305e5 Update hacking and fix warnings
Change-Id: I47a17e140f1686e901c67c034105eeec1c421ae7
2020-01-02 17:18:38 +02:00
Zuul 7e0c7c92b7 Merge "Enlarge tags support" 2019-12-09 08:40:38 +00:00
Renat Akhmerov f61929a3c8 Implement engine graceful shutdown
* The functionality of graceful engine shutdown is now possible
  due to correct calculation of the "graceful" flag in the engine
  server's stop() method. Unfortunately, the Oslo Service framework
  doesn't pass it correctly, it simply ignores it in the call chain.
  So the only way to understand if the shutdown is graceful is to
  peek at the configuration property "graceful_shutdown_timeout"
  provided by Oslo Service. If it's greater than zero then we can
  treat it as graceful.
* Oslo Service handles only four OS signals: SIGTERM, SIGINT,
  SIGHUP and SIGALRM. Only sending SIGTERM to the process leads
  to a graceful shutdown. For example, SIGINT (which is equal to
  ctrl + C in a unix shell) interrupts the process immediately.
  So the only way to do a graceful shutdown of an engine instance
  using a unix shell is to run the "kill <PID>" command. This
  needs to be taken into account when using it.
* The patch also changes the order in which the engine server
  stops its inner services so that the underlying RPC server
  (currently Oslo Messaging based or Kombu based) stops first.
  This is needed to make sure that, first of all, no new RPC
  calls can arrive, and thereby, let all active DB transactions
  finish normally w/o starting new ones. Stopping the RPC server
  may be a heavy operation if there are already lots of RPC
  messages waiting for processing that are polled from the queue.
  So to a great extent the entire functionality of graceful
  shutdown will depend on whether an underlying RPC server
  implements the corresponding functionality in the proper way,
  i.e. after calling stop(graceful=True) it will stop receiving
  new calls and wait till all buffered RPC messages are processed
  normally.
* The maximum time given to graceful shutdown is controlled via
  the "graceful_shutdown_timeout" configuration option, which is
  60 seconds, by default.
* Minor refactoring

Implements blueprint: mistral-graceful-scale-in

Change-Id: I6d1234dfa21b1e3420ec9ca2c5235dee973748ee
2019-12-06 09:29:26 +00:00
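Two points from the commit above lend themselves to a short sketch: deriving the "graceful" flag from a timeout greater than zero, and stopping the RPC server before the other inner services. The classes below are hypothetical stand-ins that do not use Oslo Service or Oslo Messaging.

```python
class RecordingService:
    """Hypothetical stoppable service that records how it was stopped."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def stop(self, graceful):
        self.log.append((self.name, graceful))


class EngineServerSketch:
    def __init__(self, rpc_server, inner_services, graceful_shutdown_timeout):
        self.rpc_server = rpc_server
        self.inner_services = inner_services
        self.graceful_shutdown_timeout = graceful_shutdown_timeout

    def stop(self):
        # A timeout greater than zero is treated as a graceful shutdown.
        graceful = self.graceful_shutdown_timeout > 0
        # Stop the RPC server first so that no new calls can arrive.
        self.rpc_server.stop(graceful)
        for service in self.inner_services:
            service.stop(graceful)


log = []
server = EngineServerSketch(
    RecordingService("rpc", log),
    [RecordingService("scheduler", log)],
    graceful_shutdown_timeout=60,
)
server.stop()
```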
Oleg Ovcharuk e596ee2e63 Enlarge tags support
Workflow and task executions will inherit tags from
definition. Executions filtering by tag is included.

Change-Id: Id5d615b829901258af2be7ca99178ad92b60d1fb
Closes-Bug: #1853457
Signed-off-by: Oleg Ovcharuk <vgvoleg@gmail.com>
2019-11-29 18:40:37 +03:00
Zuul a9a7a99237 Merge "Make action heartbeats work for all executor types" 2019-11-18 04:49:28 +00:00
Dougal Matthews a25c8fab88 Mask sensitive data when logging action results
When passwords or other sensitive data are returned by an Action, they can
be logged by Mistral. This change uses the password masking
functionality used in mistral-lib and provided by oslo-utils.

This function uses the standard cut_repr method in mistral-lib, which
also means the output is more standardised.

Related-Bug: #1850843
Change-Id: I01bf47f7a83102a1a16b15bf0bbb4021707e11fe
2019-11-15 12:20:05 +00:00
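To illustrate the kind of masking the commit above refers to, here is a simplified standard-library sketch. It is not the oslo-utils or mistral-lib implementation; the key list and the function name `mask_sensitive` are assumptions chosen for the example.

```python
SENSITIVE_KEYS = {"password", "token", "secret"}  # hypothetical key list


def mask_sensitive(data, mask="***"):
    """Return a copy of data with values of sensitive keys replaced."""
    if isinstance(data, dict):
        return {
            key: mask if key in SENSITIVE_KEYS else mask_sensitive(value, mask)
            for key, value in data.items()
        }
    if isinstance(data, list):
        return [mask_sensitive(item, mask) for item in data]
    return data


result = {"user": "admin", "password": "s3cret", "nested": [{"token": "abc"}]}
masked = mask_sensitive(result)
```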
Renat Akhmerov 7ec4f26744 Make action heartbeats work for all executor types
* Previously action heartbeats didn't work in case of using local
  executors because the component responsible for sending heartbeats
  was started by the executor RPC server which doesn't make sense to
  initialize for a local executor. This patch refactors the code
  so that now heartbeats get sent for any type of executors. For
  local executors it is also useful because a cluster node that
  runs an engine and a local executor may also crash. With this
  change, remaining cluster nodes will be able to understand that
  the action will never complete and one of them will time it out.
  If all is fine with the node where the local executor is running
  then heartbeats will be sent normally and the action won't time
  out. Before this change, in case of local executors a long running
  action would always time out after a configured amount of time
  (by default, 60 mins) just because local executors never sent
  heartbeats.
* Renamed many things to make it clear what each component is
  responsible for.
* Wrote the tests that check the heartbeat sender, both positive
  and negative scenarios for local and remote executor types.

Closes-Bug: #1852722

Change-Id: I4d0fdff54de9bee70aeaf10a4ef483ad7000840b
2019-11-15 16:44:40 +07:00
Zuul c11d8eade9 Merge "Refactor rerun of joins" 2019-11-12 09:32:56 +00:00
Zuul 1c7e242975 Merge "Reformat rerun logic for tasks with join" 2019-11-11 14:42:37 +00:00
Renat Akhmerov 59bf2509eb Refactor rerun of joins
* This patch moves logic that schedules a task state refreshing
  periodic job in case of rerun from the Task class to
  task_handler.run_task() so that Task doesn't have to know any
  language specific details and call task handler back. It is
  more architecturally clean.

Change-Id: If7a054bbf77f9ed761d8f3ac36b6d329544f5ff5
2019-11-11 17:10:16 +07:00
Renat Akhmerov fd24972bef Fix task expression context
* When Mistral prepares a context view to evaluate a YAQL/Jinja
  expression it needs to put "task_ex.in_context" before
  "wf_ex.context" because the first one should take higher priority.
  For example, if a workflow declares a variable (via the "var"
  keyword) and then this variable is updated by one of the workflow
  branches then it should shadow the initial value of the variable
  when evaluating an expression (e.g. in the action input).
* We also don't need to use "ctx or self.ctx" in the modified
  _evaluate_expression() function because "self.ctx" always becomes
  "task_ex.in_context" when a task execution is created.
* Added one more test to check data flow correctness.

Change-Id: Ib9a0e2b3f5cc686cbc53d9e6c049ad7fdc12c76d
2019-11-07 17:46:20 +07:00
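The priority rule in the commit above, where the task's inbound context shadows the workflow context, can be illustrated with `collections.ChainMap`. This is only an analogy: Mistral uses its own context view class, and the dictionaries here are made-up data.

```python
from collections import ChainMap

wf_context = {"var": "initial", "region": "eu"}  # stand-in for wf_ex.context
in_context = {"var": "updated"}                  # stand-in for task_ex.in_context

# Earlier mappings take priority, so in_context is listed first.
view = ChainMap(in_context, wf_context)

shadowed = view["var"]        # value updated by a workflow branch
inherited = view["region"]    # value only present in the workflow context
```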
Eyal a68136d1e4 Evaluate input expression should check the in_context
The workflow in the test fails because contextView
does not evaluate in_context

Closes-Bug: #1850315
Change-Id: I54a4cd38e962d363fd2626476bcae9ec0aa8dad6
2019-10-30 06:15:52 +00:00
Zuul 772043881d Merge "Log the original exception in is_sync" 2019-10-09 16:51:15 +00:00
Dougal Matthews c0857a7a95 Log the original exception in is_sync
Change-Id: Id6521ba37dc5ccff727b06aecd573ac600ea8711
2019-10-03 09:31:29 +01:00
Renat Akhmerov 7a6aac0f5f Fix "root_execution" lazy loading issue and refactor execution.py
* There's an issue with a lazily loaded field of the WorkflowExecution
  model occurring on GET /v2/execution/<id> because the logic
  that calculates "published_global" of the execution REST resource
  hits the "root_execution" field out of transaction scope indirectly
  within the "data_flow.get_workflow_environment_dict" method.
  This patch refactors this logic and calculates
  globally published variables of the workflow execution simply
  as its context that doesn't contain all internal data like
  "__execution" and "openstack".
* Other style change.

Closes-Bug: #1846152
Change-Id: Ic8609e55930e2ed13653e79e8ca7a31c951d9030
2019-10-02 11:02:52 +00:00
ali 7e7f1cb92b moved generic util functions from mistral to mistral-lib
Depends-On: I780c270e4b1a184d7d4dcc580d23697ba75edab1
Closes-bug: #1815183
Change-Id: I5a1d402baa3f69c37f9347c8b3d02a83b8f60423
2019-09-13 04:06:27 +00:00
Zuul a9fec62cd7 Merge "Improve workflow notifications and webhook data" 2019-09-06 12:38:59 +00:00
Renat Akhmerov c99b87a8c8 Check if workflow execution is empty in integrity checker
* Method get_workflow_execution() raises an exception if the workflow
  execution does not exist. Since this is a valid case (the method
  may be called via scheduler after the execution is deleted)
  we need to smoothly handle it.

Change-Id: Ibd6099f1e0fd07c71130f11457b355a367229977
2019-09-05 13:47:14 +07:00
Andras Kovi 5eb2a21607 Improve workflow notifications and webhook data
The task_execution_id is required to be able to restore the hierarchy
of tasks and workflows on the notification receiver side. Also, including
the event in the notification is very useful.

Also fix the documentation as multiline strings are not supported in
ini files.

Change-Id: I714fd5c32b0f31f85ac5a4d22d161e662bf18687
2019-09-04 07:12:20 +02:00
Renat Akhmerov f92a5c8f44 Fix 'with-items' expression evaluation
* There was a bug left after the recent refactoring. While
  evaluating 'with-items' expression we didn't construct a context
  view properly, it didn't include a workflow environment. This
  patch fixes it.

Closes-Bug: #1839840
Change-Id: I3df711ef2484374418085fe0117fe8b37ce5ba3f
2019-09-04 03:57:03 +00:00
Zuul 2cdcb5415e Merge "Improve new scheduler" 2019-08-28 20:24:07 +00:00
Renat Akhmerov 0f6bc1897f Improve new scheduler
* Changed method get_scheduled_jobs_count() in the Scheduler
  interface to has_scheduled_jobs(). In fact, the callers
  always need to compare the number of jobs with 0, i.e.
  to see if there are any jobs or not. But more importantly,
  these semantics (returning just a boolean) allow a
  good optimisation for DefaultScheduler that avoids DB calls
  in a number of cases. Practically, for example, it saves
  several seconds (5-6) if we run a workflow with 500 parallel
  no-op tasks that are all merged with one "join" task. Tested
  on 1 and 2 engines.
* Added test assertions for has_scheduled_jobs()
* Other minor changes

Change-Id: Ife48d9e464114fd60a08707d8f32f847a6f623c9
2019-08-16 13:39:39 +07:00
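The commit above does not spell out how DefaultScheduler avoids DB calls, so the following is only one plausible sketch of why a boolean result helps: a match found in memory answers the question without querying the database. The class, the in-memory job set and the `db_has_jobs` callable are all assumptions.

```python
class SchedulerSketch:
    """Hypothetical scheduler; db_has_jobs simulates a DB query."""

    def __init__(self, db_has_jobs):
        self._local_jobs = set()
        self._db_has_jobs = db_has_jobs
        self.db_calls = 0

    def schedule(self, key):
        self._local_jobs.add(key)

    def has_scheduled_jobs(self, key):
        # A locally known job answers the question without touching the DB.
        if key in self._local_jobs:
            return True
        self.db_calls += 1
        return self._db_has_jobs(key)


scheduler = SchedulerSketch(db_has_jobs=lambda key: False)
scheduler.schedule("refresh_task_state")
found_locally = scheduler.has_scheduled_jobs("refresh_task_state")
found_in_db = scheduler.has_scheduled_jobs("other_job")
```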