hacking 3.0.x is too old.
Also remove the note about pip's behavior which was already fixed in
recent versions.
Change-Id: I65d350943649c3346ed5741631c01724ddd256ef
This feature introduces an enhanced error-handling
mechanism for workflows, allowing them to gracefully
handle issues within individual tasks without
causing a complete workflow failure. Previously,
when using subworkflow and passing an incomplete set
of parameters, the entire workflow would terminate.
With this feature, the workflow continues execution,
isolating errors at the task level. Consequently,
partial issues in one task no longer impact other
branches of the workflow execution.
Implements blueprint partial-workflow-failure-handling
Change-Id: Id6a910c85c1d6953408682a2a724c4826333422f
In error cases a join task could lose the context of some branches.
Change-Id: I58a94c4ebc5d860473c9b48df326f6ea29cba9fa
Closes-Bug: #2020370
Signed-off-by: Oleg Ovcharuk <vgvoleg@gmail.com>
After this patch, Mistral will run tasks using RPC, which distributes
tasks amongst the available engine threads. This improves performance
when executing huge workflow executions containing many tasks.
Implements: blueprint distribute-mistral-operations
Change-Id: I0b7202589eee68ba5560bf2aa60fbbd6118f3719
After this patch, a user can update the logging format to include root_execution_id in logs, which helps to find and debug logs related to a specific workflow execution.
- Logs about the creation and status changes of Mistral entities (execution,
task, action execution, etc.) are now emitted at the INFO log level.
- User can update logging_context_format_string to include root_execution_id in logs.
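As an illustration, the oslo.log context format string could be extended like this (the surrounding format placeholders are a simplified example, not Mistral's default):

```ini
[DEFAULT]
# Example only: add root_execution_id to the per-request log context.
logging_context_format_string = %(asctime)s %(process)d %(levelname)s %(name)s [%(request_id)s %(root_execution_id)s] %(message)s
```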
Implements: blueprint improve-mistral-loggers
Change-Id: I54fe058e5451abba6ea7f69d03d498d78a90993e
This patch adds the ability to rerun a failed workflow by
skipping failed tasks. Workflow behavior in the skip case can
be configured via new fields in the task definition:
* on-skip
* publish-on-skip
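A minimal sketch of how these fields might be used in a workflow definition (the workflow, task, and published variable names are hypothetical; only on-skip and publish-on-skip come from this patch):

```yaml
---
version: '2.0'

example_wf:
  tasks:
    create_resource:
      action: std.noop   # hypothetical task that may be skipped on rerun
      on-skip:
        - notify_skip
      publish-on-skip:
        resource_id: null

    notify_skip:
      action: std.echo output="create_resource was skipped"
```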
Change-Id: Ib802a1b54e69c29b4d0361f048c2b9c076a4c176
Implements: blueprint mistral-task-skipping-feature
Signed-off-by: Oleg Ovcharuk <vgvoleg@gmail.com>
ABCs in collections should be imported from collections.abc; direct
import from collections has been deprecated since Python 3.3.
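For example, the deprecated form and its replacement:

```python
# Deprecated since Python 3.3 (the aliases were removed in Python 3.10):
#   from collections import Mapping
# Correct import path for the abstract base classes:
from collections.abc import Mapping

# dict is registered as a virtual subclass of Mapping either way.
print(issubclass(dict, Mapping))  # → True
```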
Closes-Bug: #1936667
Change-Id: Ide8aa0323d9713c1c2ea0abf3b671ca4dab95ef0
* It's clear now that we don't have to store an action specification
as part of the corresponding action execution object because
the notion of an action specification itself is specific to a certain
type of action. In our case, ad-hoc actions.
All changes recently made in the Mistral layers above the engine
prove the correctness of this thought. The comment can be safely
deleted.
Change-Id: I45b97b08184c8d5a88bcc537fb5b1e538f105554
* This patch refactors Mistral with the action provider concept
that is responsible for delivering actions to the system. So
it takes all the burden of managing action definitions w/o
having to spread that across multiple subsystems like Engine
and API and w/o having to assume that action definitions are
always stored in DB.
* Added LegacyActionProvider that represents the old way of
delivering action definitions to the system. It pretty much just
analyses which entries are configured under the entry point
"mistral.actions" in setup.cfg and builds a collection of
corresponding Python action classes in memory, accessible by name.
* The module mistral/services/actions.py is now renamed to
adhoc_actions.py because it's effectively responsible only for
ad-hoc actions (those defined in YAML).
* Added the new entry point "mistral.action.providers" in setup.cfg
to register action provider classes.
* Added the module mistral/services/actions.py that will be a facade
for action providers. Engine and other subsystems will need to
work with it.
* Other small code changes.
Depends-On: I13033253d5098655a001135c8702d1b1d13e76d4
Depends-On: Ic9108c9293731b3576081c75f2786e1156ba0ccd
Change-Id: I8e826657acb12bbd705668180f7a3305e1e597e2
Remove six.moves. Replace the following items with Python 3 style code.
- six.moves.urllib
- six.moves.queue
- six.moves.range
- six.moves.http_client
Subsequent patches will replace other six usages.
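The listed aliases map directly onto the standard library, for example:

```python
# Python 3 replacements for the removed six.moves aliases:
# six.moves.urllib.parse -> urllib.parse
# six.moves.queue        -> queue
# six.moves.range        -> range (built-in)
# six.moves.http_client  -> http.client
import http.client
import queue
from urllib import parse

print(parse.quote("a b"))          # → a%20b
print(list(range(3)))              # → [0, 1, 2]
print(http.client.responses[200])  # → OK
```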
Change-Id: I80c713546fcc97391c64e95ef708830632e1ef32
With Python 3.x, classes can use the metaclass= syntax
and no longer require the six library.
Subsequent patches will replace other six usages.
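The pattern being replaced, sketched with a hypothetical metaclass (the class names here are illustrative, not Mistral's):

```python
# Before (six):
#   import six
#   class Resource(six.with_metaclass(ResourceMeta, object)):
#       pass
# After (native Python 3 syntax):
class ResourceMeta(type):
    """Hypothetical metaclass that records every class it creates."""
    created = []

    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)
        mcs.created.append(name)
        return cls


class Resource(metaclass=ResourceMeta):
    pass


print(type(Resource) is ResourceMeta)  # → True
```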
Change-Id: Iefdc99c338c7aaea18d535426c4676dbedb44f32
* Getting rid of another self.notify() call in the Task class
that's not inside of the set_state() method. That also gave
an opportunity to stop managing the started_at timestamp
outside of the method set_state().
Change-Id: Ib8f61481a606fe4fc9f37112ef625b8e3c6d5cd3
* Moved all notification management for workflows into the method
Workflow.set_state(). It's now in one place. Workflow events are
now also identified in one method similar to how it works for
tasks based on state transitions.
* Other style changes.
Change-Id: I40941ecca3eb4b46a06a2f7dc2fd5d909d5d087a
* All calls to a notifier within the Task class have now been
moved into the method set_state() so that the relation between
a state change and a notification is now straightforward and the
notification calls don't have to be spread out across different
modules.
Change-Id: I9c0647235e1439049d3e7db13f19bef542f10508
* Moving the responsibility to manage values of these timestamps
into the method Task.set_state() so that this logic is now
fully associated with how task execution state changes.
Change-Id: I13a5a5921dea06cee7f3efd53af5c327fe89a180
* The logic of calculating a task result in case of "with-items" was
overcomplicated and broke encapsulation of a "with-items" task.
This patch makes it simpler, so that the method doesn't need to
peek into the internals of a "with-items" task (e.g. runtime_context).
Change-Id: I036193cbae15d7f3c3414b123525ceafa91fdeb1
* The purpose of this patch is to improve encapsulation of task
execution state management. We already have the class Task
(engine.tasks.Task) that represents an engine task and it is
supposed to be responsible for everything related to managing
persistent state of the corresponding task execution object.
However, we break this encapsulation in many places and various
modules manipulate task execution state directly. This
leads to what is called "spaghetti code": important
things are often spread out across the system and hard to
maintain. It also leads to a lot of duplication. So this patch
refactors policies so that they manipulate a task execution
through an instance of Task, which hides low-level aspects.
Change-Id: Ie728bf950c4244db3fec0f3dadd5e195ad42081d
* This patch moves code related to YAQL and Jinja into their
specific modules so that there isn't any module that works with
both. It makes it easier to understand how code related to one
of these technologies works.
* Custom built-in functions for YAQL and Jinja are now in a
separate module. It's now easier to see what belongs to the
expression framework and what belongs to the integration part,
i.e. the functions themselves.
* Renamed the base module of expressions similar to other packages.
* Other style changes.
Change-Id: I94f57a6534b9c10e202205dfae4d039296c26407
* Method _create_action_execution() for AdHocAction didn't have
the right signature. It was missing the argument "namespace" and
failed under some conditions. This patch does some refactoring
to preserve the target namespace at action init time. For
regular Python actions it's just taken from the action definition
object. For ad-hoc actions it is also taken from the definition,
but this has to be done separately because AdHocAction extends the
class PythonAction, passing a base action definition into it as a
parameter of the initializer (which extracts the namespace of the
base action).
The benefit of preserving a namespace value during init time is that
it becomes available for the entire instance life-span, not only for
the method _create_action_execution().
* Style changes (blank lines, indentation, formatting).
Change-Id: I84d1cd0fb4a746197ad890276f654cd12455603e
* Adding the "convert_output_data" config property gives an
opportunity to increase overall performance. If YAQL always
converts an expression
result, it often takes significant CPU time and overall workflow
execution time increases. It is especially important when a workflow
publishes lots of data into the context and uses big workflow
environments. It's been tested on a very big workflow (~7k tasks)
with a big workflow environment (~2.5 MB) that often uses the YAQL
function "<% env() %>". This function basically just returns the
workflow environment.
* Created all necessary unit tests.
* Other style fixes.
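A sketch of the corresponding configuration (the group name is an assumption based on the text; verify against the generated sample config):

```ini
[yaql]
# Example only: skip the potentially expensive conversion of YAQL
# expression results to save CPU time on large published data.
convert_output_data = False
```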
Change-Id: Ie3169ec884ec9a0e7e50327dd03cd78dcda0a39b
* Initialization of the profiler was also missing for a thread
spawned within post_tx_queue.py, so we were losing important
profiling info.
* Changed the profiler test since its logic was already obsolete.
Now we initialize profiler in every thread so the only reason to
not get any profiler traces when a workflow completed is
"enabled = False" in the "profiler" group in the configuration.
* Added more profiler traces
* Small readability changes in the workflow language spec
Change-Id: I35e6711f8e10bb08d7e842f4bca8753b929328fd
Added namespaces for actions. Actions can have the same name if they
are not in the same namespace. When executing an action, if an action
with that name is not found in the workflow namespace or the given
namespace, Mistral will look for that action in the default namespace.
* An action base can only be in the same namespace or in the
default namespace.
* Namespaces are not part of the Mistral DSL.
* The default namespace is an empty string ''.
* All actions will be in a namespace; if not specified, they will be
under the default namespace.
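The lookup order described above can be sketched as follows (the registry structure and names are illustrative, not Mistral's actual API):

```python
DEFAULT_NAMESPACE = ''

# Illustrative registry: (namespace, action name) -> action.
ACTIONS = {
    ('', 'std.echo'): 'default echo action',
    ('tenant_a', 'std.echo'): 'tenant_a echo action',
}


def find_action(name, namespace=DEFAULT_NAMESPACE):
    """Look up an action in the given namespace, falling back to default."""
    if (namespace, name) in ACTIONS:
        return ACTIONS[(namespace, name)]
    # Not found in the given namespace: fall back to the default one.
    return ACTIONS.get((DEFAULT_NAMESPACE, name))


print(find_action('std.echo', 'tenant_a'))  # → tenant_a echo action
print(find_action('std.echo', 'tenant_b'))  # → default echo action
```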
Depends-On: I61acaed1658d291798e10229e81136259fcdb627
Change-Id: I07862e30adf28404ec70a473571a9213e53d8a08
Partially-Implements: blueprint create-and-run-workflows-within-a-namespace
Signed-off-by: ali <ali.abdelal@nokia.com>
* The functionality of graceful engine shutdown is now possible
due to correct calculation of the "graceful" flag in the engine
server's stop() method. Unfortunately, the Oslo Service framework
doesn't pass it correctly, it simply ignores it in the call chain.
So the only way to understand if the shutdown is graceful is to
peek at the configuration property "graceful_shutdown_timeout"
provided by Oslo Service. If it's greater than zero then we can
treat it as graceful.
* Oslo Service handles only four OS signals: SIGTERM, SIGINT,
SIGHUP and SIGALRM. Only sending SIGTERM to the process leads
to a graceful shutdown. For example, SIGINT (which is equal to
ctrl + C in a unix shell) interrupts the process immediately.
So the only way to do a graceful shutdown of an engine instance
using a unix shell is to run the "kill <PID>" command. This
needs to be taken into account when using it.
* The patch also changes the order in which the engine server
stops its inner services so that the underlying RPC server
(currently Oslo Messaging based or Kombu based) stops first.
This is needed to make sure that, first of all, no new RPC
calls can arrive, and thereby, let all active DB transactions
finish normally w/o starting new ones. Stopping the RPC server
may be a heavy operation if there are already lots of RPC
messages waiting for processing that are polled from the queue.
So to a great extent the entire functionality of graceful
shutdown will depend on whether an underlying RPC server
implements the corresponding functionality in the proper way,
i.e. after calling stop(graceful=True) it will stop receiving
new calls and wait till all buffered RPC messages are processed
normally.
* The maximum time given to graceful shutdown is controlled via
the "graceful_shutdown_timeout" configuration option, which is
60 seconds, by default.
* Minor refactoring
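The relevant oslo.service option, with the default mentioned above:

```ini
[DEFAULT]
# Maximum time (seconds) the engine gets to finish in-flight work
# after receiving SIGTERM. 60 is the default.
graceful_shutdown_timeout = 60
```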
Implements blueprint: mistral-graceful-scale-in
Change-Id: I6d1234dfa21b1e3420ec9ca2c5235dee973748ee
Workflow and task executions will inherit tags from their
definition. Filtering executions by tag is included.
Change-Id: Id5d615b829901258af2be7ca99178ad92b60d1fb
Closes-Bug: #1853457
Signed-off-by: Oleg Ovcharuk <vgvoleg@gmail.com>
When passwords or other sensitive data are returned by an action,
they can be logged by Mistral. This change uses the password-masking
functionality used in mistral-lib and provided by oslo.utils.
This function uses the standard cut_repr method in mistral-lib, which
also means the output is more standardised.
Related-Bug: #1850843
Change-Id: I01bf47f7a83102a1a16b15bf0bbb4021707e11fe
* Previously action heartbeats didn't work in case of using local
executors because the component responsible for sending heartbeats
was started by the executor RPC server which doesn't make sense to
initialize for a local executor. This patch refactors the code
so that now heartbeats get sent for any type of executors. For
local executors it is also useful because a cluster node that
runs an engine and a local executor may also crash. With this
change, remaining cluster nodes will be able to understand that
the action will never complete and one of them will time it out.
If all is fine with the node where the local executor is running
then heartbeats will be sent normally and the action won't time
out. Before this change, in case of local executors a long running
action would always time out after a configured amount of time
(by default, 60 mins) just because local executors never sent
heartbeats.
* Did a lot of renaming to make it clear what each component is
responsible for.
* Wrote the tests that check the heartbeat sender, both positive
and negative scenarios for local and remote executor types.
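The core idea can be sketched as a sender that periodically reports liveness, decoupled from any RPC server (class and callback names here are illustrative, not Mistral's):

```python
import threading
import time


class HeartbeatSender:
    """Illustrative heartbeat sender, started for any executor type."""

    def __init__(self, report, interval=1.0):
        self._report = report        # callback that persists the heartbeat
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def _run(self):
        # wait() returns False on timeout, True once stop() is called.
        while not self._stop.wait(self._interval):
            self._report()


beats = []
sender = HeartbeatSender(lambda: beats.append(time.time()), interval=0.05)
sender.start()
time.sleep(0.2)
sender.stop()
print(len(beats) > 0)  # → True
```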
Closes-Bug: #1852722
Change-Id: I4d0fdff54de9bee70aeaf10a4ef483ad7000840b
* This patch moves logic that schedules a task state refreshing
periodic job in case of rerun from the Task class to
task_handler.run_task() so that Task doesn't have to know any
language-specific details and call the task handler back. This is
architecturally cleaner.
Change-Id: If7a054bbf77f9ed761d8f3ac36b6d329544f5ff5
* When Mistral prepares a context view to evaluate a YAQL/Jinja
expression it needs to put "task_ex.in_context" before
"wf_ex.context" because the first one should take higher priority.
For example, if a workflow declares a variable (via the "var"
keyword) and then this variable is updated by one of the workflow
branches then it should shadow the initial value of the variable
when evaluating an expression (e.g. in the action input).
* We also don't need to use "ctx or self.ctx" in the modified
_evaluate_expression() function because "self.ctx" always becomes
"task_ex.in_context" when a task execution is created.
* Added one more test to check data flow correctness.
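The priority rule can be illustrated with collections.ChainMap, standing in for Mistral's actual ContextView (the variable names are hypothetical):

```python
from collections import ChainMap

# A workflow declares a variable via the "var" keyword...
wf_ex_context = {'my_var': 'initial'}
# ...and a branch later updates it in the task's inbound context.
task_ex_in_context = {'my_var': 'updated'}

# task_ex.in_context must come first so it shadows wf_ex.context.
view = ChainMap(task_ex_in_context, wf_ex_context)

print(view['my_var'])  # → updated
```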
Change-Id: Ib9a0e2b3f5cc686cbc53d9e6c049ad7fdc12c76d
The workflow in the test fails because ContextView
does not evaluate in_context.
Closes-Bug: #1850315
Change-Id: I54a4cd38e962d363fd2626476bcae9ec0aa8dad6
* There's an issue with a lazily loaded field of the WorkflowExecution
model occurring on GET /v2/execution/<id> because the logic
that calculates "published_global" of the execution rest resource
hits "root_execution" field out of transaction scope indirectly
within the "data_flow.get_workflow_environment_dict" method.
This patch refactors this logic and calculates
globally published variables of the workflow execution simply
as its context that doesn't contain all internal data like
"__execution" and "openstack".
* Other style change.
Closes-Bug: #1846152
Change-Id: Ic8609e55930e2ed13653e79e8ca7a31c951d9030
* Method get_workflow_execution() raises an exception if the workflow
execution does not exist. Since this is a valid case (the method
may be called via scheduler after the execution is deleted)
we need to smoothly handle it.
Change-Id: Ibd6099f1e0fd07c71130f11457b355a367229977
The task_execution_id is required to be able to restore the hierarchy
of tasks and workflows on the notification receiver side. Including
the event in the notification is also very useful.
Also fix the documentation as multiline strings are not supported in
ini files.
Change-Id: I714fd5c32b0f31f85ac5a4d22d161e662bf18687
* There was a bug left after the recent refactoring. While
evaluating 'with-items' expression we didn't construct a context
view properly, it didn't include a workflow environment. This
patch fixes it.
Closes-Bug: #1839840
Change-Id: I3df711ef2484374418085fe0117fe8b37ce5ba3f
* Changed method get_scheduled_jobs_count() in the Scheduler
interface to has_scheduled_jobs(). In fact, the callers
always need to compare the number of jobs with 0, i.e.
to see if there are any jobs or not. But more importantly,
this semantics (returning just a boolean) allows a
good optimisation for DefaultScheduler and avoids DB calls
in a number of cases. Practically, for example, it saves
several seconds (5-6) if we run a workflow with 500 parallel
no-op tasks that are all merged with one "join" task. Tested
on 1 and 2 engines.
* Added test assertions for has_scheduled_jobs()
* Other minor changes
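The optimisation opportunity can be sketched like this (the class and attribute names are illustrative, not Mistral's actual scheduler code): a boolean answer lets the scheduler short-circuit on local state instead of always counting rows in the DB.

```python
class SchedulerSketch:
    """Illustrative only: why returning a boolean enables an optimisation."""

    def __init__(self):
        self._local_job_count = 0  # jobs scheduled by this process
        self.db_calls = 0

    def schedule(self, job):
        self._local_job_count += 1

    def _db_has_jobs(self):
        self.db_calls += 1  # stands in for a real DB query
        return self._local_job_count > 0

    def has_scheduled_jobs(self):
        # If this process scheduled jobs itself, answer True
        # without touching the DB at all.
        if self._local_job_count > 0:
            return True
        return self._db_has_jobs()


s = SchedulerSketch()
s.schedule('refresh_task_state')
print(s.has_scheduled_jobs())  # → True
print(s.db_calls)              # → 0
```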
Change-Id: Ife48d9e464114fd60a08707d8f32f847a6f623c9