* This patch refactors Mistral with the action provider concept
that is responsible for delivering actions to the system. So
it takes all the burden of managing action definitions w/o
having to spread that across multiple subsystems like Engine
and API and w/o having to assume that action definitions are
always stored in DB.
* Added LegacyActionProvider that represents the old way of
delivering action definitions to the system. It pretty much just
analyses what entries are configured in the entry point
"mistral.actions" in setup.cfg and builds an in-memory collection of
the corresponding Python action classes, accessible by name.
* The module mistral/services/actions.py is now renamed to
adhoc_actions.py because it's effectively responsible only for
ad-hoc actions (those defined in YAML).
* Added the new entry point in setup.cfg "mistral.action.providers"
to register action provider classes.
* Added the module mistral/services/actions.py that will be a facade
for action providers. Engine and other subsystems will need to
work with it.
* Other small code changes.
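
A minimal sketch of the provider/facade idea described above. All class and
method names here (ActionProvider, InMemoryActionProvider, ActionFacade, find)
are hypothetical and chosen for illustration only; they are not taken from
the actual patch.

```python
import abc


class ActionProvider(abc.ABC):
    """Hypothetical provider interface: delivers actions by name."""

    @abc.abstractmethod
    def find(self, name):
        """Return the action class registered under name, or None."""


class InMemoryActionProvider(ActionProvider):
    """Stand-in for a provider that keeps its action classes in memory."""

    def __init__(self, actions):
        self._actions = dict(actions)

    def find(self, name):
        return self._actions.get(name)


class ActionFacade:
    """Hypothetical facade: asks each registered provider in turn."""

    def __init__(self, providers):
        self._providers = list(providers)

    def find(self, name):
        for provider in self._providers:
            action = provider.find(name)
            if action is not None:
                return action
        return None


class EchoAction:
    """Toy action used only to populate the example provider."""

    def run(self, text):
        return text


facade = ActionFacade([InMemoryActionProvider({"std.echo": EchoAction})])
```

With this shape, callers such as the engine only talk to the facade and do
not need to know where a given action definition is stored.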
Depends-On: I13033253d5098655a001135c8702d1b1d13e76d4
Depends-On: Ic9108c9293731b3576081c75f2786e1156ba0ccd
Change-Id: I8e826657acb12bbd705668180f7a3305e1e597e2
With python 3.x, classes can use the metaclass= logic
to not require usage of the six library.
Subsequent patches will replace other six usages.
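
For illustration, the kind of replacement this refers to looks roughly as
follows. The class names Action and Noop are hypothetical; only the
metaclass= syntax and six.add_metaclass are the point.

```python
import abc


# With six (Python 2/3 compatible), the same class would be written as:
#
#     @six.add_metaclass(abc.ABCMeta)
#     class Action(object):
#         ...
#
# On Python 3 the metaclass can be given directly in the class statement.
class Action(metaclass=abc.ABCMeta):
    @abc.abstractmethod
    def run(self):
        """Execute the action."""


class Noop(Action):
    def run(self):
        return None
```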
Change-Id: Iefdc99c338c7aaea18d535426c4676dbedb44f32
Added namespaces for actions. Actions can have the same name if they
are in different namespaces. When executing an action, if an action
with that name is not found in the workflow namespace or the given
namespace, Mistral will look for it in the default namespace.
* The base of an action can only be in the same namespace or in the
default namespace.
* Namespaces are not part of the mistral DSL.
* The default namespace is an empty string ''.
* All actions are in a namespace; if none is specified, they are
placed in the default namespace.
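
The lookup rule above can be sketched as follows. The registry layout (a dict
keyed by (namespace, name)) and the function name find_action are assumptions
made for this example, not the actual Mistral data model.

```python
DEFAULT_NAMESPACE = ''


def find_action(registry, name, namespace=DEFAULT_NAMESPACE):
    """Look up an action in the given namespace, then in the default one."""
    action = registry.get((namespace, name))

    if action is None and namespace != DEFAULT_NAMESPACE:
        # Fall back to the default namespace (the empty string).
        action = registry.get((DEFAULT_NAMESPACE, name))

    return action


# Two actions share a name but live in different namespaces.
registry = {
    ('', 'send_email'): 'default send_email',
    ('team_a', 'send_email'): 'team_a send_email',
    ('', 'cleanup'): 'default cleanup',
}
```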
Depends-On: I61acaed1658d291798e10229e81136259fcdb627
Change-Id: I07862e30adf28404ec70a473571a9213e53d8a08
Partially-Implements: blueprint create-and-run-workflows-within-a-namespace
Signed-off-by: ali <ali.abdelal@nokia.com>
* Previously action heartbeats didn't work when using local
executors because the component responsible for sending heartbeats
was started by the executor RPC server, which doesn't make sense to
initialize for a local executor. This patch refactors the code
so that heartbeats now get sent for any type of executor. For
local executors it is also useful because a cluster node that
runs an engine and a local executor may also crash. With this
change, remaining cluster nodes will be able to understand that
the action will never complete and one of them will time it out.
If all is fine with the node where the local executor is running
then heartbeats will be sent normally and the action won't time
out. Before this change, in case of local executors a long running
action would always time out after a configured amount of time
(by default, 60 mins) just because local executors never sent
heartbeats.
* Renamed many things to make it clear what each component is
responsible for.
* Wrote the tests that check the heartbeat sender, both positive
and negative scenarios for local and remote executor types.
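
A simplified sketch of the two sides of this mechanism: a sender that reports
heartbeats for every running action regardless of executor type, and a check
that finds actions whose heartbeats have stopped. The names HeartbeatSender,
send_once and find_expired are hypothetical, and the periodic background loop
is left out to keep the example deterministic.

```python
class HeartbeatSender:
    """Hypothetical sender shared by local and remote executors."""

    def __init__(self, report):
        self._report = report  # callable taking a list of action IDs
        self._running = set()

    def add(self, action_ex_id):
        self._running.add(action_ex_id)

    def remove(self, action_ex_id):
        self._running.discard(action_ex_id)

    def send_once(self):
        # In a real service this would be called periodically.
        if self._running:
            self._report(sorted(self._running))


def find_expired(last_heartbeat, now, timeout):
    """Return IDs of actions whose last heartbeat is older than timeout."""
    return sorted(
        action_ex_id
        for action_ex_id, ts in last_heartbeat.items()
        if now - ts > timeout
    )
```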
Closes-Bug: #1852722
Change-Id: I4d0fdff54de9bee70aeaf10a4ef483ad7000840b
* Moved away from using Oslo periodic tasks in the action execution
reporter since in this case they don't make the code more readable.
Also, now it is symmetric with other similar components like action
execution checker.
* Refactored action execution checker w/o using classes since having
many instances of it doesn't make sense.
* Small style changes
Change-Id: I9a97c40222e8dc4870c9b6a7c5f5e3c14f37bdd6
* Some users rely on the presence of the root error related to
running an action and it's not convenient that it is now at
the end of the string, e.g. if we look at the corresponding
task execution "state_info" field. This patch puts the cause
error message at the beginning of the resulting error string
returned by the action executor so that it's clearly visible.
This message can be also truncated in some cases (depending on
the config option) so we need to make sure we keep the cause
error message.
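
A minimal sketch of the idea, assuming a hypothetical helper
build_error_message and a hypothetical max_length limit standing in for the
config option. Because the cause comes first, truncating from the end keeps
it as long as the cause itself fits within the limit.

```python
def build_error_message(cause, details, max_length=None):
    """Put the root cause first so that truncation keeps it."""
    msg = "%s\n%s" % (cause, details)

    if max_length is not None and len(msg) > max_length:
        # Cut from the end: the cause at the beginning survives.
        msg = msg[:max_length]

    return msg
```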
Closes-Bug: #1847984
Change-Id: Ieb10c10401380410665c418f4688681e929b1e23
When a Python action raises an exception, it does not necessarily mean
that Mistral failed to run it, and it may not even be an issue. For
example, the OpenStack action `swift.head_container` will raise an
exception if the container doesn't exist.
This change lowers the log level for such exceptions to a warning but
keeps the exception traceback in the logs. It also changes the wording of
the message: we didn't fail to run the action, rather the action raised
an exception.
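
For illustration, logging at warning level while keeping the traceback can be
done with the standard logging module as shown below. The function run_action
and the shape of its return value are hypothetical, not the executor's
actual code.

```python
import logging

LOG = logging.getLogger(__name__)


def run_action(action):
    """Run a callable action; log a warning if it raises."""
    try:
        return {"result": action()}
    except Exception as e:
        # exc_info=True attaches the traceback to the warning record.
        LOG.warning("The action raised an exception: %s", e, exc_info=True)
        return {"error": str(e)}
```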
Change-Id: If9a6a3b98999acae8b80ad4ddeb9d197a628c280
We previously ported the code to mistral-lib, but Mistral has been using
the original copy.
Closes-Bug: #1782765
Change-Id: Ifb518d821097fdf2ec76161ae00f312ced19c272
If an executor dies while running an action execution, then the
execution will remain in RUNNING state (because the dead executor
can't signal the error).
Implements blueprint: action-execution-reporting
Change-Id: I51b4db6aa321d0e53bbb85a74f8ebaea0376d22e
It should be possible to specify a timeout for Mistral actions in order
to cancel long-running actions and thus provide a predictable
execution time for the client service.
Currently Mistral allows configuring a timeout on a task and automatically
changes the task status to error. However, Mistral doesn't interrupt the
action execution.
We need Mistral to terminate timed-out action executions, because otherwise
the following issues may arise:
* several executions of the same action can run at the same time, breaking
data consistency
* stale action executions may lead to massive resource
consumption (memory, CPU, ...)
Change-Id: I2a960110663627a54b8150917fd01eec68e8933d
Signed-off-by: Vitalii Solodilov <mcdkr@yandex.ru>
* If a subworkflow completes it sends its result to a parent
workflow by using the scheduler (delayed call) which operates
through the database and has a delay between iterations.
This patch optimizes this by reusing the already existing
decorator @action_queue.process to make RPC calls that convey
subworkflow results outside of a DB transaction, similarly
to how we schedule action runs after completion of a task.
The main reason for making this change is how Scheduler now
works in HA mode. In fact, it doesn't scale well because
every Scheduler instance keeps querying the DB for delayed calls
eligible for processing, and hence in an HA setup many Schedulers
often pick up the same delayed calls and clash with each other,
causing DB deadlocks in MySQL. They are caused just by the MySQL
locking model (it's documented in their docs) so we have
means to handle them. However, Scheduler still remains a
bottleneck in the system and it's better to reduce the load
on it as much as possible.
One more reason to make this change is that using Scheduler doesn't
solve the problem of eliminating the possibility of losing RPC
messages (when a DB TX is committed and the RPC call is not made
yet) anyway. If we use Scheduler for scheduling
RPC calls we just shift the place where DB and MQ can get out of
sync to the Scheduler. So, in other words, it is a fundamental
problem of syncing two external data sources which can't be
naturally enrolled into one distributed transaction.
Based on our experience of running big workflows we concluded
that simplification of network protocols gives better results,
meaning that the fewer components we use for network
communications the better. Eventually it increases performance
and reduces the load on the system and also reduces the
probability of having DB and MQ out of sync.
We used to use Scheduler for running actions on executors too by
scheduling RPC calls, but at some point we saw that it reduced
performance by 40-50% without bringing any real benefits at
this expense. On the contrary, Scheduler was an even worse
bottleneck because of this. So we decided to eliminate the
Scheduler from this chain and the system became
much more performant and reliable in practice. So now I did the same
with delivering a subworkflow result.
I believe when it comes to recovering from situations of
DB and MQ being out of sync we need to come up with special
tools that will assume some minimal human intervention
(although I think we can recover some things automatically).
Such a tool should just make it very obvious what's broken
and how to fix it, and make it convenient to fix it (restart
a task/action etc.).
* Processing action queue now happens within a new greenthread
because otherwise Mistral engine can get into a deadlock
by sending a request to itself while processing another one.
It can happen if we use blocking RPC which is the only option
for now.
* Other small fixes
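
The queueing idea described in this message can be sketched as follows. This
is not the real @action_queue.process implementation: the names schedule and
process are assumptions for the example, a plain thread stands in for the
greenthread, and nested decorated calls are not handled.

```python
import functools
import threading

_local = threading.local()


def schedule(call):
    """Queue a zero-argument callable to run after the decorated function."""
    _local.queue.append(call)


def process(func):
    """Run queued calls in a separate thread once func has returned."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        _local.queue = []
        try:
            result = func(*args, **kwargs)
            queued = _local.queue
        finally:
            _local.queue = None

        def run_queued():
            for call in queued:
                call()

        # func (and any DB transaction inside it) is finished at this point.
        threading.Thread(target=run_queued).start()
        return result
    return wrapper


events = []
done = threading.Event()


@process
def on_subworkflow_complete(result):
    events.append("tx start")
    # Stand-in for the RPC call that conveys the result to the parent.
    schedule(lambda: (events.append("rpc: %s" % result), done.set()))
    events.append("tx commit")
    return result
```

The queued call only runs after the decorated function has returned, which
mirrors making the RPC call outside of the DB transaction.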
Change-Id: Ic3cf6c47bba215dc6a13944b0585cce59e4e88f9
This is an intermediate step towards moving several parts into mistral-lib:
* Fixes a typo in the deprecation message
* Changes the last actions to use mistral-lib actions
* Converts the serialization to use mistral-lib serialization
* Uses mistral-lib results as the standard (the serialization mixin change
is required for the results to work)
Next steps are:
* change mistral-lib serialization to take care of all serialization
* change dependent libraries to use mistral-lib directly
Change-Id: I4eacf5ce2e72916b700e8bc77ac9d95859131931
This patch won't pass CI until mistral-lib is packaged for the TripleO
CI.
Depends-On-External: https://review.rdoproject.org/r/6266
Depends-On: Icec6d1a3c483a30e9e3fa3175ed0233053c69daa
Change-Id: Iab8d093f53477585e60a99413ed5379fb7e5b4ae
The rpc_backend with kombu and oslo is used by the executor
and the event engine as well. This patch moves the rpc_backend up one
level so it's not engine specific. Also, the event engine has its own
module, but the EventEngine class is defined in the engine module. This
patch moves EventEngine to its own base file in the event_engine module.
Implements: blueprint mistral-actions-run-by-engine
Change-Id: Ie814a26e05f5ca6bfba10f20a7d5921836aa7602
Make executor pluggable and allow option to run the executor
locally on the engine or remotely over RPC.
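
A rough sketch of what a pluggable executor could look like. The class names,
the get_executor helper, the 'local'/'remote' type strings and the rpc_call
parameter are all hypothetical and only illustrate the local-versus-RPC
choice.

```python
import abc


class Executor(abc.ABC):
    """Hypothetical executor interface."""

    @abc.abstractmethod
    def run_action(self, action, params):
        """Run the given action with the given parameters."""


class LocalExecutor(Executor):
    """Runs the action in the same process as the engine."""

    def run_action(self, action, params):
        return action(**params)


class RemoteExecutor(Executor):
    """Delegates to a remote executor through an RPC callable."""

    def __init__(self, rpc_call):
        self._rpc_call = rpc_call

    def run_action(self, action, params):
        return self._rpc_call(action.__name__, params)


def get_executor(exec_type, rpc_call=None):
    """Pick an executor implementation by its configured type."""
    if exec_type == 'local':
        return LocalExecutor()
    if exec_type == 'remote':
        return RemoteExecutor(rpc_call)
    raise ValueError("Unknown executor type: %s" % exec_type)
```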
Change-Id: I7cfb13068aa1d1f88136eaa092e629c34b78adf2
Implements: blueprint mistral-actions-run-by-engine