With complex parallel joins, Mistral had no mechanism to choose which
publish (left or right in terms of the merge) it should use. A common
case is that one branch updates an existing value, but after the merge
we still see the old version.
This patch introduces a context versioning feature: every existing
key of the Mistral context has a version, and this version is used at
the context merge stage.
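The merge rule can be sketched roughly as follows (a minimal stand-alone illustration, not Mistral's actual code; all names here are made up):

```python
def merge_contexts(left, right, left_versions, right_versions):
    """Merge two branch contexts, preferring the value whose key
    version is higher, i.e. the branch that actually updated it."""
    merged = dict(left)
    for key, value in right.items():
        # A key missing on the left, or updated more recently on the
        # right, is taken from the right branch.
        if key not in merged or \
                right_versions.get(key, 0) > left_versions.get(key, 0):
            merged[key] = value
    return merged

# Branch A kept the old value (version 1), branch B updated it (version 2),
# so the merged context keeps branch B's value.
result = merge_contexts(
    {"x": "old", "a": 1}, {"x": "new", "b": 2},
    {"x": 1, "a": 1}, {"x": 2, "b": 1},
)
```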
Change-Id: I604a9a8391150ac4801115b9892f781c33ecfdcb
Signed-off-by: Oleg Ovcharuk <vgvoleg@gmail.com>
This option has never been used by any actual logic. This patch
deprecates the ineffective option so that we can remove it in a future
release.
Change-Id: I350e45fc9aef28db8790614ade7a5ad3071e574b
The backend_url option can sometimes contain secrets.
For example, when the Redis coordination backend is used and
authentication is enabled in Redis, the plain-text Redis password is
embedded as a URL element:
[coordination]
backend_url=redis://:password@127.0.0.1:6379
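One simple way to mask such a secret before logging (an illustrative sketch; the actual patch may rely on oslo.utils helpers instead):

```python
import re

def mask_url_password(url):
    """Replace the password element of a URL of the form
    scheme://user:password@host with '***' so it is safe to log."""
    return re.sub(r'(//[^:/@]*):[^@/]+@', r'\1:***@', url)

masked = mask_url_password("redis://:password@127.0.0.1:6379")
```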
Change-Id: Ie073d9ac8fd8580f7442370291814d99aad92066
This change introduces a new option, [healthcheck] enabled, which
enables the healthcheck middleware in the mistral-api pipeline.
This middleware allows a status check at the /healthcheck path, which
is useful for load balancers or any monitoring service to validate the
health of its backend services.
This change is created based on the same change proposed to ironic[1].
[1] 6f439414bdcef9fc02f844f475ec798d48d42558
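A minimal configuration to switch the middleware on (option as described above; /healthcheck is the default path of the oslo.middleware healthcheck):

```ini
[healthcheck]
enabled = True
```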
Co-Authored-By: Jim Rollenhagen <jim@jimrollenhagen.com>
Change-Id: I9bf3de8a5ae6a8c9346285147732b773a3667f7e
After this patch, a user can choose whether to replace or merge task
data into the execution context, e.g.:
merge_strategy: replace/merge
Implements: blueprint merge-mistral-tasks-data
Change-Id: I3c96bab9953c4995f2b718ac48dff0f153872026
Adds the ability to hide sensitive data in HTTP action logs, such as:
* Request headers
* Request body
* Response body
Change-Id: I6d1b1844898343b8fa30f704761096e3d2936c4d
Implements: blueprint mistral-hide-sensitive-data-from-http-actions-logs
Signed-off-by: Oleg Ovcharuk <vgvoleg@gmail.com>
For various support reasons, Mistral should have a special
endpoint to expose all necessary info data. This endpoint reads
JSON from an info file created by the admin. To configure it, use
the Mistral configuration:
[api]
enable_info_endpoint = True
info_json_file_path = info.json
Change-Id: I6f344dc15a4ca5c69a6b21841544a31f95eb393f
Implements: blueprint mistral-info-endpoint
Signed-off-by: Oleg Ovcharuk <vgvoleg@gmail.com>
As per the community goal of migrating the policy file
format from JSON to YAML [1], we need to do two things:
1. Change the default value of the '[oslo_policy] policy_file'
config option from 'policy.json' to 'policy.yaml', with
upgrade checks.
2. Deprecate the JSON-formatted policy file on the project side
via warnings in the docs and release notes.
Also replace policy.json with policy.yaml references in docs and tests.
[1]https://governance.openstack.org/tc/goals/selected/wallaby/migrate-policy-format-from-json-to-yaml.html
Change-Id: I3b9aeb3379a76f7e40dab0c46e27f4447a0c3d03
* This patch refactors Mistral with the action provider concept
that is responsible for delivering actions to the system. So
it takes all the burden of managing action definitions w/o
having to spread that across multiple subsystems like Engine
and API and w/o having to assume that action definitions are
always stored in DB.
* Added LegacyActionProvider that represents the old way of
  delivering action definitions to the system. It pretty much just
  analyses what entries are configured under the entry point
  "mistral.actions" in setup.cfg and builds a collection of the
  corresponding Python action classes in memory, accessible by name.
* The module mistral/services/actions.py is now renamed to
adhoc_actions.py because it's effectively responsible only for
ad-hoc actions (those defined in YAML).
* Added the new entry point "mistral.action.providers" in setup.cfg
  to register action provider classes.
* Added the module mistral/services/actions.py that will be a facade
for action providers. Engine and other subsystems will need to
work with it.
* Other small code changes.
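For illustration, an action provider could be registered under the new entry point in setup.cfg roughly like this (the name and class path below are hypothetical):

```ini
[entry_points]
mistral.action.providers =
    legacy = mistral.actions.legacy:LegacyActionProvider
```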
Depends-On: I13033253d5098655a001135c8702d1b1d13e76d4
Depends-On: Ic9108c9293731b3576081c75f2786e1156ba0ccd
Change-Id: I8e826657acb12bbd705668180f7a3305e1e597e2
* Adding the "convert_output_data" config property gives an opportunity
  to increase overall performance. If YAQL always converts an expression
  result, it often takes significant CPU time and the overall workflow
  execution time increases. This is especially important when a workflow
  publishes lots of data into the context and uses big workflow
  environments. It's been tested on a very big workflow (~7k tasks)
  with a big workflow environment (~2.5 MB) that often uses the YAQL
  function "<% env() %>", which basically just returns the
  workflow environment.
* Created all necessary unit tests.
* Other style fixes.
Change-Id: Ie3169ec884ec9a0e7e50327dd03cd78dcda0a39b
* Added a configuration option to the expiration policy
to filter out workflow states.
Closes-Bug: #1796627
Change-Id: Ife49e6da1d7d52a3f50f1628d808d4c65a22cad9
* For the sake of the service performance, it may make sense to
disable validation of the workflow language syntax if it is
affordable for a particular use case. For example, if all
workflows are auto-generated by a 3rd party system and tested
thoroughly (either by running them with Mistral or at least
validating them via the special Mistral endpoint) then we can
safely disable validation of the language syntax when uploading
workflow definitions. For production systems it makes a big
difference if workflow texts are large (thousands of tasks).
This patch adds the boolean parameter "skip_validation" for API
requests like "POST /v2/workflows" to disable validation, if
needed, and the new configuration property "validation_mode"
to set a desired validation mode.
The option is an enumeration and has the following valid values:
1) "enabled" - enabled for all API requests unless it's
explicitly disabled in the request itself
2) "mandatory" - enabled for all API requests regardless
of the flag in the request
3) "disabled" - disabled for all API requests regardless
of the flag in the request
"mandatory" is chosen as the default value for this new
property to keep compatibility with previous versions.
* Minor style changes.
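The resulting decision can be sketched as follows (a simplified illustration, not the actual Mistral code):

```python
def should_validate(validation_mode, skip_requested):
    """Decide whether to validate workflow syntax for an API request,
    given the configured mode and the request's skip_validation flag."""
    if validation_mode == "mandatory":
        return True   # always validate, ignore the request flag
    if validation_mode == "disabled":
        return False  # never validate, ignore the request flag
    # "enabled": validate unless the request explicitly opted out
    return not skip_requested

# The default mode keeps the previous behavior: validation always runs.
assert should_validate("mandatory", skip_requested=True)
```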
Closes-Bug: #1844242
Change-Id: Ib509653d38254954f8449be3031457e5f636ccf2
* After this patch we can switch scheduler implementations in the
  configuration. All functionality related to scheduling jobs is
  now expressed via the internal API classes Scheduler and
  SchedulerJob. The patch also adds another entry point to setup.cfg
  where a new scheduler implementation can be registered.
* The new scheduler (which is now called DefaultScheduler) still
should be considered experimental and requires a lot of testing
and optimisations.
* Fixed and refactored "with-items" tests. Before the patch they
  were breaking the "black box" testing principle and relied
  on data that was either purely implementational or volatile (e.g.
  checks of the internal 'capacity' property).
* Fixed all other relevant tests.
Change-Id: I340f886615d416a1db08e4516f825d200f76860d
In the case of several engine instances, starting subworkflows on a
different instance than the parent task can significantly smooth the
load between engine instances. Furthermore, starting a workflow is not
always a lightweight operation, and we should try to make it as
'atomic' as possible.
Change-Id: I895bee811496f920b075880a6c438c53f7ecb2ca
Signed-off-by: Oleg Ovcharuk <vgvoleg@gmail.com>
* Removed the use of the scheduler from the action execution heartbeat
  checker in favor of regular threads.
* Added the new config option "batch_size" under the [action_heartbeat]
  group to limit the number of action executions processed during
  one iteration of the checker.
* Added a test checking that an action execution is automatically
failed by the heartbeat checker.
Closes-Bug: #1802065
Change-Id: I18c0c2c3159b9294c8af96c93c65a6edfc1de1a1
* Added the new property 'execution_integrity_check_batch_size'
under the [engine] group to limit the number of task executions
that the integrity checker may process during one iteration.
Closes-Bug: #1801876
Change-Id: I3c5074c45c476ebff109617cb15d56c54575dd4f
This patch delivers the first working version of a distributed
scheduler implementation based on local and persistent job
queues. The idea is inspired by the parallel computing pattern
known as "work stealing", although it doesn't fully follow it
due to the nature of Mistral.
See https://en.wikipedia.org/wiki/Work_stealing for details.
Advantages of this scheduler implementation:
* It doesn't have the job processing delays caused by DB polling
  intervals when the cluster topology is stable. A job gets scheduled
  in memory and also saved into persistent storage for
  reliability. A persistent job can be picked up only after a
  configured period of time, so that effectively happens only
  after the node responsible for its local processing has crashed.
* Low DB load. DB polling still exists, but it's no longer the
  primary scheduling mechanism, rather a protection against node
  crash situations. That means the polling interval can now be made
  large, e.g. 30 seconds instead of 1-2 seconds. Less DB load
  leads to fewer DB deadlocks between scheduler instances and fewer
  retries on MySQL.
* Since the DB load is now lower, the scheduler has better scalability
  properties. A bigger number of engines won't lead to much bigger
  contention, thanks to the large DB polling intervals.
* Protection from jobs hanging in the processing state forever.
  In the existing implementation, if a scheduler captured a job
  for processing (set its "processing" flag to True) and then
  crashed, the job would stay in the processing state in the DB
  forever. Instead of a boolean "processing" flag, the new
  implementation uses a timestamp showing when a job was captured.
  That gives us the opportunity to make such jobs eligible for
  recapturing and further processing after a certain configured
  timeout.
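The recapture rule amounts to something like this (a simplified stand-alone sketch; the names and the timeout value are illustrative):

```python
import time

CAPTURE_TIMEOUT = 30  # seconds; illustrative value

def eligible_for_recapture(captured_at, now=None):
    """A job is free if it was never captured, or its capturing node
    is presumed dead because the capture timestamp is older than the
    configured timeout."""
    if captured_at is None:
        return True
    now = time.time() if now is None else now
    return now - captured_at > CAPTURE_TIMEOUT

# A job captured 60 seconds ago may be picked up again by another node.
assert eligible_for_recapture(captured_at=1000.0, now=1060.0)
```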
TODO:
* More testing
* DB migration for the new scheduled jobs table
* Benchmarks and testing under load
* Standardize the scheduler interface and write an adapter for the
existing scheduler so that we could choose between scheduler
implementations. It's highly desired to make transition to the
new scheduler smooth in production: we always need to be able
to roll back to the existing scheduler.
Partial blueprint: mistral-redesign-scheduler
Partial blueprint: mistral-eliminate-scheduler-delays
Change-Id: If7d06b64ac14d01e80d31242e1640cb93f2aa6fe
eval isn't safe or secure and should never be used in this situation.
We could possibly use oslo.config for this, so this is only a partial
fix, but it might be good enough. This change removes a security issue.
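For reference, a typical safe replacement when only literal data needs parsing is ast.literal_eval (shown as a general illustration; the actual fix in this patch may differ):

```python
import ast

# ast.literal_eval parses Python literals (numbers, strings, dicts,
# lists, ...) without executing arbitrary code, unlike eval().
value = ast.literal_eval("{'retries': 3, 'timeout': 60}")

# Arbitrary expressions are rejected instead of being executed.
try:
    ast.literal_eval("__import__('os').system('id')")
    rejected = False
except ValueError:
    rejected = True
```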
Partial-Bug: #1783293
Change-Id: Id5c02d92ad7335c3d7d42ac353b88376cdb704fb
Using this API is needed to correctly initialize the configuration.
The [keystone_authtoken] group is used by keystonemiddleware to
validate tokens.
The new [keystone] group is used by keystoneauth to initialize
the Keystone session.
Co-Authored-By: wangxiyuan<wangxiyuan@huawei.com>
Change-Id: Ie3ab512b0ab42c0f97b3099e0787f4edee08ddce
Partial-Bug: #1775140
* With the PyMySQL driver and the "eventlet" Oslo Messaging executor
  type Mistral seems to work fine. Just in case there's a regression
  in comparison with the "blocking" executor, this patch introduces
  a config option that defines the RPC executor rather than
  hardcoding it.
Change-Id: Id73364eee29f2113fc983718b9891a496ca32ee4
If an executor dies while running an action execution, then the
execution will remain in RUNNING state (because the dead executor
can't signal the error).
Implements blueprint: action-execution-reporting
Change-Id: I51b4db6aa321d0e53bbb85a74f8ebaea0376d22e
* Added the new JavaScript evaluator py_mini_racer. Advantages:
  * distributed as a wheel package
  * supports different platforms
  * an actively maintained project
* BUILD_V8EVAL was removed because it was replaced by py_mini_racer in
  the Mistral Docker image
* Added stevedore integration to the JavaScript evaluators
* Refreshed the JavaScript tests. Added a test for the py_mini_racer
  evaluator
* Installed the py_mini_racer library during Mistral tests
* Refreshed the JavaScript action docs
Change-Id: Id9d558b9b8374a2c2639e10cb1868f4e67f96e86
Implements: blueprint mistral-add-py-mini-racer-javascript-evaluator
Signed-off-by: Vitalii Solodilov <mcdkr@yandex.ru>
Introduce execution events and notification server and plugins for
publishing these events for consumers. Event notification is defined per
workflow execution and can be configured to notify on all the events or
only for specific events.
Change-Id: I9820bdc4792a374dad9ad5310f84cd7aaddab8ca
Implements: blueprint mistral-execution-event-subscription
Currently, to perform an action, Mistral first gets its definition
from the database. This is not optimal: if there are a lot of
similar action calls, Mistral rereads the same data from the DB,
which increases the overall execution time and the load on the
database.
To improve performance, it's suggested to cache the definitions that
have been read and take them from the cache instead of the database
on subsequent calls.
The cache TTL can be configured with the
``action_definition_cache_time`` option in the [engine] group. The
default value is 60 seconds.
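The caching behavior can be sketched with a tiny TTL cache (a stdlib-only illustration; Mistral itself may rely on a caching library):

```python
import time

class TTLCache:
    """Minimal time-based cache: entries expire after ttl seconds."""

    def __init__(self, ttl=60):
        self.ttl = ttl
        self._data = {}  # key -> (value, stored_at)

    def get(self, key, now=None):
        now = time.monotonic() if now is None else now
        entry = self._data.get(key)
        if entry is None or now - entry[1] > self.ttl:
            return None  # missing or expired -> caller reads the DB
        return entry[0]

    def put(self, key, value, now=None):
        now = time.monotonic() if now is None else now
        self._data[key] = (value, now)

cache = TTLCache(ttl=60)
cache.put("std.http", {"definition": "..."}, now=0.0)
hit = cache.get("std.http", now=30.0)    # still fresh, DB read avoided
miss = cache.get("std.http", now=120.0)  # expired, re-read from the DB
```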
Change-Id: I330b7cde982821d4f0a06cdd2954499ac0b7be37
The cron_trigger subsystem in Mistral queries the database every second
to look for triggers that require execution. This can be very wasteful
if your deployment only has cron triggers that run infrequently (every
hour or day etc.). Letting operators configure this value reduces the
load and allows the cron triggers to be useful in more scenarios.
Operators do need to be aware that this configuration limits the
frequency of execution. For example, if the value is set to 600, then
cron triggers configured to run every minute will only run every 10
minutes.
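For example, polling every 10 minutes might look like this (the option name below is hypothetical; check the merged change for the exact spelling):

```ini
[cron_trigger]
execution_interval = 600
```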
Related-Bug: #1747386
Change-Id: I9060253bc416be28af4ef81f3edf694059d92066
This patch changes the call into oslo_config to pass the string
containing the version instead of passing the version module.
Change-Id: I698d7206c195f1762dfbadb78c599c60be7f310b
Fixes-bug: 1717869
This is useful when you have many Mistral Engine instances:
it distributes the selection of delayed calls among all of them.
Also replaced the sleep function with the await function in the tests.
Change-Id: I4f0e0a79ff8169ee4e9b359015db22898f634614
Signed-off-by: Vitalii Solodilov <mcdkr@yandex.ru>
The Ceilometer API and client have been deprecated for over
two releases and are now removed completely. Let's drop these
actions and update the requirements.
Change-Id: Ica2b835a885b9b4705996f91080afc12587bd314
* Added a method, called via the scheduler, to check and fix workflow
  execution integrity. Specifically, this method finds all
  task executions in the RUNNING state and for each of them checks
  whether all of its action/workflow executions are finished. If they
  are, it most likely means that the task got stuck (i.e. an RPC
  was lost) and we need to repair it.
* Also added atomic updates of a task execution state using the
  well-known approach of an "UPDATE ... WHERE" SQL query that
  returns the number of actually updated rows. This prevents
  possible races between concurrent processes that try to
  update a task execution in overlapping transactions.
* Other minor style changes
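The atomic update trick can be demonstrated with plain SQL (a sqlite3 sketch of the general pattern, not Mistral's actual query or schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_executions (id INTEGER PRIMARY KEY, state TEXT)")
conn.execute("INSERT INTO task_executions VALUES (1, 'RUNNING')")

def transition(conn, task_id, from_state, to_state):
    """Atomically move a task from one state to another. The WHERE
    clause makes the update a compare-and-swap: rowcount tells us
    whether this process actually won the transition."""
    cur = conn.execute(
        "UPDATE task_executions SET state = ? WHERE id = ? AND state = ?",
        (to_state, task_id, from_state),
    )
    return cur.rowcount == 1

won = transition(conn, 1, "RUNNING", "SUCCESS")  # this caller wins
lost = transition(conn, 1, "RUNNING", "ERROR")   # state already changed
```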
Change-Id: If4f2efb05d959d1ffdb16515f08a811c5ce597e6
* Cron triggers (implemented in periodic.py) are always enabled,
  although in many installations they are not needed. When they
  are enabled they consume resources (keep polling the DB etc.).
  This patch adds the config option "enabled" under the "cron_trigger"
  group that can be used to disable the entire subsystem.
* Wiped out the "disable_cron_trigger_thread" pecan app config variable
  that was used to disable cron triggers in unit tests, in favor of
  the new config option
* Other minor style changes
Closes-bug: #1724147
Change-Id: I79b9ccb2f4286b3ea8696b7cd65472c8a49937bf
* Currently it is impossible to set the same
  pool name for queue listeners used by the
  event engine. By default, a unique pool named
  <hostname> is created, so each event engine is
  in its own pool. Due to that, and per the
  oslo.messaging documentation, any message that
  comes to the topic is duplicated across all
  event engines.
* But if they have the same pool name, the message
  will be delivered to only one of the event engines
  (round-robin).
* This patch adds a possibility to change listener pool
name for each event-engine.
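For example, to put all event engines in one shared pool (the group and option names shown here are illustrative; check the merged change for the exact spelling):

```ini
[event_engine]
listener_pool_name = shared_pool
```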
Change-Id: Iea83c461694a26d9cea810e6cc6169a0fe3f9f06
* Made the scheduler delay configurable. It now consists of a fixed
  part configured with the 'fixed_delay' property and a random
  addition limited by the 'random_delay' config property.
  Because of this, the use of loopingcall from oslo was replaced with
  a regular loop in a separate thread, because loopingcall
  supports only fixed delays.
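The delay computation described above amounts to the following (a sketch using the property names from the text):

```python
import random

def next_delay(fixed_delay, random_delay):
    """Total scheduler delay: a fixed base plus a random addition in
    [0, random_delay], to spread wakeups between scheduler instances."""
    return fixed_delay + random.uniform(0, random_delay)

delay = next_delay(fixed_delay=1.0, random_delay=2.0)
```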
Closes-Bug: #1721733
Change-Id: I8f6a15be339e208755323afb18e4b58f886770c1
This patch enables SSL support for the keycloak middleware. It adds 3
new config options, 'certfile', 'keyfile' and 'cafile', and passes
their values in the request to the keycloak server.
Change-Id: Id8a771af373cd9d1e198142c21957622f9d0232c
Closes-bug: #1712749
In the file common/config.py, use BoolOpt instead of StrOpt for
bool-typed config options, and use HostAddressOpt for host-typed
config options.
Change-Id: I1c025e53f685f491f3d5b8ce11ad0bd7d0a7a08c