Distributed serialization is implemented with the Python distributed
library. A scheduler manages jobs and workers process them. The
scheduler is started on the master node together with a set of
workers. Workers are also started on all nodes.
In the cluster settings we can select the serialization type and the
node statuses that allow a node to be used for serialization. By
default, nodes with status 'ready' are excluded from the workers list.
For data serialization we use only nodes from the cluster that is
being serialized.
Before the computation, fresh nailgun code is sent to the workers
as a zip file and imported for job execution, so the workers always
run up-to-date nailgun code.
Each job processes a chunk of tasks on a worker. This approach
significantly improves performance. The chunk size is defined by
the settings.LCM_DS_TASKS_PER_JOB parameter.
To limit memory consumption on the master node, the
settings.LCM_DS_NODE_LOAD_COEFF parameter is used to calculate the
maximum number of jobs in the processing queue.
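The chunking and queue limit described above can be sketched roughly
as follows. This is only an illustration: TASKS_PER_JOB and
NODE_LOAD_COEFF are stand-ins for the two settings, their values are
made up, and the queue-limit formula (workers multiplied by the
coefficient) is an assumption, not the formula used in nailgun.

```python
from itertools import islice

# Hypothetical stand-ins for settings.LCM_DS_TASKS_PER_JOB and
# settings.LCM_DS_NODE_LOAD_COEFF; the values are illustrative only.
TASKS_PER_JOB = 3
NODE_LOAD_COEFF = 2


def chunk_tasks(tasks, size):
    """Yield successive lists of at most `size` tasks."""
    iterator = iter(tasks)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def max_queued_jobs(worker_count, load_coeff):
    """Assumed formula: the queue limit scales with the worker count."""
    return max(1, worker_count * load_coeff)


# Eight tasks are packed into jobs of at most three tasks each.
jobs = list(chunk_tasks(range(8), TASKS_PER_JOB))
queue_limit = max_queued_jobs(4, NODE_LOAD_COEFF)
```

Bounding the queue means the master only keeps a limited number of
pending jobs (and their payloads) in memory at any time.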
Synthetic tests of distributed serialization for 500 nodes with
number of ifaces >= 5, performed on 40 cores (4 different machines),
took 6-7 minutes on average.
Change-Id: Id8ff8fada2f1ab036775fc01c78d91befdda9ea2
Implements: blueprint distributed-serialization
This commit switches task resolution to a tag-based approach.
A tag is the minimal unit needed only for task resolution, and it can
be mapped to a node only through the role interface. Each role provides
a set of tags in its 'tags' field, which may be modified via the role
API. A tag may be created separately via the tag API, but it cannot be
used until it is attached to a role.
Change-Id: Icd78fd124997c8aafb07964eeb8e0f7dbb1b1cd2
Implements: blueprint role-decomposition
A 'tags' attribute has been added to each role in 'roles_metadata'.
Initially all non-controller roles will only have a tag of their own
role name. This will allow existing tasks which do not have tags
associated with them to work correctly. In the absence of tags, a
task's roles will be used to determine which nodes it will run on.
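A minimal sketch of the resolution rule described above, assuming
simple dict structures. The function name resolve_nodes, the node and
task layouts, and the extra 'rabbitmq' tag on the controller role are
hypothetical and only serve the illustration.

```python
def resolve_nodes(task, nodes, roles_metadata):
    """Return uids of nodes a task should run on.

    Tags take precedence; when a task has no tags, fall back to
    matching on its roles.
    """
    wanted_tags = set(task.get("tags") or [])
    if wanted_tags:
        matched = []
        for node in nodes:
            # A node gets its tags only through the roles assigned to it.
            node_tags = set()
            for role in node["roles"]:
                node_tags.update(roles_metadata[role].get("tags", []))
            if node_tags & wanted_tags:
                matched.append(node["uid"])
        return matched
    wanted_roles = set(task.get("roles") or [])
    return [n["uid"] for n in nodes if wanted_roles & set(n["roles"])]


# Illustrative data: non-controller roles carry only their own name.
roles_metadata = {
    "controller": {"tags": ["controller", "rabbitmq"]},
    "compute": {"tags": ["compute"]},
}
nodes = [
    {"uid": "1", "roles": ["controller"]},
    {"uid": "2", "roles": ["compute"]},
]
by_tag = resolve_nodes({"tags": ["rabbitmq"]}, nodes, roles_metadata)
by_role = resolve_nodes({"roles": ["compute"]}, nodes, roles_metadata)
```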
Implements: blueprint role-decomposition
Change-Id: I390580146048b6e00ec5c42d0adf995a4cff9167
The size of deployment_info grows as n^2 with the number of
nodes, because common_attrs, which is merged into each
node's data, contains info about all nodes.
For example, for 600 nodes we store about 1 GB of data in
the database. As a first step, store common_attrs
separately in the deployment_info structure, both in the
Python code and in the database.
Also removed old migration tests that are not related
to the actual database state.
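The effect of storing common_attrs once can be illustrated with a
small sketch. The key names 'common' and 'per_node', the helper names,
and the sample attributes are assumptions made for this example and
do not reflect the actual deployment_info layout.

```python
import json


def build_merged(common_attrs, node_attrs):
    """Old layout: common_attrs is copied into every node's entry."""
    return [dict(common_attrs, **attrs) for attrs in node_attrs]


def build_split(common_attrs, node_attrs):
    """New layout (assumed keys): common part is stored only once."""
    return {
        "common": common_attrs,
        "per_node": {attrs["uid"]: attrs for attrs in node_attrs},
    }


def node_view(split, uid):
    """Rebuild the full per-node data on demand."""
    merged = dict(split["common"])
    merged.update(split["per_node"][uid])
    return merged


node_count = 50
node_attrs = [
    {"uid": str(i), "fqdn": "node-%d.example" % i}
    for i in range(node_count)
]
# common_attrs lists every node, so each merged copy is O(n) in size.
common_attrs = {"nodes": [dict(a) for a in node_attrs]}

merged = build_merged(common_attrs, node_attrs)
split = build_split(common_attrs, node_attrs)
merged_size = len(json.dumps(merged))
split_size = len(json.dumps(split))
```

With the merged layout the serialized size grows quadratically with
the node count, while the split layout grows linearly.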
Change-Id: I431062b3f9c8dedd407570729166072b780dc59a
Partial-Bug: #1596987
This property contains a list of groups built from tasks
with type 'group'. Each such task may contain a
fault_tolerance property, which should be moved from
openstack.yaml to the deployment tasks.
For plugins, this attribute is filled from roles_metadata
for all tasks with type 'group' (for backward compatibility).
DocImpact
Partial-Bug: 1435610
Change-Id: I1969b953eca667c09248a6b67ffee37bfd20f474
Task serialization takes a long time when an environment
contains many nodes. To reduce serialization time, make the
serialization process run in parallel (the time decreases
almost linearly as the worker pool grows).
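A rough sketch of mapping per-node serialization over a worker pool.
serialize_node and the node layout are hypothetical. A thread pool is
used here only to keep the example self-contained; CPU-bound
serialization in Python would need a process pool to get a real
speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def serialize_node(node):
    """Hypothetical per-node serializer: expands task ids for a node."""
    return {
        "uid": node["uid"],
        "tasks": ["%s:%s" % (node["uid"], task) for task in node["tasks"]],
    }


def serialize_parallel(nodes, workers=4):
    """Serialize nodes with a pool; map() keeps the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(serialize_node, nodes))


nodes = [{"uid": str(i), "tasks": ["netconfig", "hosts"]} for i in range(10)]
parallel_result = serialize_parallel(nodes)
sequential_result = [serialize_node(node) for node in nodes]
```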
DocImpact
Change-Id: Id3753dbc6983256d410e69c98ab02b61ab6bfb7f
Partial-Bug: #1572103
Co-Authored-With: V. Kuklin <vkuklin@mirantis.com>
Co-Authored-With: B. Gaifullin <bgaifullin@mirantis.com>
The option LCM_CHECK_TASK_VERSION controls whether the
task serializer allows a deployment to start with tasks
that have a version lower than 2.0.0.
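A minimal sketch of such a check, assuming that enabling the option
makes the serializer reject outdated tasks. The function names, the
task layout, and the use of ValueError are hypothetical.

```python
MIN_TASK_VERSION = (2, 0, 0)


def parse_version(version):
    """Turn '2.0.0' into (2, 0, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def find_outdated_tasks(tasks, check_task_version):
    """Return ids of tasks older than 2.0.0.

    Assumption: when check_task_version is True, any outdated task
    blocks the deployment by raising an error.
    """
    outdated = [
        task["id"] for task in tasks
        if parse_version(task["version"]) < MIN_TASK_VERSION
    ]
    if check_task_version and outdated:
        raise ValueError(
            "Tasks with version < 2.0.0: %s" % ", ".join(outdated)
        )
    return outdated


tasks = [
    {"id": "netconfig", "version": "2.1.0"},
    {"id": "legacy-hook", "version": "1.0.0"},
]
outdated = find_outdated_tasks(tasks, check_task_version=False)
```

Comparing parsed tuples avoids the string-comparison pitfall where
'10.0.0' would sort before '2.0.0'.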
Change-Id: I452c91b8343bf2a77b037920d924238c0a851ba1
Closes-Bug: 1570973
nailgun.errors has a huge set of exceptions but no hierarchy. This
patch stops generating exceptions from a dict, defines them explicitly
as Python classes, and adds an exception hierarchy. Now all network
errors inherit from NetworkException, and the same applies to other
exception groups.
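The kind of hierarchy described above could look roughly like this.
Only NetworkException is named in this message; the base class, the
leaf classes, and the default messages are illustrative assumptions.

```python
class NailgunException(Exception):
    """Assumed common base class with a default message."""

    message = "Unknown error"

    def __init__(self, message=None):
        super().__init__(message or self.message)


class NetworkException(NailgunException):
    """Parent for all network-related errors."""

    message = "Network error"


# Hypothetical leaf exceptions, used only for illustration.
class OutOfIPs(NetworkException):
    message = "Not enough free IP addresses"


class InvalidData(NailgunException):
    message = "Invalid data received"


def classify(exc):
    """Callers can now handle a whole group with one except clause."""
    try:
        raise exc
    except NetworkException:
        return "network"
    except NailgunException:
        return "other"
```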
Change-Id: I9a2c6b358ea02a16711da74562308664ad7aed97
Closes-bug: #1566195
The condition for a task was checked before its YAQL
expressions were evaluated.
Also fixed extra task attributes being passed to astute.
Change-Id: Iaed23a8d0f263eef5d56281ee383328a6f0a98cc
Closes-Bug: 1563016
We need separate task serializers for LCM,
because LCM uses a per-node context to serialize tasks.
This also allows isolating LCM-related code for backward
compatibility with existing environments that are not ready for LCM.
Change-Id: Ie95a58c8cf86eac1a5c3dbd956fafc401e40fed6
Implements: blueprint computable-task-fields-yaql