Distributed serialization is implemented with the Python distributed
library. A scheduler manages the jobs and workers process them. The
scheduler is started on the master node together with a set of workers.
Workers are also started on all nodes.
In the cluster settings we can select the type of serialization
and the node statuses that allow a node to be used for serialization.
By default nodes with status 'ready' are excluded from the workers list.
For data serialization we use only nodes from the cluster
where serialization is being performed.
Before the computation, fresh nailgun code is sent to the workers
as a zip file and imported there for job execution, so the workers
always run fresh nailgun code.
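
A minimal sketch of the "ship code as a zip and import it" idea, using
only the standard library. It is not the nailgun implementation: the
module name fresh_code and both helper functions are made up for
illustration, and the transfer to a remote worker is left out.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def build_code_archive(directory, module_name, source):
    """Pack a single module into a zip archive and return the archive path."""
    archive_path = os.path.join(directory, "code.zip")
    with zipfile.ZipFile(archive_path, "w") as archive:
        archive.writestr(module_name + ".py", source)
    return archive_path


def import_from_archive(archive_path, module_name):
    """Import a module straight from a zip archive, as a worker might do."""
    if archive_path not in sys.path:
        sys.path.insert(0, archive_path)
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


workdir = tempfile.mkdtemp()
# "fresh_code" is a hypothetical stand-in for the shipped package.
source = "def serialize(name):\n    return {'task': name}\n"
path = build_code_archive(workdir, "fresh_code", source)
module = import_from_archive(path, "fresh_code")
result = module.serialize("deploy")
```

Python can import pure-Python modules directly from a zip file placed on
sys.path, so nothing has to be unpacked on the worker side.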
Each job processes a chunk of tasks on a worker. This
approach significantly boosts performance. The chunk size
is defined by the settings.LCM_DS_TASKS_PER_JOB parameter.
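
The chunking can be sketched as follows. The helper name chunked and the
value 3 are assumptions for illustration; the real value comes from
settings.LCM_DS_TASKS_PER_JOB.

```python
def chunked(tasks, chunk_size):
    """Yield consecutive slices of `tasks`, each at most `chunk_size` long."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(tasks), chunk_size):
        yield tasks[start:start + chunk_size]


TASKS_PER_JOB = 3  # hypothetical value standing in for the setting
tasks = list(range(8))
jobs = list(chunked(tasks, TASKS_PER_JOB))
```

With 8 tasks and a chunk size of 3 this produces three jobs, the last
one holding the remaining two tasks.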
To limit memory consumption on the master node, the
settings.LCM_DS_NODE_LOAD_COEFF parameter is used to calculate the
maximum number of jobs in the processing queue.
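
One way to bound the number of jobs in flight is sketched below with
concurrent.futures instead of the distributed library. The formula
"workers count times coefficient" is an assumption, since this message
does not state how the limit is derived, and run_bounded is a made-up
helper.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_bounded(executor, func, jobs, max_in_flight):
    """Submit jobs while keeping at most `max_in_flight` of them pending."""
    pending = set()
    results = []
    for job in jobs:
        if len(pending) >= max_in_flight:
            # Block until at least one job finishes before submitting more.
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            results.extend(future.result() for future in done)
        pending.add(executor.submit(func, job))
    done, _ = wait(pending)
    results.extend(future.result() for future in done)
    return results


NODE_LOAD_COEFF = 2  # hypothetical value standing in for the setting
workers_count = 4
max_jobs = workers_count * NODE_LOAD_COEFF  # assumed formula

job_list = [[1, 2], [3, 4], [5]] * 10
with ThreadPoolExecutor(max_workers=workers_count) as pool:
    totals = run_bounded(pool, sum, job_list, max_jobs)
```

Results arrive in completion order, not submission order, so a caller
that needs ordering has to carry a job identifier along.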
Synthetic tests of distributed serialization for 500 nodes with
number of ifaces >= 5, performed on 40 cores (4 different machines),
took 6-7 minutes on average.
Change-Id: Id8ff8fada2f1ab036775fc01c78d91befdda9ea2
Implements: blueprint distributed-serialization
When root is set in the context manually, it is not
converted into an immutable object, so task serialization
is faster because no time is spent on this conversion.
Related-Bug: 1569859
Change-Id: I46e4c43a757b78d96eec3edea0f66361afae3d73
The main purpose of this commit is to make it possible
to split the astute.yaml configuration file into common
and node parts. The common part is huge, so it is
dumped once and only one instance of this data is kept
in RAM, which saves a lot of memory when
you run deploy on many nodes (>100).
This patch adds two new variables to the context that can be
used in yaql_exp: $node and $common, for the node and common
parts of the context. The functions changed, changed_all,
changed_any, added and deleted do not work for these variables.
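
The memory saving from a single shared common part can be sketched with
collections.ChainMap. This is an illustration only: the keys and values
are invented, and nailgun's actual data structures may differ.

```python
from collections import ChainMap

# Hypothetical data: one large common part, one small part per node.
common = {"cluster": {"name": "env-1"}, "network_scheme": {"version": "1.1"}}
node_parts = {
    "node-1": {"uid": "1", "roles": ["controller"]},
    "node-2": {"uid": "2", "roles": ["compute"]},
}

# Each node gets a merged view; the common dict is referenced, not copied.
contexts = {
    name: ChainMap(part, common) for name, part in node_parts.items()
}

cluster_name = contexts["node-2"]["cluster"]["name"]
node_uid = contexts["node-2"]["uid"]
```

Lookups fall through from the node part to the common part, so every
node sees the full configuration while the common data exists once.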
DocImpact
Change-Id: I56bf982652a5dc27882e4a401ca9ec124899fed7
Partial-Bug: #1596987
There are 3 new orchestration tasks:
* master_shell
  Runs a task on the master node with the context of another node. If
  'roles' selects N nodes, the task is executed N times.
* erase_node
  Erases a node. This task is necessary if we want to remove nodes by
  means of graphs rather than pre-hardcoded actions in Astute.
* move_to_bootstrap
  Changes the node's PXE config to boot via LAN (into bootstrap). Like
  the previous one, this task is necessary for the deletion graph.
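
The "executed N times" behaviour of master_shell can be pictured with
the sketch below. All field names (run_on, context_node) and the helper
expand_master_shell are hypothetical and chosen only for illustration.

```python
def expand_master_shell(task, nodes):
    """Return one execution on the master per node matched by task roles."""
    wanted = set(task["roles"])
    selected = [node for node in nodes if wanted & set(node["roles"])]
    return [
        {
            "id": task["id"],
            "type": task["type"],
            "run_on": "master",           # always executed on the master
            "context_node": node["uid"],  # but with this node's context
        }
        for node in selected
    ]


nodes = [
    {"uid": "1", "roles": ["controller"]},
    {"uid": "2", "roles": ["compute"]},
    {"uid": "3", "roles": ["compute"]},
]
task = {"id": "cleanup", "type": "master_shell", "roles": ["compute"]}
runs = expand_master_shell(task, nodes)
```

Here 'roles' selects two nodes, so the task runs twice on the master,
once with the context of each compute node.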
Change-Id: Ie8f852762b837a68e0e0b49e11653a8f2e56a014
Blueprint: graph-concept-extension
It would be very convenient (but not necessary) to extend the YAQL
context (i.e. the set of helper functions) via Nailgun extensions. That
might be very helpful if end users build their own custom deployments
on top of Fuel.
Change-Id: Ib47d8204ff995517fc9a4d7889de8bbc9c23f227
The following handlers now accept ?dry_run=1:
/clusters/:cluster_id/changes/
/clusters/:cluster_id/changes/redeploy/
/clusters/:cluster_id/deploy/
/clusters/:cluster_id/deploy_tasks/
It tells Astute not to run cluster execution at all.
A dry run does not affect the cluster status,
regardless of its result.
Also, remove the redundant update of node statuses to 'deploying'
within OpenStackConfigManager and DeploymentTaskManager, as it should be
done by the receiver.
Do not set the cluster status to 'deployment' for these nodes, so that
the cluster retains its real status.
Modify the stop deployment tests: move the failing stop deployment case
for already deployed clusters to another test class. Since 9.0 we can
run stop deployment for new clusters.
Change-Id: I374fc86b63af64411d4a5ca45ff6c3680cb44897
Partial-bug: #1569839
yaql_exp can be used to calculate the dependencies of all tasks,
including skipped tasks, so the task attributes traversal should
be called for all tasks too.
Also added a check that a dependency is not an empty object, which
is possible when it has been dynamically generated via yaql.
Partial-Bug: 1541309
Change-Id: Ibcb786d2a7917d7583433c0b96f6324be4de759b
Now we get the deployment state from the DeploymentHistory model. For
every task we get the last successful transaction and its state.
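
The selection of the last successful transaction per task can be
sketched over plain dicts. The record fields and the status value
'ready' are assumptions made for this example; the real
DeploymentHistory model is queried through the database.

```python
def last_successful_states(history):
    """Map each task name to the state of its latest successful record."""
    latest = {}
    for record in history:
        if record["status"] != "ready":  # assumed marker of success
            continue
        name = record["task_name"]
        current = latest.get(name)
        if current is None or record["transaction_id"] > current["transaction_id"]:
            latest[name] = record
    return {name: record["state"] for name, record in latest.items()}


# Hypothetical history records.
history = [
    {"task_name": "netconfig", "transaction_id": 1, "status": "ready",
     "state": {"mtu": 1500}},
    {"task_name": "netconfig", "transaction_id": 2, "status": "error",
     "state": {"mtu": 9000}},
    {"task_name": "hosts", "transaction_id": 2, "status": "ready",
     "state": {"entries": 3}},
]
states = last_successful_states(history)
```

The failed second run of netconfig is ignored, so its state still comes
from transaction 1.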
Change-Id: I2288bc2bc34023c2ca705f1d3cc6ff48347bf549
Closes-bug: #1572226
nailgun.errors has a huge set of exceptions but no hierarchy. This
patch removes the generation of exceptions from a dict, defines them
explicitly as Python classes and adds an exception hierarchy. Now all
network errors inherit from NetworkException, and likewise for other
groups of exceptions.
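
The shape of such a hierarchy might look like the sketch below. Only
NetworkException is named in this message; the base class and the
concrete error are invented for illustration.

```python
class BaseError(Exception):
    """Hypothetical common base class with a default message."""

    message = "Unknown error"

    def __init__(self, message=None):
        super().__init__(message or self.message)


class NetworkException(BaseError):
    """Parent of all network-related errors."""

    message = "Network error"


class AddressPoolExhausted(NetworkException):
    """Hypothetical concrete network error."""

    message = "Not enough IP addresses"


try:
    raise AddressPoolExhausted()
except NetworkException as exc:
    # One handler now covers every network error.
    caught = str(exc)
```

With explicit classes, callers can catch a whole family of errors by
its parent instead of listing each exception separately.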
Change-Id: I9a2c6b358ea02a16711da74562308664ad7aed97
Closes-bug: #1566195
Multi-field conditions were not transformed properly
because the regexp used greedy matching.
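
The difference between greedy and non-greedy matching is shown below on
a generic condition string. The pattern and the input are illustrative
only; the actual regexp from the code is not reproduced here.

```python
import re

# A made-up condition that refers to two quoted values.
condition = "a == 'x' and b == 'y'"

# Greedy: ".*" runs to the last quote and swallows both fields.
greedy = re.findall(r"'(.*)'", condition)

# Non-greedy: ".*?" stops at the nearest closing quote.
lazy = re.findall(r"'(.*?)'", condition)
```

The greedy pattern yields a single match spanning both fields, while
the non-greedy one yields each field separately.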
Change-Id: Iad1d839e640942d3a447b226387273b89c040fcd
Closes-Bug: 1569420
The condition for a task was checked before the YAQL expressions
were evaluated.
Also fixed extra task attributes being passed to Astute.
Change-Id: Iaed23a8d0f263eef5d56281ee383328a6f0a98cc
Closes-Bug: 1563016
We need separate task serializers for LCM,
because LCM uses a per-node context to serialize tasks.
This also isolates LCM-related code, which keeps backward
compatibility with existing environments that are not ready for LCM.
Change-Id: Ie95a58c8cf86eac1a5c3dbd956fafc401e40fed6
Implements: blueprint computable-task-fields-yaql