Nailgun uses the node data from the stop deployment response
to reset nodes to the discovery state, which is unexpected behavior
for already provisioned nodes in case of task deployment.
Change-Id: I39de8a8afd627b0bf209d9a7f6ad6e19abd99016
Partial-Bug: #1672964
Reporting the ready status for a node means a successful node status,
which is obtained if all tasks finished with ready or skipped
statuses.
The same effect is obtained if Astute marks the node as skipped. In this
case we also get the same status, 'successful'.
So we need to ask the node about the skipped status before asking it
about the successful status, to avoid losing the context of the stop
deployment operation.
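
The ordering described above can be sketched as follows. This is a
minimal illustration, not Astute's actual code: NodeState,
report_status and the returned symbols are hypothetical names.

```ruby
# Hypothetical sketch: none of these names come from Astute itself.
NodeState = Struct.new(:task_statuses, :marked_skipped) do
  # Successful when every task finished as ready or skipped.
  def successful?
    task_statuses.all? { |status| %w[ready skipped].include?(status) }
  end

  # True when the node as a whole was marked as skipped.
  def skipped?
    marked_skipped
  end
end

# Check 'skipped' first: a skipped node would also pass successful?,
# so the reverse order would hide the stop deployment context.
def report_status(node)
  return :skipped if node.skipped?
  return :successful if node.successful?

  :failed
end
```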
Change-Id: I3c042425cab800de0bfc4e03f29414b145f44983
Closes-Bug: #1672964
This allows running puppet with environment variables,
e.g. FACTER_foo=bar puppet apply ...
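
A minimal sketch of how such a command line could be assembled,
assuming the variables arrive as a hash. The puppet_command helper and
the manifest path in the test are hypothetical and not taken from
Astute.

```ruby
require 'shellwords'

# Hypothetical helper: prefixes the puppet call with NAME=value pairs.
# Values and the manifest path are shell-escaped before joining.
def puppet_command(env, manifest)
  prefix = env.map { |name, value| "#{name}=#{Shellwords.escape(value.to_s)}" }
  (prefix + ['puppet', 'apply', Shellwords.escape(manifest)]).join(' ')
end
```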
Change-Id: I1e435262e810ead46689078513607f6a99a19043
Implements: blueprint get-rid-cobbler-dnsmasq
(cherry picked from commit c734b03042)
Now Astute will not calculate fault tolerance groups and
critical node uids twice.
Change-Id: I3bf2dd0ffc0fc74fd9c670bd50b32e3285ae7e2a
Closes-Bug: #1669499
Some mcollective clients can, for some reason, ignore a task
from Astute. In such cases Astute should retry its request.
Also:
- refactor tasks to support the class hook post_initialize
instead of super
- replace @task and @ctx with equivalent instance methods
- remove old fixtures
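
The post_initialize hook pattern mentioned above can be sketched like
this. The class names and the 'retries' parameter are hypothetical;
only the hook name post_initialize and the task/ctx readers come from
the message.

```ruby
# Hypothetical base class illustrating the hook pattern.
class BaseTask
  # Reader methods replace direct use of @task and @ctx.
  attr_reader :task, :ctx

  def initialize(task, ctx)
    @task = task
    @ctx = ctx
    # Subclasses customize setup here instead of overriding
    # initialize and calling super.
    post_initialize(task, ctx)
  end

  private

  # Default hook does nothing.
  def post_initialize(_task, _ctx); end
end

# Hypothetical subclass: adds its own state through the hook.
class ExampleShellTask < BaseTask
  attr_reader :retries

  private

  def post_initialize(task, _ctx)
    @retries = task.fetch('retries', 3)
  end
end
```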
Change-Id: I96613f53303fd71acc437d2f8f47b599bcf3b5d9
Running a cluster with the debug option enabled does not affect
puppet tasks, which always run with debug disabled.
This happens because Nailgun sends the debug option, which
Astute sets for every task, but the puppet task requires the
puppet_debug option to control its behavior. This change
connects these parameters.
Change-Id: I8df68105aa699e83673c39a0f03bb22673171d6f
Closes-Bug: #1662512
When there are a lot of nodes to provision and we provision
them in chunks, we could fail in the middle due to "Too many
nodes failed to provision". If so, we need to append the
nodes where provisioning was never started to the list
of failed nodes. Otherwise, those nodes will be reported
as 'provisioned' with progress = 100 and rebooted.
But for some reason we bind all nodes to the debian-installer
profile in cobbler before starting provisioning, and once rebooted
these unprovisioned nodes will fail to boot, because since
7.0 we put empty files where cobbler expects the debian-installer
kernel and initrd files. :-)
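
The fix can be illustrated with a simplified sketch. The function,
its parameters and the provisioner callable are hypothetical and only
show how never-started nodes end up in the failed list.

```ruby
# Hypothetical sketch; provisioner is any callable returning true on success.
def provision_in_chunks(uids, chunk_size, max_failed, provisioner)
  failed = []
  done = []
  remaining = uids.dup

  until remaining.empty?
    chunk = remaining.shift(chunk_size)
    chunk.each { |uid| (provisioner.call(uid) ? done : failed) << uid }

    next unless failed.size > max_failed

    # Too many failures: nodes that never started must be reported as
    # failed as well, otherwise they would look provisioned.
    failed.concat(remaining)
    break
  end

  { provisioned: done, failed: failed }
end
```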
Change-Id: I2a401b80614ee7dd5a10931b9b50bcff066f790f
Closes-Bug: #1656269
(cherry picked from commit 570049ca1f)
The connection between a node and Astute can sometimes be
lost, so we need more tries to get info about the task
status on the node.
Two changes:
- instead of 1 try Astute will make 6 tries with a timeout
of 10 for every attempt;
- for puppet this behavior is handled using separate
retries: puppet_undefined_retries
Unlike a full puppet retry, a status retry is safe because
it is idempotent.
Puppet undefined retries can be set up using the Astute config
or by sending undefined_retries in the puppet task parameters, the
same way as for usual retries. Most importantly, the counter is reset
to its original value every time Astute gets a defined answer.
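
The reset behavior of the undefined retries counter might look roughly
like this. It is a sketch under assumptions: the class, the method
name and the use of nil for an undefined answer are all hypothetical.

```ruby
# Hypothetical sketch of a status poller with a resettable retry budget.
class StatusPoller
  def initialize(undefined_retries)
    @max_retries = undefined_retries
    @retries_left = undefined_retries
  end

  # answer is nil when the node gave no defined status (assumption).
  def handle(answer)
    if answer.nil?
      return :give_up if @retries_left.zero?

      @retries_left -= 1
      :retry
    else
      # A defined answer restores the full retry budget.
      @retries_left = @max_retries
      :defined
    end
  end
end
```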
Change-Id: Ie86576a3400be5a6b11041c8e6acf89abf3bbd51
Related-Bug: #1653210
Closes-Bug: #1653737
This change allows using an async shell task based on
puppet to run provision commands.
It is a transitional change between the old way of running image
provisioning and provisioning as a graph, which will also
use the async shell to run.
It is a more fault-tolerant way to provision because a
temporary connection problem between the master node
and the provisioning node does not block or fail provisioning.
Important notice: this is allowed only if the bootstrap image
has the puppet and daemonize packages, which is true for 9.2
or higher releases.
Change-Id: Ie634fae9b63bf0c103ec8926647af75b57cefe23
Related-Bug: #1644618
Astute will no longer retry and wait around 10 minutes for
every node whose connection was lost in case of an
upload file task. Now it waits only for the default upload
timeout.
The default upload timeout can now be set in the config. For now
it is 60 seconds. The upload file task also now supports a timeout
parameter, which overrides the default.
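
The precedence described above can be sketched as follows. The config
key 'upload_timeout' and the helper name are assumptions; only the 60
second default and the 'timeout' task parameter come from the message.

```ruby
# 60 seconds is the default stated in the message.
DEFAULT_UPLOAD_TIMEOUT = 60

# Hypothetical helper: task parameter first, then config, then default.
def upload_timeout(task_parameters, config = {})
  task_parameters['timeout'] ||
    config.fetch('upload_timeout', DEFAULT_UPLOAD_TIMEOUT)
end
```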
Change-Id: Ice8207f539566a50d4eb30c04ab563c3ee1278ec
Closes-Bug: #1629031
- with a large number of nodes (more than 200) and tasks
(more than 20000), progress calculation can slow down
- remove the status magent call from the puppet run (decreases the
number of magent calls from 2 to 1 in the positive scenario)
Change-Id: I70675a6bbd391d0112c594626bdb0ce7bb9e3e1e
This change alters the error message by adding '\n\n' before the error
details, which gives Fuel UI the ability to hide this part of the message.
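
A small sketch of the message layout, assuming a consumer splits on
the first blank line. Both helper names are hypothetical and do not
describe the actual Fuel UI code.

```ruby
# Hypothetical builder: summary, blank line, then the details.
def build_error_message(summary, details)
  "#{summary}\n\n#{details}"
end

# Hypothetical consumer: keeps only the part before the first '\n\n'.
def visible_part(message)
  message.split("\n\n", 2).first
end
```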
Change-Id: I2e93ee3aa0aae183cd320d2438f781a975c5e70f
Closes-Bug: #1614422
Slow tasks fail because the default timeout for the `exec` resource
is 300 seconds. The patch passes the timeout from the task to the
puppet wrapper.
Closes-bug: #1641190
Change-Id: I8f7c2120e61144911481c83b0da391e30bbc6f2f
Changes:
- remove report from task engine;
- remove old logic for hangs and 'idling' statuses;
- increase code readability;
- add code docs;
- support retries in case of MClient errors for status
and run actions;
- replace raising on timeout with regular code;
- decrease waiting time for puppet run (from 120 to 10) and
time between tries (from 30 to 2);
- decrease mcollective retries from 5 to 1. Now it will use
puppet retries if it fails due to a network/mcollective problem
after 1 try.
Closes-Bug: #1613396
Change-Id: I98fe3df65ef335b03eceb2c401eba12cf68ee1c8
Without this change some nodes can be wrongly marked
as offline on the Nailgun side.
Change-Id: I4a89ac101867effe6f277c2dcaa93e9b67b65875
Closes-Bug: #1626072
Calculate progress for the cluster using a simple formula:
100 * all_tasks_finished / all_tasks_total
It works with custom graphs too.
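
The formula can be written as a short sketch. The function name is
hypothetical, and returning 0 for an empty task list is an assumption
made here to avoid dividing by zero.

```ruby
# Hypothetical helper implementing 100 * finished / total.
def cluster_progress(all_tasks_finished, all_tasks_total)
  # Assumption: report 0 when there are no tasks at all.
  return 0 if all_tasks_total.zero?

  # Integer division, so the result is a whole percentage.
  100 * all_tasks_finished / all_tasks_total
end
```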
Change-Id: Iaea07ec19d80d5f344c8ecf434f771da7a608157
Closes-Bug: #1623937
This prevents us from picking up status files from a previous run when
the current run hasn't written them for some reason, e.g. a crash.
Change-Id: I83d0b4aa3c42210279b75ed7b575919d2d092ff0
Closes-Bug: #1560026
(cherry picked from commit e5311dd97b)
Also:
- catch division by 0 when calculating progress
- catch the situation of a report for a node without current tasks
Change-Id: If4a975abf6da4ba1848be50a23f6532f649d2982
Closes-Bug: #1620858
With hundreds of nodes Cobbler sync cannot fit into the default 30
second timeout. Cobbler performance is going to be investigated in the
next release. For now let's just increase the timeout.
Change-Id: Ief8ff93fc808549e8d729040512a266b0c09383d
Closes-Bug: #1608700
(cherry picked from commit f030161d19)
Without this change we do not mark the deployment as an error
if a task on a node failed.
Also initialize the logger early for the Deployment
support library.
Change-Id: Ibcac4569756b34c3c1ac33f68ae203246d94d2a4
Closes-Bug: #1620858
This reverts commit d6a40e0590.
Also, this change pins activesupport gem version used for ruby 2.1.5
Change-Id: I4002b11fe7716a38ff2321643a8bad9af9de3fa0
Closes-Bug: #1619621
Signed-off-by: Maksim Malchuk <mmalchuk@mirantis.com>
Currently Astute uses the 'custom' field for sending the task summary,
which is wrong because Nailgun looks for the 'summary' field.
Change-Id: Ieb01161d92f82768cbc5057b5dbb501fcf53a74f
Currently the noop_run option is fetched from tasks_metadata, but on
the Nailgun side it is passed as deployment_option.
Change-Id: Id07c5ecd83fc37a95f7f289879459ee5d7aebd7c
Several changes:
- new task type 'master_shell': run a task on the master node using
node context;
- new task type 'move_to_bootstrap': move a non-bootstrap node to
bootstrap, remove and add all nodes to Cobbler;
- add new task types similar to noop: skipped, stage;
- add new task type 'erase_node': erase a node as a task;
- refactor message reporting: it is now simple and protects
against sending duplicate messages in any format;
- allow setting up node report behavior using
node_statuses_transitions in tasks_metadata in case of
successful, stopped or failed
Change-Id: Iac128fc9d8c764269bebb3e95d6ba9e4a086f919
This patch adds the ability to run a deployment graph with the noop
option. At the same time, this option is applied only to the 'shell'
and 'puppet' task types.
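
The type restriction can be sketched as a simple check. The constant
and helper names are hypothetical; only the 'shell' and 'puppet' type
names come from the message.

```ruby
# Task types that honor the noop option, per the message above.
NOOP_CAPABLE_TYPES = %w[shell puppet].freeze

# Hypothetical helper: true only when noop is requested and supported.
def noop_for?(task_type, noop_run)
  noop_run && NOOP_CAPABLE_TYPES.include?(task_type)
end
```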
Change-Id: Ibcb275bb84dfd553ab07e6d58af753ecf96ab3a5