Commit Graph

1148 Commits

Author SHA1 Message Date
OpenDev Sysadmins 33be0d299a OpenDev Migration Patch
This commit was bulk generated and pushed by the OpenDev sysadmins
as a part of the Git hosting and code review systems migration
detailed in these mailing list posts:

http://lists.openstack.org/pipermail/openstack-discuss/2019-March/003603.html
http://lists.openstack.org/pipermail/openstack-discuss/2019-April/004920.html

Attempts have been made to correct repository namespaces and
hostnames based on simple pattern matching, but it's possible some
were updated incorrectly or missed entirely. Please reach out to us
via the contact information listed at https://opendev.org/ with any
questions you may have.
2019-04-19 19:39:52 +00:00
Vladimir Sharshov (warpc) ee2d95de33 Do not send data about nodes in case of task deployment
Nailgun uses the node data in the stop deployment response
to reset nodes to the discovery state, which is unexpected behavior
for already provisioned nodes in case of task deployment.

Change-Id: I39de8a8afd627b0bf209d9a7f6ad6e19abd99016
Partial-Bug: #1672964
(cherry picked from commit 039ce9e0b8)
2017-03-28 16:40:05 +00:00
Vladimir Sharshov (warpc) 8213652e49 Fix wrong ready status instead of stopped for stop deployment
Reporting the ready status for a node means a successful node status,
which is obtained when all tasks finish with ready and skipped
statuses.

The same effect occurs if Astute marks a node as skipped. In this
case we also get the same status, 'successful'.

So we need to check a node for skipped statuses before checking it for
successful status, to avoid losing the context of the stop
deployment operation.

Change-Id: I3c042425cab800de0bfc4e03f29414b145f44983
Closes-Bug: #1672964
(cherry picked from commit 496212798e)
2017-03-20 13:04:40 +00:00
Vladimir Kozhukalov 645d12a01a Fix puppetd mcagent issues
* Add maxlength parameter to command_prefix puppetd ddl
* Add command_prefix default parameter to puppet task
* Add command_prefix parameter to puppet mclient

Change-Id: Ie826bccd09fa22512526ecf1a08c6ab4e0d3f0d7
Implements: blueprint get-rid-cobbler-dnsmasq
2017-03-13 19:09:25 +03:00
Jenkins c7339032af Merge "Add command_prefix field to puppet mcagent" into stable/newton 2017-03-10 12:10:27 +00:00
Vladimir Sharshov (warpc) 29988c78e9 task_deploy: no implicit conversion of String into Integer
Now Astute will not calculate fault tolerance groups and
critical node uids twice.

Change-Id: I3bf2dd0ffc0fc74fd9c670bd50b32e3285ae7e2a
Closes-Bug: #1669499
(cherry picked from commit 892894bfb5)
2017-03-07 14:33:10 +00:00
Vladimir Kozhukalov c734b03042 Add command_prefix field to puppet mcagent
This allows running puppet with environment variables.
E.g. FACTER_foo=bar puppet apply ...

Change-Id: I1e435262e810ead46689078513607f6a99a19043
Implements: blueprint get-rid-cobbler-dnsmasq
2017-03-06 15:20:20 +03:00
Vladimir Sharshov (warpc) 3c0bc71a87 Fail tolerance behavior for upload file tasks
Additional changes:

- decrease number of "reset undefined retries" messages;
- rewriting log messages for better understanding.

Change-Id: I17db392ac4c73a3c08505fcbaf17dbcce96ebd91
Blueprint: graph-concept-extension
(cherry picked from commit 48ee1f7467)
2017-02-28 16:15:57 +00:00
Vladimir Sharshov (warpc) 2cfc4f30f5 Add retries for upload tasks: upload_file and upload_files
Some mcollective clients can, for some reason, ignore a task
from Astute. In such cases Astute should retry its request.

Also:
- refactoring tasks to support class hook post_initialize
  instead of super
- change @task and @ctx to equal instance methods
- removed old fixtures

Change-Id: I96613f53303fd71acc437d2f8f47b599bcf3b5d9
(cherry picked from commit c489e972ff)
2017-02-20 15:25:20 +00:00
Vladimir Sharshov (warpc) 5b00f6081d Astute do not respect debug option for puppet task
Running a cluster with the debug option enabled does not affect
the puppet task, which always runs with debug disabled.

This happens because Nailgun sends the debug option, which
Astute sets for every task, but the puppet task requires the
puppet_debug option to control its behavior. This change
connects these parameters.

Change-Id: I8df68105aa699e83673c39a0f03bb22673171d6f
Closes-Bug: #1662512
(cherry picked from commit 003a0a0efd)
2017-02-08 10:34:28 +00:00
Jenkins 19038bed30 Merge "Add missing comments" into stable/newton 2017-02-08 10:09:55 +00:00
Dmitry Ilyin be41baa200 Add missing comments
Change-Id: I4d59bb10b3340ae85ea8406fd238ad8843b4bc50
(cherry picked from commit cc4ecafe2c)
2017-02-07 17:59:20 +00:00
Dmitry Ilyin 3c087854a5 Add basemodulepath to the Puppet mcagent
This parameter is required for Puppet 4
to be able to find the base modules without
any environment defined.

Puppet 3 is able to work in the legacy mode
without the environment support and does not
care whether there is a base module path or not.

Closes-Bug: 1655663
Change-Id: I60f2c78ef5fe366314eea186f4671d198e54f1d6
(cherry picked from commit 1c1578b64a)
2017-02-07 17:58:50 +00:00
Vladimir Kozhukalov e05f66d12e Move not provisioned nodes to error status
When there are a lot of nodes to provision and we provision
them in chunks, we could fail in the middle due to "Too many
nodes failed to provision". If so, we need to append those
nodes where we did not start provisioning at all to the list
of failed nodes. Otherwise, those nodes will be reported
as 'provisioned' with progress = 100 and rebooted.
But for some reason we bind all nodes to the debian-installer
profile in cobbler before starting provisioning, and when rebooted
these unprovisioned nodes will fail to boot, because since
7.0 we put empty files where cobbler expects debian-installer
kernel and initrd files. :-)

Change-Id: I2a401b80614ee7dd5a10931b9b50bcff066f790f
Closes-Bug: #1656269
(cherry picked from commit 570049ca1f)
2017-01-16 15:08:31 +03:00
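
The bookkeeping described in commit e05f66d12e can be illustrated with a minimal Python sketch. It is not Astute's code (Astute is written in Ruby); the function name, the argument names, and the use of plain uid strings are assumptions made for illustration only.

```python
from typing import Iterable, List


def add_unstarted_to_failed(all_uids: Iterable[str],
                            started_uids: Iterable[str],
                            failed_uids: Iterable[str]) -> List[str]:
    """Return the failed list extended with nodes whose provisioning never started.

    Hypothetical helper: nodes that were never started must not be
    reported as 'provisioned', so they are appended to the failed nodes.
    """
    started = set(started_uids)
    result = list(failed_uids)
    for uid in all_uids:
        # A node that was never started and is not yet listed counts as failed.
        if uid not in started and uid not in result:
            result.append(uid)
    return result
```

Here nodes that started and did not fail are left alone, while every node that never started ends up in the failed list exactly once.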
Jenkins 128f61f636 Merge "Improve the gracefully stop debug messages" into stable/newton 2017-01-16 08:39:38 +00:00
Dmitry Ilyin 5fb11c352a Improve the gracefully stop debug messages
Change-Id: I736aebefbeba7fd1ddaa0ce844a53999d346d7fd
(cherry picked from commit b78e52d204)
2017-01-13 17:10:49 +00:00
Dmitry Ilyin afab09a7a0 Fix mcagent report
Set reports to none in order to disable the Puppet
reporting during the deployment. In some cases
Puppet was failing the deployment, being unable to
store the report file on the missing Puppet master.

Change-Id: I888316824920f71f6c4953c513eea3a4c277d50b
(cherry picked from commit 2505ab1d8e)
2017-01-13 15:06:13 +00:00
Vladimir Sharshov (warpc) 0497a610d2 Network problem tolerance puppet status check
The connection between a node and Astute can sometimes be
lost, so we need more tries to get info about the task
status on the node.

Two changes:

- instead of 1 try, Astute will make 6 tries with a timeout
  of 10 for every attempt;
- it will handle such behavior for puppet using separate
  retries: puppet_undefined_retries

Unlike a full puppet retry, a status retry is safe because
it is idempotent.

Puppet undefined retries can be set using the Astute config
or by sending undefined_retries in the puppet task parameters, the same
way as for usual retries. Most important: the counter is reset
to its original value every time Astute gets a defined answer.

Change-Id: Ie86576a3400be5a6b11041c8e6acf89abf3bbd51
Related-Bug: #1653210
Closes-Bug: #1653737
(cherry picked from commit 7c0485eb1a)
2017-01-06 12:02:11 +00:00
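
The reset behavior of the undefined retries described in commit e9... above (0497a610d2) can be sketched in Python as follows. This is an illustration under stated assumptions, not Astute's Ruby implementation: `fetch_status`, the status names, and the convention that `None` stands for an undefined answer are all hypothetical, and the wait between attempts is left out to keep the sketch short.

```python
from typing import Callable, Optional, Sequence


def wait_for_status(fetch_status: Callable[[], Optional[str]],
                    undefined_tries: int = 6,
                    final_statuses: Sequence[str] = ("successful", "failed")) -> str:
    """Poll a status source, tolerating a limited run of undefined answers.

    fetch_status returns a status string, or None when the node gave no
    defined answer (for example, the connection was lost).
    """
    remaining = undefined_tries
    while True:
        status = fetch_status()
        if status is None:
            # Undefined answer: spend one try, give up when none are left.
            remaining -= 1
            if remaining <= 0:
                return "undefined"
            continue
        # Defined answer: restore the counter to its original value.
        remaining = undefined_tries
        if status in final_statuses:
            return status
```

Because the counter is restored after every defined answer, only consecutive undefined answers count against the limit; scattered connection losses during a long run do not accumulate.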
Vladimir Sharshov (warpc) e9fa6117b5 Fix fail detection for provisioning
Add test to prevent such behavior in future

Change-Id: If833723f0301f2008e6aabfc888d0bdf693f4f2e
Partial-Bug: #1653210
(cherry picked from commit b0752c7a78)
2016-12-30 11:25:27 +00:00
Jenkins 7e6a6a7b8c Merge "Use async shell call for provision" into stable/newton 2016-12-28 17:15:09 +00:00
Vladimir Sharshov (warpc) 47e27fb76c Upload file task timeout support
Astute will not retry and will not wait around 10 minutes for
every node whose connection was lost in case of
upload file task. For now it will wait only for the default upload
timeout.

The default timeout for upload can now be set in the config. For now
it is 60 seconds. The upload file task now also supports a timeout
parameter which overrides the default.

Change-Id: Ice8207f539566a50d4eb30c04ab563c3ee1278ec
Closes-Bug: #1629031
(cherry picked from commit f475c45dfc)
2016-12-28 14:58:13 +00:00
Vladimir Sharshov (warpc) 9d7ba716fc Use async shell call for provision
This change allows using an async shell task based on
puppet to run provision commands.

It is a transitional change between the old way of running
image provisioning and provisioning as a graph, which will
also use async shell to run.

It is a more fault-tolerant way to provision because
a temporary problem with the connection between the master node
and the provisioning node does not block or fail provisioning.

Important notice: it is allowed only if the bootstrap image
has the puppet and daemonize packages, which is true for 9.2
or higher releases.

Change-Id: Ie634fae9b63bf0c103ec8926647af75b57cefe23
Related-Bug: #1644618
(cherry picked from commit dc47550460)
2016-12-28 14:30:54 +00:00
Evgeny L 1b86a47c9c Remove info from the log message about retries for suceed tasks.
Sometimes engineers misread the message: it's not obvious
that there is such a thing as succeed_retries, so it's read
as a message about a failed task which does not have more
retries.

Closes-bug: #1641194
Change-Id: I948996fcb8054a2bde27a9de7c7cac650b3c2b8c
(cherry picked from commit 99153e5374)
2016-12-23 17:25:31 +00:00
Jenkins e0add5b67d Merge "Speed up graph && node processing" into stable/newton 2016-12-16 18:04:19 +00:00
Vladimir Sharshov (warpc) 2501c278dd Compact nailgun hook error message
This change modifies the error message by adding '\n\n' before the error
details, which gives the Fuel UI the ability to hide this part of the message.

Change-Id: I2e93ee3aa0aae183cd320d2438f781a975c5e70f
Closes-Bug: #1614422
(cherry picked from commit 3905cab1ea)
2016-12-16 13:54:26 +00:00
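
The '\n\n' convention from commit 2501c278dd can be shown with a small Python sketch. Both function names are hypothetical, and how the Fuel UI actually hides the details is not stated in the commit message; splitting on the first blank line is an assumption made for illustration.

```python
def compact_error(summary: str, details: str) -> str:
    """Join a short summary and the error details with a blank line."""
    return f"{summary}\n\n{details}"


def visible_part(message: str) -> str:
    """Return the part before the first blank line (assumed UI behavior)."""
    return message.split("\n\n", 1)[0]
```

With this convention a consumer can show only the summary by default and keep the details available on demand.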
Vladimir Sharshov (warpc) 0f95d62cef Speed up graph && node processing
- in case of a big number of nodes (more than 200) and tasks
  (more than 20000), progress calculation can slow down
- remove status magent call from puppet run (decrease number
  of magent calls from 2 to 1 in case of positive scenario)

Change-Id: I70675a6bbd391d0112c594626bdb0ce7bb9e3e1e
2016-12-16 12:04:44 +00:00
Vladimir Sharshov (warpc) 03b914edf2 Run nailgun-agent on rebooted nodes
Without this change some nodes can be wrongly marked
as offline on the Nailgun side.

Change-Id: I4a89ac101867effe6f277c2dcaa93e9b67b65875
Closes-Bug: #1626072
2016-12-13 16:24:24 +00:00
Jenkins 87ba44dcde Merge "Add logging for network checker MCollective plugin" into stable/newton 2016-11-25 07:45:23 +00:00
Jenkins 97db78a9ed Merge "Revert "Support global progress for tasks"" into stable/newton 2016-11-23 17:27:45 +00:00
Vladimir Sharshov (warpc) 5be2a5f281 New version of puppet task engine
Changes:

- remove report from task engine;
- remove old logic for hangs and 'idling' statuses;
- increase code readability;
- add code docs;
- support retries in case of MClient errors for status
  and run actions;
- replace timeout raise with usual code;
- decrease waiting time for puppet run (from 120 to 10) and
  time between tries (from 30 to 2);
- decrease mcollective retries from 5 to 1. Now it will use
  puppet retries if failed during network/mcollective problem
  after 1 try.

Closes-Bug: #1613396
Change-Id: I98fe3df65ef335b03eceb2c401eba12cf68ee1c8
(cherry picked from commit bca595a964)
2016-11-22 18:14:03 +00:00
Evgeny L 480a73dc0c Add logging for network checker MCollective plugin
Closes-bug: #1641741
Change-Id: I0ab24230a036c22d6fa96d5cf2e534260bed6e33
2016-11-22 18:11:26 +00:00
Jenkins 79573a7c40 Merge "Set timeout for resource which is used to wrap shell tasks." into stable/newton 2016-11-22 09:31:24 +00:00
Evgeny L e8e6a3bdb9 Set timeout for resource which is used to wrap shell tasks.
Slow tasks fail because the default timeout for the `exec` resource
is 300 seconds. The patch passes the timeout from the task to
the puppet wrapper.

Closes-bug: #1641190
Change-Id: I8f7c2120e61144911481c83b0da391e30bbc6f2f
2016-11-22 09:02:47 +00:00
Evgeny L 715aae026b Fix default retries parameters for shell tasks
Use shell specific parameters shell_retries and shell_interval,
instead of mc_retries and mc_interval which are used for retries
if a node was not accessible via MCollective.

Closes-bug: #1641198
Change-Id: I04a4d187ab3aa4cde46b2775766eb88babd46ab7
2016-11-22 09:01:20 +00:00
Vladimir Kuklin 10a59cc3ca Revert "Support global progress for tasks"
This reverts commit 3f21d35f35.

Change-Id: If0cf99129fdc38c40ee8322c872f6b4f9b83c0b5
Partial-bug: #1633212
2016-11-08 09:38:20 +00:00
Davanum Srinivas f3ae7576b1 Update .gitreview for stable/newton
Change-Id: I1295a24e0b786bd1753789c9164ca02a1c74a897
2016-10-14 09:15:15 -04:00
Vladimir Sharshov (warpc) 11ec66899e Support global progress for tasks
Calculate progress for cluster using simple formula

    100 * all_tasks_finished / all_tasks_total

It will work with custom graphs too.

Change-Id: Iaea07ec19d80d5f344c8ecf434f771da7a608157
Closes-Bug: #1623937
2016-09-26 14:02:30 +03:00
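
The progress formula quoted in commit 11ec66899e can be written as a short Python sketch. It is an illustration only, not Astute's Ruby code: the function name is hypothetical, integer rounding is an assumption, and so is the value returned for an empty graph (the later commit 9dee3b3da7 mentions catching a divide by 0 but does not say what is reported in that case).

```python
def cluster_progress(finished_tasks: int, total_tasks: int) -> int:
    """Return cluster progress as an integer percentage.

    Implements 100 * all_tasks_finished / all_tasks_total with a guard
    against division by zero.
    """
    if total_tasks <= 0:
        # Assumption: a graph with no tasks is treated as complete.
        return 100
    # Assumption: the percentage is truncated to an integer.
    return 100 * finished_tasks // total_tasks
```

Because the formula only counts tasks, it does not depend on the shape of the graph, which is consistent with the commit's note about custom graphs.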
Jenkins c11a24c052 Merge "Fix non-working zero tolerance error group" 2016-09-26 10:56:49 +00:00
Jenkins 00f10f8cfc Merge "Increase xml rpc timeout" 2016-09-23 09:32:07 +00:00
Jenkins 90617acd7d Merge "Remove puppet status files right before running it" 2016-09-23 09:31:04 +00:00
Georgy Kibardin ac2703949f Remove puppet status files right before running it
This prevents us from picking up status files from a previous run when
the current run hasn't written them for some reason, e.g. a crash.

Change-Id: I83d0b4aa3c42210279b75ed7b575919d2d092ff0
Closes-Bug: #1560026
(cherry picked from commit e5311dd97b)
2016-09-21 14:53:23 +00:00
Vladimir Sharshov (warpc) 9dee3b3da7 Fix non-working zero tolerance error group
Also:

- catch divide by 0 in case of progress
- catch situation with report for node without current tasks

Change-Id: If4a975abf6da4ba1848be50a23f6532f649d2982
Closes-Bug: #1620858
2016-09-20 15:29:35 +03:00
Dmitry Ilyin 686a1550a2 Set task run and finish messages to info
Change-Id: I449b18bb759170ef9d131a2ddf3b534a348fc5c5
2016-09-19 13:51:44 -05:00
Georgy Kibardin 0e93c8b6c8 Increase xml rpc timeout
With hundreds of nodes, Cobbler sync cannot fit in the default 30 second
timeout. Cobbler performance is going to be investigated in the next
release. For now let's just increase the timeout.

Change-Id: Ief8ff93fc808549e8d729040512a266b0c09383d
Closes-Bug: #1608700
(cherry picked from commit f030161d19)
2016-09-16 07:56:55 +00:00
Vladimir Sharshov (warpc) 67896b9a59 Zero tolerance for errors on nodes as default behavior
Without this change we do not mark the deployment as error
if a task on a node failed.

Also use early initialization of the logger for the support library
Deployment

Change-Id: Ibcac4569756b34c3c1ac33f68ae203246d94d2a4
Closes-Bug: #1620858
2016-09-13 20:35:39 +03:00
Jenkins 58dd9d2f2c Merge "Pass auth token to Timmy" 2016-09-13 12:05:40 +00:00
Jenkins 1d4b268359 Merge "Remove task in orchestrator call for service tasks" 2016-09-13 10:41:50 +00:00
Jenkins d86e9624c3 Merge "Fix problem with newest activesupport version" 2016-09-13 10:04:23 +00:00
Jenkins 1c40c2c9d2 Merge "Ressurect --start|--end options for graph execution" 2016-09-13 09:57:34 +00:00
Vladimir Sharshov (warpc) 9733dc90c9 Remove task in orchestrator call for service tasks
Astute will send message about running tasks for all tasks

Change-Id: I1579cba030007501938cd89d95d5032c3aaa1417
Related-Bug: #1621003
2016-09-13 12:41:57 +03:00