Doing this, we avoid including the task cache UPDATE statement in the
next transaction, which may cause various problems such as deadlocks
(the update happens inside the make_astute_message() function).
Change-Id: I865b98beb621bee089cf79f1304498fd3637d64f
Closes-Bug: #1618852
Doing this, we avoid including the task cache UPDATE statement in the
next transaction, which may cause various problems such as deadlocks.
In this particular case we hit the following deadlock:
1. DeleteIBPImagesTask makes UPDATE tasks SET cache ...
2. The response handler in the receiver makes SELECT clusters FOR UPDATE
3. The code following DeleteIBPImagesTask makes SELECT clusters FOR UPDATE
4. The response handler makes SELECT tasks FOR UPDATE
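The fix pattern can be sketched roughly as follows. This is a minimal illustration with hypothetical names (Session, rpc_cast, make_astute_message here are stand-ins, not the actual Nailgun code): commit the transaction that updated the task cache before casting the RPC message, so the next transaction never inherits the pending UPDATE and its row locks.

```python
# Sketch: commit the task-cache UPDATE before rpc.cast so the row locks
# are released and a later transaction cannot deadlock against them.
# All names here are hypothetical stand-ins for the real code.

events = []  # records the order of operations for illustration


class Session(object):
    def execute(self, stmt):
        events.append(('execute', stmt))

    def commit(self):
        events.append(('commit', None))


def rpc_cast(message):
    events.append(('cast', message))


def make_astute_message(session, task_id):
    # updates the task cache as a side effect
    session.execute("UPDATE tasks SET cache=... WHERE id=%s" % task_id)
    return {'task': task_id}


def send_task(session, task_id):
    message = make_astute_message(session, task_id)
    # commit *before* casting: the UPDATE must not leak into the
    # transaction opened by whoever handles the response
    session.commit()
    rpc_cast(message)


send_task(Session(), 42)
```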
Change-Id: Ic8e5f2386364421b0667f920499e031f587f726e
Closes-Bug: #1653083
* the reset_environment supertask contains 3 subtasks:
  base_reset_environment, remove_keys_task,
  remove_ironic_bootstrap_task
* names for the tasks were changed
* response methods for remove_keys_task and
  remove_ironic_bootstrap_task were added to the receiver
* the _restore_pending_changes method was added only for
  reset_environment_resp
* a migration adding the new transaction names and the corresponding
  test were added
* a test checking the task message was added
Change-Id: Ib8a215174431486316bca533797932e02969c037
Closes-Bug: #1541868
This commit switches the task resolution approach to a tag-based one.
A tag is the minimal unit needed for task resolution and can be mapped
to a node only through the role interface. Each role provides a set of
tags in its 'tags' field, which may be modified via the role API. A tag
may also be created separately via the tag API, but such a tag cannot
be used unless it is attached to a role.
Change-Id: Icd78fd124997c8aafb07964eeb8e0f7dbb1b1cd2
Implements: blueprint role-decomposition
A 'tags' attribute has been added to each role in 'roles_metadata'.
Initially, all non-controller roles will only have a tag matching their
own role name. This allows existing tasks which do not have tags
associated with them to work correctly. In the absence of tags, a
task's roles are used to determine which nodes it will run on.
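The fallback described above can be sketched like this (data shapes and the helper name are illustrative, not the actual Nailgun resolver): a task is matched to nodes by tags when it has any, otherwise by its roles.

```python
# Sketch of tag-based task resolution with a role fallback.
# Data shapes here are hypothetical, for illustration only.

def nodes_for_task(task, nodes):
    """Return ids of nodes the task should run on.

    task:  {'tags': [...], 'roles': [...]}
    nodes: [{'id': ..., 'tags': set(...), 'roles': set(...)}]
    """
    tags = task.get('tags')
    if tags:
        return [n['id'] for n in nodes if set(tags) & n['tags']]
    # no tags on the task: fall back to role matching
    return [n['id'] for n in nodes
            if set(task.get('roles', ())) & n['roles']]


nodes = [
    {'id': 1, 'tags': {'compute'}, 'roles': {'compute'}},
    {'id': 2, 'tags': {'database'}, 'roles': {'controller'}},
]
```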
Implements: blueprint role-decomposition
Change-Id: I390580146048b6e00ec5c42d0adf995a4cff9167
CA certificate verification should be available only if
'Bypass verification' is disabled.
Partial-Bug: 1616438
Change-Id: Ib83210f52c7874398fcb1791e51091e05151273f
Depends-On: Id38bf7c74869fa60852ca1cb2ccaa9c63412cf64
Astute will send the 'running' status for a task when it starts to
process it, so we do not need to set this status in the orchestrator
mechanism.
Change-Id: I5e2deccdbd27daff3e3b131ef9c8c7ffbbb40dd4
Closes-Bug: #1621003
Since Timmy is a self-sufficient utility, it requires an auth token to
log in to Fuel and get the nodes to dump.
Also got rid of the snapshot configuration unit tests, since Timmy
doesn't use most of that configuration.
Change-Id: I559776a701c76bf9f9153550d2989d939d30eb3f
Partial-Bug: #1618965
Also reworked the legacy task manager to use this flag
instead of patching every deployment task.
Change-Id: Ic4031b94ee359d414f1834a56b085ff12cc6b38f
Closes-Bug: 1618774
The size of deployment_info grows as n^2 with the number of nodes.
That's because common_attrs, which is merged into each node's info,
contains information about all nodes. For example, for 600 nodes we
store about 1Gb of data in the database. So, as a first step, let's
store common_attrs separately in the deployment_info structure, both
in the Python code and in the database.
Also removed old migration tests which are not related to the actual
database state.
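The intended storage layout can be sketched as follows (shapes and helper names are illustrative): keep the shared common_attrs once, store only per-node deltas, and merge them back on read.

```python
# Sketch: de-duplicate common_attrs out of per-node deployment info.
# Function and key names here are hypothetical.

def split_deployment_info(merged_info, common_attrs):
    """merged_info: {node_uid: full_dict}; return a compact structure."""
    compact = {'common_attrs': common_attrs, 'nodes': {}}
    for uid, info in merged_info.items():
        # keep only the keys that differ from the shared part
        compact['nodes'][uid] = {
            k: v for k, v in info.items()
            if common_attrs.get(k) != v
        }
    return compact


def node_info(compact, uid):
    """Rebuild the full per-node dict on demand."""
    info = dict(compact['common_attrs'])
    info.update(compact['nodes'][uid])
    return info


common = {'repos': ['mos'], 'nodes': ['n1', 'n2']}
merged = {'1': dict(common, role='compute'),
          '2': dict(common, role='controller')}
compact = split_deployment_info(merged, common)
```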
Change-Id: I431062b3f9c8dedd407570729166072b780dc59a
Partial-Bug: #1596987
A simultaneous run of 2 or more tasks may cause side effects, and the
simplest way to deal with this is to prevent it.
Also fixed all places where rpc.cast was called without a commit.
Change-Id: I029768900d345540c3b501f1fa3649b063d3a55d
Partial-Bug: 1615557
The Task model is extended with a noop_run boolean column;
the noop_run param is introduced and passed down from the API to
the execution manager;
the execution manager supports the noop_run argument
and uses it when creating the astute message;
the DeploymentHistory model is extended with a summary JSON column;
the summary column is returned only if the
include_summary=1 query string is passed to the API.
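The include_summary behaviour can be sketched like this (handler helpers and record shapes are hypothetical): the heavy summary column is serialized only when the client explicitly asks for it.

```python
# Sketch: serialize deployment history, adding the 'summary' field only
# when include_summary=1 is present in the query string.
# Names and shapes here are illustrative, not the real handlers.

def include_summary_requested(query):
    # e.g. query == {'include_summary': '1'}
    return query.get('include_summary') == '1'


def serialize_history(records, include_summary=False):
    out = []
    for rec in records:
        item = {'task_name': rec['task_name'], 'status': rec['status']}
        if include_summary:
            item['summary'] = rec.get('summary') or {}
        out.append(item)
    return out


records = [{'task_name': 'netconfig', 'status': 'ready',
            'summary': {'time': 3}}]
```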
Implements blueprint: puppet-noop-run
Change-Id: I80090d96f818cef7c6f88208bdacf5849f0f5d0f
The transient statuses are not persisted in the DB because the status
of a node should represent its current state, for example: the node is
provisioned or the node is deployed. The synthetic statuses, like
error, deploying, etc., are used only to provide additional
information to the user about the node's status at this moment. These
statuses can be calculated on demand.
Also, node.error_type now has type 'String' instead of 'Enum'.
Also added handling of the 'deleted' status in the response, which
means that the node should be deleted from the cluster.
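Computing a synthetic status on demand could look roughly like this (field names and the set of statuses are illustrative, not the actual Nailgun model):

```python
# Sketch: derive the displayed (synthetic) status from persisted facts
# instead of storing it. Field names here are hypothetical.

def synthetic_status(node, running_transactions):
    """Return the status to show the user right now."""
    if node.get('error_type'):
        return 'error'
    if node['id'] in running_transactions:
        return 'deploying'
    # fall back to the persisted, "real" state of the node
    return node['status']  # e.g. 'discover', 'provisioned', 'ready'


node = {'id': 7, 'status': 'provisioned', 'error_type': None}
```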
Change-Id: If5e79b9274f34e01d2b795491c23361c9050669d
Blueprint: graph-concept-extension
Currently, nodes which need provisioning and then deployment avoid
the role restrictions check because of their 'not ready for
deployment' status. This commit adds unprovisioned nodes to the list
of checked nodes.
Change-Id: I7b19d1beeb71849e2def465127873821a4f93052
Closes-Bug: #1605472
Instead of the wide process_deployment and process_provision
callbacks, implemented methods to patch deployment or provisioning
info per cluster or node.
The code of extensions and tests was updated accordingly.
Also added a helper to mark extension methods as deprecated.
The extension loading behaviour was modified: instead of failing the
operation when an extension cannot be loaded, Nailgun only writes an
error to the log saying the extension is not loaded and continues the
operation.
Partial-Bug: 1596987
Change-Id: I577c8ffc105734e12646ca7c6a4fe4927e70b119
DocImpact
The size of the deployment_info field in the tasks table grows as
n**2 (depending on the number of nodes). If we have 200 nodes, the
size of the structure is about 20Mb. In the case of 600 nodes it would
theoretically be about 720Mb; in practice it doesn't fit into 1Gb.
A good solution is to put the common part in a separate place, but
that is not so fast. Also, it will not help if all nodes are going to
be deployed with customized deployment info.
Change-Id: Id3154ab423b0863d9cc4952335293bf5fc30df38
Partial-Bug: #1596987
Stopped nodes can also be helpful for troubleshooting, so include
them in the diagnostic snapshot.
Change-Id: Ib8608f8e04663d16249b514788522f701b9a88b9
Closes-Bug: #1599530
NFV features (DPDK, SR-IOV, NUMA/CPU pinning, HugePages) can't be
checked for old clusters due to an old nailgun-agent, which doesn't
send NFV-specific information. So all NFV-related checks and
functionality should be disabled for old environments.
Change-Id: Ib589d67658f45414b8049398316af5c7298d459e
Closes-Bug: #1594443
Previously, when we filtered the deployment task history, all the
difference between the task graph snapshot and the filter result was
returned as surrogate tasks with no run name and node.
The deployment task history is saved for dry runs as well.
Change-Id: I39a3341230a00aa53fa3a4cba31ee0aacb0ec2ae
Closes-Bug: #1590872
For a dump configuration section with an empty hosts array, shotgun
tries to process all entries locally, on the master node.
When there are no nodes ready for log collection, this behaviour
causes all slave and controller log entries to be processed on the
master node. Taking into account that master node logs are currently
symlinked before being packed, this leads to a situation where the
snapshot lacks some files.
For instance, once a /var/log/libvirt symlink is created, it is
impossible to create the /var/log symlink later.
Change-Id: Ife60115d8d0203654ae58ed7c13f94fd9b7b3b8a
Closes-Bug: #1590750
In rare cases, when we have 1 ready node without DPDK and 1+
discovered nodes with DPDK, Nailgun produced a wrong verification
message. That happened because Nailgun mistakenly assumed the ready
node was DPDK-capable and didn't serialize the VLAN verification
message for this node.
Co-Authored-By: Artem Panchenko <apanchenko@mirantis.com>
Closes-Bug: #1589707
Change-Id: Id01d8772707994ed6da8b0c3979693580a3c417f
This change introduces a new callback on_nodegroup_delete
which is called when a nodegroup is deleted.
It also adds a decorator that can be used to generate
before and after callbacks for any method.
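Such a decorator could look roughly like this (a minimal sketch with illustrative names; the real implementation may differ): it wraps a method so that optional before_<name>/after_<name> callbacks defined on the same object are invoked around it.

```python
# Sketch: a decorator that generates before/after callback hooks for
# any method. Names here are hypothetical, for illustration only.

def with_callbacks(method):
    name = method.__name__

    def wrapper(self, *args, **kwargs):
        before = getattr(self, 'before_' + name, None)
        if before:
            before(*args, **kwargs)
        result = method(self, *args, **kwargs)
        after = getattr(self, 'after_' + name, None)
        if after:
            after(*args, **kwargs)
        return result

    return wrapper


class Manager(object):
    def __init__(self):
        self.calls = []

    @with_callbacks
    def delete_nodegroup(self, ng_id):
        self.calls.append(('delete', ng_id))

    # invoked automatically after delete_nodegroup, like
    # an on_nodegroup_delete callback
    def after_delete_nodegroup(self, ng_id):
        self.calls.append(('on_nodegroup_delete', ng_id))


m = Manager()
m.delete_nodegroup(5)
```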
Change-Id: Ia1c4ef3956175af6c223af854c9543cd781e8dbf
Blueprint: network-manager-extension
The CPU distribution mechanism should be changed due to incorrect
requirements for Nova and DPDK CPU allocation.
Changes:
* Change CPU distribution
* Add a function for recognizing DPDK NICs on a node
* Remove the requirement of enabled hugepages for
  DPDK NICs (it's checked before deployment)
* Change HugePages distribution. Now it takes into
  account Nova CPU placement
Requirements before:
DPDK's CPUs should be located on the same NUMA nodes as
Nova CPUs.
Requirements now:
1. DPDK component CPU pinning has two parts:
   * OVS pmd core CPUs - these CPUs must be placed on the
     NUMA nodes where a DPDK NIC is located. Since a DPDK NIC
     can handle about 12 Mpps and 1 CPU can handle about
     3 Mpps, there is no need to place more than
     4 CPUs per NIC. Let's call all remaining CPUs
     additional CPUs.
   * OVS core CPUs - 1 CPU is enough, and that CPU should
     be taken from any NUMA node where at least 1 OVS pmd core
     CPU is located.
2. To improve Nova and DPDK performance, all additional CPUs
   should be distributed along with Nova's CPUs as
   OVS pmd core CPUs.
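The arithmetic behind the "no more than 4 pmd CPUs per NIC" rule above can be sketched as follows (function and constant names are illustrative): a DPDK NIC handles about 12 Mpps and one CPU about 3 Mpps, so CPUs beyond ceil(12 / 3) = 4 buy nothing for that NIC.

```python
# Sketch: cap the number of OVS pmd core CPUs pinned per DPDK NIC.
# Throughput figures come from the commit message; names are hypothetical.

NIC_MPPS = 12.0  # approximate throughput of one DPDK NIC
CPU_MPPS = 3.0   # approximate throughput one pmd CPU can handle


def pmd_cpus_for_nic(available_cpus):
    """How many pmd CPUs to pin for one DPDK NIC."""
    needed = int(-(-NIC_MPPS // CPU_MPPS))  # ceiling division -> 4
    return min(needed, available_cpus)
```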
Change-Id: Ib2adf39c36b2e1536bb02b07fd8b5af50e3744b2
Closes-Bug: #1584006
This property contains a list of groups built from tasks with type
'group'; each such task may contain the fault_tolerance property,
which shall be moved from openstack.yaml to deployment tasks.
For plugins this attribute is filled from roles_metadata
for all tasks with type 'group' (for backward compatibility).
DocImpact
Partial-Bug: 1435610
Change-Id: I1969b953eca667c09248a6b67ffee37bfd20f474
Deployment and provisioning preparations can take a long time when
Fuel is working with hundreds of nodes, but the operator should still
be able to observe the cluster state and operate other clusters.
Change-Id: I73802c91f93a46855b006cb05a8b5722109e9e6a
Partial-Bug: #1569859
Zabbix was removed in Fuel 7 and has not been supported for 3
releases. It's time to clean up and remove its code.
Closes-Bug: #1583990
Change-Id: I7393caebc629fcf652369b98731455abe8a2c378
Now the following handlers:
/clusters/:cluster_id/changes/
/clusters/:cluster_id/changes/redeploy/
/clusters/:cluster_id/deploy/
/clusters/:cluster_id/deploy_tasks/
accept ?dry_run=1, which tells Astute not to run the cluster execution
at all. A dry run is assumed not to affect the cluster status
regardless of its result.
Also, removed the redundant update of node statuses to 'deploying'
within OpenStackConfigManager and DeploymentTaskManager, as it should
be done by the receiver.
Do not set the cluster status to 'deployment' for these nodes, in
order to retain its real status.
Modified the stop deployment tests to move failing stop deployment
for already deployed clusters to another test class. Since 9.0 we can
run stop deployment for new clusters.
Change-Id: I374fc86b63af64411d4a5ca45ff6c3680cb44897
Partial-bug: #1569839
The yaql_exp can be used to calculate dependencies of all tasks,
including skipped tasks, so the task attributes traversal should be
applied to all tasks too.
Also added a check that a dependency is not an empty object, because
that is possible when it has been dynamically generated via yaql.
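The empty-dependency guard can be sketched like this (the helper name and task shape are hypothetical): a yaql expression may evaluate to {}, [] or None, which must not be treated as a real cross-task dependency.

```python
# Sketch: skip empty dependency objects that a yaql expression may
# produce dynamically. Names and shapes here are illustrative.

def expand_dependencies(task):
    deps = []
    for dep in task.get('requires', []):
        if not dep:            # {} / [] / None produced by yaql
            continue
        deps.append(dep)
    return deps


task = {'id': 'upload_config',
        'requires': [{'name': 'netconfig'}, {}, None]}
```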
Partial-Bug: 1541309
Change-Id: Ibcb786d2a7917d7583433c0b96f6324be4de759b
The cluster state will be used on new nodes as the previous state
when the cluster already has a successful deployment.
Closes-Bug: 1581002
Closes-Bug: 1573602
Change-Id: Ie8a81193dc6002d4d7dec56b3b73e186b835d5fc