Distributed serialization is implemented with the Python distributed
library. A scheduler manages jobs and workers process them. The
scheduler is started on the master node together with a set of
workers. Workers are also started on all nodes.
In the cluster settings we can select the type of serialization
and the node statuses that allow serialization on a node. By default
nodes with status 'ready' are excluded from the workers list.
For data serialization we use only nodes from the cluster
that is being serialized.
Before the computation, fresh nailgun code is sent to the workers
as a zip file and imported for job execution, so the workers always
run fresh nailgun code.
Each job processes a chunk of tasks on a worker. This approach
significantly boosts performance. The chunk size is defined by the
settings.LCM_DS_TASKS_PER_JOB parameter.
To limit memory consumption on the master node, the
settings.LCM_DS_NODE_LOAD_COEFF parameter is used to calculate the
maximum number of jobs in the processing queue.
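The chunking and queue limit described above can be sketched roughly as
follows. This is an illustration only, not the nailgun code: the
constant values, the helper names and, in particular, the formula for
the queue limit (worker count multiplied by the coefficient) are
assumptions.

```python
# Hypothetical values; the real ones are defined in nailgun settings.
LCM_DS_TASKS_PER_JOB = 3
LCM_DS_NODE_LOAD_COEFF = 2


def chunk_tasks(tasks, chunk_size):
    """Yield consecutive chunks of at most chunk_size tasks."""
    for start in range(0, len(tasks), chunk_size):
        yield tasks[start:start + chunk_size]


def max_jobs_in_queue(workers_count, load_coeff):
    """Assumed formula: the limit grows linearly with the worker count."""
    return workers_count * load_coeff


tasks = ['task-%d' % i for i in range(8)]
# Each element of jobs is the chunk of tasks handled by one job.
jobs = list(chunk_tasks(tasks, LCM_DS_TASKS_PER_JOB))
```

With eight tasks and a chunk size of three, the last job carries the
two remaining tasks.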
Synthetic tests of distributed serialization for 500 nodes with
number of ifaces >= 5 performed on 40 cores (4 different machines)
took 6-7 minutes on average.
Change-Id: Id8ff8fada2f1ab036775fc01c78d91befdda9ea2
Implements: blueprint distributed-serialization
Also reworked the legacy task manager to use this flag
instead of patching every deployment task
Change-Id: Ic4031b94ee359d414f1834a56b085ff12cc6b38f
Closes-Bug: 1618774
Simultaneous run of 2 or more tasks may cause
side effects, and the simplest way is to prevent this action.
Also fixed all places where rpc.cast was called without commit
Change-Id: I029768900d345540c3b501f1fa3649b063d3a55d
Partial-Bug: 1615557
Task model is extended with a noop_run boolean column;
Introducing and passing down the noop_run param from the API to
the execution manager;
Execution manager supports the noop_run argument
and uses it for creating the astute message;
DeploymentHistory model is extended with a summary JSON column;
The summary column is returned only if the
include_summary=1 query string is passed to the API.
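A minimal sketch of the include_summary behaviour, assuming a history
record is available as a plain dict. The function names and the record
layout are hypothetical and are not taken from the nailgun API.

```python
def parse_flag(query, name):
    """Treat '?name=1' as True and anything else as False."""
    return query.get(name) == '1'


def serialize_history(record, include_summary=False):
    """Return a copy of the record, without 'summary' unless requested."""
    data = dict(record)
    if not include_summary:
        data.pop('summary', None)
    return data


# Hypothetical deployment history record.
record = {'task_name': 'netconfig', 'status': 'ready',
          'summary': {'resources': {'changed': 0}}}
short = serialize_history(record, parse_flag({}, 'include_summary'))
full = serialize_history(
    record, parse_flag({'include_summary': '1'}, 'include_summary'))
```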
Implements blueprint: puppet-noop-run
Change-Id: I80090d96f818cef7c6f88208bdacf5849f0f5d0f
Now the following handlers:
/clusters/:cluster_id/changes/
/clusters/:cluster_id/changes/redeploy/
/clusters/:cluster_id/deploy/
/clusters/:cluster_id/deploy_tasks/
accept ?dry_run=1, which tells Astute not to run cluster execution at all.
A dry run does not affect the cluster status,
regardless of its result.
Also, remove the redundant update of node statuses to 'deploying'
within OpenStackConfigManager and DeploymentTaskManager, as it should be
done by the receiver.
Do not set cluster status to 'deployment' for these nodes in order to
retain its real status.
Modify stop deployment tests to move failing stop deployment for already
deployed clusters to another test class. Since 9.0 we can run stop
deployment for new clusters.
Change-Id: I374fc86b63af64411d4a5ca45ff6c3680cb44897
Partial-bug: #1569839
Node roles should be checked in CheckBeforeDeploymentTask,
because it's possible to deploy a node with conflicting roles
or with an incompatible role. Role metadata from the release,
which contains the restrictions, is used for the role checks.
Since `depends` is not used anymore, it's changed to
`restrictions` in assignment validator.
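A rough sketch of a conflicting roles check driven by roles metadata.
The metadata shape (a `conflicts` list per role) and the function name
are assumptions made for illustration; evaluation of `restrictions`
expressions is not covered here.

```python
def find_role_conflicts(assigned_roles, roles_metadata):
    """Return sorted (role, conflicting_role) pairs found on one node."""
    assigned = set(assigned_roles)
    conflicts = []
    for role in sorted(assigned):
        for other in roles_metadata.get(role, {}).get('conflicts', []):
            if other in assigned:
                conflicts.append((role, other))
    return conflicts


# Hypothetical release metadata.
roles_metadata = {
    'controller': {'conflicts': ['compute']},
    'compute': {'conflicts': ['controller']},
    'cinder': {},
}
bad = find_role_conflicts(['controller', 'compute'], roles_metadata)
good = find_role_conflicts(['compute', 'cinder'], roles_metadata)
```

A pre-deployment check could refuse to start the deployment whenever
the returned list is not empty.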
Change-Id: Ibba7951968cbafd59fff0d516e74f9dd9e454edc
Closes-Bug: #1573006
Replace fake_tasks with an rpc mock in some tests. This patch does not
cover all test files; the fix will be done in a series of patches.
Change-Id: I6d316a45974ea5518c32a78f069e3e88b167d1a4
Partial-Bug: #1440671
Since self.env.create now always returns a db object,
we can use the returned value instead of
the self.env.clusters list.
It's a refactoring, so no bug or blueprint.
Change-Id: If7c84cb7124bcf08ef5ff110542012564190fae1
This change updates the deployment serializer for the test vm data to
pass the glance properties as a hash that can be used by the
glance_image provider rather than using the glance_properties string
that is currently in place. The glance_properties string should be
considered deprecated and anything that uses it should switch to the
properties hash.
DocImpact: glance_properties string provided as part of the
test_vm_image hiera data is deprecated in favor of the properties hash
provided by this change
Change-Id: I79a9b20d89ae00a7ceaa24c4ce655cbd16972c30
Partial-Bug: #1566434
Since Nailgun has an attribute restrictions
mechanism, it's possible to verify attribute
restrictions. This commit applies restriction
checks to the validation of both node attributes
and cluster attributes.
Change-Id: I269da9a7a7df5fea336c07784b37d6ced1641993
Closes-Bug: #1567394
* remove node_extension_call from everywhere in Nailgun
core source code
* remove volume_manager Node property (models)
* moved volume and disk related data manipulations to
volume manager extension pipeline
* removed no longer valid tests
* added new extension callback for pre deployment check
* fix some tests
* moved volume_manager specific tests to volume_manager module
* marked as 'skip test' some tests which are no longer
valid in their current places but are valuable and should be moved
to volume_manager module in next patches
implements: blueprint data-pipeline
Change-Id: I8edd25166e5eccf914eb92882b6b4a7b3fff6a89
In the default setup we should not assign a public IP to all nodes,
only to the ones with the controller role. So we filter out the public
IP network for nodes that should not have it.
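The filtering can be sketched as follows. This is an illustration under
assumptions: the function name, the network names and the
representation of roles as plain strings are hypothetical.

```python
def networks_for_node(node_roles, networks, public_roles=('controller',)):
    """Drop the 'public' network unless the node has a public role."""
    needs_public = any(role in public_roles for role in node_roles)
    return [net for net in networks if net != 'public' or needs_public]


networks = ['fuelweb_admin', 'management', 'storage', 'public']
controller_nets = networks_for_node(['controller'], networks)
compute_nets = networks_for_node(['compute'], networks)
```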
Change-Id: I2a9ea4d06cc1ba15bad20b817659b7539827472a
Closes-Bug: 1415552
This moves the files for NetworkManager and its sub-classes into
a new extension. All import paths have been updated.
Blueprint: network-manager-extension
Change-Id: Icc2410fd9c411a47a3dee4573d4ef6f1a039c303
If some tasks in the deployment graph have a version lower
than 2.0.0, then we fall back to the granular deploy
method. This check should be done before cluster
serialization, which takes more time.
So let's check whether the tasks can be executed with the task based
method before serialization.
This patch significantly improves performance.
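A minimal sketch of such an early version check. The helper names, the
task layout and the default used for tasks without a version are
assumptions, not the actual nailgun implementation.

```python
def parse_version(version):
    """Turn '2.0.0' into (2, 0, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split('.'))


def is_task_based_deploy_allowed(tasks, min_version='2.0.0'):
    """True only if every task declares at least min_version."""
    required = parse_version(min_version)
    return all(
        # Assumption: a task without a version is treated as 1.0.0.
        parse_version(task.get('version', '1.0.0')) >= required
        for task in tasks
    )


new_graph = [{'id': 'netconfig', 'version': '2.0.0'},
             {'id': 'hiera', 'version': '2.1.0'}]
old_graph = new_graph + [{'id': 'legacy', 'version': '1.9.10'}]
```

Running this check first means the expensive serialization is done
only once, for the deploy method that will actually be used.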
Partial-Bug: #1498365
Change-Id: I5d0fe8ee9b73958ac07e2fda3ed1bd2f29f0e5fb
There are plenty of problems with type conversion.
Since the UI can work with the number type,
node attributes can be changed to the number type.
Depends-On: I0866aa01df23ac944bc5f134aec49311791a4b36
Change-Id: Ibc7578f5587fc75c97cb62530a0219059e33c477
Closes-Bug: #1565518
The UI doesn't support this type and broke after the original patch was merged.
Closes-Bug: #1564847
This reverts commit b04a2541fe.
Change-Id: I06417e0c48c9c1d24223b93c0c4c8b9d437b200c
Nailgun integration tests use fake tasks, which run in threads.
This has been causing random failures. Although those failures could
have been fixed by different sorts of improvements, increasing timeouts,
etc., this is still not an ideal solution.
The proposed patch removes the use of threads in tests. All fake task
code that used to run only in threads is now run synchronously for tests.
Threads are still used for fakeUI as before.
More details about random test failures can be found in mailing list:
http://lists.openstack.org/pipermail/openstack-dev/2016-March/089514.html
Change-Id: Iaa5b245680e7257ff46b5ddc1b7aa9400284e705
Extra SQL queries are generated in the NetworkManager when passing
node_id instead of an already loaded SQLAlchemy node object.
Additional changes:
- Bulk insert is used in the IP assigning process.
- zip is replaced with six.moves.zip in the NetworkManager.
- Removed unused function get_admin_ips_for_interfaces from NetworkManager.
Co-Authored-By: Dmitry Guryanov <dguryanov@mirantis.com>
Partial-Bug: #1498365
Change-Id: I0518a5879c775d568de5652dbdd856a0cede80ce
Most of the distribution implementation is located in
policy/hugepages_distribution.py
The distribution function gets the NUMA node topology and huge pages
configuration from Node.attributes and uses a greedy algorithm to
distribute pages of different sizes. First we distribute the pages of
components that must be on every NUMA node, then allocate the other
components, going from bigger pages to smaller ones. If a component
can't be fully allocated on one node, then part of its pages is moved
to the next NUMA node.
This info is also passed to the deployment serializer.
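The greedy idea can be sketched in a simplified form. This is not the
code from policy/hugepages_distribution.py: the function name, the
argument layout and the use of KiB as the unit are assumptions chosen
to keep the example short.

```python
def distribute_hugepages(free_kib, per_node, spread):
    """Greedy distribution of huge pages over NUMA nodes.

    free_kib: free memory of each NUMA node, in KiB.
    per_node: {page_size_kib: count} required on every node.
    spread:   {page_size_kib: count} that may be split across nodes.
    Returns one {page_size_kib: count} dict per node.
    """
    free = list(free_kib)
    result = [dict() for _ in free]

    # Step 1: pages that must be present on every NUMA node.
    for size, count in per_node.items():
        for node in range(len(free)):
            if size * count > free[node]:
                raise ValueError('not enough memory on node %d' % node)
            free[node] -= size * count
            result[node][size] = result[node].get(size, 0) + count

    # Step 2: the remaining pages, biggest page size first.
    for size in sorted(spread, reverse=True):
        left = spread[size]
        for node in range(len(free)):
            fit = min(left, free[node] // size)
            if fit:
                free[node] -= fit * size
                result[node][size] = result[node].get(size, 0) + fit
                left -= fit
        if left:
            # Whatever did not fit on any node cannot be placed at all.
            raise ValueError('cannot place %d pages of %d KiB' % (left, size))
    return result


# Two nodes with room for three and ten 2 MiB pages respectively.
layout = distribute_hugepages([3 * 2048, 10 * 2048], {}, {2048: 5})
```

In this example three pages fit on the first node and the remaining
two spill over to the second one.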
Change-Id: I8bec5ed4efcd197d02b6de23395d8fb9a3136579
Implements: blueprint support-hugepages
Since commit Id65b7e106d62be92467c18bcb93c9d5da716242f we do not need to
setup PostgreSQL for tests. Now OpenStack Infra provides ready to use
PostgreSQL installation as well as MySQL. This commit stops running
setup code and unblocks gate tests on OpenStack CI.
Also, it fixes random failures of test_force_redeploy_changes. The test
used to consume fake threads and check the status of the created task.
That status might differ depending on the progress of the fake threads
(it might be pending, running or even ready). This commit removes usage
of fake threads and improves the test by checking not only the task
status, but also the actual deployment data.
Also, it fixes test_assign_given_vips_for_net_groups.
The motivation behind this all-in-one commit is that it's almost
impossible to fix CI due to random test failures. It's the only
way to get it to work.
Closes-Bug: #1554038
Change-Id: I074a2cb4f0e6647c605c8e4449a5beca0c6e9bbc
The flag allows applying changes to a cluster that is in the
operational state. This allows you to deploy the changes
without the reprovisioning procedure.
Example: `fuel deploy-changes --env <env_id> --force`
Partial-Bug: 1540558
Change-Id: Ibc89fdbfbd0a36a890412cd8e861d35bcf930690
In Fuel 5.1 we had an experimental feature - 'patching openstack env'.
The idea was to update and to roll back OpenStack environments between
minor releases. However, we encountered a lot of problems with
restoring OpenStack databases and resolving dependency hell in packages,
so we buried it and never released it.
This patch removes the legacy code from the source tree. We can do it
without fear, since it was never released publicly.
Related-Bug: #1511499
Change-Id: I58b3fedd239eb7fe4226e51c2d6386efab14395d
All network-related database queries are moved into the appropriate
object methods. This is being done to make it possible to have an external
network management service.
Blueprint: network-config-refactoring
Change-Id: I4ce965f227c54577659e64f598ff5cdf4c868ed6
This patch removes validation of the number of controllers
in a cluster from pre-deployment task.
Closes-bug: #1538233
Change-Id: I5325af73367d6c3edab873a8080cd8a7e24e9692
The Mellanox section in the Fuel UI Settings tab was added in 5.1 and
moved to experimental mode in 6.1 due to the implementation as a plugin.
This change deletes the upstream section for enabling Mellanox features.
They are now implemented as a certified Fuel plugin.
Closes-Bug: #1452800
Change-Id: I6a1f33827d6c4fc9c9bc42cad567f4f96941f094
Hiera plugin extension needs a way to get a list of all enabled plugins in
cluster from astute.yaml during the deployment.
Change-Id: I819bf8e547ecd8eeabffefda4579c2a3e73d0fcf
Closes-Bug: #1528212
The 'identity' parameter represents the id of the node and is placed in
the mcollective config by nailgun-agent. But sometimes such behavior
(especially considering that a restart of mcollective
follows it) may lead to a failed deployment (see related bug). Now the
parameter is supplied by nailgun and is used by fuel-agent to create the
config with the data already present in it when the node boots after
provisioning is done.
Change-Id: I753eb76ed9c3b80f249c0c4b86ef48ef49274990
Related-Bug: #1518306
This is a redesign of the plugins architecture in order to store
the plugin's attributes in a separate table, not in cluster
attributes, so it will be possible to remove the connection between
a plugin and a cluster when the plugin gets deleted.
Added the ability to work with different versions of a plugin.
The user can choose the preferred version in the UI.
The test "test_plugin_generator" was removed because it is no longer
relevant.
Closes-Bug: #1440046
Implements: blueprint store-plugins-attributes
Change-Id: I52115f130bf1c7c80c66e18d0bf9f7acb16dd56c
In order to remove the hardcoding of internal and floating network names
from the Nailgun database, let's move them to neutron parameters. It'll
simplify support for developers and allow users to change them as they want.
Change-Id: Ic0ca82047cb750609a5313c2eab02569b9633239
Partial-Bug: #1349702
We should know whether a task is handled by the orchestrator or not.
For instance, we should send the stop_deployment task only if
provision or deploy tasks are handled by the orchestrator.
Task status 'pending' is added to the Task DB model and
handled in the stop deployment, provisioning, deployment and
apply cluster changes task managers.
The Nailgun Task object update function is changed to bubble the
'running' status up to the parent task.
Locking of all cluster tasks is removed from the calls to avoid
deadlocks.
Consts are used instead of hardcoded task statuses in part of the tests.
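A rough sketch of the two rules described above. The task names, the
dict based task layout and the reading of 'running' as "already
handled by the orchestrator" are assumptions made for illustration.

```python
# Hypothetical task names, used for illustration only.
ORCHESTRATOR_TASKS = ('provision', 'deployment')


def should_send_stop(tasks):
    """True if any provision/deployment task has reached the orchestrator."""
    return any(
        task['name'] in ORCHESTRATOR_TASKS and task['status'] == 'running'
        for task in tasks
    )


def bubble_running(parent, child_status):
    """Propagate a child's 'running' status to a still pending parent."""
    if child_status == 'running' and parent['status'] == 'pending':
        parent['status'] = 'running'
    return parent


not_started = [{'name': 'provision', 'status': 'pending'}]
in_progress = [{'name': 'provision', 'status': 'ready'},
               {'name': 'deployment', 'status': 'running'}]
parent = bubble_running({'name': 'deploy', 'status': 'pending'}, 'running')
```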
Co-Authored-By: Alexandra Morozova <astepanchuk@mirantis.com>
Depends-On: Ib054517696dc4e53487557b09b75ebfcb1255ecb
Depends-On: Idedb061b7b5c4dca4a0ca7adcaa570cecbb691af
Change-Id: I15ebeb85226c832923f9476bb91fa19c0ff87a4f
Closes-Bug: #1498827
In the previous release (Fuel 7.0) nova-network became deprecated, and
that means it will be dropped in Fuel 8.0. So let's change the default to
neutron, because IIUC nova-network is already removed from
fuel-library's master and some swarm tests are broken.
Closes-Bug: #1503657
Closes-Bug: #1505258
Change-Id: I65a0ca2906503cb9c83fd99fddda9a8ee5156b16