Commit Graph

458 Commits

Author SHA1 Message Date
Bence Romsics f1dc4ec39b Do not untrack resources of a server being unshelved
This patch concerns the window when a VM is being unshelved: the
compute manager has set the task_state to spawning, claimed resources
for the VM and called driver.spawn(). So the instance is in vm_state
SHELVED_OFFLOADED, task_state spawning.

If at this point a new update_available_resource periodic job is
started, it collects all the instances assigned to the node to
calculate resource usage. However, the calculation assumed that a
VM in SHELVED_OFFLOADED state needs no resource allocation on the
node (presumably it is being removed from the node as it is
offloaded) and deleted the resource claim.

Given all this, we ended up with a VM that spawned successfully but
lost its resource claim on the node.

This patch changes what we do in vm_state SHELVED_OFFLOADED, task_state
spawning. We no longer delete the resource claim in this state and
keep tracking the resource in stats.

Change-Id: I8c9944810c09d501a6d3f60f095d9817b756872d
Closes-Bug: #2025480
2023-08-17 10:50:32 +02:00
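
A minimal sketch of the new tracking decision described in the commit
above, using illustrative constant and function names rather than the
actual resource tracker code:

    # Sketch only: a SHELVED_OFFLOADED instance that is mid-unshelve
    # (task_state spawning) keeps its resource claim.
    SHELVED_OFFLOADED = 'shelved_offloaded'   # vm_state
    SPAWNING = 'spawning'                     # task_state

    def should_keep_tracking(vm_state, task_state):
        """Return True if the instance's claim should stay tracked."""
        if vm_state == SHELVED_OFFLOADED:
            # Previously every offloaded instance was untracked; now one
            # that is actively spawning during unshelve keeps its claim.
            return task_state == SPAWNING
        return True
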
Dan Smith 84e7bed27e Online migrate missing Instance.compute_id fields
This migrates *existing* Instance records with incomplete compute_id
fields, where necessary, while updating node resources. This is done
separately from the earlier patch that handles new instances, just to
demonstrate in CI that we're still able to run in such a partially-
migrated state.

Related to blueprint compute-object-ids

Change-Id: Ie18342deeddb7f58564953e6b46ea0b0d7495595
2023-05-31 07:13:16 -07:00
Dan Smith 625fb569a7 Add compute_id to Instance object
This adds the compute_id field to the Instance object and adds
checks in save() and create() to make sure we no longer update node
without also updating compute_id.

Related to blueprint compute-object-ids

Change-Id: I0740a2e4e09a526da8565a18e6761b4dbdc4ec0b
2023-05-31 07:13:16 -07:00
Dan Smith 70516d4ff9 Add dest_compute_id to Migration object
This makes us store the compute_id of the destination node in the
Migration object. Since resize/cold-migration changes the node
affiliation of an instance *to* the destination node *from* the source
node, we need a positive record of the node id to be used. The
destination node sets this to its own node id when creating the migration,
and it is used by the source node when the switchover happens.

Because the migration may be backleveled for an older node involved
in that process and thus saved or passed without this field, this
adds a compatibility routine that falls back to looking up the node
by host/nodename.

Related to blueprint compute-object-ids

Change-Id: I362a40403d1094be36412f5f7afba00da8af8301
2023-05-31 07:13:16 -07:00
Dan Smith afad847e4d Populate ComputeNode.service_id
The ComputeNode object already has a service_id field that we stopped
using a while ago. This moves us back to the point where we set it when
creating new ComputeNode records, and also migrates existing records
when they are loaded.

The resource tracker is created before we may have created the
service record, but is updated afterwards in the pre_start_hook().
So this adds a way for us to pass the service_ref to the resource
tracker during that hook so that it is present before the first time
we update all of our ComputeNode records. It also makes sure to pass
the Service through from the actual Service manager instead of looking
it up again to make sure we maintain the tight relationship and avoid
any name-based ambiguity.

Related to blueprint compute-object-ids

Change-Id: I5e060d674b6145c9797c2251a2822106fc6d4a71
2023-05-31 07:06:34 -07:00
Dan Smith 82deb0ce4b Stop ignoring missing compute nodes in claims
The resource tracker will silently ignore attempts to claim resources
when the node requested is not managed by this host. The misleading
"self.disabled(nodename)" check will fail if the nodename is not known
to the resource tracker, causing us to bail early with a NopClaim.
That means we also skip additional setup like creating a migration
context for the instance, claiming resources in placement, and handling
PCI/NUMA things. This behavior is quite old, and clearly doesn't make
sense in a world with things like placement. The bulk of the test
changes here are due to the fact that a lot of tests were relying on
this silent ignoring of a mismatching node, because they were passing
node names that weren't even tracked.

This change makes us raise an error if this happens so that we can
actually catch it, and avoid silently continuing with no resource
claim.

Change-Id: I416126ee5d10428c296fe618aa877cca0e8dffcf
2023-04-24 15:26:52 -07:00
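
A hedged sketch of the behavior change: an untracked nodename now
raises instead of silently producing a no-op claim (exception and
attribute names are illustrative, not the actual nova code):

    class ComputeHostNotFound(Exception):
        """Raised when a claim targets a node this host does not track."""

    def instance_claim(tracker, context, instance, nodename):
        if nodename not in tracker.compute_nodes:
            # Old behavior: return a NopClaim and skip migration context,
            # placement allocations and PCI/NUMA handling. New behavior:
            # fail loudly so the caller can react.
            raise ComputeHostNotFound(
                'compute node %s is not tracked by this host' % nodename)
        return tracker.real_instance_claim(context, instance, nodename)
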
Dan Smith cf33be6871 Abort startup if nodename conflict is detected
We do run update_available_resource() synchronously during service
startup, but we only allow certain exceptions to abort startup. This
makes us abort for InvalidConfiguration, and makes the resource
tracker raise that for the case where the compute node create failed
due to a duplicate entry.

This also modifies the object to raise a nova-specific error for that
condition to avoid the compute node needing to import oslo_db stuff
just to be able to catch it.

Change-Id: I5de98e6fe52e45996bc2e1014fa8a09a2de53682
2023-02-01 09:23:33 -08:00
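
A rough sketch of the startup behavior, assuming an
InvalidConfiguration-style exception translated from the duplicate
entry DB error; the names and call shapes are illustrative:

    class InvalidConfiguration(Exception):
        """Raised when a duplicate compute node record is detected."""

    def init_host_startup(manager, context):
        try:
            # Run the resource update synchronously during service startup.
            manager.update_available_resource(context, startup=True)
        except InvalidConfiguration:
            # A duplicate nodename means another service owns this record;
            # abort startup instead of running in a broken state.
            raise
        except Exception:
            # Other errors keep the previous behavior: log and let the
            # periodic task retry later.
            pass
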
Dan Smith 23c5f3d585 Make resource tracker use UUIDs instead of names
This makes the resource tracker look up and create ComputeNode objects
by uuid instead of nodename. For drivers like ironic that already
provide 'uuid' in the resources dict, we can use that. For those
that do not, we force the uuid to be the locally-persisted node
uuid, and use that to find/create the ComputeNode object.

A (happy) side-effect of this is that if we find a deleted compute
node object that matches that of our hypervisor, we undelete it
instead of re-creating one with a new uuid, which may clash with our
old one. This means we remove some of the special-casing of ironic
rebalance, although the tests for that still largely stay the same.

Change-Id: I6a582a38c302fd1554a49abc38cfeda7c324d911
2023-01-30 10:53:44 -08:00
Balazs Gibizer fa4832c660 Support same host resize with PCI in placement
Id02e445c55fc956965b7d725f0260876d42422f2 added a special case to the
healing logic for same host resize. Now that the scheduler also creates
an allocation on the destination host during resize, we need to make
sure that the drop_move_claim code that runs during revert and confirm
drops the tracked migration from the resource tracker only after the
healing logic has run, as the migrations being confirmed / reverted
still affect PciDevices at this point.

blueprint: pci-device-tracking-in-placement
Change-Id: I6241965fe6c1cc1f2560fcce65d5e32ef308d502
2022-12-21 16:17:34 +01:00
Balazs Gibizer e96601c606 Map PCI pools to RP UUIDs
Nova's PCI scheduling (and the PCI claim) works based on PCI device
pools to which the similar available PCI devices are assigned. The PCI
devices are now represented in placement as RPs, and the allocation
candidates during scheduling and the allocation after scheduling now
contain PCI devices. This information needs to affect the PCI
scheduling and PCI claim. To be able to do that we need to map PCI
device pools to RPs. We achieve that here by first mapping PciDevice
objects to RPs during placement PCI inventory reporting, then mapping
pools to RPs based on the PCI devices assigned to the pools.

Also, because the ResourceTracker._update_to_placement() call now
updates the PCI device pools, the sequence of events in the
ResourceTracker needed to change to:
1) run _update_to_placement()
2) copy the pools to the ComputeNode object
3) save the compute node to the DB
4) save the PCI tracker

blueprint: pci-device-tracking-in-placement
Change-Id: I9bb450ac235ab72ff0d8078635e7a11c04ff6c1e
2022-10-17 13:56:18 +02:00
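
A compact sketch of the reordered update sequence listed above; the
method names follow the commit text, while the wrapper function itself
and the exact attribute access are illustrative:

    def update_compute_node(rt, context, compute_node):
        # 1) report PCI inventories/traits to placement; this also maps
        #    the PCI device pools to resource provider UUIDs
        rt._update_to_placement(context, compute_node)
        # 2) copy the (now RP-mapped) pools onto the ComputeNode object
        compute_node.pci_device_pools = (
            rt.pci_tracker.stats.to_device_pools_obj())
        # 3) persist the ComputeNode record
        compute_node.save()
        # 4) persist the PCI tracker state
        rt.pci_tracker.save(context)
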
Balazs Gibizer 9268bc36a3 Handle PCI dev reconf with allocations
PCI devices which are allocated to instances can be removed from the
[pci]device_spec configuration or can be removed from the hypervisor
directly. The existing PciTracker code handles these cases by keeping
the PciDevice in the nova DB as existing and allocated, and by issuing
a warning in the logs during compute service startup that nova is in an
inconsistent state. Similar behavior is now added to the PCI placement
tracking code so that the PCI inventories and allocations in placement
are kept in such situations.

There is one case where we cannot simply accept the PCI device
reconfiguration by keeping the existing allocations and applying the
new config: when a PF that is configured and allocated is removed and
VFs from this PF are now configured in the [pci]device_spec, and vice
versa, when VFs are removed and their parent PF is configured. In these
cases keeping the existing inventory and allocations and adding the new
inventory to placement would result in a placement model where a single
PCI device provides both PF and VF inventories. This dependent device
configuration is not supported as it could lead to double consumption.
In such a situation the compute service will refuse to start.

blueprint: pci-device-tracking-in-placement
Change-Id: Id130893de650cc2d38953cea7cf9f53af71ced93
2022-08-26 19:05:45 +02:00
Balazs Gibizer ab439dadb1 Heal allocation for same host resize
Same host resize needs special handling in the allocation healing logic
as both the source and the dest host PCI devices are visible to the
healing code as the PciDevice.instance_uuid points to the healed
instance in both cases.

blueprint: pci-device-tracking-in-placement
Change-Id: Id02e445c55fc956965b7d725f0260876d42422f2
2022-08-26 19:05:23 +02:00
Balazs Gibizer 48229b46b4 Retry /reshape at provider generation conflict
During a normal update_available_resources run, if the local provider
tree cache is invalid (e.g. because the scheduler made an allocation,
bumping the generation of the RPs) and the virt driver tries to update
the inventory of an RP based on the cache, Placement will report a
conflict, the report client will invalidate the cache, and the retry
decorator on ResourceTracker._update_to_placement will re-drive the
update on top of the fresh RP data.

However, the same thing can happen during a reshape as well, but the
retry mechanism is missing in that code path, so the stale cache can
cause reshape failures.

This patch adds specific error handling in the reshape code path to
implement the same retry mechanism that exists for inventory updates.

blueprint: pci-device-tracking-in-placement
Change-Id: Ieb954a04e6aba827611765f7f401124a1fe298f3
2022-08-25 10:00:10 +02:00
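
A hedged sketch of the retry-on-conflict pattern added for the reshape
path; the exception and client methods are stand-ins, not the exact
report client API:

    class ProviderGenerationConflict(Exception):
        """Placement rejected an update because an RP generation moved."""

    def reshape_with_retry(client, context, max_attempts=4):
        for _ in range(max_attempts):
            provider_tree = client.get_provider_tree(context)
            try:
                return client.reshape(context, provider_tree)
            except ProviderGenerationConflict:
                # The scheduler (or another writer) bumped an RP
                # generation; drop the stale cache and re-drive on
                # fresh provider data.
                client.invalidate_provider_cache()
        raise ProviderGenerationConflict(
            'reshape still conflicting after retries')
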
Balazs Gibizer 953f1eef19 Basics for PCI Placement reporting
A new PCI resource handler is added to the update_available_resources
code path to update the ProviderTree with PCI device RPs, inventories
and traits.

It is a bit different from the other Placement inventory reporters. It
does not run at the virt driver level, as PCI is tracked in a generic
way by the PCI tracker in the resource tracker, so the virt specific
information is already parsed and abstracted by the resource tracker.

Another difference is that to support rolling upgrades the PCI handler
code needs to be prepared for situations where the scheduler does not
create PCI allocations even after some of the computes have already
started reporting inventories and healing PCI allocations. So the code
does not do a single, one-shot reshape at startup, but instead does a
continuous healing of the allocations. We can remove this continuous
healing once the PCI prefilter is made mandatory in a future release.

The whole PCI placement reporting behavior is disabled by default while
it is incomplete. When it is functionally complete a new
[pci]report_in_placement config option will be added to allow enabling
the feature. This config is intentionally not added by this patch as we
don't want to allow enabling this logic yet.

blueprint: pci-device-tracking-in-placement
Change-Id: If975c3ec09ffa95f647eb4419874aa8417a59721
2022-08-25 10:00:10 +02:00
Dan Smith c178d93606 Unify placement client singleton implementations
We have many places where we implement singleton behavior for the
placement client. This unifies them into a single place and
implementation. Not only does this DRY things up, but it may also
cause us to initialize the client fewer times, and it allows for
emitting a common set of error messages about expected failures for
better troubleshooting.

Change-Id: Iab8a791f64323f996e1d6e6d5a7e7a7c34eb4fb3
Related-Bug: #1846820
2022-08-18 07:22:37 -07:00
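
A small, self-contained sketch of a lock-protected lazy singleton, the
pattern this commit consolidates; the factory argument stands in for
constructing the placement report client:

    import threading

    _client = None
    _client_lock = threading.Lock()

    def get_placement_client(factory):
        """Return one shared client per process, created on first use."""
        global _client
        if _client is None:
            with _client_lock:
                # Re-check inside the lock so two concurrent callers do
                # not both construct a client (double-checked locking).
                if _client is None:
                    _client = factory()
        return _client
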
Rajesh Tailor aa1e7a6933 Fix typos in help messages
This change fixes typos in conf parameter help messages
and in an error log message.

Change-Id: Iedc268072d77771b208603e663b0ce9b94215eb8
2022-05-30 17:28:29 +05:30
Dmitrii Shcherbakov 3fd7e94893 Fix migration with remote-managed ports & add FT
`binding:profile` updates are handled differently for migration than
for instance creation, which was not taken into account previously.
Relevant fields (card_serial_number, pf_mac_address, vf_num) are now
added to the
`binding:profile` after a new remote-managed PCI device is determined at
the destination node.

Likewise, there is special handling for the unshelve operation which is
fixed too.

Func testing:

* Allow the generated device XML to contain the PCI VPD capability;
* Add test cases for basic operations on instances with remote-managed
  ports (tunnel or physical);
* Add a live migration test case similar to how it is done for
  non-remote-managed SR-IOV ports but taking remote-managed port related
  specifics into account;
* Add evacuate, shelve/unshelve, cold migration test cases.

Change-Id: I9a1532e9a98f89db69b9ae3b41b06318a43519b3
2022-03-04 18:41:48 +03:00
Dmitrii Shcherbakov c487c730d0 Filter computes without remote-managed ports early
Add a pre-filter for requests that contain VNIC_TYPE_REMOTE_MANAGED
ports in them: hosts that do not have either the relevant compute
driver capability COMPUTE_REMOTE_MANAGED_PORTS or PCI device pools
with "remote_managed" devices are filtered out early. Presence of
devices actually available for allocation is checked at a later
point by the PciPassthroughFilter.

Change-Id: I168d3ccc914f25a3d4255c9b319ee6b91a2f66e2
Implements: blueprint integration-with-off-path-network-backends
2022-02-09 01:23:27 +03:00
Balazs Gibizer 32c1044d86 [rt] Apply migration context for incoming migrations
There is a race condition between an incoming resize and an
update_available_resource periodic in the resource tracker. The race
window starts when the resize_instance RPC finishes  and ends when the
finish_resize compute RPC finally applies the migration context on the
instance.

In the race window, if the update_available_resource periodic is run on
the destination node, then it will see the instance as being tracked on
this host as the instance.node is already pointing to the dest. But the
instance.numa_topology still points to the source host topology as the
migration context is not applied yet. This leads to a CPU pinning
error if the source topology does not fit the dest topology. It also
stops the periodic task and leaves the tracker in an inconsistent
state. The inconsistent state is only cleaned up after the periodic is
run outside of the race window.

This patch applies the migration context temporarily to the specific
instances during the periodic to keep resource accounting correct.

Change-Id: Icaad155e22c9e2d86e464a0deb741c73f0dfb28a
Closes-Bug: #1953359
Closes-Bug: #1952915
2021-12-07 13:32:26 +01:00
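
A hedged sketch of temporarily applying the migration context for
accounting purposes only; the helper and field access are illustrative
and not the actual nova Instance API:

    import contextlib

    @contextlib.contextmanager
    def applied_migration_context(instance):
        """Swap in the dest NUMA topology just for the periodic run."""
        original = instance.numa_topology
        if instance.migration_context is not None:
            instance.numa_topology = (
                instance.migration_context.new_numa_topology)
        try:
            yield instance
        finally:
            # Restore the persisted topology; nothing is saved to the DB.
            instance.numa_topology = original
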
Matt Riedemann c09d98dadb Add force kwarg to delete_allocation_for_instance
This adds a force kwarg to delete_allocation_for_instance which
defaults to True because that was found to be the most common use case
by a significant margin during implementation of this patch.
In most cases, this method is called when we want to delete the
allocations because they should be gone, e.g. server delete, failed
build, or shelve offload. The alternative in these cases is the caller
could trap the conflict error and retry but we might as well just force
the delete in that case (it's cleaner).

When force=True, it will DELETE the consumer allocations rather than
GET and PUT with an empty allocations dict and the consumer generation
which can result in a 409 conflict from Placement. For example, bug
1836754 shows that in one tempest test that creates a server and then
immediately deletes it, we can hit a very tight window where the method
GETs the allocations and before it PUTs the empty allocations to remove
them, something changes which results in a conflict and the server
delete fails with a 409 error.

It's worth noting that delete_allocation_for_instance used to just
DELETE the allocations before Stein [1] when we started taking consumer
generations into account. There was also a related mailing list thread
[2].


Closes-Bug: #1836754

[1] I77f34788dd7ab8fdf60d668a4f76452e03cf9888
[2] http://lists.openstack.org/pipermail/openstack-dev/2018-August/133374.html

Change-Id: Ife3c7a5a95c5d707983ab33fd2fbfc1cfb72f676
2021-08-30 06:11:25 +00:00
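
A hedged sketch contrasting the forced and non-forced deletion paths
against the placement API; the HTTP helper calls assume a
requests-style session and are illustrative, not the actual report
client internals:

    def delete_allocation_for_instance(session, consumer_uuid, force=True):
        url = '/allocations/%s' % consumer_uuid
        if force:
            # Unconditional DELETE: no consumer generation involved, so
            # there is no window for a 409 conflict.
            return session.delete(url).status_code in (204, 404)
        # Non-forced: GET the allocations, then PUT them back empty with
        # the consumer generation, which can race and return 409 Conflict.
        current = session.get(url).json()
        payload = {'allocations': {},
                   'consumer_generation': current['consumer_generation'],
                   'project_id': current['project_id'],
                   'user_id': current['user_id']}
        return session.put(url, json=payload).status_code == 204
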
Mark Goddard 2bb4527228 Invalidate provider tree when compute node disappears
There is a race condition in nova-compute with the ironic virt driver
as nodes get rebalanced. It can lead to compute nodes being removed in
the DB and not repopulated. Ultimately this prevents these nodes from
being scheduled to.

The issue being addressed here is that if a compute node is deleted by a
host which thinks it is an orphan, then the resource provider for that
node might also be deleted. The compute host that owns the node might
not recreate the resource provider if it exists in the provider tree
cache.

This change fixes the issue by clearing resource providers from the
provider tree cache for which a compute node entry does not exist. Then,
when the available resource for the node is updated, the resource
providers are not found in the cache and get recreated in placement.

Change-Id: Ia53ff43e6964963cdf295604ba0fb7171389606e
Related-Bug: #1853009
Related-Bug: #1841481
2021-08-12 14:26:45 +01:00
Stephen Finucane 32676a9f45 Clear rebalanced compute nodes from resource tracker
There is a race condition in nova-compute with the ironic virt driver as
nodes get rebalanced. It can lead to compute nodes being removed in the
DB and not repopulated. Ultimately this prevents these nodes from being
scheduled to.

The issue being addressed here is that if a compute node is deleted by a host
which thinks it is an orphan, then the compute host that actually owns the node
might not recreate it if the node is already in its resource tracker cache.

This change fixes the issue by clearing nodes from the resource tracker cache
for which a compute node entry does not exist. Then, when the available
resource for the node is updated, the compute node object is not found in the
cache and gets recreated.

Change-Id: I39241223b447fcc671161c370dbf16e1773b684a
Partial-Bug: #1853009
2021-08-12 14:26:45 +01:00
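
A minimal sketch of the cache-clearing idea shared by this change and
the provider tree fix above; the attribute names mirror the resource
tracker but are illustrative:

    def clear_rebalanced_nodes(tracker, db_nodenames):
        """Forget cached nodes whose ComputeNode records no longer exist."""
        for nodename in list(tracker.compute_nodes):
            if nodename not in db_nodenames:
                # Another host deleted this node (e.g. during an ironic
                # rebalance); dropping the cache entry lets the next
                # update_available_resource run recreate the record.
                del tracker.compute_nodes[nodename]
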
Stephen Finucane 1bf45c4720 Remove (almost) all references to 'instance_type'
This continues on from I81fec10535034f3a81d46713a6eda813f90561cf and
removes all other references to 'instance_type' where it's possible to
do so. The only things left are DB columns, o.vo fields, some
unversioned objects, and RPC API methods. If we want to remove these, we
can but it's a lot more work.

Change-Id: I264d6df1809d7283415e69a66a9153829b8df537
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
2021-03-29 12:24:15 +01:00
Artom Lifshitz 6c3175d3ee pci manager: replace node_id parameter with compute_node
To implement the `socket` PCI NUMA affinity policy, we'll need to
track the host NUMA topology in the PCI stats code. To achieve this,
PCI stats will need to know the compute node it's running on. Prepare
for this by replacing the node_id parameter with compute_node. Node_id
was previously optional, but that looks to have been only to
facilitate testing, as that's the only place where it was not passed
it. We use compute_node (instead of just making node_id mandatory)
because it allows for an optimization later on wherein the PCI manager
does not need to pull the ComputeNode object from the database
needlessly.

Implements: blueprint pci-socket-affinity
Change-Id: Idc839312d1449e9327ee7e3793d53ed080a44d0c
2021-03-08 15:18:46 -05:00
Balazs Gibizer 1273c5ee0b Make PCI claim NUMA aware during live migration
NUMA aware live migration and SRIOV live migration were implemented as
two separate features. As a consequence, the case where both SRIOV and
NUMA are present in the instance was missed. When the PCI device is
claimed on the destination host, the NUMA topology of the instance
needs to be passed to the claim call.

Change-Id: If469762b22d687151198468f0291821cebdf26b2
Closes-Bug: #1893221
2020-11-24 11:54:14 +00:00
Zuul ffb916e0a1 Merge "Set instance host and drop migration under lock" 2020-11-18 10:32:56 +00:00
Zuul 4f2540a7a6 Merge "virt: Remove 'get_per_instance_usage' API" 2020-11-09 18:50:31 +00:00
Balazs Gibizer 7675964af8 Set instance host and drop migration under lock
The _update_available_resources periodic makes resource allocation
adjustments while holding the COMPUTE_RESOURCE_SEMAPHORE, based on the
list of instances assigned to the host of the resource tracker and on
the migrations where the source or the target host is the host of the
resource tracker. So if the instance.host or the migration context
changes without holding the COMPUTE_RESOURCE_SEMAPHORE while the
_update_available_resources task is running, there will be data
inconsistency in the resource tracker.

This patch makes sure that during evacuation the instance.host and the
migration context are changed while holding the semaphore.

Change-Id: Ica180165184b319651d22fe77e076af036228860
Closes-Bug: #1896463
2020-11-04 15:02:35 +01:00
Zuul 73846fc37f Merge "Follow up for I67504a37b0fe2ae5da3cba2f3122d9d0e18b9481" 2020-09-11 21:56:26 +00:00
Zuul 648ac72818 Merge "Move revert resize under semaphore" 2020-09-11 16:17:42 +00:00
Stephen Finucane 8aea747c97 virt: Remove 'get_per_instance_usage' API
Another pretty trivial one. This one was intended to provide an overview
of instances that weren't properly tracked but were running on the host.
It was only ever implemented for the XenAPI driver, so remove it now.

Change-Id: Icaba3fc89e3295200e3d165722a5c24ee070002c
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
2020-09-11 14:10:30 +01:00
Balazs Gibizer 53172fa3b0 Follow up for I67504a37b0fe2ae5da3cba2f3122d9d0e18b9481
Part of blueprint sriov-interface-attach-detach

Change-Id: Ifc5417a8eddf62ad49d898fa6c9c1da71c6e0bb3
2020-09-11 12:39:08 +02:00
Zuul 94810a8612 Merge "Track error migrations in resource tracker" 2020-09-11 02:31:36 +00:00
Zuul 509c01e86d Merge "Support SRIOV interface attach and detach" 2020-09-10 22:54:26 +00:00
Balazs Gibizer 1361ea5ad1 Support SRIOV interface attach and detach
For attach:
* Generates InstancePciRequest for SRIOV interfaces attach requests
* Claims and allocates a PciDevice for such request

For detach:
* Frees PciDevice and deletes the InstancePciRequests

On the libvirt driver side the following small fixes were necessary:
* Fixes PCI address generation to avoid double 0x prefixes in LibvirtConfigGuestHostdevPCI
* Adds support for comparing LibvirtConfigGuestHostdevPCI objects
* Extends the comparison of LibvirtConfigGuestInterface to support
  macvtap interfaces where target_dev is only known by libvirt but not
  nova
* Generalizes guest.get_interface_by_cfg() to work with both
  LibvirtConfigGuest[Interface|HostdevPCI] objects

Implements: blueprint sriov-interface-attach-detach

Change-Id: I67504a37b0fe2ae5da3cba2f3122d9d0e18b9481
2020-09-10 18:44:53 +01:00
LuyaoZhong 255b3f2f91 Track error migrations in resource tracker
If rollback_live_migration failed, the migration status is set to
'error', and there might be some resources, like vpmem, that are not
cleaned up since the rollback did not complete. So we propose to track
those 'error' migrations in the resource tracker until they are cleaned
up by the periodic task '_cleanup_incomplete_migrations'.

So if rollback_live_migration succeeds, we need to set the migration
status to 'failed', which will not be tracked in the resource tracker.
The 'failed' status is already used for resize to indicate a migration
that has finished its cleanup.

'_cleanup_incomplete_migrations' will also handle failed
rollback_live_migration cleanup, except for failed resize/revert-resize.

Besides, we introduce a new 'cleanup_lingering_instance_resources' virt
driver interface to handle lingering instance resources cleanup
including vpmem cleanup and whatever we add in the future.

Change-Id: I422a907056543f9bf95acbffdd2658438febf801
Partially-Implements: blueprint vpmem-enhancement
2020-09-10 05:30:39 +00:00
Stephen Finucane dc9c7a5ebf Move revert resize under semaphore
As discussed in change I26b050c402f5721fc490126e9becb643af9279b4, the
resource tracker's periodic task is reliant on the status of migrations
to determine whether to include usage from these migrations in the
total, and races between setting the migration status and decrementing
resource usage via 'drop_move_claim' can result in incorrect usage.
That change tackled the confirm resize operation. This one changes the
revert resize operation, and is a little trickier due to kinks in how
both the same-cell and cross-cell resize revert operations work.

For same-cell resize revert, the 'ComputeManager.revert_resize'
function, running on the destination host, sets the migration status to
'reverted' before dropping the move claim. This exposes the same race
that we previously saw with the confirm resize operation. It then calls
back to 'ComputeManager.finish_revert_resize' on the source host to boot
up the instance itself. This is kind of weird, because, even ignoring
the race, we're marking the migration as 'reverted' before we've done
any of the necessary work on the source host.

The cross-cell resize revert splits dropping of the move claim and
setting of the migration status between the source and destination host
tasks. Specifically, we do cleanup on the destination and drop the move
claim first, via 'ComputeManager.revert_snapshot_based_resize_at_dest'
before resuming the instance and setting the migration status on the
source via
'ComputeManager.finish_revert_snapshot_based_resize_at_source'. This
would appear to avoid the weird quirk of same-cell migration, however,
in typical weird cross-cell fashion, these are actually different
instances and different migration records.

The solution is once again to move the setting of the migration status
and the dropping of the claim under 'COMPUTE_RESOURCE_SEMAPHORE'. This
introduces the weird setting of migration status before completion to
the cross-cell resize case and perpetuates it in the same-cell case, but
this seems like a suitable compromise to avoid attempts to do things
like unplugging already unplugged PCI devices or unpinning already
unpinned CPUs. From an end-user perspective, instance state changes are
what really matter and once a revert is completed on the destination
host and the instance has been marked as having returned to the source
host, hard reboots can help us resolve any remaining issues.

Change-Id: I29d6f4a78c0206385a550967ce244794e71cef6d
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Closes-Bug: #1879878
2020-09-03 08:55:55 +00:00
Stephen Finucane a57800d382 Move confirm resize under semaphore
The 'ResourceTracker.update_available_resource' periodic task builds
usage information for the current host by inspecting instances and
in-progress migrations, combining the two. Specifically, it finds all
instances that are not in the 'DELETED' or 'SHELVED_OFFLOADED' state,
calculates the usage from these, then finds all in-progress migrations
for the host that don't have an associated instance (to prevent double
accounting) and includes the usage for these.

In addition to the periodic task, the 'ResourceTracker' class has a
number of helper functions to make or drop claims for the inventory
generated by the 'update_available_resource' periodic task as part of
the various instance operations. These helpers naturally assume that
when making a claim for a particular instance or migration, there
shouldn't already be resources allocated for same. Conversely, when
dropping claims, the resources should currently be allocated. However,
the check for *active* instances and *in-progress* migrations in the
periodic task means we have to be careful in how we make changes to a
given instance or migration record. Running the periodic task between
such an operation and an attempt to make or drop a claim can result in
TOCTOU-like races.

This generally isn't an issue: we use the 'COMPUTE_RESOURCE_SEMAPHORE'
semaphore to prevent the periodic task running while we're claiming
resources in helpers like 'ResourceTracker.instance_claim' and we make
our changes to the instances and migrations within this context. There
is one exception though: the 'drop_move_claim' helper. This function is
used when dropping a claim for either a cold migration, a resize or a
live migration, and will drop usage from either the source host (based
on the "old" flavor) for a resize confirm or the destination host (based
on the "new" flavor) for a resize revert or live migration rollback.
Unfortunately, while the function itself is wrapped in the semaphore, no
changes to the state of the instance or migration in question are
protected by it.

Consider the confirm resize case, which we're addressing here. If we
mark the migration as 'confirmed' before running 'drop_move_claim', then
the periodic task running between these steps will not account for the
usage on the source since the migration is allegedly 'confirmed'. The
call to 'drop_move_claim' will then result in the tracker dropping usage
that we're no longer accounting for. This "set migration status before
dropping usage" is the current behaviour for both same-cell and
cross-cell resize, via the 'ComputeManager.confirm_resize' and
'ComputeManager.confirm_snapshot_based_resize_at_source' functions,
respectively. We could reverse those calls and run 'drop_move_claim'
before marking the migration as 'confirmed', but while our usage will be
momentarily correct, the periodic task running between these steps will
re-add the usage we just dropped since the migration isn't yet
'confirmed'. The correct solution is to close this gap between setting
the migration status and dropping the move claim to zero. We do this by
putting both operations behind the 'COMPUTE_RESOURCE_SEMAPHORE', just
like the claim operations.

Change-Id: I26b050c402f5721fc490126e9becb643af9279b4
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Partial-Bug: #1879878
2020-09-03 08:55:47 +00:00
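
A hedged sketch of the fix: the migration status change and the move
claim drop now happen under the same lock the periodic task takes. The
lock name follows the commit text; the function and drop_move_claim
arguments are illustrative:

    from oslo_concurrency import lockutils

    COMPUTE_RESOURCE_SEMAPHORE = 'compute_resources'

    @lockutils.synchronized(COMPUTE_RESOURCE_SEMAPHORE)
    def confirm_resize_and_drop_claim(tracker, context, instance, migration):
        # The periodic task also takes this semaphore, so it can no
        # longer run between the status flip and the usage drop and
        # double-account the source host usage.
        migration.status = 'confirmed'
        migration.save()
        tracker.drop_move_claim(context, instance, migration.source_node,
                                instance.old_flavor)
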
Dustin Cowles 260713dc22 Provider Config File: Enable loading and merging of provider configs
This series implements the referenced blueprint to allow for specifying
custom resource provider traits and inventories via yaml config files.

This fourth commit adds the config option, release notes, documentation,
functional tests, and calls to the previously implemented functions in
order to load provider config files and merge them to the provider tree.

Change-Id: I59c5758c570acccb629f7010d3104e00d79976e4
Blueprint: provider-config-file
2020-08-26 23:18:53 +08:00
Dustin Cowles fc8deb4f86 Provider Config File: Functions to merge provider configs to provider tree
This series implements the referenced blueprint to allow for specifying
custom resource provider traits and inventories via yaml config files.

This third commit includes functions on the provider tree to merge
additional inventories and traits to resource providers and update
those providers on the provider tree. Those functions are not currently
being called, but will be in a future commit.

Co-Author: Tony Su <tao.su@intel.com>
Author: Dustin Cowles <dustin.cowles@intel.com>
Blueprint: provider-config-file
Change-Id: I142a1f24ff2219cf308578f0236259d183785cff
2020-08-26 04:51:03 +00:00
Stephen Finucane f203da3838 objects: Add MigrationTypeField
We use these things in many places in the code and it would be good to have
constants to reference. Do just that.

Note that this results in a change in the object hash. However, there
are no actual changes in the output object so that's okay.

Change-Id: If02567ce0a3431dda5b2bf6d398bbf7cc954eed0
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
2020-05-08 14:45:54 +01:00
LuyaoZhong 990a26ef1f partial support for live migration with specific resources
1. Claim allocations from placement first, then claim specific
   resources in the Resource Tracker on the destination to populate
   migration_context.new_resources
3. Clean up specific resources when live migration succeeds/fails

Because we store specific resources in migration_context during
live migration, to ensure cleanup happens correctly we can't drop
migration_context before cleanup is complete:
 a) in post live migration, we move source host cleanup before
    destination cleanup (post_live_migration_at_destination will
    apply migration_context and drop it)
 b) when rolling back live migration, we drop migration_context after
    rollback operations are complete

For different specific resources, we might need driver specific
support, such as for vpmem. This change just ensures that newly claimed
specific resources are populated to migration_context and that
migration_context is not dropped before cleanup is complete.

Change-Id: I44ad826f0edb39d770bb3201c675dff78154cbf3
Implements: blueprint support-live-migration-with-virtual-persistent-memory
2020-04-07 13:12:53 +00:00
Jason Anderson 1ed9f9dac5 Use fair locks in resource tracker
When the resource tracker has to lock a compute host for updates or
inspection, it uses a single semaphore. In most cases, this is fine, as
a compute process only is tracking one hypervisor. However, in Ironic, it's
possible for one compute process to track many hypervisors. In this
case, wait queues for instance claims can get "stuck" briefly behind
longer processing loops such as the update_resources periodic job. The
reason this is possible is because the oslo.lockutils synchronized
library does not use fair locks by default. When a lock is released, one
of the threads waiting for the lock is randomly allowed to take the lock
next. A fair lock ensures that the thread that next requested the lock
will be allowed to take it.

This should ensure that instance claim requests do not have a chance of
losing the lock contest, which should ensure that instance build
requests do not queue unnecessarily behind long-running tasks.

This includes bumping the oslo.concurrency dependency; fair locks were
added in 3.29.0 (I37577becff4978bf643c65fa9bc2d78d342ea35a).

Change-Id: Ia5e521e0f0c7a78b5ace5de9f343e84d872553f9
Related-Bug: #1864122
2020-03-09 11:03:17 -05:00
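
A short sketch of opting into fair locking; fair=True is the real
oslo.concurrency option added in 3.29.0, while the decorated function
and tracker attributes are illustrative:

    from oslo_concurrency import lockutils

    COMPUTE_RESOURCE_SEMAPHORE = 'compute_resources'

    @lockutils.synchronized(COMPUTE_RESOURCE_SEMAPHORE, fair=True)
    def claim(tracker, nodename, requested):
        # With a fair lock, waiting claims acquire the semaphore in the
        # order they requested it, so a quick claim is not starved
        # behind a long update_available_resource pass over many
        # ironic nodes.
        tracker.usages[nodename] += requested
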
Balazs Gibizer 56f29b3e4a Remove extra instance.save() calls related to qos SRIOV ports
During creating or moving of an instance with qos SRIOV port the PCI
device claim on the destination compute needs to be restricted to select
PCI VFs from the same PF where the bandwidth for the qos port is
allocated from. This is achieved by updating the spec part of the
InstancePCIRequest with the device name of the PF by calling
update_pci_request_spec_with_allocated_interface_name(). Until now
such update of the instance object was directly persisted by the call.

During code review it came up that the instance.save() in the util is
not appropriate, as the caller has a lot more context to decide when to
persist the changes.

The original eager instance.save was introduced when support was added
to the server create flow. Now I realized that the need for such a save
was due to a mistake in the original ResourceTracker.instance_claim()
call that loads the InstancePCIRequest from the DB instead of using the
requests through the passed in instance object. By removing the extra
DB call the need for eagerly persisting the PCI spec update is also
removed. It turned out that both the server create code path and every
server move code path eventually persist the instance object, either at
the end of the claim process or, in the case of live migration, in the
post_live_migration_at_destination compute manager call. This means
that the code can now be simplified, especially the live migration
cases.

In the live migrate abort case we don't need to roll back the eagerly
persisted PCI change as now such change is only persisted at the end
of the migration but still we need to refresh pci_requests field of
the instance object during the rollback as that field might be stale,
containing dest host related PCI information.

Also in case of rescheduling during live migrate if the rescheduling
failed the PCI change needed to be rolled back to the source host by a
specific code. But now those change are never persisted until the
migration finishes so this rollback code can be removed too.

Change-Id: Ied8f96b4e67f79498519931cb6b35dad5288bbb8
blueprint: support-move-ops-with-qos-ports-ussuri
2020-02-03 11:41:38 +01:00
Matt Riedemann 26da4418a9 Deal with cross-cell resize in _remove_deleted_instances_allocations
When reverting a cross-cell resize, conductor will:

1. clean up the destination host
2. set instance.hidden=True and destroy the instance in the
   target cell database
3. finish the revert on the source host which will revert the
   allocations on the source host held by the migration record
   so the instance will hold those again and drop the allocations
   against the dest host which were held by the instance.

If the ResourceTracker.update_available_resource periodic task runs
between steps 2 and 3 it could see that the instance is deleted
from the target cell but there are still allocations held by it and
delete them. Step 3 is what handles deleting those allocations for
the destination node, so we want to leave it that way and take the
ResourceTracker out of the flow.

This change simply checks the instance.hidden value on the deleted
instance and if hidden=True, assumes the allocations will be cleaned
up elsewhere (finish_revert_snapshot_based_resize_at_source).

Ultimately this is probably not something we *have* to have since
finish_revert_snapshot_based_resize_at_source is going to drop the
destination node allocations anyway, but it is good to keep clear
which actor is doing what in this process.

Part of blueprint cross-cell-resize

Change-Id: Idb82b056c39fd167864cadd205d624cb87cbe9cb
2019-12-12 12:00:33 -05:00
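
A minimal sketch of the guard added to the allocation-cleanup path; the
function name mirrors the commit text but is illustrative:

    def should_clean_up_allocations(instance):
        """Decide whether the periodic should delete the allocations of
        a deleted instance."""
        if instance.hidden:
            # hidden=True marks the target-cell copy during a cross-cell
            # resize revert; finish_revert_snapshot_based_resize_at_source
            # owns dropping those allocations.
            return False
        return True
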
Zuul f01fbd8cf0 Merge "Always trait the compute node RP with COMPUTE_NODE" 2019-11-15 23:01:32 +00:00
Zuul 28963bd64c Merge "FUP for Ib62ac0b692eb92a2ed364ec9f486ded05def39ad" 2019-11-15 11:53:51 +00:00
Zuul aa21fe9c9c Merge "Delete _normalize_inventory_from_cn_obj" 2019-11-14 00:59:20 +00:00
Zuul 2dbe174278 Merge "Drop compat for non-update_provider_tree code paths" 2019-11-14 00:54:44 +00:00
Zuul 1c7a3d5908 Merge "Clear instance.launched_on when build fails" 2019-11-13 21:45:04 +00:00