Commit Graph

458 Commits

Author SHA1 Message Date
Bence Romsics f1dc4ec39b Do not untrack resources of a server being unshelved
This patch concerns the window when a VM is being unshelved: the
compute manager has set the task_state to spawning, claimed resources
for the VM and called driver.spawn(). So the instance is in vm_state
SHELVED_OFFLOADED, task_state spawning.

If at this point a new update_available_resource periodic job is
started, it collects all the instances assigned to the node to
calculate resource usage. However, the calculation assumed that a
VM in SHELVED_OFFLOADED state needs no resource allocation on the
node (presumably it is being removed from the node as it is
offloaded) and deleted the resource claim.

Given all this, we ended up with a VM that spawned successfully but
lost its resource claim on the node.

This patch changes what we do in vm_state SHELVED_OFFLOADED, task_state
spawning. We no longer delete the resource claim in this state and
keep tracking the resource in stats.

Change-Id: I8c9944810c09d501a6d3f60f095d9817b756872d
Closes-Bug: #2025480
2023-08-17 10:50:32 +02:00
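
A minimal sketch of the new tracking decision described in the commit
above, using illustrative constant and function names rather than the
actual resource tracker code:

    # Sketch only: a SHELVED_OFFLOADED instance that is mid-unshelve
    # (task_state spawning) keeps its resource claim.
    SHELVED_OFFLOADED = 'shelved_offloaded'   # vm_state
    SPAWNING = 'spawning'                     # task_state

    def should_keep_tracking(vm_state, task_state):
        """Return True if the instance's claim should stay tracked."""
        if vm_state == SHELVED_OFFLOADED:
            # Previously every offloaded instance was untracked; now one
            # that is actively spawning during unshelve keeps its claim.
            return task_state == SPAWNING
        return True
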
Dan Smith 84e7bed27e Online migrate missing Instance.compute_id fields
This migrates *existing* Instance records with incomplete compute_id
fields, where necessary, while updating node resources. This is done
separately from the earlier patch that handles new instances, just to
demonstrate in CI that we're still able to run in such a partially-
migrated state.

Related to blueprint compute-object-ids

Change-Id: Ie18342deeddb7f58564953e6b46ea0b0d7495595
2023-05-31 07:13:16 -07:00
Dan Smith 625fb569a7 Add compute_id to Instance object
This adds the compute_id field to the Instance object and adds
checks in save() and create() to make sure we no longer update node
without also updating compute_id.

Related to blueprint compute-object-ids

Change-Id: I0740a2e4e09a526da8565a18e6761b4dbdc4ec0b
2023-05-31 07:13:16 -07:00
Dan Smith 70516d4ff9 Add dest_compute_id to Migration object
This makes us store the compute_id of the destination node in the
Migration object. Since resize/cold-migration changes the node
affiliation of an instance *to* the destination node *from* the source
node, we need a positive record of the node id to be used. The
destination node sets this to its own node id when creating the migration,
and it is used by the source node when the switchover happens.

Because the migration may be backleveled for an older node involved
in that process and thus saved or passed without this field, this
adds a compatibility routine that falls back to looking up the node
by host/nodename.

Related to blueprint compute-object-ids

Change-Id: I362a40403d1094be36412f5f7afba00da8af8301
2023-05-31 07:13:16 -07:00
Dan Smith afad847e4d Populate ComputeNode.service_id
The ComputeNode object already has a service_id field that we stopped
using a while ago. This moves us back to the point where we set it when
creating new ComputeNode records, and also migrates existing records
when they are loaded.

The resource tracker is created before we may have created the
service record, but is updated afterwards in the pre_start_hook().
So this adds a way for us to pass the service_ref to the resource
tracker during that hook so that it is present before the first time
we update all of our ComputeNode records. It also makes sure to pass
the Service through from the actual Service manager instead of looking
it up again to make sure we maintain the tight relationship and avoid
any name-based ambiguity.

Related to blueprint compute-object-ids

Change-Id: I5e060d674b6145c9797c2251a2822106fc6d4a71
2023-05-31 07:06:34 -07:00
Dan Smith 82deb0ce4b Stop ignoring missing compute nodes in claims
The resource tracker will silently ignore attempts to claim resources
when the node requested is not managed by this host. The misleading
"self.disabled(nodename)" check will fail if the nodename is not known
to the resource tracker, causing us to bail early with a NopClaim.
That means we also skip additional setup like creating a migration
context for the instance, claiming resources in placement, and handling
PCI/NUMA things. This behavior is quite old, and clearly doesn't make
sense in a world with things like placement. The bulk of the test
changes here are due to the fact that a lot of tests were relying on
this silent ignoring of a mismatching node, because they were passing
node names that weren't even tracked.

This change makes us raise an error if this happens so that we can
actually catch it, and avoid silently continuing with no resource
claim.

Change-Id: I416126ee5d10428c296fe618aa877cca0e8dffcf
2023-04-24 15:26:52 -07:00
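
A hedged sketch of the behavior change: an untracked nodename now
raises instead of silently producing a no-op claim (exception and
attribute names are illustrative, not the actual nova code):

    class ComputeHostNotFound(Exception):
        """Raised when a claim targets a node this host does not track."""

    def instance_claim(tracker, context, instance, nodename):
        if nodename not in tracker.compute_nodes:
            # Old behavior: return a NopClaim and skip migration context,
            # placement allocations and PCI/NUMA handling. New behavior:
            # fail loudly so the caller can react.
            raise ComputeHostNotFound(
                'compute node %s is not tracked by this host' % nodename)
        return tracker.real_instance_claim(context, instance, nodename)
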
Dan Smith cf33be6871 Abort startup if nodename conflict is detected
We do run update_available_resource() synchronously during service
startup, but we only allow certain exceptions to abort startup. This
makes us abort for InvalidConfiguration, and makes the resource
tracker raise that for the case where the compute node create failed
due to a duplicate entry.

This also modifies the object to raise a nova-specific error for that
condition to avoid the compute node needing to import oslo_db stuff
just to be able to catch it.

Change-Id: I5de98e6fe52e45996bc2e1014fa8a09a2de53682
2023-02-01 09:23:33 -08:00
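
A rough sketch of the startup behavior, assuming an
InvalidConfiguration-style exception translated from the duplicate
entry DB error; the names and call shapes are illustrative:

    class InvalidConfiguration(Exception):
        """Raised when a duplicate compute node record is detected."""

    def init_host_startup(manager, context):
        try:
            # Run the resource update synchronously during service startup.
            manager.update_available_resource(context, startup=True)
        except InvalidConfiguration:
            # A duplicate nodename means another service owns this record;
            # abort startup instead of running in a broken state.
            raise
        except Exception:
            # Other errors keep the previous behavior: log and let the
            # periodic task retry later.
            pass
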
Dan Smith 23c5f3d585 Make resource tracker use UUIDs instead of names
This makes the resource tracker look up and create ComputeNode objects
by uuid instead of nodename. For drivers like ironic that already
provide 'uuid' in the resources dict, we can use that. For those
that do not, we force the uuid to be the locally-persisted node
uuid, and use that to find/create the ComputeNode object.

A (happy) side-effect of this is that if we find a deleted compute
node object that matches that of our hypervisor, we undelete it
instead of re-creating one with a new uuid, which may clash with our
old one. This means we remove some of the special-casing of ironic
rebalance, although the tests for that still largely stay the same.

Change-Id: I6a582a38c302fd1554a49abc38cfeda7c324d911
2023-01-30 10:53:44 -08:00
Balazs Gibizer fa4832c660 Support same host resize with PCI in placement
Id02e445c55fc956965b7d725f0260876d42422f2 added a special case to the
healing logic for same host resize. Now that the scheduler also creates
an allocation on the destination host during resize, we need to make
sure that the drop_move_claim code that runs during revert and confirm
drops the tracked migration from the resource tracker only after the
healing logic has run, as the migrations being confirmed / reverted
still affect PciDevices at this point.

blueprint: pci-device-tracking-in-placement
Change-Id: I6241965fe6c1cc1f2560fcce65d5e32ef308d502
2022-12-21 16:17:34 +01:00
Balazs Gibizer e96601c606 Map PCI pools to RP UUIDs
Nova's PCI scheduling (and the PCI claim) works based on PCI device
pools to which the similar available PCI devices are assigned. The PCI
devices are now represented in placement as RPs, and the allocation
candidates during scheduling and the allocation after scheduling now
contain PCI devices. This information needs to affect the PCI
scheduling and PCI claim. To be able to do that we need to map PCI
device pools to RPs. We achieve that here by first mapping PciDevice
objects to RPs during placement PCI inventory reporting, then mapping
pools to RPs based on the PCI devices assigned to the pools.

Also, because the ResourceTracker._update_to_placement() call now
updates the PCI device pools, the sequence of events in the
ResourceTracker needed to change to:
1) run _update_to_placement()
2) copy the pools to the ComputeNode object
3) save the compute node to the DB
4) save the PCI tracker

blueprint: pci-device-tracking-in-placement
Change-Id: I9bb450ac235ab72ff0d8078635e7a11c04ff6c1e
2022-10-17 13:56:18 +02:00
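
A compact sketch of the reordered update sequence listed above; the
method names follow the commit text, while the wrapper function itself
and the exact attribute access are illustrative:

    def update_compute_node(rt, context, compute_node):
        # 1) report PCI inventories/traits to placement; this also maps
        #    the PCI device pools to resource provider UUIDs
        rt._update_to_placement(context, compute_node)
        # 2) copy the (now RP-mapped) pools onto the ComputeNode object
        compute_node.pci_device_pools = (
            rt.pci_tracker.stats.to_device_pools_obj())
        # 3) persist the ComputeNode record
        compute_node.save()
        # 4) persist the PCI tracker state
        rt.pci_tracker.save(context)
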
Balazs Gibizer 9268bc36a3 Handle PCI dev reconf with allocations
PCI devices which are allocated to instances can be removed from the
[pci]device_spec configuration or can be removed from the hypervisor
directly. The existing PciTracker code handles these cases by keeping
the PciDevice in the nova DB as existing and allocated, and by issuing
a warning in the logs during compute service startup that nova is in an
inconsistent state. Similar behavior is now added to the PCI placement
tracking code so that the PCI inventories and allocations in placement
are kept in such situations.

There is one case where we cannot simply accept the PCI device
reconfiguration by keeping the existing allocations and applying the
new config: when a PF that is configured and allocated is removed and
VFs from this PF are now configured in the [pci]device_spec, and vice
versa, when VFs are removed and their parent PF is configured. In these
cases keeping the existing inventory and allocations and adding the new
inventory to placement would result in a placement model where a single
PCI device provides both PF and VF inventories. This dependent device
configuration is not supported as it could lead to double consumption.
In such a situation the compute service will refuse to start.

blueprint: pci-device-tracking-in-placement
Change-Id: Id130893de650cc2d38953cea7cf9f53af71ced93
2022-08-26 19:05:45 +02:00
Balazs Gibizer ab439dadb1 Heal allocation for same host resize
Same host resize needs special handling in the allocation healing logic
as both the source and the dest host PCI devices are visible to the
healing code as the PciDevice.instance_uuid points to the healed
instance in both cases.

blueprint: pci-device-tracking-in-placement
Change-Id: Id02e445c55fc956965b7d725f0260876d42422f2
2022-08-26 19:05:23 +02:00
Balazs Gibizer 48229b46b4 Retry /reshape at provider generation conflict
During a normal update_available_resources run, if the local provider
tree cache is invalid (e.g. because the scheduler made an allocation,
bumping the generation of the RPs) and the virt driver tries to update
the inventory of an RP based on the cache, Placement will report a
conflict, the report client will invalidate the cache, and the retry
decorator on ResourceTracker._update_to_placement will re-drive the
update on top of the fresh RP data.

However, the same thing can happen during a reshape as well, but the
retry mechanism is missing in that code path, so the stale cache can
cause reshape failures.

This patch adds specific error handling in the reshape code path to
implement the same retry mechanism that exists for inventory updates.

blueprint: pci-device-tracking-in-placement
Change-Id: Ieb954a04e6aba827611765f7f401124a1fe298f3
2022-08-25 10:00:10 +02:00
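
A hedged sketch of the retry-on-conflict pattern added for the reshape
path; the exception and client methods are stand-ins, not the exact
report client API:

    class ProviderGenerationConflict(Exception):
        """Placement rejected an update because an RP generation moved."""

    def reshape_with_retry(client, context, max_attempts=4):
        for _ in range(max_attempts):
            provider_tree = client.get_provider_tree(context)
            try:
                return client.reshape(context, provider_tree)
            except ProviderGenerationConflict:
                # The scheduler (or another writer) bumped an RP
                # generation; drop the stale cache and re-drive on
                # fresh provider data.
                client.invalidate_provider_cache()
        raise ProviderGenerationConflict(
            'reshape still conflicting after retries')
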
Balazs Gibizer 953f1eef19 Basics for PCI Placement reporting
A new PCI resource handler is added to the update_available_resources
code path to update the ProviderTree with PCI device RPs, inventories
and traits.

It is a bit different from the other Placement inventory reporters. It
does not run at the virt driver level, as PCI is tracked in a generic
way by the PCI tracker in the resource tracker, so the virt specific
information is already parsed and abstracted by the resource tracker.

Another difference is that to support rolling upgrades the PCI handler
code needs to be prepared for situations where the scheduler does not
create PCI allocations even after some of the computes have already
started reporting inventories and healing PCI allocations. So the code
does not do a single, one-shot reshape at startup, but instead does a
continuous healing of the allocations. We can remove this continuous
healing once the PCI prefilter is made mandatory in a future release.

The whole PCI placement reporting behavior is disabled by default while
it is incomplete. When it is functionally complete a new
[pci]report_in_placement config option will be added to allow enabling
the feature. This config is intentionally not added by this patch as we
don't want to allow enabling this logic yet.

blueprint: pci-device-tracking-in-placement
Change-Id: If975c3ec09ffa95f647eb4419874aa8417a59721
2022-08-25 10:00:10 +02:00
Dan Smith c178d93606 Unify placement client singleton implementations
We have many places where we implement singleton behavior for the
placement client. This unifies them into a single place and
implementation. Not only does this DRY things up, but it may also
cause us to initialize the client fewer times, and it allows for
emitting a common set of error messages about expected failures for
better troubleshooting.

Change-Id: Iab8a791f64323f996e1d6e6d5a7e7a7c34eb4fb3
Related-Bug: #1846820
2022-08-18 07:22:37 -07:00
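
A small, self-contained sketch of a lock-protected lazy singleton, the
pattern this commit consolidates; the factory argument stands in for
constructing the placement report client:

    import threading

    _client = None
    _client_lock = threading.Lock()

    def get_placement_client(factory):
        """Return one shared client per process, created on first use."""
        global _client
        if _client is None:
            with _client_lock:
                # Re-check inside the lock so two concurrent callers do
                # not both construct a client (double-checked locking).
                if _client is None:
                    _client = factory()
        return _client
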
Rajesh Tailor aa1e7a6933 Fix typos in help messages
This change fixes typos in conf parameter help messages
and in an error log message.

Change-Id: Iedc268072d77771b208603e663b0ce9b94215eb8
2022-05-30 17:28:29 +05:30
Dmitrii Shcherbakov 3fd7e94893 Fix migration with remote-managed ports & add FT
`binding:profile` updates are handled differently for migration than
for instance creation, which was not taken into account previously.
Relevant fields (card_serial_number, pf_mac_address, vf_num) are now
added to the
`binding:profile` after a new remote-managed PCI device is determined at
the destination node.

Likewise, there is special handling for the unshelve operation which is
fixed too.

Func testing:

* Allow the generated device XML to contain the PCI VPD capability;
* Add test cases for basic operations on instances with remote-managed
  ports (tunnel or physical);
* Add a live migration test case similar to how it is done for
  non-remote-managed SR-IOV ports but taking remote-managed port related
  specifics into account;
* Add evacuate, shelve/unshelve, cold migration test cases.

Change-Id: I9a1532e9a98f89db69b9ae3b41b06318a43519b3
2022-03-04 18:41:48 +03:00
Dmitrii Shcherbakov c487c730d0 Filter computes without remote-managed ports early
Add a pre-filter for requests that contain VNIC_TYPE_REMOTE_MANAGED
ports in them: hosts that do not have either the relevant compute
driver capability COMPUTE_REMOTE_MANAGED_PORTS or PCI device pools
with "remote_managed" devices are filtered out early. Presence of
devices actually available for allocation is checked at a later
point by the PciPassthroughFilter.

Change-Id: I168d3ccc914f25a3d4255c9b319ee6b91a2f66e2
Implements: blueprint integration-with-off-path-network-backends
2022-02-09 01:23:27 +03:00
Balazs Gibizer 32c1044d86 [rt] Apply migration context for incoming migrations
There is a race condition between an incoming resize and an
update_available_resource periodic in the resource tracker. The race
window starts when the resize_instance RPC finishes  and ends when the
finish_resize compute RPC finally applies the migration context on the
instance.

In the race window, if the update_available_resource periodic is run on
the destination node, then it will see the instance as being tracked on
this host as the instance.node is already pointing to the dest. But the
instance.numa_topology still points to the source host topology as the
migration context is not applied yet. This leads to a CPU pinning
error if the source topology does not fit the dest topology. It also
stops the periodic task and leaves the tracker in an inconsistent
state. The inconsistent state is only cleaned up after the periodic is
run outside of the race window.

This patch applies the migration context temporarily to the specific
instances during the periodic to keep resource accounting correct.

Change-Id: Icaad155e22c9e2d86e464a0deb741c73f0dfb28a
Closes-Bug: #1953359
Closes-Bug: #1952915
2021-12-07 13:32:26 +01:00
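
A hedged sketch of temporarily applying the migration context for
accounting purposes only; the helper and field access are illustrative
and not the actual nova Instance API:

    import contextlib

    @contextlib.contextmanager
    def applied_migration_context(instance):
        """Swap in the dest NUMA topology just for the periodic run."""
        original = instance.numa_topology
        if instance.migration_context is not None:
            instance.numa_topology = (
                instance.migration_context.new_numa_topology)
        try:
            yield instance
        finally:
            # Restore the persisted topology; nothing is saved to the DB.
            instance.numa_topology = original
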
Matt Riedemann c09d98dadb Add force kwarg to delete_allocation_for_instance
This adds a force kwarg to delete_allocation_for_instance which
defaults to True because that was found to be the most common use case
by a significant margin during implementation of this patch.
In most cases, this method is called when we want to delete the
allocations because they should be gone, e.g. server delete, failed
build, or shelve offload. The alternative in these cases is the caller
could trap the conflict error and retry but we might as well just force
the delete in that case (it's cleaner).

When force=True, it will DELETE the consumer allocations rather than
GET and PUT with an empty allocations dict and the consumer generation
which can result in a 409 conflict from Placement. For example, bug
1836754 shows that in one tempest test that creates a server and then
immediately deletes it, we can hit a very tight window where the method
GETs the allocations and before it PUTs the empty allocations to remove
them, something changes which results in a conflict and the server
delete fails with a 409 error.

It's worth noting that delete_allocation_for_instance used to just
DELETE the allocations before Stein [1] when we started taking consumer
generations into account. There was also a related mailing list thread
[2].


Closes-Bug: #1836754

[1] I77f34788dd7ab8fdf60d668a4f76452e03cf9888
[2] http://lists.openstack.org/pipermail/openstack-dev/2018-August/133374.html

Change-Id: Ife3c7a5a95c5d707983ab33fd2fbfc1cfb72f676
2021-08-30 06:11:25 +00:00
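
A hedged sketch contrasting the forced and non-forced deletion paths
against the placement API; the HTTP helper calls assume a
requests-style session and are illustrative, not the actual report
client internals:

    def delete_allocation_for_instance(session, consumer_uuid, force=True):
        url = '/allocations/%s' % consumer_uuid
        if force:
            # Unconditional DELETE: no consumer generation involved, so
            # there is no window for a 409 conflict.
            return session.delete(url).status_code in (204, 404)
        # Non-forced: GET the allocations, then PUT them back empty with
        # the consumer generation, which can race and return 409 Conflict.
        current = session.get(url).json()
        payload = {'allocations': {},
                   'consumer_generation': current['consumer_generation'],
                   'project_id': current['project_id'],
                   'user_id': current['user_id']}
        return session.put(url, json=payload).status_code == 204
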
Mark Goddard 2bb4527228 Invalidate provider tree when compute node disappears
There is a race condition in nova-compute with the ironic virt driver
as nodes get rebalanced. It can lead to compute nodes being removed in
the DB and not repopulated. Ultimately this prevents these nodes from
being scheduled to.

The issue being addressed here is that if a compute node is deleted by a
host which thinks it is an orphan, then the resource provider for that
node might also be deleted. The compute host that owns the node might
not recreate the resource provider if it exists in the provider tree
cache.

This change fixes the issue by clearing resource providers from the
provider tree cache for which a compute node entry does not exist. Then,
when the available resource for the node is updated, the resource
providers are not found in the cache and get recreated in placement.

Change-Id: Ia53ff43e6964963cdf295604ba0fb7171389606e
Related-Bug: #1853009
Related-Bug: #1841481
2021-08-12 14:26:45 +01:00
Stephen Finucane 32676a9f45 Clear rebalanced compute nodes from resource tracker
There is a race condition in nova-compute with the ironic virt driver as
nodes get rebalanced. It can lead to compute nodes being removed in the
DB and not repopulated. Ultimately this prevents these nodes from being
scheduled to.

The issue being addressed here is that if a compute node is deleted by a host
which thinks it is an orphan, then the compute host that actually owns the node
might not recreate it if the node is already in its resource tracker cache.

This change fixes the issue by clearing nodes from the resource tracker cache
for which a compute node entry does not exist. Then, when the available
resource for the node is updated, the compute node object is not found in the
cache and gets recreated.

Change-Id: I39241223b447fcc671161c370dbf16e1773b684a
Partial-Bug: #1853009
2021-08-12 14:26:45 +01:00
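
A minimal sketch of the cache-clearing idea shared by this change and
the provider tree fix above; the attribute names mirror the resource
tracker but are illustrative:

    def clear_rebalanced_nodes(tracker, db_nodenames):
        """Forget cached nodes whose ComputeNode records no longer exist."""
        for nodename in list(tracker.compute_nodes):
            if nodename not in db_nodenames:
                # Another host deleted this node (e.g. during an ironic
                # rebalance); dropping the cache entry lets the next
                # update_available_resource run recreate the record.
                del tracker.compute_nodes[nodename]
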
Stephen Finucane 1bf45c4720 Remove (almost) all references to 'instance_type'
This continues on from I81fec10535034f3a81d46713a6eda813f90561cf and
removes all other references to 'instance_type' where it's possible to
do so. The only things left are DB columns, o.vo fields, some
unversioned objects, and RPC API methods. If we want to remove these, we
can but it's a lot more work.

Change-Id: I264d6df1809d7283415e69a66a9153829b8df537
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
2021-03-29 12:24:15 +01:00
Artom Lifshitz 6c3175d3ee pci manager: replace node_id parameter with compute_node
To implement the `socket` PCI NUMA affinity policy, we'll need to
track the host NUMA topology in the PCI stats code. To achieve this,
PCI stats will need to know the compute node it's running on. Prepare
for this by replacing the node_id parameter with compute_node. Node_id
was previously optional, but that looks to have been only to
facilitate testing, as that's the only place where it was not passed
it. We use compute_node (instead of just making node_id mandatory)
because it allows for an optimization later on wherein the PCI manager
does not need to pull the ComputeNode object from the database
needlessly.

Implements: blueprint pci-socket-affinity
Change-Id: Idc839312d1449e9327ee7e3793d53ed080a44d0c
2021-03-08 15:18:46 -05:00
Balazs Gibizer 1273c5ee0b Make PCI claim NUMA aware during live migration
NUMA aware live migration and SRIOV live migration were implemented as
two separate features. As a consequence, the case where both SRIOV and
NUMA are present in the instance was missed. When the PCI device is
claimed on the destination host, the NUMA topology of the instance
needs to be passed to the claim call.

Change-Id: If469762b22d687151198468f0291821cebdf26b2
Closes-Bug: #1893221
2020-11-24 11:54:14 +00:00
Zuul ffb916e0a1 Merge "Set instance host and drop migration under lock" 2020-11-18 10:32:56 +00:00
Zuul 4f2540a7a6 Merge "virt: Remove 'get_per_instance_usage' API" 2020-11-09 18:50:31 +00:00
Balazs Gibizer 7675964af8 Set instance host and drop migration under lock
The _update_available_resources periodic makes resource allocation
adjustments while holding the COMPUTE_RESOURCE_SEMAPHORE, based on the
list of instances assigned to the host of the resource tracker and on
the migrations where the source or the target host is the host of the
resource tracker. So if the instance.host or the migration context
changes without holding the COMPUTE_RESOURCE_SEMAPHORE while the
_update_available_resources task is running, there will be data
inconsistency in the resource tracker.

This patch makes sure that during evacuation the instance.host and the
migration context are changed while holding the semaphore.

Change-Id: Ica180165184b319651d22fe77e076af036228860
Closes-Bug: #1896463
2020-11-04 15:02:35 +01:00
Zuul 73846fc37f Merge "Follow up for I67504a37b0fe2ae5da3cba2f3122d9d0e18b9481" 2020-09-11 21:56:26 +00:00
Zuul 648ac72818 Merge "Move revert resize under semaphore" 2020-09-11 16:17:42 +00:00
Stephen Finucane 8aea747c97 virt: Remove 'get_per_instance_usage' API
Another pretty trivial one. This one was intended to provide an overview
of instances that weren't properly tracked but were running on the host.
It was only ever implemented for the XenAPI driver, so remove it now.

Change-Id: Icaba3fc89e3295200e3d165722a5c24ee070002c
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
2020-09-11 14:10:30 +01:00
Balazs Gibizer 53172fa3b0 Follow up for I67504a37b0fe2ae5da3cba2f3122d9d0e18b9481
Part of blueprint sriov-interface-attach-detach

Change-Id: Ifc5417a8eddf62ad49d898fa6c9c1da71c6e0bb3
2020-09-11 12:39:08 +02:00
Zuul 94810a8612 Merge "Track error migrations in resource tracker" 2020-09-11 02:31:36 +00:00
Zuul 509c01e86d Merge "Support SRIOV interface attach and detach" 2020-09-10 22:54:26 +00:00
Balazs Gibizer 1361ea5ad1 Support SRIOV interface attach and detach
For attach:
* Generates InstancePciRequest for SRIOV interfaces attach requests
* Claims and allocates a PciDevice for such request

For detach:
* Frees PciDevice and deletes the InstancePciRequests

On the libvirt driver side the following small fixes were necessary:
* Fixes PCI address generation to avoid double 0x prefixes in LibvirtConfigGuestHostdevPCI
* Adds support for comparing LibvirtConfigGuestHostdevPCI objects
* Extends the comparison of LibvirtConfigGuestInterface to support
  macvtap interfaces where target_dev is only known by libvirt but not
  nova
* Generalizes guest.get_interface_by_cfg() to work with both
  LibvirtConfigGuest[Interface|HostdevPCI] objects

Implements: blueprint sriov-interface-attach-detach

Change-Id: I67504a37b0fe2ae5da3cba2f3122d9d0e18b9481
2020-09-10 18:44:53 +01:00
LuyaoZhong 255b3f2f91 Track error migrations in resource tracker
If rollback_live_migration failed, the migration status is set to
'error', and there might be some resources, like vpmem, that are not
cleaned up since the rollback did not complete. So we propose to track
those 'error' migrations in the resource tracker until they are cleaned
up by the periodic task '_cleanup_incomplete_migrations'.

So if rollback_live_migration succeeds, we need to set the migration
status to 'failed', which will not be tracked in the resource tracker.
The 'failed' status is already used for resize to indicate a migration
that has finished its cleanup.

'_cleanup_incomplete_migrations' will also handle failed
rollback_live_migration cleanup, except for failed resize/revert-resize.

Besides, we introduce a new 'cleanup_lingering_instance_resources' virt
driver interface to handle lingering instance resources cleanup
including vpmem cleanup and whatever we add in the future.

Change-Id: I422a907056543f9bf95acbffdd2658438febf801
Partially-Implements: blueprint vpmem-enhancement
2020-09-10 05:30:39 +00:00
Stephen Finucane dc9c7a5ebf Move revert resize under semaphore
As discussed in change I26b050c402f5721fc490126e9becb643af9279b4, the
resource tracker's periodic task is reliant on the status of migrations
to determine whether to include usage from these migrations in the
total, and races between setting the migration status and decrementing
resource usage via 'drop_move_claim' can result in incorrect usage.
That change tackled the confirm resize operation. This one changes the
revert resize operation, and is a little trickier due to kinks in how
both the same-cell and cross-cell resize revert operations work.

For same-cell resize revert, the 'ComputeManager.revert_resize'
function, running on the destination host, sets the migration status to
'reverted' before dropping the move claim. This exposes the same race
that we previously saw with the confirm resize operation. It then calls
back to 'ComputeManager.finish_revert_resize' on the source host to boot
up the instance itself. This is kind of weird, because, even ignoring
the race, we're marking the migration as 'reverted' before we've done
any of the necessary work on the source host.

The cross-cell resize revert splits dropping of the move claim and
setting of the migration status between the source and destination host
tasks. Specifically, we do cleanup on the destination and drop the move
claim first, via 'ComputeManager.revert_snapshot_based_resize_at_dest'
before resuming the instance and setting the migration status on the
source via
'ComputeManager.finish_revert_snapshot_based_resize_at_source'. This
would appear to avoid the weird quirk of same-cell migration, however,
in typical weird cross-cell fashion, these are actually different
instances and different migration records.

The solution is once again to move the setting of the migration status
and the dropping of the claim under 'COMPUTE_RESOURCE_SEMAPHORE'. This
introduces the weird setting of migration status before completion to
the cross-cell resize case and perpetuates it in the same-cell case, but
this seems like a suitable compromise to avoid attempts to do things
like unplugging already unplugged PCI devices or unpinning already
unpinned CPUs. From an end-user perspective, instance state changes are
what really matter and once a revert is completed on the destination
host and the instance has been marked as having returned to the source
host, hard reboots can help us resolve any remaining issues.

Change-Id: I29d6f4a78c0206385a550967ce244794e71cef6d
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Closes-Bug: #1879878
2020-09-03 08:55:55 +00:00
Stephen Finucane a57800d382 Move confirm resize under semaphore
The 'ResourceTracker.update_available_resource' periodic task builds
usage information for the current host by inspecting instances and
in-progress migrations, combining the two. Specifically, it finds all
instances that are not in the 'DELETED' or 'SHELVED_OFFLOADED' state,
calculates the usage from these, then finds all in-progress migrations
for the host that don't have an associated instance (to prevent double
accounting) and includes the usage for these.

In addition to the periodic task, the 'ResourceTracker' class has a
number of helper functions to make or drop claims for the inventory
generated by the 'update_available_resource' periodic task as part of
the various instance operations. These helpers naturally assume that
when making a claim for a particular instance or migration, there
shouldn't already be resources allocated for same. Conversely, when
dropping claims, the resources should currently be allocated. However,
the check for *active* instances and *in-progress* migrations in the
periodic task means we have to be careful in how we make changes to a
given instance or migration record. Running the periodic task between
such an operation and an attempt to make or drop a claim can result in
TOCTOU-like races.

This generally isn't an issue: we use the 'COMPUTE_RESOURCE_SEMAPHORE'
semaphore to prevent the periodic task running while we're claiming
resources in helpers like 'ResourceTracker.instance_claim' and we make
our changes to the instances and migrations within this context. There
is one exception though: the 'drop_move_claim' helper. This function is
used when dropping a claim for either a cold migration, a resize or a
live migration, and will drop usage from either the source host (based
on the "old" flavor) for a resize confirm or the destination host (based
on the "new" flavor) for a resize revert or live migration rollback.
Unfortunately, while the function itself is wrapped in the semaphore, no
changes to the state of the instance or migration in question are
protected by it.

Consider the confirm resize case, which we're addressing here. If we
mark the migration as 'confirmed' before running 'drop_move_claim', then
the periodic task running between these steps will not account for the
usage on the source since the migration is allegedly 'confirmed'. The
call to 'drop_move_claim' will then result in the tracker dropping usage
that we're no longer accounting for. This "set migration status before
dropping usage" is the current behaviour for both same-cell and
cross-cell resize, via the 'ComputeManager.confirm_resize' and
'ComputeManager.confirm_snapshot_based_resize_at_source' functions,
respectively. We could reverse those calls and run 'drop_move_claim'
before marking the migration as 'confirmed', but while our usage will be
momentarily correct, the periodic task running between these steps will
re-add the usage we just dropped since the migration isn't yet
'confirmed'. The correct solution is to close this gap between setting
the migration status and dropping the move claim to zero. We do this by
putting both operations behind the 'COMPUTE_RESOURCE_SEMAPHORE', just
like the claim operations.

Change-Id: I26b050c402f5721fc490126e9becb643af9279b4
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Partial-Bug: #1879878
2020-09-03 08:55:47 +00:00
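
A hedged sketch of the fix: the migration status change and the move
claim drop now happen under the same lock the periodic task takes. The
lock name follows the commit text; the function and drop_move_claim
arguments are illustrative:

    from oslo_concurrency import lockutils

    COMPUTE_RESOURCE_SEMAPHORE = 'compute_resources'

    @lockutils.synchronized(COMPUTE_RESOURCE_SEMAPHORE)
    def confirm_resize_and_drop_claim(tracker, context, instance, migration):
        # The periodic task also takes this semaphore, so it can no
        # longer run between the status flip and the usage drop and
        # double-account the source host usage.
        migration.status = 'confirmed'
        migration.save()
        tracker.drop_move_claim(context, instance, migration.source_node,
                                instance.old_flavor)
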
Dustin Cowles 260713dc22 Provider Config File: Enable loading and merging of provider configs
This series implements the referenced blueprint to allow for specifying
custom resource provider traits and inventories via yaml config files.

This fourth commit adds the config option, release notes, documentation,
functional tests, and calls to the previously implemented functions in
order to load provider config files and merge them to the provider tree.

Change-Id: I59c5758c570acccb629f7010d3104e00d79976e4
Blueprint: provider-config-file
2020-08-26 23:18:53 +08:00
Dustin Cowles fc8deb4f86 Provider Config File: Functions to merge provider configs to provider tree
This series implements the referenced blueprint to allow for specifying
custom resource provider traits and inventories via yaml config files.

This third commit includes functions on the provider tree to merge
additional inventories and traits to resource providers and update
those providers on the provider tree. Those functions are not currently
being called, but will be in a future commit.

Co-Author: Tony Su <tao.su@intel.com>
Author: Dustin Cowles <dustin.cowles@intel.com>
Blueprint: provider-config-file
Change-Id: I142a1f24ff2219cf308578f0236259d183785cff
2020-08-26 04:51:03 +00:00
Stephen Finucane f203da3838 objects: Add MigrationTypeField
We use these things in many places in the code and it would be good to have
constants to reference. Do just that.

Note that this results in a change in the object hash. However, there
are no actual changes in the output object so that's okay.

Change-Id: If02567ce0a3431dda5b2bf6d398bbf7cc954eed0
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
2020-05-08 14:45:54 +01:00
LuyaoZhong 990a26ef1f partial support for live migration with specific resources
1. Claim allocations from placement first, then claim specific
   resources in the Resource Tracker on the destination to populate
   migration_context.new_resources
3. Clean up specific resources when live migration succeeds/fails

Because we store specific resources in migration_context during
live migration, to ensure cleanup happens correctly we can't drop
migration_context before cleanup is complete:
 a) in post live migration, we move source host cleanup before
    destination cleanup (post_live_migration_at_destination will
    apply migration_context and drop it)
 b) when rolling back live migration, we drop migration_context after
    rollback operations are complete

For different specific resources, we might need driver specific
support, such as for vpmem. This change just ensures that newly claimed
specific resources are populated to migration_context and that
migration_context is not dropped before cleanup is complete.

Change-Id: I44ad826f0edb39d770bb3201c675dff78154cbf3
Implements: blueprint support-live-migration-with-virtual-persistent-memory
2020-04-07 13:12:53 +00:00
Jason Anderson 1ed9f9dac5 Use fair locks in resource tracker
When the resource tracker has to lock a compute host for updates or
inspection, it uses a single semaphore. In most cases, this is fine, as
a compute process only is tracking one hypervisor. However, in Ironic, it's
possible for one compute process to track many hypervisors. In this
case, wait queues for instance claims can get "stuck" briefly behind
longer processing loops such as the update_resources periodic job. The
reason this is possible is because the oslo.lockutils synchronized
library does not use fair locks by default. When a lock is released, one
of the threads waiting for the lock is randomly allowed to take the lock
next. A fair lock ensures that the thread that next requested the lock
will be allowed to take it.

This should ensure that instance claim requests do not have a chance of
losing the lock contest, which should ensure that instance build
requests do not queue unnecessarily behind long-running tasks.

This includes bumping the oslo.concurrency dependency; fair locks were
added in 3.29.0 (I37577becff4978bf643c65fa9bc2d78d342ea35a).

Change-Id: Ia5e521e0f0c7a78b5ace5de9f343e84d872553f9
Related-Bug: #1864122
2020-03-09 11:03:17 -05:00
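
A short sketch of opting into fair locking; fair=True is the real
oslo.concurrency option added in 3.29.0, while the decorated function
and tracker attributes are illustrative:

    from oslo_concurrency import lockutils

    COMPUTE_RESOURCE_SEMAPHORE = 'compute_resources'

    @lockutils.synchronized(COMPUTE_RESOURCE_SEMAPHORE, fair=True)
    def claim(tracker, nodename, requested):
        # With a fair lock, waiting claims acquire the semaphore in the
        # order they requested it, so a quick claim is not starved
        # behind a long update_available_resource pass over many
        # ironic nodes.
        tracker.usages[nodename] += requested
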
Balazs Gibizer 56f29b3e4a Remove extra instance.save() calls related to qos SRIOV ports
During creating or moving of an instance with qos SRIOV port the PCI
device claim on the destination compute needs to be restricted to select
PCI VFs from the same PF where the bandwidth for the qos port is
allocated from. This is achieved by updating the spec part of the
InstancePCIRequest with the device name of the PF by calling
update_pci_request_spec_with_allocated_interface_name(). Until now
such update of the instance object was directly persisted by the call.

During code review it came up that the instance.save() in the util is
not appropriate, as the caller has a lot more context to decide when to
persist the changes.

The original eager instance.save was introduced when support was added
to the server create flow. Now I realized that the need for such a save
was due to a mistake in the original ResourceTracker.instance_claim()
call that loads the InstancePCIRequest from the DB instead of using the
requests through the passed in instance object. By removing the extra
DB call the need for eagerly persisting the PCI spec update is also
removed. It turned out that both the server create code path and every
server move code path eventually persist the instance object, either at
the end of the claim process or, in the case of live migration, in the
post_live_migration_at_destination compute manager call. This means
that the code can now be simplified, especially the live migration
cases.

In the live migrate abort case we don't need to roll back the eagerly
persisted PCI change as now such change is only persisted at the end
of the migration but still we need to refresh pci_requests field of
the instance object during the rollback as that field might be stale,
containing dest host related PCI information.

Also in case of rescheduling during live migrate if the rescheduling
failed the PCI change needed to be rolled back to the source host by a
specific code. But now those change are never persisted until the
migration finishes so this rollback code can be removed too.

Change-Id: Ied8f96b4e67f79498519931cb6b35dad5288bbb8
blueprint: support-move-ops-with-qos-ports-ussuri
2020-02-03 11:41:38 +01:00
Matt Riedemann 26da4418a9 Deal with cross-cell resize in _remove_deleted_instances_allocations
When reverting a cross-cell resize, conductor will:

1. clean up the destination host
2. set instance.hidden=True and destroy the instance in the
   target cell database
3. finish the revert on the source host which will revert the
   allocations on the source host held by the migration record
   so the instance will hold those again and drop the allocations
   against the dest host which were held by the instance.

If the ResourceTracker.update_available_resource periodic task runs
between steps 2 and 3 it could see that the instance is deleted
from the target cell but there are still allocations held by it and
delete them. Step 3 is what handles deleting those allocations for
the destination node, so we want to leave it that way and take the
ResourceTracker out of the flow.

This change simply checks the instance.hidden value on the deleted
instance and if hidden=True, assumes the allocations will be cleaned
up elsewhere (finish_revert_snapshot_based_resize_at_source).

Ultimately this is probably not something we *have* to have since
finish_revert_snapshot_based_resize_at_source is going to drop the
destination node allocations anyway, but it is good to keep clear
which actor is doing what in this process.

Part of blueprint cross-cell-resize

Change-Id: Idb82b056c39fd167864cadd205d624cb87cbe9cb
2019-12-12 12:00:33 -05:00
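
A minimal sketch of the guard added to the allocation-cleanup path; the
function name mirrors the commit text but is illustrative:

    def should_clean_up_allocations(instance):
        """Decide whether the periodic should delete the allocations of
        a deleted instance."""
        if instance.hidden:
            # hidden=True marks the target-cell copy during a cross-cell
            # resize revert; finish_revert_snapshot_based_resize_at_source
            # owns dropping those allocations.
            return False
        return True
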
Zuul f01fbd8cf0 Merge "Always trait the compute node RP with COMPUTE_NODE" 2019-11-15 23:01:32 +00:00
Zuul 28963bd64c Merge "FUP for Ib62ac0b692eb92a2ed364ec9f486ded05def39ad" 2019-11-15 11:53:51 +00:00
Zuul aa21fe9c9c Merge "Delete _normalize_inventory_from_cn_obj" 2019-11-14 00:59:20 +00:00
Zuul 2dbe174278 Merge "Drop compat for non-update_provider_tree code paths" 2019-11-14 00:54:44 +00:00
Zuul 1c7a3d5908 Merge "Clear instance.launched_on when build fails" 2019-11-13 21:45:04 +00:00