This patch concerns the window during unshelve after the compute
manager has set the task_state to spawning, claimed resources for the
VM, and called driver.spawn(). So the instance is in vm_state
SHELVED_OFFLOADED with task_state spawning.
If at this point a new update_available_resource periodic job starts,
it collects all the instances assigned to the node to calculate
resource usage. However, the calculation assumed that a VM in
SHELVED_OFFLOADED state needs no resource allocation on the node
(presumably it is being removed from the node as it is offloaded)
and deleted the resource claim.
As a result, we ended up with the VM spawned successfully but having
lost the resource claim on the node.
This patch changes what we do in vm_state SHELVED_OFFLOADED, task_state
spawning. We no longer delete the resource claim in this state and
keep tracking the resource in stats.
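The decision described above can be sketched as a small predicate. This is an illustrative sketch, not Nova's actual code; the helper name and state constants are assumptions used only to show the corrected behavior:

```python
# Illustrative sketch: decide whether the update_available_resource
# periodic must keep an instance's resource claim. An instance that is
# SHELVED_OFFLOADED but mid-unshelve (task_state 'spawning') has already
# re-claimed resources, so its claim must be preserved.

SHELVED_OFFLOADED = 'shelved_offloaded'
SPAWNING = 'spawning'

def should_keep_claim(vm_state, task_state):
    """Return True if the periodic job must keep tracking this instance."""
    if vm_state == SHELVED_OFFLOADED:
        # Before the fix, any offloaded instance lost its claim here,
        # racing with an in-progress unshelve.
        return task_state == SPAWNING
    return True
```

The fix is the `task_state == SPAWNING` branch: previously the offloaded state alone was enough to drop the claim.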
Change-Id: I8c9944810c09d501a6d3f60f095d9817b756872d
Closes-Bug: #2025480
This migrates *existing* Instance records with incomplete compute_id
fields, where necessary, while updating node resources. This is done
separately from the earlier patch that handles new instances, just to
demonstrate in CI that we're still able to run in such a partially-
migrated state.
Related to blueprint compute-object-ids
Change-Id: Ie18342deeddb7f58564953e6b46ea0b0d7495595
This adds the compute_id field to the Instance object and adds
checks in save() and create() to make sure we no longer update the
node without also updating compute_id.
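The save()/create() consistency check might look roughly like the following. This is a hypothetical sketch; the change-tracking mechanism and field names are simplified stand-ins for the o.vo machinery:

```python
# Hypothetical sketch of the consistency check: refuse to persist a
# change to 'node' unless 'compute_id' was updated alongside it.

class Instance:
    def __init__(self, node=None, compute_id=None):
        self.node = node
        self.compute_id = compute_id
        self._changed = set()

    def update(self, **fields):
        for name, value in fields.items():
            setattr(self, name, value)
            self._changed.add(name)

    def save(self):
        # The guard added by this patch: node and compute_id must move
        # together, or the records drift out of sync.
        if 'node' in self._changed and 'compute_id' not in self._changed:
            raise ValueError('node updated without updating compute_id')
        self._changed.clear()
```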
Related to blueprint compute-object-ids
Change-Id: I0740a2e4e09a526da8565a18e6761b4dbdc4ec0b
This makes us store the compute_id of the destination node in the
Migration object. Since resize/cold-migration changes the node
affiliation of an instance *to* the destination node *from* the source
node, we need a positive record of the node id to be used. The
destination node sets this to its own node id when creating the
migration, and the source node uses it when the switchover happens.
Because the migration may be backleveled for an older node involved
in that process, and thus saved or passed without this field, this
adds a compatibility routine that falls back to looking up the node
by host/nodename.
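The compatibility routine can be sketched as follows. Field and function names here are illustrative assumptions, not Nova's exact API:

```python
# Illustrative sketch of the compatibility fallback: prefer the
# recorded destination compute id, and only fall back to a
# host/nodename lookup when an older node saved the migration
# without that field.

def resolve_dest_node_id(migration, lookup_by_host_node):
    """Return the destination node id, falling back to a name lookup."""
    node_id = migration.get('dest_compute_id')
    if node_id is not None:
        return node_id
    # Backlevel migration record: resolve by (host, nodename) as before.
    return lookup_by_host_node(migration['dest_host'],
                               migration['dest_node'])
```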
Related to blueprint compute-object-ids
Change-Id: I362a40403d1094be36412f5f7afba00da8af8301
The ComputeNode object already has a service_id field that we stopped
using a while ago. This moves us back to the point where we set it when
creating new ComputeNode records, and also migrates existing records
when they are loaded.
The resource tracker is created before the service record may have
been created, but it is updated afterwards in the pre_start_hook().
So this adds a way for us to pass the service_ref to the resource
tracker during that hook so that it is present before the first time
we update all of our ComputeNode records. It also makes sure to pass
the Service through from the actual Service manager instead of looking
it up again, to maintain the tight relationship and avoid any
name-based ambiguity.
Related to blueprint compute-object-ids
Change-Id: I5e060d674b6145c9797c2251a2822106fc6d4a71
The resource tracker will silently ignore attempts to claim resources
when the node requested is not managed by this host. The misleading
"self.disabled(nodename)" check will fail if the nodename is not known
to the resource tracker, causing us to bail early with a NopClaim.
That means we also skip additional setup like creating a migration
context for the instance, claiming resources in placement, and
handling PCI/NUMA things. This behavior is quite old, and clearly
doesn't make sense in a world with placement. The bulk of the test
changes here are due to the fact that a lot of tests were relying on
this silent ignoring of a mismatching node, because they were passing
node names that weren't even tracked.
This change makes us raise an error if this happens so that we can
actually catch it, and avoid silently continuing with no resource
claim.
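The shape of the behavior change can be sketched like this. The names (`ComputeHostNotFound`, `NopClaim`, `instance_claim`) echo the description above but are simplified stand-ins, not the real signatures:

```python
# Sketch of the change: instead of returning a no-op claim when the
# requested node is unknown to this tracker, raise so the caller
# notices the missing resource claim.

class ComputeHostNotFound(Exception):
    pass

class NopClaim:
    pass

def instance_claim(tracked_nodes, nodename, raise_on_unknown=True):
    if nodename not in tracked_nodes:
        if raise_on_unknown:
            # New behavior: fail loudly on a mismatching node.
            raise ComputeHostNotFound(nodename)
        return NopClaim()  # old, silently-ignoring behavior
    return 'real-claim'
```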
Change-Id: I416126ee5d10428c296fe618aa877cca0e8dffcf
We do run update_available_resource() synchronously during service
startup, but we only allow certain exceptions to abort startup. This
makes us abort for InvalidConfiguration, and makes the resource
tracker raise that for the case where the compute node create failed
due to a duplicate entry.
This also modifies the object to raise a nova-specific error for that
condition to avoid the compute node needing to import oslo_db stuff
just to be able to catch it.
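The two-layer error translation described above can be sketched as follows. The exception names are illustrative stand-ins for the oslo.db and nova exceptions:

```python
# Sketch of the error translation: the object layer catches the
# duplicate-entry DB error and re-raises a nova-specific exception,
# which service startup then treats as fatal InvalidConfiguration.

class DBDuplicateEntry(Exception):     # stand-in for oslo_db's error
    pass

class DuplicateRecord(Exception):      # stand-in for the nova exception
    pass

class InvalidConfiguration(Exception):
    pass

def create_compute_node(db_create):
    """Object layer: translate the DB error so callers need no oslo_db."""
    try:
        return db_create()
    except DBDuplicateEntry:
        raise DuplicateRecord('compute node already exists')

def startup_update_available_resource(db_create):
    """Startup path: abort instead of limping along."""
    try:
        return create_compute_node(db_create)
    except DuplicateRecord as exc:
        raise InvalidConfiguration(str(exc))
```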
Change-Id: I5de98e6fe52e45996bc2e1014fa8a09a2de53682
This makes the resource tracker look up and create ComputeNode objects
by uuid instead of nodename. For drivers like ironic that already
provide 'uuid' in the resources dict, we can use that. For those
that do not, we force the uuid to be the locally-persisted node
uuid, and use that to find/create the ComputeNode object.
A (happy) side-effect of this is that if we find a deleted compute
node object that matches that of our hypervisor, we undelete it
instead of re-creating one with a new uuid, which may clash with our
old one. This means we remove some of the special-casing of ironic
rebalance, although the tests for that still largely stay the same.
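The find-or-create-by-uuid logic, including the undelete side-effect, can be sketched like this (an illustrative model, not the real ComputeNode object code):

```python
# Illustrative sketch of find-or-create by uuid: a soft-deleted record
# matching the hypervisor's uuid is restored rather than shadowed by a
# new record with a fresh uuid that may clash with the old one.

def get_or_create_node(nodes_by_uuid, node_uuid, nodename):
    node = nodes_by_uuid.get(node_uuid)
    if node is None:
        node = {'uuid': node_uuid, 'name': nodename, 'deleted': False}
        nodes_by_uuid[node_uuid] = node
    elif node['deleted']:
        node['deleted'] = False  # undelete instead of re-creating
    return node
```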
Change-Id: I6a582a38c302fd1554a49abc38cfeda7c324d911
Id02e445c55fc956965b7d725f0260876d42422f2 added a special case in the
healing logic for same-host resize. Now that the scheduler also creates
allocations on the destination host during resize, we need to make sure
that the drop_move_claim code that runs during revert and confirm drops
the tracked migration from the resource tracker only after the healing
logic has run, as the migrations being confirmed / reverted still
affect PciDevices at this point.
blueprint: pci-device-tracking-in-placement
Change-Id: I6241965fe6c1cc1f2560fcce65d5e32ef308d502
Nova's PCI scheduling (and the PCI claim) works based on PCI device
pools to which similar available PCI devices are assigned. The PCI
devices are now represented in placement as RPs, and the allocation
candidates during scheduling and the allocation after scheduling
now contain PCI devices. This information needs to affect the PCI
scheduling and PCI claim. To be able to do that we need to map PCI
device pools to RPs. We achieve that here by first mapping
PciDevice objects to RPs during placement PCI inventory reporting,
then mapping pools to RPs based on the PCI devices assigned to the
pools.
Also, because the ResourceTracker._update_to_placement() call now
updates the PCI device pools, the sequence of events in the
ResourceTracker needed to change to:
1) run _update_to_placement()
2) copy the pools to the ComputeNode object
3) save the compute node to the DB
4) save the PCI tracker
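The re-ordered sequence can be sketched with a minimal model of the tracker. The method names and the fake tracker are assumptions used only to show the ordering:

```python
# Minimal sketch of the re-ordered sequence: placement update first
# (which may rewrite the PCI pools), then copying pools onto the node
# record, then persisting the node and the PCI tracker.

class FakeTracker:
    def __init__(self):
        self.pci_pools = []
        self.compute_node = {}
        self.calls = []

    def update_to_placement(self):
        self.calls.append('placement')
        self.pci_pools = ['pool-with-rp-mapping']  # pools gain RP info

    def save_compute_node(self):
        self.calls.append('save_node')

    def save_pci_tracker(self):
        self.calls.append('save_pci')

def update_resources(rt):
    rt.update_to_placement()                           # 1)
    rt.compute_node['pci_pools'] = list(rt.pci_pools)  # 2)
    rt.save_compute_node()                             # 3)
    rt.save_pci_tracker()                              # 4)
```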
blueprint: pci-device-tracking-in-placement
Change-Id: I9bb450ac235ab72ff0d8078635e7a11c04ff6c1e
PCI devices which are allocated to instances can be removed from the
[pci]device_spec configuration or can be removed from the hypervisor
directly. The existing PciTracker code handles these cases by keeping
the PciDevice in the nova DB as existing and allocated, and issuing a
warning in the logs during compute service startup that nova is in an
inconsistent state. Similar behavior is now added to the PCI placement
tracking code, so the PCI inventories and allocations in placement are
kept in such situations.
There is one case where we cannot simply accept the PCI device
reconfiguration by keeping the existing allocations and applying the
new config: when a PF that is configured and allocated is removed and
VFs from this PF are now configured in [pci]device_spec, and vice
versa, when VFs are removed and their parent PF is configured. In this
case keeping the existing inventory and allocations while adding the
new inventory to placement would result in a placement model where a
single PCI device provides both PF and VF inventories. This dependent
device configuration is not supported as it could lead to double
consumption. In such a situation the compute service will refuse to
start.
blueprint: pci-device-tracking-in-placement
Change-Id: Id130893de650cc2d38953cea7cf9f53af71ced93
Same-host resize needs special handling in the allocation healing
logic, as the PCI devices of both the source and the dest host are
visible to the healing code because PciDevice.instance_uuid points to
the healed instance in both cases.
blueprint: pci-device-tracking-in-placement
Change-Id: Id02e445c55fc956965b7d725f0260876d42422f2
During a normal update_available_resources run, if the local provider
tree cache is invalid (e.g. because the scheduler made an allocation,
bumping the generation of the RPs) and the virt driver tries to update
the inventory of an RP based on the cache, Placement will report a
conflict, the report client will invalidate the cache, and the retry
decorator on ResourceTracker._update_to_placement will re-drive the
update on top of the fresh RP data.
However, the same thing can happen during a reshape, but the retry
mechanism is missing in that code path, so the stale cache can cause
reshape failures.
This patch adds specific error handling in the reshape code path to
implement the same retry mechanism that exists for inventory updates.
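The retry-on-conflict pattern can be sketched generically. The exception and helper names are illustrative, not Nova's actual identifiers:

```python
# Sketch of the retry pattern added to the reshape path: on a
# generation conflict, invalidate the cache and re-drive once with
# fresh provider data; give up if the conflict persists.

class GenerationConflict(Exception):
    pass

def with_fresh_cache_retry(fn, invalidate_cache, retries=1):
    for attempt in range(retries + 1):
        try:
            return fn()
        except GenerationConflict:
            if attempt == retries:
                raise
            invalidate_cache()  # next attempt starts from fresh RP data
```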
blueprint: pci-device-tracking-in-placement
Change-Id: Ieb954a04e6aba827611765f7f401124a1fe298f3
A new PCI resource handler is added to the update_available_resources
code path to update the ProviderTree with PCI device RPs, inventories
and traits.
It is a bit different from the other placement inventory reporters. It
does not run at the virt driver level, as PCI is tracked in a generic
way by the PCI tracker in the resource tracker, so the virt-specific
information is already parsed and abstracted by the resource tracker.
Another difference is that to support rolling upgrades the PCI handler
code needs to be prepared for situations where the scheduler does not
create PCI allocations even after some of the computes have already
started reporting inventories and healing PCI allocations. So the code
is prepared not for a single, one-shot reshape at startup, but instead
for continuous healing of the allocations. We can remove this
continuous healing after the PCI prefilter is made mandatory in a
future release.
The whole PCI placement reporting behavior is disabled by default while
it is incomplete. When it is functionally complete, a new
[pci]report_in_placement config option will be added to allow enabling
the feature. This config is intentionally not added by this patch as we
don't want to allow enabling this logic yet.
blueprint: pci-device-tracking-in-placement
Change-Id: If975c3ec09ffa95f647eb4419874aa8417a59721
We have many places where we implement singleton behavior for the
placement client. This unifies them into a single place and
implementation. Not only does this DRY things up, but may cause us
to initialize it fewer times and also allows for emitting a common
set of error messages about expected failures for better
troubleshooting.
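The unified singleton accessor can be sketched as a lazily-constructed module-level instance. The function name and factory indirection are assumptions; the pattern is the point:

```python
# Sketch of a single, shared placement client: constructed once on
# first use, returned on every subsequent call. The double-checked
# lock keeps concurrent first calls from creating two clients.

import threading

_client = None
_lock = threading.Lock()

def get_placement_client(factory):
    """Return the shared client, constructing it once via factory()."""
    global _client
    if _client is None:
        with _lock:
            if _client is None:
                _client = factory()
    return _client
```

Centralizing construction like this is also the natural place to emit the common error messages about expected failures mentioned above.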
Change-Id: Iab8a791f64323f996e1d6e6d5a7e7a7c34eb4fb3
Related-Bug: #1846820
`binding:profile` updates are handled differently for migration than
for instance creation, which was not taken into account previously.
Relevant fields (card_serial_number, pf_mac_address, vf_num) are now
added to the `binding:profile` after a new remote-managed PCI device is
determined at the destination node.
Likewise, there is special handling for the unshelve operation, which
is fixed too.
Func testing:
* Allow the generated device XML to contain the PCI VPD capability;
* Add test cases for basic operations on instances with remote-managed
ports (tunnel or physical);
* Add a live migration test case similar to how it is done for
non-remote-managed SR-IOV ports but taking remote-managed port related
specifics into account;
* Add evacuate, shelve/unshelve, cold migration test cases.
Change-Id: I9a1532e9a98f89db69b9ae3b41b06318a43519b3
Add a pre-filter for requests that contain VNIC_TYPE_REMOTE_MANAGED
ports in them: hosts that do not have either the relevant compute
driver capability COMPUTE_REMOTE_MANAGED_PORTS or PCI device pools
with "remote_managed" devices are filtered out early. Presence of
devices actually available for allocation is checked at a later
point by the PciPassthroughFilter.
Change-Id: I168d3ccc914f25a3d4255c9b319ee6b91a2f66e2
Implements: blueprint integration-with-off-path-network-backends
There is a race condition between an incoming resize and an
update_available_resource periodic in the resource tracker. The race
window starts when the resize_instance RPC finishes and ends when the
finish_resize compute RPC finally applies the migration context on the
instance.
If the update_available_resource periodic runs on the destination node
in this window, it will see the instance as being tracked on this host,
as instance.node already points to the dest, but instance.numa_topology
still points to the source host topology since the migration context is
not applied yet. This leads to a CPU pinning error if the source
topology does not fit the dest topology. It also stops the periodic
task and leaves the tracker in an inconsistent state. The inconsistent
state is only cleaned up after the periodic runs outside the race
window.
This patch applies the migration context temporarily to the specific
instances during the periodic to keep resource accounting correct.
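"Applying the migration context temporarily" can be sketched as a context manager that swaps in the migration's topology for the duration of the accounting and restores the original afterwards. The field names are illustrative simplifications:

```python
# Sketch: temporarily apply a migration context so the periodic
# accounting sees the destination NUMA topology, then restore the
# original view of the instance.

from contextlib import contextmanager

@contextmanager
def applied_migration_context(instance, migration_context):
    original = instance['numa_topology']
    instance['numa_topology'] = migration_context['new_numa_topology']
    try:
        yield instance
    finally:
        instance['numa_topology'] = original
```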
Change-Id: Icaad155e22c9e2d86e464a0deb741c73f0dfb28a
Closes-Bug: #1953359
Closes-Bug: #1952915
This adds a force kwarg to delete_allocation_for_instance which
defaults to True because that was found to be the most common use case
by a significant margin during implementation of this patch.
In most cases, this method is called when we want to delete the
allocations because they should be gone, e.g. server delete, failed
build, or shelve offload. The alternative in these cases is the caller
could trap the conflict error and retry but we might as well just force
the delete in that case (it's cleaner).
When force=True, it will DELETE the consumer allocations rather than
GET and PUT with an empty allocations dict and the consumer generation
which can result in a 409 conflict from Placement. For example, bug
1836754 shows that in one tempest test that creates a server and then
immediately deletes it, we can hit a very tight window where the method
GETs the allocations and before it PUTs the empty allocations to remove
them, something changes which results in a conflict and the server
delete fails with a 409 error.
It's worth noting that delete_allocation_for_instance used to just
DELETE the allocations before Stein [1] when we started taking consumer
generations into account. There was also a related mailing list thread
[2].
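The force semantics can be sketched against a minimal client model. The HTTP interaction is modeled as plain callables; URLs and the client class are stand-ins, not the real report client API:

```python
# Sketch of the force kwarg: force=True issues a single DELETE, while
# force=False does the GET + PUT-empty-allocations dance that can 409
# if the consumer generation changes in between.

def delete_allocation_for_instance(client, consumer_id, force=True):
    url = '/allocations/%s' % consumer_id
    if force:
        client.delete(url)  # unconditional removal, no generation check
        return
    alloc = client.get(url)
    client.put(url, {'allocations': {},
                     'consumer_generation': alloc['consumer_generation']})

class FakeClient:
    """Records requests instead of talking to Placement."""
    def __init__(self):
        self.requests = []
    def delete(self, url):
        self.requests.append(('DELETE', url))
    def get(self, url):
        self.requests.append(('GET', url))
        return {'consumer_generation': 3}
    def put(self, url, body):
        self.requests.append(('PUT', url))
```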
Closes-Bug: #1836754
[1] I77f34788dd7ab8fdf60d668a4f76452e03cf9888
[2] http://lists.openstack.org/pipermail/openstack-dev/2018-August/133374.html
Change-Id: Ife3c7a5a95c5d707983ab33fd2fbfc1cfb72f676
There is a race condition in nova-compute with the ironic virt driver
as nodes get rebalanced. It can lead to compute nodes being removed in
the DB and not repopulated. Ultimately this prevents these nodes from
being scheduled to.
The issue being addressed here is that if a compute node is deleted by a
host which thinks it is an orphan, then the resource provider for that
node might also be deleted. The compute host that owns the node might
not recreate the resource provider if it exists in the provider tree
cache.
This change fixes the issue by clearing resource providers from the
provider tree cache for which a compute node entry does not exist. Then,
when the available resource for the node is updated, the resource
providers are not found in the cache and get recreated in placement.
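The cache-pruning step can be sketched as follows (an illustrative model of the provider tree cache, not the real report client code):

```python
# Illustrative sketch of the fix: drop any cached resource provider
# whose uuid no longer matches a known compute node, so the next
# update_available_resource recreates it in placement instead of
# trusting the stale cache.

def prune_provider_cache(provider_cache, compute_node_uuids):
    for uuid in list(provider_cache):
        if uuid not in compute_node_uuids:
            del provider_cache[uuid]
    return provider_cache
```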
Change-Id: Ia53ff43e6964963cdf295604ba0fb7171389606e
Related-Bug: #1853009
Related-Bug: #1841481
There is a race condition in nova-compute with the ironic virt driver as
nodes get rebalanced. It can lead to compute nodes being removed in the
DB and not repopulated. Ultimately this prevents these nodes from being
scheduled to.
The issue being addressed here is that if a compute node is deleted by a host
which thinks it is an orphan, then the compute host that actually owns the node
might not recreate it if the node is already in its resource tracker cache.
This change fixes the issue by clearing nodes from the resource tracker cache
for which a compute node entry does not exist. Then, when the available
resource for the node is updated, the compute node object is not found in the
cache and gets recreated.
Change-Id: I39241223b447fcc671161c370dbf16e1773b684a
Partial-Bug: #1853009
This continues on from I81fec10535034f3a81d46713a6eda813f90561cf and
removes all other references to 'instance_type' where it's possible to
do so. The only things left are DB columns, o.vo fields, some
unversioned objects, and RPC API methods. If we want to remove these, we
can but it's a lot more work.
Change-Id: I264d6df1809d7283415e69a66a9153829b8df537
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
To implement the `socket` PCI NUMA affinity policy, we'll need to
track the host NUMA topology in the PCI stats code. To achieve this,
PCI stats will need to know the compute node it's running on. Prepare
for this by replacing the node_id parameter with compute_node. node_id
was previously optional, but that looks to have been only to
facilitate testing, as that's the only place where it was not passed.
We use compute_node (instead of just making node_id mandatory)
because it allows for an optimization later on wherein the PCI manager
does not need to pull the ComputeNode object from the database
needlessly.
Implements: blueprint pci-socket-affinity
Change-Id: Idc839312d1449e9327ee7e3793d53ed080a44d0c
NUMA-aware live migration and SRIOV live migration were implemented as
two separate features. As a consequence, the case where both SRIOV and
NUMA are present in the instance was missed. When the PCI device is
claimed on the destination host, the NUMA topology of the instance
needs to be passed to the claim call.
Change-Id: If469762b22d687151198468f0291821cebdf26b2
Closes-Bug: #1893221
The _update_available_resources periodic makes resource allocation
adjustments while holding the COMPUTE_RESOURCE_SEMAPHORE, based on the
list of instances assigned to the host of the resource tracker and
on the migrations where the source or the target host is the host
of the resource tracker. So if the instance.host or the migration
context changes without holding the COMPUTE_RESOURCE_SEMAPHORE while
the _update_available_resources task is running, there will be data
inconsistency in the resource tracker.
This patch makes sure that during evacuation the instance.host and the
migration context are changed while holding the semaphore.
Change-Id: Ica180165184b319651d22fe77e076af036228860
Closes-Bug: #1896463
Another pretty trivial one. This one was intended to provide an overview
of instances that weren't properly tracked but were running on the host.
It was only ever implemented for the XenAPI driver so remove it now.
Change-Id: Icaba3fc89e3295200e3d165722a5c24ee070002c
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
For attach:
* Generates InstancePciRequest for SRIOV interfaces attach requests
* Claims and allocates a PciDevice for such request
For detach:
* Frees PciDevice and deletes the InstancePciRequests
On the libvirt driver side the following small fixes were necessary:
* Fixes PCI address generation to avoid double 0x prefixes in LibvirtConfigGuestHostdevPCI
* Adds support for comparing LibvirtConfigGuestHostdevPCI objects
* Extends the comparison of LibvirtConfigGuestInterface to support
macvtap interfaces where target_dev is only known by libvirt but not
nova
* Generalizes guest.get_interface_by_cfg() to work with both
LibvirtConfigGuest[Interface|HostdevPCI] objects
Implements: blueprint sriov-interface-attach-detach
Change-Id: I67504a37b0fe2ae5da3cba2f3122d9d0e18b9481
If rollback_live_migration failed, the migration status is set to
'error', and there might be some resources not yet cleaned up, like
vpmem, since the rollback is not complete. So we propose to track those
'error' migrations in the resource tracker until they are cleaned up by
the periodic task '_cleanup_incomplete_migrations'.
So if rollback_live_migration succeeds, we need to set the migration
status to 'failed', which will not be tracked in the resource tracker.
The 'failed' status is already used for resize to indicate a migration
finishing the cleanup.
'_cleanup_incomplete_migrations' will also handle failed
rollback_live_migration cleanup, except for failed resize/revert-resize.
Besides, we introduce a new 'cleanup_lingering_instance_resources' virt
driver interface to handle lingering instance resource cleanup,
including vpmem cleanup and whatever we add in the future.
Change-Id: I422a907056543f9bf95acbffdd2658438febf801
Partially-Implements: blueprint vpmem-enhancement
As discussed in change I26b050c402f5721fc490126e9becb643af9279b4, the
resource tracker's periodic task is reliant on the status of migrations
to determine whether to include usage from these migrations in the
total, and races between setting the migration status and decrementing
resource usage via 'drop_move_claim' can result in incorrect usage.
That change tackled the confirm resize operation. This one changes the
revert resize operation, and is a little trickier due to kinks in how
both the same-cell and cross-cell resize revert operations work.
For same-cell resize revert, the 'ComputeManager.revert_resize'
function, running on the destination host, sets the migration status to
'reverted' before dropping the move claim. This exposes the same race
that we previously saw with the confirm resize operation. It then calls
back to 'ComputeManager.finish_revert_resize' on the source host to boot
up the instance itself. This is kind of weird, because, even ignoring
the race, we're marking the migration as 'reverted' before we've done
any of the necessary work on the source host.
The cross-cell resize revert splits dropping of the move claim and
setting of the migration status between the source and destination host
tasks. Specifically, we do cleanup on the destination and drop the move
claim first, via 'ComputeManager.revert_snapshot_based_resize_at_dest'
before resuming the instance and setting the migration status on the
source via
'ComputeManager.finish_revert_snapshot_based_resize_at_source'. This
would appear to avoid the weird quirk of same-cell migration, however,
in typical weird cross-cell fashion, these are actually different
instances and different migration records.
The solution is once again to move the setting of the migration status
and the dropping of the claim under 'COMPUTE_RESOURCE_SEMAPHORE'. This
introduces the weird setting of migration status before completion to
the cross-cell resize case and perpetuates it in the same-cell case, but
this seems like a suitable compromise to avoid attempts to do things
like unplugging already unplugged PCI devices or unpinning already
unpinned CPUs. From an end-user perspective, instance state changes are
what really matter and once a revert is completed on the destination
host and the instance has been marked as having returned to the source
host, hard reboots can help us resolve any remaining issues.
Change-Id: I29d6f4a78c0206385a550967ce244794e71cef6d
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Closes-Bug: #1879878
The 'ResourceTracker.update_available_resource' periodic task builds
usage information for the current host by inspecting instances and
in-progress migrations, combining the two. Specifically, it finds all
instances that are not in the 'DELETED' or 'SHELVED_OFFLOADED' state,
calculates the usage from these, then finds all in-progress migrations
for the host that don't have an associated instance (to prevent double
accounting) and includes the usage for these.
In addition to the periodic task, the 'ResourceTracker' class has a
number of helper functions to make or drop claims for the inventory
generated by the 'update_available_resource' periodic task as part of
the various instance operations. These helpers naturally assume that
when making a claim for a particular instance or migration, there
shouldn't already be resources allocated for same. Conversely, when
dropping claims, the resources should currently be allocated. However,
the check for *active* instances and *in-progress* migrations in the
periodic task means we have to be careful in how we make changes to a
given instance or migration record. Running the periodic task between
such an operation and an attempt to make or drop a claim can result in
TOCTOU-like races.
This generally isn't an issue: we use the 'COMPUTE_RESOURCE_SEMAPHORE'
semaphore to prevent the periodic task running while we're claiming
resources in helpers like 'ResourceTracker.instance_claim' and we make
our changes to the instances and migrations within this context. There
is one exception though: the 'drop_move_claim' helper. This function is
used when dropping a claim for either a cold migration, a resize or a
live migration, and will drop usage from either the source host (based
on the "old" flavor) for a resize confirm or the destination host (based
on the "new" flavor) for a resize revert or live migration rollback.
Unfortunately, while the function itself is wrapped in the semaphore,
no changes to the state of the instance or migration in question are
protected by it.
Consider the confirm resize case, which we're addressing here. If we
mark the migration as 'confirmed' before running 'drop_move_claim', then
the periodic task running between these steps will not account for the
usage on the source since the migration is allegedly 'confirmed'. The
call to 'drop_move_claim' will then result in the tracker dropping usage
that we're no longer accounting for. This "set migration status before
dropping usage" is the current behaviour for both same-cell and
cross-cell resize, via the 'ComputeManager.confirm_resize' and
'ComputeManager.confirm_snapshot_based_resize_at_source' functions,
respectively. We could reverse those calls and run 'drop_move_claim'
before marking the migration as 'confirmed', but while our usage will be
momentarily correct, the periodic task running between these steps will
re-add the usage we just dropped since the migration isn't yet
'confirmed'. The correct solution is to close this gap between setting
the migration status and dropping the move claim to zero. We do this by
putting both operations behind the 'COMPUTE_RESOURCE_SEMAPHORE', just
like the claim operations.
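The resulting ordering can be sketched minimally. The semaphore is modeled with a plain lock and the names are simplified stand-ins for the real manager code:

```python
# Sketch of the fix: both the migration-status update and the usage
# drop happen while holding the same semaphore the periodic task
# takes, closing the window in which the periodic could observe a
# 'confirmed' migration whose usage has not been dropped yet.

import threading

COMPUTE_RESOURCE_SEMAPHORE = threading.Lock()

def confirm_resize(migration, drop_move_claim):
    with COMPUTE_RESOURCE_SEMAPHORE:
        migration['status'] = 'confirmed'
        drop_move_claim(migration)
```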
Change-Id: I26b050c402f5721fc490126e9becb643af9279b4
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Partial-Bug: #1879878
This series implements the referenced blueprint to allow for specifying
custom resource provider traits and inventories via yaml config files.
This fourth commit adds the config option, release notes, documentation,
functional tests, and calls to the previously implemented functions in
order to load provider config files and merge them to the provider tree.
Change-Id: I59c5758c570acccb629f7010d3104e00d79976e4
Blueprint: provider-config-file
This series implements the referenced blueprint to allow for specifying
custom resource provider traits and inventories via yaml config files.
This third commit includes functions on the provider tree to merge
additional inventories and traits to resource providers and update
those providers on the provider tree. Those functions are not currently
being called, but will be in a future commit.
Co-Author: Tony Su <tao.su@intel.com>
Author: Dustin Cowles <dustin.cowles@intel.com>
Blueprint: provider-config-file
Change-Id: I142a1f24ff2219cf308578f0236259d183785cff
We use these things many places in the code and it would be good to have
constants to reference. Do just that.
Note that this results in a change in the object hash. However, there
are no actual changes in the output object so that's okay.
Change-Id: If02567ce0a3431dda5b2bf6d398bbf7cc954eed0
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
1. Claim allocations from placement first, then claim specific
resources in the Resource Tracker on the destination to populate
migration_context.new_resources
3. Clean up specific resources when live migration succeeds/fails
Because we store specific resources in migration_context during
live migration, to ensure correct cleanup we can't drop the
migration_context before cleanup is complete:
a) in post live migration, we move the source host cleanup before
the destination cleanup (post_live_migration_at_destination will
apply the migration_context and drop it)
b) when rolling back live migration, we drop the migration_context
after the rollback operations are complete
For different specific resources we might need driver-specific
support, such as for vpmem. This change just ensures that newly
claimed specific resources are populated to migration_context and that
migration_context is not dropped before cleanup is complete.
Change-Id: I44ad826f0edb39d770bb3201c675dff78154cbf3
Implements: blueprint support-live-migration-with-virtual-persistent-memory
When the resource tracker has to lock a compute host for updates or
inspection, it uses a single semaphore. In most cases, this is fine, as
a compute process only is tracking one hypervisor. However, in Ironic, it's
possible for one compute process to track many hypervisors. In this
case, wait queues for instance claims can get "stuck" briefly behind
longer processing loops such as the update_resources periodic job. The
reason this is possible is because the oslo.lockutils synchronized
library does not use fair locks by default. When a lock is released, one
of the threads waiting for the lock is randomly allowed to take the lock
next. A fair lock ensures that the thread that next requested the lock
will be allowed to take it.
This should ensure that instance claim requests do not have a chance of
losing the lock contest, which should ensure that instance build
requests do not queue unnecessarily behind long-running tasks.
This includes bumping the oslo.concurrency dependency; fair locks were
added in 3.29.0 (I37577becff4978bf643c65fa9bc2d78d342ea35a).
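The FIFO property that fair locks provide can be illustrated with a toy queue model. This is not the oslo.concurrency implementation (which is enabled simply by passing fair=True to lockutils.synchronized); it only demonstrates the ordering guarantee described above:

```python
# Toy FIFO model of a fair lock: waiters are served strictly in
# arrival order, never picked at random the way an unfair lock's
# wait queue can behave.

from collections import deque

class FairLockModel:
    def __init__(self):
        self.waiters = deque()

    def request(self, name):
        """A thread joins the wait queue."""
        self.waiters.append(name)

    def release_next(self):
        """On release, the longest-waiting requester wins."""
        return self.waiters.popleft()
```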
Change-Id: Ia5e521e0f0c7a78b5ace5de9f343e84d872553f9
Related-Bug: #1864122
During creation or moving of an instance with a qos SRIOV port, the PCI
device claim on the destination compute needs to be restricted to
select PCI VFs from the same PF the bandwidth for the qos port is
allocated from. This is achieved by updating the spec part of the
InstancePCIRequest with the device name of the PF by calling
update_pci_request_spec_with_allocated_interface_name(). Until now
such an update of the instance object was directly persisted by that
call.
During code review it came up that the instance.save() in the util
is not appropriate, as the caller has a lot more context to decide when
to persist the changes.
The original eager instance.save was introduced when support was added
to the server create flow. Now I realized that the need for such a save
was due to a mistake in the original ResourceTracker.instance_claim()
call that loads the InstancePCIRequest from the DB instead of using the
requests through the passed-in instance object. By removing the extra
DB call, the need for eagerly persisting the PCI spec update is also
removed. It turned out that both the server create code path and every
server move code path eventually persist the instance object, either
at the end of the claim process or, in the case of live migration, in
the post_live_migration_at_destination compute manager call. This means
that the code can now be simplified, especially the live migration
cases.
In the live migrate abort case we don't need to roll back the eagerly
persisted PCI change, as now such a change is only persisted at the end
of the migration, but we still need to refresh the pci_requests field
of the instance object during the rollback as that field might be
stale, containing dest host related PCI information.
Also, in the case of rescheduling during live migrate, if the
rescheduling failed, the PCI change needed to be rolled back to the
source host by specific code. But now those changes are never persisted
until the migration finishes, so this rollback code can be removed too.
Change-Id: Ied8f96b4e67f79498519931cb6b35dad5288bbb8
blueprint: support-move-ops-with-qos-ports-ussuri
When reverting a cross-cell resize, conductor will:
1. clean up the destination host
2. set instance.hidden=True and destroy the instance in the
target cell database
3. finish the revert on the source host which will revert the
allocations on the source host held by the migration record
so the instance will hold those again and drop the allocations
against the dest host which were held by the instance.
If the ResourceTracker.update_available_resource periodic task runs
between steps 2 and 3 it could see that the instance is deleted
from the target cell but there are still allocations held by it and
delete them. Step 3 is what handles deleting those allocations for
the destination node, so we want to leave it that way and take the
ResourceTracker out of the flow.
This change simply checks the instance.hidden value on the deleted
instance and, if hidden=True, assumes the allocations will be cleaned
up elsewhere (finish_revert_snapshot_based_resize_at_source).
Ultimately this is probably not something we *have* to have since
finish_revert_snapshot_based_resize_at_source is going to drop the
destination node allocations anyway, but it is good to keep clear
which actor is doing what in this process.
Part of blueprint cross-cell-resize
Change-Id: Idb82b056c39fd167864cadd205d624cb87cbe9cb