This change adds the pre-commit config and
tox targets to run codespell both independently
and via the pep8 target.
This change corrects all the final typos in the
codebase as detected by codespell.
Change-Id: Ic4fb5b3a5559bc3c43aca0a39edc0885da58eaa2
This is the initial patch of applying codespell to nova.
codespell is a programming-focused spellchecker that
looks for common typos and corrects them.
I am breaking this into multiple commits to make it simpler
to read and will automate the execution of codespell
at the end of the series.
Change-Id: If24a6c0a890f713545faa2d44b069c352655274e
When [quota]count_usage_from_placement = true or
[quota]driver = nova.quota.UnifiedLimitsDriver, cores and ram quota
usage are counted from placement. When an instance is SHELVED_OFFLOADED,
it will not have allocations in placement, so its cores and ram should
not count against quota during that time.
This means however that when an instance is unshelved, there is a
possibility of going over quota if the cores and ram it needs were
allocated by some other instance(s) while it was SHELVED_OFFLOADED.
This fixes a bug where quota was not being properly enforced during
unshelve of a SHELVED_OFFLOADED instance when quota usage is counted
from placement. Test coverage is also added for the "recheck" quota
cases.
Closes-Bug: #2003991
Change-Id: I4ab97626c10052c7af9934a80ff8db9ddab82738
Related to bp/allowing-target-state-for-evacuate. This change
extends the compute API to accept a new argument, targetState.
The targetState argument, when set, will force the state of an
evacuated instance on the destination host.
Signed-off-by: Sahid Orentino Ferdjaoui <sahid.ferdjaoui@industrialdiscipline.com>
Change-Id: I9660d42937ad62d647afc6be965f166cc5631392
This patch adds support for cold migrate, and resize with PCI
devices when the placement tracking is enabled.
Same host resize, evacuate and unshelve will be supported by subsequent
patches. Live migration was not supported with flavor-based PCI requests
before, so it won't be supported now either.
blueprint: pci-device-tracking-in-placement
Change-Id: I8eec331ab3c30e5958ed19c173eff9998c1f41b0
After the scheduler has selected a target host and allocated an allocation
candidate that passed the filters, nova needs to make sure that the PCI
claim will allocate the real PCI devices from the RP which is allocated
in placement. Placement returns the request group - provider mapping for
each allocation candidate so nova can map which InstancePCIRequest was
fulfilled from which RP in the selected allocation candidate. This
mapping is then recorded in the InstancePCIRequest object and used
during the PCI claim to filter for PCI pools that can be used to claim
PCI devices from.
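A minimal sketch of the mapping step described above; the helper name and
dict shapes here are illustrative stand-ins, not nova's actual objects:

```python
# Hypothetical sketch: record which resource provider (RP) fulfilled
# each PCI request, so the PCI claim can later filter its device pools
# to devices belonging to those RPs.

def map_requests_to_providers(pci_requests, provider_mappings):
    """Attach RP uuids to each PCI request.

    provider_mappings is the request-group -> [RP uuid, ...] mapping
    that placement returns for the selected allocation candidate.
    """
    for request in pci_requests:
        # An unmapped request gets an empty list of providers.
        request['provider_uuids'] = provider_mappings.get(
            request['request_id'], [])
    return pci_requests
```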
blueprint: pci-device-tracking-in-placement
Change-Id: I18bb31e23cc014411db68c31317ed983886d1a8e
This patch adds support for passing the ``reimage_boot_volume``
flag from the API layer through the conductor layer to the
compute layer and also includes an RPC version bump as necessary.
Related blueprint volume-backed-server-rebuild
Change-Id: I8daf177eb67d08112a16fe788910644abf338fa6
This patch adds the plumbing for rebuilding a volume backed
instance in compute code. This functionality will be enabled
in a subsequent patch which adds a new microversion and the
external support for requesting it.
The flow of the operation is as follows:
1) Create an empty attachment
2) Detach the volume
3) Request cinder to reimage the volume
4) Wait for cinder to notify success to nova (via external events)
5) Update and complete the attachment
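The five steps above can be sketched as follows; the FakeCinder class and
helper names are hypothetical stand-ins for the real nova/cinder calls:

```python
# Illustrative sketch of the volume-backed rebuild flow. FakeCinder
# only logs the operations so the ordering is visible; it is not the
# real cinder volume API.

class FakeCinder:
    def __init__(self):
        self.log = []

    def attachment_create(self, volume_id):
        self.log.append('create_attachment')
        return {'id': 'attach-new'}

    def attachment_delete(self, attachment_id):
        self.log.append('detach')

    def reimage(self, volume_id, image_id):
        self.log.append('reimage')

    def attachment_complete(self, attachment_id):
        self.log.append('complete')


def rebuild_volume_backed(cinder, volume_id, old_attachment_id, image_id,
                          wait_for_reimage_event):
    # 1) Create an empty attachment to keep the volume reserved.
    new_attachment = cinder.attachment_create(volume_id)
    # 2) Detach the volume from the instance.
    cinder.attachment_delete(old_attachment_id)
    # 3) Request cinder to reimage the volume.
    cinder.reimage(volume_id, image_id)
    # 4) Wait for cinder to signal success (via external events in nova).
    wait_for_reimage_event()
    # 5) Update and complete the new attachment.
    cinder.attachment_complete(new_attachment['id'])
    return new_attachment
```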
Related blueprint volume-backed-server-rebuild
Change-Id: I0d889691de1af6875603a9f0f174590229e7be18
Conductor creates a placement client for the potential case where
it needs to make a call for certain operations. A transient network
or keystone failure will currently cause it to abort startup, which
means it is not available for other unrelated activities, such as
DB proxying for compute.
This makes conductor test the placement client on startup, but only
abort startup on errors that are highly likely to be permanent
configuration errors, and only warn about things like being unable
to contact keystone/placement during initialization. If a non-fatal
error is encountered at startup, later operations needing the
placement client will retry initialization.
Closes-Bug: #1846820
Change-Id: Idb7fcbce0c9562e7b9bd3e80f2a6d4b9bc286830
We have many places where we implement singleton behavior for the
placement client. This unifies them into a single place and
implementation. Not only does this DRY things up, but may cause us
to initialize it fewer times and also allows for emitting a common
set of error messages about expected failures for better
troubleshooting.
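The unification amounts to a process-wide singleton accessor; a minimal
sketch under assumed names (not nova's actual module layout):

```python
import threading

# Hypothetical singleton accessor: every caller goes through one
# function, so the client is built at most once per process and any
# initialization errors are reported from a single place.
_CLIENT = None
_LOCK = threading.Lock()


class PlacementClient:
    """Stand-in for the real placement client."""


def get_placement_client():
    """Return the shared client, creating it on first use."""
    global _CLIENT
    if _CLIENT is None:
        with _LOCK:
            # Re-check under the lock in case another thread won the race.
            if _CLIENT is None:
                _CLIENT = PlacementClient()
    return _CLIENT
```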
Change-Id: Iab8a791f64323f996e1d6e6d5a7e7a7c34eb4fb3
Related-Bug: #1846820
The nova.utils.spawn and spawn_n methods transport
the context (and profiling information) to the
newly created threads. But the same isn't done
when submitting work to thread-pools in the
ComputeManager.
The code doing that for spawn and spawn_n
is extracted to a new function
and called to submit the work to the thread-pools.
Closes-Bug: #1962574
Change-Id: I9085deaa8cf0b167d87db68e4afc4a463c00569c
When turned on, this will disable the version-checking of hypervisors
during live-migration. This can be useful for operators in certain
scenarios when upgrading. E.g. if you want to relocate all instances
off a compute node due to an emergency hardware issue, and you only have
another old compute node ready at the time.
Note, though: libvirt will do its own internal compatibility checks, and
might still reject live migration if the destination is incompatible.
Closes-Bug: #1982853
Change-Id: Iec387dcbc49ddb91ebf5cfd188224eaf6021c0e1
Signed-off-by: Kashyap Chamarthy <kchamart@redhat.com>
This patch introduces changes to the compute API that will allow a
PROJECT_ADMIN to unshelve a shelved offloaded server to a specific host.
This patch also supports the ability to unpin the availability_zone of an
instance that is bound to it.
Implements: blueprint unshelve-to-host
Change-Id: Ieb4766fdd88c469574fad823e05fe401537cdc30
This change removes return statements from RPC cast method calls.
As RPC casts are asynchronous, they don't return anything.
Change-Id: I766f64f2c086dd652bc28b338320cc94ccc48f1f
We now enforce limits on resources requested in the flavor.
This includes: instances, ram, cores. It also works for any resource
class being requested via the flavor chosen, such as custom resource
classes relating to Ironic resources.
Note that because disk resources can be limited, we need to know whether
the instance is booted from a volume. This has meant adding extra code to
make sure we know that when enforcing the limits.
Follow-on patches will update the APIs to accurately report the limits
being applied to instances, ram and cores.
blueprint unified-limits-nova
Change-Id: If1df93400dcbcb1d3aac0ade80ae5ecf6ce38d11
autopep8 is a code formatting tool that makes python code pep8
compliant without changing everything. Unlike black it will
not radically change all code, and the primary change to the
existing codebase is adding a new line after class-level docstrings.
This change adds a new tox autopep8 env to manually run it on your
code before you submit a patch; it also adds autopep8 to pre-commit,
so if you use pre-commit it will do it for you automatically.
This change runs autopep8 in diff mode with --exit-code in the pep8
tox env, so it will fail if autopep8 would modify your code if run
in in-place mode. This allows us to gate on autopep8 not modifying
patches that are submitted. This will ensure authorship of patches
is maintained.
The intent of this change is to save the large amount of time we spend
on ensuring style guidelines are followed, by automating that, making it
simpler for both new and old contributors to work on nova and saving
time and effort for all involved.
Change-Id: Idd618d634cc70ae8d58fab32f322e75bfabefb9d
There are a couple of changes we can make here:
- Always attempt to refresh the cache before checking if an extension is
enabled.
- Use extension slugs as our reference point rather than extension
names. They seem like a better thing to use as a constant and are
similarly fixed.
- Be consistent in how we name and call the extension check functions
- Add documentation for what each extension is doing/used for
There's a TODO here to remove some code that relies on an out-of-tree
extension that I can't see. That's done separately since this is already
big enough.
Change-Id: I8058902df167239fa455396d3595a56bcf472b2b
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
This adds a force kwarg to delete_allocation_for_instance which
defaults to True because that was found to be the most common use case
by a significant margin during implementation of this patch.
In most cases, this method is called when we want to delete the
allocations because they should be gone, e.g. server delete, failed
build, or shelve offload. The alternative in these cases is the caller
could trap the conflict error and retry but we might as well just force
the delete in that case (it's cleaner).
When force=True, it will DELETE the consumer allocations rather than
GET and PUT with an empty allocations dict and the consumer generation
which can result in a 409 conflict from Placement. For example, bug
1836754 shows that in one tempest test that creates a server and then
immediately deletes it, we can hit a very tight window where the method
GETs the allocations and before it PUTs the empty allocations to remove
them, something changes which results in a conflict and the server
delete fails with a 409 error.
It's worth noting that delete_allocation_for_instance used to just
DELETE the allocations before Stein [1] when we started taking consumer
generations into account. There was also a related mailing list thread
[2].
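The force semantics described above can be sketched as follows; the
FakePlacement class is an illustrative stand-in, not the real placement
client, and only models the DELETE vs GET+PUT difference:

```python
# Hedged sketch: force=True issues a single DELETE with no consumer
# generation, so there is no window in which a concurrent change can
# cause a 409 Conflict; force=False models the racy GET-then-PUT path.

class Conflict(Exception):
    """Stands in for a placement 409 response."""


class FakePlacement:
    def __init__(self):
        self.allocations = {'inst-1': {'rp-a': {'VCPU': 2}}}
        self.generation = 1
        self.calls = []

    def delete_allocations(self, consumer):
        self.calls.append('DELETE')
        self.allocations.pop(consumer, None)

    def get_allocations(self, consumer):
        self.calls.append('GET')
        return self.allocations.get(consumer, {}), self.generation

    def put_allocations(self, consumer, allocs, generation):
        self.calls.append('PUT')
        if generation != self.generation:
            raise Conflict()  # consumer changed between GET and PUT
        self.allocations[consumer] = allocs


def delete_allocation_for_instance(placement, consumer, force=True):
    if force:
        # DELETE outright: no generation check, no conflict window.
        placement.delete_allocations(consumer)
    else:
        # GET then PUT an empty allocations dict; can 409 if the
        # consumer generation changes in between.
        _allocs, gen = placement.get_allocations(consumer)
        placement.put_allocations(consumer, {}, gen)
```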
Closes-Bug: #1836754
[1] I77f34788dd7ab8fdf60d668a4f76452e03cf9888
[2] http://lists.openstack.org/pipermail/openstack-dev/2018-August/133374.html
Change-Id: Ife3c7a5a95c5d707983ab33fd2fbfc1cfb72f676
Nova re-generates the resource request of an instance for each server
move operation (migrate, resize, evacuate, live-migrate, unshelve) to
find (or validate) a target host for the instance move. This patch
extends this logic to support the extended resource request from
neutron.
As the changed neutron interface code is called from the nova-compute
service during port binding, the compute service version is bumped.
And a check is added to the compute API to reject move operations
involving ports with an extended resource request if there are old
computes in the cluster.
blueprint: qos-minimum-guaranteed-packet-rate
Change-Id: Ibcf703e254e720b9a6de17527325758676628d48
Delete ARQs in the following cases:
- the ARQ's port is unbound
- the create operation failed and the ARQs were not bound to the instance
- the ARQ is bound to the instance but not bound to a port
Implements: blueprint sriov-smartnic-support
Change-Id: Idab0ee38750d018de409699a0dbdff106d9e11fb
There are no longer any custom filters. We don't need the abstract base
class. Merge the code in and give it a more useful 'SchedulerDriver'
name.
Change-Id: Id08dafa72d617ca85e66d50b3c91045e0e8723d0
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
This made sense back in the day where the ORM was configurable and we
were making lots of direct calls to the database. Now, in a world where
most things happen via o.vo, it's just noise. Remove it.
Change-Id: I216cabcde5311abd46fdad9c95bb72c31b414010
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
This continues on from I81fec10535034f3a81d46713a6eda813f90561cf and
removes all other references to 'instance_type' where it's possible to
do so. The only things left are DB columns, o.vo fields, some
unversioned objects, and RPC API methods. If we want to remove these, we
can but it's a lot more work.
Change-Id: I264d6df1809d7283415e69a66a9153829b8df537
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
The 'nova.exception_wrapper.wrap_exception' decorator accepted either a
pre-configured notifier or a 'get_notifier' function, but the former was
never provided and the latter was consistently a notifier created via a
call to 'nova.rpc.get_notifier'. Simplify things by passing the
arguments relied on by 'get_notifier' into 'wrap_exception', allowing
the latter to create the former for us.
While doing this rework, it became obvious that 'get_notifier' accepted
a 'publisher_id' that is never provided nowadays, so that is dropped. In
addition, a number of calls to 'get_notifier' were passing in
'host=CONF.host', which duplicated the default value for this parameter
and is therefore unnecessary. Finally, the unit tests are split up by
file, as they should be.
Change-Id: I89e1c13e8a0df18594593b1e80c60d177e0d9c4c
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
As we don't (fortunately) persist the requested networks when booting an
instance, we need a way to populate the value of the RequestSpec field
during any create or move operation so that we would know in a later
change which port or network was requested.
Partially-Implements: blueprint routed-networks-scheduling
Change-Id: I0c7e32f6088a8fc1625a0655af824dee2df4a12c
Make update_pci_request_spec_with_allocated_interface_name only depend
on a list of InstancePCIRequest o.vos instead of a whole Instance object.
This will come in handy for the qos interface attach case where we only
need to make the changes on the Instance o.vo after we are sure that
both the resource allocation and the PCI claim have succeeded for the
request.
Change-Id: I5a6c6d3eed61895b00f9e9c3fb3b5d09d6786e9c
blueprint: support-interface-attach-with-qos-ports
This change extends the conductor manager to append the cyborg
resource request to the request spec when performing an unshelve.
On shelve offload, the instance's ARQ binding info will be deleted
to free up the bound ARQs in the Cyborg service.
And this change passes the ARQs to spawn during unshelve of an
instance.
This change extends the ``shelve_instance``, ``shelve_offload_instance``
and ``unshelve_instance`` rpcapi function to carry the arq_uuids.
Co-Authored-By: Wenping Song <songwenping@inspur.com>
Implements: blueprint cyborg-shelve-and-unshelve
Change-Id: I258df4d77f6d86df1d867a8fe27360731c21d237
Replace six.text_type with str.
A subsequent patch will replace other six.text_type.
Change-Id: I23bb9e539d08f5c6202909054c2dd49b6c7a7a0e
Implements: blueprint six-removal
Signed-off-by: Takashi Natsume <takanattie@gmail.com>
To support move operations with qos ports both the source and the
destination compute hosts need to be on Ussuri level. We have service
level checks implemented in Ussuri. In Victoria we could remove those
checks as nova only supports compatibility between N and N-1 computes.
But we kept them there just for extra safety. In the meantime we
codified [1] the rule that nova does not support N-2 computes any
more. So in Wallaby we can assume that the oldest compute is already
on Victoria (Ussuri would be enough too).
So this patch removes the unnecessary service level checks and related
test cases.
[1] Ie15ec8299ae52ae8f5334d591ed3944e9585cf71
Change-Id: I14177e35b9d6d27d49e092604bf0f288cd05f57e
setattr kills discoverability, making it hard to figure out who's
setting various fields. Don't do it.
While we're here, we drop legacy compat handlers for pre-Train
compute nodes.
Change-Id: Ie694a80e89f99c8d3e326eebb4590d93c0ebf671
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Cross-cell resize is confusing. We need to set this information ahead of
time.
Change-Id: I5a403c072b9f03074882b552e1925f22cb5b15b6
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Partial-Bug: #1879878
The 'ResourceTracker.update_available_resource' periodic task builds
usage information for the current host by inspecting instances and
in-progress migrations, combining the two. Specifically, it finds all
instances that are not in the 'DELETED' or 'SHELVED_OFFLOADED' state,
calculates the usage from these, then finds all in-progress migrations
for the host that don't have an associated instance (to prevent double
accounting) and includes the usage for these.
In addition to the periodic task, the 'ResourceTracker' class has a
number of helper functions to make or drop claims for the inventory
generated by the 'update_available_resource' periodic task as part of
the various instance operations. These helpers naturally assume that
when making a claim for a particular instance or migration, there
shouldn't already be resources allocated for same. Conversely, when
dropping claims, the resources should currently be allocated. However,
the check for *active* instances and *in-progress* migrations in the
periodic task means we have to be careful in how we make changes to a
given instance or migration record. Running the periodic task between
such an operation and an attempt to make or drop a claim can result in
TOCTOU-like races.
This generally isn't an issue: we use the 'COMPUTE_RESOURCE_SEMAPHORE'
semaphore to prevent the periodic task running while we're claiming
resources in helpers like 'ResourceTracker.instance_claim' and we make
our changes to the instances and migrations within this context. There
is one exception though: the 'drop_move_claim' helper. This function is
used when dropping a claim for either a cold migration, a resize or a
live migration, and will drop usage from either the source host (based
on the "old" flavor) for a resize confirm or the destination host (based
on the "new" flavor) for a resize revert or live migration rollback.
Unfortunately, while the function itself is wrapped in the semaphore, no
changes to the state of the instance or migration in question are
protected by it.
Consider the confirm resize case, which we're addressing here. If we
mark the migration as 'confirmed' before running 'drop_move_claim', then
the periodic task running between these steps will not account for the
usage on the source since the migration is allegedly 'confirmed'. The
call to 'drop_move_claim' will then result in the tracker dropping usage
that we're no longer accounting for. This "set migration status before
dropping usage" is the current behaviour for both same-cell and
cross-cell resize, via the 'ComputeManager.confirm_resize' and
'ComputeManager.confirm_snapshot_based_resize_at_source' functions,
respectively. We could reverse those calls and run 'drop_move_claim'
before marking the migration as 'confirmed', but while our usage will be
momentarily correct, the periodic task running between these steps will
re-add the usage we just dropped since the migration isn't yet
'confirmed'. The correct solution is to close this gap between setting
the migration status and dropping the move claim to zero. We do this by
putting both operations behind the 'COMPUTE_RESOURCE_SEMAPHORE', just
like the claim operations.
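The fix above amounts to making the status change and the claim drop one
atomic unit under the same lock the periodic task takes. A minimal sketch
with illustrative names (a plain Lock stands in for the real
COMPUTE_RESOURCE_SEMAPHORE):

```python
import threading

# Stand-in for nova's COMPUTE_RESOURCE_SEMAPHORE.
COMPUTE_RESOURCE_SEMAPHORE = threading.Lock()


class Tracker:
    """Toy resource tracker demonstrating the atomicity requirement."""

    def __init__(self):
        self.usage = 2                      # usage held by the old flavor
        self.migration_status = 'confirming'

    def confirm_resize(self):
        # Both the status change and the claim drop happen under the
        # semaphore, so the periodic task can never observe the
        # intermediate "confirmed but usage not yet dropped" state.
        with COMPUTE_RESOURCE_SEMAPHORE:
            self.migration_status = 'confirmed'
            self._drop_move_claim()

    def _drop_move_claim(self):
        self.usage -= 2

    def periodic_update(self):
        # The periodic task takes the same semaphore before reading.
        with COMPUTE_RESOURCE_SEMAPHORE:
            return self.usage, self.migration_status
```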
Change-Id: I26b050c402f5721fc490126e9becb643af9279b4
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Partial-Bug: #1879878
This change extends the conductor manager
to append the cyborg resource request to the
request spec when performing an evacuate.
This change passes the ARQs to spawn during rebuild
and evacuate. On evacuate the existing ARQs will be deleted
and new ARQs will be created and bound, during rebuild the
existing ARQs are reused.
This change extends the rebuild_instance compute rpcapi
function to carry the arq_uuids. This eliminates the
need to look up the uuids associated with the ARQs assigned
to the instance by querying cyborg.
Co-Authored-By: Wenping Song <songwenping@inspur.com>
Co-Authored-By: Brin Zhang <zhangbailin@inspur.com>
Implements: blueprint cyborg-rebuild-and-evacuate
Change-Id: I147bf4d95e6d86ff1f967a8ce37260730f21d236
During the review of the cyborg series it was noted that
in some cases ARQs could be leaked during binding.
See https://review.opendev.org/#/c/673735/46/nova/conductor/manager.py@1632
This change adds a delete_arqs_by_uuid function that can delete
unbound ARQs by instance uuid.
This change modifies build_instances and schedule_and_build_instances
to handle the AcceleratorRequestBindingFailed exception raised when
binding fails and clean up instance arqs.
Co-Authored-By: Wenping Song <songwenping@inspur.com>
Closes-Bug: #1872730
Change-Id: I86c2f00e2368fe02211175e7328b2cd9c0ebf41b
Blueprint: nova-cyborg-interaction
We use these things many places in the code and it would be good to have
constants to reference. Do just that.
Note that this results in a change in the object hash. However, there
are no actual changes in the output object so that's okay.
Change-Id: If02567ce0a3431dda5b2bf6d398bbf7cc954eed0
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
1. Check if the cluster supports live migration with vpmem
2. On source host we generate new dest xml with vpmem info stored in
migration_context.new_resources.
3. If there are vpmems, clean them up on the host/destination when live
migration succeeds/fails
Change-Id: I5c346e690148678a2f0dc63f4f516a944c3db8cd
Implements: blueprint support-live-migration-with-virtual-persistent-memory
1. Claim allocations from placement first, then claim specific
resources in the Resource Tracker on the destination to populate
migration_context.new_resources
2. Clean up specific resources when live migration succeeds/fails
Because we store specific resources in migration_context during
live migration, to ensure correct cleanup we can't drop the
migration_context before cleanup is complete:
a) when post live migration, we move source host cleanup before
destination cleanup (post_live_migration_at_destination will
apply the migration_context and drop it)
b) when rolling back live migration, we drop the migration_context
after rollback operations are complete
For different specific resources, we might need driver-specific
support, such as vpmem. This change just ensures that newly claimed
specific resources are populated to the migration_context and the
migration_context is not dropped before cleanup is complete.
Change-Id: I44ad826f0edb39d770bb3201c675dff78154cbf3
Implements: blueprint support-live-migration-with-virtual-persistent-memory