When resizing an instance, the selected flavor may not meet the image's
minimum memory requirement. Resize ignored the image's minimum memory
limit, so the resize could succeed but the instance would then fail to
start because its memory is too small to run the operating system.
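A minimal sketch of such a check, assuming the flavor's memory is available as `memory_mb` and the image's requirement as Glance's `min_ram` (both in MiB). The function and exception names are hypothetical, not Nova's actual API.

```python
class FlavorMemoryTooSmall(Exception):
    """Hypothetical error: the target flavor cannot satisfy the image's min_ram."""


def check_resize_memory(flavor_memory_mb, image_min_ram_mb):
    """Reject a resize whose target flavor has less memory than the image needs.

    An image min_ram of 0 (or unset) means no minimum is enforced.
    """
    if image_min_ram_mb and flavor_memory_mb < image_min_ram_mb:
        raise FlavorMemoryTooSmall(
            f"flavor has {flavor_memory_mb} MiB of memory but the image "
            f"requires at least {image_min_ram_mb} MiB")


# Usage: fail in the API instead of letting the guest fail to boot later.
check_resize_memory(flavor_memory_mb=2048, image_min_ram_mb=1024)  # passes
try:
    check_resize_memory(flavor_memory_mb=512, image_min_ram_mb=1024)
except FlavorMemoryTooSmall as exc:
    print(exc)
```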
Related-Bug: 2007968
Change-Id: I132e444eedc10b950a2fc9ed259cd6d9aa9bed65
The RDP console was only supported by the HyperV driver, so this removes
the API. As the API URL stays the same (it is shared with the other
console type APIs), requests for an RDP console will now return 400.
This also cleans up the related config options and moves the API
reference to the obsolete section.
The RPC method is kept to avoid errors when an old controller is used
with a new compute. It can be removed in the next RPC version bump.
Change-Id: I8f5755009da4af0d12bda096d7a8e85fd41e1a8c
This change adds the pre-commit config and
tox targets to run codespell both independently
and via the pep8 target.
This change corrects all the remaining typos in the
codebase as detected by codespell.
Change-Id: Ic4fb5b3a5559bc3c43aca0a39edc0885da58eaa2
This bumps the version of flake8, which resolves some erroneous failures
in f-strings. The new version picks up a number of new E721 (do not
compare types) errors, all of which are addressed.
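For reference, E721 flags comparing type objects with `==`. pycodestyle recommends `is` for exact type checks and `isinstance()` when subclasses should also match. The value below is made up purely to show the difference.

```python
value = True  # bool is a subclass of int

# Flagged by E721: comparing types with ==
flagged = type(value) == int

# Exact type check: compare type objects with `is`
exact = type(value) is int

# Instance check that also accepts subclasses: use isinstance()
instance = isinstance(value, int)

print(flagged, exact, instance)  # False False True
```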
Change-Id: I7a1937b107ff3af8d1e5fe23fc32b120ef4697f7
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
When we added the all_cells flag to this we just kinda hacked it
into place, leaving a big chunk of the method nested inside a
conditional. This refactors out that chunk into a helper, and also
corrects a naming error that was very confusing when reading the code
(a variable named "service" which was a list of services).
Change-Id: I41ff076864dce9ed826922f6609536ea4545a181
While debugging a field issue recently, we determined that computes
had been pointed at cell0 and created service and node records there.
This makes us warn during service list if we find compute services
in cell0 to tip off operators that they have a configuration problem.
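A minimal sketch of the warning, assuming services are plain dicts tagged with the UUID of the cell they were read from. The all-zeros cell0 UUID, the dict keys and the helper name are assumptions for illustration.

```python
import logging

LOG = logging.getLogger(__name__)

# Assumed well-known UUID of cell0 (where instances that failed to schedule live).
CELL0_UUID = "00000000-0000-0000-0000-000000000000"


def warn_on_cell0_computes(services):
    """Log a warning for each compute service found in cell0.

    `services` is a hypothetical list of dicts with 'binary', 'host' and
    'cell_uuid' keys. Returns the offending hosts.
    """
    bad_hosts = [
        svc["host"] for svc in services
        if svc["cell_uuid"] == CELL0_UUID and svc["binary"] == "nova-compute"
    ]
    for host in bad_hosts:
        LOG.warning(
            "Found nova-compute service on host %s in cell0; compute "
            "services should not be configured to use cell0.", host)
    return bad_hosts
```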
Change-Id: Id95c0d02cc34348623b01997fcd1930628d48ccc
This mainly fixes typos in the tests and
one typo in an exception message.
Some additional items are added to the dict based on
our usage of variable names in tests, but we could remove them later
with minor test updates. They are intentionally not
fixed in this commit to limit scope creep.
Change-Id: Iacfbb0a5dc8ffb0857219c8d7c7a7d6e188f5980
When [quota]count_usage_from_placement = true or
[quota]driver = nova.quota.UnifiedLimitsDriver, cores and ram quota
usage are counted from placement. When an instance is SHELVED_OFFLOADED,
it will not have allocations in placement, so its cores and ram should
not count against quota during that time.
This means however that when an instance is unshelved, there is a
possibility of going over quota if the cores and ram it needs were
allocated by some other instance(s) while it was SHELVED_OFFLOADED.
This fixes a bug where quota was not being properly enforced during
unshelve of a SHELVED_OFFLOADED instance when quota usage is counted
from placement. Test coverage is also added for the "recheck" quota
cases.
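The enforcement boils down to counting the unshelving instance's cores and ram as new usage, since it holds no placement allocations while SHELVED_OFFLOADED. A minimal sketch of that arithmetic, with hypothetical names and plain dicts standing in for the quota engine:

```python
class OverQuota(Exception):
    pass


def check_unshelve_quota(limits, usage, requested):
    """Raise OverQuota if adding `requested` to `usage` exceeds `limits`.

    All three are dicts keyed by resource ('cores', 'ram'). A SHELVED_OFFLOADED
    instance has no allocations in placement, so `usage` does not include it
    and its full flavor must be counted in `requested`.
    """
    over = sorted(
        res for res, amount in requested.items()
        if usage.get(res, 0) + amount > limits.get(res, float("inf"))
    )
    if over:
        raise OverQuota(f"quota exceeded for: {', '.join(over)}")


limits = {"cores": 8, "ram": 4096}
requested = {"cores": 2, "ram": 2048}  # flavor of the instance being unshelved
check_unshelve_quota(limits, {"cores": 4, "ram": 1024}, requested)  # fits
# Another instance consumed cores while this one was offloaded:
try:
    check_unshelve_quota(limits, {"cores": 7, "ram": 1024}, requested)
except OverQuota as exc:
    print(exc)  # quota exceeded for: cores
```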
Closes-Bug: #2003991
Change-Id: I4ab97626c10052c7af9934a80ff8db9ddab82738
Related to bp/allowing-target-state-for-evacuate. This change
extends the compute API to accept a new targetState argument.
When set, the targetState argument forces the state of the evacuated
instance on the destination host.
Signed-off-by: Sahid Orentino Ferdjaoui <sahid.ferdjaoui@industrialdiscipline.com>
Change-Id: I9660d42937ad62d647afc6be965f166cc5631392
This adds a microversion and API support for triggering a rebuild
of volume-backed instances by leveraging cinder functionality to
do so.
Implements: blueprint volume-backed-server-rebuild
Closes-Bug: #1482040
Co-Authored-By: Rajat Dhasmana <rajatdhasmana@gmail.com>
Change-Id: I211ad6b8aa7856eb94bfd40e4fdb7376a7f5c358
Currently, when attempting to rescue a volume-based instance
using an image without the hw_rescue_device and/or hw_rescue_bus
properties set, the rescue API call fails (as non-stable rescue
of volume-based instances is not supported), leaving the instance
in an error state.
This change checks for the hw_rescue_device/hw_rescue_bus image
properties before attempting the rescue and, if they are not set,
fails with a proper error message without changing the instance
state.
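A minimal sketch of the pre-check, assuming image properties arrive as a plain dict. The function and exception names are hypothetical; the sketch treats rescue of a volume-backed instance as allowed when at least one of the two properties is set, following the "and/or" wording above.

```python
class RescueNotSupported(Exception):
    """Hypothetical error raised before the instance state is touched."""


RESCUE_PROPERTIES = ("hw_rescue_device", "hw_rescue_bus")


def check_volume_backed_rescue(is_volume_backed, image_properties):
    """Fail early if a volume-backed instance cannot use stable rescue."""
    if not is_volume_backed:
        return  # image-backed instances can still use legacy rescue
    if not any(image_properties.get(prop) for prop in RESCUE_PROPERTIES):
        raise RescueNotSupported(
            "Cannot rescue a volume-backed instance with an image that "
            "sets neither hw_rescue_device nor hw_rescue_bus")
```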
Related-Bug: #1978958
Closes-Bug: #1926601
Change-Id: Id4c8c5f3b32985ac7d3d7c833b82e0876f7367c1
This change appends vnic-type vdpa to the list
of passthrough vnic types and removes the API blocks.
This should enable the existing suspend and live migrate
code to properly manage vdpa interfaces, enabling
"hot plug" live migrations similar to direct SR-IOV.
Implements: blueprint vdpa-suspend-detach-and-live-migrate
Change-Id: I878a9609ce0d84f7e3c2fef99e369b34d627a0df
This change extends the guest XML parsing such that
the source device path can be extracted from interface
elements of type vdpa.
This is required to identify the interface to remove when
detaching a vdpa port from a domain.
This change also fixes a latent bug in the libvirt fixture
related to the domain XML generation for vdpa interfaces.
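Libvirt describes a vDPA interface as `<interface type='vdpa'>` with a `<source dev='...'/>` child. The sketch below extracts that path with the standard library's ElementTree rather than Nova's own config classes; the domain XML is a trimmed, made-up example.

```python
import xml.etree.ElementTree as ET

DOMAIN_XML = """<domain type='kvm'>
  <devices>
    <interface type='bridge'>
      <source bridge='br0'/>
    </interface>
    <interface type='vdpa'>
      <source dev='/dev/vhost-vdpa-0'/>
    </interface>
  </devices>
</domain>"""


def vdpa_source_devices(domain_xml):
    """Return the source device path of every vdpa interface in the domain."""
    root = ET.fromstring(domain_xml)
    paths = []
    for iface in root.findall("./devices/interface[@type='vdpa']"):
        source = iface.find("source")
        if source is not None and source.get("dev"):
            paths.append(source.get("dev"))
    return paths


print(vdpa_source_devices(DOMAIN_XML))  # ['/dev/vhost-vdpa-0']
```

Matching on the returned path is what lets a detach pick the right interface out of a domain with several.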
Change-Id: I5f41170e7038f4b872066de4b1ad509113034960
We have many places where we implement singleton behavior for the
placement client. This unifies them into a single place and
implementation. Not only does this DRY things up, but it may also cause
us to initialize the client fewer times, and it allows for emitting a
common set of error messages about expected failures for better
troubleshooting.
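A minimal sketch of a lazily created, lock-protected singleton. `PlacementClient`, `get_placement_client` and the log message are hypothetical stand-ins, not Nova's actual names.

```python
import logging
import threading

LOG = logging.getLogger(__name__)

_client = None
_client_lock = threading.Lock()


class PlacementClient:
    """Stand-in for the real placement report client."""

    created = 0  # counts constructions, to show the singleton at work

    def __init__(self):
        PlacementClient.created += 1


def get_placement_client():
    """Return the shared placement client, creating it on first use."""
    global _client
    if _client is None:
        with _client_lock:
            # Re-check under the lock so racing threads build only one client.
            if _client is None:
                try:
                    _client = PlacementClient()
                except Exception:
                    # A single place to log expected setup failures
                    # (for example, bad auth config) before re-raising.
                    LOG.exception("Failed to initialize the placement client")
                    raise
    return _client
```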
Change-Id: Iab8a791f64323f996e1d6e6d5a7e7a7c34eb4fb3
Related-Bug: #1846820
This change adds functional tests for operations on servers with VDPA
devices that are expected to work but are currently blocked due to lack
of testing or QEMU bugs.
Cold migrate, resize, evacuate, and shelve are enabled
and tested by this patch.
Closes-Bug: #1970467
Change-Id: I6e220cf3231670d156632e075fcf7701df744773
Nova uses the RequestSpec.pci_request in the PciPassthroughFilter to
decide if the PCI devices, requested via the pci_alias in the flavor
extra_spec, are available on a potential target host. During resize the
new flavor might contain a different pci_alias request than the old
flavor of the instance. In this case Nova should use the pci_alias from
the new flavor to schedule the destination host of the resize. However,
this logic was missing and Nova used the old pci_request value based on
the old flavor. This patch adds the missing logic.
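A minimal sketch of the corrected selection, assuming flavors are dicts whose `extra_specs` carry the alias under `pci_passthrough:alias` as `name:count` pairs. The helper names are hypothetical; the point is only that a resize uses the new flavor's request.

```python
def parse_pci_alias(flavor):
    """Parse 'name:count[,name:count...]' from the flavor extra specs."""
    spec = flavor.get("extra_specs", {}).get("pci_passthrough:alias", "")
    requests = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, count = item.partition(":")
        requests[name.strip()] = int(count)
    return requests


def pci_request_for_scheduling(instance_flavor, new_flavor=None):
    """During a resize use the new flavor's alias, otherwise the instance's own."""
    flavor = new_flavor if new_flavor is not None else instance_flavor
    return parse_pci_alias(flavor)
```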
Closes-Bug: #1983753
Closes-Bug: #1941005
Change-Id: I73c9ae27e9c42ee211a53bed3d849650b65f08be
This change starts the process of wiring up the new ephemeral encryption
control mechanisms in the compute layer. This initial step ensures that
the BlockDeviceMapping objects are correctly updated with the required
ephemeral encryption details when requested through the instance flavor
extra specs or image metadata properties.
Change-Id: Id49cb238f7bbf2b97f018ddbe090ebdc08d762dc
As agreed in the spec, in the same API microversion we both drop
keypair generation support and start accepting the @ (at) and . (dot)
characters in the keypair name.
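A sketch of microversion-dependent name validation. Both character sets are hypothetical illustrations (the exact schema Nova uses is not reproduced here); the only grounded difference is that the newer one also accepts `@` and `.`.

```python
import re

# Hypothetical character sets for illustration only.
OLD_KEYPAIR_NAME = re.compile(r"[A-Za-z0-9 _-]+")
NEW_KEYPAIR_NAME = re.compile(r"[A-Za-z0-9 _@.-]+")  # adds '@' and '.'


def is_valid_keypair_name(name, new_microversion):
    """Validate a keypair name against the pattern for the requested microversion."""
    pattern = NEW_KEYPAIR_NAME if new_microversion else OLD_KEYPAIR_NAME
    return pattern.fullmatch(name) is not None
```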
Rebased the work from I5de15935e83823afa545a250cf84f6a7a37036b4
APIImpact
Implements: blueprint keypair-generation-removal
Co-Authored-By: Nicolas Parquet <nicolas.parquet@gandi.net>
Change-Id: I6a7c71fb4385348c87067543d0454f302907395e
This adds support to the REST API, in a new microversion, for specifying
a destination host for the unshelve server action when the server
is shelved offloaded.
This patch also supports the ability to unpin the availability_zone of an
instance that is bound to it.
Note that the functional test changes are due to those tests using the
"latest" microversion 2.91.
Implements: blueprint unshelve-to-host
Change-Id: I9e95428c208582741e6cd99bd3260d6742fcc6b7
This patch introduces changes to the compute API that will allow
PROJECT_ADMIN to unshelve a shelved offloaded server to a specific host.
This patch also supports the ability to unpin the availability_zone of an
instance that is bound to it.
Implements: blueprint unshelve-to-host
Change-Id: Ieb4766fdd88c469574fad823e05fe401537cdc30
When getting an instance using the compute.API we call
scatter_gather_single_cell() to be able to capture details when we fail
to retrieve a result from a cell such as timeouts and exceptions.
Currently, however, we aren't logging the content of an exception if
scatter_gather_single_cell() returns an exception as the result. The
scatter gather method itself logs exceptions that are not of type
NovaException, as these represent definite unexpected errors such as
database errors, but handling of NovaException is left to the caller to
decide whether to log it, re-raise it, and so on.
It can be difficult to debug a situation where a cell is returning a
NovaException result so this adds logging of the exception content in
the compute API when we encounter an unexpected NovaException.
The existing log message has been updated to more accurately reflect
what has happened (did not respond vs exception). The assignment of the
exception object in scatter gather has also been updated to not
unnecessarily construct a new exception object because it (a) wasn't
necessary and (b) made asserting the LOG.exception() call argument in
the unit test difficult.
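A minimal sketch of the caller-side handling, distinguishing a cell that did not respond from one that returned an exception. The sentinel and `NovaException` are stand-ins; because the sketch runs outside an `except` block, it passes the exception to `LOG.error(..., exc_info=result)` instead of calling `LOG.exception()`.

```python
import logging

LOG = logging.getLogger(__name__)

# Hypothetical sentinel meaning "the cell did not answer in time".
DID_NOT_RESPOND = object()


class NovaException(Exception):
    """Stand-in for nova.exception.NovaException."""


def handle_cell_result(cell_uuid, result):
    """Return the instance on success, or None after logging why it failed."""
    if result is DID_NOT_RESPOND:
        LOG.warning("Cell %s did not respond", cell_uuid)
        return None
    if isinstance(result, NovaException):
        # The scatter-gather helper only logs non-Nova exceptions, so log here,
        # attaching the exception (and any traceback) to the record.
        LOG.error("Cell %s returned an exception: %s", cell_uuid, result,
                  exc_info=result)
        return None
    return result
```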
Related-Bug: #1970087
Change-Id: Iae1c61c72be5b6017b934293e3dc079a24eeb0e7
We now enforce limits on resources requested in the flavor.
This includes: instances, ram, cores. It also works for any resource
class being requested via the flavor chosen, such as custom resource
classes relating to Ironic resources.
Note that because disk resources can be limited, we need to know if the
instance is boot from volume or not. This has meant adding extra code to
make sure we know that when enforcing the limits.
Follow on patches will update the APIs to accurately report the limits
being applied to instances, ram and cores.
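A sketch of turning a flavor into the deltas checked against unified limits. The limit names (`servers`, `class:<RESOURCE_CLASS>`) and the disk rule (root disk skipped for boot from volume, swap ignored) are simplifying assumptions; custom classes come from `resources:` flavor extra specs.

```python
def resource_deltas(flavor, is_boot_from_volume, count=1):
    """Compute the usage a request for `count` instances would add.

    `flavor` is a hypothetical dict with vcpus, memory_mb, root_gb,
    ephemeral_gb and extra_specs keys.
    """
    # A boot-from-volume root disk lives in the volume service, not on the host.
    root_gb = 0 if is_boot_from_volume else flavor["root_gb"]
    per_instance = {
        "class:VCPU": flavor["vcpus"],
        "class:MEMORY_MB": flavor["memory_mb"],
        "class:DISK_GB": root_gb + flavor.get("ephemeral_gb", 0),
    }
    # Resource classes requested directly, e.g. custom Ironic classes.
    for key, value in flavor.get("extra_specs", {}).items():
        if key.startswith("resources:"):
            per_instance["class:" + key[len("resources:"):]] = int(value)
    # Zero amounts are not limited, so drop them.
    deltas = {name: amount * count
              for name, amount in per_instance.items() if amount}
    deltas["servers"] = count
    return deltas
```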
blueprint unified-limits-nova
Change-Id: If1df93400dcbcb1d3aac0ade80ae5ecf6ce38d11
When using unified limits, we add enforcement of those limits on all
related API calls. Note: we do not yet correctly report the configured
limits to users via the quota APIs, that is in a future patch.
Note the unified limits calls are made alongside the existing legacy
quota calls. The old quota calls will be handled by the quota engine
driver, which is basically a no-op. This is to make it easier to remove
the legacy code paths in the future.
Note that over quota exceptions raised with unified limits use the same
standard (improved) exception message as those raised by oslo.limit.
They do, however, use the existing exception code to ease integration.
The user of the API will see the same return codes, no matter which
code is enabled to enforce the limits.
Finally, this also adds test coverage where it was missing. Coverage
for "quota recheck" behavior in KeypairAPI is added where all other
KeypairAPI testing is located. Duplicate coverage is removed from
nova/api/openstack/compute/test_keypairs.py at the same time.
blueprint unified-limits-nova
Change-Id: I36e82a17579158063396d7e55b495ccff4959ceb
When trying to attach a volume to an already running instance the nova-api
requests the nova-compute service to create a BlockDeviceMapping. If the
nova-api does not receive a response within `rpc_response_timeout` it will
treat the request as failed and raise an exception.
There are multiple cases where nova-compute has actually already processed
the request and only the reply did not reach the nova-api in time (see bug
report).
After the failed request the database will contain a BlockDeviceMapping entry
for the volume + instance combination that will never be cleaned up again.
This entry also causes the nova-api to reject all future attachments of this
volume to this instance (as it assumes it is already attached).
To work around this we check if a BlockDeviceMapping has already been created
when we see a messaging timeout. If this is the case we can safely delete it
as the compute node has already finished processing and we will no longer pick
it up.
This allows users to try the request again.
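A minimal sketch of the cleanup on timeout. `rpc_reserve`, `find_bdm` and `delete_bdm` are hypothetical callables standing in for the RPC call and the database lookup/delete, and `MessagingTimeout` stands in for `oslo_messaging.MessagingTimeout`.

```python
class MessagingTimeout(Exception):
    """Stand-in for oslo_messaging.MessagingTimeout."""


def reserve_block_device(rpc_reserve, find_bdm, delete_bdm,
                         instance_uuid, volume_id):
    """Ask the compute to create the BDM; clean up if the reply times out."""
    try:
        return rpc_reserve(instance_uuid, volume_id)
    except MessagingTimeout:
        # The compute may have created the BDM even though its reply was lost.
        # Delete it so a retried attach is not rejected as already attached.
        bdm = find_bdm(instance_uuid, volume_id)
        if bdm is not None:
            delete_bdm(bdm)
        raise
```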
A previous fix was abandoned but without a clear reason ([1]).
[1]: https://review.opendev.org/c/openstack/nova/+/731804
Closes-Bug: 1960401
Change-Id: I17f4d7d2cb129c4ec1479cc4e5d723da75d3a527
Allow instances to be created with VNIC_TYPE_REMOTE_MANAGED ports.
Those ports are assumed to require remote-managed PCI devices which
means that operators need to tag those as "remote_managed" in the PCI
whitelist if this is the case (there is no meta information or standard
means of querying this information).
The following changes are introduced:
* Handling for VNIC_TYPE_REMOTE_MANAGED ports during allocation of
resources for instance creation (remote_managed == true in
InstancePciRequests);
* Usage of the noop os-vif plugin for VNIC_TYPE_REMOTE_MANAGED ports
in order to avoid the invocation of the local representor plugging
logic since a networking backend is responsible for that in this
case;
* Expectation of bind time events for ports of VNIC_TYPE_REMOTE_MANAGED.
Events for those arrive early from Neutron after a port update (before
Nova begins to wait in the virt driver code); therefore, Nova is set
to avoid waiting for plug events for VNIC_TYPE_REMOTE_MANAGED ports;
* Making sure the service version is high enough on all compute services
before creating instances with ports that have VNIC type
VNIC_TYPE_REMOTE_MANAGED. Network requests are examined for the presence
of port ids to determine the VNIC type via Neutron API. If
remote-managed ports are requested, a compute service version check
is performed across all cells.
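A sketch of the last bullet's version check. The `remote-managed` string is the Neutron vnic_type value; the minimum service version and the exception name are hypothetical placeholders, not Nova's real values.

```python
VNIC_TYPE_REMOTE_MANAGED = "remote-managed"

# Hypothetical placeholder for the first nova-compute service version that
# handles remote-managed ports; Nova's actual value is not reproduced here.
MIN_COMPUTE_VERSION_REMOTE_MANAGED = 61


class ComputeServiceTooOld(Exception):
    """Hypothetical error returned before any instance is created."""


def check_remote_managed_ports(vnic_types, compute_versions_by_cell):
    """Reject remote-managed ports unless every compute in every cell is new enough.

    vnic_types: VNIC types of the requested ports, as looked up in Neutron.
    compute_versions_by_cell: {cell_name: [service_version, ...]}.
    """
    if VNIC_TYPE_REMOTE_MANAGED not in vnic_types:
        return  # ordinary ports need no check
    versions = [v for cell in compute_versions_by_cell.values() for v in cell]
    # With no compute services at all, default to 0 so the request is rejected.
    minimum = min(versions, default=0)
    if minimum < MIN_COMPUTE_VERSION_REMOTE_MANAGED:
        raise ComputeServiceTooOld(
            f"remote-managed ports need compute service version "
            f">= {MIN_COMPUTE_VERSION_REMOTE_MANAGED}, found {minimum}")
```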
Change-Id: Ica09376951d49bc60ce6e33147477e4fa38b9482
Implements: blueprint integration-with-off-path-network-backends
For some reason, we have two lineages of quota-related exceptions in
Nova. We have QuotaError (which sounds like an actual error), from
which all of our case-specific "over quota" exceptions inherit, such
as KeypairLimitExceeded, etc. In contrast, we have OverQuota which
lives outside that hierarchy and is unrelated. In a number of places,
we raise one and translate to the other, or raise the generic
QuotaError to signal an overquota situation, instead of OverQuota.
This leads to places where we have to catch both, signaling the same
over quota situation, but looking like there could be two different
causes (i.e. an error and being over quota).
This joins the two cases, by putting OverQuota at the top of the
hierarchy of specific exceptions and removing QuotaError. The latter
was only used in a few situations, so this isn't actually much change.
Cleaning this up will help with the unified limits work, reducing the
number of potential exceptions that mean the same thing.
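The resulting shape of the hierarchy can be sketched with simplified stand-in classes (not Nova's real definitions): with OverQuota at the top, one `except OverQuota` clause covers every case-specific exception.

```python
class NovaException(Exception):
    """Stand-in for Nova's base exception."""


class OverQuota(NovaException):
    """Top of the hierarchy: every over-quota case derives from this."""


class KeypairLimitExceeded(OverQuota):
    pass


def create_keypair(current, limit):
    if current >= limit:
        raise KeypairLimitExceeded()
    return current + 1


try:
    create_keypair(current=10, limit=10)
except OverQuota as exc:  # no need to also catch a separate QuotaError
    caught = type(exc).__name__
```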
Related to blueprint bp/unified-limits-nova
Change-Id: I17a3e20b8be98f9fb1a04b91fcf1237d67165871
Virtually all of the code for parsing 'hw:'-prefixed extra specs and
'hw_'-prefixed image metadata properties lives in the 'nova.virt.hardware'
module. It makes sense for these to be included there. Do that.
Change-Id: I1fabdf1827af597f9e5fdb40d5aef244024dd015
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
This mirrors the 'hw_vif_multiqueue_enabled' image metadata property.
Providing a way to set this via flavor extra specs allows admins to
enable this by default and easily enable it for existing instances
without the need to rebuild (a destructive operation).
Note that, in theory at least, the image import workflow provided by
glance should allow admins to enable this by default, but the legacy
image create workflow does not allow this and admins cannot really
control which API end users use when uploading their own images.
Also note that we could provide this behavior using a host-level
configuration option. This would be similar to what we do for other
attributes such as machine type ('hw_machine_type' image meta prop or
'[libvirt] hw_machine_type' config option) or pointer model
('hw_pointer_model' image meta prop or '[compute] pointer_model' config
option) and would be well suited to things that we don't expect to
change, such as enabling multiqueue (it's a sensible default). However,
we would need to start storing this information in system_metadata, like
we do for machine type (since Wallaby) to prevent things changing over
live migration. We have also started avoiding host-level config options
for things like this, since one must ensure that the configured values
are consistent across all hosts in a deployment to avoid behavior that
varies depending on the host the guest is initially created on.
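A sketch of resolving the setting from both sources. The flavor key `hw:vif_multiqueue_enabled` is assumed to mirror the image property, and raising on a flavor/image disagreement is an assumed policy; the bool parsing is simplified.

```python
class FlavorImageConflict(Exception):
    """Hypothetical error: flavor and image disagree."""


def _as_bool(value):
    # Simplified string-to-bool conversion for this sketch.
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in ("true", "1", "yes", "on")


def vif_multiqueue_enabled(flavor_extra_specs, image_props):
    """Combine the (assumed) flavor extra spec with the image property."""
    flavor_value = flavor_extra_specs.get("hw:vif_multiqueue_enabled")
    image_value = image_props.get("hw_vif_multiqueue_enabled")
    flavor_on = None if flavor_value is None else _as_bool(flavor_value)
    image_on = None if image_value is None else _as_bool(image_value)
    if flavor_on is not None and image_on is not None and flavor_on != image_on:
        raise FlavorImageConflict(
            "flavor and image disagree on vif multiqueue")
    if flavor_on is not None:
        return flavor_on
    return bool(image_on)  # unset everywhere means disabled
```

With the flavor key set, admins can enable multiqueue for existing instances via resize instead of a rebuild.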
Change-Id: I405d0324abe32b31a434105cf2c104876fe9c127
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
The patch I03cf285ad83e09d88cdb702a88dfed53c01610f8 fixed most of the
possible cases for this to happen but missed one. An early enough
exception during _delete() can mean that instance_uuid never gets
defined, but we then try to use it in the finally block. This patch
moves the saving of the instance_uuid to the top of the try block to
avoid the issue.
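The failure mode is a general Python one: a name first assigned inside `try` is unbound in `finally` if an earlier statement raised, so the `finally` block raises `UnboundLocalError` and masks the original error. A minimal reproduction with hypothetical functions, not Nova's `_delete()`:

```python
def failing_step():
    raise RuntimeError("early failure")


def delete_broken(instance, cleanup_log):
    try:
        failing_step()                     # raises before the assignment below
        instance_uuid = instance["uuid"]
    finally:
        cleanup_log.append(instance_uuid)  # UnboundLocalError: never assigned


def delete_fixed(instance, cleanup_log):
    try:
        instance_uuid = instance["uuid"]   # saved first, so always defined below
        failing_step()
    finally:
        cleanup_log.append(instance_uuid)
```

With the fix, the original `RuntimeError` propagates and the cleanup still sees the UUID.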
Change-Id: Ib3073d7f595c8927532b7c49fc7e5ffe80d508b9
Closes-Bug: #1940812
Related-Bug: #1914777
The port.resource_request field is admin only. Nova depends on the
value of this field to do proper scheduling and resource allocation
and deallocation for ports with resource request as well as to update
the port.binding:profile.allocation field with the resource providers
the requested resources are fulfilled from. However in some cases nova
does not use a neutron admin client / elevated context to read the
port. In this case neutron returns None for the port.resource_request
field and nova thinks that the port has no resource request.
This patch fixes all three places where previous testing showed that
context elevation was missing.
Change-Id: Icb35e20179572fb713a397b4605312cf3294b41b
Closes-Bug: #1945310
The interface attach and detach logic is now fully adapted to the new
extended resource request format, and supports more than one request
group in a single port.
blueprint: qos-minimum-guaranteed-packet-rate
Change-Id: I73e6acf5adfffa9203efa3374671ec18f4ea79eb
Nova re-generates the resource request of an instance for each server
move operation (migrate, resize, evacuate, live-migrate, unshelve) to
find (or validate) a target host for the instance move. This patch
extends this logic to support the extended resource request from
neutron.
As the changes in the neutron interface code are called from nova-compute
service during the port binding, the compute service version is bumped.
And a check is added to the compute-api to reject the move operations
with ports having extended resource request if there are old computes
in the cluster.
blueprint: qos-minimum-guaranteed-packet-rate
Change-Id: Ibcf703e254e720b9a6de17527325758676628d48
This adds the final missing pieces to support creating servers with
ports having extended resource request. As the changes in the neutron
interface code are called from nova-compute service during the port
binding, the compute service version is bumped. And a check is added to
the compute-api to reject such server create requests if there are old
computes in the cluster.
Note that some of the negative and SRIOV related interface attach
tests also start to pass, as they do not depend on any of the
interface attach specific implementation. However, interface attach is
still broken here, as the failing positive tests show.
blueprint: qos-minimum-guaranteed-packet-rate
Change-Id: I9060cc9cb9e0d5de641ade78c5fd7e1cc77ade46
The new format of the resource_request field of the Neutron port allows
expressing not just request groups but also request global parameters
for the allocation candidate query. This patch adapts the neutron client
in nova to parse such parameters and then transfers this information to
the scheduler to include it in the allocation candidate request.
It relies on previous patches that already extended the
RequestLevelParams ovo and the allocation candidate query generation.
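A sketch of the parsing step, assuming the extended format is a dict with a `request_groups` list and a top-level `same_subtree` list of group ids as the global parameter. The helper names are hypothetical. Each port contributes its own `same_subtree` list to the request-level parameters.

```python
def parse_port_resource_request(resource_request):
    """Split an extended resource_request into request groups and same_subtree.

    Assumed shape:
      {"request_groups": [{"id": ..., "resources": {...}, "required": [...]}],
       "same_subtree": [group_id, ...]}
    A missing or None value (as a non-admin read returns) yields nothing.
    """
    if not resource_request:
        return [], []
    groups = [
        {
            "id": group["id"],
            "resources": dict(group.get("resources", {})),
            "required": list(group.get("required", [])),
        }
        for group in resource_request.get("request_groups", [])
    ]
    same_subtree = list(resource_request.get("same_subtree", []))
    return groups, same_subtree


def collect_same_subtree(ports):
    """Gather one same_subtree list per port as request-level parameters."""
    params = []
    for port in ports:
        _, same_subtree = parse_port_resource_request(port.get("resource_request"))
        if same_subtree:
            params.append(same_subtree)
    return params
```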
Change-Id: Icb91f6429050a161f577d0ed94d4cd906d3da461
blueprint: qos-minimum-guaranteed-packet-rate