With Libvirt v8.7.0+, the <maxphysaddr> sub-element
of the <cpu> element specifies the number of vCPU
physical address bits [1].
[1] https://libvirt.org/news.html#v8-7-0-2022-09-01
New flavor extra_specs and image properties are added to
control the physical address bits of vCPUs in Libvirt guests.
The nova-scheduler requests COMPUTE_ADDRESS_SPACE_* traits
based on them. The traits are already defined in os-traits
v2.10.0. Also numerical comparisons are performed at
both compute capabilities filter and image props filter.
blueprint: libvirt-maxphysaddr-support-caracal
Change-Id: I98968f6ef1621c9fb4f682c119038e26d62ce381
Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
This chnage adds the pre-commit config and
tox targets to run codespell both indepenetly
and via the pep8 target.
This change correct all the final typos in the
codebase as detected by codespell.
Change-Id: Ic4fb5b3a5559bc3c43aca0a39edc0885da58eaa2
this is the inital patch of applying codespell to nova.
codespell is a programing focused spellchecker that
looks for common typos and corrects them.
i am breaking this into multiple commits to make it simpler
to read and will automate the execution of codespell
at the end of the series.
Change-Id: If24a6c0a890f713545faa2d44b069c352655274e
Despite having a NumInstancesFilter, we miss a weigher that would classify hosts
based on their instance usage.
Change-Id: Id232c2caf29d3443c61c0329d573a34a7481fd57
Implements-Blueprint: bp/num-instances-weigher
This change remvoes the az filter and always enabled
the placement pre-filter. As part of this removal
the config option to control enabling the pre-filter
is removed as it is now mandatory.
The AZ filter docs and tests are also removed and an upgrade
release note is added.
Depends-On: https://review.opendev.org/c/openstack/devstack/+/886972
Change-Id: Icc8580835beb2b4d40341f81c25eb1f024e70ade
Hacking has bumped the version of flake8 that it's using to 5.0 in its
6.0.1 release. This turns up quite a few pep8 errors lurking in our
code. Fix them.
Needed-by: https://review.opendev.org/c/openstack/hacking/+/874516
Change-Id: I3b9c7f9f5de757f818ec358c992ffb0e5f3e310f
Current versions of mypy run with no-implicit-optional
by default. This change gets Nova's mypy test environment
to pass again.
Change-Id: Ie50c8d364ad9c339355cc138b560ec4df14fe307
Like we did for conductor, this makes the scheduler lazy-load the
placement client instead of only doing it during __init__. This avoids
a startup crash if keystone or placement are not available, but
retains startup failures for other problems and errors likely to be
a result of misconfigurations.
Closes-Bug: #2012530
Change-Id: I42ed876b84d80536e83d9ae01696b0a64299c9f7
Ran into this randomly today, if a test sets
CONF.scheduler.enabled_filters to a non-default value, the affinity
support global variables will be set to False which can affect
subsequent test runs that expect the default configuration (affinity
filter support enabled).
Example error:
WARNING [nova.scheduler.utils] Failed to
compute_task_build_instances: ServerGroup policy is not supported:
ServerGroupAffinityFilter not configured
This resets the global variables during base test setup, similar to how
other globals are reset.
Change-Id: Icbc75b1001c0a609280241f99a780313b244aa9d
The stats module is used to decide if the InstancePCIRequests of a boot
request can fit to a given compute host and to decide which PCI device
pool can fulfill the requests. It is used both during scheduling and also
during the PCI claim code.
PCI devices now modelled in placement and the allocation_candidate query
now requests PCI resources therefore each allocation candidate returned
from placement already restricts which PCI devices can be used during
the PciPassthroughFilter, the NumaTopologyFilter, and PCI claim code
paths. This patch adapts the stats module to consider the PCI
allocation candidate or the already made placement PCI allocation when
filtering the PCI device pools.
blueprint: pci-device-tracking-in-placement
Change-Id: If363981c4aeeb09a96ee94b140070d3d0f6af48f
This patch extends the HostState object with an allocation_candidates
list populated by the scheduler manager. Also this changes the generic
scheduler logic to allocate the candidate of the selected host based on
the candidates in the host state.
So after this patch scheduler filters can be extended to filter the
allocation_candidates list of the HostState object while processing a
host and restrict which candidate can be allocated if the host passes
the all the filters. Potentially all candidates can be removed by
multiple consecutive filters making the host as a non viable scheduling
target.
blueprint: pci-device-tracking-in-placement
Change-Id: Id0afff271d345a94aa83fc886e9c3231c3ff2570
Fixed various small issues from the already merged (or being merged)
patches.
The logic behind the dropped FIXME and empty condition are handled in
the update_allocations() calls of the translator already.
The removal of excutils.save_and_reraise_exception from the report
client is safe as self._clear_provider_cache_for_tree does not raise.
blueprint: pci-device-tracking-in-placement
Change-Id: If87dedc6a14f7b116c4238e7534b67574428c01c
* Add iommu model trait for viommu model
* Add ``hw_viommu_model`` to request_filter, this will extend the
transform_image_metadata prefilter to select host with the correct model.
* Provide new compute ``COMPUTE_VIOMMU_MODEL_*`` capablity trait for each model
it supports in driver.
Implements: blueprint libvirt-viommu-device
Depends-On: https://review.opendev.org/c/openstack/os-traits/+/844336
Change-Id: I2caa1a9c473a2b11c287061280b4a78edb96f859
PCI devices which are allocated to instances can be removed from the
[pci]device_spec configuration or can be removed from the hypervisor
directly. The existing PciTracker code handle this cases by keeping the
PciDevice in the nova DB exists and allocated and issue a warning in the
logs during the compute service startup that nova is in an inconsistent
state. Similar behavior is now added to the PCI placement tracking code
so the PCI inventories and allocations in placement is kept in such
situation.
There is one case where we cannot simply accept the PCI device
reconfiguration by keeping the existing allocations and applying the new
config. It is when a PF that is configured and allocated is removed and
VFs from this PF is now configured in the [pci]device_spec. And vice
versa when VFs are removed and its parent PF is configured. In this case
keeping the existing inventory and allocations and adding the new inventory
to placement would result in placement model where a single PCI device
would provide both PF and VF inventories. This dependent device
configuration is not supported as it could lead to double consumption.
In such situation the compute service will refuse to start.
blueprint: pci-device-tracking-in-placement
Change-Id: Id130893de650cc2d38953cea7cf9f53af71ced93
During a normal update_available_resources run if the local provider
tree caches is invalid (i.e. due to the scheduler made an allocation
bumping the generation of the RPs) and the virt driver try to update the
inventory of an RP based on the cache Placement will report conflict,
the report client will invalidate the caches and the retry decorator
on ResourceTracker._update_to_placement will re-drive the top of the
fresh RP data.
However the same thing can happen during reshape as well but the retry
mechanism is missing in that code path so the stale caches can cause
reshape failures.
This patch adds specific error handling in the reshape code path to
implement the same retry mechanism as exists for inventory update.
blueprint: pci-device-tracking-in-placement
Change-Id: Ieb954a04e6aba827611765f7f401124a1fe298f3
We have many places where we implement singleton behavior for the
placement client. This unifies them into a single place and
implementation. Not only does this DRY things up, but may cause us
to initialize it fewer times and also allows for emitting a common
set of error messages about expected failures for better
troubleshooting.
Change-Id: Iab8a791f64323f996e1d6e6d5a7e7a7c34eb4fb3
Related-Bug: #1846820
This change removes return statement from rpc cast method calls.
As rpc cast are asynchronous, so doesn't return anything.
Change-Id: I766f64f2c086dd652bc28b338320cc94ccc48f1f
host arch in libvirt driver support
This is split 2 of 3 for the architecture emulation feature.
This implements emulated multi-architecture support through qemu
within OpenStack Nova.
Additional config variable check to pull host architecture into
hw_architecture field for emulation checks to be made.
Adds a custom function that simply performs a check for
hw_emulation_architecture field being set, allowing for core code to
function as normal while enabling a simple check to enable emulated
architectures to follow the same path as all multi-arch support
already established for physical nodes but instead levaraging qemu
which allows for the overall emulation.
Added check for domain xml unit test to strip arch from the os tag,
as it is not required uefi checks, and only leveraged for emulation
checks.
Added additional test cases test_driver validating emulation
functionality with checking hw_emulation_architecture against the
os_arch/hw_architecture field. Added required os-traits and settings
for scheduler request_filter.
Added RISCV64 to architecture enum for better support in driver.
Implements: blueprint pick-guest-arch-based-on-host-arch-in-libvirt-driver
Closes-Bug: 1863728
Change-Id: Ia070a29186c6123cf51e1b17373c2dc69676ae7c
Signed-off-by: Jonathan Race <jrace@augusta.edu>
A follow on patch will use this code to enforce the limits, this patch
provides integration with oslo.limit and a new internal nova API that is
able to enforce those limits.
The first part is providing a callback for oslo.limit to be able to count
the resources being used. We only count resources grouped by project_id.
For counting servers, we make use of the instance mappings list in the
api database, just as the existing quota code does. While we do check to
ensure the queued for delete migration has been completed, we simply
error out if that is not the case, rather than attempting to fallback to
any other counting system. We hope one day we can count this in
placement using consumer records, or similar.
For counting all other resource usage, they must refer to some usage
relating to a resource class being consumed in placement. This is similar
to how the count with placement variant of the existing placement code
works today. This is not restricted to RAM and VCPU, it is open to any
resource class that is known to placement.
The second part is the enforcement method, that keeps a similar
signature to the existing enforce_num_instnaces call that is use to
check quotas using the legacy quota system.
From the flavor we extract the current resource usage. This is
considered the simplest first step that helps us deliver Ironic limits
alongside all the existing RAM and VCPU limits. At a later date, we
would ideally get passed a more complete view of what resources are
being requested from placement.
NOTE: given the instance object doesn't exist when enforce is called, we
can't just pass the instance into here.
A [workarounds] option is also available for operators who need the
legacy quota usage behavior where VCPU = VCPU + PCPU.
blueprint unified-limits-nova
Change-Id: I272b59b7bc8975bfd602640789f80d2d5f7ee698
Add a pre-filter for requests that contain VNIC_TYPE_REMOTE_MANAGED
ports in them: hosts that do not have either the relevant compute
driver capability COMPUTE_REMOTE_MANAGED_PORTS or PCI device pools
with "remote_managed" devices are filtered out early. Presence of
devices actually available for allocation is checked at a later
point by the PciPassthroughFilter.
Change-Id: I168d3ccc914f25a3d4255c9b319ee6b91a2f66e2
Implements: blueprint integration-with-off-path-network-backends
autopep8 is a code formating tool that makes python code pep8
compliant without changing everything. Unlike black it will
not radically change all code and the primary change to the
existing codebase is adding a new line after class level doc strings.
This change adds a new tox autopep8 env to manually run it on your
code before you submit a patch, it also adds autopep8 to pre-commit
so if you use pre-commit it will do it for you automatically.
This change runs autopep8 in diff mode with --exit-code in the pep8
tox env so it will fail if autopep8 would modify your code if run
in in-place mode. This allows use to gate on autopep8 not modifying
patches that are submited. This will ensure authorship of patches is
maintianed.
The intent of this change is to save the large amount of time we spend
on ensuring style guidlines are followed automatically to make it
simpler for both new and old contibutors to work on nova and save
time and effort for all involved.
Change-Id: Idd618d634cc70ae8d58fab32f322e75bfabefb9d
When a compute node goes down and all instances on the compute node
are evacuated, allocation records about these instance are still left
in the source compute node until nova-compute service is again started
on the node. However if a compute node is completely broken, it is not
possible to start the service again.
In this situation deleting nova-compute service for the compute node
doesn't delete its resource provider record, and even if a user tries
to delete the resource provider, the delete request is rejected because
allocations are still left on that node.
This change ensures that remaining allocations left by successful
evacuations are cleared when deleting a nova-compute service, to avoid
any resource provider record left even if a compute node can't be
recovered. Migration records are still left in 'done' status to trigger
clean-up tasks in case the compute node is recovered later.
Closes-Bug: #1829479
Change-Id: I3ce6f6275bfe09d43718c3a491b3991a804027bd
The interface attach and detach logic is now fully adapted to the new
extended resource request format, and supports more than one request
group in a single port.
blueprint: qos-minimum-guaranteed-packet-rate
Change-Id: I73e6acf5adfffa9203efa3374671ec18f4ea79eb
This adds a force kwarg to delete_allocation_for_instance which
defaults to True because that was found to be the most common use case
by a significant margin during implementation of this patch.
In most cases, this method is called when we want to delete the
allocations because they should be gone, e.g. server delete, failed
build, or shelve offload. The alternative in these cases is the caller
could trap the conflict error and retry but we might as well just force
the delete in that case (it's cleaner).
When force=True, it will DELETE the consumer allocations rather than
GET and PUT with an empty allocations dict and the consumer generation
which can result in a 409 conflict from Placement. For example, bug
1836754 shows that in one tempest test that creates a server and then
immediately deletes it, we can hit a very tight window where the method
GETs the allocations and before it PUTs the empty allocations to remove
them, something changes which results in a conflict and the server
delete fails with a 409 error.
It's worth noting that delete_allocation_for_instance used to just
DELETE the allocations before Stein [1] when we started taking consumer
generations into account. There was also a related mailing list thread
[2].
Closes-Bug: #1836754
[1] I77f34788dd7ab8fdf60d668a4f76452e03cf9888
[2] http://lists.openstack.org/pipermail/openstack-dev/2018-August/133374.html
Change-Id: Ife3c7a5a95c5d707983ab33fd2fbfc1cfb72f676
There's only one driver now, which means there isn't really a driver at
all. Move the code into the manager altogether and avoid a useless layer
of abstraction.
Change-Id: I609df5b707e05ea70c8a738701423ca751682575
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
The new format of the resource_request field of the Neutron port allows
expressing not just request groups but also request global parameters
for the allocation candidate query. This patch adapts the neutron client
in nova to parse such parameters. Then transfer this information to the
scheduler to include it in the allocation candidate request.
It relies on previous patches that already extended the
RequestLevelParams ovo and the allocation candidate query generation.
Change-Id: Icb91f6429050a161f577d0ed94d4cd906d3da461
blueprint: qos-minimum-guaranteed-packet-rate
Extend the allocation candidate query generation to include same_subtree
query parameters based on the RequestSpec.
The same_subtree parameter is available since placement microversion
1.36 but we already bumped our minimum placement requirement to 1.36
in a previous patch.
Change-Id: I7447bbb49fcee5197b176aa64fd5f2c930f70fc0
blueprint: qos-minimum-guaranteed-packet-rate
To implement the usage of same_subtree query parameter in the
allocation candidate request first the minimum requires placement
microversion needs to be bumped from 1.35 to 1.36. This patch makes such
bump and update the related nova upgrade check. Later patches will
modify the query generation to include the same_subtree param to the
request.
Change-Id: I5bfec9b9ec49e60c454d71f6fc645038504ef9ef
blueprint: qos-minimum-guaranteed-packet-rate
There is a race condition in nova-compute with the ironic virt driver
as nodes get rebalanced. It can lead to compute nodes being removed in
the DB and not repopulated. Ultimately this prevents these nodes from
being scheduled to.
The issue being addressed here is that if a compute node is deleted by a
host which thinks it is an orphan, then the resource provider for that
node might also be deleted. The compute host that owns the node might
not recreate the resource provider if it exists in the provider tree
cache.
This change fixes the issue by clearing resource providers from the
provider tree cache for which a compute node entry does not exist. Then,
when the available resource for the node is updated, the resource
providers are not found in the cache and get recreated in placement.
Change-Id: Ia53ff43e6964963cdf295604ba0fb7171389606e
Related-Bug: #1853009
Related-Bug: #1841481
There are no longer any custom filters. We don't need the abstract base
class. Merge the code in and give it a more useful 'SchedulerDriver'
name.
Change-Id: Id08dafa72d617ca85e66d50b3c91045e0e8723d0
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
This hasn't had any users since we removed the 'ChanceScheduler' in
change I44f9c1cabf9fc64b1a6903236bc88f5ed8619e9e.
Change-Id: I5c009b2cf5d64d28c706795de06c0ea1dedf054b
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Follow-up on change I64dc67e2bacd7a6c86153db5ae983dfb54bd40eb by
removing additional code paths that are no longer relevant following the
removal of the 'USES_ALLOCATION_CANDIDATES' option. This is kept
separate from the aforementioned change to help keep both changes
readable.
Change-Id: I1d2b51f5dd2ca75eb565ca5242cfdb938868bff9
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>