With Libvirt v8.7.0+, the <maxphysaddr> sub-element
of the <cpu> element specifies the number of vCPU
physical address bits [1].
[1] https://libvirt.org/news.html#v8-7-0-2022-09-01
New flavor extra_specs and image properties are added to
control the physical address bits of vCPUs in Libvirt guests.
The nova-scheduler requests COMPUTE_ADDRESS_SPACE_* traits
based on them. The traits are already defined in os-traits
v2.10.0. Numerical comparisons are also performed in both the compute
capabilities filter and the image properties filter.
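For example (the extra spec names below are assumptions based on this
blueprint, shown for illustration only):

    openstack flavor set <flavor_id> \
        --property hw:maxphysaddr_mode=emulate \
        --property hw:maxphysaddr_bits=42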
blueprint: libvirt-maxphysaddr-support-caracal
Change-Id: I98968f6ef1621c9fb4f682c119038e26d62ce381
Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
This change adds the pre-commit config and
tox targets to run codespell both independently
and via the pep8 target.
This change also corrects all remaining typos in the
codebase as detected by codespell.
Change-Id: Ic4fb5b3a5559bc3c43aca0a39edc0885da58eaa2
The CPU power management feature of the libvirt driver, enabled with
[libvirt]cpu_power_management, only manages dedicated CPUs and does not
touch shared CPUs. Today nova-compute refuses to start if configured
with [libvirt]cpu_power_management=true and [compute]cpu_dedicated_set=None.
While this is not functionally limiting, it does prevent enabling power
management and defining the cpu_dedicated_set independently. E.g. there
might be a need to enable the former in
the whole cloud in a single step, while not all nodes of the cloud will
have dedicated CPUs configured.
This patch removes the strict config check. The implementation already
handles each PCPU individually, so if the list of PCPUs is empty it
simply does nothing.
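For example, the following configuration is now accepted (a sketch based
on the options named above); power management is simply a no-op until a
dedicated set is defined:

    [libvirt]
    cpu_power_management = true

    [compute]
    # cpu_dedicated_set intentionally left unset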
Closes-Bug: #2043707
Change-Id: Ib070e1042c0526f5875e34fa4f0d569590ec2514
If cpu_power_management_strategy is "cpu_state" and CPU0 is in the
dedicated set, we should just ignore it whenever we go to manage the
state. Since CPU0 cannot be powered off, but may be otherwise suitable
for the dedicated set, we can just skip it whenever we would normally
go to power it up or down.
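A minimal sketch of the intended behaviour (the helper name is
illustrative, not the actual driver code):

    def manageable_dedicated_cpus(dedicated_set):
        """Return the dedicated CPUs whose power state we may manage.

        CPU0 can never be powered off, so it is silently skipped.
        """
        return {cpu for cpu in dedicated_set if cpu != 0}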
Change-Id: I995c0953b361c7016bd77482fa2e2f276d239828
Fixes-Bug: #2038840
This adds 'debug' level messages at the branches of the function that
lead to a 'False' result. These branches are:
- Physnet found affinity on a NUMA cell outside the chosen ones
- Tunneled networks found affinity on a NUMA cell outside the
chosen ones
Partial-Bug: #1751784
Change-Id: I4d45f383b3c4794f8a114047455efb764f60f2a2
Previously, in numa_usage_from_instance_numa(), any new NUMACell
objects we created did not have the `socket` attribute. In some cases
this was persisted all the way down to the database. Fix this by
copying `socket` from the old_cell.
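A minimal sketch of the fix (the field list is trimmed; the real NUMACell
has more attributes):

    from nova import objects

    def updated_cell_from(old_cell, cpu_usage, memory_usage):
        return objects.NUMACell(
            id=old_cell.id,
            socket=old_cell.socket,  # previously omitted, so socket was lost
            cpu_usage=cpu_usage,
            memory_usage=memory_usage)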
Change-Id: I9ed3c31ccd3220b02d951fc6dbc5ea049a240a68
Closes-Bug: 1995153
Currently Nova produces an ambiguous error when a volume-backed instance
is started using a flavor with the hw:mem_encryption extra spec:
ImageMeta doesn't contain a name if it represents a Cinder volume.
This fix slightly changes the steps used to get image_meta.name for
some MemEncryption-related checks where it could make any
difference.
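A minimal sketch of the adjusted lookup (hypothetical helper name; the
real checks live in nova.virt.hardware):

    def _mem_encryption_image_name(image_meta):
        # A Cinder-volume-backed ImageMeta may not have 'name' set, so fall
        # back to the image/volume id to keep the error message meaningful.
        if image_meta.obj_attr_is_set('name') and image_meta.name:
            return image_meta.name
        return image_meta.id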
Closes-bug: #2006952
Change-Id: Ia69e7cb18cd862f01ecfdbdc358c87af1ab8fbf6
When cpu_policy is mixed the scheduler tries to find a valid CPU pinning
for each instance NUMA cell. However if there is an instance NUMA cell
that does not request any pinned CPUs then such logic will calculate
empty pinning information for that cell. Then the scheduler logic
wrongly assumes that an empty pinning result means there was no valid
pinning. However, there is a difference between a None result, meaning
no valid pinning was found, and an empty result [], which means there
was nothing to pin.
This patch makes sure that pinning == None is differentiated from
pinning == [].
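A minimal sketch of the distinction (illustrative names):

    def cell_pinning_valid(pinning):
        # None -> no valid pinning could be found for the cell: reject
        # []   -> the cell requested no pinned CPUs: nothing to pin, accept
        return pinning is not None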
Closes-Bug: #1994526
Change-Id: I5a35a45abfcfbbb858a94927853777f112e73e5b
The stats module is used to decide if the InstancePCIRequests of a boot
request can fit to a given compute host and to decide which PCI device
pool can fulfill the requests. It is used both during scheduling and
during the PCI claim.
PCI devices are now modelled in placement and the allocation_candidates
query now requests PCI resources, therefore each allocation candidate
returned from placement already restricts which PCI devices can be used
in the PciPassthroughFilter, the NumaTopologyFilter, and the PCI claim
code paths. This patch adapts the stats module to consider the PCI
allocation candidate or the already made placement PCI allocation when
filtering the PCI device pools.
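A heavily simplified sketch of the idea (illustrative names, not the real
stats module API): only keep device pools that belong to the resource
providers named in the allocation candidate or allocation.

    def filter_pools_by_rp(pools, allowed_rp_uuids):
        return [pool for pool in pools if pool.get('rp_uuid') in allowed_rp_uuids]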
blueprint: pci-device-tracking-in-placement
Change-Id: If363981c4aeeb09a96ee94b140070d3d0f6af48f
This change adds a new hw:locked_memory extra spec and hw_locked_memory
image property to control locking guest memory so it cannot be swapped
out.
This change also adds docs, extends the flavor validators for the new
extra spec, and adds the new image property.
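For example (illustrative usage):

    openstack flavor set <flavor_id> --property hw:locked_memory=true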
Blueprint: libvirt-viommu-device
Change-Id: Id3779594f0078a5045031aded2ed68ee4301abbd
This change starts the process of wiring up the new ephemeral encryption
control mechanisms in the compute layer. The initial step is to
ensure the BlockDeviceMapping objects are correctly updated with the
required ephemeral encryption details when requested through the
instance flavor extra specs or image metadata properties.
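For illustration, assuming the extra spec names used by this series:

    openstack flavor set <flavor_id> \
        --property hw:ephemeral_encryption=true \
        --property hw:ephemeral_encryption_format=luks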
Change-Id: Id49cb238f7bbf2b97f018ddbe090ebdc08d762dc
The numa_fit_instance_to_host algorithm tries all the possible
host cell permutations to fit the instance cells. So in the worst case
it makes n! / (n-k)! _numa_fit_instance_cell calls (n=len(host_cells),
k=len(instance_cells)) to determine whether the instance can fit on the
host. With a 16 NUMA node host and an 8 NUMA node guest this means ~500
million calls to _numa_fit_instance_cell. This takes an excessive amount
of time.
However, across these permutations many (host_cell, instance_cell) pairs
are tried repeatedly.
E.g.
host_cells=[H1, H2, H3]
instance_cells=[G1, G2]
Produces pairings:
* H1 <- G1 and H2 <- G2
* H1 <- G1 and H3 <- G2
...
Here G1 is checked against H1 twice. But if it does not fit the first
time then we know that it will not fit the second time either. So we can
cache the result of the first check and use it for the later
permutations.
This patch adds two caches to the algorithm. A fit_cache to hold the
(host_cell.id, instance_cell.id) pairs that we know fit, and a
no_fit_cache for those pairs that we already know do not fit.
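A minimal sketch of the two caches (illustrative; the real code still
needs the concrete fitted cell when a pair fits, this only shows caching
of the yes/no answer):

    fit_cache = set()     # (host_cell.id, instance_cell.id) pairs that fit
    no_fit_cache = set()  # pairs that are known not to fit

    def cell_pair_fits(host_cell, instance_cell, fit_fn):
        key = (host_cell.id, instance_cell.id)
        if key in no_fit_cache:
            return False
        if key in fit_cache:
            return True
        fits = fit_fn(host_cell, instance_cell) is not None
        (fit_cache if fits else no_fit_cache).add(key)
        return fits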
This change significantly boosts the performance of the algorithm. The
reproduction provided in bug 1978372 took 6 minutes to run on my local
machine without the optimization. With the optimization it ran in
3 seconds.
This change increases the memory usage of the algorithm with the two
caches. Those caches are sets of integer two-tuples, and their total size
is at most the number of possible (host_cell, instance_cell) pairs, which
is len(host_cells) * len(instance_cells). So from the above example (16
host, 8 instance NUMA cells) that is 128 pairs of integers in the cache.
That will not cause a significant memory increase.
Closes-Bug: #1978372
Change-Id: Ibcf27d741429a239d13f0404348c61e2668b4ce4
Cells below refer to NUMA cells.
By default, an instance's first cell is placed on the host's cell with
id 0, so that cell is exhausted first. Then the host's cell with id 1 is
used and exhausted. This leads to an error when placing an instance whose
NUMA topology has as many cells as the host, if some instances with a
one-cell topology were previously placed on cell 0. The fix performs
several sorts to put less-used cells at the beginning of the host_cells
list, based on PCI devices, memory and CPU usage, when
packing_host_numa_cells_allocation_strategy is set to False (the
so-called 'spread strategy'), or tries to place all of a VM's cells on
the same host cell until it is completely exhausted and only then starts
to use the next available host cell (the so-called 'pack strategy'), when
the configuration option packing_host_numa_cells_allocation_strategy is
set to True.
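For example, to get the 'spread' behaviour (a sketch; assuming the option
lives in the [compute] group):

    [compute]
    packing_host_numa_cells_allocation_strategy = false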
Partial-Bug: #1940668
Change-Id: I03c4db3c36a780aac19841b750ff59acd3572ec6
Virtually all of the code for parsing 'hw:'-prefixed extra specs and
'hw_'-prefixed image metadata properties lives in the 'nova.virt.hardware'
module. It makes sense for these to be included there. Do that.
Change-Id: I1fabdf1827af597f9e5fdb40d5aef244024dd015
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
This mirrors the 'hw_vif_multiqueue_enabled' image metadata property.
Providing a way to set this via flavor extra specs allows admins to
enable this by default and easily enable it for existing instances
without the need to rebuild (a destructive operation).
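For illustration, mirroring the image property name:

    openstack flavor set <flavor_id> --property hw:vif_multiqueue_enabled=true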
Note that, in theory at least, the image import workflow provided by
glance should allow admins to enable this by default, but the legacy
image create workflow does not allow this and admins cannot really
control which API end users use when uploading their own images.
Also note that we could provide this behavior using a host-level
configuration option. This would be similar to what we do for other
attributes such as machine type ('hw_machine_type' image meta prop or
'[libvirt] hw_machine_type' config option) or pointer model
('hw_pointer_model' image meta prop or '[compute] pointer_model' config
option) and would be well suited to things that we don't expect to
change, such as enabling multiqueue (it's a sensible default). However,
we would need to start storing this information in system_metadata, like
we do for machine type (since Wallaby) to prevent things changing over
live migration. We have also started avoiding host-level config options
for things like this, since one must ensure that the value configured is
consistent across hosts to avoid behavior that varies depending on the
host the guest is initially created on.
Change-Id: I405d0324abe32b31a434105cf2c104876fe9c127
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Nova has never supported specifying per-NUMA-node CPU topologies.
Logically, the CPU topology of a guest is independent of its NUMA
topology and there is no way to model different CPU topologies per NUMA
node or implement that in hardware.
The code in nova that allowed the generation of these invalid
configurations has now been removed, as it broke the automatic selection
of CPU topologies based on the hw:max_[cpus|sockets|threads] flavor and
image properties.
This change removes the incorrect code and the related unit tests that
asserted nova could generate invalid topologies.
Closes-Bug: #1910466
Change-Id: Ia81a0fdbd950b51dbcc70c65ba492549a224ce2b
This change resolves bug #1928063 by replacing the use of
image_meta.name with image_meta.id as
I55d66c3a6cbd50da90065f4a58f77b5cd29ce9ea should ensure it is always
available. The removal of other references to image_meta.name within
virt.hardware is left for follow ups to keep this change small and
backportable.
Closes-Bug: #1928063
Change-Id: I66299e97bdb5b95e149b1780231a1c1bbdbd9865
This continues on from I81fec10535034f3a81d46713a6eda813f90561cf and
removes all other references to 'instance_type' where it's possible to
do so. The only things left are DB columns, o.vo fields, some
unversioned objects, and RPC API methods. If we want to remove these, we
can but it's a lot more work.
Change-Id: I264d6df1809d7283415e69a66a9153829b8df537
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Start parsing this property from image metadata properties and flavor
extra specs.
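For illustration (assuming the property name proposed by the blueprint):

    openstack flavor set <flavor_id> --property os:secure_boot=required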
Blueprint: allow-secure-boot-for-qemu-kvm-guests
Change-Id: If65a7a63e9ee04270e32602682744175ba2b3a38
Signed-off-by: Kashyap Chamarthy <kchamart@redhat.com>
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Collected on my journey through these modules.
Blueprint: allow-secure-boot-for-qemu-kvm-guests
Change-Id: Iaf48399a21f31f4e0e1730b6eca3cf2accd44780
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Replace six.text_type with str.
A subsequent patch will replace other six.text_type.
Change-Id: I23bb9e539d08f5c6202909054c2dd49b6c7a7a0e
Implements: blueprint six-removal
Signed-off-by: Takashi Natsume <takanattie@gmail.com>
If using the 'mixed' CPU policy then, by design, the instance is
consuming both "shared" and "dedicated" host CPUs. However, we were only
checking the latter. Fix this.
Change-Id: Ic21c918736e7046ad32a2b8a3330496ce42950b0
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Closes-Bug: #1898272
I'm not sure why these weren't included in the series, but they help
prove out what this is doing.
This highlights a small "bug" in the code, whereby the topology object
associated with the instance's NUMA cell doesn't have the correct number
of CPUs. This has no adverse effects since that attribute isn't actually
used except to indicate a minimum thread count necessary for the cell,
but it is wrong, so we fix it all the same.
Some tests are renamed now that we're testing more than one "legacy"
policy.
Part of blueprint use-pcpu-and-vcpu-in-one-instance
Change-Id: I30a040d549f48d53cab2c59c00bb269f821ace88
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Remove six.PY2 and six.PY3.
Subsequent patches will replace other six usages.
Change-Id: Iccce0ab50eee515e533ab36c8e7adc10cb3f7019
Implements: blueprint six-removal
Signed-off-by: Takashi Natsume <takanattie@gmail.com>
Attempting to boot an instance with 'hw:cpu_policy=dedicated' will
result in a request from nova-scheduler to placement for allocation
candidates with $flavor.vcpu 'PCPU' inventory. Similarly, booting an
instance with 'hw:cpu_thread_policy=isolate' will result in a request
for allocation candidates with 'HW_CPU_HYPERTHREADING=forbidden', i.e.
hosts without hyperthreading. This has been the case since the
cpu-resources feature was implemented in Train. However, as part of that
work and to enable upgrades from hosts that predated Train, we also make
a second request for candidates with $flavor.vcpu 'VCPU' inventory. The
idea behind this is that old compute nodes would only report 'VCPU' and
should be useable, and any new compute nodes that got caught up in this
second request could never actually be scheduled to since there wouldn't
be enough cores from 'ComputeNode.numa_topology.cells.[*].pcpuset'
available to schedule to, resulting in rejection by the
'NUMATopologyFilter'. However, if a host was rejected in the first
query because it reported the 'HW_CPU_HYPERTHREADING' trait, it could
get picked up by the second query and would happily be scheduled to,
resulting in an instance consuming 'VCPU' inventory from a host that
properly supported 'PCPU' inventory.
The solution is simple, though also a huge hack. If we detect that the
host is using new style configuration and should be able to report
'PCPU', check if the instance asked for no hyperthreading and whether
the host has it. If all are True, reject the request.
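A minimal sketch of the guard (illustrative names, not the filter's
actual variables):

    def reject_vcpu_fallback_candidate(host_reports_pcpu, host_has_smt,
                                       isolate_requested):
        # A new-style host (reports PCPU) can only appear here via the VCPU
        # fallback query, so honour the 'isolate' thread policy and reject
        # the host when it has hyperthreading.
        return host_reports_pcpu and host_has_smt and isolate_requested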
Change-Id: Id39aaaac09585ca1a754b669351c86e234b89dd9
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Closes-Bug: #1889633
Before, realtime CPUs could only be combined with dedicated CPUs
in a 'dedicated' policy instance. This patch adds support for creating
an instance in which realtime CPUs are mixed with shared CPUs under the
'mixed' CPU allocation policy.
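For illustration (an assumed combination of the existing realtime extra
specs with the new policy):

    openstack flavor set <flavor_id> \
        --property hw:cpu_policy=mixed \
        --property hw:cpu_realtime=yes \
        --property hw:cpu_realtime_mask=2-3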
Part of blueprint use-pcpu-and-vcpu-in-one-instance
Change-Id: Iad7864bf375341ef065bfec229a059e444c910e2
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
Enable the 'hw:cpu_dedicated_mask' flavor extra spec interface; users
can create a mixed-CPU instance through a flavor with the following
extra spec settings:
    openstack flavor set <flavor_id> \
        --property hw:cpu_policy=mixed \
        --property hw:cpu_dedicated_mask=0-3,7
In a topic coming later, we'll introduce another way to create a
mixed instance through the real-time interface.
Part of blueprint use-pcpu-and-vcpu-in-one-instance
Change-Id: I2a3311c08a52eb11859c68ef940a0bd755a94c6b
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
Add support for the 'hw:tpm_version' and 'hw:tpm_model' flavor extra
specs along with the equivalent image metadata properties. These are
picked up by the scheduler and transformed into trait requests. This is
effectively a no-op for now since we don't yet have a driver that
reports these traits.
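For illustration:

    openstack flavor set <flavor_id> \
        --property hw:tpm_version=2.0 \
        --property hw:tpm_model=tpm-tis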
Part of blueprint add-emulated-virtual-tpm
Change-Id: I8645c31b4ecb18afea592b2a5b360b0165626009
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
There are two changes to our XML element generation code needed for vTPM
support: the '<secret>' element needs to gain support for the 'vtpm'
usage type [1], and a wholly new '<tpm>' element [2] needs to be added.
None of this is actually used yet, outside of tests. That will come
later.
Part of blueprint add-emulated-virtual-tpm
[1] https://libvirt.org/formatsecret.html
[2] https://libvirt.org/formatdomain.html#elementsTpm
Change-Id: I8b7d2d45963160bc8acf19f36f7945c7545570b3
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
A mixed instance has two types of CPUs, shared ones and dedicated ones,
whose usage is tracked in different ways. Shared CPU usage is recorded
against the shared CPU pool, while the dedicated CPUs are already
recorded in 'InstanceNUMACell.cpu_pinning' when the
'InstanceNUMACell.pin' method is called.
This patch enables the usage tracking of the shared CPUs in the
mixed instance.
Part of blueprint use-pcpu-and-vcpu-in-one-instance
Change-Id: I7a31722f1628f47126bb2014555107fffb58aec6
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
The code setting 'dedicated_cpus' for a 'dedicated' policy instance does
not belong in the CPU policy sanity check, so move it out of the sanity
check block.
Also remove a note saying 'get_dedicated_cpu_constraint' should be
public, which does not seem necessary.
This is a follow-up commit in addressing comments in this topic [1].
[1] https://review.opendev.org/#/c/716267/14/
Part of blueprint use-pcpu-and-vcpu-in-one-instance
Change-Id: I3fdfde36641dccfc9de60c25779e60fb9aabba82
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
Introduce a 'mixed' instance CPU allocation policy, which will be built
on by upcoming patches, for the purpose of creating an instance that
combines shared CPUs with dedicated or realtime CPUs.
In an instance mixed with different types of CPUs, a shared CPU shares
CPU time slots with other instances and may also have fewer or
un-guaranteed hardware resources, which means there is no guarantee for
the behavior of the workload running on it. If we call the shared CPU
the 'low priority' CPU, then the realtime or dedicated CPU could be
called the 'high priority' CPU: the user can assign more hardware CPU
resources or guaranteed resources to it so that the workload achieves
high performance or stable service quality.
Based on https://review.opendev.org/714704
Part of blueprint use-pcpu-and-vcpu-in-one-instance
Change-Id: I99cfee14bb105a8792651129426c0c5a3749796d
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
Introduce a 'pcpuset' field on the 'InstanceNUMACell' object to track
the instance's pinned CPUs. 'InstanceNUMACell.cpuset' is switched to
keep only the instance's unpinned CPUs. As a result, the vCPUs of a
dedicated instance are tracked in the NUMA cell object's 'pcpuset', and
the vCPUs of a shared instance are kept in the 'cpuset' field.
This introduces an object data migration for existing instances with the
'dedicated' CPU allocation policy: since all of their CPUs are 1:1
pinned with host CPUs, the content of 'InstanceNUMACell.cpuset' must be
cleared and moved to the 'InstanceNUMACell.pcpuset' field.
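A minimal sketch of the migration step (illustrative, not the actual
object code):

    def migrate_dedicated_cell(cell):
        # Older dedicated-policy cells kept their pinned CPUs in 'cpuset';
        # move them to the new 'pcpuset' field.
        if cell.cpu_policy == 'dedicated':
            cell.pcpuset = cell.cpuset
            cell.cpuset = set()
        return cell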
Part of blueprint use-pcpu-and-vcpu-in-one-instance
Change-Id: I901fbd7df00e45196395ff4c69e7b8aa3359edf6
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
If the end-user specifies a cpu_realtime_mask that does not begin
with a caret (i.e. it is not a purely-exclusion mask) it's likely
that they're expecting us to use the exact mask that they have
specified, not realizing that we default to all-vCPUs-are-RT.
Let's make nova's behaviour a bit more friendly by correctly
handling this scenario.
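To illustrate the intended semantics after this change (values are
examples only):

    hw:cpu_realtime_mask=^0-1   exclusion mask: all vCPUs except 0-1 are realtime
    hw:cpu_realtime_mask=2-3    explicit mask: exactly vCPUs 2-3 are realtime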
Note that the end-user impact of this is minimal/non-existent. As
discussed in bug #1884231, the only way a user could have used this
before would be if they'd configured an emulator thread and purposefully
set an invalid 'hw:cpu_realtime_mask'. In fact, they wouldn't have
been able to use this value at all if they used API microversion 2.86
(extra spec validation).
Part of blueprint use-pcpu-and-vcpu-in-one-instance
Change-Id: Id81859186de6fb6b728ad566a532244008fe77d0
Closes-Bug: #1688673