
34903 Commits

Author SHA1 Message Date
Zuul 7096423b34 Merge "Reject AZ changes during aggregate add / remove host" 2024-05-09 20:17:32 +00:00
Zuul 5470dedd4d Merge "Fix device_type=lun with boot_index" 2024-05-09 17:32:28 +00:00
Zuul 67119b7de3 Merge "Avoid setting serial on raw LUN devices" 2024-05-08 18:36:11 +00:00
Zuul 114b8184e4 Merge "Make overcommit check for pinned instance pagesize aware" 2024-05-08 13:55:26 +00:00
Balazs Gibizer 3c0eadae0b Reject AZ changes during aggregate add / remove host
After this patch nova rejects the add host to aggregate API action
if the host has instances and the new aggregate for the host would
mean that these instances need to move from one AZ (even from the
default one) to another. Such AZ change is not implemented in nova
and currently leads to stuck instances.

Similarly nova will reject remove host from aggregate API action if the
host has instances and the aggregate removal would mean that the
instances need to change AZ.

Depends-On: https://review.opendev.org/c/openstack/tempest/+/821732

Change-Id: I19c4c6d34aa2cc1f32d81e8c1a52762fa3a18580
Closes-Bug: #1907775
2024-05-08 14:56:56 +02:00
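The rule described in the commit above can be sketched roughly as follows. This is a minimal illustration, not nova's implementation: `host_az`, `check_add_host`, `check_remove_host`, `AZChangeRejected`, and the dict-shaped aggregates are hypothetical, and the sketch assumes a host outside any AZ-bearing aggregate belongs to a default AZ named "nova".

```python
DEFAULT_AZ = "nova"  # assumption: name of the default AZ


class AZChangeRejected(Exception):
    """Hypothetical error: the aggregate change would move instances across AZs."""


def host_az(aggregates):
    """Return the AZ implied by a host's aggregates, or the default AZ."""
    for agg in aggregates:
        az = agg.get("availability_zone")
        if az:
            return az
    return DEFAULT_AZ


def _check(old_aggs, new_aggs, instance_count):
    old_az, new_az = host_az(old_aggs), host_az(new_aggs)
    # Only hosts that actually run instances are affected by an AZ change.
    if instance_count > 0 and old_az != new_az:
        raise AZChangeRejected(f"instances would move from {old_az} to {new_az}")
    return new_az


def check_add_host(current_aggs, new_agg, instance_count):
    """Validate adding a host to new_agg; return the resulting AZ."""
    return _check(current_aggs, current_aggs + [new_agg], instance_count)


def check_remove_host(current_aggs, agg, instance_count):
    """Validate removing a host from agg; return the resulting AZ."""
    remaining = [a for a in current_aggs if a is not agg]
    return _check(current_aggs, remaining, instance_count)
```

An empty host can change AZ freely in this sketch, while a host with instances is rejected whenever the resulting AZ differs, including a move out of or into the default AZ.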
Zuul 95bfa492e9 Merge "[ironic] Fix rebooting instance" 2024-05-08 01:10:34 +00:00
Dan Smith 2f0c340d39 Fix device_type=lun with boot_index
Right now we'll fail to calculate the boot order of a set of BDMs if
one of them is a device_type=lun. This fixes that and teaches us
that it's just a "hd" from qemu's perspective.

Closes-Bug: #2065084
Change-Id: Ic1340918738d503fc797c9373fe2e1dd16b27a09
2024-05-07 11:14:30 -07:00
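As a rough illustration of the fix above, boot order calculation can treat a `lun` device like any other hard disk. The mapping table and the `boot_order` helper below are hypothetical stand-ins, not nova's actual code; only the "lun is just hd" rule comes from the commit message.

```python
# Hypothetical mapping from BDM device_type to the boot device name
# used by the hypervisor; "lun" is treated as a plain hard disk.
BOOT_DEVICE_BY_TYPE = {
    "disk": "hd",
    "lun": "hd",
    "cdrom": "cdrom",
    "floppy": "fd",
}


def boot_order(bdms):
    """Return boot device names sorted by boot_index.

    BDMs with a missing or negative boot_index are not bootable
    and are skipped.
    """
    bootable = [
        bdm for bdm in bdms
        if bdm.get("boot_index") is not None and bdm["boot_index"] >= 0
    ]
    bootable.sort(key=lambda bdm: bdm["boot_index"])
    return [BOOT_DEVICE_BY_TYPE[bdm["device_type"]] for bdm in bootable]
```

Without the `"lun"` entry, the lookup in this sketch would raise `KeyError`, which mirrors the failure mode the commit describes.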
Dan Smith 575ff86a4f Avoid setting serial on raw LUN devices
Libvirt now enforces that device="lun" (i.e. raw device passthrough)
disks must not have the <serial> property set. We recently enabled
the ability to manage devices by alias instead of serial, but to
fully enable this use-case we need to avoid putting serial in the
XML to appease libvirt.

Related-Bug: #2065084
Change-Id: Ifa2df89f27e58e1e64ce046edeaf6e49a7c89490
2024-05-07 10:39:49 -07:00
Vasyl Saienko 0e766885f6 [ironic] Fix rebooting instance
The correct state for both hard and soft reboots is rebooting [0]

[0] https://github.com/openstack/openstacksdk/blob/master/openstack/baremetal/v1/node.py#L44

Closes-Bug: #2064826
Change-Id: I18e0352b3638872e85ce91a3cfcbbfddc812ab67
2024-05-07 20:39:31 +03:00
Fabian Wiesel 198805c7c5 Do not close returned image-chunk iterator & get_verifier early
The GlanceClientWrapper._get_verifier method may fail already on the
metadata, so we better call it early before we open files and start
downloads, which we then abort uncleanly.

This is also likely how bug #1948706 was triggered in the first place:
- The file gets opened
- _get_verifier fails *before* we even iterate over the data
- glance_utils.IterableWithLength won't close the underlying iterator.

The added close statement, now guarded with `may_close_iterator`, is
likely superfluous.

If we return the image chunk iterator, then we should rather not
close the underlying iterable, as it will kill the transfer.

Closes-Bug: #2053027
Change-Id: Ia247af39a96fbed90b027ad30158e66dd2f0bd5e
2024-04-25 13:58:12 +02:00
Zuul ca1db54f1b Merge "Fix: migration configuration with cpu_shared_set (libvirt part)" 2024-04-24 20:37:35 +00:00
Zuul 95b4ef6fa4 Merge "Fix: migration configuration with cpu_shared_set (object part)" 2024-04-24 19:58:56 +00:00
René Ribaud 43dadaeb90 Fix: migration configuration with cpu_shared_set (libvirt part)
Live migrating to a host with cpu_shared_set configured will now
update the VM's configuration accordingly.

Example: live migrating a VM from source host with cpu_shared_set=0,1
to destination host with cpu_shared_set=2,3 will now update the
VM configuration.
(<vcpu cpuset="0-1"> will be updated to <vcpu cpuset="2-3">).

Related-Bug: #1869804
Change-Id: I7c717503eba58088094fac05cb99b276af9a3460
2024-04-24 07:59:15 -07:00
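The example in the commit above (`cpuset="0-1"` becoming `cpuset="2-3"`) can be sketched with the standard library. The helpers `format_cpuset` and `update_vcpu_cpuset` are hypothetical and only illustrate the rewrite; they are not the functions nova uses.

```python
import xml.etree.ElementTree as ET


def format_cpuset(cpus):
    """Format CPU ids as a compact range string, e.g. {2, 3} -> "2-3"."""
    ranges = []
    for cpu in sorted(set(cpus)):
        if ranges and cpu == ranges[-1][1] + 1:
            ranges[-1][1] = cpu          # extend the current range
        else:
            ranges.append([cpu, cpu])    # start a new range
    return ",".join(
        str(lo) if lo == hi else f"{lo}-{hi}" for lo, hi in ranges
    )


def update_vcpu_cpuset(domain_xml, dst_cpu_shared_set):
    """Return domain XML with <vcpu cpuset=...> set for the destination."""
    root = ET.fromstring(domain_xml)
    vcpu = root.find("vcpu")
    if vcpu is not None and dst_cpu_shared_set:
        vcpu.set("cpuset", format_cpuset(dst_cpu_shared_set))
    return ET.tostring(root, encoding="unicode")
```

For instance, calling `update_vcpu_cpuset('<domain><vcpu cpuset="0-1">2</vcpu></domain>', {2, 3})` yields XML whose `<vcpu>` element carries `cpuset="2-3"`.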
René Ribaud 2c3f4f2da5 Fix: migration configuration with cpu_shared_set (object part)
Live migrating to a host with cpu_shared_set configured will now
update the VM's configuration accordingly.

Example: live migrating a VM from source host with cpu_shared_set=0,1
to destination host with cpu_shared_set=2,3 will now update the
VM configuration.
(<vcpu cpuset="0-1"> will be updated to <vcpu cpuset="2-3">).

This update adds a new field, dst_cpu_shared_set_info, to the
LibvirtLiveMigrateData object, which requires an increase in the
object's version. As a result, this patch cannot be backported.

Related-Bug: #1869804
Change-Id: I806da0958fe436c989e09a52ca6b6f1bbd25a865
2024-04-24 07:53:38 -07:00
Zuul e2ef2240b1 Merge "api: Remove FlavorManageController" 2024-04-22 19:58:38 +00:00
zhong.zhou f3eb76e57b Validate flavor image min ram when resize volume-backed instance
When resizing an instance, the new flavor may not meet the image's
minimum memory requirement. The resize ignores the image's minimum
memory limit, so it may succeed while the instance then fails to
start because its memory is too small to run the system.

Related-Bug: 2007968
Change-Id: I132e444eedc10b950a2fc9ed259cd6d9aa9bed65
2024-04-18 10:53:04 +08:00
zhong.zhou b434b42761 Regression test for bug 2007968
Related-Bug: 2007968
Change-Id: I9d5a3813cad3c9ee1c6097959a972cf8307795cd
2024-04-18 10:44:46 +08:00
Stephen Finucane e504b76508 api: Remove FlavorManageController
This is an odd child, registering standard REST operations as actions
(in the '/action' API sense of the term). There's no reason for this
delineation these days so simply remove it. This makes auto-generation
much easier down the road.

Change-Id: Ia45013fc988acb9517aea42c3caa1fa45d63892e
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
2024-04-12 00:56:21 +01:00
Zuul c199becf52 Merge "Refactor vf profile for PCI device" 2024-04-11 14:38:11 +00:00
Zuul 1bca24aeb0 Merge "Always delete NVRAM files when deleting instances" 2024-04-04 22:14:14 +00:00
Zuul 6bd99eb2ea Merge "Correctly reset instance task state in rebooting hard" 2024-03-20 13:34:22 +00:00
Stephen Finucane f14c16af82 Make overcommit check for pinned instance pagesize aware
When working on a fix for bug #1811870, it was noted that the check to
ensure pinned instances do not overcommit was not pagesize aware. This
means if an instance without hugepages boots on a host with a large
number of hugepages allocated, it may not get all of the memory
allocated to it. Put in concrete terms, consider a host with 1 NUMA
cell, 2 CPUs, 1G of 4k pages, and a single 1G page. If you boot a first
instance with 1 CPU, CPU pinning, 1G of RAM, and no specific page size,
the instance should boot successfully. An attempt to boot a second
instance with the same configuration should fail because there is only
the single 1G page available; however, this is not currently the case.
The reason this happens is because we currently have two tests: a first
that checks total (not free!) host pages and a second that checks free
memory but with no consideration for page size. The first check passes
because we have 1G worth of 4K pages configured and the second check
passes because we have the single 1G page.

Close this gap.

Change-Id: I74861a67827dda1ab2b8451967f5cf0ae93a4ad3
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
Closes-Bug: #1811886
2024-03-19 19:59:41 +00:00
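The scenario in the commit above can be worked through numerically. The sketch below is a simplified model, not nova's NUMA fitting code: `PagePool`, `old_check`, `pagesize_aware_check`, and `boot` are hypothetical names, and it assumes an instance with no explicit page size is backed by the smallest page size on the host.

```python
from dataclasses import dataclass

KIB_PER_GIB = 1024 * 1024


@dataclass
class PagePool:
    size_kb: int   # page size in KiB
    total: int     # pages configured on the host
    used: int = 0  # pages already consumed by instances

    @property
    def free(self):
        return self.total - self.used


def _smallest(pools):
    return min(pools, key=lambda p: p.size_kb)


def old_check(pools, mem_kb):
    """Two independent tests: total small pages, then free memory of any size."""
    small = _smallest(pools)
    total_small_ok = small.total * small.size_kb >= mem_kb
    free_any_kb = sum(p.free * p.size_kb for p in pools)
    return total_small_ok and free_any_kb >= mem_kb


def pagesize_aware_check(pools, mem_kb):
    """Single test: free pages of the size the instance will actually use."""
    small = _smallest(pools)
    return small.free * small.size_kb >= mem_kb


def boot(pools, mem_kb):
    """Consume small pages for an instance without an explicit page size."""
    small = _smallest(pools)
    small.used += mem_kb // small.size_kb
```

With 1G of 4k pages (262144 pages) plus one 1G page, both checks accept the first 1G instance. After it boots, `old_check` still accepts a second one because the free 1G page satisfies the size-blind memory test, while `pagesize_aware_check` rejects it because no 4k pages remain.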
Zuul 818f0cd4a3 Merge "Remove nova.wsgi module" 2024-03-19 19:42:00 +00:00
Zuul 3e358bc37c Merge "vgpu: Allow device_addresses to not be set" 2024-03-18 16:58:28 +00:00
Zuul e255323f46 Merge "libvirt: Cap with max_instances GPU types" 2024-03-18 12:31:30 +00:00
Zuul 45e5d213f8 Merge "Removed explicit call to delete attachment" 2024-03-14 10:47:58 +00:00
Zuul ef069d928a Merge "pwr mgmt: handle live migrations correctly" 2024-03-14 00:06:49 +00:00
Zuul b10cca0282 Merge "Reproducer test for live migration with power management" 2024-03-13 23:48:38 +00:00
Zuul 52a7d9cef9 Merge "pwr mgmt: make API into a per-driver object" 2024-03-13 23:48:29 +00:00
Zuul b59e1f8c00 Merge "Power on cores for isolated emulator threads" 2024-03-13 19:24:15 +00:00
Artom Lifshitz c1ccc1a316 pwr mgmt: handle live migrations correctly
Previously, live migrations completely ignored CPU power management.
This patch makes sure that we correctly:

* Power up the cores on the destination during pre_live_migration, as
  we need them powered up before the instance starts on the
  destination.
* If the live migration is successful, power down the vacated cores on
  the source.
* In case of a rollback, power down the cores previously powered up on
  pre_live_migration.

Closes-bug: 2056613
Change-Id: I787bd7807950370cd865f29b95989d489d4826d0
2024-03-11 14:21:27 -04:00
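The three cases listed in the commit above can be sketched as a small state machine. `FakeHost` and `live_migrate` are hypothetical test doubles used only to show the ordering of power operations; they do not reflect nova's driver interfaces.

```python
class FakeHost:
    """Hypothetical stand-in for a compute host tracking powered-up cores."""

    def __init__(self, name, online=()):
        self.name = name
        self.online = set(online)

    def power_up(self, cores):
        self.online |= set(cores)

    def power_down(self, cores):
        self.online -= set(cores)


def live_migrate(source, dest, src_cores, dst_cores, do_migrate):
    """Run do_migrate() with power handling on both hosts."""
    # pre_live_migration: the destination cores must be up before the
    # instance starts running there.
    dest.power_up(dst_cores)
    try:
        do_migrate()
    except Exception:
        # Rollback: undo the power-up done during pre_live_migration.
        dest.power_down(dst_cores)
        raise
    # Success: the source cores are now vacated.
    source.power_down(src_cores)
```

On success the source ends with its cores powered down and the destination with its cores up; on failure the destination cores are powered back down and the source is left untouched.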
Artom Lifshitz 1f5e3421ec Reproducer test for live migration with power management
Building on the previous patch's refactor, we can now do functional
testing of live migration with CPU power management. We quickly notice
that it's mostly broken, leaving the CPUs powered up on the source,
and not powering them up on the dest.

Related-bug: 2056613
Change-Id: Ib4de77d68ceeffbc751bca3567ada72228b750af
2024-03-11 12:10:36 -04:00
Zuul 671c4e0313 Merge "Reproducer for not powering on isolated emulator threads cores" 2024-03-11 15:53:10 +00:00
Zuul 3cb7329ad2 Merge "Add cpuset_reserved helper to instance NUMA topology" 2024-03-11 15:53:03 +00:00
Zuul 336b815a30 Merge "Reproducers for bug 1869804" 2024-03-11 14:20:46 +00:00
Artom Lifshitz 29dc044a7a pwr mgmt: make API into a per-driver object
We want to test power management in our functional tests in multinode
scenarios (ex: live migration).

This was previously impossible because all the methods in
nova.virt.libvirt.cpu.api were at the module level, meaning both
source and destination libvirt drivers would call the same method to
online and offline cores. This made it impossible to maintain distinct
core power state between source and destination.

This patch inserts a nova.virt.libvirt.cpu.api.API class, and gives
the libvirt driver a cpu_api attribute with an instance of that
class. Along with the tiny API.core() helper, this allows new
functional tests in the subsequent patches to stub out the core
"model" code with distinct objects on the source and destination
libvirt drivers, and enables a whole bunch of testing (and fixes!)
around live migration.

Related-bug: 2056613
Change-Id: I052535249b9a3e144bb68b8c588b5995eb345b97
2024-03-08 20:31:42 -05:00
Artom Lifshitz 0986d2bbe8 Power on cores for isolated emulator threads
Previously, with the `isolate` emulator threads policy and libvirt cpu
power management enabled, we did not power on the cores to which the
emulator threads were pinned. Start doing that, and don't forget to power
them down when the instance is stopped.

Closes-bug: 2056612
Change-Id: I6e5383d8a0bf3f0ed8c870754cddae4e9163b4fd
2024-03-08 20:31:34 -05:00
Artom Lifshitz 521af26209 Reproducer for not powering on isolated emulator threads cores
Related-bug: 2056612
Change-Id: Icd586cdd015143b2e113fd14904f40410809d247
2024-03-08 20:31:30 -05:00
Artom Lifshitz 8dbfecd663 Add cpuset_reserved helper to instance NUMA topology
When we pin emulator threads with the `isolate` policy, those pins are
stored in the `cpuset_reserved` field in each NUMACell. In subsequent
patches we'll need those pins for the whole instance, so this patch
adds a helper property that does this for us, similar to how the
`cpu_pinning` property helper currently works.

Related-bug: 2056612
Change-Id: I8597f13e8089106434018b94e9bbc2091f95fee9
2024-03-08 20:31:19 -05:00
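A helper of the kind described in the commit above could look roughly like this. The `Cell` and `Topology` dataclasses are hypothetical simplifications of the per-cell and per-instance NUMA objects; only the `cpuset_reserved` field name comes from the commit message.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Cell:
    """Hypothetical per-NUMA-cell record."""
    cpuset_reserved: Optional[Set[int]] = None


@dataclass
class Topology:
    """Hypothetical instance NUMA topology made of cells."""
    cells: List[Cell] = field(default_factory=list)

    @property
    def cpuset_reserved(self) -> Set[int]:
        """Union of the reserved (emulator thread) pins across all cells."""
        reserved: Set[int] = set()
        for cell in self.cells:
            # A cell without isolated emulator threads has nothing reserved.
            if cell.cpuset_reserved:
                reserved |= cell.cpuset_reserved
        return reserved
```

Callers that need every reserved core for the instance, for example to power them on, can then read one property instead of iterating over the cells themselves.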
Zuul 13ccaf75f6 Merge "Implement add_consumer, remove_consumer KeyManager APIs" 2024-03-06 12:53:46 +00:00
Zuul 6230018d65 Merge "Disconnecting volume from the compute host" 2024-03-05 19:36:40 +00:00
Sylvain Bauza d445eaf9dd vgpu: Allow device_addresses to not be set
Sometimes a GPU may have a long list of PCI addresses (say, an SR-IOV
GPU), or operators may have a long list of GPUs. To make their lives
easier, let's allow device_addresses to be optional.

This means that a valid configuration could be:

    [devices]
    enabled_mdev_types = nvidia-35, nvidia-36

    [mdev_nvidia-35]

    [mdev_nvidia-36]

NOTE(sbauza): we have a slight coverage gap for testing what happens
if the groups aren't set, but I'll add it in a next patch

Related-Bug: #2041519
Change-Id: I73762a0295212ee003db2149d6a9cf701023464f
2024-03-05 11:48:25 +01:00
Sylvain Bauza 60851e4464 libvirt: Cap with max_instances GPU types
We want to cap the maximum number of mdevs we can create.
If some type has enough capacity, then other GPUs won't be used and
existing ResourceProviders would be deleted.

Closes-Bug: #2041519
Change-Id: I069879a333152bb849c248b3dcb56357a11d0324
2024-03-05 11:48:19 +01:00
Zuul 39de10777b Merge "Add support for showing requested az in output" 2024-03-01 20:39:00 +00:00
Zuul 9675f142b0 Merge "testing: Add ephemeral encryption support to fixtures" 2024-03-01 20:05:27 +00:00
Zuul dac8bd2493 Merge "libvirt: make <encryption> a sub element of <source>" 2024-03-01 20:05:16 +00:00
Zuul 91ec918ee7 Merge "Add hw_ephemeral_encryption_secret_uuid image property" 2024-03-01 20:05:01 +00:00
Rajesh Tailor c98c8d84ee Add support for showing requested az in output
As of now, the server show and server list --long output
shows the availability zone, that is, the AZ to which the
host of the instance belongs. There is no way to tell from
this information if the instance create request included an
AZ or not.

This change adds a new API microversion to add support for
including availability zone requested during instance create
in server show and server list --long responses.

Change-Id: If4cf09c1006a3f56d243b9c00712bb24d2a796d3
2024-03-01 21:39:04 +05:30
Amit Uniyal a1a07e0d2d Refactor vf profile for PCI device
In general, card_serial_number will not be present on SR-IOV
VFs/PFs; it is only supported on very new cards.
Also, all three need not always be required for vf_profile.

Related-Bug: #2008238
Change-Id: I00b126635612ace51b5e3138afcb064f001f1901
2024-03-01 15:28:25 +00:00
Zuul 1c903ccc8d Merge "Fix nova-metadata-api for ovn dhcp native networks" 2024-03-01 12:34:52 +00:00