When deleting an instance, always send VIR_DOMAIN_UNDEFINE_NVRAM to
delete the NVRAM file, regardless of whether the image is of type UEFI.
This prevents a bug when rebuilding an instance from a UEFI image to a
non-UEFI image.
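A minimal sketch of the flag handling. The `delete_configuration()` helper is hypothetical, the domain is any object with an `undefineFlags(flags)` method (as libvirt's virDomain has), and the flag values mirror libvirt's `virDomainUndefineFlagsValues` enum. What matters is that the NVRAM flag no longer depends on the image type:

```python
# Values mirror libvirt's virDomainUndefineFlagsValues enum
# (assumption: check against the libvirt docs before relying on them).
VIR_DOMAIN_UNDEFINE_MANAGED_SAVE = 1 << 0
VIR_DOMAIN_UNDEFINE_NVRAM = 1 << 2


def delete_configuration(domain):
    """Hypothetical helper: undefine the domain and its NVRAM file.

    'domain' is anything with an undefineFlags(flags) method, such as a
    libvirt virDomain object.
    """
    # The NVRAM flag is sent unconditionally, whether or not the image
    # is UEFI, so the NVRAM file is removed regardless of image type.
    flags = VIR_DOMAIN_UNDEFINE_MANAGED_SAVE | VIR_DOMAIN_UNDEFINE_NVRAM
    domain.undefineFlags(flags)
    return flags
```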
Closes-Bug: #1997352
Change-Id: I24648f5b7895bf5d093f222b6c6e364becbb531f
Signed-off-by: Simon Hensel <simon.hensel@inovex.de>
This change adds the pre-commit config and
tox targets to run codespell both independently
and via the pep8 target.
This change corrects all the final typos in the
codebase as detected by codespell.
Change-Id: Ic4fb5b3a5559bc3c43aca0a39edc0885da58eaa2
This makes us attempt to first look up a disk device by alias using
the volume_uuid, before falling back to the old method of using the
guest target device name.
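The lookup order can be sketched as follows. This is a simplified stand-in, not the Nova code: disks are plain objects rather than parsed domain XML, `find_disk()` is a hypothetical helper, and the `ua-` alias prefix is an assumption based on libvirt requiring user-defined aliases to start with `ua-`.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Disk:
    target_dev: str               # e.g. "vdb"
    alias: Optional[str] = None   # e.g. "ua-<volume uuid>"


def find_disk(disks: Iterable[Disk], volume_uuid: str,
              target_dev: str) -> Optional[Disk]:
    """Hypothetical helper: prefer the alias, fall back to the target dev."""
    disks = list(disks)
    wanted_alias = "ua-" + volume_uuid  # assumed alias naming scheme
    for disk in disks:
        if disk.alias == wanted_alias:
            return disk
    # Fallback: the old behaviour of matching the guest target device name.
    for disk in disks:
        if disk.target_dev == target_dev:
            return disk
    return None
```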
Related to blueprint libvirt-dev-alias
Change-Id: I1dfe4ad3df81bc810835af9b09cfc6c06e9a5388
This handles the case where the live migration monitoring thread may
race and call jobStats() after the migration has completed, resulting in
the following error:
libvirt.libvirtError: internal error: migration was active, but no
RAM info was set
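A minimal sketch of the handling, assuming a stand-in exception in place of `libvirt.libvirtError` and a hypothetical `get_job_info()` wrapper. Matching on the message text alone is an assumption (the real check may also look at the error code); returning `None` models "no active job" because the migration has already completed:

```python
class FakeLibvirtError(Exception):
    """Stand-in for libvirt.libvirtError, which also has get_error_message()."""

    def get_error_message(self):
        return str(self)


RACE_MSG = "migration was active, but no RAM info was set"


def get_job_info(domain):
    """Return job stats, or None if the migration finished under our feet."""
    try:
        return domain.jobStats()
    except FakeLibvirtError as exc:
        if RACE_MSG in exc.get_error_message():
            # The monitor raced with migration completion: report no job
            # instead of failing the monitoring thread.
            return None
        raise
```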
Closes-Bug: #1982284
Change-Id: I77fdfa9cffbd44b2889f49f266b2582bcc6a4267
This change extends the guest xml parsing such that
the source device path can be extracted from interface
elements of type vdpa.
This is required to identify the interface to remove when
detaching a vdpa port from a domain.
This change fixes a latent bug in the libvirt fixture
related to the domain xml generation for vdpa interfaces.
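The parsing step can be sketched with the standard library's ElementTree. The XML snippet and the `vdpa_source_devs()` helper are illustrative assumptions, not Nova's LibvirtConfigGuestInterface code:

```python
import xml.etree.ElementTree as ET

DOMAIN_XML = """
<domain type='kvm'>
  <devices>
    <interface type='vdpa'>
      <mac address='fa:16:3e:11:22:33'/>
      <source dev='/dev/vhost-vdpa-0'/>
    </interface>
    <interface type='bridge'>
      <source bridge='br0'/>
    </interface>
  </devices>
</domain>
"""


def vdpa_source_devs(domain_xml):
    """Map each vdpa interface's MAC address to its source device path."""
    root = ET.fromstring(domain_xml)
    result = {}
    for iface in root.findall("./devices/interface[@type='vdpa']"):
        mac = iface.find("mac").get("address")
        result[mac] = iface.find("source").get("dev")
    return result
```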
Change-Id: I5f41170e7038f4b872066de4b1ad509113034960
This patch adds a workaround that can be enabled
to send an announce_self QEMU monitor command
post live-migration to send out RARP frames
that were lost due to port binding or flows not
being installed.
Please note that this marks the domain
in libvirt as tainted.
See bug [1] for previous information about
this issue.
[1] https://bugs.launchpad.net/nova/+bug/1815989
Change-Id: I7a6a6fe5f5b23e76948b59a85ca9be075a1c2d6d
Related-Bug: 1815989
This is a follow-up patch for I86153d31b02e6b74b42d53a6800297cbd0e5cbb4
to add type hints to the functions that were touched by the original
patch.
Change-Id: I332ea49184200fcaf8d1480da9658fcbb2f325c5
Related-Bug: #1882521
This patch fixes a couple of things that are needed to run mypy:
* fixed a wrong type hint in LibvirtDriver._add_vtpm_device,
_configure_guest_by_virt_type, and _conf_non_lxc signature
* fixed a local variable type hint in LibvirtDriver._create_guest_with_network
* added an assert to _create_guest_with_network as the guest local
variable can be None if we get eventlet.timeout.Timeout and
CONF.vif_plugging_is_fatal is False.
Change-Id: I42c579531bac61063a381598094720271364ec92
Nova so far applied a retry loop that tried to periodically detach the
device from libvirt while the device was visible in the domain xml. This
could lead to an issue where an already progressing detach on the
libvirt side is interrupted by nova re-sending the detach request for
the same device. See bug #1882521 for more information.
Also, if there was both a persistent and a live domain, nova tried to
detach from both in the same call. This led to confusion about the
result when such a call failed: had the detach partially failed?
We can do better, at least for the live detach case. According to the
libvirt developers, detaching from the persistent domain always
succeeds and is a synchronous process. Detaching from the live
domain can be either synchronous or asynchronous depending on the guest
OS and the load on the hypervisor. But for live detach libvirt always
sends an event [1] that nova can wait for.
So this patch does two things.
1) Separates the detach from the persistent domain from the detach from
the live domain to make the error cases clearer.
2) Changes the retry mechanism.
Detaching from the persistent domain is not retried. If libvirt
reports device not found while both persistent and live detach
are needed, the error is ignored and the process continues with
the live detach. In any other case the error is considered fatal.
Detaching from the live domain is changed to always wait for the
libvirt event. In case of timeout, the live detach is retried.
But a failure event from libvirt is considered fatal, based on the
information from the libvirt developers, so in this case the
detach is not retried.
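The control flow described above can be sketched as below. Everything here is a stand-in: the guest object, `DeviceNotFound`, `DetachFailed`, the `wait_for_event` callback (assumed to return `"removed"`, `"failed"`, or `None` on timeout) and the attempt count are hypothetical, not Nova's names or defaults.

```python
from typing import Callable, Optional


class DeviceNotFound(Exception):
    pass


class DetachFailed(Exception):
    pass


def detach_device(guest, dev: str, persistent: bool, live: bool,
                  wait_for_event: Callable[[str], Optional[str]],
                  max_attempts: int = 3) -> None:
    """Detach from the persistent config once, then from the live domain."""
    if persistent:
        try:
            # Persistent detach is synchronous and is never retried.
            guest.detach(dev, live=False)
        except DeviceNotFound:
            # Only tolerated when a live detach is still to follow.
            if not live:
                raise
    if not live:
        return
    for _ in range(max_attempts):
        guest.detach(dev, live=True)
        event = wait_for_event(dev)
        if event == "removed":
            return
        if event == "failed":
            # A failure event from libvirt is fatal: no retry.
            raise DetachFailed(f"libvirt reported failure detaching {dev}")
        # event is None: timed out waiting, so retry the live detach.
    raise DetachFailed(f"timed out detaching {dev}")
```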
Related-Bug: #1882521
[1] https://libvirt.org/html/libvirt-libvirt-domain.html#virConnectDomainEventDeviceRemovedCallback
Change-Id: I7f2b6330decb92e2838aa7cee47fb228f00f47da
At present QEMU will raise an error to libvirt when a device_del request
is made for a device that has already partially detached through a
previous request. This is outlined in more detail in the following
downstream Red Hat QEMU bug report:
Get libvirtError "Device XX is already in the process of unplug" [..]
https://bugzilla.redhat.com/show_bug.cgi?id=1878659
Within Nova we can actually ignore this error and allow our existing
retry logic to attempt again after a short wait, hopefully allowing the
original request to complete removing the device from the domain.
This change does this and should result in one of the subsequent
device_del requests raising a VIR_ERR_DEVICE_MISSING error from libvirt.
_try_detach_device should then translate that libvirt error into a
DeviceNotFound exception which is itself then ignored by all
detach_device_with_retry callers and taken to mean that the device has
detached successfully.
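The error handling can be sketched as a small classifier. The stand-in error class mimics the `get_error_code()`/`get_error_message()` accessors of `libvirt.libvirtError`; the string error codes and the `classify_detach_error()` name are illustrative assumptions, and the in-progress case is matched on message text only:

```python
class FakeLibvirtError(Exception):
    """Stand-in for libvirt.libvirtError."""

    def __init__(self, code, message):
        super().__init__(message)
        self._code = code
        self._message = message

    def get_error_code(self):
        return self._code

    def get_error_message(self):
        return self._message


class DeviceNotFound(Exception):
    pass


def classify_detach_error(exc):
    """Return 'retry' for an in-progress unplug; otherwise raise."""
    if exc.get_error_code() == "VIR_ERR_DEVICE_MISSING":
        # Translated into DeviceNotFound, which callers treat as success.
        raise DeviceNotFound(exc.get_error_message()) from exc
    if "already in the process of unplug" in (exc.get_error_message() or ""):
        # Ignore: the existing retry logic tries again after a short wait.
        return "retry"
    raise exc
```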
Closes-Bug: #1923206
Change-Id: I0e068043d8267ab91535413d950a3e154c2234f7
This patch adds from_persistent_config kwargs to get_interface_by_cfg()
and get_disk() so that the caller can specify which domain config the
device is read from. Currently, if there is both a live domain and a
persistent domain then nova only reads from the live domain. In a later
patch during device detach these calls will be used to detach from the
persistent domain separately from the live domain.
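A sketch of how such a keyword could select the config to read, assuming a stand-in domain whose `XMLDesc(flags)` behaves like libvirt's. `VIR_DOMAIN_XML_INACTIVE` (2 in libvirt's `virDomainXMLFlags`) requests the persistent definition instead of the live one; this `get_disk()` body is a simplified illustration, not Nova's:

```python
import xml.etree.ElementTree as ET

VIR_DOMAIN_XML_INACTIVE = 2  # mirrors libvirt's virDomainXMLFlags value


def get_disk(domain, target_dev, from_persistent_config=False):
    """Return the <disk> element for target_dev from the chosen config."""
    flags = VIR_DOMAIN_XML_INACTIVE if from_persistent_config else 0
    root = ET.fromstring(domain.XMLDesc(flags))
    for disk in root.findall("./devices/disk"):
        target = disk.find("target")
        if target is not None and target.get("dev") == target_dev:
            return disk
    return None
```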
Change-Id: I86153d31b02e6b74b42d53a6800297cbd0e5cbb4
Related-Bug: #1882521
Libvirt XML contains useful configuration information such as instance names,
flavors and images as metadata. This change extends this metadata to include
the IP addresses of the instances.
Example:
<metadata>
<nova:instance xmlns:nova="http://openstack.org/xmlns/libvirt/nova/1.1">
...
<nova:ports>
<nova:port uuid="567a4527-b0e4-4d0a-bcc2-71fda37897f7">
<nova:ip type="fixed" address="192.168.1.1" ipVersion="4"/>
<nova:ip type="fixed" address="fe80::f95c:b030:7094" ipVersion="6"/>
<nova:ip type="floating" address="11.22.33.44" ipVersion="4"/>
</nova:port>
</nova:ports>
...
</nova:instance>
</metadata>
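The `<nova:ports>` element in the example above can be generated with ElementTree. The input shape (a list of dicts) and the `build_ports_metadata()` helper are assumptions for illustration; the element and attribute names follow the example:

```python
import xml.etree.ElementTree as ET

NOVA_NS = "http://openstack.org/xmlns/libvirt/nova/1.1"
ET.register_namespace("nova", NOVA_NS)


def build_ports_metadata(ports):
    """Build <nova:ports> from [{'uuid': ..., 'ips': [{...}, ...]}, ...]."""
    ports_el = ET.Element(f"{{{NOVA_NS}}}ports")
    for port in ports:
        port_el = ET.SubElement(ports_el, f"{{{NOVA_NS}}}port",
                                uuid=port["uuid"])
        for ip in port["ips"]:
            ET.SubElement(port_el, f"{{{NOVA_NS}}}ip", type=ip["type"],
                          address=ip["address"],
                          ipVersion=str(ip["version"]))
    return ports_el
```

`ET.tostring(build_ports_metadata(...), encoding="unicode")` then yields the serialised fragment with the `nova` prefix.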
Change-Id: I45f1df4935905170957c2ea2496c8a698a7464a2
blueprint: libvirt-driver-ip-metadata
Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
Introduced by I32908b77c18f8ec08211dd67be49bbf903611c34 this was then
missed by the clean up of I8e349849db0b1a540d295c903f1470917b82fd97 that
actually bumped MIN_LIBVIRT_VERSION to 5.0.0 past the required 4.1.0 for
the original change. It's safe to remove this now ahead of another bump.
Change-Id: I181b3bf433b0fcef92ef4d430e9858506f24153c
Bug #1894804 outlines how DEVICE_DELETED events were often missing from
QEMU on Focal based OpenStack CI hosts as originally seen in bug
#1882521. This has eventually been tracked down to some undefined QEMU
behaviour when a new device_del QMP command is received while another is
still being processed, causing the original attempt to be aborted.
We hit this race in slower OpenStack CI envs as n-cpu rather crudely
retries attempts to detach devices using the RetryDecorator from
oslo.service. The default incremental sleep time is currently short
enough that QEMU is still processing the first device_del request
on these slower CI hosts when n-cpu asks libvirt to retry the detach,
sending another device_del to QEMU and hitting the above behaviour.
Additionally we have also seen the following check being hit when
testing with QEMU >= v5.0.0. This check now rejects overlapping
device_del requests in QEMU rather than aborting the original:
cce8944cc9
This change aims to avoid this situation entirely by raising the default
incremental sleep time between detach requests from 2 seconds to 10,
leaving enough time for the first attempt to complete. The overall
maximum sleep time is also increased from 30 to 60 seconds.
Future work will aim to entirely remove this retry logic with a libvirt
event driven approach, polling for the
VIR_DOMAIN_EVENT_ID_DEVICE_REMOVED and
VIR_DOMAIN_EVENT_ID_DEVICE_REMOVAL_FAILED events before retrying.
Finally, the cleanup of unused arguments in detach_device_with_retry is
left for a follow up change in order to keep this initial change small
enough to quickly backport.
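To see what the new defaults mean, here is a small model of the sleep schedule. It assumes the decorator adds `inc_sleep_time` after each failed attempt and caps the wait at `max_sleep_time`; this is a simplified model of oslo.service's `RetryDecorator`, not its code:

```python
def sleep_schedule(inc_sleep_time, max_sleep_time, retries):
    """Sleep before each retry under a linear-increment, capped model."""
    sleeps = []
    current = 0
    for _ in range(retries):
        current = min(current + inc_sleep_time, max_sleep_time)
        sleeps.append(current)
    return sleeps


print(sleep_schedule(2, 30, 5))   # old defaults: [2, 4, 6, 8, 10]
print(sleep_schedule(10, 60, 7))  # new defaults: [10, 20, 30, 40, 50, 60, 60]
```

Under this model the first retry comes 10 seconds after the initial attempt instead of 2, which is the window the change relies on for QEMU to finish the first device_del.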
Closes-Bug: #1882521
Related-Bug: #1894804
Change-Id: Ib9ed7069cef5b73033351f7a78a3fb566753970d
For attach:
* Generates InstancePciRequest for SRIOV interfaces attach requests
* Claims and allocates a PciDevice for such request
For detach:
* Frees PciDevice and deletes the InstancePciRequests
On the libvirt driver side the following small fixes were necessary:
* Fixes PCI address generation to avoid double 0x prefixes in LibvirtConfigGuestHostdevPCI
* Adds support for comparing LibvirtConfigGuestHostdevPCI objects
* Extends the comparison of LibvirtConfigGuestInterface to support
macvtap interfaces where target_dev is only known by libvirt but not
nova
* Generalizes guest.get_interface_by_cfg() to work with both
LibvirtConfigGuest[Interface|HostdevPCI] objects
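The double `0x` prefix fix amounts to normalising each hex address component before formatting it. A sketch with hypothetical `_hex()` and `format_pci_address()` helpers that accept hex strings with or without a prefix (Nova's LibvirtConfigGuestHostdevPCI code differs):

```python
def _hex(value: str) -> str:
    """Return a hex string with exactly one '0x' prefix."""
    value = value.lower()
    if value.startswith("0x"):
        value = value[2:]
    return "0x" + value


def format_pci_address(domain: str, bus: str, slot: str, function: str):
    """Build the attributes of a libvirt <address type='pci'> element."""
    return {
        "type": "pci",
        "domain": _hex(domain),
        "bus": _hex(bus),
        "slot": _hex(slot),
        "function": _hex(function),
    }
```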
Implements: blueprint sriov-interface-attach-detach
Change-Id: I67504a37b0fe2ae5da3cba2f3122d9d0e18b9481
This is a pure refactoring that moves the equality check of
LibvirtConfigGuestInterface objects from get_interface_by_cfg() into the
class itself.
Later this pattern will be extended to LibvirtConfigGuestHostdevPCI objects
to support detaching direct physical interfaces where PCI address based
equality is needed.
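Moving the comparison into the class lets callers simply use `==`. A minimal sketch of the pattern with a hypothetical, heavily reduced config class; the compared fields and the rule of ignoring `target_dev` when one side leaves it unset (as with macvtap devices named by libvirt) are illustrative assumptions:

```python
from typing import Optional


class GuestInterfaceConfig:
    """Reduced stand-in for LibvirtConfigGuestInterface."""

    def __init__(self, net_type: str, mac_addr: str,
                 source_dev: Optional[str] = None,
                 target_dev: Optional[str] = None):
        self.net_type = net_type
        self.mac_addr = mac_addr
        self.source_dev = source_dev
        self.target_dev = target_dev

    def __eq__(self, other):
        if not isinstance(other, GuestInterfaceConfig):
            return NotImplemented
        same = (self.net_type == other.net_type and
                self.mac_addr.lower() == other.mac_addr.lower() and
                self.source_dev == other.source_dev)
        # Only compare target_dev when both sides know it.
        if self.target_dev and other.target_dev:
            same = same and self.target_dev == other.target_dev
        return same
```

A lookup such as `get_interface_by_cfg()` then reduces to `next((i for i in interfaces if i == cfg), None)`.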
Part of blueprint sriov-interface-attach-detach
Change-Id: I6ce9457179f65f82ff721b315e51a67ebd879673
The VIR_MIGRATE_PARAM_PERSIST_XML parameter was introduced in libvirt
v1.3.4 and is used to provide the new persistent configuration for the
destination during a live migration:
https://libvirt.org/html/libvirt-libvirt-domain.html#VIR_MIGRATE_PARAM_PERSIST_XML
Without this parameter the persistent configuration on the destination
will be the same as the original persistent configuration on the source
when the VIR_MIGRATE_PERSIST_DEST flag is provided.
As Nova does not currently provide the VIR_MIGRATE_PARAM_PERSIST_XML
param but does provide the VIR_MIGRATE_PERSIST_DEST flag this means that
a soft reboot by Nova of the instance after a live migration can revert
the domain back to the original persistent configuration from the
source.
Note that this is only possible in Nova as a soft reboot actually
results in the virDomainShutdown and virDomainCreateWithFlags libvirt
APIs being called, which recreate the domain using the persistent
configuration. virDomainReboot does not result in this but is not
called at this time.
The impact of this on the instance after the soft reboot is
severe: host devices referenced in the original persistent configuration
on the source may not exist or could even be used by other users on the
destination. CPU and NUMA affinity could also differ drastically between
the two hosts resulting in the instance being unable to start etc.
As MIN_LIBVIRT_VERSION is now > v1.3.4 this change simply includes the
VIR_MIGRATE_PARAM_PERSIST_XML param using the same updated XML for the
destination as is already provided to VIR_MIGRATE_PARAM_DEST_XML.
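A sketch of the resulting migration parameters. The names are the string values libvirt documents for `VIR_MIGRATE_PARAM_DEST_XML` and `VIR_MIGRATE_PARAM_PERSIST_XML`; the `build_migrate_params()` helper is hypothetical. The resulting dict would be passed as the `params` argument of `virDomain.migrateToURI3()`:

```python
# String values of the libvirt typed-parameter names.
VIR_MIGRATE_PARAM_DEST_XML = "destination_xml"
VIR_MIGRATE_PARAM_PERSIST_XML = "persistent_xml"


def build_migrate_params(updated_xml, extra=None):
    """Use the same updated XML for both the live and persistent config."""
    params = dict(extra or {})
    params[VIR_MIGRATE_PARAM_DEST_XML] = updated_xml
    # Without this, VIR_MIGRATE_PERSIST_DEST persists the source's
    # original definition on the destination host.
    params[VIR_MIGRATE_PARAM_PERSIST_XML] = updated_xml
    return params
```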
Co-authored-by: Tadayoshi Hosoya <tad-hosoya@wr.jp.nec.com>
Closes-Bug: #1890501
Change-Id: Ia3f1d8e83cbc574ce5cb440032e12bbcb1e10e98
I7eb86edc130d186a66c04b229d46347ec5c0b625 introduced
VIR_ERR_DEVICE_MISSING into the hot unplug libvirt error code list
within detach_device_with_retry. While the change correctly referenced
that the error code was introduced in v4.1.0 it made no attempt to
handle versions prior to this. With MIN_LIBVIRT_VERSION currently pinned
to v4.0.0 we need to handle libvirt < v4.1.0 to avoid referencing the
non-existent error code within the libvirt module.
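A common way to reference a constant that older libvirt Python bindings may lack is `getattr()` with a default. In this sketch a `SimpleNamespace` stands in for the `libvirt` module, `not_found_codes()` is hypothetical, and the numeric codes are placeholders rather than libvirt's real values:

```python
from types import SimpleNamespace


def not_found_codes(libvirt_module):
    """Collect 'device not found' error codes, skipping missing constants."""
    codes = {libvirt_module.VIR_ERR_OPERATION_FAILED}
    missing = getattr(libvirt_module, "VIR_ERR_DEVICE_MISSING", None)
    if missing is not None:  # only present with libvirt >= v4.1.0
        codes.add(missing)
    return codes


old = SimpleNamespace(VIR_ERR_OPERATION_FAILED=1)  # placeholder values
new = SimpleNamespace(VIR_ERR_OPERATION_FAILED=1, VIR_ERR_DEVICE_MISSING=2)
```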
Closes-Bug: #1891547
Change-Id: I32908b77c18f8ec08211dd67be49bbf903611c34
Remove six.PY2 and six.PY3.
Subsequent patches will replace other six usages.
Change-Id: Iccce0ab50eee515e533ab36c8e7adc10cb3f7019
Implements: blueprint six-removal
Signed-off-by: Takashi Natsume <takanattie@gmail.com>
As documented in the comments this workaround was in place to catch when
blockjobs had not started and reported an end cursor position of 0.
This was resolved in libvirt v2.3.0 and can now be removed as
MIN_LIBVIRT_VERSION is well past this at 4.0.0.
Change-Id: Ic22a0c6cfa32d03b7e117cc8dbc54ceb3bfc9fe2
Introduced in libvirt v4.1.0 [1] this error code replaces the previously
raised VIR_ERR_OPERATION_FAILED, VIR_ERR_INTERNAL_ERROR and
VIR_ERR_INVALID_ARG codes [2][3].
VIR_ERR_OPERATION_FAILED was introduced and tested as an
active/live/hot unplug config device detach error code in
I131aaf28d2f5d5d964d4045e3d7d62207079cfb0.
VIR_ERR_INTERNAL_ERROR was introduced and tested as an
active/live/hot unplug config device detach error code in
I3055cd7641de92ab188de73733ca9288a9ca730a.
VIR_ERR_INVALID_ARG was introduced and tested as an
inactive/persistent/cold unplug config device detach error code in
I09230fc47b0950aa5a3db839a070613c9c817576.
This change introduces support for the new VIR_ERR_DEVICE_MISSING error
code while also retaining coverage for these codes until
MIN_LIBVIRT_VERSION is bumped past v4.1.0.
The majority of this change is test code motion with the existing tests
being modified to run against either the active or inactive versions of
the above error codes for the time being.
test_detach_device_with_retry_operation_internal and
test_detach_device_with_retry_invalid_argument_no_live have been removed
as they duplicate the logic within the now refactored
_test_detach_device_with_retry_second_detach_failure.
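The mapping described above can be summarised as a lookup table. Codes are strings to keep the sketch independent of the libvirt bindings, `is_not_found()` is hypothetical, and any extra message checks the real code may apply are ignored:

```python
# Error codes treated as "device not found", per detach flavour.
LIVE_NOT_FOUND = {
    "VIR_ERR_OPERATION_FAILED",
    "VIR_ERR_INTERNAL_ERROR",
    "VIR_ERR_DEVICE_MISSING",   # libvirt >= v4.1.0
}
PERSISTENT_NOT_FOUND = {
    "VIR_ERR_INVALID_ARG",
    "VIR_ERR_DEVICE_MISSING",   # libvirt >= v4.1.0
}


def is_not_found(error_code, live):
    """Return True if error_code means the device is already gone."""
    codes = LIVE_NOT_FOUND if live else PERSISTENT_NOT_FOUND
    return error_code in codes
```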
[1] https://libvirt.org/git/?p=libvirt.git;a=commit;h=bb189c8e8c93f115c13fa3bfffdf64498f3f0ce1
[2] https://libvirt.org/git/?p=libvirt.git;a=commit;h=126db34a81bc9f9f9710408f88cceaa1e34bbbd7
[3] https://libvirt.org/git/?p=libvirt.git;a=commit;h=2f54eab7c7c618811de23c60a51e910274cf30de
Closes-Bug: #1887946
Change-Id: I7eb86edc130d186a66c04b229d46347ec5c0b625
While this is extremely chatty given the 0.5 second loop used by callers
into this method, it will provide useful insight into why certain
failures are seen.
Change-Id: I6e58908966a2dc62a3fff155bb81481d68aa2d68
Since libvirt v0.9.11, virDomainBlockResize has accepted the size argument in
bytes when the VIR_DOMAIN_BLOCK_RESIZE_BYTES flag is provided.
This change switches all callers over to using bytes to simplify the
required call, avoiding the need to divide by units.Ki etc.
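A sketch of the call before and after, using a stand-in domain that records its arguments. `VIR_DOMAIN_BLOCK_RESIZE_BYTES` is 1 in libvirt's `virDomainBlockResizeFlags`, `Ki` stands in for oslo.utils' `units.Ki` (1024), and the two helper functions are hypothetical:

```python
VIR_DOMAIN_BLOCK_RESIZE_BYTES = 1  # libvirt virDomainBlockResizeFlags value
Ki = 1024  # stand-in for oslo_utils.units.Ki


def resize_old(domain, disk, size_bytes):
    # Without flags, libvirt interprets the size in KiB.
    domain.blockResize(disk, size_bytes // Ki, 0)


def resize_new(domain, disk, size_bytes):
    # With the BYTES flag, the size is passed through unchanged.
    domain.blockResize(disk, size_bytes, VIR_DOMAIN_BLOCK_RESIZE_BYTES)
```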
Change-Id: Ib8d9318596186acd86a738ceea187420698645e6
Previously virDomainBlockRebase [1] was used by swap_volume to switch
between volumes presented to the compute host as block devices or files.
As outlined in the virDomainBlockCopy [2] documentation this command is
actually a superset of virDomainBlockRebase in our case:
> This command is a superset of the older virDomainBlockRebase() when used
> with the VIR_DOMAIN_BLOCK_REBASE_COPY flag, and offers better control
> over the destination format, the ability to copy to a destination that
> is not a local file, and the possibility of additional tuning
> parameters.
As such we can switch to virDomainBlockCopy and expand support for
swap_volume outside of just host block devices and files.
To allow swap_volume to support RBD volumes we also need the domain to
use the recently introduced -blockdev support within libvirt >= 6.0.0
and QEMU >= 4.2.0. New MIN_LIBVIRT_BLOCKDEV and MIN_QEMU_BLOCKDEV
version constants are introduced and used to determine when to switch to
the virDomainBlockCopy method of moving between volumes.
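The version gate can be sketched with tuple comparisons. The minimum versions come from the text above; `use_block_copy()` is hypothetical, and `parse_libvirt_version()` decodes the integer encoding (major * 1000000 + minor * 1000 + micro) that libvirt uses for its version numbers:

```python
MIN_LIBVIRT_BLOCKDEV = (6, 0, 0)
MIN_QEMU_BLOCKDEV = (4, 2, 0)


def parse_libvirt_version(encoded):
    """Decode libvirt's integer version into a (major, minor, micro) tuple."""
    return (encoded // 1000000, (encoded // 1000) % 1000, encoded % 1000)


def use_block_copy(libvirt_version, qemu_version):
    """True if the host supports -blockdev, so virDomainBlockCopy is used."""
    return (tuple(libvirt_version) >= MIN_LIBVIRT_BLOCKDEV and
            tuple(qemu_version) >= MIN_QEMU_BLOCKDEV)
```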
[1] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBlockRebase
[2] https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBlockCopy
Closes-Bug: #1868996
Change-Id: I8e8035dcf508f5215bba9b7575c5c6abfe41da31
We use the oslo.utils save_and_reraise_exception context manager in our
detach device code and catch specific exceptions that mean 'not found'
and raise DeviceNotFound instead. When we do that, the
save_and_reraise_exception context manager logs an ERROR traceback of
the original exception, for informational purposes. This is misleading
when trying to debug other issues, as it makes it look like the caught
exception caused a problem.
This passes the reraise=False keyword arg to the context manager and
sets the 'reraise' attribute to True only if we are not going to raise
a different exception.
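The pattern can be sketched with a simplified stand-in for oslo.utils' `save_and_reraise_exception`, so the example runs without oslo.utils; `LOG_RECORDS` models the ERROR log output, and `detach()` and the use of `LookupError` are hypothetical. When a new exception replaces the saved one while `reraise` is still True, the original is logged; starting from `reraise=False` avoids that:

```python
import sys

LOG_RECORDS = []  # stands in for the ERROR log output


class save_and_reraise_exception:
    """Simplified stand-in for oslo_utils.excutils.save_and_reraise_exception."""

    def __init__(self, reraise=True):
        self.reraise = reraise

    def __enter__(self):
        # Save the exception currently being handled.
        self.value = sys.exc_info()[1]
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        if exc_type is not None:
            # A new exception replaces the saved one; log the dropped
            # original only if reraise is still True.
            if self.reraise:
                LOG_RECORDS.append(repr(self.value))
            return False
        if self.reraise:
            raise self.value
        return False


class DeviceNotFound(Exception):
    pass


def detach(do_detach):
    """Translate 'not found' failures into DeviceNotFound quietly."""
    try:
        do_detach()
    except LookupError as exc:
        with save_and_reraise_exception(reraise=False) as ctxt:
            if "not found" in str(exc):
                raise DeviceNotFound(str(exc))
            # Anything else: re-raise the original exception on exit.
            ctxt.reraise = True
```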
Related-Bug: #1836212
Change-Id: Icce1e31fe3ebcbf9e4897bbfa57b7f3d1fba67a3
This contained a single wrapper for 'domain.info()' that worked around a
race in libvirt 1.2.11. We haven't supported this version for quite some
time [1] so the module itself can go.
[1] http://git.openstack.org/cgit/openstack/nova/commit/?id=28d337b --
Pick next minimum libvirt / QEMU versions for "Stein"
Change-Id: I690d64c01accb10afc2ff20844e2d51bfa68e635
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
The libvirt documentation lists all of the domain migration flags:
https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainMigrateFlags
This change removes the unused flags from the migrate() flag
description and adds the flags that are used but were not listed there.
Removed flags:
VIR_MIGRATE_PAUSED
VIR_MIGRATE_NON_SHARED_DISK
VIR_MIGRATE_CHANGE_PROTECTION
VIR_MIGRATE_UNSAFE
VIR_MIGRATE_OFFLINE
Added flags:
VIR_MIGRATE_AUTO_CONVERGE
VIR_MIGRATE_POSTCOPY
Change-Id: I6a9615b636b3394a65ac4c972199c068fda6de14
We always import privsep modules like this:
import nova.privsep.libvirt
Not like this:
from nova.privsep import libvirt
This is because it makes it obvious at the caller that a privileged
operation is occurring:
nova.privsep.libvirt.destroy_root_filesystem()
Not just:
libvirt.destroy_root_filesystem()
This is especially true when the imported module is called "libvirt",
which is a very common term in the codebase and super hard to grep
for specific uses of.
I've corrected the existing style mismatches to make things consistent.
Note that the next patch in this series covers this case with a
hacking check.
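A hacking check for this style could look roughly like the sketch below, following the flake8/hacking convention of a function that takes a logical line and yields `(offset, message)` tuples. The function name and the `N3xx` code are placeholders, not the check Nova actually added:

```python
import re

_PRIVSEP_FROM_IMPORT = re.compile(
    r"^\s*from\s+nova\.privsep(\.\w+)*\s+import\s+")


def check_privsep_import_style(logical_line):
    """Flag 'from nova.privsep import x'; require 'import nova.privsep.x'."""
    if _PRIVSEP_FROM_IMPORT.match(logical_line):
        yield (0, "N3xx: import privsep modules as 'import nova.privsep.<mod>'")
```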
Change-Id: Ief177dbcb018da6fbad13bb0ff153fc47292d5b9
It turns out that when detaching a device libvirt can raise a
libvirt.VIR_ERR_INTERNAL_ERROR exception with an error log of
"unable to execute QEMU command 'device_del': Device <foo> not found".
Add this exception to the existing "not found" case which currently
handles only libvirt.VIR_ERR_OPERATION_FAILED.
Change-Id: I3055cd7641de92ab188de73733ca9288a9ca730a
Closes-Bug: #1815949
Signed-off-by: Chris Friesen <chris.friesen@windriver.com>
Nova skips detaching ovs dpdk interfaces,
thinking that they are already detached because
get_interface_by_cfg() returns no interface.
This is due to _set_config_VIFVHostUser()
not setting target_dev in the configuration while
LibvirtConfigGuestInterface sets target_dev
if the "target" tag is found in the interface.
As target_dev is not a valid value for
vhostuser interfaces, it is no longer checked
for the vhostuser type.
Change-Id: Iaf185b98c236df47e44cda0732ee0aed1fd6323d
Closes-Bug: #1807340
The encryption offered by Nova (via `live_migration_tunnelled`, i.e.
"tunnelling via libvirtd") today secures only two migration streams:
guest RAM and device state; it does _not_ encrypt the NBD (Network
Block Device) transport, which is used to migrate disks that are on
non-shared storage (also called "block migration"). Further,
"tunnelling via libvirtd" carries a large performance and latency
penalty, because it burns more CPU and memory bandwidth due to the
increased number of data copies on both source and destination hosts.
To solve this existing limitation, introduce a new config option
`live_migration_with_native_tls`, which will take advantage of "native
TLS" (i.e. TLS built into QEMU, and relevant support in libvirt). The
native TLS transport will encrypt all migration streams, *including*
disks that are not on shared storage — all of this without incurring the
limitations of the "tunnelled via libvirtd" transport.
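How the option might translate into migration flags, as a sketch: the flag values mirror libvirt's `virDomainMigrateFlags` (`VIR_MIGRATE_TLS` is 1 << 16), `migration_flags()` is hypothetical, and treating the tunnelled and native TLS options as mutually exclusive is an assumption of this sketch:

```python
VIR_MIGRATE_LIVE = 1
VIR_MIGRATE_PEER2PEER = 2
VIR_MIGRATE_TUNNELLED = 4
VIR_MIGRATE_TLS = 1 << 16


def migration_flags(tunnelled, native_tls):
    """Pick the transport flags for a live migration."""
    if tunnelled and native_tls:
        raise ValueError("choose either tunnelled or native TLS migration")
    flags = VIR_MIGRATE_LIVE | VIR_MIGRATE_PEER2PEER
    if tunnelled:
        flags |= VIR_MIGRATE_TUNNELLED
    if native_tls:
        # QEMU encrypts all streams itself, including NBD disk copies.
        flags |= VIR_MIGRATE_TLS
    return flags
```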
Closes-Bug: #1798796
Blueprint: support-qemu-native-tls-for-live-migration
Change-Id: I78f5fef41b6fbf118880cc8aa4036d904626b342
Signed-off-by: Kashyap Chamarthy <kchamart@redhat.com>
This reverts commit 23446a9552.
With change Ibf2b5eeafd962e93ae4ab6290015d58c33024132 nothing
uses the migrate_configure_max_speed method any longer, so it
can be removed. An additional mock, added after the change
being reverted, is also removed.
Change-Id: I90d6e14bf9383bf71d65d2180474ba228db2feab
Related-Bug: #1786346
There can be unicode characters in the params for live migration, for
example, the guest domain name in the destination XML. We need to
convert those to bytes when we call migrateToURI3 under python2.
The existing code was just calling str() for this, but that will fail
with the error:
UnicodeEncodeError: 'ascii' codec can't encode characters...
We need to explicitly encode the unicode strings to do the conversion.
The existing unit test wasn't using any unicode characters in its test
data, so this scenario wasn't covered.
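The failure and the fix can be reproduced in Python 3, where `str.encode("ascii")` fails the same way Python 2's implicit `str()` conversion did. Encoding explicitly to UTF-8 is an assumed choice of codec here, and `to_bytes_params()` is a hypothetical helper:

```python
def to_bytes_params(params):
    """Encode text values as UTF-8 bytes, leaving other values untouched."""
    return {key: value.encode("utf-8") if isinstance(value, str) else value
            for key, value in params.items()}


params = {"destination_xml": "<domain><name>café</name></domain>",
          "bandwidth": 0}

try:
    params["destination_xml"].encode("ascii")  # what an ASCII conversion does
except UnicodeEncodeError as exc:
    print("ascii conversion fails:", exc.reason)

converted = to_bytes_params(params)
```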
Closes-Bug: #1768807
Change-Id: I4b34139a3c5e3e2b7cf7cbe50bdf3da3131b9b1c
We should use nova.virt.libvirt.guest.Guest instead of calling
methods directly on a virDomain object.
Change-Id: Ifa8fe1b19980cc9e986d26b284d2fb093466d30c
Signed-off-by: Chen Hanxiao <chenhx@certusnet.com.cn>
The recently updated minimum required libvirt version (1.3.1; in commit
403320b -- libvirt: Bump MIN_{LIBVIRT,QEMU}_VERSION for "Rocky") brings
in the newer libvirt migration API, migrateToURI3(). The newer API was
explicitly designed[*] to be backward compatible with the older variant.
So remove the usage of the older variants:
migrateToURI()
migrateToURI2()
And just stick to the newer API -- migrateToURI3().
Clean up the following:
- Add the 'migrate_disks' and 'destination_xml' parameters, and remove
the no longer needed 'domain_xml' from the Nova migrate() method.
- Remove or fix various unit tests to use migrateToURI3().
- Stub nova.virt.libvirt.guest.Guest.migrate() correctly in
nova/tests/unit/virt/test_virt_drivers.py.
[*] https://libvirt.org/git/?p=libvirt.git;a=commit;h=4bf62f4 --
Extensible migration APIs
Change-Id: Id9ee1feeadf612fa79c3d280cee3a614a74a00a7
Signed-off-by: Kashyap Chamarthy <kchamart@redhat.com>
When detaching a device from a domain we first attempt to remove the
device from both the persistent and live configs before looping to
ensure the device has really been detached from the running live config.
Previously when this failed we logged an error message that suggested
that this was due to issues detaching the device from a transient
domain, however this is not the case as the domain is persistent.
This change simply updates the error and associated comments to only
reference the live config of the domain.
Additionally a DEBUG line claiming that a device has been successfully
detached is now only logged once the device is removed from the live
config, hopefully avoiding any confusion from this line being logged
each time an attempt is made to detach the device.
Change-Id: If869470216600c303d47cf79f12c4fc88abcf813