Re-propose "Add maxphysaddr support for Libvirt" for 2024.1 Caracal

blueprint: libvirt-maxphysaddr-support-caracal
Signed-off-by: Nobuhiro MIKI <nmiki@yahoo-corp.jp>
Change-Id: I5ceb4cfd357187a6be4a62dfbd3bc11a294ec703
This commit is contained in:
Nobuhiro MIKI 2023-09-14 14:36:05 +09:00
parent f1ec291df7
commit 346d77f7c6
1 changed files with 286 additions and 0 deletions

View File

@ -0,0 +1,286 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
==========================================
Add maxphysaddr support for Libvirt
==========================================
https://blueprints.launchpad.net/nova/+spec/libvirt-maxphysaddr-support
This blueprint propose new flavor extra_specs and image properties to control
the physical address bits of vCPUs in Libvirt guests.
Problem description
===================
When booting a guest with 1TB+ RAM, the default physical address bits are
too small and the boot fails [1]_. So a knob is needed to specify the
appropriate physical address bits.
Use Cases
---------
Booting a guest with large RAM.
Proposed change
===============
In Libvirt v8.7.0+ and QEMU v2.7.0+, physical address bits can be specified
with following XML elements [2]_ [3]_. The former means to adopt any physical
address bits, the latter means to adopt the physical address bits of the
host CPU.
- ``<maxphysaddr mode='emulate' bits='42'/>``
- ``<maxphysaddr mode='passthrough'/>``
Flavor extra_specs and image properties
---------------------------------------
Here I suggest the following two for flavor extra_specs and image properties.
Of course, if these are omitted, the behavior is the same as before.
- ``hw:maxphysaddr_mode`` can be either ``emulate`` or ``passthrough``.
- ``hw:maxphysaddr_bits`` takes a positive integer value.
Only meaningful and must be specified if ``hw:maxphysaddr_mode=emulate``.
So the overall flavor extra_specs look like the following::
openstack flavor set <flavor> \
--property hw:maxphysaddr_mode=emulate \
--property hw:maxphysaddr_bits=42
Also the same, but the overall image properties look like the following::
openstack image set <image> \
--property hw_maxphysaddr_mode=emulate \
--property hw_maxphysaddr_bits=42
Nova scheduler changes
----------------------
Nova scheduler also needs to be modified to take these two properties
into account.
hw:maxphysaddr_mode
^^^^^^^^^^^^^^^^^^^
There can be a mix of supported and unsupported hosts depending
on Libvirt and QEMU versions. So add new traits
``COMPUTE_ADDRESS_SPACE_PASSTHROUGH`` and ``COMPUTE_ADDRESS_SPACE_EMULATED``
to check the scheduled host supports this feature.
``trait:COMPUTE_ADDRESS_SPACE_PASSTHROUGH=required`` is automatically added
if ``hw:maxphysaddr_mode=passthrough`` is specified in flavor extra_specs
or image properties.
And same for ``hw:maxphysaddr_mode=emulate``. This can be implemented inside
the ``from_request_spec`` method of ``ResourceRequest`` class.
Passthrough and emulate modes have different properties. So let's consider
the two separately.
The case of ``hw:maxphysaddr_mode=passthrough``. In this case,
``cpu_mode=host-passthrough`` is a requirement, which is already taken
into account in nova scheduling, and no additional modifications are
required in this proposal. It is not guaranteed whether the instance
can be migrated by nova. So the admin needs to make sure that targets
of cold and live migration have similar hardware and software.
This restriction is similar for ``cpu_mode=host-passthrough``.
The case of ``hw:maxphysaddr_mode=emulate``. In nova scheduling,
it is necessary to check that the hypervisor supports at least
``hw:maxphysaddr_bits``. Numerical comparison is implemented differently
for flavor extra_specs and image properties, so it is divided into two cases.
hw:maxphysadr_bits
^^^^^^^^^^^^^^^^^^
The maximum number of bits supported by hypervisor can be obtained by using
libvirt capabilities [4]_.
If ``hw:maxphysaddr_bits`` is set to flavor extra_specs,
``ComputeCapabilitiesFilter`` can be used to compare the number
of bits in scheduling. For example, this can be accomplished by adding
``capabilities:cpu_info:maxphysaddr:bits>=42`` automatically.
If ``hw_maxphysaddr_bits`` is set to image properties, perform a numeric
comparison with ``ImagePropertiesFilter``.
Cold migration and live migration can also be realized with these
filter and ``COMPUTE_ADDRESS_SPACE_EMULATED`` trait.
Alternatives
------------
Before the ``maxphysaddr`` option was introduced into Libvirt, it was specified
as a workaround with the QEMU comanndline parameter. But this alternative is
not allowed in nova.
Also, some Linux distributions may have machine types with
``host-phys-bits=true`` [5]_. For example, ``pc-i440fx-bionic-hpb`` and
``pc-q35-bionic-hpb``. However, this alternative has following two issues and
cannot be adopted for general-purpose use cases.
- Ubuntu package maintainers are applying a patch to QEMU [6]_. It means this
is not included in vanilla QEMU and is not available in other distributions.
- This is only the case for ``hw:maxphysaddr_mode=passthrough`` and does not
include ``hw:maxphysaddr_mode=emulate``. Since
``hw:maxphysaddr_mode=passthrough`` requires ``cpu_mode=host-passthrough``
to be used [7]_, this alternative cannot be used with ``cpu_mode=custom``
or ``cpu_mode=host-model``. So, this alternative is not sufficient for
a cloud with many different CPU models.
As for scheduling, placement does not currently support numeric traits,
so the maximum number of bits supported by hypervisor cannot be checked
by this mechanism. Numeric comparisons can also be performed with
``JsonFilter``. However, ``JsonFilter`` appears to be vulnerable to changes in
``HostState`` and its child attributes, which is mentioned as a warning [10]_.
So this spec employs ``ComputeCapabilitiesFilter`` and
``ImagePropertiesFilter``.
Data model impact
-----------------
None
REST API impact
---------------
None
Security impact
---------------
None
Notifications impact
--------------------
None
Other end user impact
---------------------
None
Performance Impact
------------------
None
Other deployer impact
---------------------
Operators should specify appropriate flavor extra_specs or image properties
as needed.
Developer impact
----------------
None
Upgrade impact
--------------
As described earlier, the new traits ``COMPUTE_ADDRESS_SPACE_PASSTHROUGH`` and
``COMPUTE_ADDRESS_SPACE_EMULATED`` signal if the upgraded compute nodes support
this feature.
Implementation
==============
Assignee(s)
-----------
Primary assignee:
nmiki
Other contributors:
None
Feature Liaison
---------------
Feature liaison:
Liaison Needed
Work Items
----------
This spec is addressed across multiple dev cycles.
The merged and missing items are shown below, respectively.
Merged Items
^^^^^^^^^^^^
* Add new traits to check Libvirt and QEMU versions [8]_ [9]_
Missing Items
^^^^^^^^^^^^^
* Add new guest configs
* Add new fileds in nova/api/validation/extra_specs/hw.py
* Add new fileds in nova/objects/image_meta.py
* Add new fields in LibvirtConfigCPU in nova/virt/livbirt/config.py
* Add new field ``maxphysaddr`` to ``cpu_info`` in nova/virt/libvirt/driver.py
* Add docs and release notes for new flavor extra_specs
* Support for ``hw:maxphysadar_bits`` numeric comparison
in ``ComputeCapabilitiesFilter``
* Support for ``hw_maxphysaddr_bits`` numeric comparison
in ``ImagePropertiesFilter``
Dependencies
============
Libivrt v8.7.0+.
QEMU v2.7.0+.
Testing
=======
Add the following unit tests:
- check that proposed flavor extra_specs are properly validated
- check that proposed image properties are properly validated
- check that intended XML elements are output
- check that traits are properly added and used
- check that new field in ``ComputeCapabilitiesFilter`` is property
added and used
- check that new field in ``ImagePropertiesFilter`` is property
added and used
Documentation Impact
====================
For operators, the documentation describes what proposed flavor extra_specs
and image properties mean and how they should be set.
References
==========
.. [1] https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1769053
.. [2] https://libvirt.org/news.html#v8-7-0-2022-09-01
.. [3] https://github.com/libvirt/libvirt/commit/1c1a7cdd4096c59fb0c374529e1e5aea8d43ee9c
.. [4] https://libvirt.org/formatcaps.html#examples
.. [5] https://cpaelzer.github.io/blogs/005-guests-bigger-than-1tb/
.. [6] https://git.launchpad.net/~paelzer/ubuntu/+source/qemu/commit/?id=6ba8b5c843d405e1b067dc8b98ecb8545af78a2b
.. [7] https://github.com/libvirt/libvirt/blob/v8.7.0/src/qemu/qemu_validate.c#L346-L351
.. [8] https://review.opendev.org/c/openstack/os-traits/+/871226
.. [9] https://review.opendev.org/c/openstack/nova/+/873221
.. [10] https://docs.openstack.org/nova/latest/admin/scheduling.html#jsonfilter
History
=======
.. list-table:: Revisions
:header-rows: 1
* - Release Name
- Description
* - 2023.1 Antelope
- Introduced
* - 2023.2 Bobcat
- Reproposed
* - 2024.1 Caracal
- Reproposed