Commit Graph

510 Commits

Author SHA1 Message Date
Stephen Finucane eef4b5435e api: Reject non-spawn operations for vTPM
We're going to gradually introduce support for the various instance
operations when using vTPM due to the complications of having to worry
about the state of the vTPM device on the host. Add in API checks to
reject all manner of requests until we get to include support for each
one. With this change, the upcoming patch to turn everything on will
allow a user to create, delete and reboot an instance with vTPM, while
evacuate, rebuild, cold migration, live migration, resize, rescue and
shelve will not be supported immediately.
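
The kind of API-level rejection described above could be sketched as a decorator. This is a minimal illustration, not the code from the patch: the exception class, the dict-based instance and the use of an 'hw:tpm_version' extra spec to detect a vTPM request are all assumptions made for the example.

```python
import functools


class OperationNotSupportedForVTPM(Exception):
    """Hypothetical exception for operations blocked on vTPM instances."""


def reject_vtpm_instances(operation):
    """Reject `operation` for instances that request a vTPM device."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(self, context, instance, *args, **kwargs):
            # Assumption: a vTPM is requested via the 'hw:tpm_version' extra spec.
            if "hw:tpm_version" in instance["flavor_extra_specs"]:
                raise OperationNotSupportedForVTPM(
                    f"{operation} is not supported for instances with vTPM")
            return func(self, context, instance, *args, **kwargs)
        return wrapper
    return decorator


class ComputeAPI:
    """Toy stand-in for the compute API."""

    @reject_vtpm_instances("shelve")
    def shelve(self, context, instance):
        return "shelved"

    # Reboot stays allowed for vTPM instances, so it is left undecorated.
    def reboot(self, context, instance):
        return "rebooted"
```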

While we're here, we rename two unit test files so that their names
match the files they are testing and one doesn't have to spend time
finding them.

Change-Id: I3862a06ca28b383d525bcc9dcbc6fb1d4062f193
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
2020-08-24 19:37:01 +01:00
Zuul a503e42a17 Merge "Add checks for volume status when rebuilding" 2020-08-20 08:21:16 +00:00
sunhao 10e9a9b9fc Add checks for volume status when rebuilding
When rebuilding, we should only allow detaching
volumes with the 'in-use' status. Volumes in other statuses,
such as 'retyping', should not be allowed.
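
A minimal sketch of such a status check, assuming volumes are plain dicts with 'id' and 'status' keys; the function and exception names are hypothetical and not taken from the patch.

```python
class InvalidVolume(Exception):
    """Hypothetical error for a volume that cannot be detached."""


def check_volumes_for_rebuild(volumes):
    """Raise InvalidVolume unless every attached volume is 'in-use'."""
    for volume in volumes:
        if volume["status"] != "in-use":
            raise InvalidVolume(
                f"volume {volume['id']} has status {volume['status']!r}; "
                "only 'in-use' volumes can be detached during rebuild")
```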

Change-Id: I7f93cfd18f948134c9cb429dea55740d2cf97994
Closes-Bug: #1489304
2020-08-19 21:20:13 +08:00
Lee Yarwood 5913bd889f compute: Validate a BDMs disk_bus when provided
Previously disk_bus values were never validated and could easily end up
being ignored by the underlying virt driver and hypervisor.

For example, a common mistake made by users is to request a virtio-scsi
disk_bus when using the libvirt virt driver. This however isn't a valid
bus and is ignored, defaulting back to the virtio (virtio-blk) bus.

This change adds a simple validation in the compute API using the
potential disk_bus values provided by the DiskBus field class as used
when validating the hw_*_bus image properties.
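
The validation could look roughly like the sketch below. The set of bus names is an illustrative subset written out by hand (the patch takes the values from the DiskBus field class), and the function and exception names are hypothetical.

```python
# Illustrative subset only; the real values come from the DiskBus field class.
VALID_DISK_BUSES = frozenset(
    {"fdc", "ide", "sata", "scsi", "usb", "virtio", "xen"})


class InvalidBDMDiskBus(Exception):
    """Hypothetical error for an unknown disk_bus value."""


def validate_bdm_disk_bus(bdms):
    """Reject any block device mapping with an unknown disk_bus."""
    for bdm in bdms:
        disk_bus = bdm.get("disk_bus")
        # A missing disk_bus is fine; the virt driver picks a default.
        if disk_bus is not None and disk_bus not in VALID_DISK_BUSES:
            raise InvalidBDMDiskBus(f"invalid disk_bus: {disk_bus!r}")
```

With this check, the 'virtio-scsi' mistake mentioned above is rejected up front instead of being silently ignored.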

Closes-Bug: #1876301
Change-Id: I77b28b9cc8f99b159f628f4655d85ff305a71db8
2020-07-29 16:05:48 +00:00
Zuul 0cdce3e8b5 Merge "compute: Do not allow rescue attempts using volume snapshot images" 2020-07-23 09:50:04 +00:00
Zuul b7a9ff513b Merge "hardware: Enable 'hw:cpu_dedicated_mask' for creating a mixed instance" 2020-07-22 09:37:32 +00:00
Zuul a1e392fa90 Merge "compute: bump nova-compute version and check in API" 2020-07-21 23:50:34 +00:00
Zuul 51c11fabfc Merge "Switch from unittest2 compat methods to Python 3.x methods" 2020-07-21 13:57:40 +00:00
Wang Huaqiang 5c71ac5e02 hardware: Enable 'hw:cpu_dedicated_mask' for creating a mixed instance
Enable the 'hw:cpu_dedicated_mask' flavor extra spec interface. Users
can create an instance that mixes dedicated and shared CPUs through a
flavor with the following extra spec settings:

    openstack flavor set <flavor_id> \
        --property hw:cpu_policy=mixed \
        --property hw:cpu_dedicated_mask=0-3,7
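
To illustrate how a mask such as 0-3,7 divides an instance's CPUs, here is a simplified sketch. It handles only comma-separated CPU ids and inclusive ranges, and the helper names are hypothetical; it is not the parser Nova uses.

```python
def parse_cpu_mask(mask):
    """Parse a mask like '0-3,7' into a set of CPU ids (simplified)."""
    cpus = set()
    for part in mask.split(","):
        if "-" in part:
            start, end = part.split("-")
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return cpus


def split_vcpus(vcpus, dedicated_mask):
    """Return (dedicated, shared) CPU id sets for an instance."""
    dedicated = parse_cpu_mask(dedicated_mask)
    shared = set(range(vcpus)) - dedicated
    return dedicated, shared
```

For a flavor with 8 vCPUs, the mask above leaves CPUs 4, 5 and 6 as shared.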

In a topic coming later, we'll introduce another way to create a
mixed instance through the real-time interface.

Part of blueprint use-pcpu-and-vcpu-in-one-instance

Change-Id: I2a3311c08a52eb11859c68ef940a0bd755a94c6b
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
2020-07-21 15:18:41 +08:00
Wang Huaqiang 9ddc60539f compute: bump nova-compute version and check in API
Bump the nova-compute service version to announce support for
'mixed' instances, then check the service version of all
nova-compute nodes in the API layer.

All nova-compute nodes in the cluster must support the
'mixed' instance CPU allocation policy before a user can:

- Create a brand-new instance
- Resize to a mixed instance from a dedicated or shared instance.

We don't support rebuilding an instance in a way that changes
the NUMA topology, and changing the CPU policy always
changes the NUMA topology, so the nova-compute service
version is not checked when rebuilding.

It is also not necessary to check the service version when
shelving and unshelving an instance, because the instance
CPU policy cannot be changed in this process, and the service
version of all compute nodes has already been checked before a
mixed instance is shelved, so there is no need to check again.
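
The rules above can be sketched as a single gate. The version number, names and string-based operation argument are hypothetical and chosen only for illustration.

```python
# Hypothetical value; the real minimum service version is set by the patch.
MIN_COMPUTE_MIXED_POLICY = 52

# Only these operations can introduce a 'mixed' instance.
CHECKED_OPERATIONS = frozenset({"create", "resize"})


class MixedInstanceNotSupported(Exception):
    """Hypothetical error raised when some computes are too old."""


def check_mixed_policy_support(operation, cpu_policy, min_service_version):
    """Reject 'mixed' create/resize requests on partially upgraded clouds."""
    if cpu_policy != "mixed" or operation not in CHECKED_OPERATIONS:
        # Rebuild, shelve and unshelve cannot change the CPU policy.
        return
    if min_service_version < MIN_COMPUTE_MIXED_POLICY:
        raise MixedInstanceNotSupported(
            "not all nova-compute services support the 'mixed' policy")
```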

Part of blueprint use-pcpu-and-vcpu-in-one-instance

Change-Id: I59298788f26ca8f32bf3e38f3a52f72ff63fcc8b
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
2020-07-21 15:18:36 +08:00
Zuul 1fa6799e41 Merge "objects: Introduce 'pcpuset' field for InstanceNUMACell" 2020-07-15 10:49:05 +00:00
Wang Huaqiang 867d447101 objects: Introduce 'pcpuset' field for InstanceNUMACell
Introduce the 'pcpuset' field to the 'InstanceNUMACell' object to track
the instance's pinned CPUs. 'InstanceNUMACell.cpuset' is switched to
track the instance's unpinned CPUs only. As a result, the vCPUs of a
dedicated instance are tracked in the NUMA cell object's 'pcpuset', and
the vCPUs of a shared instance are put into the 'cpuset' field.

This requires an object data migration for any existing instance
that uses the 'dedicated' CPU allocation policy, where all
CPUs are pinned 1:1 to host CPUs: the content of
'InstanceNUMACell.cpuset' must be moved to the
'InstanceNUMACell.pcpuset' field and 'cpuset' cleared.
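
A minimal sketch of that data migration, using a plain dict in place of the InstanceNUMACell object; the function name is hypothetical.

```python
def migrate_cell_cpus(cell, cpu_policy):
    """Move pinned CPUs from 'cpuset' to 'pcpuset' for dedicated instances."""
    if cpu_policy == "dedicated" and cell["cpuset"] and not cell["pcpuset"]:
        # All CPUs of a dedicated instance are pinned, so they belong in
        # 'pcpuset'; 'cpuset' now holds unpinned CPUs only and is cleared.
        cell["pcpuset"] = set(cell["cpuset"])
        cell["cpuset"] = set()
    return cell
```

Shared instances are left untouched, and the migration is a no-op once it has been applied.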

Part of blueprint use-pcpu-and-vcpu-in-one-instance

Change-Id: I901fbd7df00e45196395ff4c69e7b8aa3359edf6
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
2020-07-14 00:38:34 +08:00
Lee Yarwood b9ff0ca94e compute: Do not allow rescue attempts using volume snapshot images
As seen in I7356b54bef0c614d9bfd1ed0d7b42574b58966f9 Nova is currently
unable to rescue instances using volume snapshot based images. This
currently results in zero length files being created on the compute as
the images are actually metadata stores and contain no image data.

This change adds a simple check within the compute API to reject
requests that provided an image reference that itself provides an
img_block_device_mapping before we cast out to the computes.
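
Such a check might look like the sketch below. It assumes the image metadata is a dict whose 'properties' may carry 'img_block_device_mapping'; the function and exception names are hypothetical.

```python
class UnsupportedRescueImage(Exception):
    """Hypothetical error for images that cannot be used for rescue."""


def check_rescue_image(image_meta):
    """Reject volume snapshot images, which contain no image data."""
    properties = image_meta.get("properties", {})
    if properties.get("img_block_device_mapping"):
        raise UnsupportedRescueImage(
            "volume snapshot based images cannot be used to rescue")
```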

Depends-On: https://review.opendev.org/#/c/725812/
Closes-Bug: #1879500
Change-Id: I87253c518bd60a3e7cd08af68da9ade96f4a40db
2020-07-08 13:23:53 +00:00
Stephen Finucane 72cf37bca0 utils: Move 'get_bdm_image_metadata' to nova.block_device
The 'nova.block_device' module is essentially a catchall utils-like
module for all things BDM. The 'get_bdm_image_metadata' function and the
closely related 'get_image_metadata_from_volume' function both fall into
the category of functions that belong here, so move them. This allows us to
clean up tests and, crucially, avoid a circular reference seen when we
want to use proper type hints in the 'nova.virt.driver' module.

  nova.context imports...
  nova.utils, which imports...
  nova.block_device, which imports...
  nova.virt.driver, which tries to import...
  nova.context, causing a circular dependency

Change-Id: I48177d6e93f2ff132d26b53cd682fd24a43a4b31
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
2020-07-08 11:56:01 +01:00
Dirk Mueller 4d58c0bb3d Switch from unittest2 compat methods to Python 3.x methods
With the removal of Python 2.x we can remove the unittest2 compat
wrappers and switch to assertCountEqual instead of assertItemsEqual.

We have been able to keep using assertItemsEqual until now because
testtools required unittest2, which still included it. With testtools
removing Python 2.7 support [3][4], we will lose support for
assertItemsEqual, so we should switch to assertCountEqual.

[1] - https://bugs.python.org/issue17866
[2] - https://hg.python.org/cpython/rev/d9921cb6e3cd
[3] - testing-cabal/testtools#286
[4] - testing-cabal/testtools#277

Change-Id: Ied2227a482087f4a2dc4e2d9986f9b3b777aa821
2020-06-23 14:16:07 +02:00
Lee Yarwood 7776cc02a1 compute: Remove snapshot quiesce tests for STOPPED and SUSPENDED instances
At present only the libvirt virt driver implements any form of instance
quiescing and this is only done when the instance is ACTIVE. As a result
we can remove these two tests for STOPPED and SUSPENDED instances.

Change-Id: Ie9db52c31bfe5df70456342c3ec7f83c0e23487f
2020-05-15 13:59:49 +00:00
Lee Yarwood cfde53e4b4 compute: Allow snapshots to be created from PAUSED volume backed instances
Iabeb44f843c3c04f767c4103038fcf6c52966ff3 allowed snapshots to be
created from PAUSED non-volume backed instances but missed the volume
backed use case.

This change simply adds PAUSED to the list of acceptable vm_states when
creating a snapshot from a volume backed instance in addition to the
already supported ACTIVE, STOPPED and SUSPENDED vm_states.

Closes-Bug: #1878583
Change-Id: I9f95a054de9d43ecaa50ff7ffc9343490e212d53
2020-05-15 13:59:42 +00:00
Lee Yarwood 581b2694a5 compute: Extract _get_bdm_image_metadata into nova.utils
This function extracts image metadata from a given block_device_mapping
and is extremely useful outside of the compute API. This change simply
extracts the function into nova.utils alongside
get_image_metadata_from_volume and other similar functions.

Change-Id: I8ba52f0cd877cefc1f7d3c10d8a07a2a1c21cb34
2020-04-09 08:39:36 +01:00
Lee Yarwood 5b6f44efff compute: Report COMPUTE_RESCUE_BFV and check during rescue
This change introduces and checks the COMPUTE_RESCUE_BFV trait that was
introduced in os-traits 2.2.0 in the compute layer during an instance
rescue when the instance boots from a volume.

An additional ``allow_bfv_rescue`` kwarg is also added to the
signature of the rescue method within the compute API. This defaults to
False and will be used in a following change to indicate when the
request is using a high enough microversion to invoke this new
capability.

The ``supports_bfv_rescue`` capability tracked within the virt drivers
that this trait maps to is only added to the powervm driver for now due
to the way in which these capabilities are checked by the
``TestPowerVMDriver.test_driver_capabilities`` test.

Implements: blueprint virt-bfv-instance-rescue
Change-Id: Ic2ad1468d31b7707b7f8f2b845a9cf47d9d076d5
2020-04-09 08:39:35 +01:00
Sundar Nadathur 89dbd08976 Block unsupported instance operations with accelerators.
The block is applied to primary operations, such as pause
or shelve, but not to their reverse operations, like
unpause or unshelve, because that is not necessary.

Added functional tests for various instance operations,
including those that work and those that fail.
Rebuild functional test passes.

Change-Id: I016bc1812404ce1019c71b7a3363f34acc3f8aed
Blueprint: nova-cyborg-interaction
2020-03-31 00:24:00 -07:00
Zuul e78343dcff Merge "Delete ARQs for an instance when the instance is deleted." 2020-03-27 12:10:02 +00:00
Zuul 1dd760a25e Merge "nova-net: Remove unused parameters" 2020-03-26 21:46:16 +00:00
Sundar Nadathur a20aca7f5e Delete ARQs for an instance when the instance is deleted.
This patch series now works for many VM operations with libvirt:
* Creation, deletion of VM instances.
* Pause/unpause

The following works but is a no-op:
* Lock/unlock

Hard reboots are taken up in a later patch in this series.
Soft reboots work for accelerators unless some unrelated failure
forces a hard reboot in the libvirt driver.

Suspend is not supported yet. It would fail with this error:
   libvirtError: Requested operation is not valid:
   domain has assigned non-USB host devices

Shelve is not supported yet.
Live migration is not intended to be supported with accelerators now.

Change-Id: Icb95890d8f16cad1f7dc18487a48def2f7c9aec2
Blueprint: nova-cyborg-interaction
2020-03-24 22:44:18 -07:00
Sundar Nadathur 0c52730f6a Add Cyborg device profile groups to request spec.
Find the name of the device profile, if any, in flavor extra specs.
Get its profile groups (equiv to flavor request groups) from Cyborg.
Parse/validate them similar to extra_specs.
Generate RequestGroup objects and add them to the request spec
   (in requested_resources field, following precedent).

Change-Id: Icd2ee9024dd4af0a7eb105eca14df8e458e9de77
Blueprint: nova-cyborg-interaction
2020-03-21 12:03:37 -07:00
ericxiett 8dada6d0f6 Catch exception when use invalid architecture of image
Currently, when attempting to rebuild an instance with an image that has an
invalid architecture, the API raises a 500 error and leaves the instance
stuck in the REBUILDING task state. This patch checks the image's
architecture before updating the instance's task_state, catches
exception.InvalidArchitectureName and returns HTTPBadRequest.
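
The ordering matters: validation has to happen before the task state is touched. A minimal sketch, with locally defined stand-ins for the exception classes, an illustrative list of architectures and an assumed 'hw_architecture' image property:

```python
# Illustrative subset; not the full list of architectures Nova accepts.
VALID_ARCHITECTURES = frozenset({"x86_64", "aarch64", "ppc64le", "s390x"})


class InvalidArchitectureName(Exception):
    """Stand-in for exception.InvalidArchitectureName."""


class HTTPBadRequest(Exception):
    """Stand-in for the 400 response raised by the API layer."""


def validate_architecture(image_props):
    arch = image_props.get("hw_architecture")
    if arch is not None and arch not in VALID_ARCHITECTURES:
        raise InvalidArchitectureName(arch)


def rebuild(instance, image_props):
    """Validate the image first, then move the instance to REBUILDING."""
    try:
        validate_architecture(image_props)
    except InvalidArchitectureName as exc:
        # Fail with a 400 before the task state is changed.
        raise HTTPBadRequest(f"invalid architecture: {exc}") from exc
    instance["task_state"] = "rebuilding"
    return instance
```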

Change-Id: I25eff0271c856a8d3e83867b448e1dec6f6732ab
Closes-Bug: #1861749
2020-03-06 11:14:32 +08:00
Stephen Finucane 7e0d2547c1 nova-net: Remove unused parameters
We only care about neutron security groups now, so a lot of nova-network
only cruft can be removed. Do just that.

Change-Id: I2a360e766261a186f9edf6ceb47a786aea2957eb
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
2020-02-18 14:07:58 +00:00
Stephen Finucane 5fc3b81fdf Remove 'nova.image.api' module
This doesn't exist for 'nova.volume' and no longer exists for
'nova.network'. There's only one image backend we support, so do like
we've done elsewhere and just use 'nova.image.glance'.

Change-Id: I7ca7d8a92dfbc7c8d0ee2f9e660eabaa7e220e2a
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
2020-02-18 11:45:39 +00:00
Stephen Finucane c29f382f69 Recalculate 'RequestSpec.numa_topology' on resize
When resizing, it's possible to change the NUMA topology of an instance,
or remove it entirely, due to different extra specs in the new flavor.
Unfortunately we cache the instance's NUMA topology object in
'RequestSpec.numa_topology' and don't update it when resizing. This
means if a given host doesn't have enough free CPUs or mempages of the
size requested by the *old* flavor, that host can be rejected by the
filter.

Correct this by regenerating the 'RequestSpec.numa_topology' field as
part of the resize operation, ensuring that we revert to the old field
value in the case of a resize-revert.

Change-Id: I0ca50665b86b9fdb4618192d4d6a3bcaa6ea2291
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
Co-Authored-By: He Jie Xu <hejie.xu@intel.com>
Closes-bug: #1805767
2020-01-31 15:45:46 +00:00
Matt Riedemann 4921e822e7 Use COMPUTE_SAME_HOST_COLD_MIGRATE trait during migrate
This uses the COMPUTE_SAME_HOST_COLD_MIGRATE trait in the API during a
cold migration to filter out hosts that cannot support same-host cold
migration, which is all of them except for the hosts using the vCenter
driver.

For any nodes that do not report the trait, we can't tell whether they
don't support it or are simply not new enough to report it, so the API
has a service version check and will fall back to the old behavior using
the config if the node is old. That compat code can be removed in the
next release.

As a result of this the FakeDriver capabilities are updated so the
FakeDriver no longer supports same-host cold migration and a new fake
driver is added to support that scenario for any tests that need it.

Change-Id: I7a4b951f3ab324c666ab924e6003d24cc8e539f5
Closes-Bug: #1748697
Related-Bug: #1811235
2020-01-29 09:44:47 +00:00
Stephen Finucane 110a683486 nova-net: Make the security group API a module
We're wrestling with multiple imports for this thing and have introduced
a cache to avoid having to load the thing repeatedly. However, Python
already has a way to ensure this doesn't happen: the use of a module.
Given that we don't have any state, we can straight up drop the class
and just call functions directly. Along the way, we drop the
'ensure_default' function, which is a no-op for neutron and switch all
the mocks over, where necessary.

Change-Id: Ia8dbe8ba61ec6d1b8498918a53a103a6eff4d488
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
2019-11-29 11:17:06 +00:00
Stephen Finucane fadeedcdea nova-net: Remove layer of indirection in 'nova.network'
At some point in the past, there was only nova-network and its code
could be found in 'nova.network'. Neutron was added and eventually found
itself (mostly!) in the 'nova.network.neutronv2' submodule. With
nova-network now gone, we can remove one layer of indirection and move
the code from 'nova.network.neutronv2' back up to 'nova.network',
mirroring what we did with the old nova-volume code way back in 2012
[1]. To ensure people don't get nova-network and 'nova.network'
confused, 'neutron' is retained in filenames.

[1] https://review.opendev.org/#/c/14731/

Change-Id: I329f0fd589a4b2e0426485f09f6782f94275cc07
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
2020-01-15 14:57:49 +00:00
Zuul 557b73d2a5 Merge "Ensure source service is up before resizing/migrating" 2020-01-07 05:39:02 +00:00
Zuul 64c9944978 Merge "Plumb graceful_exit through to EventReporter" 2020-01-05 23:20:05 +00:00
Zuul 1a46ab534f Merge "Add cross-cell resize policy rule and enable in API" 2019-12-24 00:19:55 +00:00
Zuul a0fc9a3e11 Merge "nova-net: Remove nova-network security group driver" 2019-12-23 16:08:29 +00:00
Zuul 47edb56eea Merge "nova-net: Convert remaining unit tests to neutron" 2019-12-23 15:41:43 +00:00
Matt Riedemann 24bf2aaa74 Plumb graceful_exit through to EventReporter
This adds a kwarg to wrap_instance_event to be used in the
EventReporter to allow the caller to tell EventReporter to
gracefully handle InstanceActionNotFound on __exit__.
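
The behaviour can be sketched with a small context manager. This is a simplified stand-in for EventReporter: the constructor signature and the 'finish_event' callable are assumptions made for the example.

```python
class InstanceActionNotFound(Exception):
    """Stand-in for the exception raised when the action record is gone."""


class EventReporter:
    """Simplified context manager that finishes an event on exit."""

    def __init__(self, finish_event, graceful_exit=False):
        self.finish_event = finish_event  # hypothetical callable
        self.graceful_exit = graceful_exit

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        try:
            self.finish_event()
        except InstanceActionNotFound:
            # The record may have been destroyed by the wrapped operation;
            # swallow the error only if the caller asked for it.
            if not self.graceful_exit:
                raise
        return False  # never suppress exceptions from the body
```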

This will be used by ComputeTaskManager.revert_snapshot_based_resize
which starts an action in the target cell DB but upon successful
exit of the RevertResizeTask the instance in the target cell DB
will be hard destroyed resulting in an InstanceActionNotFound
traceback which should be avoided.

Part of blueprint cross-cell-resize

Change-Id: Ie48a9c0a285f77e260f675fbe9282df9f02282b1
2019-12-23 10:10:57 -05:00
Matt Riedemann 6ebee92445 Add cross-cell resize policy rule and enable in API
This adds the "compute:servers:resize:cross_cell" policy
rule which is now used in the API to determine if a resize
or cold migrate operation can be performed across cells.

The check in the API is based on:

- The policy check passing for the request.
- The minimum nova-compute service version being high
  enough across all cells to perform a cross-cell resize.

If either of those conditions fails, a traditional same-cell
resize will be performed.
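
The decision reduces to a conjunction of the two conditions. In this sketch the version constant is a hypothetical placeholder and the policy result is passed in as a boolean.

```python
# Hypothetical value; the real minimum version is defined by the series.
MIN_COMPUTE_CROSS_CELL_RESIZE = 47


def allow_cross_cell_resize(policy_allows, min_compute_version):
    """Return True only if both the policy and version checks pass."""
    return (policy_allows and
            min_compute_version >= MIN_COMPUTE_CROSS_CELL_RESIZE)
```

When this returns False, the request proceeds as a traditional same-cell resize rather than failing.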

A docs stub is added here and will be fleshed out in an
upcoming patch.

Implements blueprint cross-cell-resize

Change-Id: Ie8a0f79a3b16e02b7a34a1b81f547013a3d88996
2019-12-23 10:10:57 -05:00
Zuul a869f1c9d3 Merge "Support cross-cell moves in external_instance_event" 2019-12-23 09:59:18 +00:00
Zuul 02019d2660 Merge "FUP for in-place numa rebuild" 2019-12-20 11:41:55 +00:00
Sean Mooney f6060ab6b5 FUP for in-place numa rebuild
This patch addresses a number of typos and minor
issues raised during review of [1][2]. In summary,
the changes are corrections to typos in comments,
a correction to the exception message, an update to
the release note and the addition of debug logging.

[1] I0322d872bdff68936033a6f5a54e8296a6fb3434
[2] I48bccc4b9adcac3c7a3e42769c11fdeb8f6fd132

Related-Bug: #1804502
Related-Bug: #1763766

Change-Id: I8975e524cd5a9c7dfb065bb2dc8ceb03f1b89e7b
2019-12-19 16:11:44 -05:00
Matt Riedemann ea2ea492a3 Ensure source service is up before resizing/migrating
If the source compute service is down when a resize or
cold migrate is initiated the prep_resize cast from the
selected destination compute service to the source will
fail/hang. The API can validate the source compute service
is up or fail the operation with a 409 response if the
source service is down. Note that a host status of
"MAINTENANCE" means the service is up but disabled by
an administrator which is OK for resize/cold migrate.

The solution here works the validation into the
check_instance_host decorator which surprisingly isn't
used in more places where the source host is involved
like reboot, rebuild, snapshot, etc. This change just
handles the resize method but is done in such a way that
the check_instance_host decorator could be applied to
those other methods and perform the is-up check as well.
The decorator is made backward compatible by default.
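
A rough sketch of a decorator with an opt-in is-up check follows. The exception classes, the dict-based instance and the 'service_is_up' helper are hypothetical stand-ins, not the code from the patch.

```python
import functools


class InstanceNotReady(Exception):
    """Raised when the instance has no host yet."""


class ServiceUnavailable(Exception):
    """Hypothetical error the API would map to a 409 response."""


def check_instance_host(check_is_up=False):
    """Require a host; optionally require its compute service to be up."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(self, context, instance, *args, **kwargs):
            if not instance.get("host"):
                raise InstanceNotReady()
            # Off by default, so existing users keep their old behaviour.
            if check_is_up and not self.service_is_up(instance["host"]):
                raise ServiceUnavailable(instance["host"])
            return func(self, context, instance, *args, **kwargs)
        return wrapper
    return decorator


class FakeAPI:
    def __init__(self, up_hosts):
        self.up_hosts = set(up_hosts)

    def service_is_up(self, host):
        return host in self.up_hosts

    @check_instance_host(check_is_up=True)
    def resize(self, context, instance):
        return "resizing"

    @check_instance_host()
    def reboot(self, context, instance):
        return "rebooting"
```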

Note that Instance._save_services is added because during
resize the Instance is updated and the services field
is set but not actually changed, but Instance.save()
handles object fields differently so we need to implement
the no-op _save_services method to avoid a failure.

Change-Id: I85423c7bcacff3bc465c22686d0675529d211b59
Closes-Bug: #1856925
2019-12-19 15:24:34 -05:00
Zuul a5b8217f5f Merge "Optimization for nova-api _checks_for_create_and_rebuild" 2019-12-18 18:28:05 +00:00
Stephen Finucane bf0d099f4b nova-net: Remove nova-network security group driver
This is another self-explanatory change. We remove the driver along with
related tests. Some additional API tests need to be fixed since these
were using the nova-network security group driver.

Change-Id: Ia05215b2e7168563c54b78263625125537b7234c
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
2019-12-16 09:58:42 +00:00
Stephen Finucane 80e64186e6 nova-net: Convert remaining unit tests to neutron
Convert the remaining few unit tests that aren't specific to nova-network
to "use neutron". In most cases, this simply means dropping unnecessary
'use_neutron=True' flag overrides, though there are some additional
things that need to be done.

Change-Id: I3d30fc9f823b02a1651646a01ad83b5c3e781325
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
2019-12-16 09:58:42 +00:00
Matt Riedemann c2e315975c Support cross-cell moves in external_instance_event
The external_instance_event method in the API assumed
that all events for an instance would be routed to
hosts within the same cell the instance lives in, but
that is no longer the case when a cross-cell migration
is happening.

This changes the external_instance_event flow to check
if the instance is undergoing a cross-cell migration
and if so, gets the host mappings for the source and
dest host to properly target the context for the
cast to each compute in different cells.

Part of blueprint cross-cell-resize

Change-Id: Ie556b5837e7b45b314616fd4b19dc08c8193ed54
2019-12-12 12:45:06 -05:00
Matt Riedemann 386aa315a4 Confirm cross-cell resize from the API
This adds the logic to the API confirmResize
operation such that if the migration is a cross-cell
resize the API will RPC cast to conductor to
confirm the resize rather than directly to the source
compute service like a traditional resize. Conductor
will then orchestrate the confirm process between the
source and target cell.

Now that the API has confirmResize plumbed this change
builds on the cross-cell resize functional tests by
confirming the resize for the image-backed server test.
To make that extra fun, a volume is attached to the
server while it is in VERIFY_RESIZE status to assert it
remains attached to the instance in the target cell
after the resize is confirmed.

In addition, the FakeDriver.cleanup() method is updated
to guard against calling it before the guest is destroyed
from the "hypervisor". This is to make sure the cleanup()
method is called properly from
confirm_snapshot_based_resize_at_source() in the compute
service on the source host.

The _confirm_resize_on_deleting scenario will be
dealt with in a later change.

Part of blueprint cross-cell-resize

Change-Id: Ia5892e1d2cb7c7685e104466f83df7bb00b168c0
2019-12-12 12:00:29 -05:00
xulei 54f1056e98 Optimization for nova-api _checks_for_create_and_rebuild
When we boot a VM without metadata or injected files, the corresponding
parameter is set to {} or [], so in that case the checks can be skipped.

Change-Id: Ib53fddbf2171aa018b69366817acc7aa2051d02a
Closes-Bug: #1855705
2019-12-10 14:39:10 +08:00
Sean Mooney 6f5358ac19 Block rebuild when NUMA topology changed
If the image changes during a rebuild, it's possible for the requested
NUMA topology to change. As a rebuild uses a noop claim in the
resource tracker, the NUMA topology will not be updated as part of
a rebuild.

If the NUMA constraints do not change, a rebuild will continue as normal.
If the new constraints conflict with the existing NUMA constraints of the
instance the rebuild will be rejected without altering the status of the
instance.

This change introduces an API check to block rebuild when the NUMA
requirements for the new image do not match the existing NUMA constraints.
This is in line with the previous check introduced to prevent the rebuild of
volume-backed instances which similarly are not supported.
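
In outline, the check compares the NUMA constraints derived from the new image with those the instance already has. The sketch below represents constraints as plain comparable values (None or a tuple of per-cell settings); the names are hypothetical.

```python
class ImageNUMATopologyRebuildConflict(Exception):
    """Hypothetical error raised when a rebuild would change NUMA."""


def check_numa_for_rebuild(current_constraints, new_constraints):
    """Reject a rebuild whose new image changes the NUMA constraints."""
    if new_constraints != current_constraints:
        # The rebuild uses a noop claim, so a changed topology would never
        # be applied; reject before touching the instance.
        raise ImageNUMATopologyRebuildConflict(
            "the new image changes the instance NUMA topology")
```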

This change adds functional tests to assert the expected behaviour of
rebuilding NUMA instances with new images. This change also asserts that
in-place rebuilds of NUMA instances are currently not supported.

Closes-Bug: #1763766
Partial-implements: blueprint inplace-rebuild-of-numa-instances
Change-Id: I0322d872bdff68936033a6f5a54e8296a6fb3434
2019-12-05 23:20:52 +00:00
Zuul f28577a25a Merge "Filter duplicates from compute API get_migrations_sorted()" 2019-11-14 19:22:07 +00:00