Commit Graph

806 Commits

Author SHA1 Message Date
Sean Mooney f4852f4c81 [codespell] fix final typos and enable ci
This change adds the pre-commit config and
tox targets to run codespell both independently
and via the pep8 target.

This change corrects all the final typos in the
codebase as detected by codespell.

Change-Id: Ic4fb5b3a5559bc3c43aca0a39edc0885da58eaa2
2023-12-15 12:32:42 +00:00
Sean Mooney 7402822f0b [codespell] start fixing all the typos
This is the initial patch applying codespell to nova.
codespell is a programming-focused spellchecker that
looks for common typos and corrects them.

I am breaking this into multiple commits to make it simpler
to read and will automate the execution of codespell
at the end of the series.

Change-Id: If24a6c0a890f713545faa2d44b069c352655274e
2023-10-03 00:51:35 +01:00
melanie witt 6f79d6321e Enforce quota usage from placement when unshelving
When [quota]count_usage_from_placement = true or
[quota]driver = nova.quota.UnifiedLimitsDriver, cores and ram quota
usage are counted from placement. When an instance is SHELVED_OFFLOADED,
it will not have allocations in placement, so its cores and ram should
not count against quota during that time.

This means however that when an instance is unshelved, there is a
possibility of going over quota if the cores and ram it needs were
allocated by some other instance(s) while it was SHELVED_OFFLOADED.

This fixes a bug where quota was not being properly enforced during
unshelve of a SHELVED_OFFLOADED instance when quota usage is counted
from placement. Test coverage is also added for the "recheck" quota
cases.
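The "recheck" pattern described above can be sketched as: check quota before allocating, allocate, then check again and roll back on overage. This is an illustrative toy, not nova's real quota API; all names here are hypothetical.

```python
# Illustrative sketch of check / allocate / recheck quota enforcement.
# The limit and usage dicts stand in for unified limits and
# placement-counted usage; none of these names are nova's real API.
class QuotaOver(Exception):
    pass

limit = {"cores": 4}
usage = {"cores": 0}   # stand-in for cores counted from placement

def check_quota(extra):
    """Compare counted usage plus the requested amount to the limit."""
    if usage["cores"] + extra > limit["cores"]:
        raise QuotaOver()

def unshelve(cores):
    check_quota(cores)           # initial check before allocating
    usage["cores"] += cores      # create allocations in placement
    try:
        check_quota(0)           # recheck: another request may have raced us
    except QuotaOver:
        usage["cores"] -= cores  # roll back our allocation on overage
        raise

unshelve(2)                      # fits within the 4-core limit
try:
    unshelve(4)                  # would exceed the limit
    over_quota = False
except QuotaOver:
    over_quota = True
```

The second unshelve fails at the initial check and leaves usage untouched; the recheck guards the window between the first check and the allocation.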

Closes-Bug: #2003991

Change-Id: I4ab97626c10052c7af9934a80ff8db9ddab82738
2023-05-23 01:02:05 +00:00
Zuul d443f8e4c4 Merge "Transport context to all threads" 2023-02-27 15:11:25 +00:00
Sahid Orentino Ferdjaoui 8c2e765989 compute: enhance compute evacuate instance to support target state
Related to the bp/allowing-target-state-for-evacuate. This change
extends the compute API to accept a new argument, targetState.

The targetState argument, when set, will force the state of an
evacuated instance on the destination host.

Signed-off-by: Sahid Orentino Ferdjaoui <sahid.ferdjaoui@industrialdiscipline.com>
Change-Id: I9660d42937ad62d647afc6be965f166cc5631392
2023-01-31 11:29:01 +01:00
Balazs Gibizer b387401187 Support unshelve with PCI in placement
blueprint: pci-device-tracking-in-placement
Change-Id: I35ca3ae82be5dc345d80ad1857abb915c987d34e
2022-12-21 16:17:34 +01:00
Balazs Gibizer 53642766f8 Support evacuate with PCI in placement
blueprint: pci-device-tracking-in-placement
Change-Id: I1462ee4f4dd143b56732332f7ed00df00a9f2067
2022-12-21 16:17:34 +01:00
Balazs Gibizer e667a7f8d8 Support cold migrate and resize with PCI tracking in placement
This patch adds support for cold migrate, and resize with PCI
devices when the placement tracking is enabled.

Same host resize, evacuate and unshelve will be supported by subsequent
patches. Live migration was not supported with flavor based PCI requests
before so it won't be supported now either.

blueprint: pci-device-tracking-in-placement
Change-Id: I8eec331ab3c30e5958ed19c173eff9998c1f41b0
2022-12-21 16:17:34 +01:00
Balazs Gibizer f86f1800f0 Store allocated RP in InstancePCIRequest
After the scheduler selected a target host and allocated an allocation
candidate that passed the filters, nova needs to make sure that the PCI
claim will allocate the real PCI devices from the RP which is allocated
in placement. Placement returns the request group - provider mapping for
each allocation candidate so nova can map which InstancePCIRequest was
fulfilled from which RP in the selected allocation candidate. This
mapping is then recorded in the InstancePCIRequest object and used
during the PCI claim to filter for PCI pools that can be used to claim
PCI devices from.
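The claim-time filtering described above can be sketched as: once the provider mapping from placement is recorded on the request, the PCI claim only considers pools backed by those providers. Field and function names here are illustrative stand-ins, not nova's real objects.

```python
# Illustrative sketch of filtering PCI pools by the resource provider
# recorded on the request. Names are hypothetical, not nova's real API.
pools = [
    {"rp_uuid": "rp-1", "devices": ["0000:81:00.1"]},
    {"rp_uuid": "rp-2", "devices": ["0000:82:00.1"]},
]

class InstancePCIRequest:
    def __init__(self, rp_uuids=None):
        # Providers allocated in placement for this request, if tracked.
        self.rp_uuids = rp_uuids or []

def pools_for_request(pools, request):
    """Return only pools that can serve the request's placement allocation."""
    if not request.rp_uuids:
        return pools  # no placement tracking: any pool may serve it
    return [p for p in pools if p["rp_uuid"] in request.rp_uuids]

request = InstancePCIRequest(rp_uuids=["rp-2"])
usable = pools_for_request(pools, request)
```

With the mapping recorded, the claim can only draw devices from the pool matching the provider chosen by the scheduler.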

blueprint: pci-device-tracking-in-placement
Change-Id: I18bb31e23cc014411db68c31317ed983886d1a8e
2022-12-21 16:17:34 +01:00
whoami-rajat 6919db5612 Add conductor RPC interface for rebuild
This patch adds support for passing the ``reimage_boot_volume``
flag from the API layer through the conductor layer to the
compute layer, and also includes an RPC version bump as necessary.

Related blueprint volume-backed-server-rebuild

Change-Id: I8daf177eb67d08112a16fe788910644abf338fa6
2022-08-31 16:38:50 +05:30
Rajat Dhasmana 30aab9c234 Add support for volume backed server rebuild
This patch adds the plumbing for rebuilding a volume backed
instance in compute code. This functionality will be enabled
in a subsequent patch which adds a new microversion and the
external support for requesting it.

The flow of the operation is as follows:

1) Create an empty attachment
2) Detach the volume
3) Request cinder to reimage the volume
4) Wait for cinder to notify success to nova (via external events)
5) Update and complete the attachment
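The five steps above can be sketched as follows. This is an illustrative walk-through with a fake recording client; the helper names are hypothetical stand-ins, not nova's compute manager or the real cinder client.

```python
# Illustrative sketch of the volume-backed rebuild flow described above.
# FakeCinderAPI records calls; it is a hypothetical stand-in, not the
# real cinder client, and the external-event wait is elided.
class FakeCinderAPI:
    def __init__(self):
        self.calls = []

    def attachment_create(self, volume_id):
        self.calls.append(("attachment_create", volume_id))
        return {"id": "attach-2"}

    def attachment_delete(self, attachment_id):
        self.calls.append(("attachment_delete", attachment_id))

    def reimage_volume(self, volume_id, image_id):
        self.calls.append(("reimage", volume_id, image_id))

    def attachment_update(self, attachment_id, connector):
        self.calls.append(("attachment_update", attachment_id))

    def attachment_complete(self, attachment_id):
        self.calls.append(("attachment_complete", attachment_id))

def rebuild_volume_backed(cinder, volume_id, old_attachment_id, image_id):
    # 1) Create an empty attachment to keep the volume reserved.
    new_attachment = cinder.attachment_create(volume_id)
    # 2) Detach the volume by deleting the old attachment.
    cinder.attachment_delete(old_attachment_id)
    # 3) Request cinder to reimage the volume.
    cinder.reimage_volume(volume_id, image_id)
    # 4) Nova would now wait for cinder's success notification via
    #    external events; assumed successful here.
    # 5) Update and complete the new attachment.
    cinder.attachment_update(new_attachment["id"], connector={})
    cinder.attachment_complete(new_attachment["id"])
    return new_attachment["id"]

cinder = FakeCinderAPI()
result = rebuild_volume_backed(cinder, "vol-1", "attach-1", "img-1")
```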

Related blueprint volume-backed-server-rebuild

Change-Id: I0d889691de1af6875603a9f0f174590229e7be18
2022-08-31 16:38:37 +05:30
Dan Smith 232684b440 Avoid n-cond startup abort for keystone failures
Conductor creates a placement client for the potential case where
it needs to make a call for certain operations. A transient network
or keystone failure will currently cause it to abort startup, which
means it is not available for other unrelated activities, such as
DB proxying for compute.

This makes conductor test the placement client on startup, but only
abort startup on errors that are highly likely to be permanent
configuration errors, and only warn about things like being unable
to contact keystone/placement during initialization. If a non-fatal
error is encountered at startup, later operations needing the
placement client will retry initialization.

Closes-Bug: #1846820
Change-Id: Idb7fcbce0c9562e7b9bd3e80f2a6d4b9bc286830
2022-08-18 07:37:42 -07:00
Dan Smith c178d93606 Unify placement client singleton implementations
We have many places where we implement singleton behavior for the
placement client. This unifies them into a single place and
implementation. Not only does this DRY things up, but it may also cause
us to initialize the client fewer times, and allows for emitting a
common set of error messages about expected failures for better
troubleshooting.
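A unified singleton accessor of the kind described can be sketched as a module-level getter with double-checked locking. This is an illustrative pattern only; `report_client_factory` is a hypothetical stand-in for nova's real SchedulerReportClient constructor.

```python
# Illustrative module-level singleton for a placement client.
# report_client_factory is a hypothetical stand-in, not nova's API.
import threading

_placement_client = None
_lock = threading.Lock()

def report_client_factory():
    # Stand-in for constructing the real placement report client.
    return object()

def get_placement_client():
    """Create the placement client once; every caller shares it."""
    global _placement_client
    if _placement_client is None:
        with _lock:
            # Re-check under the lock so two racing callers don't
            # both construct a client.
            if _placement_client is None:
                _placement_client = report_client_factory()
    return _placement_client

a = get_placement_client()
b = get_placement_client()
```

Every call site then goes through the one accessor, which is also the single place to emit common error messages on initialization failure.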

Change-Id: Iab8a791f64323f996e1d6e6d5a7e7a7c34eb4fb3
Related-Bug: #1846820
2022-08-18 07:22:37 -07:00
Fabian Wiesel 646fc51732 Transport context to all threads
The nova.utils.spawn and spawn_n methods transport
the context (and profiling information) to the
newly created threads. But the same isn't done
when submitting work to thread-pools in the
ComputeManager.

The code doing that for spawn and spawn_n
is extracted into a new function, which is now
also called when submitting work to the thread-pools.
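The extraction can be sketched like this: capture the submitting thread's context, then re-establish it inside the pool worker before the job runs. `get_current`/`set_current` are simplified stand-ins for nova.context's thread-local handling, not the real functions.

```python
# Illustrative sketch of transporting the request context into pool
# workers. The thread-local context store is a simplified stand-in
# for nova.context, not nova's real implementation.
import threading
from concurrent.futures import ThreadPoolExecutor

_local = threading.local()

def set_current(ctxt):
    _local.ctxt = ctxt

def get_current():
    return getattr(_local, "ctxt", None)

def _with_context(ctxt, func, *args, **kwargs):
    """Restore the caller's context in the worker thread, then run func."""
    set_current(ctxt)
    return func(*args, **kwargs)

def submit_with_context(pool, func, *args, **kwargs):
    # Capture the submitting thread's context so the worker sees it too,
    # mirroring what spawn/spawn_n already did for greenthreads.
    ctxt = get_current()
    return pool.submit(_with_context, ctxt, func, *args, **kwargs)

set_current({"request_id": "req-1"})
with ThreadPoolExecutor(max_workers=1) as pool:
    seen = submit_with_context(pool, get_current).result()
```

Without the wrapper, `get_current()` in the worker thread would return None, since thread-locals do not cross thread boundaries.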

Closes-Bug: #1962574
Change-Id: I9085deaa8cf0b167d87db68e4afc4a463c00569c
2022-08-04 17:36:23 +05:30
Zuul 0bea7f6b6b Merge "Add a workaround to skip hypervisor version check on LM" 2022-07-27 13:46:29 +00:00
Kashyap Chamarthy 00ed8a232b Add a workaround to skip hypervisor version check on LM
When turned on, this will disable the version-checking of hypervisors
during live-migration.  This can be useful for operators in certain
scenarios when upgrading.  E.g. if you want to relocate all instances
off a compute node due to an emergency hardware issue, and you only have
another old compute node ready at the time.

Note, though: libvirt will do its own internal compatibility checks, and
might still reject live migration if the destination is incompatible.

Closes-Bug: #1982853

Change-Id: Iec387dcbc49ddb91ebf5cfd188224eaf6021c0e1
Signed-off-by: Kashyap Chamarthy <kchamart@redhat.com>
2022-07-27 12:20:03 +02:00
René Ribaud a263fa46f8 Allow unshelve to a specific host (Compute API part)
This patch introduces changes to the compute API that will allow
PROJECT_ADMIN to unshelve a shelved offloaded server to a specific host.
This patch also supports the ability to unpin the availability_zone of an
instance that is bound to it.

Implements: blueprint unshelve-to-host
Change-Id: Ieb4766fdd88c469574fad823e05fe401537cdc30
2022-07-22 10:22:24 +02:00
Rajesh Tailor 7824471b79 Remove return from rpc cast
This change removes the return statement from rpc cast method calls.
As rpc casts are asynchronous, they do not return anything.
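A minimal illustration of why those return statements were dead weight: an RPC cast sends a message and returns immediately, with no reply. `FakeClient` is a stand-in for an oslo.messaging RPC client, whose `cast()` likewise has no meaningful return value.

```python
# Illustrative: cast() is fire-and-forget, so "return client.cast(...)"
# only ever returns None. FakeClient is a stand-in, not oslo.messaging.
class FakeClient:
    def cast(self, ctxt, method, **kwargs):
        # Send the message and return immediately; there is no reply.
        return None

client = FakeClient()

def resize_instance_before(ctxt):
    # Misleading: looks as if the caller gets something useful back.
    return client.cast(ctxt, "resize_instance")

def resize_instance_after(ctxt):
    # Honest: the cast has no result to propagate.
    client.cast(ctxt, "resize_instance")

before = resize_instance_before({})
after = resize_instance_after({})
```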

Change-Id: I766f64f2c086dd652bc28b338320cc94ccc48f1f
2022-06-18 16:23:26 +05:30
Rajesh Tailor 2521810e55 Fix typos
This change fixes some of the typos in unit tests as well
as in nova code-base.

Change-Id: I209bbb270baf889fcb2b9a4d1ce0ab4a962d0d0e
2022-05-30 17:40:00 +05:30
John Garbutt 140b3b81f9 Enforce resource limits using oslo.limit
We now enforce limits on resources requested in the flavor.
This includes: instances, ram, cores. It also works for any resource
class being requested via the flavor chosen, such as custom resource
classes relating to Ironic resources.

Note that because disk resources can be limited, we need to know whether
the instance is booted from a volume. This meant adding extra code to
make sure we know that when enforcing the limits.

Follow on patches will update the APIs to accurately report the limits
being applied to instances, ram and cores.

blueprint unified-limits-nova

Change-Id: If1df93400dcbcb1d3aac0ade80ae5ecf6ce38d11
2022-02-24 16:21:03 +00:00
Zuul f7fa3bf5fc Merge "neutron: Rework how we check for extensions" 2022-02-08 22:56:47 +00:00
Sean Mooney f3d48000b1 Add autopep8 to tox and pre-commit
autopep8 is a code formatting tool that makes python code pep8
compliant without changing everything. Unlike black, it will
not radically change all code and the primary change to the
existing codebase is adding a new line after class-level docstrings.

This change adds a new tox autopep8 env to manually run it on your
code before you submit a patch. It also adds autopep8 to pre-commit,
so if you use pre-commit it will run automatically for you.

This change runs autopep8 in diff mode with --exit-code in the pep8
tox env, so it will fail if autopep8 would modify your code when run
in in-place mode. This allows us to gate on autopep8 not modifying
patches that are submitted, ensuring that authorship of patches is
maintained.

The intent of this change is to save the large amount of time we spend
ensuring style guidelines are followed by automating that check, making
it simpler for both new and old contributors to work on nova and saving
time and effort for all involved.

Change-Id: Idd618d634cc70ae8d58fab32f322e75bfabefb9d
2021-11-08 12:37:27 +00:00
Stephen Finucane 0f7f95b917 neutron: Rework how we check for extensions
There are a couple of changes we can make here:

- Always attempt to refresh the cache before checking if an extension is
  enabled.
- Use extension slugs as our reference point rather than extension
  names. They seem like a better thing to use as a constant and are
  similarly fixed.
- Be consistent in how we name and call the extension check functions
- Add documentation for what each extension is doing/used for

There's a TODO here to remove some code that relies on an out-of-tree
extension that I can't see. That's done separately since this is already
big enough.

Change-Id: I8058902df167239fa455396d3595a56bcf472b2b
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
2021-09-02 12:10:04 +01:00
Zuul e81211318a Merge "Support move ops with extended resource request" 2021-08-31 21:38:24 +00:00
Matt Riedemann c09d98dadb Add force kwarg to delete_allocation_for_instance
This adds a force kwarg to delete_allocation_for_instance which
defaults to True because that was found to be the most common use case
by a significant margin during implementation of this patch.
In most cases, this method is called when we want to delete the
allocations because they should be gone, e.g. server delete, failed
build, or shelve offload. The alternative in these cases is the caller
could trap the conflict error and retry but we might as well just force
the delete in that case (it's cleaner).

When force=True, it will DELETE the consumer allocations rather than
GET and PUT with an empty allocations dict and the consumer generation
which can result in a 409 conflict from Placement. For example, bug
1836754 shows that in one tempest test that creates a server and then
immediately deletes it, we can hit a very tight window where the method
GETs the allocations and before it PUTs the empty allocations to remove
them, something changes which results in a conflict and the server
delete fails with a 409 error.

It's worth noting that delete_allocation_for_instance used to just
DELETE the allocations before Stein [1] when we started taking consumer
generations into account. There was also a related mailing list thread
[2].


Closes-Bug: #1836754

[1] I77f34788dd7ab8fdf60d668a4f76452e03cf9888
[2] http://lists.openstack.org/pipermail/openstack-dev/2018-August/133374.html

Change-Id: Ife3c7a5a95c5d707983ab33fd2fbfc1cfb72f676
2021-08-30 06:11:25 +00:00
Balazs Gibizer 191bdf2069 Support move ops with extended resource request
Nova re-generates the resource request of an instance for each server
move operation (migrate, resize, evacuate, live-migrate, unshelve) to
find (or validate) a target host for the instance move. This patch
extends this logic to support the extended resource request from
neutron.

As the changes in the neutron interface code are called from the
nova-compute service during port binding, the compute service version
is bumped. A check is also added to the compute-api to reject move
operations with ports having an extended resource request if there are
old computes in the cluster.

blueprint: qos-minimum-guaranteed-packet-rate
Change-Id: Ibcf703e254e720b9a6de17527325758676628d48
2021-08-27 17:59:18 +02:00
Zuul 00454f6279 Merge "scheduler: Merge 'FilterScheduler' into base class" 2021-08-20 17:17:37 +00:00
Yongli He e19fa1a199 smartnic support - cleanup arqs
Delete ARQs when:
        - a port is unbound
        - create ops failed and the ARQs did not bind to the instance
        - the ARQs are bound to the instance but not to a port

Implements: blueprint sriov-smartnic-support
Change-Id: Idab0ee38750d018de409699a0dbdff106d9e11fb
2021-08-05 15:58:34 +08:00
Yongli He b90c828d70 smartnic support - create arqs
Create ARQs for a port with a device profile:

    - At the API stage, the device profile is used to get scheduling
      information.

    - After the instance is scheduled to a host, the conductor creates
      the ARQs and updates the ARQ binding info to Cyborg.

Implements: blueprint sriov-smartnic-support

Depends-On: https://review.opendev.org/c/openstack/neutron-lib/+/768324
Depends-On: https://review.opendev.org/q/topic:%22bug%252F1906602%22+
Depends-On: https://review.opendev.org/c/openstack/cyborg/+/758942

Change-Id: Idaf92c54df0f39d177d7acaabbfcf254ff5a4d0f
Co-Authored-By: Shaohe Feng <shaohe.feng@intel.com>
Co-Authored-By: Xinran Wang <xin-ran.wang@intel.com>
2021-08-05 15:58:29 +08:00
Stephen Finucane e0534cc289 scheduler: Merge 'FilterScheduler' into base class
There are no longer any custom filters. We don't need the abstract base
class. Merge the code in and give it a more useful 'SchedulerDriver'
name.

Change-Id: Id08dafa72d617ca85e66d50b3c91045e0e8723d0
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
2021-06-29 12:24:41 +01:00
Stephen Finucane 7ab2947720 db: Remove 'nova.db.base' module
This made sense back in the day where the ORM was configurable and we
were making lots of direct calls to the database. Now, in a world where
most things happen via o.vo, it's just noise. Remove it.

Change-Id: I216cabcde5311abd46fdad9c95bb72c31b414010
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
2021-06-16 10:10:29 +01:00
Stephen Finucane 1bf45c4720 Remove (almost) all references to 'instance_type'
This continues on from I81fec10535034f3a81d46713a6eda813f90561cf and
removes all other references to 'instance_type' where it's possible to
do so. The only things left are DB columns, o.vo fields, some
unversioned objects, and RPC API methods. If we want to remove these, we
can but it's a lot more work.

Change-Id: I264d6df1809d7283415e69a66a9153829b8df537
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
2021-03-29 12:24:15 +01:00
Stephen Finucane e64744b92f rpc: Rework 'get_notifier', 'wrap_exception'
The 'nova.exception_wrapper.wrap_exception' decorator accepted either a
pre-configured notifier or a 'get_notifier' function, but the former was
never provided and the latter was consistently a notifier created via a
call to 'nova.rpc.get_notifier'. Simplify things by passing the
arguments relied on by 'get_notifier' into 'wrap_exception', allowing
the latter to create the former for us.

While doing this rework, it became obvious that 'get_notifier' accepted
a 'publisher_id' that is never provided nowadays, so that is dropped. In
addition, a number of calls to 'get_notifier' were passing in
'host=CONF.host', which duplicated the default value for this parameter
and was therefore unnecessary. Finally, the unit tests are split up by
file, as they should be.

Change-Id: I89e1c13e8a0df18594593b1e80c60d177e0d9c4c
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
2021-03-01 11:06:48 +00:00
Sylvain Bauza 9e96f64126 Rename ensure_network_metadata to amend requested_networks
As we don't persist (fortunately) the requested networks when booting an
instance, we need a way to populate the value of the RequestSpec field
during any create or move operation, so that a later change can know
which port or network was requested.

Partially-Implements: blueprint routed-networks-scheduling

Change-Id: I0c7e32f6088a8fc1625a0655af824dee2df4a12c
2021-02-03 18:21:34 +01:00
Balazs Gibizer be9dd3d9db Refactor update_pci_request_spec_with_allocated_interface_name
Make update_pci_request_spec_with_allocated_interface_name depend only
on a list of InstancePCIRequest o.vos instead of a whole Instance
object. This will come in handy for the qos interface attach case where
we only need to make the changes on the Instance o.vo after we are sure
that both the resource allocation and the PCI claim have succeeded for
the request.

Change-Id: I5a6c6d3eed61895b00f9e9c3fb3b5d09d6786e9c
blueprint: support-interface-attach-with-qos-ports
2021-01-18 15:40:42 +01:00
zhangbailin 7fbd787b1b Cyborg shelve/unshelve support
This change extends the conductor manager to append the cyborg
resource request to the request spec when performing an unshelve.

On shelve offload, the instance's ARQ binding info will be deleted
to free up the bound ARQs in the Cyborg service.
This change also passes the ARQs to spawn when unshelving an instance.

This change extends the ``shelve_instance``, ``shelve_offload_instance``
and ``unshelve_instance`` rpcapi function to carry the arq_uuids.

Co-Authored-By: Wenping Song <songwenping@inspur.com>

Implements: blueprint cyborg-shelve-and-unshelve
Change-Id: I258df4d77f6d86df1d867a8fe27360731c21d237
2021-01-15 03:21:17 +00:00
Takashi Natsume 383e2a8bdc Remove six.text_type (1/2)
Replace six.text_type with str.
A subsequent patch will replace other six.text_type.

Change-Id: I23bb9e539d08f5c6202909054c2dd49b6c7a7a0e
Implements: blueprint six-removal
Signed-off-by: Takashi Natsume <takanattie@gmail.com>
2020-12-13 11:25:31 +00:00
Zuul 43b8df3ae8 Merge "Remove compute service level check for qos ops" 2020-11-15 08:19:55 +00:00
Balazs Gibizer c163205489 Remove compute service level check for qos ops
To support move operations with qos ports both the source and the
destination compute hosts need to be on Ussuri level. We have service
level checks implemented in Ussuri. In Victoria we could remove those
checks as nova only supports compatibility between N and N-1 computes.
But we kept them there just for extra safety. In the meantime we
codified [1] the rule that nova does not support N-2 computes any
more. So in Wallaby we can assume that the oldest compute is already
on Victoria (Ussuri would be enough too).

So this patch removes the unnecessary service level checks and related
test cases.

[1] Ie15ec8299ae52ae8f5334d591ed3944e9585cf71

Change-Id: I14177e35b9d6d27d49e092604bf0f288cd05f57e
2020-11-09 16:13:51 +01:00
Takashi Natsume 1d0a0e8c20 Remove six.moves
Replace the following items with Python 3 style code.

- six.moves.configparser
- six.moves.StringIO
- six.moves.cStringIO
- six.moves.urllib
- six.moves.builtins
- six.moves.range
- six.moves.xmlrpc_client
- six.moves.http_client
- six.moves.http_cookies
- six.moves.queue
- six.moves.zip
- six.moves.reload_module
- six.StringIO
- six.BytesIO

Subsequent patches will replace other six usages.
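The replacements above amount to swapping each six shim for the Python 3 stdlib name it wrapped. A quick illustration (not taken from the patch) of a few of them:

```python
# Python 3 stdlib equivalents for several of the six shims listed above.
import configparser                  # was six.moves.configparser
from io import BytesIO, StringIO     # was six.BytesIO / six.StringIO
from urllib import parse             # was six.moves.urllib.parse

buf = StringIO()
buf.write("hello")

raw = BytesIO(b"\x00\x01").read()

cfg = configparser.ConfigParser()
cfg.read_string("[defaults]\nhost = localhost\n")

qs = parse.urlencode({"a": 1})
numbers = list(range(3))             # six.moves.range is plain range now
```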

Change-Id: Ib2c406327fef2fb4868d8050fc476a7d17706e23
Implements: blueprint six-removal
Signed-off-by: Takashi Natsume <takanattie@gmail.com>
2020-11-07 03:25:02 +00:00
Stephen Finucane cc8b300f67 conductor: Don't use setattr
setattr kills discoverability, making it hard to figure out who's
setting various fields. Don't do it.
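The discoverability point can be shown with a toy example; `RequestSpec` and `num_instances` here are bare stand-ins for illustration, not nova's real objects.

```python
# Illustrative: setattr with a variable name defeats code search,
# a plain assignment does not. RequestSpec is a toy stand-in.
class RequestSpec:
    pass

spec = RequestSpec()

# Hard to discover: grepping for "num_instances =" finds nothing here.
field, value = "num_instances", 2
setattr(spec, field, value)

# Easy to discover: the attribute name appears literally at the
# assignment site, so any search turns it up.
spec.num_instances = 2
```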

While we're here, we drop legacy compat handlers for pre-Train
compute nodes.

Change-Id: Ie694a80e89f99c8d3e326eebb4590d93c0ebf671
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
2020-09-14 16:47:15 +01:00
Zuul 65631f257b Merge "Move confirm resize under semaphore" 2020-09-10 18:49:19 +00:00
Stephen Finucane b2fbaa8767 Set 'old_flavor', 'new_flavor' on source before resize
Cross-cell resize is confusing. We need to set this information ahead of
time.

Change-Id: I5a403c072b9f03074882b552e1925f22cb5b15b6
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Partial-Bug: #1879878
2020-09-08 09:58:21 +01:00
Stephen Finucane a57800d382 Move confirm resize under semaphore
The 'ResourceTracker.update_available_resource' periodic task builds
usage information for the current host by inspecting instances and
in-progress migrations, combining the two. Specifically, it finds all
instances that are not in the 'DELETED' or 'SHELVED_OFFLOADED' state,
calculates the usage from these, then finds all in-progress migrations
for the host that don't have an associated instance (to prevent double
accounting) and includes the usage for these.

In addition to the periodic task, the 'ResourceTracker' class has a
number of helper functions to make or drop claims for the inventory
generated by the 'update_available_resource' periodic task as part of
the various instance operations. These helpers naturally assume that
when making a claim for a particular instance or migration, there
shouldn't already be resources allocated for same. Conversely, when
dropping claims, the resources should currently be allocated. However,
the check for *active* instances and *in-progress* migrations in the
periodic task means we have to be careful in how we make changes to a
given instance or migration record. Running the periodic task between
such an operation and an attempt to make or drop a claim can result in
TOCTOU-like races.

This generally isn't an issue: we use the 'COMPUTE_RESOURCE_SEMAPHORE'
semaphore to prevent the periodic task running while we're claiming
resources in helpers like 'ResourceTracker.instance_claim' and we make
our changes to the instances and migrations within this context. There
is one exception though: the 'drop_move_claim' helper. This function is
used when dropping a claim for either a cold migration, a resize or a
live migration, and will drop usage from either the source host (based
on the "old" flavor) for a resize confirm or the destination host (based
on the "new" flavor) for a resize revert or live migration rollback.
Unfortunately, while the function itself is wrapped in the semaphore, no
changes to the state of the instance or migration in question are
protected by it.

Consider the confirm resize case, which we're addressing here. If we
mark the migration as 'confirmed' before running 'drop_move_claim', then
the periodic task running between these steps will not account for the
usage on the source since the migration is allegedly 'confirmed'. The
call to 'drop_move_claim' will then result in the tracker dropping usage
that we're no longer accounting for. This "set migration status before
dropping usage" is the current behaviour for both same-cell and
cross-cell resize, via the 'ComputeManager.confirm_resize' and
'ComputeManager.confirm_snapshot_based_resize_at_source' functions,
respectively. We could reverse those calls and run 'drop_move_claim'
before marking the migration as 'confirmed', but while our usage will be
momentarily correct, the periodic task running between these steps will
re-add the usage we just dropped since the migration isn't yet
'confirmed'. The correct solution is to close this gap between setting
the migration status and dropping the move claim to zero. We do this by
putting both operations behind the 'COMPUTE_RESOURCE_SEMAPHORE', just
like the claim operations.
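The fix can be sketched with a toy tracker: mark the migration confirmed and drop the move claim under the same lock the periodic task takes, so the task can never observe the intermediate state. The names mirror the description above but the code is illustrative, not nova's.

```python
# Illustrative sketch: status change and usage drop happen atomically
# with respect to the periodic task, closing the TOCTOU window.
import threading

COMPUTE_RESOURCE_SEMAPHORE = threading.Lock()

class Migration:
    def __init__(self):
        self.status = "finished"

class ResourceTracker:
    def __init__(self):
        self.usage = 100  # usage held by the migration's old flavor

    def drop_move_claim(self, migration):
        # Caller must hold COMPUTE_RESOURCE_SEMAPHORE.
        self.usage -= 100

    def confirm_resize(self, migration):
        # Both steps under the lock: the periodic task can see either
        # "finished + usage held" or "confirmed + usage dropped",
        # never the inconsistent state in between.
        with COMPUTE_RESOURCE_SEMAPHORE:
            migration.status = "confirmed"
            self.drop_move_claim(migration)

    def update_available_resource(self):
        # Periodic task: recomputes usage under the same lock.
        with COMPUTE_RESOURCE_SEMAPHORE:
            return self.usage

rt = ResourceTracker()
mig = Migration()
rt.confirm_resize(mig)
```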

Change-Id: I26b050c402f5721fc490126e9becb643af9279b4
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Partial-Bug: #1879878
2020-09-03 08:55:47 +00:00
Sean Mooney 1356ef5b57 Cyborg evacuate support
This change extends the conductor manager
to append the cyborg resource request to the
request spec when performing an evacuate.

This change passes the ARQs to spawn during rebuild
and evacuate. On evacuate, the existing ARQs will be deleted
and new ARQs will be created and bound; during rebuild, the
existing ARQs are reused.

This change extends the rebuild_instance compute rpcapi
function to carry the arq_uuids. This eliminates the
need to look up the uuids associated with the arqs assigned
to the instance by querying cyborg.

Co-Authored-By: Wenping Song <songwenping@inspur.com>
Co-Authored-By: Brin Zhang <zhangbailin@inspur.com>

Implements: blueprint cyborg-rebuild-and-evacuate
Change-Id: I147bf4d95e6d86ff1f967a8ce37260730f21d236
2020-09-01 08:41:45 +00:00
Takashi Natsume 5191b4f2f0 Remove six.add_metaclass
Replace six.add_metaclass with Python 3 style code.

Change-Id: Ifc3f2bcb8fcdd2b555864bd4e22a973a7858c272
Implements: blueprint six-removal
Signed-off-by: Takashi Natsume <takanattie@gmail.com>
2020-08-15 07:45:39 +00:00
Sundar Nadathur d94ea23d3d Delete ARQs by UUID if Cyborg ARQ bind fails.
During the review of the cyborg series it was noted that
in some cases ARQs could be leaked during binding.
See https://review.opendev.org/#/c/673735/46/nova/conductor/manager.py@1632

This change adds a delete_arqs_by_uuid function that can delete
unbound ARQs by instance uuid.

This change modifies build_instances and schedule_and_build_instances
to handle the AcceleratorRequestBindingFailed exception raised when
binding fails and clean up the instance's ARQs.

Co-Authored-By: Wenping Song <songwenping@inspur.com>

Closes-Bug: #1872730
Change-Id: I86c2f00e2368fe02211175e7328b2cd9c0ebf41b
Blueprint: nova-cyborg-interaction
2020-07-23 15:26:07 +08:00
Stephen Finucane f203da3838 objects: Add MigrationTypeField
We use these things many places in the code and it would be good to have
constants to reference. Do just that.

Note that this results in a change in the object hash. However, there
are no actual changes in the output object so that's okay.

Change-Id: If02567ce0a3431dda5b2bf6d398bbf7cc954eed0
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
2020-05-08 14:45:54 +01:00
LuyaoZhong 4bd5af66b5 Support live migration with vpmem
1. Check if the cluster supports live migration with vpmem
2. On source host we generate new dest xml with vpmem info stored in
   migration_context.new_resources.
3. If there are vpmems, clean them up on the source/destination host
   when live migration succeeds/fails

Change-Id: I5c346e690148678a2f0dc63f4f516a944c3db8cd
Implements: blueprint support-live-migration-with-virtual-persistent-memory
2020-04-07 13:13:13 +00:00
LuyaoZhong 990a26ef1f partial support for live migration with specific resources
1. Claim allocations from placement first, then claim specific
   resources in the Resource Tracker on the destination to populate
   migration_context.new_resources
2. Clean up specific resources when live migration succeeds/fails

Because we store specific resources in migration_context during
live migration, to ensure correct cleanup we can't drop
migration_context before cleanup is complete:
 a) when post live migration, we move source host cleanup before
    destination cleanup(post_live_migration_at_destination will
    apply migration_context and drop it)
 b) when rollback live migration, we drop migration_context after
    rollback operations are complete

For different specific resources, we might need driver-specific
support, such as for vpmem. This change just ensures that newly claimed
specific resources are populated to migration_context and that
migration_context is not dropped before cleanup is complete.

Change-Id: I44ad826f0edb39d770bb3201c675dff78154cbf3
Implements: blueprint support-live-migration-with-virtual-persistent-memory
2020-04-07 13:12:53 +00:00