Commit Graph

547 Commits

Author SHA1 Message Date
melanie witt 6f79d6321e Enforce quota usage from placement when unshelving
When [quota]count_usage_from_placement = true or
[quota]driver = nova.quota.UnifiedLimitsDriver, cores and ram quota
usage are counted from placement. When an instance is SHELVED_OFFLOADED,
it will not have allocations in placement, so its cores and ram should
not count against quota during that time.

This means however that when an instance is unshelved, there is a
possibility of going over quota if the cores and ram it needs were
allocated by some other instance(s) while it was SHELVED_OFFLOADED.

This fixes a bug where quota was not being properly enforced during
unshelve of a SHELVED_OFFLOADED instance when quota usage is counted
from placement. Test coverage is also added for the "recheck" quota
cases.

Closes-Bug: #2003991

Change-Id: I4ab97626c10052c7af9934a80ff8db9ddab82738
2023-05-23 01:02:05 +00:00
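The two quota configurations named in the commit message above would look roughly like this in `nova.conf` (a sketch; only one of the two settings is needed, and the unified limits driver is shown commented out):

```ini
# Count cores/ram quota usage from placement with the legacy driver:
[quota]
count_usage_from_placement = true

# Or use the unified limits driver instead (limits managed via keystone):
# [quota]
# driver = nova.quota.UnifiedLimitsDriver
```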
Zuul d443f8e4c4 Merge "Transport context to all threads" 2023-02-27 15:11:25 +00:00
Sahid Orentino Ferdjaoui 8c2e765989 compute: enhance compute evacuate instance to support target state
Related to the bp/allowing-target-state-for-evacuate. This change
extends the compute API to accept a new argument, targetState.

When set, the targetState argument forces the state of the evacuated
instance on the destination host.

Signed-off-by: Sahid Orentino Ferdjaoui <sahid.ferdjaoui@industrialdiscipline.com>
Change-Id: I9660d42937ad62d647afc6be965f166cc5631392
2023-01-31 11:29:01 +01:00
Balazs Gibizer b387401187 Support unshelve with PCI in placement
blueprint: pci-device-tracking-in-placement
Change-Id: I35ca3ae82be5dc345d80ad1857abb915c987d34e
2022-12-21 16:17:34 +01:00
Balazs Gibizer 53642766f8 Support evacuate with PCI in placement
blueprint: pci-device-tracking-in-placement
Change-Id: I1462ee4f4dd143b56732332f7ed00df00a9f2067
2022-12-21 16:17:34 +01:00
whoami-rajat 6919db5612 Add conductor RPC interface for rebuild
This patch adds support for passing the ``reimage_boot_volume``
flag from the API layer through the conductor layer to the
compute layer, and also includes the necessary RPC version bump.

Related blueprint volume-backed-server-rebuild

Change-Id: I8daf177eb67d08112a16fe788910644abf338fa6
2022-08-31 16:38:50 +05:30
Rajat Dhasmana 30aab9c234 Add support for volume backed server rebuild
This patch adds the plumbing for rebuilding a volume backed
instance in compute code. This functionality will be enabled
in a subsequent patch which adds a new microversion and the
external support for requesting it.

The flow of the operation is as follows:

1) Create an empty attachment
2) Detach the volume
3) Request cinder to reimage the volume
4) Wait for cinder to notify success to nova (via external events)
5) Update and complete the attachment

Related blueprint volume-backed-server-rebuild

Change-Id: I0d889691de1af6875603a9f0f174590229e7be18
2022-08-31 16:38:37 +05:30
Dan Smith 232684b440 Avoid n-cond startup abort for keystone failures
Conductor creates a placement client for the potential case where
it needs to make a call for certain operations. A transient network
or keystone failure will currently cause it to abort startup, which
means it is not available for other unrelated activities, such as
DB proxying for compute.

This makes conductor test the placement client on startup, but only
abort startup on errors that are highly likely to be permanent
configuration errors, and only warn about things like being unable
to contact keystone/placement during initialization. If a non-fatal
error is encountered at startup, later operations needing the
placement client will retry initialization.

Closes-Bug: #1846820
Change-Id: Idb7fcbce0c9562e7b9bd3e80f2a6d4b9bc286830
2022-08-18 07:37:42 -07:00
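The startup behavior this commit describes can be sketched as follows. This is an illustrative outline, not nova's actual code: the class names, exception types, and the `client_factory` hook are all hypothetical.

```python
# Hedged sketch: abort startup only on likely-permanent configuration
# errors; warn on transient failures and retry lazily on first use.

class FatalConfigError(Exception):
    """Error that is almost certainly a permanent misconfiguration."""

class TransientError(Exception):
    """Error that may resolve itself (network blip, keystone restart)."""

class Conductor:
    def __init__(self, client_factory):
        self._client_factory = client_factory
        self._client = None

    def start(self):
        # Probe the placement client once at startup.
        try:
            self._client = self._client_factory()
        except FatalConfigError:
            raise  # abort startup: retrying will not help
        except TransientError as e:
            print(f"warning: placement unavailable at startup: {e}")
            self._client = None  # retry lazily on first use

    def placement_client(self):
        # Later operations re-attempt initialization if startup only
        # failed non-fatally.
        if self._client is None:
            self._client = self._client_factory()
        return self._client
```

The key design point from the commit message is that a transient failure leaves the service running, so unrelated duties such as DB proxying for compute stay available.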
Dan Smith c178d93606 Unify placement client singleton implementations
We have many places where we implement singleton behavior for the
placement client. This unifies them into a single place and
implementation. Not only does this DRY things up, but may cause us
to initialize it fewer times and also allows for emitting a common
set of error messages about expected failures for better
troubleshooting.

Change-Id: Iab8a791f64323f996e1d6e6d5a7e7a7c34eb4fb3
Related-Bug: #1846820
2022-08-18 07:22:37 -07:00
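A common shape for the unified singleton this commit describes is a lazily initialized, lock-protected module-level instance. A minimal sketch (the function name and `factory` parameter are illustrative, not nova's API):

```python
import threading

# Illustrative process-wide singleton accessor for a client.
_client = None
_lock = threading.Lock()

def get_placement_client(factory):
    """Return the shared client, creating it exactly once on first use."""
    global _client
    if _client is None:
        with _lock:
            if _client is None:  # double-checked under the lock
                _client = factory()
    return _client
```

Centralizing this also gives one place to emit the common set of error messages the commit mentions.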
Fabian Wiesel 646fc51732 Transport context to all threads
The nova.utils.spawn and spawn_n methods transport
the context (and profiling information) to the
newly created threads. But the same isn't done
when submitting work to thread-pools in the
ComputeManager.

The code doing that for spawn and spawn_n
is extracted to a new function
and called to submit the work to the thread-pools.

Closes-Bug: #1962574
Change-Id: I9085deaa8cf0b167d87db68e4afc4a463c00569c
2022-08-04 17:36:23 +05:30
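The fix described above amounts to wrapping submitted callables so the caller's context is re-established in the worker thread. A minimal sketch using a plain thread-local as a stand-in for nova's request context (all names here are illustrative):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_local = threading.local()

def set_context(ctx):
    _local.ctx = ctx

def get_context():
    return getattr(_local, "ctx", None)

def _with_context(ctx, func, *args, **kwargs):
    # Runs in the worker thread: restore the caller's context first.
    set_context(ctx)
    return func(*args, **kwargs)

def submit_with_context(pool, func, *args, **kwargs):
    """Submit work to the pool, transporting the caller's context."""
    return pool.submit(_with_context, get_context(), func, *args, **kwargs)
```

Without the wrapper, `get_context()` inside the worker thread would return `None`, which is the bug class the commit fixes for the ComputeManager's thread pools.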
René Ribaud a263fa46f8 Allow unshelve to a specific host (Compute API part)
This patch introduces changes to the compute API that allow
PROJECT_ADMIN to unshelve a shelved offloaded server to a specific host.
It also adds the ability to unpin the availability_zone of an
instance that is bound to it.

Implements: blueprint unshelve-to-host
Change-Id: Ieb4766fdd88c469574fad823e05fe401537cdc30
2022-07-22 10:22:24 +02:00
John Garbutt 140b3b81f9 Enforce resource limits using oslo.limit
We now enforce limits on resources requested in the flavor.
This includes: instances, ram, cores. It also works for any resource
class being requested via the flavor chosen, such as custom resource
classes relating to Ironic resources.

Note that because disk resources can be limited, we need to know whether
the instance is boot-from-volume or not. This has meant adding extra
code to make sure we know that when enforcing the limits.

Follow on patches will update the APIs to accurately report the limits
being applied to instances, ram and cores.

blueprint unified-limits-nova

Change-Id: If1df93400dcbcb1d3aac0ade80ae5ecf6ce38d11
2022-02-24 16:21:03 +00:00
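Conceptually, enforcement checks that current usage plus the delta implied by the flavor stays within each resource's limit. A hedged sketch of that arithmetic (the function names and resource keys are illustrative; nova actually delegates the enforcement to oslo.limit):

```python
class OverLimit(Exception):
    pass

def deltas_for_flavor(flavor, boot_from_volume):
    """Resources a new server would consume, per its flavor."""
    deltas = {"instances": 1, "cores": flavor["vcpus"], "ram": flavor["ram"]}
    # Disk counts only when the root disk is not backed by a volume,
    # which is why the commit needs to know boot-from-volume status.
    if not boot_from_volume:
        deltas["disk"] = flavor["disk"]
    return deltas

def enforce(current_usage, limits, deltas):
    for resource, delta in deltas.items():
        proposed = current_usage.get(resource, 0) + delta
        if proposed > limits.get(resource, float("inf")):
            raise OverLimit(f"{resource}: {proposed} > {limits[resource]}")
```

The same shape extends to custom resource classes requested via the flavor, such as Ironic resources.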
Zuul e81211318a Merge "Support move ops with extended resource request" 2021-08-31 21:38:24 +00:00
Matt Riedemann c09d98dadb Add force kwarg to delete_allocation_for_instance
This adds a force kwarg to delete_allocation_for_instance which
defaults to True because that was found to be the most common use case
by a significant margin during implementation of this patch.
In most cases, this method is called when we want to delete the
allocations because they should be gone, e.g. server delete, failed
build, or shelve offload. The alternative in these cases is the caller
could trap the conflict error and retry but we might as well just force
the delete in that case (it's cleaner).

When force=True, it will DELETE the consumer allocations rather than
GET and PUT with an empty allocations dict and the consumer generation
which can result in a 409 conflict from Placement. For example, bug
1836754 shows that in one tempest test that creates a server and then
immediately deletes it, we can hit a very tight window where the method
GETs the allocations and before it PUTs the empty allocations to remove
them, something changes which results in a conflict and the server
delete fails with a 409 error.

It's worth noting that delete_allocation_for_instance used to just
DELETE the allocations before Stein [1] when we started taking consumer
generations into account. There was also a related mailing list thread
[2].


Closes-Bug: #1836754

[1] I77f34788dd7ab8fdf60d668a4f76452e03cf9888
[2] http://lists.openstack.org/pipermail/openstack-dev/2018-August/133374.html

Change-Id: Ife3c7a5a95c5d707983ab33fd2fbfc1cfb72f676
2021-08-30 06:11:25 +00:00
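The difference between the forced and non-forced paths can be sketched like this; the client interface and URL shapes are illustrative stand-ins for nova's placement client, not its real API:

```python
class Conflict(Exception):
    """Placement returned 409 (consumer generation mismatch)."""

def delete_allocation_for_instance(client, consumer_uuid, force=True):
    if force:
        # Blind DELETE: no generation check, cannot lose the race.
        client.delete(f"/allocations/{consumer_uuid}")
        return
    # Non-forced path: read-modify-write with the consumer generation,
    # which raises Conflict if anything changes in between.
    alloc = client.get(f"/allocations/{consumer_uuid}")
    client.put(f"/allocations/{consumer_uuid}",
               {"allocations": {},
                "consumer_generation": alloc["consumer_generation"]})
```

The tight window described in the commit is exactly the gap between the GET and the PUT in the non-forced path.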
Balazs Gibizer 191bdf2069 Support move ops with extended resource request
Nova re-generates the resource request of an instance for each server
move operation (migrate, resize, evacuate, live-migrate, unshelve) to
find (or validate) a target host for the instance move. This patch
extends this logic to support the extended resource request from
neutron.

As the changed neutron interface code is called from the nova-compute
service during port binding, the compute service version is bumped.
A check is also added to the compute API to reject move operations
involving ports with an extended resource request if there are old
computes in the cluster.

blueprint: qos-minimum-guaranteed-packet-rate
Change-Id: Ibcf703e254e720b9a6de17527325758676628d48
2021-08-27 17:59:18 +02:00
Yongli He e19fa1a199 smartnic support - cleanup arqs
Delete ARQs in the following cases:
        - the port is unbound
        - the create operation failed and the ARQs were not bound to
          the instance
        - the ARQ is bound to the instance but not to a port

Implements: blueprint sriov-smartnic-support
Change-Id: Idab0ee38750d018de409699a0dbdff106d9e11fb
2021-08-05 15:58:34 +08:00
Yongli He b90c828d70 smartnic support - create arqs
Create ARQs for a port with a device profile:

    - At the API stage, the device profile is used to get scheduling
      information.

    - After scheduling places the instance on a host, the conductor
      creates the ARQs and updates the ARQ binding info in Cyborg.

Implements: blueprint sriov-smartnic-support

Depends-On: https://review.opendev.org/c/openstack/neutron-lib/+/768324
Depends-On: https://review.opendev.org/q/topic:%22bug%252F1906602%22+
Depends-On: https://review.opendev.org/c/openstack/cyborg/+/758942

Change-Id: Idaf92c54df0f39d177d7acaabbfcf254ff5a4d0f
Co-Authored-By: Shaohe Feng <shaohe.feng@intel.com>
Co-Authored-By: Xinran Wang <xin-ran.wang@intel.com>
2021-08-05 15:58:29 +08:00
Stephen Finucane 7ab2947720 db: Remove 'nova.db.base' module
This made sense back in the day when the ORM was configurable and we
were making lots of direct calls to the database. Now, in a world where
most things happen via o.vo, it's just noise. Remove it.

Change-Id: I216cabcde5311abd46fdad9c95bb72c31b414010
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
2021-06-16 10:10:29 +01:00
Stephen Finucane 1bf45c4720 Remove (almost) all references to 'instance_type'
This continues on from I81fec10535034f3a81d46713a6eda813f90561cf and
removes all other references to 'instance_type' where it's possible to
do so. The only things left are DB columns, o.vo fields, some
unversioned objects, and RPC API methods. If we want to remove these, we
can but it's a lot more work.

Change-Id: I264d6df1809d7283415e69a66a9153829b8df537
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
2021-03-29 12:24:15 +01:00
Stephen Finucane e64744b92f rpc: Rework 'get_notifier', 'wrap_exception'
The 'nova.exception_wrapper.wrap_exception' decorator accepted either a
pre-configured notifier or a 'get_notifier' function, but the former was
never provided and the latter was consistently a notifier created via a
call to 'nova.rpc.get_notifier'. Simplify things by passing the
arguments relied on by 'get_notifier' into 'wrap_exception', allowing
the latter to create the former for us.

While doing this rework, it became obvious that 'get_notifier' accepted
a 'publisher_id' that is never provided nowadays, so that is dropped. In
addition, a number of calls to 'get_notifier' were passing in
'host=CONF.host', which duplicated the default value for this parameter
and is therefore unnecessary. Finally, the unit tests are split up by
file, as they should be.

Change-Id: I89e1c13e8a0df18594593b1e80c60d177e0d9c4c
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
2021-03-01 11:06:48 +00:00
Sylvain Bauza 9e96f64126 Rename ensure_network_metadata to amend requested_networks
As we don't persist (fortunately) the requested networks when booting an
instance, we need a way to populate the value of the RequestSpec field
during any create or move operation, so that a later change will know
which port or network was requested.

Partially-Implements: blueprint routed-networks-scheduling

Change-Id: I0c7e32f6088a8fc1625a0655af824dee2df4a12c
2021-02-03 18:21:34 +01:00
zhangbailin 7fbd787b1b Cyborg shelve/unshelve support
This change extends the conductor manager to append the cyborg
resource request to the request spec when performing an unshelve.

On shelve offload, the instance's ARQ binding info will be deleted to
free up the bound ARQs in the Cyborg service.
This change also passes the ARQs to spawn when unshelving an instance.

This change extends the ``shelve_instance``, ``shelve_offload_instance``
and ``unshelve_instance`` rpcapi function to carry the arq_uuids.

Co-Authored-By: Wenping Song <songwenping@inspur.com>

Implements: blueprint cyborg-shelve-and-unshelve
Change-Id: I258df4d77f6d86df1d867a8fe27360731c21d237
2021-01-15 03:21:17 +00:00
Takashi Natsume 383e2a8bdc Remove six.text_type (1/2)
Replace six.text_type with str.
A subsequent patch will replace other six.text_type.

Change-Id: I23bb9e539d08f5c6202909054c2dd49b6c7a7a0e
Implements: blueprint six-removal
Signed-off-by: Takashi Natsume <takanattie@gmail.com>
2020-12-13 11:25:31 +00:00
Takashi Natsume 1d0a0e8c20 Remove six.moves
Replace the following items with Python 3 style code.

- six.moves.configparser
- six.moves.StringIO
- six.moves.cStringIO
- six.moves.urllib
- six.moves.builtins
- six.moves.range
- six.moves.xmlrpc_client
- six.moves.http_client
- six.moves.http_cookies
- six.moves.queue
- six.moves.zip
- six.moves.reload_module
- six.StringIO
- six.BytesIO

Subsequent patches will replace other six usages.

Change-Id: Ib2c406327fef2fb4868d8050fc476a7d17706e23
Implements: blueprint six-removal
Signed-off-by: Takashi Natsume <takanattie@gmail.com>
2020-11-07 03:25:02 +00:00
Sean Mooney 1356ef5b57 Cyborg evacuate support
This change extends the conductor manager
to append the cyborg resource request to the
request spec when performing an evacuate.

This change passes the ARQs to spawn during rebuild
and evacuate. On evacuate the existing ARQs will be deleted
and new ARQs will be created and bound, during rebuild the
existing ARQs are reused.

This change extends the rebuild_instance compute rpcapi
function to carry the arq_uuids. This eliminates the
need to look up the uuids associated with the ARQs assigned
to the instance by querying cyborg.

Co-Authored-By: Wenping Song <songwenping@inspur.com>
Co-Authored-By: Brin Zhang <zhangbailin@inspur.com>

Implements: blueprint cyborg-rebuild-and-evacuate
Change-Id: I147bf4d95e6d86ff1f967a8ce37260730f21d236
2020-09-01 08:41:45 +00:00
Sundar Nadathur d94ea23d3d Delete ARQs by UUID if Cyborg ARQ bind fails.
During the review of the cyborg series it was noted that
in some cases ARQs could be leaked during binding.
See https://review.opendev.org/#/c/673735/46/nova/conductor/manager.py@1632

This change adds a delete_arqs_by_uuid function that can delete
unbound ARQs by instance uuid.

This change modifies build_instances and schedule_and_build_instances
to handle the AcceleratorRequestBindingFailed exception raised when
binding fails and clean up the instance's ARQs.

Co-Authored-By: Wenping Song <songwenping@inspur.com>

Closes-Bug: #1872730
Change-Id: I86c2f00e2368fe02211175e7328b2cd9c0ebf41b
Blueprint: nova-cyborg-interaction
2020-07-23 15:26:07 +08:00
Stephen Finucane f203da3838 objects: Add MigrationTypeField
We use these things many places in the code and it would be good to have
constants to reference. Do just that.

Note that this results in a change in the object hash. However, there
are no actual changes in the output object so that's okay.

Change-Id: If02567ce0a3431dda5b2bf6d398bbf7cc954eed0
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
2020-05-08 14:45:54 +01:00
Sundar Nadathur c433b1df42 Bump compute rpcapi version and reduce Cyborg calls.
The _get_bound_arq_resources() in the compute manager [1] calls Cyborg
up to 3 times: once to get the accelerator request (ARQ) UUIDs for the
instance, and then once or twice to get all ARQs with completed bindings.

The first call can be eliminated by passing the ARQs from the conductor
to the compute manager as an additional parameter in
build_and_run_instance(). This requires a bump in compute rpcapi version.

[1] https://review.opendev.org/#/c/631244/54/nova/compute/manager.py@2652

Blueprint: nova-cyborg-interaction

Change-Id: I26395d57bd4ba55276b7514baa808f9888639e11
2020-03-31 00:24:00 -07:00
Sundar Nadathur a20aca7f5e Delete ARQs for an instance when the instance is deleted.
This patch series now works for many VM operations with libvirt:
* Creation, deletion of VM instances.
* Pause/unpause

The following works but is a no-op:
* Lock/unlock

Hard reboots are taken up in a later patch in this series.
Soft reboots work for accelerators unless some unrelated failure
forces a hard reboot in the libvirt driver.

Suspend is not supported yet. It would fail with this error:
   libvirtError: Requested operation is not valid:
   domain has assigned non-USB host devices

Shelve is not supported yet.
Live migration is not intended to be supported with accelerators now.

Change-Id: Icb95890d8f16cad1f7dc18487a48def2f7c9aec2
Blueprint: nova-cyborg-interaction
2020-03-24 22:44:18 -07:00
Sundar Nadathur cc630b4eb6 Create and bind Cyborg ARQs.
* Call Cyborg with device profile name to get ARQs (Accelerator Requests).
  Each ARQ corresponds to a single device profile group, which
  corresponds to a single request group in the request spec.
* Match each ARQ to associated request group, and thereby obtain the
  corresponding RP for that ARQ.
* Call Cyborg to bind the ARQ to that host/device-RP.
* When Cyborg sends the ARQ bind notification events, wait for those
  events with a timeout.

Change-Id: I0f8b6bf2b4f4510da6c84fede532533602b6af7f
Blueprint: nova-cyborg-interaction
2020-03-21 12:03:38 -07:00
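The last step above, waiting for the ARQ bind notification events with a timeout, can be sketched with a simple event-driven waiter (illustrative names; nova's actual plumbing goes through its external-events machinery):

```python
import threading

class ArqBindWaiter:
    """Wait until every expected ARQ reports a completed binding."""

    def __init__(self, expected_arq_uuids):
        self._pending = set(expected_arq_uuids)
        self._done = threading.Event()
        self._lock = threading.Lock()

    def notify_bound(self, arq_uuid):
        """Called when a bind notification arrives for one ARQ."""
        with self._lock:
            self._pending.discard(arq_uuid)
            if not self._pending:
                self._done.set()

    def wait(self, timeout):
        """Return True if every ARQ bound before the timeout expired."""
        return self._done.wait(timeout)
```

A timeout here maps to the failure path where the instance build is aborted because Cyborg never confirmed the bindings.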
Balazs Gibizer 94c7e7ad43 Support unshelve with qos ports
This patch adds support for unshelving an offloaded server with qos ports.
To do that this patch:
* collects the port resource requests from neutron before the scheduler
  is called to select the target of the unshelve.
* calculate the request group - provider mapping after the scheduler
  selected the target host
* update the InstancePCIRequest to drive the pci_claim to allocate VFs
  from the same PF as the bandwidth is allocated from by the scheduler
* update the binding profile of the qos ports so that the allocation
  key of the binding profile points to the RPs the port is allocated
  from.

As this was the last move operation to be supported, the compute service
version is bumped to indicate such support. This will be used in later
patches to implement a global service-level check in the API.

Note that unshelve does not have a re-schedule loop and all the RPC
changes were committed in Queens.

Two error cases need special care by rolling back allocations before
putting the instance back to SHELVED_OFFLOADED state:

* if the InstancePCIRequest cannot be updated according to the new
  target host of the unshelve
* if updating port binding fails in neutron during unshelve

Change-Id: I678722b3cf295c89110967d5ad8c0c964df4cb42
blueprint: support-move-ops-with-qos-ports-ussuri
2020-03-18 17:24:56 +01:00
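The rollback handling for the two error cases above follows a familiar shape: allocate, attempt the updates, and on failure free the allocations and restore SHELVED_OFFLOADED before re-raising. A hedged sketch with hypothetical callables standing in for the real steps:

```python
def unshelve(instance, allocate, update_pci, update_port_binding, rollback):
    """Sketch of unshelve with rollback on the two failure modes above."""
    allocations = allocate(instance)  # scheduler picked a target host
    try:
        update_pci(instance, allocations)           # may fail: case 1
        update_port_binding(instance, allocations)  # may fail: case 2
    except Exception:
        # Free the new allocations and restore the offloaded state so
        # the server can be unshelved again later.
        rollback(allocations)
        instance["vm_state"] = "SHELVED_OFFLOADED"
        raise
```

The important property is that a failed unshelve leaves no orphaned placement allocations behind.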
Stephen Finucane 5fc3b81fdf Remove 'nova.image.api' module
This doesn't exist for 'nova.volume' and no longer exists for
'nova.network'. There's only one image backend we support, so do like
we've done elsewhere and just use 'nova.image.glance'.

Change-Id: I7ca7d8a92dfbc7c8d0ee2f9e660eabaa7e220e2a
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
2020-02-18 11:45:39 +00:00
Stephen Finucane fadeedcdea nova-net: Remove layer of indirection in 'nova.network'
At some point in the past, there was only nova-network and its code
could be found in 'nova.network'. Neutron was added and eventually found
itself (mostly!) in the 'nova.network.neutronv2' submodule. With
nova-network now gone, we can remove one layer of indirection and move
the code from 'nova.network.neutronv2' back up to 'nova.network',
mirroring what we did with the old nova-volume code way back in 2012
[1]. To ensure people don't get nova-network and 'nova.network'
confused, 'neutron' is retained in filenames.

[1] https://review.opendev.org/#/c/14731/

Change-Id: I329f0fd589a4b2e0426485f09f6782f94275cc07
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
2020-01-15 14:57:49 +00:00
Zuul 662138b6c1 Merge "Create instance action when burying in cell0" 2020-01-06 20:13:52 +00:00
Matt Riedemann 26d695876a Use graceful_exit=True in ComputeTaskManager.revert_snapshot_based_resize
This passes graceful_exit=True to the wrap_instance_event decorator
in ComputeTaskManager.revert_snapshot_based_resize so that upon successful
completion of the RevertResizeTask, when the instance is hard destroyed
from the target cell DB (used to create the action/event), a traceback
is not logged for the InstanceActionNotFound exception.

The same event is also finished in the source cell DB upon successful
completion of the RevertResizeTask. Note that there are other ways we
could have done this, e.g. moving the contents of the _execute() method
to another method and then putting that in an EventReporter context with
the source cell context/instance, but this was simpler.

Part of blueprint cross-cell-resize

Change-Id: Ibb32f7c19f5f2ec4811b165b8df748d1b7b0f9e4
2019-12-23 10:10:57 -05:00
Zuul d5a786f540 Merge "Remove now invalid cells v1 comments from conductor code" 2019-12-19 04:38:06 +00:00
Matt Riedemann f2608c9117 Create instance action when burying in cell0
Change I8742071b55f018f864f5a382de20075a5b444a79 in Ocata
moved the creation of the instance record from the API to
conductor. As a result, the "create" instance action was
only being created in conductor when the instance is created
in a non-cell0 database. This is a regression because before
that change when a server create would fail during scheduling
you could still list instance actions for the server and see
the "create" action but that was lost once we started burying
those instances in cell0.

This fixes the bug by creating the "create" action in the cell0
database when burying an instance there. It goes a step further
and also creates and finishes an event so the overall action
message shows up as "Error" with the details about where the
failure occurred in the event traceback.

A short release note is added since a new action event is
added here (conductor_schedule_and_build_instances) rather than
re-use some kind of event that we could generate from the
compute service, e.g. compute__do_build_and_run_instance.

Change-Id: I1e9431e739adfbcfc1ca34b87e826a516a4b18e2
Closes-Bug: #1852458
2019-12-12 14:30:35 -05:00
Matt Riedemann 74d18c412f Add revert_snapshot_based_resize conductor RPC method
This adds the conductor ComputeTaskManager method
revert_snapshot_based_resize along with the related conductor
RPC API client method which will be an RPC cast from the API
for a revertResize server action.

Part of blueprint cross-cell-resize

Change-Id: Ia6b6b25238963a5f60349267da6d07cb740982f4
2019-12-12 12:00:33 -05:00
Matt Riedemann 6f74bc1e98 Add confirm_snapshot_based_resize conductor RPC method
This adds the conductor ComputeTaskManager method
confirm_snapshot_based_resize along with the related conductor
RPC API client method which by default will be an RPC cast
from the API for a confirmResize server action but can also
be RPC called in the case of deleting a server in VERIFY_RESIZE
status.

Part of blueprint cross-cell-resize

Change-Id: If4c4b23891bfc340deb18a2f500510a472a869c9
2019-12-12 11:13:52 -05:00
Eric Fried 7daa3f59e2 Use provider mappings from Placement (mostly)
fill_provider_mapping is used from *most* code paths where it's
necessary to associate RequestSpec.request_groups with the resource
providers that are satisfying them. (Specifically, all the code paths
where we have a Selection object available. More about that below.)

Prior to Placement microversion 1.34, the only way to do this mapping
was by reproducing much of the logic from GET /allocation_candidates
locally to reverse engineer the associations. This was incomplete,
imperfect, inefficient, and ugly. That workaround was nested in the call
from fill_provider_mapping to fill_provider_mapping_based_on_allocation.

Placement microversion 1.34 enhanced GET /allocation_candidates to
return these mappings [1], and Nova started using 1.34 as of [2], so
this commit makes fill_provider_mapping bypass
fill_provider_mapping_based_on_allocations completely.

We would love to get rid of the entire hack, but
fill_provider_mapping_based_on_allocation is still used from
finish_revert_resize to restore port bindings on a reverted migration.
And when reverting a migration, we don't have allocation candidates with
mappings, only the original source allocations. It is left to a future
patch to figure out how to get around this, conceivably by saving the
original mappings in the migration context.

[1] https://docs.openstack.org/placement/train/specs/train/implemented/placement-resource-provider-request-group-mapping-in-allocation-candidates.html
[2] I52499ff6639c1a5815a8557b22dd33106dcc386b

Related to blueprint: placement-resource-provider-request-group-mapping-in-allocation-candidates
Change-Id: I45e0b2b73f88b86a20bc70ddf4f9bb97c8ea8312
2019-12-06 11:04:55 -06:00
Matt Riedemann 103b8c984f Remove now invalid cells v1 comments from conductor code
The build_instances method had a couple of comments about
cases where we used to be able to reach code when using cells v1,
but since cells v1 is gone those comments are no longer valid and
are removed here.

Change-Id: I702dbdbbb77811f8b0bd33af4ca2091aa54ff51c
2019-12-06 11:26:02 -05:00
Zuul f1382651dc Merge "Sanity check instance mapping during scheduling" 2019-11-28 23:25:35 +00:00
Zuul c06374862e Merge "Remove TODO from ComputeTaskManager._live_migrate" 2019-11-15 03:10:16 +00:00
Zuul 691db5b99b Merge "Restrict RequestSpec to cell when evacuating" 2019-11-14 01:10:33 +00:00
Zuul 5ca532ad00 Merge "cond: rename 'recreate' var to 'evacuate'" 2019-11-13 23:22:29 +00:00
Matt Riedemann 996a4bbbd9 Remove TODO from ComputeTaskManager._live_migrate
Back when this TODO was written [1] the wrap_instance_event
decorator was not on the migrate_server method that calls
_live_migrate and the other caller, live_migrate_instance,
did not exist (but also has the wrap_instance_event decorator).
Because _live_migrate re-raises and both callers are using the
wrap_instance_event decorator which will update the instance
action created in the API live_migrate method, the TODO is
essentially already resolved.

[1] I4120156db1499dfd3ed22095e528787eb73d33a6

Change-Id: I4d5276a2f8dc185fdead4592e1801411566558cb
2019-11-11 08:38:02 -05:00
Matt Riedemann b6133f8183 Remove TODOs around claim_resources_on_destination
The TODOs were added back in the Queens/Pike timeframe [1][2]
but at this point there probably isn't much value in resolving
those TODOs by adding a skip_filters kwarg to the scheduler
especially since [3] changed the method to not support nested
resource provider allocations so the minimal duplication with
what we do in the scheduler in the non-force evacuate/live migrate
cases is sufficient.

[1] Ie63a4798d420c39815e294843e02ab6473cfded2
[2] I6590f0eda4ec4996543ad40d8c2640b83fc3dd9d
[3] I7cbd5d9fb875ebf72995362e0b6693492ce32051

Change-Id: I3e599147f95337477c9573b517feee67e0ae37e4
2019-11-10 19:29:23 +00:00
Zuul 485d2894d2 Merge "Refresh instance in MigrationTask.execute Exception handler" 2019-11-07 20:54:21 +00:00
Matt Riedemann 462d0d813e Refresh instance in MigrationTask.execute Exception handler
Cross-cell resize is a synchronous operation from the MigrationTask
which means if something fails on the compute and sets the instance
to ERROR, the Exception block in _cold_migrate in conductor can
blindly reset the vm_state to ACTIVE because it is using a stale
copy of the instance.

This change refreshes the instance before calling _set_vm_state_and_notify
so we don't overwrite the current vm_state. This is not really a
problem for same-cell resize because prep_resize is an RPC cast to
the destination compute so any failures there will be persisted
separately from the Exception block in the _cold_migrate method.

Change-Id: I2254e36407c9f9ff674eaec44c115d7516421de3
2019-11-05 14:16:10 -05:00
Zuul 6bb7c51346 Merge "Allow evacuating server with port resource request" 2019-11-04 23:00:30 +00:00