This is the initial patch applying codespell to nova.
codespell is a programming-focused spellchecker that
looks for common typos and corrects them.
I am breaking this into multiple commits to make it simpler
to read and will automate the execution of codespell
at the end of the series.
Change-Id: If24a6c0a890f713545faa2d44b069c352655274e
This patch adds support for cold migrate and resize with PCI
devices when the placement tracking is enabled.
Same-host resize, evacuate and unshelve will be supported by subsequent
patches. Live migration was not supported with flavor-based PCI requests
before so it won't be supported now either.
blueprint: pci-device-tracking-in-placement
Change-Id: I8eec331ab3c30e5958ed19c173eff9998c1f41b0
We have many places where we implement singleton behavior for the
placement client. This unifies them into a single place and
implementation. Not only does this DRY things up, but it may also
cause us to initialize the client fewer times, and it allows for
emitting a common set of error messages about expected failures for
better troubleshooting.
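The unified pattern is roughly the following sketch; the names here
(PlacementClient, get_placement_client, _placement_client) are
illustrative stand-ins rather than the actual nova symbols:

```python
import threading

# Illustrative stand-in for nova's placement client class.
class PlacementClient:
    pass

_placement_client = None
_lock = threading.Lock()

def get_placement_client():
    """Return the process-wide placement client, creating it lazily.

    Centralizing construction means initialization (and the error
    messages emitted on expected failures) happens in one place.
    """
    global _placement_client
    if _placement_client is None:
        with _lock:
            # Re-check under the lock so concurrent callers don't each
            # build their own client.
            if _placement_client is None:
                _placement_client = PlacementClient()
    return _placement_client
```

Every caller then shares one instance, so connection setup and failure
logging occur once instead of at each call site.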
Change-Id: Iab8a791f64323f996e1d6e6d5a7e7a7c34eb4fb3
Related-Bug: #1846820
Nova re-generates the resource request of an instance for each server
move operation (migrate, resize, evacuate, live-migrate, unshelve) to
find (or validate) a target host for the instance move. This patch
extends this logic to support the extended resource request from
neutron.
As the changed neutron interface code is called from the nova-compute
service during port binding, the compute service version is bumped.
A check is also added to the compute API to reject move operations
with ports having an extended resource request if there are old
computes in the cluster.
blueprint: qos-minimum-guaranteed-packet-rate
Change-Id: Ibcf703e254e720b9a6de17527325758676628d48
There are no longer any custom filters. We don't need the abstract base
class. Merge the code in and give it a more useful 'SchedulerDriver'
name.
Change-Id: Id08dafa72d617ca85e66d50b3c91045e0e8723d0
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
As we (fortunately) don't persist the requested networks when booting an
instance, we need a way to populate the value of the RequestSpec field
during any create or move operation so that a later change can know
which port or network was requested.
Partially-Implements: blueprint routed-networks-scheduling
Change-Id: I0c7e32f6088a8fc1625a0655af824dee2df4a12c
To support move operations with qos ports both the source and the
destination compute hosts need to be on Ussuri level. We have service
level checks implemented in Ussuri. In Victoria we could remove those
checks as nova only supports compatibility between N and N-1 computes.
But we kept them there just for extra safety. In the meantime we
codified [1] the rule that nova does not support N-2 computes any
more. So in Wallaby we can assume that the oldest compute is already
on Victoria (Ussuri would be enough too).
So this patch removes the unnecessary service level checks and related
test cases.
[1] Ie15ec8299ae52ae8f5334d591ed3944e9585cf71
Change-Id: I14177e35b9d6d27d49e092604bf0f288cd05f57e
The legacy_props variable comes from to_legacy_filter_properties_dict,
and that method does not put a context in the dict, so the line being
removed was doing nothing.
Change-Id: Ie8a1a6b29b4cc03bb972295b681b4c7251fa680a
fill_provider_mapping is used from *most* code paths where it's
necessary to associate RequestSpec.request_groups with the resource
providers that are satisfying them. (Specifically, all the code paths
where we have a Selection object available. More about that below.)
Prior to Placement microversion 1.34, the only way to do this mapping
was by reproducing much of the logic from GET /allocation_candidates
locally to reverse engineer the associations. This was incomplete,
imperfect, inefficient, and ugly. That workaround was nested in the call
from fill_provider_mapping to fill_provider_mapping_based_on_allocation.
Placement microversion 1.34 enhanced GET /allocation_candidates to
return these mappings [1], and Nova started using 1.34 as of [2], so
this commit makes fill_provider_mapping bypass
fill_provider_mapping_based_on_allocation completely.
We would love to get rid of the entire hack, but
fill_provider_mapping_based_on_allocation is still used from
finish_revert_resize to restore port bindings on a reverted migration.
And when reverting a migration, we don't have allocation candidates with
mappings, only the original source allocations. It is left to a future
patch to figure out how to get around this, conceivably by saving the
original mappings in the migration context.
[1] https://docs.openstack.org/placement/train/specs/train/implemented/placement-resource-provider-request-group-mapping-in-allocation-candidates.html
[2] I52499ff6639c1a5815a8557b22dd33106dcc386b
Related to blueprint: placement-resource-provider-request-group-mapping-in-allocation-candidates
Change-Id: I45e0b2b73f88b86a20bc70ddf4f9bb97c8ea8312
Because of change If7ea02df42d220c5042947efdef4777509492a0b,
when cold migrating with a target host, the cell on the
requested destination will be used to lookup that host
(compute node) to set the in_tree attribute on the
RequestGroup. When doing a targeted cold migration across
cells, the target host could be in another cell from the
Destination.cell (which is the source cell per the
MigrationTask._restrict_request_spec_to_cell method).
In this scenario, we aren't "preferring" the source cell
necessarily if the target host is in another cell.
This change simply removes the Destination.cell attribute
during a cold migration with a target host so that the
scheduler will look in all enabled cells for the host.
Alternatively, we could lookup the HostMapping for the
requested target host and set the Destination.cell attribute
appropriately but HostManager.get_compute_nodes_by_host_or_node
is going to do that for us anyway.
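A minimal sketch of the conductor-side change; Destination here is a
bare stand-in for nova.objects.Destination, and the helper name is
hypothetical:

```python
# Bare stand-in for nova.objects.Destination; only the fields used in
# this sketch are modeled.
class Destination:
    def __init__(self, host=None, cell=None):
        self.host = host
        self.cell = cell

def untarget_cell_for_cold_migrate(destination):
    # Clearing the cell restriction lets the scheduler (via
    # HostManager.get_compute_nodes_by_host_or_node) look for the
    # requested host in all enabled cells, not just the source cell.
    destination.cell = None
    return destination
```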
This also implements one of the TODO functional tests for
cross-cell migrate by targeting a specific host in another
cell for the cold migration. The functional test will fail
without the fix in conductor.
Part of blueprint cross-cell-resize
Change-Id: I2f61a4513135c9b514d938ca18c2c9c87f24403a
This adds the code to check if a cross-cell move is
allowed by the RequestSpec, which is set in the API,
and if so, checks to see if the scheduler selected
a host in another cell. If so, then CrossCellMigrationTask
is executed.
Note the _restrict_request_spec_to_cell method gets
renamed and logging is adjusted since we may not be
restricting the scheduler to hosts in the same cell
if cross-cell moves are allowed.
Part of blueprint cross-cell-resize
Change-Id: Ibaa31fc8079c16e169472372c05dc48ff0602515
This modifies the TaskBase.rollback method to pass the
handled exception through so implementations of the
rollback method can use the exception for things like
adding an instance fault, notifications, etc.
Change-Id: Ib31691dca7b1374512fe5e577e46c9e3e223d608
Since Ie991d4b53e9bb5e7ec26da99219178ab7695abf6 move_allocations
handles more than one resource provider, but still does not handle
sharing providers. This patch refines the code comments accordingly.
Change-Id: I7884361e32a8c9765256c0a9b16e54e3f9a82084
This builds on change Ia50c5f4dd2204f1cafa669097d1e744479c4d8c8
to use the Selection.availability_zone value when rescheduling
during a resize or cold migrate so that the cell conductor does not
have to make an up-call to the aggregates table in the API DB
which will fail if the cell conductor is not configured to use
the API DB.
The functional test added in Ic6926eecda1f9dd7183d66c67f04f308f6a1799d
is updated to show the failure is gone and we get the AZ from the
Selection object during the reschedule.
For the case that the availability_zone field is not in the Selection
object, there are existing unit tests in
nova.tests.unit.conductor.tasks.test_migrate which will make sure we
are not unconditionally trying to access the Selection.availability_zone
field.
Change-Id: I103d5023d3a3a7c367c7eea7fb103cb8ec52accf
Closes-Bug: #1781286
Resetting was in place but it was done after the retry filter is
populated in the MigrationTask by the populate_retry call. This
patch moves the reset code before the call to populate_retry so as
to allow retries.
Change-Id: I8290e890a36cf5a8f409ab8a50e7c72f7ae15025
Closes-Bug: #1845291
The MigrationTask in the conductor already checks the service version as
old computes cannot support migration with QoS ports. However, it is still
possible that every compute is new but the compute RPC API is pinned to
< 5.2. In this case the migration still cannot be supported.
This patch adds an extra RPC version check to the conductor.
Change-Id: Ib4e0b9ab050a59ab5a290e6eecea01b87c3bd4c6
Closes-Bug: #1844993
During resize and cold migrate the dest compute service needs to update
the port binding based on the re-calculated port - resource provider mapping.
This update happens in finish_resize.
To do that the dest compute service needs to be at least on service level
39.
The calculation is based on the RequestSpec. The RequestSpec is sent
to the dest compute in pre_resize but the dest compute only sends it to the
source compute in resize_instance if the compute rpc api version is at least
5.2. Also the source compute only sends the RequestSpec to the dest
compute in the finish_resize if the rpc api version is at least 5.2. So
migration with bandwidth only works if both computes talk at least 5.2
which means that the min service level is at least 39.
Change-Id: Ia500b105b9ec70c0d8bd38faa084270b825476eb
blueprint: support-move-ops-with-qos-ports
This patch pulls out two functions from MigrationTask._execute. One for
the initial schedule case and the other for the re-schedule case. This
way unit testing some of the intricate parts of re-schedule becomes
easier.
Change-Id: I03b1eb1bd1a996081d1546134baa872c6166c5bb
Nova intentionally does not persist the resource request of the neutron
ports. Therefore during migration nova needs to query neutron about the
resource requests to include them in the allocation_candidates query
sent to placement during scheduling. Also when the allocation is made by
the scheduler nova needs to re-calculate request group - resource
provider mapping. A subsequent patch will use this mapping to update the
binding profile when the port is bound to the destination host of the
migration.
blueprint: support-move-ops-with-qos-ports
Change-Id: I8e5a0480c81ba548bc1f50a8098eabac52b11453
During migration conductor needs to heal the content of the
RequestSpec.requested_resources field based on the resource requests of
the ports attached to the instance being migrated.
This patch makes sure that the MigrationTask has access to the
networking API to do such healing.
blueprint: support-move-ops-with-qos-ports
Change-Id: Idf38568c3c237687c54fbbfcc6c5792c49c95161
This patch collects the resource requests from each neutron port
involved in a server create request. It converts each request to
a RequestGroup object and includes them in the RequestSpec.
This way the requests are reaching the scheduler and there
they are included in the generation of the allocation_candidates
query.
This patch only handles the happy path of a server create request, but
it adds a couple of TODOs in places where the server move operation
code paths need to be implemented. That implementation will be
part of subsequent patches.
Note that this patch technically makes it possible to boot a server
with one neutron port that has a resource request. But it does not
handle multiple such ports, SRIOV ports where two PFs support the
same physnet, or many server lifecycle operations like resize,
migrate, live-migrate, and unshelve. To avoid possible resource
allocation inconsistencies due to the partial support, nova rejects
any request that involves such ports. See the previous patches in
this patch series for details.
Also note that the simple boot cases are verified with functional
tests, and in those tests we need to mock out the logic described
above that rejects such requests. See more background about this
approach on the ML [1].
[1] http://lists.openstack.org/pipermail/openstack-discuss/2018-December/001129.html
blueprint bandwidth-resource-provider
Change-Id: Ica6152ccb97dce805969d964d6ed032bfe22a33f
This is a simple refactor to isolate the code that
restricts the RequestSpec during a cold migration
to only select hosts from within the cell in which
the instance is already running.
Change-Id: Iea1a4c76384d0fda45610f3f1dab99df039236ec
Since all remaining SchedulerClient methods were direct passthroughs to
the SchedulerQueryClient, and nothing special is being done anymore to
instantiate that client, the SchedulerClient is no longer necessary. All
references to it are replaced with direct references to
SchedulerQueryClient.
Change-Id: I57dd199a7c5e762d97a600307aa13a7aeb62d2b2
A step toward getting rid of the SchedulerClient intermediary, this
patch removes the reportclient member from SchedulerClient, instead
instantiating SchedulerReportClient directly wherever it's needed.
Change-Id: I14d1a648843c6311a962aaf99a47bb1bebf7f5ea
During a resize, we were still passing a legacy dict-ified
version of a RequestSpec to compute even though we don't need
to anymore (that's removed here).
And we weren't passing the RequestSpec back from compute to
the cell conductor during a reschedule of a resize, which makes
conductor have to re-create a stub RequestSpec - which has caused
problems in the past (see bug 1774205).
This change passes the RequestSpec to the cell conductor on a
reschedule of a resize so that conductor can use it and lets
us start the timer on removing the compatibility code in
conductor (marked with a TODO here).
While in here, some really old and non-sensical stuff in compute
is modernized and tests are updated as a result.
Related to blueprint request-spec-use-by-compute
Change-Id: I4244f7dd8fe74565180f73684678027067b4506e
Both os-migrateLive and evacuate server API actions support a force
flag. If force is set to True in the request then nova does not call the
scheduler but instead tries to blindly copy the source host allocation
to the destination host. If the source host allocation contains
resources from more than the root RP then such blind copy cannot be done
properly. Therefore this patch detects such situation and rejects
the forced move operation if the server has complex allocations on the
source host.
There is a separate blueprint
remove-force-flag-from-live-migrate-and-evacuate that will remove the
force flag in a new API microversion.
Note that before the force flag was added to these APIs, Nova bypassed
the scheduler when the target host was specified.
Blueprint: use-nested-allocation-candidates
Change-Id: I7cbd5d9fb875ebf72995362e0b6693492ce32051
Migration-based allocations, where conductor swaps the
source node allocations from the instance record to the
migration record before calling the scheduler (which
creates new allocations on the selected dest node for
the instance), were new in Queens but contained compatibility
code for (1) old computes where conductor couldn't pass
a migration record to the compute and (2) scheduler drivers
which didn't create allocations, like the CachingScheduler.
Change Ibcb6bf912b3fb69c8631665fef2832906ba338aa makes
"migration" a required parameter for prep_resize() in
compute so (1) is no longer an issue. And change
I1832da2190be5ef2b04953938860a56a43e8cddf removed the
CachingScheduler.
This change guts the compatibility code added throughout
the following changes in Queens:
I0883c2ba1989c5d5a46e23bcbcda53598707bcbc
I7b2903c56cb53b48afe2c4dec42ae7623a7f24eb
I7c3c95d4e0836e1af054d076aad29172574eab2c
This change does overlap a bit with live-migration
flows that deal with migration-based allocations. Those
flows will be cleaned up in a subsequent change.
Cleaning up this compatibility code is going to help
with supporting nested resource allocations for things
like blueprint use-nested-allocation-candidates where
we need to be sure that the source node allocations are
on the migration record rather than "doubled up" on the
instance record.
Change-Id: I0851e2d54a1fdc82fe3291fb7e286e790f121e92
This patch renames the set_and_clear_allocations function in the
scheduler report client to move_allocations and adds handling of
consumer generation conflict for it. This call now moves everything from
one consumer to another and raises AllocationMoveFailed to the caller if
the move fails due to consumer generation conflict.
When migration or resize fails to move the source host allocation to the
migration_uuid then the API returns HTTP 409 and the migration is aborted.
If reverting a migration, a resize, or a resize to the same host fails to move
the source host allocation back to the instance_uuid due to a consumer
generation conflict, the instance will be put into ERROR state. The instance
still has two
allocations in this state and deleting the instance only deletes the one that
is held by the instance_uuid. This patch logs an ERROR describing that in this
case the allocation held by the migration_uuid is leaked.
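The conflict handling described above can be sketched as follows;
AllocationMoveFailed is the exception named in this commit, but the
callable-based plumbing and status codes here are purely illustrative:

```python
class AllocationMoveFailed(Exception):
    """Raised when moving allocations hits a consumer generation conflict."""

def move_allocations(post_allocations, src_consumer, dst_consumer, payload):
    # post_allocations stands in for the placement POST /allocations
    # call and is assumed to return an HTTP status code.
    status = post_allocations(payload)
    if status == 409:
        # A concurrent update bumped one of the consumer generations;
        # surface the conflict to the caller instead of retrying blindly.
        raise AllocationMoveFailed(
            'moving allocations from consumer %s to consumer %s failed '
            'due to a consumer generation conflict'
            % (src_consumer, dst_consumer))
    return status
```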
Blueprint: use-nested-allocation-candidates
Change-Id: Ie991d4b53e9bb5e7ec26da99219178ab7695abf6
There are two things required here. Firstly, we need to start
(consistently) storing the physnet and tunneled status of various
networks in 'Instance.info_cache.network_info'. Once we have this
information, we can use it to populate the
'RequestSpec.network_metadata', which can be consumed by later changes
in the series.
Note that live migrations and evacuations with a forced destination
host bypass the scheduler which also means there will be no NUMA
vswitch affinity "claims" on the destination host for those force
move operations. This is not a regression since (1) live migration
is not NUMA affinity aware nor does it perform claims anyway (see blueprint
numa-aware-live-migration) and (2) forced host evacuate already does
not perform claims on the compute since the scheduler is bypassed
so the limits passed to compute, used for the claim, are empty.
Part of blueprint numa-aware-vswitches
Change-Id: I393bd58b8fede38af98ded0c7be099ef22b6f75b
Change I9c2111f7377df65c1fc3c72323f85483b3295989 sets the
RequestSpec.is_bfv flag for newly created instances so
that scheduling (using placement) does not allocate DISK_GB
resources for the Flavor's root_gb when booting from volume.
RequestSpecs for old instances created before that change will
not have the is_bfv field set, so this change adds a check for
that in the various move operations (evacuate, unshelve, cold
migrate and live migrate) and sets the RequestSpec.is_bfv flag
accordingly.
The related functional test is updated for the legacy cold
migrate and heal scenario.
Change-Id: I8e529ad4d707b2ad012328993892db83ce464c4b
Closes-Bug: #1469179
The user_id field was not implemented in RequestSpec like project_id was.
Some people have out-of-tree filters which use the user_id field.
This change makes the user_id field available.
Closes-bug: #1768107
Change-Id: I3e174ae76931f8279540e92328c7c36a7bcaabc0
Add the 'X-Openstack-Request-Id' header
in the request of GET in SchedulerReportClient.
Change-Id: I306ac6f5c6b67d77d91a7ba24d4d863ab3e1bf5c
Closes-Bug: #1734625
This adds the usual compatibility logic for allowing 5.0 and 4.x for a
version-spanning release. This also adds a queens version alias.
Change-Id: I98dc2a588ee9ddfe0c4d8ce1d7fe59f1fc9e9fa8
Add the 'X-Openstack-Request-Id' header
in the request of POST in SchedulerReportClient.
When creating a resource provider and creating a resource class,
the header is added.
Subsequent patches will add the header in the other cases.
Change-Id: I39d8c71432b3adf7e5bdde1c6cb6f089a9c79614
Partial-Bug: #1734625
The 'prep_resize()' method of compute now accepts a 'host_list'
parameter that supplies alternate hosts for retrying. This required
bumping the RPC version to 4.21 for compute. The
MigrationTask._execute() method is also changed to request alternates
from the scheduler.
Blueprint: return-alternate-hosts
Change-Id: If6a0bb766e70ab6f1c313da38bb2b0756a2e8772
The CachingScheduler does not create allocations in the
scheduler, so the assertion in the conductor migrate task
that the instance will have allocations on the source node
is incorrect. This removes the assertion and changes the
error to a debug message.
The related functional regression test is updated to show
resize working with the CachingScheduler again.
There are other places in the code, especially the compute
and resource tracker, where we log errors if we can't find
allocations and those likely need to be updated at some
point, but it can happen in a follow up.
Change-Id: I0bb2933c4ed7ed479206c9b06be7e30a2ec92f2a
Closes-Bug: #1741307
The online data migration routine to create request specs for old
instances used an admin context which has an empty project_id,
so when scheduling (moving) one of these, if we try to PUT /allocations
in placement using the FilterScheduler we'll fail because the project_id
is None.
This works around that by putting the instance.project_id on the request
spec before calling the scheduler to pick a node and claim resources
against it.
A later change will need to add some sort of online data migration
routine so that we properly update and persist the fix for these
older records.
Change-Id: I34b1d99a9d0d2aca80f094a79ec1656abaf762dc
Partial-Bug: #1739318
set_and_clear_allocations is added to the scheduler's report client to
provide an atomic method which can set and clear allocations for two
different consumer uuids by making a POST to /allocations. Because we
have committed to using microversion 1.14 we do not try falling back
to an earlier version as the necessary functionality is provided in
microversion 1.13.
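The request body for that atomic set-and-clear looks roughly like the
sketch below; the helper and all values are illustrative, and the real
field set in the placement API is richer than shown:

```python
def build_set_and_clear_payload(src_consumer, dst_consumer, allocations,
                                project_id, user_id):
    # One POST /allocations body covering two consumers: an empty
    # "allocations" mapping clears the source consumer, while the
    # destination consumer receives the full resource set, atomically.
    return {
        src_consumer: {
            'allocations': {},
            'project_id': project_id,
            'user_id': user_id,
        },
        dst_consumer: {
            'allocations': allocations,
            'project_id': project_id,
            'user_id': user_id,
        },
    }
```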
Three FIXMEs left behind when dansmith was doing the migration.uuid
with allocations work earlier in blueprint migration-allocations have
been changed from using those two fallback methods to using the
new one.
Also there are some concerns about the return values from the
methods. Are we doing enough with them?
Are there other places where this change should happen that I missed?
Is the new method too specific? Should it be more generic about
assembling arbitrary allocations that it is given?
Change-Id: I88aa32a95ea64ab7a9214772fd6e2a80c185d0cd
This changes the RPC call for select_destinations() as made by the
conductor. The previous patch added the logic on the scheduler side;
this patch changes the conductor side to use the two new parameters that
flag the new behaviors for Selection objects and alternate hosts.
Blueprint: return-alternate-hosts
Change-Id: I70b11dd489d222be3d70733355bfe7966df556aa
If we fail to find resource allocations for an instance on the
source compute node when swapping allocations for a cold migrate /
resize operation, we need to raise a specific exception which will
eventually be a 500 internal error from the REST API. Using
InstanceUnacceptable is incorrect since it extends Invalid which is
handled in the resize REST API controller code and returned as a 400
bad request with the message "Instance image invalid" which is not
at all the actual failure.
As for the actual reason why the instance allocation on the source
node is not found, that is a different bug.
Change-Id: Id8e2dcf21776f8237a8b63acb23787bb42c9bd13
Closes-Bug: #1729356
This function enables users to specify a target host
when cold migrating a VM instance.
By default, only admin users may perform
a cold migration operation.
This patch does not change that policy default.
This patch modifies a compute API and conductor.
A subsequent patch modifies the migration API.
Change-Id: I57568e9a01664ee373ea00a8db3164109c982909
Implements: blueprint cold-migration-with-target-queens
This makes us swap the instance's allocation for one held by the migration
during a cold move operation. If we need to revert, we swap again,
and if not, we just delete the migration's allocation against the source
node when confirming.
Related to blueprint migration-allocations
Change-Id: I89e2682c9210901cf1992dac2f9068b51f0373cd