When [quota]count_usage_from_placement = true or
[quota]driver = nova.quota.UnifiedLimitsDriver, cores and ram quota
usage are counted from placement. When an instance is SHELVED_OFFLOADED,
it will not have allocations in placement, so its cores and ram should
not count against quota during that time.
This means however that when an instance is unshelved, there is a
possibility of going over quota if the cores and ram it needs were
allocated by some other instance(s) while it was SHELVED_OFFLOADED.
This fixes a bug where quota was not being properly enforced during
unshelve of a SHELVED_OFFLOADED instance when quota usage is counted
from placement. Test coverage is also added for the "recheck" quota
cases.
Closes-Bug: #2003991
Change-Id: I4ab97626c10052c7af9934a80ff8db9ddab82738
Related to bp/allowing-target-state-for-evacuate. This change
extends the compute API to accept a new argument, targetState.
When set, the targetState argument forces the state of an evacuated
instance on the destination host.
Signed-off-by: Sahid Orentino Ferdjaoui <sahid.ferdjaoui@industrialdiscipline.com>
Change-Id: I9660d42937ad62d647afc6be965f166cc5631392
This patch adds support for passing the ``reimage_boot_volume``
flag from the API layer through the conductor layer to the
compute layer and also includes an RPC version bump as necessary.
Related blueprint volume-backed-server-rebuild
Change-Id: I8daf177eb67d08112a16fe788910644abf338fa6
This patch adds the plumbing for rebuilding a volume backed
instance in compute code. This functionality will be enabled
in a subsequent patch which adds a new microversion and the
external support for requesting it.
The flow of the operation is as follows:
1) Create an empty attachment
2) Detach the volume
3) Request cinder to reimage the volume
4) Wait for cinder to notify success to nova (via external events)
5) Update and complete the attachment
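The five steps above can be sketched as follows. This is only an
illustrative simulation: the client class, method names and ids are
made up for the sketch and are not Nova's or Cinder's actual APIs.

```python
class FakeCinder:
    """Stand-in for the Cinder volume API, recording each step taken."""

    def __init__(self):
        self.log = []

    def attachment_create(self, volume_id, instance_id):
        self.log.append('create')
        return {'id': 'attach-2', 'volume_id': volume_id}

    def attachment_delete(self, attachment_id):
        self.log.append('detach')

    def reimage_volume(self, volume_id, image_id):
        self.log.append('reimage')

    def attachment_complete(self, attachment_id):
        self.log.append('complete')


def rebuild_volume_backed(cinder, volume_id, old_attachment_id,
                          instance_id, image_id, wait_for_event):
    # 1) Create an empty attachment to keep the volume reserved.
    attachment = cinder.attachment_create(volume_id, instance_id)
    # 2) Detach the volume by deleting the old attachment.
    cinder.attachment_delete(old_attachment_id)
    # 3) Request cinder to reimage the volume.
    cinder.reimage_volume(volume_id, image_id)
    # 4) Wait for cinder to notify success (external event in Nova).
    wait_for_event()
    # 5) Update and complete the new attachment.
    cinder.attachment_complete(attachment['id'])
    return attachment


cinder = FakeCinder()
rebuild_volume_backed(cinder, 'vol-1', 'attach-1', 'inst-1', 'img-1',
                      wait_for_event=lambda: None)
print(cinder.log)  # → ['create', 'detach', 'reimage', 'complete']
```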
Related blueprint volume-backed-server-rebuild
Change-Id: I0d889691de1af6875603a9f0f174590229e7be18
Conductor creates a placement client for the potential case where
it needs to make a call for certain operations. A transient network
or keystone failure will currently cause it to abort startup, which
means it is not available for other unrelated activities, such as
DB proxying for compute.
This change makes the conductor test the placement client on startup,
but only abort startup on errors that are highly likely to be permanent
configuration errors; it merely warns about things like being unable
to contact keystone/placement during initialization. If a non-fatal
error is encountered at startup, later operations needing the
placement client will retry initialization.
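A minimal sketch of that startup policy, with hypothetical exception
and function names standing in for the real keystoneauth/placement
errors and client construction:

```python
class FatalConfigError(Exception):
    """Errors that are almost certainly permanent misconfiguration."""


class TransientError(Exception):
    """Network or keystone hiccups that may resolve on their own."""


def startup_placement_client(build_client, warn):
    """Try to build the placement client once at startup.

    Abort (re-raise) only for likely-permanent configuration errors;
    merely warn on transient failures so the conductor can still serve
    unrelated work such as DB proxying for compute.
    """
    try:
        return build_client()
    except FatalConfigError:
        raise
    except TransientError as exc:
        warn('placement unavailable at startup: %s' % exc)
        return None  # later operations retry initialization lazily


def flaky_build():
    raise TransientError('keystone down')


warnings = []
client = startup_placement_client(flaky_build, warnings.append)
assert client is None  # startup continues despite the transient error
```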
Closes-Bug: #1846820
Change-Id: Idb7fcbce0c9562e7b9bd3e80f2a6d4b9bc286830
Now that we no longer support py27, we can use the standard library
unittest.mock module instead of the third party mock lib. Most of this
is autogenerated, as described below, but there is one manual change
necessary:
nova/tests/functional/regressions/test_bug_1781286.py
We need to avoid using 'fixtures.MockPatch' since fixtures is using
'mock' (the library) under the hood and a call to 'mock.patch.stop'
found in that test will now "stop" mocks from the wrong library. We
have discussed making this configurable but the option proposed isn't
that pretty [1] so this is better.
The remainder was auto-generated with the following (hacky) script, with
one or two manual tweaks after the fact:

  import glob

  for path in glob.glob('nova/tests/**/*.py', recursive=True):
      with open(path) as fh:
          lines = fh.readlines()

      if 'import mock\n' not in lines:
          continue

      import_group_found = False
      create_first_party_group = False
      for num, line in enumerate(lines):
          line = line.strip()
          if line.startswith('import ') or line.startswith('from '):
              tokens = line.split()
              for lib in (
                  'ddt', 'six', 'webob', 'fixtures', 'testtools',
                  'neutron', 'cinder', 'ironic', 'keystone', 'oslo',
              ):
                  if lib in tokens[1]:
                      create_first_party_group = True
                      break
              if create_first_party_group:
                  break
              import_group_found = True
          if not import_group_found:
              continue
          if line.startswith('import ') or line.startswith('from '):
              tokens = line.split()
              if tokens[1] > 'unittest':
                  break
              elif tokens[1] == 'unittest' and (
                  len(tokens) == 2 or tokens[3] > 'mock'
              ):
                  break
          elif not line:
              break

      if create_first_party_group:
          lines.insert(num, 'from unittest import mock\n\n')
      else:
          lines.insert(num, 'from unittest import mock\n')

      del lines[lines.index('import mock\n')]

      with open(path, 'w+') as fh:
          fh.writelines(lines)
Note that we cannot remove mock from our requirements files yet due to
importing pypowervm unit test code in nova unit tests. This library
still uses the mock lib, and since we are importing test code and that
lib (correctly) only declares mock in its test-requirements.txt, mock
would not otherwise be installed and would cause errors while loading
nova unit test code.
[1] https://github.com/testing-cabal/fixtures/pull/49
Change-Id: Id5b04cf2f6ca24af8e366d23f15cf0e5cac8e1cc
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
We now enforce limits on resources requested in the flavor.
This includes: instances, ram, cores. It also works for any resource
class being requested via the flavor chosen, such as custom resource
classes relating to Ironic resources.
Note that because disk resources can be limited, we need to know
whether the instance is boot from volume or not. This has meant adding
extra code to make sure we know that when enforcing the limits.
Follow-on patches will update the APIs to accurately report the limits
being applied to instances, ram and cores.
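Conceptually the enforcement reduces to comparing per-resource deltas
derived from the flavor against per-project limits. The sketch below is
illustrative only (made-up resource-class keys and limit values, not
the oslo.limit-based implementation):

```python
def resource_deltas(flavor, count, boot_from_volume):
    """Derive the usage deltas that booting `count` servers would add."""
    deltas = {
        'servers': count,
        'class:VCPU': flavor['vcpus'] * count,
        'class:MEMORY_MB': flavor['ram'] * count,
    }
    # Disk only counts against limits when the server is not boot
    # from volume, hence the need to know BFV-ness when enforcing.
    if not boot_from_volume:
        deltas['class:DISK_GB'] = flavor['disk'] * count
    return deltas


def over_limit(limits, usage, deltas):
    """Return the (sorted) resources whose limit would be exceeded."""
    return sorted(
        r for r, d in deltas.items()
        if r in limits and usage.get(r, 0) + d > limits[r])


flavor = {'vcpus': 4, 'ram': 8192, 'disk': 80}
deltas = resource_deltas(flavor, count=2, boot_from_volume=True)
assert 'class:DISK_GB' not in deltas  # BFV: no local disk counted
assert over_limit({'class:VCPU': 10}, {'class:VCPU': 4},
                  deltas) == ['class:VCPU']
```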
blueprint unified-limits-nova
Change-Id: If1df93400dcbcb1d3aac0ade80ae5ecf6ce38d11
autopep8 is a code formatting tool that makes python code pep8
compliant without changing everything. Unlike black it will
not radically change all code; the primary change to the
existing codebase is adding a new line after class-level docstrings.
This change adds a new tox autopep8 env to manually run it on your
code before you submit a patch. It also adds autopep8 to pre-commit,
so if you use pre-commit it will run for you automatically.
This change runs autopep8 in diff mode with --exit-code in the pep8
tox env so it will fail if autopep8 would modify your code if run
in in-place mode. This allows us to gate on autopep8 not modifying
patches that are submitted, which ensures authorship of patches is
maintained.
The intent of this change is to automate the large amount of time we
spend on ensuring style guidelines are followed, to make it
simpler for both new and old contributors to work on nova and save
time and effort for all involved.
Change-Id: Idd618d634cc70ae8d58fab32f322e75bfabefb9d
Nova re-generates the resource request of an instance for each server
move operation (migrate, resize, evacuate, live-migrate, unshelve) to
find (or validate) a target host for the instance move. This patch
extends this logic to support the extended resource request from
neutron.
As the changed neutron interface code is called from the nova-compute
service during port binding, the compute service version is bumped.
A check is also added to the compute API to reject move operations
on servers with ports having an extended resource request if there are
old computes in the cluster.
blueprint: qos-minimum-guaranteed-packet-rate
Change-Id: Ibcf703e254e720b9a6de17527325758676628d48
Introduce a new 'nova.db.api.api' module to hold API database-specific
helpers, plus a generic 'nova.db.utils' module to hold code suitable for
both main and API databases. This highlights a level of complexity
around connection management that is present for the main database but
not for the API database. This is because we need to handle the
complexity of cells for the former but not the latter.
Change-Id: Ia5304c552ce552ae3c5223a2bfb3a9cd543ec57c
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
The two remaining modules, 'api_models' and 'api_migrations', are
moved to the new 'nova.db.api' module.
Change-Id: I138670fe36b07546db5518f78c657197780c5040
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Merge these, removing an unnecessary layer of abstraction, and place
them in the new 'nova.db.main' directory. The resulting change is huge,
but it's mainly the result of 's/sqlalchemy import api/main import api/'
and 's/nova.db.api/nova.db.main.api/' with some necessary cleanup. We
also need to rework how we do the blocking of API calls since we no
longer have a 'DBAPI' object that we can monkey patch as we were doing
before. This is now done via a global variable that is set by the 'main'
function of 'nova.cmd.compute'.
The main impact of this change is that it's no longer possible to set
'[database] use_db_reconnect' and have all APIs automatically wrapped in
a DB retry. Seeing as this behavior is experimental, isn't applied to
any of the API DB methods (which don't use oslo.db's 'DBAPI' helper),
and is used explicitly in what would appear to be the critical cases
(via the explicit 'oslo_db.api.wrap_db_retry' decorator), this doesn't
seem like a huge loss.
Change-Id: Iad2e4da4546b80a016e477577d23accb2606a6e4
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Servers with an ARQ in a port do not support move and suspend;
reject these operations at the API stage:
- resize
- shelve
- live_migrate
- evacuate
- suspend
- attach/detach a smartnic port
Reject server creation with a smartnic in a port if the minimum
compute service version is less than 57.
Reject server creation with a port that has a malformed device
profile requesting multiple devices, like:
{
    "resources:CUSTOM_ACCELERATOR_FPGA": "2",
    "trait:CUSTOM_INTEL_PAC_ARRIA10": "required"
}
Implements: blueprint sriov-smartnic-support
Change-Id: Ia705a0341fb067e746a3b91ec4fc6d149bcaffb8
Delete ARQs in the following cases:
- the port is unbound
- the create operation failed and the ARQs were never bound to the
instance
- the ARQ is bound to the instance but not bound to a port
Implements: blueprint sriov-smartnic-support
Change-Id: Idab0ee38750d018de409699a0dbdff106d9e11fb
The fake_notifier uses module globals and also needs careful stub and
reset calls to work properly. This patch wraps the fake_notifier into a
proper Fixture that automates the complexity.
This is a fairly large patch but it does not change any logic; it just
redirects calls from the fake_notifier to the new NotificationFixture.
Change-Id: I456f685f480b8de71014cf232a8f08c731605ad8
There's no need to throw these into one giant file.
Change-Id: I8478449d15edb40f98d25d3940343cae9ab2fde8
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Move these to the central place. There's a large amount of test damage
but it's pretty trivial.
Change-Id: If581eb7aa463c9dde13714f34f0f1b41549a7130
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
This continues on from I81fec10535034f3a81d46713a6eda813f90561cf and
removes all other references to 'instance_type' where it's possible to
do so. The only things left are DB columns, o.vo fields, some
unversioned objects, and RPC API methods. If we want to remove these, we
can but it's a lot more work.
Change-Id: I264d6df1809d7283415e69a66a9153829b8df537
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
As we (fortunately) don't persist the requested networks when booting
an instance, we need a way to populate the value of the RequestSpec
field during any create or move operation, so that a later change can
know which port or network was requested.
Partially-Implements: blueprint routed-networks-scheduling
Change-Id: I0c7e32f6088a8fc1625a0655af824dee2df4a12c
This change extends the conductor manager to append the cyborg
resource request to the request spec when performing an unshelve.
On shelve offload, the instance's ARQ binding info will be deleted
to free up the bound ARQs in the Cyborg service.
This change also passes the ARQs to spawn when unshelving an instance.
It extends the ``shelve_instance``, ``shelve_offload_instance``
and ``unshelve_instance`` rpcapi functions to carry the arq_uuids.
Co-Authored-By: Wenping Song <songwenping@inspur.com>
Implements: blueprint cyborg-shelve-and-unshelve
Change-Id: I258df4d77f6d86df1d867a8fe27360731c21d237
Replace six.text_type with str.
This patch completes six removal.
Change-Id: I779bd1446dc1f070fa5100ccccda7881fa508d79
Implements: blueprint six-removal
Signed-off-by: Takashi Natsume <takanattie@gmail.com>
This change extends the conductor manager
to append the cyborg resource request to the
request spec when performing an evacuate.
This change passes the ARQs to spawn during rebuild
and evacuate. On evacuate the existing ARQs will be deleted
and new ARQs will be created and bound; during rebuild the
existing ARQs are reused.
This change extends the rebuild_instance compute rpcapi
function to carry the arq_uuids. This eliminates the
need to look up the uuids associated with the ARQs assigned
to the instance by querying cyborg.
Co-Authored-By: Wenping Song <songwenping@inspur.com>
Co-Authored-By: Brin Zhang <zhangbailin@inspur.com>
Implements: blueprint cyborg-rebuild-and-evacuate
Change-Id: I147bf4d95e6d86ff1f967a8ce37260730f21d236
During the review of the cyborg series it was noted that
in some cases ARQs could be leaked during binding.
See https://review.opendev.org/#/c/673735/46/nova/conductor/manager.py@1632
This change adds a delete_arqs_by_uuid function that can delete
unbound ARQs by instance uuid.
This change modifies build_instances and schedule_and_build_instances
to handle the AcceleratorRequestBindingFailed exception raised when
binding fails and to clean up the instance's ARQs.
Co-Authored-By: Wenping Song <songwenping@inspur.com>
Closes-Bug: #1872730
Change-Id: I86c2f00e2368fe02211175e7328b2cd9c0ebf41b
Blueprint: nova-cyborg-interaction
The 'inspect.trace()' function is expected to be called within the
context of an exception handler. The 'from_exc_and_traceback' class
method of the 'nova.notification.objects.exception.ExceptionPayload'
class uses this to get information about a provided exception, however,
there are cases where this is called from outside of an exception
handler. In these cases, we see an 'IndexError' since we can't get the
last frame of a non-existent stacktrace. The solution to this is to
fall back to using the traceback embedded in the exception. This is a
bit lossy when decorators are involved, but for all other cases it
gives us the same information. This also allows us to avoid passing a
traceback argument to the function since we already have it to hand.
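The fallback can be illustrated with a small stdlib-only sketch; the
helper name here is made up (the real code lives in
ExceptionPayload.from_exc_and_traceback):

```python
import inspect
import traceback


def last_frame_info(exc):
    """Return (filename, function) for the deepest frame of *exc*.

    inspect.trace() only yields frames while an exception is actively
    being handled; otherwise fall back to the traceback already
    embedded in the exception object instead of hitting IndexError.
    """
    frames = inspect.trace()
    if frames:
        last = frames[-1]
        return last.filename, last.function
    # Outside an 'except' block inspect.trace() returns [], so use
    # the exception's own __traceback__.
    summary = traceback.extract_tb(exc.__traceback__)
    return summary[-1].filename, summary[-1].name


def boom():
    raise ValueError('oops')


try:
    boom()
except ValueError as exc:
    caught = exc

# Called outside any exception handler: the fallback path is taken,
# and we still recover the frame where the exception was raised.
assert last_frame_info(caught)[1] == 'boom'
```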
Change-Id: I404ca316b1bf2a963106cd34e927934befbd9b12
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Closes-Bug: #1881455
The _get_bound_arq_resources() in the compute manager [1] calls Cyborg
up to 3 times: once to get the accelerator request (ARQ) UUIDs for the
instance, and then once or twice to get all ARQs with completed bindings.
The first call can be eliminated by passing the ARQs from the conductor
to the compute manager as an additional parameter in
build_and_run_instance(). This requires a bump in compute rpcapi version.
[1] https://review.opendev.org/#/c/631244/54/nova/compute/manager.py@2652
Blueprint: nova-cyborg-interaction
Change-Id: I26395d57bd4ba55276b7514baa808f9888639e11
This patch series now works for many VM operations with libvirt:
* Creation, deletion of VM instances.
* Pause/unpause
The following works but is a no-op:
* Lock/unlock
Hard reboots are taken up in a later patch in this series.
Soft reboots work for accelerators unless some unrelated failure
forces a hard reboot in the libvirt driver.
Suspend is not supported yet. It would fail with this error:
libvirtError: Requested operation is not valid:
domain has assigned non-USB host devices
Shelve is not supported yet.
Live migration is not intended to be supported with accelerators now.
Change-Id: Icb95890d8f16cad1f7dc18487a48def2f7c9aec2
Blueprint: nova-cyborg-interaction
There are two almost identical implementations of the _run_periodics()
helper - and a third one would have joined them in a subsequent patch,
if not for this patch. This patch moves the _run_periodics() to the
base test class. In addition, _run_periodics() depends on the
self.computes dict used for compute service tracking. The method that
populates that dict, _start_compute(), is therefore also moved to the
base class.
This enables some light refactoring of existing tests that need
either the _run_periodics() helper, or the compute service tracking.
In addition, a needless override of _start_compute() in
test_aggregates that provided no added value is removed. This is done
to avoid any potential confusion around _start_compute()'s role.
Change-Id: I36dd64dc272ea1743995b3b696323a9431666489
Change-Id: I33d8ac0a1cae0b2d275a21287d5e44c008a68122
* Call Cyborg with the device profile name to get ARQs (Accelerator
Requests). Each ARQ corresponds to a single device profile group, which
corresponds to a single request group in the request spec.
* Match each ARQ to its associated request group, and thereby obtain
the corresponding RP for that ARQ.
* Call Cyborg to bind the ARQ to that host/device RP.
* When Cyborg sends the ARQ bind notification events, wait for those
events with a timeout.
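The final wait step can be sketched with plain threading primitives;
the function name and event plumbing here are simplified stand-ins for
Nova's external-event machinery, not the actual implementation:

```python
import threading


def wait_for_arq_bindings(bind_events, timeout):
    """Block until every ARQ's bind notification event has fired.

    bind_events maps ARQ uuid -> threading.Event. Raises TimeoutError
    if any binding notification does not arrive within `timeout`
    seconds.
    """
    for arq_uuid, event in bind_events.items():
        if not event.wait(timeout):
            raise TimeoutError(
                'ARQ %s bind notification timed out' % arq_uuid)


# Simulate Cyborg completing both bindings before we wait.
events = {'arq-1': threading.Event(), 'arq-2': threading.Event()}
for ev in events.values():
    ev.set()
wait_for_arq_bindings(events, timeout=1)  # returns without raising
```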
Change-Id: I0f8b6bf2b4f4510da6c84fede532533602b6af7f
Blueprint: nova-cyborg-interaction
This patch adds support for unshelving an offloaded server with qos ports.
To do that this patch:
* collects the port resource requests from neutron before the scheduler
is called to select the target of the unshelve.
* calculates the request group - provider mapping after the scheduler
has selected the target host
* updates the InstancePCIRequest to drive the pci_claim to allocate VFs
from the same PF as the bandwidth is allocated from by the scheduler
* updates the binding profile of the qos ports so that the allocation
key of the binding profile points to the RPs the port is allocated
from.
As this was the last move operation to be supported, the compute service
version is bumped to indicate such support. This will be used in later
patches to implement a global service level check in the API.
Note that unshelve does not have a re-schedule loop and all the RPC
changes were committed in Queens.
Two error cases need special care by rolling back allocations before
putting the instance back to SHELVED_OFFLOADED state:
* if the InstancePCIRequest cannot be updated according to the new
target host of the unshelve
* if updating the port binding fails in neutron during unshelve
Change-Id: I678722b3cf295c89110967d5ad8c0c964df4cb42
blueprint: support-move-ops-with-qos-ports-ussuri
This doesn't exist for 'nova.volume' and no longer exists for
'nova.network'. There's only one image backend we support, so do like
we've done elsewhere and just use 'nova.image.glance'.
Change-Id: I7ca7d8a92dfbc7c8d0ee2f9e660eabaa7e220e2a
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
At some point in the past, there was only nova-network and its code
could be found in 'nova.network'. Neutron was added and eventually found
itself (mostly!) in the 'nova.network.neutronv2' submodule. With
nova-network now gone, we can remove one layer of indirection and move
the code from 'nova.network.neutronv2' back up to 'nova.network',
mirroring what we did with the old nova-volume code way back in 2012
[1]. To ensure people don't get nova-network and 'nova.network'
confused, 'neutron' is retained in filenames.
[1] https://review.opendev.org/#/c/14731/
Change-Id: I329f0fd589a4b2e0426485f09f6782f94275cc07
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
This passes graceful_exit=True to the wrap_instance_event decorator
in ComputeTaskManager.revert_snapshot_based_resize so that upon successful
completion of the RevertResizeTask, when the instance is hard destroyed
from the target cell DB (used to create the action/event), a traceback
is not logged for the InstanceActionNotFound exception.
The same event is also finished in the source cell DB upon successful
completion of the RevertResizeTask. Note that there are other ways we
could have done this, e.g. moving the contents of the _execute() method
to another method and then putting that in an EventReporter context with
the source cell context/instance, but this was simpler.
Part of blueprint cross-cell-resize
Change-Id: Ibb32f7c19f5f2ec4811b165b8df748d1b7b0f9e4
This adds the conductor ComputeTaskManager method
revert_snapshot_based_resize along with the related conductor
RPC API client method which will be an RPC cast from the API
for a revertResize server action.
Part of blueprint cross-cell-resize
Change-Id: Ia6b6b25238963a5f60349267da6d07cb740982f4
This adds the conductor ComputeTaskManager method
confirm_snapshot_based_resize along with the related conductor
RPC API client method which by default will be an RPC cast
from the API for a confirmResize server action but can also
be RPC called in the case of deleting a server in VERIFY_RESIZE
status.
Part of blueprint cross-cell-resize
Change-Id: If4c4b23891bfc340deb18a2f500510a472a869c9
fill_provider_mapping is used from *most* code paths where it's
necessary to associate RequestSpec.request_groups with the resource
providers that are satisfying them. (Specifically, all the code paths
where we have a Selection object available. More about that below.)
Prior to Placement microversion 1.34, the only way to do this mapping
was by reproducing much of the logic from GET /allocation_candidates
locally to reverse engineer the associations. This was incomplete,
imperfect, inefficient, and ugly. That workaround was nested in the call
from fill_provider_mapping to fill_provider_mapping_based_on_allocation.
Placement microversion 1.34 enhanced GET /allocation_candidates to
return these mappings [1], and Nova started using 1.34 as of [2], so
this commit makes fill_provider_mapping bypass
fill_provider_mapping_based_on_allocation completely.
We would love to get rid of the entire hack, but
fill_provider_mapping_based_on_allocation is still used from
finish_revert_resize to restore port bindings on a reverted migration.
And when reverting a migration, we don't have allocation candidates with
mappings, only the original source allocations. It is left to a future
patch to figure out how to get around this, conceivably by saving the
original mappings in the migration context.
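In terms of data, a 1.34+ allocation candidate already carries the
association, so the mapping step becomes a plain lookup. The shape
below is a hedged sketch with made-up identifiers, not real placement
payloads:

```python
# Each allocation candidate at microversion >= 1.34 carries a
# 'mappings' dict from request-group suffix to the resource
# provider(s) satisfying that group.
candidate = {
    'mappings': {
        '': ['rp-compute-node'],         # unsuffixed (flavor) group
        'port-1234': ['rp-net-device'],  # a qos port's request group
    },
}


def fill_provider_mapping(request_groups, candidate):
    """Attach the satisfying RPs to each group straight from the
    candidate's mappings, with no local reverse engineering."""
    for group in request_groups:
        group['provider_uuids'] = (
            candidate['mappings'][group['requester_id']])


groups = [{'requester_id': ''}, {'requester_id': 'port-1234'}]
fill_provider_mapping(groups, candidate)
assert groups[1]['provider_uuids'] == ['rp-net-device']
```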
[1] https://docs.openstack.org/placement/train/specs/train/implemented/placement-resource-provider-request-group-mapping-in-allocation-candidates.html
[2] I52499ff6639c1a5815a8557b22dd33106dcc386b
Related to blueprint: placement-resource-provider-request-group-mapping-in-allocation-candidates
Change-Id: I45e0b2b73f88b86a20bc70ddf4f9bb97c8ea8312