44 Commits

Author SHA1 Message Date
Balazs Gibizer 48229b46b4 Retry /reshape at provider generation conflict
During a normal update_available_resources run, if the local provider
tree cache is stale (e.g. because the scheduler made an allocation that
bumped the generation of the RPs) and the virt driver tries to update the
inventory of an RP based on that cache, Placement will report a conflict,
the report client will invalidate the cache, and the retry decorator
on ResourceTracker._update_to_placement will re-drive the update on top
of the fresh RP data.

However, the same thing can happen during reshape, and the retry
mechanism is missing in that code path, so stale caches can cause
reshape failures.

This patch adds specific error handling in the reshape code path to
implement the same retry mechanism as exists for inventory update.

blueprint: pci-device-tracking-in-placement
Change-Id: Ieb954a04e6aba827611765f7f401124a1fe298f3
2022-08-25 10:00:10 +02:00
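
The retry-on-conflict pattern described in 48229b46b4 can be sketched roughly as follows. This is a minimal illustration, not nova's implementation: ProviderGenerationConflict, retry_on_conflict, and the toy server/cache dictionaries are all hypothetical names invented for the example.

```python
import functools


class ProviderGenerationConflict(Exception):
    """Hypothetical stand-in for a Placement generation conflict."""


def retry_on_conflict(invalidate_cache, attempts=4):
    """Retry the wrapped call, refreshing cached data after each conflict."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except ProviderGenerationConflict:
                    invalidate_cache()
                    if attempt == attempts:
                        raise
        return wrapper
    return decorator


# Toy state: the cache still holds generation 1, the server is at 2.
server = {"generation": 2}
cache = {"generation": 1}


def refresh_cache():
    cache["generation"] = server["generation"]


@retry_on_conflict(refresh_cache)
def reshape():
    # The server rejects writes that are based on a stale generation.
    if cache["generation"] != server["generation"]:
        raise ProviderGenerationConflict()
    server["generation"] += 1
    cache["generation"] = server["generation"]
    return server["generation"]


result = reshape()  # first attempt conflicts, second succeeds
```

The first attempt fails against the stale cache, the cache is refreshed, and the second attempt goes through, which mirrors the behaviour the message describes for inventory updates.
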
Stephen Finucane 89ef050b8c Use unittest.mock instead of third party mock
Now that we no longer support py27, we can use the standard library
unittest.mock module instead of the third party mock lib. Most of this
is autogenerated, as described below, but there is one manual change
necessary:

nova/tests/functional/regressions/test_bug_1781286.py
  We need to avoid using 'fixtures.MockPatch' since fixtures is using
  'mock' (the library) under the hood and a call to 'mock.patch.stop'
  found in that test will now "stop" mocks from the wrong library. We
  have discussed making this configurable but the option proposed isn't
  that pretty [1] so this is better.

The remainder was auto-generated with the following (hacky) script, with
one or two manual tweaks after the fact:

  import glob

  for path in glob.glob('nova/tests/**/*.py', recursive=True):
      with open(path) as fh:
          lines = fh.readlines()
      if 'import mock\n' not in lines:
          continue
      import_group_found = False
      create_first_party_group = False
      for num, line in enumerate(lines):
          line = line.strip()
          if line.startswith('import ') or line.startswith('from '):
              tokens = line.split()
              for lib in (
                  'ddt', 'six', 'webob', 'fixtures', 'testtools',
                  'neutron', 'cinder', 'ironic', 'keystone', 'oslo',
              ):
                  if lib in tokens[1]:
                      create_first_party_group = True
                      break
              if create_first_party_group:
                  break
              import_group_found = True
          if not import_group_found:
              continue
          if line.startswith('import ') or line.startswith('from '):
              tokens = line.split()
              if tokens[1] > 'unittest':
                  break
              elif tokens[1] == 'unittest' and (
              len(tokens) == 2 or tokens[3] > 'mock'
              ):
                  break
          elif not line:
              break
      if create_first_party_group:
          lines.insert(num, 'from unittest import mock\n\n')
      else:
          lines.insert(num, 'from unittest import mock\n')
      del lines[lines.index('import mock\n')]
      with open(path, 'w+') as fh:
          fh.writelines(lines)

Note that we cannot remove mock from our requirements files yet due to
importing pypowervm unit test code in nova unit tests. This library
still uses the mock lib, and since we are importing test code and that
lib (correctly) only declares mock in its test-requirements.txt, mock
would not otherwise be installed and would cause errors while loading
nova unit test code.

[1] https://github.com/testing-cabal/fixtures/pull/49

Change-Id: Id5b04cf2f6ca24af8e366d23f15cf0e5cac8e1cc
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
2022-08-01 17:46:26 +02:00
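
As a small illustration of what the switch in 89ef050b8c looks like in a test, the sketch below uses only the standard library module. The current_user helper is made up for the example; the point is that the patcher that is started and the stop call both come from unittest.mock.

```python
import os
from unittest import mock


def current_user():
    """Toy function under test (hypothetical)."""
    return os.environ.get("USER", "unknown")


# Context-manager form: the patch is undone automatically on exit.
with mock.patch.dict(os.environ, {"USER": "nova"}):
    patched_user = current_user()

# Explicit start/stop form: start and stop belong to the same library.
patcher = mock.patch("os.getcwd", return_value="/fake")
patcher.start()
try:
    cwd_while_patched = os.getcwd()
finally:
    patcher.stop()

cwd_after_stop = os.getcwd()
```
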
Stephen Finucane 8133092907 Remove use of pkg_resources
Use of this library has significant performance implications. While
we're probably not too badly affected, we don't actually need to use it
here. The 'parse_version' utility it exposes is intended to parse
PEP440-compliant version identifiers, not the simple microversions
placement uses, which the 'microversion_parse' library can competently
parse for us.

Change-Id: I9b7281caec6fa53600dea316492d052787cf799b
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
2022-07-14 15:20:55 +01:00
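
To show why a simple microversion such as "1.36" does not need a PEP 440 parser, here is a hand-rolled sketch. It deliberately does not use the microversion_parse library mentioned in 8133092907; parse_microversion and satisfies are hypothetical helpers written only for this illustration.

```python
def parse_microversion(value):
    """Turn a 'MAJOR.MINOR' string into a tuple of ints."""
    major, minor = value.split(".")
    return int(major), int(minor)


def satisfies(available, required):
    """Return True if the available microversion is at least the required one."""
    return parse_microversion(available) >= parse_microversion(required)


# Comparing the raw strings gives the wrong answer, since "3" sorts before "9".
string_compare = "1.36" >= "1.9"
tuple_compare = satisfies("1.36", "1.9")
```

Comparing integer tuples orders 1.36 after 1.9, whereas comparing the strings does not.
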
Zuul 77c8f91a5b Merge "Bump min placement microversion to 1.36" 2021-08-31 01:35:22 +00:00
Matt Riedemann c09d98dadb Add force kwarg to delete_allocation_for_instance
This adds a force kwarg to delete_allocation_for_instance which
defaults to True because that was found to be the most common use case
by a significant margin during implementation of this patch.
In most cases, this method is called when we want to delete the
allocations because they should be gone, e.g. server delete, failed
build, or shelve offload. The alternative in these cases is the caller
could trap the conflict error and retry but we might as well just force
the delete in that case (it's cleaner).

When force=True, it will DELETE the consumer allocations rather than
GET and PUT with an empty allocations dict and the consumer generation
which can result in a 409 conflict from Placement. For example, bug
1836754 shows that in one tempest test that creates a server and then
immediately deletes it, we can hit a very tight window where the method
GETs the allocations and before it PUTs the empty allocations to remove
them, something changes which results in a conflict and the server
delete fails with a 409 error.

It's worth noting that delete_allocation_for_instance used to just
DELETE the allocations before Stein [1] when we started taking consumer
generations into account. There was also a related mailing list thread
[2].


Closes-Bug: #1836754

[1] I77f34788dd7ab8fdf60d668a4f76452e03cf9888
[2] http://lists.openstack.org/pipermail/openstack-dev/2018-August/133374.html

Change-Id: Ife3c7a5a95c5d707983ab33fd2fbfc1cfb72f676
2021-08-30 06:11:25 +00:00
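
The difference between the two delete strategies in c09d98dadb can be sketched with an in-memory stand-in for Placement. FakePlacement, Conflict, and the racer hook are assumptions made for this example; only the force kwarg and its default come from the commit message.

```python
class Conflict(Exception):
    """Hypothetical stand-in for a 409 response."""


class FakePlacement:
    """In-memory toy that tracks a generation per consumer."""

    def __init__(self):
        self.allocations = {}  # consumer -> (generation, allocations)

    def get(self, consumer):
        generation, allocs = self.allocations[consumer]
        return {"consumer_generation": generation, "allocations": dict(allocs)}

    def put(self, consumer, allocs, consumer_generation):
        generation, _ = self.allocations[consumer]
        if generation != consumer_generation:
            raise Conflict(consumer)
        self.allocations[consumer] = (generation + 1, allocs)

    def delete(self, consumer):
        self.allocations.pop(consumer, None)


def delete_allocation_for_instance(client, consumer, force=True, racer=None):
    if force:
        # Plain DELETE: no generation involved, so no conflict window.
        client.delete(consumer)
        return
    current = client.get(consumer)
    if racer is not None:
        racer()  # simulate a concurrent change between GET and PUT
    client.put(consumer, {}, current["consumer_generation"])


client = FakePlacement()
client.allocations["inst"] = (1, {"rp": {"VCPU": 1}})


def concurrent_update():
    client.put("inst", {"rp": {"VCPU": 2}}, 1)


try:
    delete_allocation_for_instance(
        client, "inst", force=False, racer=concurrent_update)
    outcome = "deleted"
except Conflict:
    outcome = "conflict"

# The forced path removes the allocations regardless of the generation.
delete_allocation_for_instance(client, "inst")
```
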
Balazs Gibizer f6e8c512fb Bump min placement microversion to 1.36
To use the same_subtree query parameter in the allocation candidate
request, the minimum required placement microversion first needs to be
bumped from 1.35 to 1.36. This patch makes that bump and updates the
related nova upgrade check. Later patches will modify the query
generation to include the same_subtree param in the request.

Change-Id: I5bfec9b9ec49e60c454d71f6fc645038504ef9ef
blueprint: qos-minimum-guaranteed-packet-rate
2021-08-21 10:00:51 +02:00
Balazs Gibizer c3804efd42 Refactor ResourceRequest constructor
This refactor changes ResourceRequest __init__ to make only an empty
request and moves the ResourceRequest creation from a RequestSpec to a
static factory method. This is a preparation to introduce another
factory method later that will generate a ResourceRequest from a single
ResourceGroup instead of a full RequestSpec.

Blueprint: support-interface-attach-with-qos-ports

Change-Id: Idd58298a6b01775f962b9bf0a0835f762c8e0ed2
2021-01-18 15:40:42 +01:00
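
The shape of the refactor in c3804efd42 might look something like the sketch below: an empty constructor plus named factories. The class name, the dictionary-based request spec, and the use of classmethods (rather than static methods) are simplifications assumed for illustration.

```python
class ResourceRequestSketch:
    """Toy version: __init__ only builds an empty request."""

    def __init__(self):
        self.groups = []

    @classmethod
    def from_request_spec(cls, request_spec):
        # Factory for the existing use case: a full request spec.
        request = cls()
        request.groups.extend(request_spec["groups"])
        return request

    @classmethod
    def from_request_group(cls, group):
        # Factory for the later use case: a single group.
        request = cls()
        request.groups.append(group)
        return request


full = ResourceRequestSketch.from_request_spec(
    {"groups": [{"VCPU": 2}, {"NET_BW_EGR_KILOBIT_PER_SEC": 1000}]})
single = ResourceRequestSketch.from_request_group({"VCPU": 1})
```

Keeping construction logic in factories means a second entry point can be added without changing the constructor's signature.
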
Eric Fried f2d088b04e Stop using PlacementDirect
PlacementDirect was integrated into a functional test suite when it was
first created as a way to prove that it worked [1] and demonstrate how
to use it.

However, it was a pain then, because the interceptor needs to be created
every time you want to use it; and since extracted placement started
diverging from in-tree placement, other problems started cropping up
(see the associated bug).

So this commit removes the use of PlacementDirect from nova. Details:

- test_report_client now uses PlacementFixture. So all the `with
  interceptor` context management is gone. This accounts for the vast
  majority of the apparent change, which is just outdenting those
  contexts.
- SchedulerReportClientTestBase, which was doing some hocus pocus to
  wrap the SchedulerReportClient such that we could do some microversion
  checks, is removed. The test suite simply instantiates the
  microversion-checking wrapper class directly as the client used by the
  test cases.
- We were taking advantage of a PlacementDirect feature allowing us to
  default to the latest microversion if not explicitly specified in the
  request. Without this, we had to add the `version` kwarg to some of
  the calls we were making to SchedulerReportClient primitives
  (get/put/post/delete).
- A piece of test_update_from_provider_tree was using a
  deliberately-broken interceptor to prove that the code in question
  wasn't hitting the API. We replace this with a non-callable mock on
  the Adapter's request method.
- test_global_request_id was taking advantage of the interceptor to
  validate that the global request ID was making it to the "other side"
  of the API boundary. This was fun, but overkill. We now simply assert
  that the correct HTTP header is making it into the ksa Adapter's
  request method.
- Functional test suite test_resource_tracker.IronicResourceTrackerTest
  was inheriting from the SchedulerReportClientTestBase class, but not
  using the interceptor anywhere. Can't tell you why that was done. So
  now it just uses the plain old test.TestCase like everyone else.

[1] This commit does remove all of nova's testing of PlacementDirect.
However, it is still tested in the placement repository itself:
69b9659a45/placement/tests/functional/test_direct.py

Change-Id: Icb889c09a69e7c5cbf9330e5d9917d6ab3ac3dc5
Related-Bug: #1818560
2020-03-05 07:36:37 -06:00
Eric Fried bcc893a2b0 Use Placement 1.35 (root_required)
Placement microversion 1.35 gives us the root_required queryparam to GET
/allocation_candidates, allowing us to filter out candidates where the
*root* provider has/lacks certain traits, independent of traits
specified in any of the individual request groups.

Use it.

And add affordance for specifying such traits to the RequestSpec.

Which allows us to fix up the couple of request filters that were
hacking traits into the RequestSpec.flavor.

Change-Id: I44f02044ce178e84c23d178e5a23a3aa1208e502
2020-01-07 16:46:56 -06:00
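
A rough sketch of how a root_required value could be assembled for GET /allocation_candidates follows. The build_query helper and the trait names are hypothetical; the comma-separated list with a "!" prefix for traits the root provider must lack is stated here as an assumption about the query syntax, not taken from this commit message.

```python
from urllib.parse import urlencode


def build_query(resources, required=(), forbidden=()):
    """Build a query string with an optional root_required parameter."""
    params = {
        "resources": ",".join(
            f"{rc}:{amount}" for rc, amount in sorted(resources.items())),
    }
    # Forbidden traits are written with a leading "!" (assumed syntax).
    root_traits = list(required) + [f"!{trait}" for trait in forbidden]
    if root_traits:
        params["root_required"] = ",".join(root_traits)
    # Keep ":", "," and "!" readable instead of percent-encoding them.
    return urlencode(params, safe=":,!")


query = build_query(
    {"VCPU": 1, "MEMORY_MB": 512},
    required=["CUSTOM_FOO"],
    forbidden=["COMPUTE_STATUS_DISABLED"],
)
```
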
Eric Fried 54195a1bd9 Use Placement 1.34 (string suffixes & mappings)
This commit cuts us over to using placement microversion 1.34 for GET
/allocation_candidates, thereby supporting string request group suffixes
(added in 1.33) when specified in flavor extra_specs.

The mappings (added in 1.34) are not used in code yet, but a future
patch will tie the group suffixes to the RequestGroup.requester_id so
that it can be correlated after GET /a_c. This will allow us to get rid
of map_requested_resources_to_providers, which was a hack to bridge the
gap until we had mappings from placement.

Change-Id: I52499ff6639c1a5815a8557b22dd33106dcc386b
2019-12-05 17:02:46 -06:00
Matt Riedemann cea4f391f3 Move compute_node_to_inventory_dict to test-only code
Since [1] the only thing still using this utility method
is some functional report client test code so this change
moves it to the test class that needs it.

[1] Ib62ac0b692eb92a2ed364ec9f486ded05def39ad

Change-Id: I016765112b4d7a811a855da5e503a8cb870afbbe
2019-11-07 17:34:33 -05:00
Zuul 5345f9acb7 Merge "Remove @safe_connect from _delete_provider" 2019-10-09 22:49:12 +00:00
Chris Dent 5f553f4b1a Use microversion in put allocations in test_report_client
In some places test_report_client uses a raw client.put when
writing allocations. This defaults to the latest microversion.
This means that if the allocations format changes in the latest
microversion in a patch in placement, the nova functional tests
fail.

In this change we pin the microversion to the minimum one providing
the features used in the PUT.

Change-Id: Ia0fee6bb931770792b552ae32ef31f0a4cc466ee
2019-09-02 11:40:35 +01:00
Stephen Finucane 7abe83f646 scheduler: Flatten 'ResourceRequest.from_extra_specs', 'from_image_props'
The 'ResourceRequest' object sources information from three different
attributes of an instance: the instance's image metadata properties,
the instance's flavor, and this flavor's extra specs. It's possible for a
user to override resources requested via the flavor using flavor extra
specs (e.g. using the 'resources:VCPU=N' extra spec), and it's possible
to override traits requested via the flavor extra specs using image
metadata (e.g. using the 'traits_required=foo' metadata property). This
means there's an implicit hierarchy present:

- Traits: image metadata > flavor extra specs
- Resources: flavor extra specs > flavor

Previously, we pulled information from the flavor extra specs and image
metadata using two classmethods, 'from_extra_specs' and
'from_image_props', but this required a lot of glue code in between to
ensure this hierarchy was maintained. Stop doing this, preferring to
centralize everything in one location. This results in fewer LoC and a
more grokable implementation, and will make things much easier when we
start handling 'PCPU's here.

Part of blueprint cpu-resources

Change-Id: Ic0e6bc47b79711b38b2d4dabaeb5ae1dbaf2b18a
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
2019-08-27 17:00:03 +01:00
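
The implicit hierarchy described in 7abe83f646 can be illustrated with a single merge function. This is only a sketch: merge_request and the "resources:" and "trait:" key formats are simplified assumptions and do not reproduce nova's actual parsing.

```python
def merge_request(flavor, extra_specs, image_props):
    """Merge the three sources in one place, later sources overriding earlier."""
    # Resources: start from the flavor, then let extra specs override.
    resources = {"VCPU": flavor["vcpus"], "MEMORY_MB": flavor["memory_mb"]}
    traits = {}
    for key, value in extra_specs.items():
        prefix, _, name = key.partition(":")
        if prefix == "resources":
            resources[name] = int(value)
        elif prefix == "trait":
            traits[name] = value
    # Traits: image metadata overrides flavor extra specs.
    for key, value in image_props.items():
        prefix, _, name = key.partition(":")
        if prefix == "trait":
            traits[name] = value
    return resources, traits


resources, traits = merge_request(
    {"vcpus": 2, "memory_mb": 1024},
    {"resources:VCPU": "4", "trait:CUSTOM_A": "required"},
    {"trait:CUSTOM_A": "forbidden"},
)
```
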
Eric Fried 9981d06b4c Remove @safe_connect from _delete_provider
This commit removes the @safe_connect decorator from
SchedulerReportClient._delete_provider and makes its callers deal with
resulting keystoneauth1.exceptions.ClientExceptionZ sanely:

- ServiceController.delete (via delete_resource_provider) logs an error
  and continues (backward compatible behavior, best effort to delete
  each provider).
- ComputeManager.update_available_resource (ditto) ditto (ditto).
- SchedulerReportClient.update_from_provider_tree raises the exception
  through, per the contract described in the catch_all internal helper.

Change-Id: I8403a841f21a624a546ae5f26bb9ba19318ece6a
2019-07-19 16:41:08 -05:00
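
The trade-off behind removing @safe_connect in 9981d06b4c can be sketched as below. The decorator body, the ClientException class, and delete_all are hypothetical stand-ins; the real decorator and keystoneauth1 exceptions are not reproduced here.

```python
import functools


class ClientException(Exception):
    """Hypothetical stand-in for a connection-level client error."""


def safe_connect(func):
    """Toy decorator: swallow the error and return None."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except ClientException:
            return None
    return wrapper


def delete_provider(uuid):
    raise ClientException(f"cannot reach placement to delete {uuid}")


# With the decorator, the caller cannot tell failure from success.
swallowed = safe_connect(delete_provider)("rp-1")


def delete_all(uuids, delete):
    """Without it, the caller decides: here, best effort and keep going."""
    failed = []
    for uuid in uuids:
        try:
            delete(uuid)
        except ClientException:
            failed.append(uuid)
    return failed


failed = delete_all(["rp-1", "rp-2"], delete_provider)
```
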
Eric Fried 0652a4c7a5 Un-safe_connect and publicize get_providers_in_tree
In the continuing saga to wipe @safe_connect from the annals of history,
and in preparation for its use outside of SchedulerReportClient, this
commit does two things to _get_providers_in_tree:

- Removes @safe_connect from it. Callers now need to be aware that they
  can get ClientExceptionZ from ksa. (The two existing callers were
  vetted and needed no additional handling - it's way more appropriate
  for them to raise ClientException than a mysterious NoneType error
  somewhere down the line as they would have been doing previously.)
- Renames it to get_providers_in_tree.

Change-Id: I2b284d69d345d15287f04a7ca4cd422155768525
2019-06-27 17:00:24 -05:00
Zuul 42df3eaf1f Merge "Prepare _heal_allocations_for_instance for nested allocations" 2019-06-27 21:33:43 +00:00
Balazs Gibizer 307999c581 Prepare _heal_allocations_for_instance for nested allocations
When no allocations exist for an instance the current heal code uses a
report client call that can only handle allocations from a single RP.
This call is now replaced with a more generic one so in a later patch
port allocations can be added to this code path too.

Related-Bug: #1819923
Change-Id: Ide343c1c922dac576b1944827dc24caefab59b74
2019-06-27 10:33:14 +02:00
Stephen Finucane dc6fc82c14 hacking: Resolve W605 (invalid escape sequence)
This one's actually important since it will be an error in future
versions of Python.

Change-Id: Ib9f735216773224f91ac7f49fbe2eee119670872
Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
2019-06-24 14:24:06 -05:00
Eric Fried c43f7e664d Use aggregate_add_host in nova-manage
When nova-manage placement sync_aggregates was added [1], it duplicated
some report client logic (aggregate_add_host) to do provider aggregate
retrieval and update so as not to duplicate a call to retrieve the
host's resource provider record. It also left a TODO to handle
generation conflicts.

Here we change the signature of aggregate_add_host to accept *either*
the host name or RP UUID, and refactor the nova-manage placement
sync_aggregates code to use it.

The behavior in terms of exit codes and messaging should be largely
unchanged, though there may be some subtle differences in corner cases.

[1] Iac67b6bf7e46fbac02b9d3cb59efc3c59b9e56c8

Change-Id: Iaa4ddf786ce7d31d2cee660d5196e5e530ec4bd3
2019-03-26 17:38:48 -05:00
Chris Dent 09090c8277 Use a placement conf when testing report client
It turns out that the independent wsgi interceptors in
test_report_client were using nova's global configuration
when creating the intercepts using code from placement.
This was working because until [1] placement's set of conf
options had not diverged from nova's and nova still has
placement_database config settings.

This change takes advantage of new functionality in the
PlacementFixture to allow the fixture to manage config and
database, but _not_ run the interceptor. This means it
can set up a config that is later used by the independent
interceptors that are used in the report client tests.

[1] Ie43a69be8b75250d9deca6a911eda7b722ef8648

Change-Id: I05326e0f917ca1b9a6ef8d3bd463f68bd00e217e
Closes-Bug: #1818560
Depends-On: I8c36f35dbe85b0c0db1a5b6b5389b160b68ca488
2019-03-04 20:43:48 +00:00
Zuul c7f0d160e4 Merge "Use placement.inventory.inuse in report client" 2019-02-25 22:47:34 +00:00
Matt Riedemann 39ec15f58c Follow up for I0c764e441993e32aafef0b18049a425c3c832a50
This is a follow up for change
I0c764e441993e32aafef0b18049a425c3c832a50 to address
review comments.

The most important part is the early exit from
_fill_provider_mapping if request_spec.maps_requested_resources
returns False. That is needed to avoid the performance
impact of getting allocations and resource provider traits
per instance and provider. Since this code is currently only
going to be exercised with ports that have resource requests,
we want to avoid the extra work for all other server create
requests.

Part of blueprint bandwidth-resource-provider

Change-Id: I90845461b2b98c176c7b3b97dd3f47ed604a9bef
2019-02-22 10:57:11 +01:00
Eric Fried efa22cd985 Use placement.inventory.inuse in report client
Since I9a833aa35d474caa35e640bbad6c436a3b16ac5e we've had the framework
for placement to return specific error codes allowing us to
differentiate among error conditions for oft-repeated status codes.
That change also included as its proof-of-concept a specific code for
the placement side of InventoryInUse - i.e. an attempt to delete an
inventory record for which there are existing allocations.

SchedulerReportClient was previously identifying this error condition by
parsing the text of the 409 response. With this change, it instead uses
the provided error code.

Change-Id: Ic621adcadf10cc607455eba48c4cb1882bde23fa
2019-02-11 17:10:59 -06:00
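
Checking a structured error code instead of matching response text, as efa22cd985 describes, could look roughly like this. The is_inventory_in_use helper and the sample payload are assumptions for illustration; only the placement.inventory.inuse code itself comes from the commit title.

```python
import json

INVENTORY_IN_USE = "placement.inventory.inuse"


def is_inventory_in_use(status_code, body):
    """Return True if a 409 response carries the inventory-in-use code."""
    if status_code != 409:
        return False
    try:
        errors = json.loads(body).get("errors", [])
    except (ValueError, AttributeError):
        # Not JSON, or not a JSON object: no code to inspect.
        return False
    return any(
        isinstance(err, dict) and err.get("code") == INVENTORY_IN_USE
        for err in errors
    )


# Sample payload shape assumed for this sketch.
sample = json.dumps({"errors": [{
    "status": 409,
    "title": "Conflict",
    "detail": "wording here may change between releases",
    "code": INVENTORY_IN_USE,
}]})
```

The check no longer depends on the wording of the detail string, so a reworded message does not break the caller.
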
Chris Dent 27617ee193 Switch to using os-resource-classes
With the extraction of placement we ended up with resource class names
being duplicated between nova and placement. To address that, the
os-resource-classes library [1] was created to provide a single
authority for standard resource classes and the format of custom
classes.

This patch changes nova to use it, removing the use of the rc_fields
module which used to have the information. A method left in it
(normalize_name) has been moved to utils.py, renamed as
normalize_rc_name, and callers and tests updated accordingly.

Because the placement code is being kept in nova for the time being,
that code's use of rc_fields is maintained, and the module too.
A note is added in the module explaining that. Backporting the changes
from extracted-placement to placement-in-nova was considered but
because we no longer have placement tests in nova, that didn't seem
like the right thing to do.

requirements and lower-constraints have been updated.
os-resource-classes is already in global requirements.

For reference the related placement change is at [2].

[1] https://docs.openstack.org/os-resource-classes
[2] https://review.openstack.org/#/c/623556/

Change-Id: I8e579920c0eaca81b563a87429c930b21b3d4dc5
2019-02-07 11:11:09 +00:00
Eric Fried 570ad36992 Commonize _update code path
There were a bunch of report client methods around updating inventory to
placement which were only being used in the non-update_provider_tree
code paths of the resource tracker's update routine. Those code paths
had already been retrofitted to produce a placement-shaped inventory
object.

update_from_provider_tree gives us another way to flush these inventory
changes.

This patch simply takes the inventory object produced by the
get_inventory() and update_compute_node() code paths and updates the
provider tree object in the same fashion as update_provider_tree does.
So now all three code paths can commonly invoke
update_from_provider_tree.

And we can get rid of a ton of redundant code in the report client.

This includes the former incarnation of set_inventory_for_provider; so
we rename the artist formerly known as _set_inventory_for_provider to
match its brethren, set_traits_for_provider and
set_aggregates_for_provider.

Change-Id: I1a305847f0310c8d4babd5a625e4cc7bffe5b086
2019-01-16 18:34:39 +00:00
Eric Fried 2f77e7ad90 Consolidate inventory refresh
get_provider_tree_and_ensure_root now refreshes inventories via the
_ensure_resource_provider code path, so the call to
_refresh_and_get_inventories in get_provider_tree_and_ensure_root is no
longer necessary.

Change-Id: Iece924e85409bd4d9cd38ce6ced7883ffc905310
2019-01-16 18:34:39 +00:00
Eric Fried deef31729b Reduce calls to placement from _ensure
Prior to this patch, the report client's update_from_provider_tree
method would, upon failure of any placement API call, invalidate the
cache *just* for the failing provider (and any descendants) and attempt
to continue operating on any other providers in the tree.

With this patch, we instead invalidate the tree around the failing
provider and fail right away.

In real life, since we don't yet have any implementations of nested,
this would have been effectively a null change.

Except: this allows us to resolve a TODO whereby we would *always*
_ensure_resource_provider (including a call to GET
/resource_providers?in_tree=$compute_rp) on every periodic. Now we can
optimize that out.

This should reduce the number of calls to placement per RT periodic to
zero in steady state when [compute]resource_provider_association_refresh
is zero.

Closes-Bug: #1742467

Change-Id: Ieeaad9783e0ff93377fbc6c7932618d2fac8946a
2019-01-16 18:34:34 +00:00
Chris Dent 787bb33606 Use external placement in functional tests
Adjust the fixtures used by the functional tests so they
use placement database and web fixtures defined by placement
code. To avoid making redundant changes, the solely placement-
related unit and functional tests are removed, but the placement
code itself is not (yet).

openstack-placement is required by the functional tests. It is not
added to test-requirements as we do not want unit tests to depend
on placement in any way, and we enforce this by not having placement
in the test env.

The concept of tox-siblings is used to ensure that the
placement requirement will be satisfied correctly if there is a
depends-on. To make this happen, the functional jobs defined in
.zuul.yaml are updated to require openstack/placement.

tox.ini has to be updated to use an envdir that has the same
name as the job. Otherwise the tox siblings role in ansible cannot work.

The handling of the placement fixtures is moved out of nova/test.py
into the functional tests that actually use it because we do not
want unit tests (which get the base test class out of test.py) to
have anything to do with placement. This requires adjusting some
test files to use absolute import.

Similarly, a test of the comparison function for the api samples tests
is moved into functional, because it depends on placement functionality.

TestUpgradeCheckResourceProviders in unit.cmd.test_status is moved into
a new test file: nova/tests/functional/test_nova_status.py. This is done
because it requires the PlacementFixture, which is only available to
functional tests. A MonkeyPatch is required in the test to make sure that
the right context managers are used at the right time in the command
itself (otherwise some tables do not exist). In the test itself, to avoid
speaking directly to the placement database, which would require
manipulating the RequestContext objects, resource providers are now
created over the API.

Co-Authored-By: Balazs Gibizer <balazs.gibizer@ericsson.com>
Change-Id: Idaed39629095f86d24a54334c699a26c218c6593
2018-12-12 18:46:49 +00:00
Balazs Gibizer dfa2e6f221 Consumer gen support for put allocations
Placement API version 1.28 introduced consumer generations as a way
to make updating allocations safe even if it is done from multiple
places.

This patch changes the scheduler report client put_allocations
function to raise AllocationUpdateFailed in case of generation conflict.
The only direct user of this call is the nova-manage heal_allocations
CLI which will simply fail to heal the allocation for this instance.

Blueprint: use-nested-allocation-candidates
Change-Id: Iba230201803ef3d33bccaaf83eb10453eea43f20
2018-09-25 13:02:02 +02:00
Eric Fried 73d7ef4288 Nix update_instance_allocation, _allocate_for_instance
A previous change [1] removed the only usage of SchedulerReportClient
method update_instance_allocation, which itself was the only user of the
_allocate_for_instance method. Remove both of these methods and their
test artifacts.

[1] If272365e58a583e2831a15a5c2abad2d77921729

Change-Id: Iec02942d384620608e7c705f15f895105d90c882
2018-09-20 18:53:03 +00:00
Eric Fried 8e1ca5bf34 Use uuidsentinel from oslo.utils
oslo.utils release 3.37.0 [1] introduced uuidsentinel [2]. This change
rips out nova's uuidsentinel and replaces it with the one from
oslo.utils.

[1] https://review.openstack.org/#/c/599754/
[2] https://review.openstack.org/#/c/594179/

Change-Id: I7f5f08691ca3f73073c66c29dddb996fb2c2b266
Depends-On: https://review.openstack.org/600041
2018-09-05 09:08:54 -05:00
Eric Fried 4f3c063aab Fix reshaper report client functional test nits
Follow-on to address minor issues (mostly doc/comment) from reshaper
functional tests over the report client changes [1].

[1] https://review.openstack.org/#/c/585049/19/nova/tests/functional/test_report_client.py

Change-Id: I413cfd54cb8e5df444810874ebe2954844bbf863
2018-08-30 14:18:49 -05:00
Eric Fried b23bf6d6ab Report client: update_from_provider_tree w/reshape
The update_from_provider_tree method now takes an `allocations` kwarg
which, if not None, signals that we need to do a reshape
(inventory/allocation data migration). If the reshape section fails for
any reason, we raise ReshapeFailed.

Change-Id: I3fc2d5538cfe3ac1fd330f10d0376627f34a8b94
blueprint: reshape-provider-tree
2018-08-24 15:57:10 -05:00
Eric Fried 2833785f59 Report client: _reshape helper, placement min bump
Add a thin wrapper to invoke the POST /reshaper placement API with
appropriate error checking. This bumps the placement minimum to the
reshaper microversion, 1.30.

Change-Id: Idf8997d5efdfdfca6967899a0882ffb9ecf96915
blueprint: reshape-provider-tree
2018-08-24 15:39:18 -05:00
Eric Fried 25b852efd7 Report client: get_allocations_for_provider_tree
The reshaper path needs to pass all the allocations related to the
compute node's provider tree to update_provider_tree so it can shuffle
those allocations appropriately. This patch adds a new
get_allocations_for_provider_tree method to the report client for this
purpose.

Blueprint: reshape-provider-tree

Change-Id: I73811f3e3bf19dec3a240e1f1f8c69f4c98d677c
2018-08-24 15:36:49 -05:00
Eric Fried 176d1d90fd Report client: Real get_allocs_for_consumer
In preparation for reshaper work, implement a superior method to
retrieve allocations for a consumer. The new get_allocs_for_consumer:
- Uses the microversion that returns consumer generations (1.28).
- Doesn't hide error conditions:
  - If the request returns non-200, instead of returning {}, it raises a
    new ConsumerAllocationRetrievalFailed exception.
  - If we fail to communicate with the placement API, instead of
    returning None, it raises (a subclass of) ksa ClientException.
- Returns the entire payload rather than just the 'allocations' dict.

The existing get_allocations_for_consumer is refactored to behave
compatibly (except it logs warnings for the previously-silently-hidden
error conditions). In a subsequent patch, we should rework all callers
of this method to use the new one, and get rid of the old one.

Change-Id: I0e9a804ae7717252175f7fe409223f5eb8f50013
blueprint: reshape-provider-tree
2018-08-24 15:31:04 -05:00
Vladyslav Drok 55fb7efe31 Use placement microversion 1.26 in update_from_provider_tree
Recent change I1fd85860c96e8690fbcf93c8a2f02178168bfd5a changed the
microversion for updating the inventory only in the
_update_inventory_attempt, missing _set_inventory_for_provider
which is called from update_from_provider_tree.
This causes failures with the ironic virt driver.

Closes-Bug: 1787910
Change-Id: Ibdebd02ce6f52ca87559e9d2d5c068f37bf4b6db
2018-08-20 11:29:10 -04:00
Matt Riedemann 660e328a25 Use consumer generation in _heal_allocations_for_instance
If we're updating existing allocations for an instance due
to the project_id/user_id not matching the instance, we should
use the consumer_generation parameter, new in placement 1.28,
to ensure we don't overwrite the allocations while another
process is updating them.

As a result, the include_project_user kwarg to method
get_allocations_for_consumer is removed since nothing else
is using it now, and the minimum required version of placement
checked by nova-status is updated to 1.28.

Change-Id: I4d5f26061594fa9863c1110e6152069e44168cc3
2018-07-23 14:09:55 -04:00
Eric Fried 3518ccb665 Check provider generation and retry on conflict
Update aggregate-related scheduler report client methods to use
placement microversion 1.19, which returns provider generation in GET
/rps/{u}/aggregates and handles generation conflicts in PUT
/rps/{u}/aggregates. Helper methods previously returning aggregates and
traits now also return the generation, which is fed through
appropriately to subsequent calls. As a result, the generation kwarg is
no longer needed in _refresh_associations, so it is removed.

Doing this exposes the race described in the cited bug, so we add a
retry decorator to the resource tracker's _update and the report
client's aggregate_{add|remove}_host methods.

Related to blueprint placement-aggregate-generation
Closes-Bug: #1779931

Change-Id: I3c5fbb18297db71e682fcddb5bf4536595d92383
2018-07-20 10:09:44 -05:00
Eric Fried 814bc9d2d9 Enforce placement minimum in nova.cmd.status
We keep forgetting to bump the minimum required placement version in
nova.cmd.status (and all the related bits and pieces) whenever we
change the report client to require a new version.

This patch interposes a check in the test_report_client functional suite
any time get/put/post/delete is called from the report client.  If we
see a microversion higher than the minimum specified in nova.cmd.status,
we raise an exception, which will blow up the test.  This should force
the author of a new patch on SchedulerReportClient to do the necessary
paperwork in that patch.

...assuming said author happens to write a test in test_report_client.
This pattern can and should be copied into other test suites where
report client tests are likely to be written, to broaden the scope of
this enforcement.

Change-Id: I5482b92f941261ab6ee6b7cd532ce268c31fe793
2018-06-15 21:04:50 +00:00
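
The enforcement idea in 814bc9d2d9 can be sketched as a thin wrapper that inspects the version of every call. CheckingClient, MicroversionTooHigh, and the minimum value used here are hypothetical; the real check lives in the functional test suite and reads the minimum from nova.cmd.status.

```python
MIN_PLACEMENT_MICROVERSION = (1, 28)  # assumed value for this sketch


class MicroversionTooHigh(Exception):
    """Raised when a call asks for more than the declared minimum."""


def parse(version):
    major, minor = version.split(".")
    return int(major), int(minor)


class CheckingClient:
    """Toy wrapper that records calls and rejects too-new microversions."""

    def __init__(self, minimum=MIN_PLACEMENT_MICROVERSION):
        self.minimum = minimum
        self.calls = []

    def _request(self, method, url, version=None):
        if version is not None and parse(version) > self.minimum:
            raise MicroversionTooHigh(f"{method} {url} requested {version}")
        self.calls.append((method, url, version))

    def get(self, url, version=None):
        self._request("GET", url, version)

    def put(self, url, version=None):
        self._request("PUT", url, version)

    def post(self, url, version=None):
        self._request("POST", url, version)

    def delete(self, url, version=None):
        self._request("DELETE", url, version)
```

A test that routes its calls through such a wrapper fails as soon as a patch starts using a newer microversion without bumping the declared minimum.
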
Chris Dent 43cc59abe2 Provide a direct interface to placement
This is a method of using wsgi-intercept to provide a context
manager that allows talking to placement over requests, but without
a network. It is a quick and dirty way to talk to and make changes
in the placement database where the only network traffic is with the
placement database.

This is expected to be useful in the creation of tools for
performing fast forward upgrades where each compute node may need to
"migrate" its resource providers, inventory and allocations in the
face of changing representations of hardware (for example
pre-existing VGPUs being represented as nested providers) but would
like to do so when all non-database services are stopped. A system
like this would allow code on the compute node to update the
placement database, using well known HTTP interactions, without the
placement service being up.

The basic idea is that we spin up the WSGI stack with no auth,
configured using whatever already loaded CONF we happen to have
available. That CONF points to the placement database and all the
usual stuff. The context manager provides a keystoneauth1 Adapter
class that operates as a client for accessing placement. The full
WSGI stack is brought up because we need various bits of middleware
to help ensure that policy calls don't explode and so JSON
validation is in place.

In this model everything else is left up to the caller: constructing
the JSON, choosing which URIs to call with what methods (see
test_direct for minimal examples that ought to give an idea of what
real callers could expect).

To make things friendly in the nova context and ease creation of fast
forward upgrade tools, SchedulerReportClient is tweaked to take an
optional adapter kwarg on construction. If specified, this is used
instead of creating one with get_ksa_adapter(), using settings from
[placement] conf.
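
The optional-adapter idea can be sketched as follows. This is an
illustrative outline, not the SchedulerReportClient code:
ReportClientSketch and make_default_adapter are hypothetical names, and
the dict stands in for a real adapter object.

```python
def make_default_adapter():
    """Hypothetical stand-in for building an adapter from configuration."""
    return {"source": "config"}


class ReportClientSketch:
    """Use the adapter passed in, else fall back to a default one."""

    def __init__(self, adapter=None):
        if adapter is None:
            adapter = make_default_adapter()
        self.adapter = adapter


# Normal construction falls back to the configuration-based adapter.
default_client = ReportClientSketch()

# A caller such as a test or upgrade tool can inject its own adapter.
direct_adapter = {"source": "direct"}
direct_client = ReportClientSketch(adapter=direct_adapter)
```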

Doing things in this way draws a clear line between the placement parts
and the nova parts while keeping the nova parts straightforward.

NoAuthReportClient is replaced with a base test class,
test_report_client.SchedulerReportClientTestBase. This provides an
_interceptor() context manager which is a wrapper around
PlacementDirect, but instead of producing an Adapter, it produces a
SchedulerReportClient (which has been passed the Adapter provided by
PlacementDirect). test_resource_tracker and test_report_client are
updated accordingly.

Caveats to be aware of:

* This is (intentionally) set up to circumvent authentication and
  authorization. If you have access to the necessary database
  connection string, then you are good to go. That's what we want,
  right?

* CONF construction being left up to the caller is on purpose
  because right now placement itself is not super flexible in this
  area and flexibility is desired here.

This is not (by a long shot) the only way to do this. Other options
include:

* Constructing a WSGI environ that has all the necessary bits to
  allow calling the methods in the handlers directly (as python
  commands).  This would duplicate a fair bit of the middleware and
  seems error prone, because it's hard to discern what parts of the
  environ need to be filled. It's also weird for data input: we need
  to use a BytesIO to pass in data on PUTs and POSTs.

* Using either the WSGI environ or wsgi-intercept models but wrap it
  with a pythonic library that exposes a "pretty" interface to
  callers. Something like:

      placement.direct.allocations.update(consumer_uuid, {data})

* Creating a python library that assembles the necessary data for
  calling the methods in the resource provider objects and exposing
  that to:
  a) the callers who want this direct stuff
  b) the existing handlers in placement (which remain responsible
     for json manipulation and validation and microversion handling,
     and marshal data appropriately for the python lib)

I've chosen the simplest thing as a starting point because it gives
us something to talk over and could solve the immediate problem. If
we were to eventually pursue the 4th option, I would hope that we
had some significant discussion before doing so as I think it is a)
harder than it might seem at first glance, b) likely to lead to many
asking "why bother with the http interface at all?". Both require
thought.

Partially implements blueprint reshape-provider-tree
Co-Authored-By: Eric Fried <efried@us.ibm.com>
Change-Id: I075785abcd4f4a8e180959daeadf215b9cd175c8
2018-06-12 11:04:50 -05:00
Jay Pipes 5eda1fab85 mirror nova host aggregate members to placement
This patch is the first step in syncing the nova host aggregate
information with the placement service. The scheduler report client gets
a couple new public methods -- aggregate_add_host() and
aggregate_remove_host(). Both of these methods do **NOT** impact the
provider tree cache that the scheduler reportclient keeps when
instantiated inside the compute resource tracker.

Instead, these two new reportclient methods look up a resource provider
by *name* (not UUID) since that is what is supplied by the
os-aggregates Compute API when adding or removing a "host" to/from a
nova host aggregate.

Change-Id: Ibd7aa4f8c4ea787774becece324d9051521c44b6
blueprint: placement-mirror-host-aggregates
2018-05-30 12:45:20 -04:00
Chris Dent ce2840539e Move test_report_client out of placement namespace
test_report_client provides functional tests of the report client using
a fully operating placement service (via wsgi-intercept) but it is not,
in itself, testing placement. Therefore this change moves the test
into nova/tests/functional where it can sit beside other general-purpose
nova-related functional tests.

As noted in the moved file, in a future where placement is extracted,
nova could choose to import a fixture that placement (installed as a
test dependency) provides so that this test and ones like it can
continue to run as desired.

compute/test_resource_tracker.py is updated to reflect the new location
of the module as it makes use of it.

partially implements blueprint placement-extract

Change-Id: I433700e833f97c0fec946dafc2cdda9d49e1100b
2018-04-06 22:56:03 +01:00