Update Queens NRP spec to reflect reality

Edit the Queens specification for bp nested-resource-providers to
reflect what actually got accomplished in Queens.  See background
discussion in IRC [1].  Highlights of the delta:

- Remove/soften assertions about how NUMA and SR-IOV will be
  implemented.  None of this was done in Queens, and the vision of how
  it will eventually be handled has evolved (particularly due to the
  introduction of update_provider_tree [2]).
- Update the ?root={uuid} filter to its glorious new name,
  ?in_tree={uuid}, and more clearly document its behavior.
- Mention the future work needed in GET /allocation_candidates (series
  starting at [3], Rocky spec to be proposed) and update_provider_tree
  [2] to make this usable.

[1] http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2018-02-12.log.html#t2018-02-12T20:14:58
[2] https://review.openstack.org/#/c/540111
[3] https://review.openstack.org/#/c/531899

blueprint: nested-resource-providers

Change-Id: I3d8194e5cf6d3b39878edad99fe0b17092129271
Eric Fried 2018-02-13 14:50:53 -06:00 committed by Matt Riedemann
parent 25e6c08bbd
commit 3abbaae124
1 changed file with 46 additions and 38 deletions


@@ -54,7 +54,7 @@ SRIOV-enabled network interface cards. In the case of multiple SRIOV-enabled
 NICs on a compute host, different qualitative traits may be tagged to each NIC.
 For example, the NIC called enp2s0 might have a trait "CUSTOM_PHYSNET_PUBLIC"
 indicating that the NIC is attached to a physical network called "public". The
-NIC enp2s1 might have a trait "CUSTOM_PHYSNET_PRIVATE" that indicates the NIC
+NIC enp2s1 might have a trait "CUSTOM_PHYSNET_INTRANET" that indicates the NIC
 is attached to the physical network called "Intranet". We need a way of
 representing that these NICs each provide SRIOV_NET_VF resources but those
 virtual functions are associated with different physical networks. In the
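As a rough sketch of the modelling problem this hunk describes (two NICs that both expose SRIOV_NET_VF inventory but sit on different physical networks), the snippet below is illustrative only; the provider names and totals are made up, and only the resource class and trait strings come from the spec text:

.. code:: python

    # Two SR-IOV NIC providers on one compute host. Both offer SRIOV_NET_VF
    # inventory, but each carries a different physnet trait, so a request can
    # be steered to the right physical network. Names and totals are made up.
    nic_providers = [
        {
            "name": "compute-1:enp2s0",
            "inventories": {"SRIOV_NET_VF": {"total": 8}},
            "traits": {"CUSTOM_PHYSNET_PUBLIC"},
        },
        {
            "name": "compute-1:enp2s1",
            "inventories": {"SRIOV_NET_VF": {"total": 8}},
            "traits": {"CUSTOM_PHYSNET_INTRANET"},
        },
    ]

    def vf_providers_on_physnet(providers, physnet_trait):
        """Pick NIC providers able to supply VFs on the given physical network."""
        return [p for p in providers
                if "SRIOV_NET_VF" in p["inventories"]
                and physnet_trait in p["traits"]]

    # Only enp2s1 can satisfy a VF request tied to the "Intranet" physnet.
    print(vf_providers_on_physnet(nic_providers, "CUSTOM_PHYSNET_INTRANET"))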
@@ -96,32 +96,32 @@ Proposed change
 We will add two new attributes to the resource provider data model:

 * `parent_provider_uuid`: Indicates the UUID of the immediate parent provider.
-  This will be None for the vast majority of providers, and for nested resource
-  providers, this will most likely be the compute host's UUID. To be clear,
-  a resource provider can have 0 or 1 parents. We will not support multiple
-  parents for a resource provider.
+  This will be None for root providers. To be clear, a resource provider can
+  have 0 or 1 parents. We will not support multiple parents for a resource
+  provider.
 * `root_provider_uuid`: Indicates the UUID of the resource provider that is at
-  the "root" of the tree of providers. This field allows us to implement
-  efficient tree-access queries and avoid use of recursive queries to follow
-  child->parent relations.
+  the "root" of the tree of providers. For Nova usage, this will be the UUID of
+  the resource provider corresponding to the compute host. This field allows us
+  to implement efficient tree-access queries and avoid use of recursive queries
+  to follow child->parent relations.

 A new microversion will be added to the placement REST API that adds the above
 attributes to the appropriate request and response payloads.

-The scheduler reporting client shall be modified to track NUMA nodes and
-SRIOV-enabled NICs as child resource providers to a parent compute host
-resource provider.
+In the future, the scheduler reporting client may be modified to track NUMA
+nodes and SRIOV-enabled NICs as child resource providers to a parent compute
+host resource provider.

-The `VCPU` and `MEMORY_MB` resource classes will continue to be inventoried on
-the parent resource provider (i.e the compute node resource provider) and not
-the NUMA node child providers. The NUMA node child providers will have
-inventory records populated for the `NUMA_CORE`, `NUMA_THREAD` and
-`NUMA_MEMORY_MB` resource classes. When a boot request is received, the Nova
-API service will need to determine whether the request (flavor and image)
-specifies a particular NUMA topology and, if so, construct the request to the
-placement service for the appropriate `NUMA_XXX` resources. This is currently
-out of scope for this spec. This spec is only about the inventorying of the
-various child providers with appropriate resource classes.
+Future NUMA support may entail the NUMA node child providers having inventory
+records populated for the `NUMA_CORE`, `NUMA_THREAD` and `NUMA_MEMORY_MB`
+resource classes. `VCPU` and `MEMORY_MB` resource classes would continue to be
+inventoried on the parent resource provider (i.e the compute node resource
+provider) and not the NUMA node child providers. When a boot request is
+received, the Nova API service would need to determine whether the request
+(flavor and image) specifies a particular NUMA topology and, if so, construct
+the request to the placement service for the appropriate `NUMA_XXX` resources.
+This is currently out of scope for this spec. This spec is only about the
+inventorying of the various child providers with appropriate resource classes.

 On the CPU-pinning side of the equation, we do not plan to allow a compute node
 to serve as *either* a general-purpose compute node *or* as a target for
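To make the two new attributes concrete, here is a small, purely illustrative sketch (not placement code; the UUIDs and helper functions are invented) of why `root_provider_uuid` allows tree access without recursively following child->parent links:

.. code:: python

    import uuid

    # Illustrative provider records carrying the two new attributes: a compute
    # node acting as root, plus hypothetical NUMA-node and SR-IOV PF children.
    cn = str(uuid.uuid4())
    numa0 = str(uuid.uuid4())
    pf0 = str(uuid.uuid4())

    providers = {
        cn:    {"parent_provider_uuid": None, "root_provider_uuid": cn},
        numa0: {"parent_provider_uuid": cn,   "root_provider_uuid": cn},
        pf0:   {"parent_provider_uuid": cn,   "root_provider_uuid": cn},
    }

    def tree_via_parents(providers, root):
        """Collect a tree by walking child->parent links up to the root."""
        members = set()
        for puuid in providers:
            cur = puuid
            while providers[cur]["parent_provider_uuid"] is not None:
                cur = providers[cur]["parent_provider_uuid"]
            if cur == root:
                members.add(puuid)
        return members

    def tree_via_root_field(providers, root):
        """Collect the same tree with a single flat equality filter."""
        return {p for p, rec in providers.items()
                if rec["root_provider_uuid"] == root}

    assert tree_via_parents(providers, cn) == tree_via_root_field(providers, cn)

The flat filter is what lets a store answer "give me the whole tree" with one indexed lookup rather than a chain of recursive parent queries, which is the efficiency rationale the added spec text gives.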
@@ -132,12 +132,6 @@ floating workloads or not. Initial thoughts were to use the
 pci_passthrough_whitelist CONF option to determine this however this still
 needs to be debated.

-This spec will simply ensure that if a virt driver returns a NUMATopology
-object in the result of its get_available_resource() call, then we will create
-child resource providers representing those NUMA nodes. Similarly, if the PCI
-device manager returns a set of SR-IOV physical functions on the compute host,
-we will create child resource provider records for those SR-IOV PFs.
-
 Alternatives
 ------------
@@ -175,13 +169,31 @@ REST API impact
 `root_provider_uuid` and `parent_provider_uuid` fields will be added to the
 corresponding request and response payloads of appropriate placement REST APIs.

-The `GET /resource_providers` call will get a new filter on `root={uuid}` that,
-when present, will return all resource provider records, inclusive of the root,
-having a `root_provider_uuid` equal to `{uuid}`.
+The `GET /resource_providers` call will get a new filter on `in_tree={uuid}`
+that, when present, will return all resource provider records, inclusive of the
+root, having a `root_provider_uuid` equal to the `root_provider_uuid` of the
+provider indicated by `{uuid}`. To be clear, consider a tree like:

-The filter parameter `root={uuid}` will *not* be added to
-`GET /allocation_candidates`, as this call is for a specific use case for the
-Nova scheduler, and there is no use case for it.
+.. code::
+
+        A
+       / \
+      B   D
+     /
+    C
+
+Specifying *any* of `A`, `B`, `C`, or `D`'s UUIDs to `in_tree={uuid}` will
+return *all* the providers in the entire tree (`{A, B, C, D}`).
+
+The filter parameter `in_tree={uuid}` will *not* be added to
+`GET /allocation_candidates`, as there is no use case for it.
+
+.. note:: More work is required to make tree models usable for real
+          deployments. The `GET /allocation_candidates` API will need to be
+          updated to process requests for resources that are distributed
+          throughout a tree. And work will need to be done in the resource
+          tracker and report client (ultimately at the behest of the virt
+          driver) to construct nested models using these capabilities.

 Security impact
 ---------------
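The `in_tree` behaviour spelled out above amounts to a one-line filter; the sketch below is illustrative only (letters stand in for UUIDs, and the function is not the placement implementation):

.. code:: python

    # The A/B/C/D tree from the example: A is the root, B and D are its
    # children, and C is a child of B. Every member records A as its root.
    providers = {
        "A": {"parent_provider_uuid": None, "root_provider_uuid": "A"},
        "B": {"parent_provider_uuid": "A",  "root_provider_uuid": "A"},
        "C": {"parent_provider_uuid": "B",  "root_provider_uuid": "A"},
        "D": {"parent_provider_uuid": "A",  "root_provider_uuid": "A"},
    }

    def in_tree(providers, member_uuid):
        """Return every provider in the same tree as member_uuid, root included."""
        root = providers[member_uuid]["root_provider_uuid"]
        return {p for p, rec in providers.items()
                if rec["root_provider_uuid"] == root}

    # Any member of the tree yields the whole tree, per the spec text above.
    for member in "ABCD":
        assert in_tree(providers, member) == {"A", "B", "C", "D"}

Under the same assumptions, the corresponding API request would look like `GET /resource_providers?in_tree=<uuid of C>` at the new microversion.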
@@ -232,12 +244,8 @@ Work Items
 * Add DB schema and object model changes
 * Add REST API microversion adding new attributes for resource providers and
   allocation candidates
-* Add REST API microversion adding new `root={uuid}` filter on `GET
+* Add REST API microversion adding new `in_tree={uuid}` filter on `GET
   /resource_providers`
-* Add code in scheduler reporting client to track NUMA nodes as child resource
-  providers on the parent compute host resource provider
-* Add code in scheduler reporting client to track SRIOV PFs as child resource
-  providers on the parent compute host resource provider

 Please note that not all of this spec is expected to be implemented in a single
 release cycle. At the Queens PTG we agreed that fully suppporting NUMA will