Granular Resource Request Syntax
https://blueprints.launchpad.net/nova/+spec/granular-resource-requests
As Generic and Nested Resource Providers begin to crystallize and be exercised, it becomes necessary to be able to express:
- Requirement 1: Requesting an allocation of a particular resource class with a particular set of traits, and requesting a different allocation of the same resource class with a different set of traits.
- Requirement 2: Ensuring that requests of certain resources are allocated from the same resource provider (affinity).
- Requirement 3: Ensuring that requests of certain resources are allocated from different resource providers (anti-affinity).
- Requirement 4: The ability to spread allocations of effectively-identical resources across multiple resource providers in situations of high saturation ("any fit").
This specification attempts to address these requirements by way of a numbered syntax on resource and trait keys in flavor extra_specs and the GET /allocation_candidates Placement API.
Note
This document uses "RP" as an abbreviation for "Resource Provider" throughout.
Problem description
Up to this point with generic and nested resource providers and traits, it is only possible to request a single blob of resources with a single blob of traits. More specifically:
- The resources can only be expressed as an integer count of a single resource class. There is no way to express a second resource_class:count with the same resource class.
- All specified traits apply to all requested resources. There is no way to apply certain traits to certain resources.
- All resources of a given resource class are allocated from the same RP.
The Use Cases below exemplify scenarios that cannot be expressed within these restrictions.
Use Cases
Consider the following hardware representation ("wiring diagram"):
+-----------------------------------+
|                CN1                |
+-+--------------+-+--------------+-+
  |     NIC1     | |     NIC2     |
  +-+---+--+---+-+ +-+---+--+---+-+
    |PF1|  |PF2|     |PF3|  |PF4|
    +-+-+  +-+-+     +-+-+  +-+-+
       \      \__   __/      /
        \        \ /        /
         |        X        |
         |   ____/ \____   |
         |  /           \  |
       +-+--+-+       +-+--+-+
       | NET1 |       | NET2 |
       +------+       +------+
Assume this is modeled in Placement as:
RP1 (represents PF1):
{
SRIOV_NET_VF=16,
NET_EGRESS_BYTES_SEC=1250000000, # 10Gbps
traits: [CUSTOM_NET1, HW_NIC_ACCEL_SSL]
}
RP2 (represents PF2):
{
SRIOV_NET_VF=16,
NET_EGRESS_BYTES_SEC=1250000000, # 10Gbps
traits: [CUSTOM_NET2, HW_NIC_ACCEL_SSL]
}
RP3 (represents PF3):
{
SRIOV_NET_VF=16,
NET_EGRESS_BYTES_SEC=125000000, # 1Gbps
traits: [CUSTOM_NET1]
}
RP4 (represents PF4):
{
SRIOV_NET_VF=16,
NET_EGRESS_BYTES_SEC=125000000, # 1Gbps
traits: [CUSTOM_NET2]
}
Use Case 1
As an Operator, I need to be able to express a boot request for an instance with one SR-IOV VF on physical network NET1 and a second SR-IOV VF on physical network NET2.
I expect the scheduler to receive the following allocation candidates:
[RP1(SRIOV_NET_VF:1), RP2(SRIOV_NET_VF:1)]
[RP1(SRIOV_NET_VF:1), RP4(SRIOV_NET_VF:1)]
[RP3(SRIOV_NET_VF:1), RP2(SRIOV_NET_VF:1)]
[RP3(SRIOV_NET_VF:1), RP4(SRIOV_NET_VF:1)]
This demonstrates the ability to get different allocations of the same resource class from different providers in a single request (Requirement 1).
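The expected candidates for Use Case 1 can be reproduced with a small sketch. It assumes the four-provider model above, copying only the trait lists; the helper names `providers_with` and `use_case_1_candidates` are hypothetical and are not part of nova or Placement.

```python
from itertools import product

# Traits of the four PF providers, copied from the model above.
PROVIDER_TRAITS = {
    "RP1": {"CUSTOM_NET1", "HW_NIC_ACCEL_SSL"},
    "RP2": {"CUSTOM_NET2", "HW_NIC_ACCEL_SSL"},
    "RP3": {"CUSTOM_NET1"},
    "RP4": {"CUSTOM_NET2"},
}


def providers_with(required):
    """Return the providers whose traits include every required trait."""
    return [name for name, traits in PROVIDER_TRAITS.items()
            if set(required) <= traits]


def use_case_1_candidates():
    """Pair each NET1-capable provider with each NET2-capable provider.

    Inventory is ignored here because every provider has 16 VFs free.
    """
    net1 = providers_with({"CUSTOM_NET1"})
    net2 = providers_with({"CUSTOM_NET2"})
    return list(product(net1, net2))


if __name__ == "__main__":
    for first, second in use_case_1_candidates():
        print(f"[{first}(SRIOV_NET_VF:1), {second}(SRIOV_NET_VF:1)]")
```

Each VF is filtered by its own trait set, which is exactly what a single un-numbered request cannot express.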
Use Case 2
Request: one VF with egress bandwidth of 10000 bytes/sec. (No, it doesn't make sense that I don't care which physnet I'm on -- mentally replace NET with SWITCH if that bothers you.)
Expect:
[RP1(SRIOV_NET_VF:1), RP1(NET_EGRESS_BYTES_SEC:10000)]
[RP2(SRIOV_NET_VF:1), RP2(NET_EGRESS_BYTES_SEC:10000)]
[RP3(SRIOV_NET_VF:1), RP3(NET_EGRESS_BYTES_SEC:10000)]
[RP4(SRIOV_NET_VF:1), RP4(NET_EGRESS_BYTES_SEC:10000)]
This demonstrates the ability to ensure that allocations of different resource classes can be made to come from the same resource provider (Requirement 2).
Use Case 3
Request:
- One VF on NET1 with bandwidth 10000 bytes/sec
- One VF on NET2 with bandwidth 20000 bytes/sec on a NIC with SSL acceleration (This one should always land on RP2.)
Expect:
[RP1(SRIOV_NET_VF:1, NET_EGRESS_BYTES_SEC:10000),
 RP2(SRIOV_NET_VF:1, NET_EGRESS_BYTES_SEC:20000)]
[RP3(SRIOV_NET_VF:1, NET_EGRESS_BYTES_SEC:10000),
 RP2(SRIOV_NET_VF:1, NET_EGRESS_BYTES_SEC:20000)]
This demonstrates both Requirement 1 and Requirement 2.
Use Case 4
In a high-availability scenario, request two VFs on NET1 from different PFs.
Expect:
[RP1(SRIOV_NET_VF:1), RP3(SRIOV_NET_VF:1)]
But not either of:
[RP1(SRIOV_NET_VF:2)]
[RP3(SRIOV_NET_VF:2)]
This demonstrates Requirement 3.
Use Case 5
As an Operator, I need to be able to express a request for more than one VF and have the request succeed even if my PFs are nearly saturated. For this use case, assume that each PF resource provider has only two VFs unallocated. I need to be able to express a request for four VFs on NET1.
Expect: [RP1(SRIOV_NET_VF:2), RP3(SRIOV_NET_VF:2)]
This demonstrates Requirement 4.
Proposed change
Numbered Request Groups
With the existing syntax (once Dependencies land), a resource request can be logically expressed as:
resources = { resource_classA: rcA_count,
              resource_classB: rcB_count,
              ... },
required = [ TRAIT_C, TRAIT_D, ... ]
Semantically, each resulting allocation candidate will consist of resource_classN: rcN_count resources spread arbitrarily across resource providers within the same tree (i.e. all resource providers in a single allocation candidate will have the same root_provider_uuid). Each resource provider in each resulting allocation candidate will possess all of the listed required traits.
Note
When shared resource providers are fully implemented, the above will read, "...spread arbitrarily across resource providers within the same tree or aggregate".
Also, it is unsupported for resource classes or traits to be repeated.
The proposed change is to augment the above to include numbered resource groupings as follows:
Logical Representation
resources = { resource_classA: rcA_count,
              resource_classB: rcB_count,
              ... },
required = [ TRAIT_C, TRAIT_D, ... ],
resources1 = { resource_class1A: rc1A_count,
               resource_class1B: rc1B_count,
               ... },
required1 = [ TRAIT_1C, TRAIT_1D, ... ],
resources2 = { resource_class2A: rc2A_count,
               resource_class2B: rc2B_count,
               ... },
required2 = [ TRAIT_2C, TRAIT_2D, ... ],
...,
resourcesX = { resource_classXA: rcXA_count,
               resource_classXB: rcXB_count,
               ... },
requiredX = [ TRAIT_XC, TRAIT_XD, ... ],
group_policy = "none"|"isolate"
Semantics
The term "results" is used below to refer to the contents of one item in the allocation_requests list within the GET /allocation_candidates response.
- The semantic for the (single) un-numbered grouping is unchanged. That is, it may still return results from different RPs in the same tree (or, when "shared" is fully implemented, the same aggregate).
- However, a numbered group will always return results from the same RP. This is to satisfy Requirement 2.
- With group_policy=none, separate groups (numbered or un-numbered) may return results from different RPs or the same RP (assuming isolation is not otherwise forced e.g. via traits or inventory/usage constraints).
- With group_policy=isolate, numbered request groups are guaranteed to be satisfied by separate RPs. This applies only to numbered request groups. That is, resources within the un-numbered group are still able to be provided by any RPs in the tree (or aggregate); and there is no restriction between the RPs satisfying the un-numbered group and those satisfying the numbered groups.
- The group_policy option is required when more than one numbered group is specified; omitting it will result in a 400 error.
- It is still not supported to repeat a resource class within a given (numbered or un-numbered) resources grouping, but there is no restriction on repeating a resource class from one grouping to the next. The same applies to traits. This is to satisfy Requirement 1.
- A given requiredN list applies only to its matching resourcesN list. This goes for the un-numbered required/resources as well.
- The numeric suffixes are arbitrary. Other than binding resourcesN to requiredN, they have no implied meaning. In particular, they are not required to be sequential; and there is no semantic significance to their order.
- For both numbered and un-numbered resources, a single resource_class:count will never be split across multiple RPs. While such a split could be seen to be sane for e.g. VFs, it is clearly not valid for e.g. DISK_GB. If you want to be able to split, use separate numbered groups. This satisfies Requirement 4.
- Specifying a resources (numbered or un-numbered) without a corresponding required returns results unfiltered by traits.
- It is an error to specify a required (numbered or un-numbered) without a corresponding resources.
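The numbered-group semantics above can be illustrated with a minimal in-memory sketch. It is not the Placement implementation: it handles only numbered groups, ignores trees and aggregates, and the `candidates` function and its data layout are assumptions made for illustration.

```python
from itertools import product


def candidates(providers, groups, group_policy=None):
    """Enumerate allocations for numbered request groups.

    providers: {name: {"free": {resource_class: int}, "traits": set}}
    groups:    {suffix: {"resources": {resource_class: int},
                         "required": set}}
    Returns a set of frozensets of (provider, resource_class, amount).
    """
    if len(groups) > 1 and group_policy not in ("none", "isolate"):
        raise ValueError("group_policy is required with several groups")

    suffixes = sorted(groups)
    # A numbered group is always satisfied by exactly one provider,
    # which must hold all required traits and enough free inventory.
    per_group = []
    for suffix in suffixes:
        group = groups[suffix]
        per_group.append([
            name for name, prov in providers.items()
            if group.get("required", set()) <= prov["traits"]
            and all(prov["free"].get(rc, 0) >= amount
                    for rc, amount in group["resources"].items())
        ])

    results = set()
    for choice in product(*per_group):
        # "isolate": no two numbered groups may share a provider.
        if group_policy == "isolate" and len(set(choice)) != len(choice):
            continue
        # Groups landing on the same provider consume inventory together.
        totals = {}
        for suffix, name in zip(suffixes, choice):
            for rc, amount in groups[suffix]["resources"].items():
                totals[(name, rc)] = totals.get((name, rc), 0) + amount
        if all(providers[name]["free"].get(rc, 0) >= amount
               for (name, rc), amount in totals.items()):
            results.add(frozenset(
                (name, rc, amount) for (name, rc), amount in totals.items()))
    return results


def net1_providers(free_vfs):
    """The two NET1 providers from the Use Cases, with a given free VF count."""
    return {
        "RP1": {"free": {"SRIOV_NET_VF": free_vfs},
                "traits": {"CUSTOM_NET1", "HW_NIC_ACCEL_SSL"}},
        "RP3": {"free": {"SRIOV_NET_VF": free_vfs},
                "traits": {"CUSTOM_NET1"}},
    }


def vf_groups(count):
    """Build `count` numbered groups, each asking for one VF on NET1."""
    return {n: {"resources": {"SRIOV_NET_VF": 1},
                "required": {"CUSTOM_NET1"}}
            for n in range(1, count + 1)}
```

With these helpers, `candidates(net1_providers(16), vf_groups(2), "isolate")` yields only the RP1 plus RP3 allocation of Use Case 4, while `candidates(net1_providers(2), vf_groups(4), "none")` yields the two-and-two spread of Use Case 5. A single group asking for four VFs finds nothing when each provider has two free, because one resource_class:count is never split.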
Syntax In Flavors
In reference to the Logical Representation, the existing (once Dependencies have landed) implementation is to specify resources and required traits in the flavor extra_specs as follows:
- Each member of resources is specified as a separate extra_specs entry of the form:
resources:resource_classA=rcA_count
- Each member of required is specified as a separate extra_specs entry of the form:
trait:TRAIT_B=required
For example:
resources:VCPU=2
resources:MEMORY_MB=2048
trait:HW_CPU_X86_AVX=required
trait:CUSTOM_MAGIC=required
Proposed: Allow the same syntax for numbered resource and trait groupings via the number being appended to the resources and trait keyword:
resourcesN:resource_classC=rcC_count
traitN:TRAIT_D=required
A given numbered resources or trait key may be repeated to specify multiple resources/traits in the same grouping, just as with the un-numbered syntax.
Specify inter-group affinity policy via the group_policy key, which may have the following values:
- isolate: Different numbered request groups will be satisfied by different providers.
- none: Different numbered request groups may be satisfied by different providers or common providers.
For example:
resources:VCPU=2
resources:MEMORY_MB=2048
trait:HW_CPU_X86_AVX=required
trait:CUSTOM_MAGIC=required
resources1:SRIOV_NET_VF=1
resources1:NET_EGRESS_BYTES_SEC=10000
trait1:CUSTOM_PHYSNET_NET1=required
resources2:SRIOV_NET_VF=1
resources2:NET_EGRESS_BYTES_SEC=20000
trait2:CUSTOM_PHYSNET_NET2=required
trait2:HW_NIC_ACCEL_SSL=required
group_policy=isolate
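As a rough sketch of how such extra_specs might be grouped by suffix, the following parser collects each resourcesN and traitN entry into its request group. The function name `parse_extra_specs`, the regular expression, and the returned layout are assumptions for illustration, not nova's actual parsing code; the un-numbered group is stored under the empty-string suffix.

```python
import re
from collections import defaultdict

# Matches "resources:VCPU", "resources1:SRIOV_NET_VF", "trait2:FOO", ...
_KEY_RE = re.compile(r"^(resources|trait)(\d*):(.+)$")


def parse_extra_specs(extra_specs):
    """Group resource and trait keys from flavor extra_specs by suffix.

    Returns (groups, group_policy), where groups maps a suffix ("" for
    the un-numbered group) to {"resources": {...}, "required": [...]}.
    """
    groups = defaultdict(lambda: {"resources": {}, "required": []})
    for key, value in extra_specs.items():
        match = _KEY_RE.match(key)
        if match is None:
            continue  # unrelated extra_spec, or group_policy
        kind, suffix, name = match.groups()
        if kind == "resources":
            groups[suffix]["resources"][name] = int(value)
        elif value == "required":
            groups[suffix]["required"].append(name)
    return dict(groups), extra_specs.get("group_policy")


EXAMPLE = {
    "resources:VCPU": "2",
    "resources:MEMORY_MB": "2048",
    "trait:HW_CPU_X86_AVX": "required",
    "trait:CUSTOM_MAGIC": "required",
    "resources1:SRIOV_NET_VF": "1",
    "resources1:NET_EGRESS_BYTES_SEC": "10000",
    "trait1:CUSTOM_PHYSNET_NET1": "required",
    "resources2:SRIOV_NET_VF": "1",
    "resources2:NET_EGRESS_BYTES_SEC": "20000",
    "trait2:CUSTOM_PHYSNET_NET2": "required",
    "trait2:HW_NIC_ACCEL_SSL": "required",
    "group_policy": "isolate",
}

if __name__ == "__main__":
    parsed, policy = parse_extra_specs(EXAMPLE)
    for suffix, group in parsed.items():
        print(repr(suffix), group)
    print("group_policy:", policy)
```

Keeping the suffix as an opaque string reflects the rule that the numbers only bind resourcesN to traitN and carry no ordering.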
Syntax In the Placement API
In reference to the Logical Representation, the existing (once Dependencies have landed) Placement API implementation is via the GET /allocation_candidates querystring as follows:
- The resources are grouped together under a single key called resources whose value is a comma-separated list of resource_classN:rcN_count.
- The traits are grouped together under a single key called required whose value is a comma-separated list of TRAIT_Y.
For example:
GET /allocation_candidates?resources=VCPU:2,MEMORY_MB:2048
&required=HW_CPU_X86_AVX,CUSTOM_MAGIC
Proposed: Allow the same syntax for numbered resource and trait groupings via the number being appended to the resources and required keywords, and require a group_policy to be specified when more than one numbered grouping is given. In the following example, groups 1 and 2 represent Use Case 3:
GET /allocation_candidates?resources=VCPU:2,MEMORY_MB:2048
&required=HW_CPU_X86_AVX,CUSTOM_MAGIC
&resources1=SRIOV_NET_VF:1,NET_EGRESS_BYTES_SEC:10000
&required1=CUSTOM_PHYSNET_NET1
&resources2=SRIOV_NET_VF:1,NET_EGRESS_BYTES_SEC:20000
&required2=CUSTOM_PHYSNET_NET2,HW_NIC_ACCEL_SSL
&group_policy=none
The following example demonstrates the use of group_policy=isolate and represents Use Case 4 by ensuring that the two VFs come from different providers, even though they are otherwise identical:
GET /allocation_candidates
?resources1=SRIOV_NET_VF:1
&required1=CUSTOM_PHYSNET_NET1
&resources2=SRIOV_NET_VF:1
&required2=CUSTOM_PHYSNET_NET1
&group_policy=isolate
There is no change to the response payload syntax.
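A client could assemble such a querystring along the following lines. This is a sketch only: `build_query` and the shape of its `groups` argument are hypothetical, and colons and commas are left unescaped purely to match the readable form used in the examples above.

```python
from urllib.parse import urlencode


def build_query(groups, group_policy=None):
    """Render request groups as a GET /allocation_candidates URL path.

    groups maps a suffix ("" for the un-numbered group, otherwise a
    string of digits) to {"resources": {rc: count}, "required": [traits]}.
    """
    params = []
    # Emit the un-numbered group first, then numbered groups in numeric
    # order. The order is cosmetic; suffixes carry no semantic meaning.
    for suffix in sorted(groups, key=lambda s: (s != "", int(s or 0))):
        group = groups[suffix]
        resources = ",".join(
            f"{rc}:{count}" for rc, count in group["resources"].items())
        params.append(("resources" + suffix, resources))
        if group.get("required"):
            params.append(("required" + suffix, ",".join(group["required"])))
    if group_policy is not None:
        params.append(("group_policy", group_policy))
    return "/allocation_candidates?" + urlencode(params, safe=":,")


if __name__ == "__main__":
    use_case_4 = {
        "1": {"resources": {"SRIOV_NET_VF": 1},
              "required": ["CUSTOM_PHYSNET_NET1"]},
        "2": {"resources": {"SRIOV_NET_VF": 1},
              "required": ["CUSTOM_PHYSNET_NET1"]},
    }
    print(build_query(use_case_4, "isolate"))
```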
Alternatives
- Requirement 2 could also be expressed via aggregates by associating each RP with a unique aggregate, once shared resource providers are fully implemented.
- We could allow the "number" suffixes to be any arbitrary string. However, using integers is easy to understand and validate, and obviates worries about escaping/encoding special characters, etc.
- There has been discussion over time about the need for a JSON payload-based API to enable richer expression to request allocation candidates. While this is still a possibility for the future, it was considered unnecessary in this case, as the current requirements can be met via the proposed (relatively simple) enhancements to the querystring syntax of the existing GET /allocation_candidates API.
- Much discussion has occurred around whether and how to satisfy both anti-affinity (Requirement 3) and "any fit" (Requirement 4). See the separate_providers proposal, the can_split proposal, and the mailing list thread for details.
Data model impact
None.
REST API impact
See Syntax In the Placement API. To summarize, the GET /allocation_candidates Placement API is modified to accept arbitrary query parameter keys of the format resourcesN and requiredN, where N can be any integer. The format of the values to these query parameters is identical to that of resources and required, respectively.
Otherwise, there is no REST API impact.
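The key format and the cross-parameter rules from the Semantics section could be checked as sketched below. The `validate_query` function and the `BadRequest` exception are stand-ins invented for this example. This document specifies a 400 only for a missing group_policy, so mapping the other failures to the same error is an assumption.

```python
import re

# resources, required, resources1, required1, resources42, ...
_GROUP_KEY = re.compile(r"^(resources|required)(\d*)$")
_POLICIES = ("none", "isolate")


class BadRequest(Exception):
    """Stand-in for an HTTP 400 response (hypothetical)."""


def validate_query(params):
    """Check group-related query parameters; return the numbered suffixes.

    params is a mapping of query parameter name to its raw string value.
    """
    resources, required = set(), set()
    for key in params:
        match = _GROUP_KEY.match(key)
        if match is None:
            continue  # not a request-group key
        target = resources if match.group(1) == "resources" else required
        target.add(match.group(2))

    # A requiredN without its matching resourcesN is an error.
    orphaned = required - resources
    if orphaned:
        names = ", ".join("required" + s for s in sorted(orphaned))
        raise BadRequest(f"no matching resources for: {names}")

    numbered = {suffix for suffix in resources if suffix}
    policy = params.get("group_policy")
    # group_policy is mandatory once more than one numbered group is given.
    if len(numbered) > 1 and policy is None:
        raise BadRequest("group_policy is required with multiple groups")
    if policy is not None and policy not in _POLICIES:
        raise BadRequest(f"invalid group_policy: {policy}")
    return sorted(numbered, key=int)
```

For instance, `validate_query({"resources1": "SRIOV_NET_VF:1", "resources2": "SRIOV_NET_VF:1"})` raises `BadRequest`, while adding `"group_policy": "isolate"` to the same mapping returns `["1", "2"]`.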
Security impact
None
Notifications impact
None
Other end user impact
Operators will need to understand the Syntax In Flavors and the Semantics of the changes in order to create flavors exploiting the new functionality. See Documentation Impact.
There is no impact on the nova or openstack CLIs. The existing CLI syntax is adequate for expressing the newly-supported extra_specs keys.
Performance Impact
Use of the new syntax results in the GET /allocation_candidates Placement API effectively doing multiple lookups per request. This has the potential to impact performance in the database by a factor of N+1, where N is the number of numbered resource groupings specified in a given request. Clever SQL expression may reduce or eliminate this impact.
There should be no impact outside of the database, as this feature should not result in a significant increase in the number of records returned by the GET /allocation_candidates API (if anything, the increased specificity will decrease the number of results).
Other deployer impact
None
Developer impact
Developers of modules supplying Resource Provider representations (e.g. virt drivers) will need to be aware of this feature in order to model their RPs appropriately.
Upgrade impact
None
Implementation
Assignee(s)
- efried
Work Items
Implementation work was begun in Queens. Several patches were merged; the remaining patches have been started but are waiting on dependencies.
Scheduler
- Negotiate microversion capabilities with the Placement API.
- Recognize and parse the new Syntax In Flavors.
- If the new flavor extra_specs syntax is recognized and the Placement API is not capable of the appropriate microversion, error.
- Construct the GET /allocation_candidates querystring according to the flavor extra_specs.
- Send the GET /allocation_candidates request to Placement, specifying the appropriate microversion if the new syntax is in play.
Placement
- Publish a new microversion.
- Recognize and parse the new GET /allocation_candidates querystring key formats if invoked at the new microversion.
- Construct the appropriate database query/ies.
- Everything else is unchanged.
Dependencies
This work builds on reapproval and completion of the Nested Resource Providers effort.
Testing
Functional tests, including gabbits, will be added to exercise the
new syntax. New fixtures may be required to express some of the more
complicated configurations, particularly involving nested resource
providers. Test cases will be designed to prove various combinations and
permutations of the items listed in Semantics.
For example, one test might issue a GET /allocation_candidates request using both numbered and un-numbered groupings against a placement service containing multiple nested resource provider trees with three or more levels and involving trait propagation. Migration scenarios will also be tested.
Documentation Impact
- The Placement API reference will be updated to describe the new syntax to the GET /allocation_candidates API.
- The Placement Devref will be updated to describe the new microversion.
- Admin documentation (presumably the same as introduced/enhanced via the Traits in Flavors effort) will be updated to describe the new Syntax In Flavors.
References
- Traits in Flavors spec
- Traits in the GET /allocation_candidates API spec
- Generic Resource Providers original spec
- Nested Resource Providers spec
- Placement API reference
- Placement Devref
- https://etherpad.openstack.org/p/nova-multi-alloc-request-syntax-brainstorm
- https://review.openstack.org/#/q/project:openstack/nova+branch:master+topic:bp/granular-resource-requests
History
Release Name | Description |
---|---|
Queens | Introduced, approved, implementation started |
Rocky | Reproposed |