From 2edd702dc960211ee490cdd2c4610523cc60e292 Mon Sep 17 00:00:00 2001
From: Balazs Gibizer
Date: Wed, 22 Aug 2018 18:10:49 +0200
Subject: [PATCH] Network bandwidth resource provider

This spec proposes adding new resource classes representing network
bandwidth and modeling network backends as resource providers in
Placement, as well as adding scheduling support for the new resources
in Nova.

blueprint bandwidth-resource-provider

Previously-approved: Rocky
Co-Authored-By: Rodolfo Alonso Hernandez
Co-Authored-By: Bence Romsics
Change-Id: I404ccf376d9fcc8e595487e8c2743f923c4d1511
---
 .../approved/bandwidth-resource-provider.rst | 602 ++++++++++++++++++
 1 file changed, 602 insertions(+)
 create mode 100644 specs/stein/approved/bandwidth-resource-provider.rst

diff --git a/specs/stein/approved/bandwidth-resource-provider.rst b/specs/stein/approved/bandwidth-resource-provider.rst
new file mode 100644
index 000000000..578d4c689
--- /dev/null
+++ b/specs/stein/approved/bandwidth-resource-provider.rst
@@ -0,0 +1,602 @@

..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

===================================
Network Bandwidth resource provider
===================================

https://blueprints.launchpad.net/nova/+spec/bandwidth-resource-provider

This spec proposes adding new resource classes representing network
bandwidth and modeling network backends as resource providers in
Placement, as well as adding scheduling support for the new resources
in Nova.


Problem description
===================

Currently there is no way for the Nova scheduler to place a server based
on the network bandwidth available on a host. The Placement service does
not track the different network back-ends present on a host or their
available bandwidth.

Use Cases
---------

A user wants to spawn a server with a port associated with a specific
physical network. The user also wants a guaranteed minimum bandwidth for
this port. The Nova scheduler must select a host which satisfies this
request.


Proposed change
===============

This spec proposes that Neutron model the bandwidth resources of the
physical NICs on a compute host as resource providers in the Placement
service and express the bandwidth request in the Neutron port, and that
Nova be modified to consider the requested bandwidth during scheduling,
based on the available bandwidth resources of each compute host.

This also means that this spec proposes to use Placement and the
nova-scheduler to select which bandwidth providing RP, and therefore
which physical device, will provide the bandwidth for a given Neutron
port. Today, selecting the physical device happens during Neutron port
binding, but after this spec is implemented this selection will happen
when an allocation candidate is selected for the server in the
nova-scheduler. Therefore Neutron needs to provide enough information in
the networking RP model in Placement and in the resource_request field
of the port so that Nova can query Placement and receive allocation
candidates that do not conflict with the Neutron port binding logic.
The networking RP model and the schema of the new resource_request port
attribute are described in the
`QoS minimum bandwidth allocation in Placement API`_ Neutron spec.
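For illustration only (the authoritative schema is defined in the
Neutron spec referenced above, and the trait name below is
hypothetical), a port's resource_request could look similar to::

    "resource_request": {
        "resources": {
            "NET_BANDWIDTH_INGRESS_KILOBITS_PER_SECOND": 1000,
            "NET_BANDWIDTH_EGRESS_KILOBITS_PER_SECOND": 1000
        },
        "required": ["CUSTOM_PHYSNET_PHYSNET0"]
    }

Nova treats this structure as opaque and only forwards it to Placement
as a numbered request group.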
Please note that today Neutron port binding can fail if the
nova-scheduler selects a compute host where Neutron cannot bind the
port. We are not aiming to remove this limitation with this spec, but we
also do not want to increase the frequency of such port binding
failures, as that would ruin the usability of the system.


Separation of responsibilities
------------------------------

* Nova creates the root RP of the compute node RP tree, as it does
  today.
* Neutron creates the networking RP tree of a compute node under the
  compute node root RP and reports bandwidth inventories.
* Neutron provides the resource_request of a port in the Neutron API.
* Nova takes the ports' resource_request and includes it in the GET
  /allocation_candidates request. Nova does not need to understand or
  manipulate the actual resource request, but Nova needs to assign a
  unique granular resource request group suffix to each port's resource
  request.
* Nova selects one allocation candidate and claims the resources in
  Placement.
* Nova passes the allocation information it received from Placement
  during resource claiming to Neutron during port binding. Nova will
  send this information in the same format as the PUT
  ``/allocations/{consumer_uuid}`` request uses in Placement. Neutron
  will not use this to send PUT ``/allocations/{consumer_uuid}``
  requests, as Nova has already claimed these resources.

Scoping
-------

Due to the size and complexity of this feature, the scope of the current
spec is limited. To keep backward compatibility while the feature is not
fully implemented, both new Neutron API extensions will be optional and
turned off by default. Nova will check for the extension that introduces
the port's resource_request field and fall back to the current resource
handling behavior if the extension is not loaded.

Out of scope from the Nova perspective:

* Mapping parts of a server's allocation back to the resource_request of
  each individual port of the server. Instead, Nova will send the
  server's whole allocation to Neutron in the port binding.
* Supporting a separate proximity policy for the granular resource
  request groups created from the Neutron ports' resource_request. Nova
  will use the policy defined in the flavor extra_spec for the whole
  request, as today such a policy is global for an allocation candidate
  request.
* Handling the Neutron mechanism driver preference order in a weigher in
  the nova-scheduler.
* Interface attach with a port or network having a QoS minimum bandwidth
  policy rule, as interface_attach does not call the scheduler today.
  Nova will reject an interface_attach request if the resource request
  of the port (passed in, or created in the network that is passed in)
  is non-empty.
* Server create with a network having a QoS minimum bandwidth policy
  rule, as a port in this network is created by the nova-compute *after*
  the scheduling decision. This spec proposes to fail such a boot in the
  compute manager.
* QoS policy rule create or update on a bound port.
* QoS aware trunk subport create under a bound parent port.
* Baremetal ports having a QoS bandwidth policy rule, as Neutron does
  not own the networking devices on a baremetal compute node.

Scenarios
---------

This spec needs to consider multiple flows and scenarios, detailed in
the following sections.

Neutron agent first start
~~~~~~~~~~~~~~~~~~~~~~~~~

The Neutron agent running on a given compute host uses the existing
``host`` neutron.conf variable to find the compute RP related to its
host in Placement. See `Finding the compute RP`_ for details and
reasoning.
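A minimal sketch of this lookup, assuming the defaults described in
`Finding the compute RP`_ (the hostname ``compute-1`` is illustrative)::

    # neutron.conf on the compute host
    [DEFAULT]
    host = compute-1

The agent can then locate the matching compute RP by name in the
Placement API::

    GET /resource_providers?name=compute-1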
The Neutron agent creates the networking RPs under the compute RP with
the proper traits, then reports resource inventories based on the
discovered and / or configured resource inventory of the compute host.
See `QoS minimum bandwidth allocation in Placement API`_ for details.

Neutron agent restart
~~~~~~~~~~~~~~~~~~~~~

During restart, the Neutron agent ensures that the proper RP tree exists
in Placement with correct inventories and traits by creating or updating
the RP tree if necessary. The Neutron agent only modifies the
inventories and traits of the RPs that were created by the agent. Also,
Neutron only modifies the pieces that actually got added or deleted;
unmodified pieces are left in place (no delete and re-create).

Server create with pre-created Neutron ports having QoS policy rule
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The end user creates a Neutron port via the Neutron API and attaches a
QoS minimum bandwidth policy rule to it, either directly or indirectly
by attaching the rule to the network the port is created in. Then the
end user creates a server in Nova and passes the port UUID in the server
create request.

Nova fetches the port data from Neutron. This already happens in
create_pci_requests_for_sriov_ports() in the current code base. The port
contains the requested resources and required traits. See
`Resource request in the port`_.

The create_pci_requests_for_sriov_ports() call needs to be refactored
into a more generic call that not only generates PCI requests but also
collects the requested resources from the Neutron ports.

The nova-api stores the requested resources and required traits in a new
field of the RequestSpec object called requested_resources. The new
`requested_resources` field should not be persisted in the api database,
as it is computed data based on the resource requests from different
sources, in this case the Neutron ports, and the data in a port might
change outside of Nova.

The nova-scheduler uses this information from the RequestSpec to send an
allocation candidate request to Placement that contains the port related
resource requests besides the compute related resource requests. The
requested resources and required traits from each port will be
restricted to a single RP with a separate, numbered request group, as
defined in the `granular-resource-request`_ spec. This is necessary
because mixing the requested resources and required traits of different
ports (e.g. one OVS and one SRIOV port) in a single group would result
in an empty allocation candidate response, as no RP has both the OVS and
the SRIOV traits at the same time.

Alternatively, we could extend and use the requested_networks
(NetworkRequestList ovo) parameter of the build_instance code path to
store and communicate the resource needs coming from the Neutron ports.
Then the select_destinations() scheduler RPC call would need to be
extended with a new parameter holding the NetworkRequestList.

The `nova.scheduler.utils.resources_from_request_spec()` call needs to
be modified to use the newly introduced `requested_resources` field of
the RequestSpec object to generate the proper allocation candidate
request.

Later on, the resource request in the Neutron port API can be evolved to
support the same level of granularity that the Nova flavor resource
override functionality supports.
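As a purely illustrative sketch (the resource amounts, trait name and
group number are made up), the resulting request for a server with one
such port could look like::

    GET /allocation_candidates?
        resources=VCPU:1,MEMORY_MB:2048,DISK_GB:20
        &resources1=NET_BANDWIDTH_EGRESS_KILOBITS_PER_SECOND:1000
        &required1=CUSTOM_PHYSNET_PHYSNET0

Here the ``resources1`` / ``required1`` numbered group is generated by
Nova from the port's resource_request, using the first group number not
already taken by the flavor extra_spec.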
Then Placement returns allocation candidates. After additional filtering
and weighing in the nova-scheduler, the scheduler claims the resources
of the selected candidate in a single transaction in Placement. The
consumer_id of the created allocations is the instance_uuid. See
`The consumer of the port related resources`_.

When multiple ports having QoS policy rules towards the same physical
network are attached to the server (e.g. two VFs on the same PF), the
resulting allocation is the sum of the resource amounts of the
individual port requests.

Delete a server with ports having QoS policy rule
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

During normal delete, `local delete`_ and shelve_offload, Nova today
deletes the resource allocation in Placement where the consumer_id is
the instance_uuid. As this allocation includes the port related
resources, those are also cleaned up.

Detach_interface with a port having QoS policy rule
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

After the detach succeeds in Neutron and in the hypervisor, the
nova-compute needs to delete the allocation related to the detached port
in Placement. The rest of the server's allocation is not changed.

Server move operations (cold migrate, evacuate, resize, live migrate)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

During a move operation Nova makes an allocation on the destination host
with consumer_id == instance_uuid, while the allocation on the source
host is changed to have consumer_id == migration_uuid. These allocation
sets contain the port related allocations as well. When the move
operation succeeds, Nova deletes the allocation against the source host.
If the move operation is rolled back, Nova cleans up the allocation
against the destination host.

As the port related resource request is not persisted in the RequestSpec
object, Nova needs to re-calculate it from the ports' data before
calling the scheduler.

Move operations with the force host flag (evacuate, live-migrate) do not
call the scheduler. So, to make sure that every case is handled, we have
to go through every direct or indirect call of the
`reportclient.claim_resources()` function and ensure that the port
related resources are handled properly. Today we `blindly copy the
allocation from source host to destination host`_, simply using the
destination host as the RP. This will be a lot more complex here, as
there will be more than one RP to replace and Nova will have a hard time
figuring out which networking RP on the source host maps to which
networking RP on the destination host. A possible solution is to `send
the move requests through the scheduler`_ regardless of the force flag,
but skipping the scheduler filters.

Shelve_offload and unshelve
~~~~~~~~~~~~~~~~~~~~~~~~~~~

During shelve_offload, Nova deletes the resource allocations, including
the port related resources, as those have the same consumer_id, the
instance UUID. During unshelve, a new scheduling is done in the same way
as described in the server create case.
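To illustrate the two allocation sets described for the server move
operations above (the consumers, RP names and amounts are made up), the
allocations during a migration could look like::

    consumer: <migration_uuid>    # holds the source host resources
        source compute RP:  VCPU=1, MEMORY_MB=2048
        source NIC RP:      NET_BANDWIDTH_EGRESS_KILOBITS_PER_SECOND=1000

    consumer: <instance_uuid>     # holds the destination host resources
        dest compute RP:    VCPU=1, MEMORY_MB=2048
        dest NIC RP:        NET_BANDWIDTH_EGRESS_KILOBITS_PER_SECOND=1000

On success the allocation held by the migration (source host) is
deleted; on rollback the allocation held by the instance (destination
host) is cleaned up.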
Details
-------

Finding the compute RP
~~~~~~~~~~~~~~~~~~~~~~

Neutron already depends on the ``host`` conf variable being set to the
same identifier that Nova uses in the Neutron port binding. Nova uses
the hostname in the port binding, and if ``host`` is not defined in the
Neutron config then it defaults to the hostname as well. This way
Neutron and Nova are in sync today. The same mechanism (i.e. the
hostname) can be used by the Neutron agent to find the compute RP
created by Nova for the same compute host.

Having non fully qualified hostnames in a deployment can cause
ambiguity. For example, different cells might contain hosts with the
same hostname. This hostname ambiguity in a multi-cell deployment is
already a problem without the currently proposed feature, as Nova uses
the hostname as the compute RP name in Placement and the name field has
a unique constraint in the Placement db model. So, in an ambiguous
situation, the Nova compute services having non unique hostnames have
already failed to create RPs in Placement.

The ambiguity can be fixed by enforcing that hostnames are FQDNs.
However, as this problem is not specific to the currently proposed
feature, that fix is out of the scope of this spec. The
`override-compute-node-uuid`_ blueprint describes a possible solution.

The consumer of the port related resources
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This spec proposes to use the instance_uuid as the consumer_id for the
port related resources as well.

During server move operations Nova needs to handle two sets of
allocations for a single server (one for the source and one for the
destination host). If the consumer_id of the port related resources were
the port_id, then during move operations the two sets of allocations
could not be distinguished, especially in the case of resize to the same
host. Therefore the port_id is not a good consumer_id.

Another possibility would be to use a UUID from the port binding as the
consumer_id, but the port binding does not have a UUID today. Also,
today the port binding is created after the allocations are made, which
is too late.

In both cases, having multiple allocations for a single server on a
single host would make it complex to find every allocation for that
server, both for Nova and for a deployer using the Placement API.

Separating non QoS aware and QoS aware ports
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If QoS aware and non QoS aware ports are mixed on the same physical port
then the minimum bandwidth rule cannot be fulfilled. The separation can
be achieved on at least two levels:

* Separating compute hosts via host aggregates. The deployer can create
  two host aggregates in Nova, one for QoS aware servers and another for
  non QoS aware servers. This separation can be done without changing
  either Nova or Neutron. This is the proposed solution for the first
  version of this feature.
* Separating physical ports via traits. The Neutron agent can put
  traits, like `CUSTOM_GUARANTEED_BW_ONLY` and
  `CUSTOM_BEST_EFFORT_BW_ONLY`, on the network RPs to indicate which
  physical port belongs to which group. Neutron can offer this
  configurability via neutron.conf. Then Neutron can add the
  `CUSTOM_GUARANTEED_BW_ONLY` trait to the resource request of a port
  that is QoS aware and add the `CUSTOM_BEST_EFFORT_BW_ONLY` trait
  otherwise. This solution would allow better granularity, as a server
  could request guaranteed bandwidth on its data port while accepting
  best effort connectivity on its control port. This solution needs
  additional work in Neutron but no additional work in Nova. However, it
  would also mean that ports without QoS policy rules would have at
  least a trait request (`CUSTOM_BEST_EFFORT_BW_ONLY`), and that would
  cause scheduling problems with a port created by the nova-compute.
  Therefore this option can only be supported
  `after nova port create is moved to the conductor`_.
* If we use \*_ONLY traits then we can never combine them, though that
  would be desirable. For example, it makes perfect sense to guarantee
  5 gigabits of a 10 gigabit card to somebody and let the rest be used
  on a best effort basis. To allow this, we only need to turn the logic
  around and use the traits CUSTOM_GUARANTEED_BW and
  CUSTOM_BEST_EFFORT_BW. If the admin still wants to keep guaranteed and
  best effort traffic fully separated, then they never put both traits
  on the same RP, but one can mix them if one wants to. Even the
  possible starvation of best effort traffic (next to guaranteed
  traffic) could easily be addressed by reserving some of the bandwidth
  inventory.

Alternatives
------------

Alternatives are discussed in their respective subsections of this spec.


Data model impact
-----------------

Two new standard resource classes will be defined to represent bandwidth
in each direction, named `NET_BANDWIDTH_INGRESS_KILOBITS_PER_SECOND` and
`NET_BANDWIDTH_EGRESS_KILOBITS_PER_SECOND`. The kbps unit is selected
because the Neutron API already uses this unit in the
`QoS minimum bandwidth rule`_ API and we would like to keep the units in
sync.

A new `requested_resources` field is added to the RequestSpec versioned
object, with the type ListOfObjectsField('RequestGroup'), to store the
resource and trait requests coming from the Neutron ports. This field
will not be persisted in the database.

The `QoS minimum bandwidth allocation in Placement API`_ Neutron spec
will propose the modeling of the networking RP subtree in Placement.
Nova will not depend on the exact structure of that model, as Neutron
will provide the port's resource request in an opaque way and Nova will
only need to blindly include that resource request in the
``GET /allocation_candidates`` request.

Resource request in the port
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Neutron needs to express the port's resource needs in the port API, in a
way similar to how a resource request can be made via the flavor
extra_spec. For now we assume that a single port requests resources from
a single RP. Therefore Nova will map each port's resource request to a
single numbered resource request group, as defined in the
`granular-resource-request`_ spec. That spec requires that the names of
the numbered resource groups have the form ``resources<integer>``. Nova
will map a port's resource_request to the first unused numbered group in
the allocation candidate request. Neutron does not know which ports will
be used together in a server create request, nor which numbered groups
have already been used by the flavor extra_spec, therefore Neutron
cannot assign unique integer ids to the resource groups of these ports.

From an implementation perspective this means that Nova will create one
RequestGroup instance for each Neutron port, based on the port's
resource_request, and insert it at the end of the list in
`RequestSpec.requested_resources`.

In the case when the Neutron multi-provider extension is used and a
logical network maps to more than one physnet, the port's resource
request will require that the selected network RP has one of the physnet
traits the network maps to. This any-traits type of request is not
supported by Placement today, but it can be implemented similarly to the
member_of query param used for aggregate selection in Placement. This
will be proposed in the separate
`any-traits-in-allocation_candidates-query`_ spec.
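Purely as an illustration of the intended semantics (the actual syntax
will be defined in that separate spec), such a request could mirror the
existing ``member_of=in:...`` form::

    GET /allocation_candidates?
        resources1=NET_BANDWIDTH_EGRESS_KILOBITS_PER_SECOND:1000
        &required1=in:CUSTOM_PHYSNET_PHYSNET0,CUSTOM_PHYSNET_PHYSNET1

meaning that the RP fulfilling the numbered group must have at least one
of the two physnet traits.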
Mapping between physical resource consumption and claimed resources
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Neutron must ensure that the resources allocated in Placement for a port
are the same as the resources consumed by that port from the physical
infrastructure. To be able to do that, Neutron needs to know the mapping
between a port's resource request and the specific RP (or RPs) in the
allocation record of the server that fulfill the request.

In the current scope we do not try to solve the whole problem of mapping
resource request groups to allocation subsets. Instead, Nova will send
the whole allocation of the server to Neutron in the port binding of
each port of the given server and let Neutron try to do the mapping.

REST API impact
---------------

The Neutron REST API impact is discussed in the separate
`QoS minimum bandwidth allocation in Placement API`_ Neutron spec.

The Placement REST API needs to be extended to support querying
allocation candidates with an RP that has at least one of the traits
from a list of requested traits. This feature will be described in the
separate `any-traits-in-allocation_candidates-query`_ spec.

This feature also depends on the `granular-resource-request`_ and
`nested-resource-providers`_ features, which impact the Placement REST
API.

Security impact
---------------

None

Notifications impact
--------------------

None

Other end user impact
---------------------

None

Performance Impact
------------------

* The Placement API will be used by Neutron to create RPs, and the
  compute RP tree will grow in size.

* Nova will send more complex allocation candidate requests to
  Placement, as they will include the port related resource requests as
  well.

As Placement does not seem to be a bottleneck today, we do not foresee
performance degradation due to the above changes.

Other deployer impact
---------------------

This feature impacts multiple modules and creates new dependencies
between Nova, Neutron and Placement.

Also, the deployer should be aware that with this feature server create
and move operations can fail due to bandwidth limits managed by Neutron.


Developer impact
----------------

None

Upgrade impact
--------------

Servers can exist today with SRIOV ports having a QoS minimum bandwidth
policy rule, for which the resource allocation is not enforced in
Placement during scheduling. Upgrading to an OpenStack version that
implements this feature makes it possible to change the rule in Neutron
to be Placement aware (i.e. to request resources) and then (live)
migrate the servers; during the selection of the migration target the
minimum bandwidth rule will be enforced by the scheduler. Tools can also
be provided that search for existing instances and try to create the
minimum bandwidth allocation in place. This way the number of necessary
migrations can be limited.
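As a purely hypothetical sketch of such tooling, in the spirit of the
existing ``nova-manage placement heal_allocations`` command (the exact
tool and its options are not defined by this spec)::

    # hypothetical invocation healing the bandwidth allocation of one
    # instance in place
    nova-manage placement heal_allocations --instance <instance_uuid>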
The end user will see a behavior change in the Nova API after such an
upgrade:

* Booting a server with a network that has a QoS minimum bandwidth
  policy rule requesting bandwidth resources will fail. The current
  Neutron feature proposal introduces the possibility of a QoS policy
  rule requesting resources, but in the first iteration Nova will only
  support such a rule on a pre-created port.
* Attaching a port or a network having a QoS minimum bandwidth policy
  rule requesting bandwidth resources to a running server will fail. The
  current Neutron feature proposal introduces the possibility of a QoS
  policy rule requesting resources, but in the first iteration Nova will
  not support such a rule for interface_attach.

The new QoS rule API extension and the new port API extension in Neutron
will be marked experimental until the above two limitations are
resolved.

Implementation
==============

Assignee(s)
-----------

Primary assignee:

  * balazs-gibizer (Balazs Gibizer)

Other contributors:

  * xuhj (Alex Xu)
  * minsel (Miguel Lavalle)
  * bence-romsics (Bence Romsics)
  * lajos-katona (Lajos Katona)

Work Items
----------

This spec does not list work items for the Neutron impact.

* Make RequestGroup an ovo and add the new `requested_resources` field
  to the RequestSpec. Then refactor `resources_from_request_spec()` to
  use the new field.

* Implement `any-traits-in-allocation_candidates-query`_ and
  `mixing-required-traits-with-any-traits`_ support in Placement. This
  work can be done in parallel with the below work items, as the
  any-traits type of query is only needed for a small subset of the use
  cases.

* Read the resource_request from the Neutron port in the nova-api and
  store the requests in the RequestSpec object.

* Include the port related resources in the allocation candidate request
  in the nova-scheduler and nova-conductor, and claim the port related
  resources based on the selected candidate.

* Send the server's whole allocation to Neutron during port binding.

* Ensure that server move operations with the force flag handle port
  resources correctly, by sending such operations through the scheduler.

* Delete the port related allocations from Placement after a successful
  interface detach operation.

* Reject an interface_attach request that contains a port or a network
  having a QoS policy rule attached that requests resources.

* Check in the nova-compute whether a port created by the nova-compute
  during server boot has a non-empty resource_request in the Neutron
  API, and fail the boot if it has one.


Dependencies
============

* `any-traits-in-allocation_candidates-query`_ and
  `mixing-required-traits-with-any-traits`_ to support multi-provider
  networks. While these Placement enhancements are not in place, this
  feature will only support networks with a single network segment
  having a physnet defined.

* `nested-resource-providers`_ to allow modeling the networking RPs.

* `granular-resource-request`_ to allow requesting each port's resources
  from a single RP.

* `QoS minimum bandwidth allocation in Placement API`_ for the Neutron
  impacts.

Testing
=======

Tempest and functional tests will be added to ensure that the server
create operation, server move operations, shelve_offload, unshelve and
interface detach work with QoS aware ports and that the resulting
resource allocations are correct.


Documentation Impact
====================

* User documentation about how to use QoS aware ports.


References
==========

* `nested-resource-providers`_ feature in Nova
* `granular-resource-request`_ feature in Nova
* `QoS minimum bandwidth allocation in Placement API`_ feature in
  Neutron
* `override-compute-node-uuid`_ proposal to avoid hostname ambiguity


.. _`nested-resource-providers`: https://review.openstack.org/556873
.. _`granular-resource-request`: https://specs.openstack.org/openstack/nova-specs/specs/queens/approved/granular-resource-requests.html
.. _`QoS minimum bandwidth allocation in Placement API`: https://review.openstack.org/#/c/508149
.. _`override-compute-node-uuid`: https://blueprints.launchpad.net/nova/+spec/override-compute-node-uuid
.. _`vnic_types are defined in the Neutron API`: https://developer.openstack.org/api-ref/network/v2/#show-port-details
.. _`blindly copy the allocation from source host to destination host`: https://github.com/openstack/nova/blob/9273b082026080122d104762ec04591c69f75a44/nova/scheduler/utils.py#L372
.. _`QoS minimum bandwidth rule`: https://docs.openstack.org/neutron/latest/admin/config-qos.html
.. _`any-traits-in-allocation_candidates-query`: https://blueprints.launchpad.net/nova/+spec/any-traits-in-allocation-candidates-query
.. _`mixing-required-traits-with-any-traits`: https://blueprints.launchpad.net/nova/+spec/mixing-required-traits-with-any-traits
.. _`local delete`: https://github.com/openstack/nova/blob/4b0d0ea9f18139d58103a520a6a4e9119e19a4de/nova/compute/api.py#L2023
.. _`send the move requests through the scheduler`: https://github.com/openstack/nova/blob/9273b082026080122d104762ec04591c69f75a44/nova/scheduler/utils.py#L339
.. _`after nova port create is moved to the conductor`: https://specs.openstack.org/openstack/nova-specs/specs/pike/approved/prep-for-network-aware-scheduling-pike.html

History
=======

.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - Queens
     - Introduced
   * - Rocky
     - Reworked after several discussions
   * - Stein
     - Re-proposed as the implementation was not finished in Rocky