Merge "Specification for Cyborg/Nova interaction for scheduling."

Commit 4ec459afb2 by Zuul, 2018-06-11 03:45:04 +00:00, committed by Gerrit Code Review.
1 file changed, 486 insertions(+), 0 deletions(-)

..
   This work is licensed under a Creative Commons Attribution 3.0 Unported
   License.

   http://creativecommons.org/licenses/by/3.0/legalcode
==========================================
Cyborg-Nova Interaction for Scheduling
==========================================
https://blueprints.launchpad.net/cyborg/+spec/cyborg-nova-interaction
Cyborg provides a general management framework for accelerators, such
as FPGAs, GPUs, etc. For scheduling an instance that needs accelerators,
Cyborg needs to work with Nova on three levels:
* Representation and Discovery: Cyborg shall represent accelerators as
resources in Placement. When a device is discovered, Cyborg updates
resource providers, inventories, traits, etc. in Placement.
* Instance placement/scheduling: Cyborg may provide a filter and/or weigher
that limits or prioritizes hosts based on available accelerator resources,
but it is expected that Placement itself can handle most requirements.
* Attaching accelerators to instances: On the compute node, Cyborg shall
define a workflow based on interacting with Nova through a new os-acc
library (similar to os-vif and os-brick).
This spec addresses the first two aspects. There is another spec to
address the attachment of accelerators to instances [#os-acc]_.
Cyborg also needs to handle some aspects for FPGAs without involving
Nova, specifically FPGA programming and bitstream management. They
will be covered in other specs. This spec is independent of those specs.
This spec is common to all accelerators, including GPUs, High Precision
Time Synchronization (HPTS) cards, etc. Since FPGAs have more aspects to
consider than other devices, some sections focus on FPGA-specific factors,
and those aspects are called out explicitly.
Smart NICs based on FPGAs fall into two categories: those which expose
the FPGA explicitly to the host, and those that do not. Cyborg's scope
includes the former. This spec includes such devices, though the
Cyborg-Neutron interaction is out of scope.
The scope of this spec is the Rocky release.
Terminology
===========
* Accelerator: The unit that can be assigned to an instance for
offloading specific functionality. For non-FPGA devices, it is either the
device itself or a virtualized version of it (e.g. vGPUs). For FPGAs, an
accelerator is either the entire device, a region within the device or a
function.
* Bitstream: An FPGA image, usually a binary file, possibly with
vendor-specific metadata. A bitstream may implement one or more functions.
* Function: A specific functionality, such as matrix multiplication or video
transcoding, usually represented as a string or UUID. This term may be used
with multi-function devices, including FPGAs and other fixed function
hardware like Intel QuickAssist.
* Region: A part of the FPGA which can be programmed without disrupting
other parts of that FPGA. If an FPGA does not support Partial
Reconfiguration, the entire device constitutes one region. A region
may implement one or more functions.
Here is an example diagram for an FPGA with multiple regions, and multiple
functions in a region::

          PCI A   PCI B
            |       |
    +-------|-------|--------------------+
    |       |       |                    |
    |  +----|-------|----+  +--------+   |
    |  | +--|--+ +--|--+ |  |        |   |
    |  | | Fn A| | Fn B| |  |        |   |
    |  | +-----+ +-----+ |  |        |   |
    |  +-----------------+  +--------+   |
    |      Region 1         Region 2     |
    |                                    |
    +------------------------------------+

Problem description
===================
Cyborg's representation and handling of accelerators needs to be consistent
with Nova's Placement API. Specifically, they must be modeled in terms of
Resource Providers (RPs), Resource Classes (RCs) and Traits.
Though PCI Express is entrenched in the data center, some accelerators
may be exposed to the host via some other protocol. Even with PCI, the
connections between accelerator components and PCI functions
may vary across devices. Accordingly, Cyborg should not represent
accelerators as PCI functions.
For instances that need accelerators, we need to define a way for Cyborg
to be included seamlessly in the Nova scheduling workflow.
Use Cases
---------
We need to satisfy the following use cases for the tenant role:
* Device as a Service (DaaS): The flavor asks for a device.
* FPGA variation: The flavor asks for a device to which specific
bitstream(s) can be applied. There are three variations, the first
two of which delegate bitstream programming to Cyborg for secure
programming:
* Request-time Programming: The flavor specifies a bitstream. (Cyborg
applies the bitstream before instance bring-up. This is similar to the
AWS flow.)
* Run-time Programming: The instance may request one or more
bitstreams dynamically. (Cyborg receives the request and does
the programming.)
* Direct Programming: The instance directly programs the FPGA
region assigned to it, without delegating to Cyborg. The
security questions that this raises need to be addressed in
the future. (This is listed only for completeness; it will not
be addressed in Rocky, or even in future releases, until the
security concerns are fully resolved.)
* Accelerated Function as a Service (AFaaS): The flavor asks for a
function (e.g. ipsec) attached to the instance. The operator may
satisfy this use case in two ways:
* Pre-programmed: Do not allow orchestration to modify any function,
for any of these reasons:
* Only fixed function hardware is available. (E.g. ASICs.)
* Operational simplicity.
* Assure tenants of programming security, by doing all programming offline
through some audited process.
* For FPGAs, allow orchestration to program as needed, to maximize
flexibility and availability of resources.
An operator must be able to provide both Device as a Service and Accelerated
Function as a Service in the same cluster, to serve all
kinds of users: those who are device-agnostic, those using third-party
bitstreams, and those using their own bitstreams (including developers).
The goal for Cyborg is to provide the mechanisms to enable all these use
cases.
In this spec, we do not consider bitstream developer or device developer
roles. Also, we assume that each accelerator device is dedicated to a
compute node, rather than shared among several nodes.
Proposed change
===============
Representation
--------------
* Cyborg will represent a generic accelerator for a device type as a
custom Resource Class (RC) for that type, of the form
CUSTOM_ACCELERATOR_<device-type>. E.g. CUSTOM_ACCELERATOR_GPU,
CUSTOM_ACCELERATOR_FPGA, etc. This helps in defining separate quotas
for different device types. (A sketch of publishing these resources to
Placement follows this list.)
* Device-local memory is the memory available to the device alone,
usually in the form of DDR, QDR or High Bandwidth Memory in the
PCIe board along with the device. It can also be represented as an
RC of the form CUSTOM_ACCELERATOR_MEMORY_<memory-type>. E.g.
CUSTOM_ACCELERATOR_MEMORY_DDR. A single PCIe board may have more
than one type of memory.
* In addition, each device/region is represented as a Resource Provider
(RP). This enables traits to be applied to it and other RPs/RCs to
be contained within it. So, a device RP provides one or more instances
of that device type's RC. This depends on nested RP support in
Nova [#nRP]_.
* For FPGAs, both the device and the regions within it will be
represented as RPs. This allows the hierarchy within an FPGA
to be naturally modelled as an RP hierarchy.
* Using nested RPs is the preferred way. But, until Nova
supports nested RPs, Cyborg shall associate the
RCs and traits (described below) with the compute node RPs. This
requires that all devices on a single host share the same
traits. If nested RP support becomes usable after the Rocky release,
the operator needs to handle the upgrade as below:
* Terminate all instances using accelerators.
* Remove all Cyborg traits and inventory on all compute node RPs,
perhaps by running a script.
* Perform the Cyborg upgrade. Post-upgrade, the new agent/driver(s)
will create RPs for the devices and publish the traits
and inventory.
* Cyborg will associate a Device Type trait with each device, of the
form CUSTOM_<device-type>_<vendor>. E.g. CUSTOM_GPU_AMD or
CUSTOM_FPGA_XILINX. This trait is intended to help match the
software drivers/libraries in the instance image. It is meant to
be used in a flavor when a single driver/library in the instance
image can handle most or all device types from a vendor.
* For FPGAs, this trait and the others below will also be applied to
the region RPs, which are children of the device RPs.
* Cyborg will associate a Device Family trait with each device as
needed, of the form CUSTOM_<device-type>_<vendor>_<family>.
E.g. CUSTOM_FPGA_INTEL_ARRIA10.
This is not a product name, but the name of a device family, used to
match software in the instance image with the device family. This is
a refinement of the Device Type Trait. It is meant to be used in
a flavor when there are different drivers/libraries for different
device families. Since it may be tough to forecast whether a new
device family will need a new driver/library, it may make sense to
associate both these traits with the same device RP.
* For FPGAs, Cyborg will associate a region type trait with each region
(or with the FPGA itself if there is no Partial Reconfiguration
support), of the form CUSTOM_FPGA_REGION_<vendor>_<uuid>.
E.g. CUSTOM_FPGA_REGION_INTEL_<uuid>. This is needed for Device as a
Service with FPGAs.
* For FPGAs, Cyborg may associate a function type trait with a region
when the region gets programmed, of the form
CUSTOM_FPGA_FUNCTION_<vendor>_<uuid>. E.g.
CUSTOM_FPGA_FUNCTION_INTEL_<gzip-uuid>.
This is needed for the AFaaS use case, and is updated when Cyborg
reprograms a region as part of an AFaaS request.
* For FPGAs, Cyborg should associate a CUSTOM_PROGRAMMABLE trait with
every region. This is needed to lay the groundwork for
multi-function accelerators in the future. Flavors should ask for
this trait, except in the pre-programmed case.
* For FPGAs, since they may implement a wide variety of functionality,
we may also attach a Functionality Trait.
E.g. CUSTOM_FPGA_COMPUTE, CUSTOM_FPGA_NETWORK, CUSTOM_FPGA_STORAGE.
* The Cyborg agent needs to get enough information from the Cyborg driver
to create the RPs, RCs and traits. In particular, it needs to get the
device type string, region IDs and function IDs from the driver. This
requires the driver/agent interface to be enhanced [#drv-api]_.
* The modeling in Placement represents generic virtual accelerators as
resource classes, and devices/regions as RPs. This is PCI-agnostic.
However, many FPGA implementations use PCI Express in general, and
SR-IOV in particular. In those cases, it is expected that Cyborg will
pass PCI VFs to instances via PCI Passthrough, and retain the PCI PF
in the host for management.
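
To make the representation above concrete, below is a minimal, illustrative
sketch of how the Cyborg agent could publish an FPGA region to Placement over
its REST API. This is not the agreed implementation: the RP name, the
region-type trait and the keystoneauth credentials are placeholders, and real
code would read the resource provider generation back from Placement rather
than hard-coding it::

  import uuid

  from keystoneauth1 import adapter, session
  from keystoneauth1.identity import v3

  COMPUTE_NODE_RP_UUID = '<compute-node-rp-uuid>'         # placeholder
  REGION_TYPE_TRAIT = 'CUSTOM_FPGA_REGION_INTEL_<uuid>'   # placeholder

  auth = v3.Password(auth_url='http://controller/identity/v3',
                     username='cyborg', password='secret',
                     project_name='service',
                     user_domain_id='default', project_domain_id='default')
  placement = adapter.Adapter(session=session.Session(auth=auth),
                              service_type='placement',
                              default_microversion='1.17')

  # Ensure the custom resource class exists (idempotent PUT).
  placement.put('/resource_classes/CUSTOM_ACCELERATOR_FPGA')

  # Create the region RP as a child of the compute node RP (nested RPs).
  region_rp = str(uuid.uuid4())
  placement.post('/resource_providers',
                 json={'uuid': region_rp,
                       'name': 'node1_fpga0_region0',
                       'parent_provider_uuid': COMPUTE_NODE_RP_UUID})

  # Publish the region's inventory: one generic FPGA accelerator ...
  placement.put('/resource_providers/%s/inventories' % region_rp,
                json={'resource_provider_generation': 0,
                      'inventories': {
                          'CUSTOM_ACCELERATOR_FPGA': {'total': 1}}})

  # ... and the traits proposed in this spec.
  placement.put('/resource_providers/%s/traits' % region_rp,
                json={'resource_provider_generation': 1,
                      'traits': ['CUSTOM_FPGA_INTEL',
                                 'CUSTOM_FPGA_INTEL_ARRIA10',
                                 REGION_TYPE_TRAIT,
                                 'CUSTOM_PROGRAMMABLE']})
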
Flavors
-------
For the sake of illustrating how the device representation in Nova
can be used, and for completeness, we now show how to define flavors
for various use cases. Please see [#flavor]_ for more details. A scripted
example of creating one of these flavors follows the list below.
* A flavor that needs device access always asks for one or more instances
of 'resources:CUSTOM_ACCELERATOR_<device-type>'. In addition, it
needs to specify the right traits.
* Example flavor for DaaS:
| ``resources:CUSTOM_ACCELERATOR_HPTS=1``
| ``trait:CUSTOM_HPTS_ZTE=required``
NOTE: For FPGAs, the flavor should also include the CUSTOM_PROGRAMMABLE trait.
* Example flavor for AFaaS Pre-programmed:
| ``resources:CUSTOM_ACCELERATOR_FPGA=1``
| ``trait:CUSTOM_FPGA_INTEL_ARRIA10=required``
| ``trait:CUSTOM_FPGA_FUNCTION_INTEL_<gzip-uuid>=required``
* Example flavor for AFaaS Orchestration-Programmed:
| ``resources:CUSTOM_ACCELERATOR_FPGA=1``
| ``trait:CUSTOM_FPGA_INTEL_ARRIA10=required``
| ``trait:CUSTOM_PROGRAMMABLE=required``
| ``function:CUSTOM_FPGA_FUNCTION_INTEL_<gzip-uuid>=required``
(Not interpreted by Nova.)
* NOTE: When Nova supports preferred traits, we can use that instead
of the 'function' keyword in the extra specs.
* NOTE: For Cyborg to fetch the bitstream for this function, it
is assumed that the operator has configured the function UUID
as a property of the bitstream image in Glance.
* Another example flavor for AFaaS Orchestration-Programmed which
refers to a function by name instead of UUID for ease of use:
| ``resources:CUSTOM_ACCELERATOR_FPGA=1``
| ``trait:CUSTOM_FPGA_INTEL_ARRIA10=required``
| ``trait:CUSTOM_PROGRAMMABLE=required``
| ``function_name:<string>=required``
(Not interpreted by Nova.)
* NOTE: This assumes the operator has configured the function name
as a property of the bitstream image in Glance. The FPGA
hardware is not expected to expose function names, and so
Cyborg will not represent function names as traits.
* A flavor may ask for other RCs, such as local memory.
* A flavor may ask for multiple accelerators, using the granular resource
request syntax. Cyborg can tie function and bitstream fields in
the extra_specs to resources/traits using an extension of the granular
resource request syntax (see References) which is not interpreted by Nova.
| ``resourcesN:CUSTOM_ACCELERATOR_FPGA=1``
| ``traitN:CUSTOM_FPGA_INTEL_ARRIA10=required``
| ``othersN:function:CUSTOM_FPGA_FUNCTION_INTEL_<gzip-uuid>=required``
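
As a usage illustration, here is a sketch (assuming python-novaclient and an
already-authenticated keystoneauth session ``sess``; the flavor name and
sizing are arbitrary) of how an operator could create the AFaaS
orchestration-programmed flavor shown above::

  from novaclient import client

  # 'sess' is a keystoneauth1 Session, as in the earlier Placement sketch.
  nova = client.Client('2.61', session=sess)

  flavor = nova.flavors.create(name='fpga.gzip.small',
                               ram=4096, vcpus=4, disk=20)
  flavor.set_keys({
      'resources:CUSTOM_ACCELERATOR_FPGA': '1',
      'trait:CUSTOM_FPGA_INTEL_ARRIA10': 'required',
      'trait:CUSTOM_PROGRAMMABLE': 'required',
      # Ignored by Nova; consumed by Cyborg on the compute node.
      'function:CUSTOM_FPGA_FUNCTION_INTEL_<gzip-uuid>': 'required',
  })
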
Scheduling workflow
--------------------
We now look at the scheduling flow when each device implements only
one function. Devices with multiple functions are outside the scope for now.
* A request spec with a flavor comes to Nova conductor/scheduler.
* Placement API returns the list of RPs which contain the requested
resources with matching traits. (With nested RP support, the returned
RPs are device/region RPs. Without it, they are compute node RPs.)
* FPGA-specific: For the AFaaS orchestration-programmed use case, Placement
will return matching devices, but they may not have the requested
function. So, Cyborg may provide a weigher which checks the
allocation candidates to see which ones have the required function trait
and ranks them higher (a sketch follows this list). This requires no
change to the Cyborg DB.
* The request_spec goes to the compute node (ignoring Cells for now).
NOTE: When one device/region implements multiple functions and
orchestration-driven programming is desired, the inventory of that
device needs to be adjusted.
This can be addressed later and is not a priority for Rocky release.
See References.
* Nova compute calls os-acc/Cyborg [#os-acc]_.
* FPGA-specific: If the request spec asks for a function X in extra specs,
but X is not present in the selected region RP, Cyborg should program
that region.
* Cyborg should associate RPs/RCs and PFs/VFs with Deployables in its
internal DB. It can use such mappings to associate the requested resource
(device/function) with an attach handle (such as a PCI function) that
can be used to attach the resource to an instance.
NOTE: This flow is PCI-agnostic: no PCI whitelists are involved.
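
Below is a minimal sketch of the optional weigher mentioned above, assuming
Nova's in-tree ``BaseHostWeigher`` plug point. How the weigher learns the
traits of the providers backing a host is deliberately left as a stub
(``_provider_traits``), since that interface is not defined by this spec::

  from nova.scheduler import weights


  class FunctionTraitWeigher(weights.BaseHostWeigher):
      """Prefer hosts whose regions already expose the requested function."""

      def _provider_traits(self, host_state):
          # Stub: a real implementation would consult the allocation
          # candidates / provider summaries for this host.
          return set(getattr(host_state, 'traits', None) or [])

      def _weigh_object(self, host_state, weight_properties):
          extra_specs = weight_properties.flavor.extra_specs
          wanted = [k.split(':', 1)[1] for k in extra_specs
                    if k.startswith('function:')]
          if not wanted:
              return 0.0
          have = self._provider_traits(host_state)
          # Hosts that already carry every requested function trait rank
          # higher, so reprogramming is avoided whenever possible.
          return 1.0 if all(trait in have for trait in wanted) else 0.0
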
Handling Multiple Functions Per Device
--------------------------------------
Devices and regions that implement multiple functions are out of scope
for Rocky; see the multi-function scheduling etherpad [#multifn]_ for the
ongoing discussion.
Alternatives
------------
N/A
Data model impact
-----------------
The following changes are needed in Cyborg.
* Do not publish PCI functions as resources in Nova. Instead, publish
RC/RP info to Nova, and keep RP-PCI mapping internally.
* Cyborg should associate RPs/RCs and PFs/VFs with Deployables in its
internal DB.
* Driver/agent interface needs to report device/region types so that
RCs can be created.
* Deployables table should track which RP corresponds to each Deployable.
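
As an illustration of the last item, the following is a possible (not yet
agreed) Alembic migration letting each Deployable record the resource
provider it is published as; the column name ``rp_uuid``, its nullability
and the revision identifiers are assumptions of this sketch::

  """Add rp_uuid to deployables (illustrative sketch only)."""

  from alembic import op
  import sqlalchemy as sa

  # Revision identifiers used by Alembic (placeholders).
  revision = '<new-revision-id>'
  down_revision = '<previous-revision-id>'


  def upgrade():
      # Nullable, so existing Deployable rows stay valid until the agent
      # re-publishes them to Placement and fills in the mapping.
      op.add_column('deployables',
                    sa.Column('rp_uuid', sa.String(length=36), nullable=True))


  def downgrade():
      op.drop_column('deployables', 'rp_uuid')
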
REST API impact
---------------
None
Security impact
---------------
This change allows tenants to initiate FPGA bitstream programming. To mitigate
the security impact, it is proposed that only 2 methods are offered for
programming (flavor asks for a bitstream, or the running instance asks for
specific bitstreams) and both are handled through Cyborg. There is no direct
access from an instance to an FPGA.
Notifications impact
--------------------
None
Other end user impact
---------------------
None
Performance Impact
------------------
None
Other deployer impact
---------------------
None
Developer impact
----------------
None
Implementation
==============
Assignee(s)
-----------
None
Work Items
----------
* Decide specific changes needed in Cyborg conductor, db, agent and drivers.
Dependencies
============
* `Nested Resource Providers
<http://specs.openstack.org/openstack/nova-specs/specs/rocky/approved/nested-resource-providers-allocation-candidates.html>`_
* `Nova Granular Requests
<https://specs.openstack.org/openstack/nova-specs/specs/rocky/approved/granular-resource-requests.html>`_
NOTE: the granular requests feature is needed to define a flavor that requests
non-identical accelerators, but is not needed for Cyborg development in Rocky.
Testing
=======
For each vendor driver supported in this release, we need to integrate the
corresponding FPGA type(s) in the CI infrastructure.
Documentation Impact
====================
None
References
==========
.. [#os-acc] `Specification for Compute Node <https://review.openstack.org/#/c/566798/>`_
.. [#nRP] `Nested RPs in Rocky <http://specs.openstack.org/openstack/nova-specs/specs/rocky/approved/nested-resource-providers-allocation-candidates.html>`_
.. [#drv-api] `Specification for Cyborg Agent-Driver API <https://review.openstack.org/#/c/561849/>`_
.. [#flavor] `Custom Resource Classes in Flavors <https://specs.openstack.org/openstack/nova-specs/specs/pike/implemented/custom-resource-classes-in-flavors.html>`_
.. [#qspec] `Cyborg Nova Queens Spec <https://github.com/openstack/cyborg/blob/master/doc/specs/queens/approved/cyborg-nova-interaction.rst>`_
.. [#ptg] `Rocky PTG Etherpad for Cyborg Nova Interaction <https://etherpad.openstack.org/p/cyborg-ptg-rocky-nova-cyborg-interaction>`_
.. [#multifn] `Detailed Cyborg/Nova scheduling <https://etherpad.openstack.org/p/Cyborg-Nova-Multifunction>`_
.. [#mails] `Openstack-dev email discussion <http://lists.openstack.org/pipermail/openstack-dev/2018-April/128951.html>`_
History
=======
This optional section is intended to be updated each time the spec changes,
to describe the new design, API changes or database schema updates. It is
useful to let readers know what happened over time.
.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - Rocky
     - Introduced