Cyborg Nova interaction take 2

Dramatically revised from this

https://review.openstack.org/#/c/448228/3/specs/proposal/cyborg-nova-interaction.rst

My goal with this spec is to have it cover only how we interact with Nova,
without details of other components; otherwise we just end up with a
monolithic spec for everything.

I plan to expand this into exact API calls and a detailed workflow, especially
for the new API call we will have to add to Nova to register whitelisted
devices live. That being said, we may need to reboot the machine to change the
grub config anyway, so maybe we should be looking at how to make that work
first.

Change-Id: I22037109b613d7b33d7c620b78493ec7e96e735e

..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

=======================
Cyborg-Nova interaction
=======================

https://blueprints.launchpad.net/cyborg/+spec/cyborg-nova-interaction

Cyborg, as a service for managing accelerators of any kind, needs to cooperate
with Nova on two planes: Cyborg should be able to inform Nova about its
resources through the placement API [1], so that the scheduler can translate a
user's request for particular functionality into the assignment of a specific
resource from a resource provider which possesses an accelerator; and Cyborg
should be able to tell Nova compute how to attach a particular resource to a
VM.

In a nutshell, this blueprint defines how information will be exchanged between
Nova and Cyborg.

Problem description
===================

Currently in OpenStack the use of non-standard accelerator hardware is
supported in the sense that features exist across many of the core services
that allow these resources to be allocated, passed through, and eventually
used.

What remains a challenge is the lack of an integrated workflow: there is no way
to configure many of the accelerator features without significant manual effort
and service disruptions, which goes against the goal of an easy, stable, and
flexible cloud.

Cyborg exists to bring these disjoint efforts together into a more standard
workflow. While many components of this workflow already exist, some do not and
will need to be written expressly for this goal.

Use Cases
---------

The possible use cases were briefly described in the backlog Nova spec [2]. Two
main groups of use cases for accelerators can be distinguished:

* An accelerator may be attached to a VM whose workload demands acceleration.
  That can be achieved by passing through a whole PCI device, a particular host
  device from the ``/dev/`` filesystem, a Virtual Function, etc.
* An accelerator may be utilized by the infrastructure, for example to
  accelerate virtual switches (e.g. Open vSwitch), and is then consumed via the
  appropriate service (such as Neutron).

Proposed Workflow
=================

Using a method not relevant to this proposal, the Cyborg Agent inspects the
hardware and finds accelerators that it is interested in setting up for use.
These accelerators are registered in the Cyborg database, and the Cyborg
Conductor is then responsible for using the Nova placement API to create the
corresponding traits and resources.
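
As an illustration of how that registration could look, below is a minimal
sketch (not existing Cyborg code) of a conductor reporting one FPGA to the
placement API over plain REST. The endpoint, the token handling, and the
``CUSTOM_FPGA`` resource class and ``CUSTOM_FPGA_INTEL_ARRIA10`` trait names
are assumptions made only for this example.

.. code-block:: python

    # Hypothetical sketch: report one accelerator to the placement API.
    import uuid

    import requests

    PLACEMENT = 'http://controller:8778'           # assumed placement endpoint
    HEADERS = {
        'X-Auth-Token': '<keystone token>',        # keystoneauth would supply this
        'OpenStack-API-Version': 'placement 1.14', # nested providers + traits
    }


    def register_fpga(compute_rp_uuid, pci_addr):
        """Create a child resource provider for one FPGA and describe it."""
        rp_uuid = str(uuid.uuid4())

        # Make sure the custom resource class and trait exist (both PUTs are
        # idempotent in the placement API).
        requests.put(PLACEMENT + '/resource_classes/CUSTOM_FPGA',
                     headers=HEADERS)
        requests.put(PLACEMENT + '/traits/CUSTOM_FPGA_INTEL_ARRIA10',
                     headers=HEADERS)

        # 1. Create a resource provider for the device, nested under the
        #    compute node's own provider.
        requests.post(PLACEMENT + '/resource_providers', headers=HEADERS, json={
            'uuid': rp_uuid,
            'name': 'fpga_%s' % pci_addr,
            'parent_provider_uuid': compute_rp_uuid,
        })

        # 2. Report inventory of one device under the custom resource class.
        requests.put(
            PLACEMENT + '/resource_providers/%s/inventories/CUSTOM_FPGA' % rp_uuid,
            headers=HEADERS,
            json={'total': 1, 'resource_provider_generation': 0})

        # 3. Tag the provider with a trait describing what is loaded on it.
        #    (Real code would read back the provider generation rather than
        #    assuming it.)
        requests.put(
            PLACEMENT + '/resource_providers/%s/traits' % rp_uuid,
            headers=HEADERS,
            json={'traits': ['CUSTOM_FPGA_INTEL_ARRIA10'],
                  'resource_provider_generation': 1})
        return rp_uuid
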
One of the primary responsibilities of the Cyborg Conductor is to keep the
placement API in sync with reality. For example, if there is a NIC with a
virtual function or an FPGA with a given program loaded, Cyborg may be tasked
with changing the virtual function on the NIC or the program on the FPGA, at
which point the previously reported traits and resources need to be updated.
Likewise, Cyborg will be monitoring Nova's instances to ensure that doing this
does not pull resources out from under an allocated instance.
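
A hedged sketch of that safety check, continuing the previous example (and
reusing its assumed ``PLACEMENT`` endpoint and ``HEADERS``): before changing a
function or reprogramming a device, the conductor could ask placement whether
anything currently holds an allocation against the device's resource provider.

.. code-block:: python

    def safe_to_reprogram(rp_uuid):
        """Return True only if no consumer holds an allocation on the device.

        Sketch only; a real implementation would also need to handle the race
        between this check and the actual reprogramming.
        """
        resp = requests.get(
            PLACEMENT + '/resource_providers/%s/allocations' % rp_uuid,
            headers=HEADERS)
        # Keys of 'allocations' are consumer (instance) UUIDs; any entry means
        # the device is in use and must not be touched.
        return not resp.json().get('allocations', {})
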
At a high level, what we need to be able to do is the following (a rough sketch
of step 3 follows the list):

1. Add a PCI device to Nova's whitelist live (config only today / needs
   implementation)
2. Add information about this device to the placement API (existing / being
   worked on)
3. Hotplug and unplug PCI devices from instances (existing / not sure how well
   maintained)
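
To make step 3 concrete, here is a rough sketch of what hot-plugging a PCI
device into a running guest looks like at the libvirt layer. The domain name
and PCI address are made up for illustration, and in practice this would go
through Nova's libvirt driver rather than a standalone script.

.. code-block:: python

    # Rough illustration of PCI hotplug at the libvirt layer (values made up).
    import libvirt

    HOSTDEV_XML = """
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x81' slot='0x00' function='0x1'/>
      </source>
    </hostdev>
    """

    conn = libvirt.open('qemu:///system')
    dom = conn.lookupByName('instance-00000001')   # example guest name

    # Attach to the running guest and persist the change in its config.
    dom.attachDeviceFlags(
        HOSTDEV_XML,
        libvirt.VIR_DOMAIN_AFFECT_LIVE | libvirt.VIR_DOMAIN_AFFECT_CONFIG)

    # Unplugging is the symmetric call:
    # dom.detachDeviceFlags(HOSTDEV_XML,
    #                       libvirt.VIR_DOMAIN_AFFECT_LIVE |
    #                       libvirt.VIR_DOMAIN_AFFECT_CONFIG)
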
Alternatives
------------

Don't use Cyborg; struggle with bouncing services and grub config changes
yourself.

Data model impact
-----------------

N/A

REST API impact
---------------

N/A

Security impact
---------------

N/A

Notifications impact
--------------------

N/A

Other end user impact
---------------------

N/A

Performance Impact
------------------

N/A

Other deployer impact
---------------------

N/A

Developer impact
----------------

N/A

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  None

Work Items
----------

* Implementation of the Cyborg service
* Implementation of the Cyborg agent
* Blueprint for changes in Nova
* Implementation of a POC which exposes the functionality and interoperability
  between Cyborg and Nova

Dependencies
============

This design depends on changes which may or may not be accepted in the Nova
project. Beyond that, there is ongoing work on nested resource providers:

http://specs.openstack.org/openstack/nova-specs/specs/ocata/approved/nested-resource-providers.html

which would be an essential feature of the placement API to be leveraged by
Cyborg.

Testing
=======

A new gate will be needed which provides an accelerator for the tests to
exercise.

Documentation Impact
====================

* Document the new Nova API for live whitelisting
* Document developer and user interaction with the workflow
* Document the placement API standard identifiers

References
==========

* [1] https://docs.openstack.org/developer/nova/placement.html
* [2] https://review.openstack.org/#/c/318047/
* [3] https://github.com/openstack/nova/blob/390c7e420f3880a352c3934b9331774f7afdadcc/nova/compute/resource_tracker.py#L751

History
=======

.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - Pike
     - Introduced