Internal API spec

This is my attempt at whipping up some internal API calls
for Crushil. There are some assumptions in here we should talk
about. Hopefully this will also be a good place to keep track
of the internal API in general.

Change-Id: I59f3706190d0abed2997a8302717308ea5974ac5
jkilpatr 2017-12-05 07:53:00 -05:00 committed by Justin Kilpatrick
parent 111630aa07
commit 9d40c88d0e
1 changed files with 312 additions and 0 deletions

@ -0,0 +1,312 @@
..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

==========================================
Cyborg Internal API spec
==========================================
This document loosely specifies the API calls between
the components of Cyborg: Driver, Agent, Conductor, and API endpoint.
These APIs are internal and therefore may change from version to version
without warning or backwards compatibility. This document is kept as a
developer reference and should be edited before any internally breaking
changes are made.
Problem description
===================
Developers writing one component of Cyborg need to know how to talk to another
component of Cyborg, hopefully without having to go spelunking in the code
of that component.
Use Cases
---------
Happier Cyborg developers
Proposed change
===============
Versioning internal APIs
Alternatives
------------
A mess
Data model impact
-----------------
A fixed internal API should help keep data models consistent.
REST API impact
---------------
The API changes add resource endpoints to:

Driver:

* `POST` start accelerator discovery FROM: Agent
* `GET` get a list of discovered accelerators and their properties FROM: Agent

Agent:

* `POST` register driver FROM: Driver
* `POST` start accelerator discovery across all drivers FROM: Conductor
* `GET` get a list of all accelerators across all drivers FROM: Conductor

Conductor:

* `POST` register agent FROM: Agent

The following new REST API calls will be created:
Driver 'POST /discovery'
***************************
Trigger the discovery and setup process for a specific driver
Content-Type: application/json
{
"status":"IN-PROGRESS"
}
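
As a rough illustration of how an Agent might call this endpoint, here is a minimal Python sketch. It assumes plain HTTP, an empty request body, and that the JSON shown above is the response body; none of these are stated in this spec. The helper names `build_discovery_request` and `discovery_in_progress` are hypothetical.

```python
import json
import urllib.request


def build_discovery_request(endpoint):
    """Build (but do not send) a POST to a driver's /discovery endpoint.

    Assumption: the driver is reachable over plain HTTP at host:port.
    """
    return urllib.request.Request(
        url="http://%s/discovery" % endpoint,
        data=b"",  # assumption: no request body is needed
        method="POST",
    )


def discovery_in_progress(body):
    """Return True if a response body reports the IN-PROGRESS status."""
    return json.loads(body).get("status") == "IN-PROGRESS"


request = build_discovery_request("localhost:1337")
print(request.get_method(), request.full_url)
```

Sending the request (for example with `urllib.request.urlopen`) is left out so the sketch runs without a live driver.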
Driver 'GET /hardware'
**************************
Gets a list of hardware, not accelerators. Accelerators are
ready-to-use entries available through the public API. Hardware refers to
physical devices on nodes that may or may not be ready to use or
even fully supported.
200 OK
Content-Type: application/json
{
  "hardware":[
    {
      "uuid":"8e45a2ea-5364-4b0d-a252-bf8becaa606e",
      "acc_specs":
      {
        "remote":0,
        "num":1,
        "device_type":"CRYPTO",
        "acc_capability":
        {
          "num":2,
          "ipsec":
          {
            "aes":
            {
              "3des":50,
              "num":1
            }
          }
        }
      },
      "acc_status":
      {
        "setup_required":true,
        "reboot_required":false
      }
    }]
}
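
A consumer of this response will probably want to pick out hardware that is not yet ready to use. The following Python sketch shows one way to do that, keyed on the `setup_required` flag from the example. The function name `hardware_needing_setup` and the trimmed sample body are illustrative assumptions, not part of Cyborg.

```python
import json

# Trimmed sample body modelled on the 'GET /hardware' example in this spec.
SAMPLE_RESPONSE = """
{
  "hardware": [
    {
      "uuid": "8e45a2ea-5364-4b0d-a252-bf8becaa606e",
      "acc_specs": {"remote": 0, "num": 1, "device_type": "CRYPTO"},
      "acc_status": {"setup_required": true}
    }
  ]
}
"""


def hardware_needing_setup(body):
    """Return the UUIDs of hardware entries flagged as needing setup."""
    payload = json.loads(body)
    return [
        hw["uuid"]
        for hw in payload.get("hardware", [])
        # A missing status block is treated as "no setup needed" here;
        # that default is an assumption of this sketch.
        if hw.get("acc_status", {}).get("setup_required", False)
    ]


print(hardware_needing_setup(SAMPLE_RESPONSE))
```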
Driver 'POST /hello'
***************************
Registers that a driver has been installed on the machine and is ready to use,
along with its endpoint and hardware support. The call is made by the Driver
to the Agent.
Content-Type: application/json
{
"status":"READY",
"endpoint":"localhost:1337",
"type":"CRYPTO"
}
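
On the receiving side, the Agent needs to remember which drivers have said hello. A minimal in-memory sketch in Python might look like the following. The class `DriverRegistry`, keying drivers by endpoint, and rejecting messages with missing fields are all assumptions made for illustration.

```python
import json

# Fields taken from the registration example in this spec.
REQUIRED_FIELDS = ("status", "endpoint", "type")


class DriverRegistry:
    """Hypothetical in-memory record of drivers that have registered."""

    def __init__(self):
        self._drivers = {}

    def register(self, body):
        """Validate a hello message and store it, keyed by endpoint."""
        message = json.loads(body)
        missing = [f for f in REQUIRED_FIELDS if f not in message]
        if missing:
            raise ValueError("missing fields: " + ", ".join(missing))
        self._drivers[message["endpoint"]] = {
            "status": message["status"],
            "type": message["type"],
        }
        return message["endpoint"]

    def endpoints(self, device_type=None):
        """List registered endpoints, optionally filtered by device type."""
        return sorted(
            endpoint
            for endpoint, info in self._drivers.items()
            if device_type is None or info["type"] == device_type
        )


registry = DriverRegistry()
registry.register(
    '{"status":"READY","endpoint":"localhost:1337","type":"CRYPTO"}'
)
print(registry.endpoints("CRYPTO"))
```

A real Agent would likely persist this state or rebuild it on restart; the sketch keeps it in memory only.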
Agent 'POST /discovery'
***************************
Trigger the discovery and setup process for all registered drivers.
See the Driver 'POST /discovery' example.
Agent 'GET /hardware'
***************************
Get the list of hardware across all drivers on the node.
See the Driver 'GET /hardware' example.
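
Since the Agent answers on behalf of every driver on the node, it has to combine the per-driver hardware lists into one response. The Python sketch below shows one possible merge. Deduplicating on `uuid` and keeping the first entry seen are assumptions of this sketch; the spec does not say how overlapping reports should be handled.

```python
def merge_hardware(driver_responses):
    """Combine several driver 'GET /hardware' payloads into one payload.

    Entries are deduplicated by UUID, keeping the first one seen
    (an assumption; this spec does not define conflict handling).
    """
    merged = {}
    for response in driver_responses:
        for hw in response.get("hardware", []):
            merged.setdefault(hw["uuid"], hw)
    # Dicts preserve insertion order, so output order follows input order.
    return {"hardware": list(merged.values())}


crypto_driver = {"hardware": [{"uuid": "a"}, {"uuid": "b"}]}
other_driver = {"hardware": [{"uuid": "b"}, {"uuid": "c"}]}
combined = merge_hardware([crypto_driver, other_driver])
print([hw["uuid"] for hw in combined["hardware"]])
```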
Conductor 'POST /hello'
***************************
Registers that an Agent has been installed on the machine and is ready to use.
Content-Type: application/json
{
"status":"READY",
"endpoint":"compute-whatever:1337"
}
Security impact
---------------
Care must be taken to secure the internal endpoints from malicious calls.
Notifications impact
--------------------
N/A
Other end user impact
---------------------
Aside from the API, are there other ways a user will interact with this
feature?
* Does this change have an impact on python-cyborgclient? What does the user
interface there look like?
Performance Impact
------------------
In this model the Agent takes care of wrangling however many drivers are on
a compute node, and the Conductor takes care of wrangling all the agents to
present a coherent answer to the API quickly and easily. I don't include
API <-> Conductor calls yet because I assume the API will, for the most part,
work from the database while the Conductor tries to keep that database up to
date and takes the occasional setup call.
Other deployer impact
---------------------
In this model we won't really know when we're missing an agent. If one has
reported in previously and then goes away, we can have an alarm for that. But
if an agent never reports in, we just have to assume no instance exists by that
name. This means making sure the Cyborg Drivers/Agents are installed and
running is the responsibility of the deployment tool.
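
The "reported in previously and then goes away" case could be detected by tracking when each agent was last heard from. Here is a minimal Python sketch of that idea. The class `AgentTracker`, the 60-second default timeout, and the idea that agents report in repeatedly are assumptions; this spec only defines the initial `POST /hello`.

```python
import time


class AgentTracker:
    """Hypothetical tracker for agents that have reported in at least once."""

    def __init__(self, timeout_seconds=60.0, clock=time.monotonic):
        self._timeout = timeout_seconds
        self._clock = clock  # injectable so the logic can be tested
        self._last_seen = {}

    def report(self, endpoint):
        """Record that the agent at this endpoint has just reported in."""
        self._last_seen[endpoint] = self._clock()

    def missing(self):
        """Return known agents that have been silent longer than the timeout.

        Agents that never reported in cannot appear here, which matches
        the limitation described above.
        """
        now = self._clock()
        return sorted(
            endpoint
            for endpoint, seen in self._last_seen.items()
            if now - seen > self._timeout
        )


tracker = AgentTracker(timeout_seconds=60.0)
tracker.report("compute-whatever:1337")
print(tracker.missing())
```

Whatever raises the alarm would poll `missing()` periodically; how that alarm is delivered is outside the scope of this sketch.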
Developer impact
----------------
More internal communication in Cyborg
Implementation
==============
Assignee(s)
-----------
Who is leading the writing of the code? Or is this a blueprint where you're
throwing it out there to see who picks it up?
If more than one person is working on the implementation, please designate the
primary author and contact.
Primary assignee:
<launchpad-id or None>
Other contributors:
<launchpad-id or None>
Work Items
----------
Work items or tasks -- break the feature up into the things that need to be
done to implement it. Those parts might end up being done by different people,
but we're mostly trying to understand the timeline for implementation.
Dependencies
============
* Include specific references to specs and/or blueprints in cyborg, or in other
projects, that this one either depends on or is related to.
* If this requires functionality of another project that is not currently used
by Cyborg, document that fact.
* Does this feature require any new library dependencies or code otherwise not
included in OpenStack? Or does it depend on a specific version of library?
Testing
=======
Please discuss the important scenarios needed to test here, as well as
specific edge cases we should be ensuring work correctly. For each
scenario please specify if this requires specialized hardware, a full
OpenStack environment, or can be simulated inside the Cyborg tree.
Please discuss how the change will be tested. We especially want to know what
tempest tests will be added. It is assumed that unit test coverage will be
added so that doesn't need to be mentioned explicitly, but discussion of why
you think unit tests are sufficient and we don't need to add more tempest
tests would need to be included.
Is this untestable in gate given current limitations (specific hardware /
software configurations available)? If so, are there mitigation plans (3rd
party testing, gate enhancements, etc).
Documentation Impact
====================
Which audiences are affected most by this change, and which documentation
titles on docs.openstack.org should be updated because of this change? Don't
repeat details discussed above, but reference them here in the context of
documentation for multiple audiences. For example, the Operations Guide targets
cloud operators, and the End User Guide would need to be updated if the change
offers a new feature available through the CLI or dashboard. If a config option
changes or is deprecated, note here that the documentation needs to be updated
to reflect this specification's change.
References
==========
Please add any useful references here. You are not required to have any
reference. Moreover, this specification should still make sense when your
references are unavailable. Examples of what you could include are:
* Links to mailing list or IRC discussions
* Links to notes from a summit session
* Links to relevant research, if appropriate
* Related specifications as appropriate (e.g. if it's an EC2 thing, link the
EC2 docs)
* Anything else you feel it is worthwhile to refer to
History
=======
Optional section intended to be used each time the spec is updated to describe
new design, API or any database schema updated. Useful to let reader understand
what's happened along the time.
.. list-table:: Revisions
   :header-rows: 1

   * - Release Name
     - Description
   * - Pike
     - Introduced