Cyborg Internal API spec

This document loosely specifies the API calls between the components of Cyborg: Driver, Agent, Conductor, and API endpoint.

These APIs are internal and may therefore change from version to version without warning or backwards compatibility. This document is kept as a developer reference and should be edited before any breaking internal changes are made.

Problem description

Developers writing one component of Cyborg need to know how to talk to another component of Cyborg, hopefully without having to go spelunking in the code of that component.

Use Cases

Happier Cyborg developers

Proposed change

Versioning internal APIs

Alternatives

A mess

Data model impact

A fixed internal API should help keep data models consistent.

REST API impact

The API changes add resource endpoints to:

Driver:

POST start accelerator discovery FROM: Agent
GET get a list of discovered accelerators and their properties FROM: Agent

Agent:

POST register driver FROM: Driver
POST start accelerator discovery across all drivers FROM: Conductor
GET get a list of all accelerators across all drivers FROM: Conductor

Conductor:

POST register agent FROM: Agent
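The endpoint list above can be captured as a small lookup table, which makes it easy to answer "who calls whom". This is only a sketch: the `InternalCall` type and the `calls_made_by` helper are hypothetical names, and placing driver registration at `/hello` on the Agent is an assumption drawn from the list above and the `/hello` calls described in this spec.

```python
from typing import NamedTuple


class InternalCall(NamedTuple):
    component: str  # component that exposes the endpoint
    method: str     # HTTP method
    path: str       # resource path on that component
    caller: str     # component that issues the request


# One row per internal call listed in this spec.
INTERNAL_CALLS = [
    InternalCall("Driver", "POST", "/discovery", "Agent"),
    InternalCall("Driver", "GET", "/hardware", "Agent"),
    InternalCall("Agent", "POST", "/hello", "Driver"),
    InternalCall("Agent", "POST", "/discovery", "Conductor"),
    InternalCall("Agent", "GET", "/hardware", "Conductor"),
    InternalCall("Conductor", "POST", "/hello", "Agent"),
]


def calls_made_by(caller):
    """Return every internal call that the given component issues."""
    return [call for call in INTERNAL_CALLS if call.caller == caller]


for call in calls_made_by("Agent"):
    print(call.method, call.path, "on", call.component)
```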

The following new REST API call will be created:

Driver 'POST /discovery'

Trigger the discovery and setup process for a specific driver

Content-Type: application/json

{
    "status": "IN-PROGRESS"
}
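Because the body reports `IN-PROGRESS`, the call presumably returns before discovery finishes. A minimal sketch of that pattern follows, assuming discovery runs in a background thread; `start_discovery` and the `scan` callable are hypothetical names, not part of Cyborg.

```python
import json
import threading


def start_discovery(scan):
    """Run `scan` in the background and return the immediate response body.

    `scan` is a hypothetical callable that performs the driver's
    discovery and setup work.
    """
    worker = threading.Thread(target=scan, daemon=True)
    worker.start()
    # The caller gets an answer right away; discovery continues in the worker.
    body = json.dumps({"status": "IN-PROGRESS"})
    return worker, body


found = []
worker, body = start_discovery(lambda: found.append("fake-device"))
worker.join()  # only for the demo; a real handler would not block here
print(body)
```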

Driver 'GET /hardware'

Gets a list of hardware, not accelerators. Accelerators are ready-to-use entries available through the public API; hardware are physical devices on nodes that may or may not be ready to use, or even fully supported.

200 OK
Content-Type: application/json

{
    "hardware": [
        {
            "uuid": "8e45a2ea-5364-4b0d-a252-bf8becaa606e",
            "acc_specs": {
                "remote": 0,
                "num": 1,
                "device_type": "CRYPTO",
                "acc_capability": {
                    "num": 2,
                    "ipsec": {
                        "aes": {
                            "3des": 50,
                            "num": 1
                        }
                    }
                }
            },
            "acc_status": {
                "setup_required": true,
                "reboot_required": false
            }
        }
    ]
}
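A caller of this endpoint will usually want to know which devices still need work before they can be offered as accelerators. The sketch below parses a trimmed copy of the example response and lists the devices that report `setup_required`; the `pending_setup` helper is a hypothetical name used only for illustration.

```python
import json

# Trimmed copy of the example response, keeping only the fields used here.
HARDWARE_RESPONSE = """
{
    "hardware": [
        {
            "uuid": "8e45a2ea-5364-4b0d-a252-bf8becaa606e",
            "acc_specs": {"remote": 0, "num": 1, "device_type": "CRYPTO"},
            "acc_status": {"setup_required": true}
        }
    ]
}
"""


def pending_setup(response_body):
    """Return the UUIDs of hardware entries that still require setup."""
    payload = json.loads(response_body)
    return [
        hw["uuid"]
        for hw in payload.get("hardware", [])
        if hw.get("acc_status", {}).get("setup_required", False)
    ]


print(pending_setup(HARDWARE_RESPONSE))
```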

Driver 'POST /hello'

Registers with the Agent that a driver has been installed on the machine and is ready to use. The request also carries the driver's endpoint and the hardware type it supports.

Content-Type: application/json

{
    "status": "READY",
    "endpoint": "localhost:1337",
    "type": "CRYPTO"
}
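As an illustration, a driver could assemble this registration request with the standard library. This is a sketch under assumptions: the Agent address `http://localhost:8080` is made up for the example, `build_hello_request` is a hypothetical helper, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_hello_request(agent_url, driver_endpoint, device_type):
    """Build (but do not send) the driver registration request."""
    body = json.dumps(
        {"status": "READY", "endpoint": driver_endpoint, "type": device_type}
    ).encode("utf-8")
    return urllib.request.Request(
        agent_url + "/hello",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# The Agent URL below is an assumption; the spec does not define it.
request = build_hello_request("http://localhost:8080", "localhost:1337", "CRYPTO")
print(request.get_method(), request.full_url)
```

Sending it would be a matter of passing `request` to `urllib.request.urlopen`, with whatever error handling and retries the driver needs.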

Agent 'POST /discovery'

Trigger the discovery and setup process for all registered drivers

See the Driver 'POST /discovery' example.

Agent 'GET /hardware'

Get list of hardware across all drivers on the node

See the Driver 'GET /hardware' example.
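One way the Agent could build its answer is to query each registered driver and concatenate the results. The sketch below assumes every driver returns a body with a top-level `hardware` list, as in the driver example; `aggregate_hardware` and the `fetch` callable are hypothetical, and `fetch` stands in for the actual HTTP call.

```python
def aggregate_hardware(driver_endpoints, fetch):
    """Merge the `hardware` lists returned by every registered driver.

    `fetch(endpoint)` is a hypothetical callable that performs
    GET /hardware against one driver and returns the decoded JSON body.
    """
    merged = []
    for endpoint in driver_endpoints:
        merged.extend(fetch(endpoint).get("hardware", []))
    return {"hardware": merged}


# Fake driver responses, keyed by made-up endpoints, for demonstration only.
FAKE_DRIVERS = {
    "localhost:1337": {"hardware": [{"uuid": "crypto-0"}]},
    "localhost:1338": {"hardware": [{"uuid": "fpga-0"}, {"uuid": "fpga-1"}]},
}

result = aggregate_hardware(sorted(FAKE_DRIVERS), FAKE_DRIVERS.__getitem__)
print([hw["uuid"] for hw in result["hardware"]])
```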

Conductor 'POST /hello'

Registers that an Agent has been installed on the machine and is ready to use.

Content-Type: application/json

{
    "status": "READY",
    "endpoint": "compute-whatever:1337"
}
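On the receiving side, the Conductor needs to check the body and remember the agent. A minimal sketch follows, assuming an in-memory dictionary keyed by endpoint; `register_agent` and `REQUIRED_FIELDS` are hypothetical names, and a real Conductor would more likely persist this in the database.

```python
import json

REQUIRED_FIELDS = ("status", "endpoint")


def register_agent(registry, raw_body):
    """Validate a hello body and record the agent's status by endpoint."""
    body = json.loads(raw_body)
    missing = [field for field in REQUIRED_FIELDS if field not in body]
    if missing:
        raise ValueError("hello body is missing: " + ", ".join(missing))
    registry[body["endpoint"]] = body["status"]
    return registry


agents = {}
register_agent(agents, '{"status": "READY", "endpoint": "compute-whatever:1337"}')
print(agents)
```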

Security impact

Care must be taken to secure the internal endpoints from malicious calls.

Notifications impact

N/A

Other end user impact

Aside from the API, are there other ways a user will interact with this feature?

Does this change have an impact on python-cyborgclient? What does the user interface there look like?

Performance Impact

In this model the Agent takes care of wrangling however many drivers are on a compute node, and the Conductor takes care of wrangling all the agents to present a coherent answer to the API quickly and easily. I don't include API <-> Conductor calls yet because I assume the API will, for the most part, work from the database while the Conductor tries to keep that database up to date and takes the occasional setup call.

Other deployer impact

In this model we won't really know when we're missing an agent. If one has reported in previously and then goes away, we can raise an alarm for that. But if an agent never reports in, we just have to assume no instance exists by that name. This means that making sure the Cyborg drivers and agents are installed and running is the responsibility of the deployment tool.
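The distinction between an agent that went quiet and one that never reported can be sketched as follows. This assumes agents report in repeatedly and that the Conductor keeps a last-seen timestamp per endpoint; neither is specified in this document, and `record_report`, `silent_agents`, and the timeout value are hypothetical.

```python
def record_report(last_seen, endpoint, now):
    """Remember when an agent last reported in (timestamps in seconds)."""
    last_seen[endpoint] = now


def silent_agents(last_seen, now, timeout):
    """Return agents that reported before but have been quiet too long."""
    return sorted(
        endpoint for endpoint, seen in last_seen.items() if now - seen > timeout
    )


last_seen = {}
record_report(last_seen, "compute-1:1337", now=100)
record_report(last_seen, "compute-2:1337", now=190)

# compute-1 has gone quiet and can be alarmed on. A node such as
# "compute-3" that never reported is simply absent from `last_seen`,
# so nothing here can flag it.
print(silent_agents(last_seen, now=200, timeout=60))
```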

Developer impact

More internal communication in Cyborg

Implementation

Assignee(s)

Who is leading the writing of the code? Or is this a blueprint where you're throwing it out there to see who picks it up?

If more than one person is working on the implementation, please designate the primary author and contact.

Primary assignee: <launchpad-id or None>
Other contributors: <launchpad-id or None>

Work Items

Work items or tasks -- break the feature up into the things that need to be done to implement it. Those parts might end up being done by different people, but we're mostly trying to understand the timeline for implementation.

Dependencies

Include specific references to specs and/or blueprints in cyborg, or in other projects, that this one either depends on or is related to.
If this requires functionality of another project that is not currently used by Cyborg, document that fact.
Does this feature require any new library dependencies or code otherwise not included in OpenStack? Or does it depend on a specific version of library?

Testing

Please discuss the important scenarios needed to test here, as well as specific edge cases we should be ensuring work correctly. For each scenario please specify if this requires specialized hardware, a full OpenStack environment, or can be simulated inside the Cyborg tree.

Please discuss how the change will be tested. We especially want to know what tempest tests will be added. It is assumed that unit test coverage will be added so that doesn't need to be mentioned explicitly, but discussion of why you think unit tests are sufficient and we don't need to add more tempest tests would need to be included.

Is this untestable in gate given current limitations (specific hardware / software configurations available)? If so, are there mitigation plans (3rd party testing, gate enhancements, etc.)?

Documentation Impact

Which audiences are affected most by this change, and which documentation titles on docs.openstack.org should be updated because of this change? Don't repeat details discussed above, but reference them here in the context of documentation for multiple audiences. For example, the Operations Guide targets cloud operators, and the End User Guide would need to be updated if the change offers a new feature available through the CLI or dashboard. If a config option changes or is deprecated, note here that the documentation needs to be updated to reflect this specification's change.

References

Please add any useful references here. You are not required to have any reference. Moreover, this specification should still make sense when your references are unavailable. Examples of what you could include are:

Links to mailing list or IRC discussions
Links to notes from a summit session
Links to relevant research, if appropriate
Related specifications as appropriate (e.g. if it's an EC2 thing, link the EC2 docs)
Anything else you feel it is worthwhile to refer to

History

Optional section intended to be used each time the spec is updated, to describe new design, API, or database schema updates. Useful for letting readers understand what has happened over time.

Revisions
Release Name	Description
Pike	Introduced
