diff --git a/doc/source/admin/index.rst b/doc/source/admin/index.rst
new file mode 100644
index 00000000..609060c3
--- /dev/null
+++ b/doc/source/admin/index.rst
@@ -0,0 +1,237 @@
+====================
+Acceleration Service
+====================
+
+OpenStack Cyborg is an acceleration service that manages the lifecycle of
+accelerators for instances on a cloud computing platform. It makes it easy to
+control the accelerators attached to instances.
+
+Overview
+--------
+
+A good understanding of how Cyborg interacts with Nova and Placement helps
+operators manage the acceleration service more effectively.
+
+.. image:: ../figures/cyborg-nova-interaction.png
+   :width: 700 px
+   :scale: 99 %
+   :align: center
+
+Coexistence with PCI whitelists
+-------------------------------
+
+The operator tells Nova which PCI devices it may claim and use by configuring
+the PCI whitelist mechanism. In addition, the operator installs Cyborg drivers
+on compute nodes and configures and enables them. Those drivers may then
+discover and report some PCI devices. The operator must ensure that both
+configurations are compatible.
+
+Ideally, there would be a single way for the operator to identify which PCI
+devices should be claimed by Nova and which by Cyborg. Until that is figured
+out, the operator shall use Cyborg’s configuration file to specify which Cyborg
+drivers are enabled. Since each driver claims specific PCI IDs, the operator
+can and must ensure that none of these PCI IDs are included in Nova’s PCI
+whitelist.
+
+Placement update
+----------------
+
+The Cyborg conductor calls the Placement API directly to represent devices and
+accelerators. Some of the intended use cases for these API calls are:
+
+* Create or delete child resource providers (RPs) under the compute node RP.
+
+* Create or delete custom resource classes (RCs) and custom traits.
+
+* Associate traits with RPs or remove such associations.
+
+* Update RP inventory.
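To make the Placement use cases listed above more concrete, the following Python sketch builds (but does not send) the Placement REST calls that could represent one device: create any custom RC and custom trait, create a child RP under the compute node RP, attach the trait, and set the inventory. It is a minimal illustration under stated assumptions, not Cyborg's conductor code. The helper name ``placement_requests_for_device`` and the RP naming scheme are hypothetical; the endpoint paths follow the public Placement API, where nested RPs require microversion 1.14 or later.

```python
import uuid
from typing import Any, Dict, List, Optional, Tuple

# One Placement call: (HTTP method, path, JSON body or None).
Request = Tuple[str, str, Optional[Dict[str, Any]]]


def placement_requests_for_device(
    compute_rp_uuid: str,
    device_name: str,
    resource_class: str,
    trait: str,
    total: int,
) -> List[Request]:
    """Build, but do not send, the Placement calls for one device.

    Hypothetical helper for illustration; not Cyborg's conductor code.
    """
    device_rp_uuid = str(uuid.uuid4())
    generation = 0  # a newly created RP starts at generation 0
    requests: List[Request] = []

    # Custom RCs and custom traits must exist before they can be used.
    if resource_class.startswith("CUSTOM_"):
        requests.append(("PUT", f"/resource_classes/{resource_class}", None))
    if trait.startswith("CUSTOM_"):
        requests.append(("PUT", f"/traits/{trait}", None))

    # Child RP nested under the compute node RP (microversion 1.14 or later).
    requests.append(("POST", "/resource_providers", {
        "name": device_name,
        "uuid": device_rp_uuid,
        "parent_provider_uuid": compute_rp_uuid,
    }))

    # Associate the trait with the device RP.
    requests.append(("PUT", f"/resource_providers/{device_rp_uuid}/traits", {
        "traits": [trait],
        "resource_provider_generation": generation,
    }))
    # Updating traits bumps the generation; assumes no concurrent writers.
    generation += 1

    # Publish the device inventory.
    requests.append(("PUT", f"/resource_providers/{device_rp_uuid}/inventories", {
        "inventories": {resource_class: {"total": total}},
        "resource_provider_generation": generation,
    }))
    return requests


if __name__ == "__main__":
    for method, path, body in placement_requests_for_device(
        compute_rp_uuid=str(uuid.uuid4()),
        device_name="compute1_fpga_0",  # hypothetical naming scheme
        resource_class="FPGA",
        trait="CUSTOM_FPGA_TRAITS",
        total=1,
    ):
        print(method, path, body)
```

Because ``PUT /resource_providers/{uuid}/traits`` replaces the whole trait set, removing an association amounts to sending the remaining traits (or an empty list). A real client would also read the RP generation back from each response instead of tracking it locally.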
+
+Cyborg shall not modify the RPs created by any other component, such as Nova
+virt drivers.
+
+User Requests
+-------------
+
+The user request for accelerators is encapsulated in a device profile, which
+the admin creates and manages through the Cyborg API.
+
+A ``device_profile`` has the following structure:
+
+.. code:: json
+
+   {
+     "device_profiles":[
+       {
+         "name":"fpga-dp1",
+         "uuid":"5518a925-1c2c-49a2-a8bf-0927d9456f3e",
+         "description": "",
+         "groups":[
+           {
+             "trait:CUSTOM_FPGA_TRAITS":"required",
+             "resources:FPGA":"1",
+             "accel:bitstream_id":"d5ca2f11-3108-4426-a11c-a959987565df"
+           }
+         ],
+         "created_at": "2020-03-10 03:52:15+00:00",
+         "updated_at": null,
+         "links":[
+           {
+             "href":"http://192.168.32.217/accelerator/v2/device_profiles/5518a925-1c2c-49a2-a8bf-0927d9456f3e",
+             "rel":"self"
+           }
+         ]
+       }
+     ]
+   }
+
+The operator folds the device profile into the flavor as an extra spec, as
+below:
+
+.. code:: bash
+
+   openstack flavor set --property 'accel:device_profile=' flavor
+
+Thus the standard Nova API can be used to create an instance with only the
+flavor (without device profiles), like this:
+
+.. code:: bash
+
+   openstack server create --flavor f .... # instance creation
+
+In the future, a device profile may be used by itself to specify accelerator
+resources in the instance creation API.
+
+Updating the Request Spec
+-------------------------
+
+When the user submits a request to create an instance, as described in
+`User Requests`_, Nova needs to call a Cyborg API to get back the resource
+request groups in the device profile and merge them into the request spec.
+
+This call, like all other calls that Nova makes to Cyborg APIs, goes through a
+Keystone-based adapter that locates the Cyborg service, similar to the way
+Nova calls Placement. A Cyborg client module added to Nova encapsulates such
+calls.
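As a rough illustration of what Nova gets back from that call, this Python sketch splits each group of a ``GET /v2/device_profiles`` response (structured like the ``fpga-dp1`` example above) into the parts Placement understands (``resources:`` and ``trait:`` keys) and the Cyborg-specific ``accel:`` properties. The function name and the returned dictionary layout are hypothetical; Nova's Cyborg client builds its own request group objects. The sketch also assumes ``trait:`` values are either ``required`` or ``forbidden``, as with flavor extra specs.

```python
import json
from typing import Any, Dict, List


def split_device_profile_groups(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Split each device profile group into resources, traits and accel props.

    Hypothetical helper; assumes the response holds exactly one profile,
    as returned when querying a device profile by name.
    """
    profiles = response["device_profiles"]
    if len(profiles) != 1:
        raise ValueError(f"expected one device profile, got {len(profiles)}")

    result = []
    for index, group in enumerate(profiles[0]["groups"]):
        resources: Dict[str, int] = {}
        required: List[str] = []
        forbidden: List[str] = []
        accel: Dict[str, str] = {}
        for key, value in group.items():
            prefix, _, name = key.partition(":")
            if prefix == "resources":
                resources[name] = int(value)
            elif prefix == "trait":
                if value == "required":
                    required.append(name)
                elif value == "forbidden":
                    forbidden.append(name)
                else:
                    raise ValueError(f"unknown trait value {value!r} for {name}")
            elif prefix == "accel":
                # Cyborg-specific properties; not part of the Placement query.
                accel[name] = value
            else:
                raise ValueError(f"unexpected key {key!r}")
        result.append({
            "group_index": index,  # position of the group in the profile
            "resources": resources,
            "required_traits": required,
            "forbidden_traits": forbidden,
            "accel": accel,
        })
    return result


if __name__ == "__main__":
    sample = json.loads("""
    {"device_profiles": [{"name": "fpga-dp1", "groups": [
        {"trait:CUSTOM_FPGA_TRAITS": "required",
         "resources:FPGA": "1",
         "accel:bitstream_id": "d5ca2f11-3108-4426-a11c-a959987565df"}]}]}
    """)
    print(split_device_profile_groups(sample))
```

Only the ``resources:`` and ``trait:`` entries are meaningful to Placement; keeping the ``accel:`` entries separate leaves them for Cyborg.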
+
+VM images in Glance may be associated with image properties (other than image
+traits), such as the bitstream or function IDs needed by that image. So, Nova
+should pass the VM image UUID from the request spec to Cyborg.
+
+The groups in the device profile are numbered by Cyborg, while the request
+groups merged into the request spec are numbered by Nova. These numberings
+differ in general: the N-th device profile group may not correspond to the
+N-th request group in the request spec.
+
+When the device profile request groups are added to the other request groups
+in the flavor, the ``group_policy`` of the flavor shall govern the overall
+semantics of all request groups.
+
+Accelerator Requests
+--------------------
+
+An accelerator request (ARQ) is an object that represents the state of a
+request for an accelerator to be assigned to an instance. Cyborg creates and
+manages ARQs and persists them in the Cyborg database.
+
+By definition, an ARQ represents a request for a single accelerator. If the
+device profile in the user request has N request groups, each asking for M
+accelerators, then N * M ARQs are created for that device profile.
+
+When Cyborg initially creates an ARQ, the ARQ is not yet associated with a
+specific host name or device resource provider, so it is said to be in an
+unbound state. Subsequently, Nova calls Cyborg to bind the ARQ to a host name,
+a device RP UUID and an instance UUID. If the instance fails to spawn, Nova
+unbinds the ARQ without deleting it. On instance termination, Nova deletes the
+ARQs after unbinding them.
+
+Each ARQ must be matched to the specific RP in the allocation candidate that
+Nova has chosen before the ARQ is bound. The current Nova code maps request
+groups to RPs, while the Cyborg client module in Nova (cyborg-client-module)
+matches ARQs to request groups.
+The matching is done using the ``request_id`` field in the ``RequestGroup``
+object as follows:
+
+* The order of request groups in a device profile is not significant, but it
+  is preserved by Cyborg. Thus, each device profile request group has a unique
+  index.
+
+* When the device profile request groups returned by Cyborg are added to the
+  request spec, the ``request_id`` field is set to ‘device_profile_’ for the
+  N-th device profile request group (starting from zero). The device profile
+  name need not be included here because there is only one device profile per
+  request spec.
+
+* When Cyborg creates an ARQ for a device profile, it embeds the device profile
+  request group index in the ARQ before returning it to Nova.
+
+* The matching is done in two steps:
+
+  * Each ARQ is mapped to a specific request group in the request spec using
+    the ``request_id`` field.
+
+  * Each request group is mapped to a specific RP using the same logic as for
+    the Neutron bandwidth provider.
+
+Cyborg and Nova interaction workflow
+------------------------------------
+
+This flow is captured by the following sequence diagram, in which the Nova
+conductor and scheduler are together represented as the Nova controller.
+
+.. image:: ../figures/cyborg-nova-interaction-workflow.svg
+
+A Cyborg client module (cyborg-client-module) is added to Nova, and all Cyborg
+API calls are routed through it.
+
+1. The Nova API server receives a ``POST /servers`` API request with a flavor
+   that includes a device profile name.
+
+2. The Nova API server calls the Cyborg API
+   ``GET /v2/device_profiles?name=$device_profile_name`` and gets back the
+   device profile. The request groups in that device profile are added to the
+   request spec.
+
+3. The Nova scheduler invokes Placement and gets a list of allocation
+   candidates. It selects one of those candidates and makes claim(s) in
+   Placement. The Nova conductor then sends the ``build_and_run_instance`` RPC
+   message to the Nova compute manager.
+
+4. The Nova conductor manager calls the Cyborg API
+   ``POST /v2/accelerator_requests`` with the device profile name. Cyborg
+   creates a set of unbound ARQs for that device profile and returns them to
+   Nova.
+
+5. The Cyborg client in Nova matches each ARQ to the resource provider picked
+   for that accelerator.
+
+6. The Nova compute manager calls the Cyborg API
+   ``PATCH /v2/accelerator_requests`` to bind each ARQ to the host name, the
+   device’s RP UUID and the instance UUID. This is an asynchronous call that
+   prepares or reconfigures the device in the background.
+
+7. On completion of the bindings (successful or otherwise), Cyborg calls
+   Nova’s ``POST /os-server-external-events`` API with:
+
+   .. code::
+
+      {
+        "events": [
+          { "name": "accelerator-request-bound",
+            "tag": $device_profile_name,
+            "server_uuid": $instance_uuid,
+            "status": "completed" # or "failed"
+          },
+          ...
+        ]
+      }
+
+8. The Nova compute manager waits for the notification, subject to a timeout.
+   It then calls the Cyborg REST API
+   ``GET /v2/accelerator_requests?instance=&bind_state=resolved``.
+
+9. The Nova virt driver uses the attach handles returned by the Cyborg call to
+   compose PCI passthrough devices into the VM’s definition.
+
+10. If any error occurs after binding has been initiated, Nova must unbind the
+    relevant ARQs by calling the Cyborg API. It may then retry on another host
+    or delete the (unbound) ARQs for the instance.
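The ARQ bookkeeping described under Accelerator Requests and in step 5 can be sketched in Python as follows: one unbound ARQ is created per requested accelerator (so N groups of M accelerators give N * M ARQs), each ARQ carries the index of its device profile group, and matching looks up the RP that Nova picked for the corresponding request group. This is a minimal illustration, not Nova's or Cyborg's code. It assumes the ARQ field is named ``device_profile_group_id``, that the ``request_id`` is the ‘device_profile_’ prefix followed by the group index, and that each request group maps to exactly one RP (Placement satisfies a numbered request group from a single provider). The group-to-RP mapping is taken as input rather than reimplementing Nova's logic, and all function names are hypothetical.

```python
import uuid
from typing import Any, Dict, List

# Assumed request_id format: this prefix followed by the group index.
REQUEST_ID_PREFIX = "device_profile_"


def create_unbound_arqs(dp_name: str, accel_counts: List[int]) -> List[Dict[str, Any]]:
    """Create one unbound ARQ per requested accelerator.

    accel_counts[i] is the number of accelerators asked for by device
    profile group i, so N groups of M accelerators yield N * M ARQs.
    """
    arqs = []
    for group_index, count in enumerate(accel_counts):
        for _ in range(count):
            arqs.append({
                "uuid": str(uuid.uuid4()),
                "device_profile_name": dp_name,
                # Group index embedded by Cyborg (assumed field name).
                "device_profile_group_id": group_index,
                # Unbound: no host, device RP or instance yet.
                "hostname": None,
                "device_rp_uuid": None,
                "instance_uuid": None,
            })
    return arqs


def match_arqs(
    arqs: List[Dict[str, Any]],
    request_group_to_rp: Dict[str, str],
    hostname: str,
    instance_uuid: str,
) -> Dict[str, Dict[str, str]]:
    """Map each ARQ to its request group, then to the RP Nova chose.

    Returns binding data per ARQ UUID, a simplified stand-in for what is
    sent with PATCH /v2/accelerator_requests.
    """
    bindings = {}
    for arq in arqs:
        # Step 1: ARQ -> request group, via the embedded group index.
        request_id = f"{REQUEST_ID_PREFIX}{arq['device_profile_group_id']}"
        # Step 2: request group -> RP, as selected by Nova.
        rp_uuid = request_group_to_rp.get(request_id)
        if rp_uuid is None:
            raise LookupError(f"no RP selected for request group {request_id}")
        bindings[arq["uuid"]] = {
            "hostname": hostname,
            "device_rp_uuid": rp_uuid,
            "instance_uuid": instance_uuid,
        }
    return bindings


if __name__ == "__main__":
    arqs = create_unbound_arqs("fpga-dp1", [2, 1])  # 2 groups, 3 ARQs in total
    rp_map = {"device_profile_0": "rp-fpga-a", "device_profile_1": "rp-fpga-b"}
    for arq_uuid, binding in match_arqs(arqs, rp_map, "compute1", str(uuid.uuid4())).items():
        print(arq_uuid, binding)
```

The exact body format of the ``PATCH`` request is defined by the Cyborg API reference; the returned dictionary here only shows which values end up bound to each ARQ.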
diff --git a/doc/source/figures/cyborg-nova-interaction-workflow.svg b/doc/source/figures/cyborg-nova-interaction-workflow.svg
new file mode 100644
index 00000000..6e668eeb
--- /dev/null
+++ b/doc/source/figures/cyborg-nova-interaction-workflow.svg
@@ -0,0 +1,83 @@
[SVG sequence diagram (blockdiag). Participants: Nova Controller, Placement,
Cyborg, Nova Compute. Message labels: GET /v2/device_profiles?name=mydp;
{"device_profiles": $device_profile}; Merge request groups into request_spec;
GET /allocation_candidates; allocation candidates with nested RPs; Select a
candidate; build_and_run_instances(); POST /v2/accelerator_requests;
{"arqs": [$arq, ...]}; PATCH /v2/accelerator_requests; {"arqs": [$arq, ...]};
POST /os-server-external-events; Wait for notification from Cyborg;
GET /v2/accelerator_requests?instance=$uuid&bind_state=resolved;
{"arqs": [$arq, ....]}]
\ No newline at end of file
diff --git a/doc/source/figures/cyborg-nova-interaction.png b/doc/source/figures/cyborg-nova-interaction.png
new file mode 100644
index 00000000..5c0674b6
Binary files /dev/null and b/doc/source/figures/cyborg-nova-interaction.png differ
diff --git a/doc/source/index.rst b/doc/source/index.rst
index b5a94e3f..6056d27c 100644
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -29,6 +29,7 @@ Installation
    install/install-from-pip
    install/install-from-source
    admin/config-wsgi
+   admin/index
 
 Configuration Reference
 ~~~~~~~~~~~~~~~~~~~~~~~
@@ -39,6 +40,14 @@ Configuration Reference
    configuration/index
    reference/support-matrix
 
+Maintenance
+~~~~~~~~~~~
+
+Once you are running Cyborg, the following information is extremely useful.
+
+* :doc:`Admin Guide `: A collection of guides for administering
+  Cyborg.
+
 For End Users
 -------------