diff --git a/HACKING.rst b/HACKING.rst index f0c0547a..81d65385 100644 --- a/HACKING.rst +++ b/HACKING.rst @@ -7,10 +7,11 @@ Before you commit your code run tox against your patch using the command. tox . -If any of the tests fail correct the error and try again. If your code is valid Python -but not valid pep8 you may find autopep8 from pip useful. +If any of the tests fail correct the error and try again. If your code is valid +Python but not valid pep8 you may find autopep8 from pip useful. -Once you submit a patch integration tests will run and those may fail, -1'ing your patch -you can make a gerrit comment 'recheck ci' if you have reviewed the logs from the jobs -by clicking on the job name in gerrit and concluded that the failure was spurious or otherwise -not related to your patch. If problems persist contact people on #openstack-cyborg or #openstack-infra. +Once you submit a patch integration tests will run and those may fail, +-1'ing your patch you can make a gerrit comment 'recheck ci' if you have +reviewed the logs from the jobs by clicking on the job name in gerrit and +concluded that the failure was spurious or otherwise not related to your patch. +If problems persist contact people on #openstack-cyborg or #openstack-infra. diff --git a/doc/source/admin/api.rst b/doc/source/admin/api.rst index 982a4d12..01c11a75 100644 --- a/doc/source/admin/api.rst +++ b/doc/source/admin/api.rst @@ -5,19 +5,19 @@ General Information =================== This document describes the basic REST API operation that Cyborg supports -for Pike release. +for Pike release:: -+--------+-----------------------+-------------------------------------------------------------------------------+ -| Verb | URI | Description | -+========+=======================+===============================================================================+ -| GET | /accelerators | Return a list of accelerators | -+--------+-----------------------+-------------------------------------------------------------------------------+ -| GET | /accelerators/{uuid} | Retrieve a certain accelerator info identified by `{uuid}` | -+--------+-----------------------+-------------------------------------------------------------------------------+ -| POST | /accelerators | Create a new accelerator. | -+--------+-----------------------+-------------------------------------------------------------------------------+ -| PUT | /accelerators/{uuid} | Update the spec for the accelerator identified by `{uuid}` | -+--------+-----------------------+-------------------------------------------------------------------------------+ -| DELETE | /accelerators/{uuid} | Delete the accelerator identified by `{uuid}` | -+--------+-----------------------+-------------------------------------------------------------------------------+ + +--------+-----------------------+-----------------------------------------------------------+ + | Verb | URI | Description | + +========+=======================+===========================================================+ + | GET | /accelerators | Return a list of accelerators | + +--------+-----------------------+-----------------------------------------------------------+ + | GET | /accelerators/{uuid} | Retrieve a certain accelerator info identified by `{uuid}`| + +--------+-----------------------+-----------------------------------------------------------+ + | POST | /accelerators | Create a new accelerator. 
                                |
 +--------+-----------------------+-----------------------------------------------------------+
 | PUT    | /accelerators/{uuid}  | Update the spec for the accelerator identified by `{uuid}`|
 +--------+-----------------------+-----------------------------------------------------------+
 | DELETE | /accelerators/{uuid}  | Delete the accelerator identified by `{uuid}`              |
 +--------+-----------------------+-----------------------------------------------------------+
diff --git a/doc/source/contributor/contributing.rst b/doc/source/contributor/contributing.rst
index 9c67ca79..6e7408c5 100644
--- a/doc/source/contributor/contributing.rst
+++ b/doc/source/contributor/contributing.rst
@@ -135,10 +135,11 @@ Finally, push the patch for review using,
Adding functionality
--------------------
-If you are adding new functionality to Cyborg please add testing for that functionality
-and provide a detailed commit message outlining the goals of your commit and how you
-achived them.
+If you are adding new functionality to Cyborg, please add testing for that
+functionality and provide a detailed commit message outlining the goals of
+your commit and how you achieved them.
-If the functionality you wish to add doesn't fix in an existing part of the Cyborg
-achitecture diagram drop by our team meetings to disscuss how it could be implemented
+If the functionality you wish to add doesn't fit in an existing part of the
+Cyborg architecture diagram, drop by our team meetings to discuss how it
+could be implemented.
diff --git a/doc/source/contributor/devstack_setup.rst b/doc/source/contributor/devstack_setup.rst
index d948b546..71ec574a 100644
--- a/doc/source/contributor/devstack_setup.rst
+++ b/doc/source/contributor/devstack_setup.rst
@@ -119,7 +119,8 @@ It will speed up your installation if you have a local GIT_BASE.
##### Command line
You can `source openrc YOUR_USER YOUR_USER (e.g. source openrc admin admin)` in
-your shell, and then use the `openstack` command line tool to manage your devstack.
+your shell, and then use the `openstack` command line tool to manage your
+devstack.
##### Horizon
diff --git a/doc/source/user/introduction.rst b/doc/source/user/introduction.rst
index 60b60939..87d6570b 100644
--- a/doc/source/user/introduction.rst
+++ b/doc/source/user/introduction.rst
@@ -6,8 +6,10 @@ Background Story
OpenStack Acceleration Discussion Started from Telco Requirements:
-* High level requirements first drafted in the standard organization ETSI NFV ISG
-* High level requirements transformed into detailed requirements in OPNFV DPACC project.
+* High level requirements first drafted in the standard organization
+  ETSI NFV ISG
+* High level requirements transformed into detailed requirements in
+  OPNFV DPACC project.
* New project called Nomad established to address the requirements.
* BoF discussions back in OpenStack Austin Summit.
diff --git a/doc/specs/pike/approved/cyborg-agent.rst b/doc/specs/pike/approved/cyborg-agent.rst
index f47ae110..d08273a0 100644
--- a/doc/specs/pike/approved/cyborg-agent.rst
+++ b/doc/specs/pike/approved/cyborg-agent.rst
@@ -28,42 +28,44 @@ Use of accelerators attached to virtual machine instances in OpenStack
Proposed change
===============
-Cyborg Agent resides on various compute hosts and monitors them for accelerators.
-On it's first run Cyborg Agent will run the detect accelerator functions of all
-it's installed drivers.
The resulting list of accelerators available on the host
-will be reported to the conductor where it will be stored into the database and
-listed during API requests. By default accelerators will be inserted into the
-database in a inactive state. It will be up to the operators to manually set
-an accelerator to 'ready' at which point cyborg agent will be responsible for
-calling the drivers install function and ensuring that the accelerator is ready
-for use.
+Cyborg Agent resides on various compute hosts and monitors them for
+accelerators. On its first run Cyborg Agent will run the detect
+accelerator functions of all its installed drivers. The resulting list
+of accelerators available on the host will be reported to the conductor
+where it will be stored into the database and listed during API requests.
+By default, accelerators will be inserted into the database in an inactive
+state. It will be up to the operators to manually set an accelerator to
+'ready', at which point Cyborg Agent will be responsible for calling the
+driver's install function and ensuring that the accelerator is ready for use.
In order to mirror the current Nova model of using the placement API each Agent
-will send updates on it's resources directly to the placement API endpoint as well
-as to the conductor for usage aggregation. This should keep placement API up to date
-on accelerators and their usage.
+will send updates on its resources directly to the placement API endpoint
+as well as to the conductor for usage aggregation. This should keep placement
+API up to date on accelerators and their usage.
Alternatives
------------
There are lots of alternate ways to lay out the communication between the Agent
-and the API endpoint or the driver. Almost all of them involving exactly where we
-draw the line between the driver, Conductor , and Agent. I've written my proposal
-with the goal of having the Agent act mostly as a monitoring tool, reporting to
-the cloud operator or other Cyborg components to take action. A more active role
-for Cyborg Agent is possible but either requires significant synchronization with
-the Conductor or potentially steps on the toes of operators.
+and the API endpoint or the driver. Almost all of them involve exactly where
+we draw the line between the driver, Conductor, and Agent. I've written my
+proposal with the goal of having the Agent act mostly as a monitoring tool,
+reporting to the cloud operator or other Cyborg components to take action.
+A more active role for Cyborg Agent is possible but either requires significant
+synchronization with the Conductor or potentially steps on the toes of
+operators.
Data model impact
-----------------
-Cyborg Agent will create new entries in the database for accelerators it detects
-it will also update those entries with the current status of the accelerator
-at a high level. More temporary data like the current usage of a given accelerator
-will be broadcast via a message passing system and won't be stored.
+Cyborg Agent will create new entries in the database for accelerators it
+detects; it will also update those entries with the current status of the
+accelerator at a high level. More temporary data like the current usage of
+a given accelerator will be broadcast via a message passing system and won't
+be stored.
-Cyborg Agent will retain a local cache of this data with the goal of not losing accelerator
-state on system interruption or loss of connection.
+Cyborg Agent will retain a local cache of this data with the goal of not losing
+accelerator state on system interruption or loss of connection.
REST API impact
diff --git a/doc/specs/pike/approved/cyborg-api-proposal.rst b/doc/specs/pike/approved/cyborg-api-proposal.rst
index 42ddad7e..3d0397c8 100644
--- a/doc/specs/pike/approved/cyborg-api-proposal.rst
+++ b/doc/specs/pike/approved/cyborg-api-proposal.rst
@@ -23,11 +23,11 @@ Use Cases
---------
* As a user I want to be able to spawn VM with dedicated hardware, so
-that I can utilize provided hardware.
+  that I can utilize provided hardware.
* As a compute service I need to know how requested resource should be
-attached to the VM.
+  attached to the VM.
* As a scheduler service I'd like to know on which resource provider
-requested resource can be found.
+  requested resource can be found.
Proposed change
===============
@@ -38,26 +38,28 @@ for Cyborg.
Life Cycle Management Phases
----------------------------
-For cyborg, LCM phases include typical create, retrieve, update, delete operations.
-One thing should be noted that deprovisioning mainly refers to detach(delete) operation
-which deactivate an acceleration capability but preserve the resource itself
-for future usage. For Cyborg, from functional point of view, the LCM includes provision,
-attach,update,list, and detach. There is no notion of deprovisioning for Cyborg API
-in a sense that we decomission or disconnect an entire accelerator device from
-the bus.
+For Cyborg, LCM phases include the typical create, retrieve, update, and
+delete operations. Note that deprovisioning mainly refers to the detach
+(delete) operation, which deactivates an acceleration capability but
+preserves the resource itself for future use. For Cyborg, from a functional
+point of view, the LCM includes provision, attach, update, list, and detach.
+There is no notion of deprovisioning for the Cyborg API in the sense that we
+decommission or disconnect an entire accelerator device from the bus.
Difference between Provision and Attach/Detach
----------------------------------------------
-Noted that while the APIs support provisioning via CRUD operations, attach/detach
-are considered different:
+Note that while the APIs support provisioning via CRUD operations,
+attach/detach are considered different:
* Provision operations (create) will involve api->
-conductor->agent->driver workflow, where as attach/detach (update/delete) could be taken
-care of at the driver layer without the involvement of the pre-mentioned workflow. This
-is similar to the difference between create a volume and attach/detach a volume in Cinder.
+  conductor->agent->driver workflow, whereas attach/detach (update/delete)
+  could be taken care of at the driver layer without the involvement of the
+  aforementioned workflow. This is similar to the difference between creating
+  a volume and attaching/detaching a volume in Cinder.
-* The attach/detach in Cyborg API will mainly involved in DB status modification.
+* The attach/detach in the Cyborg API will mainly involve DB status
+  modification.
Difference between Attach/Detach To VM and Host
-----------------------------------------------
@@ -66,23 +68,23 @@ Moreover there are also differences when we attach an accelerator to a VM
or a host, similar to Cinder.
* When the attachment happens to a VM, we are expecting that Nova could call
-the virt driver to perform the action for the instance. In this case Nova
-needs to support the acc-attach and acc-detach action.
+ the virt driver to perform the action for the instance. In this case Nova + needs to support the acc-attach and acc-detach action. * When the attachment happens to a host, we are expecting that Cyborg could -take care of the action itself via Cyborg driver. Althrough currently there -is the generic driver to accomplish the job, we should consider a os-brick -like standalone lib for accelerator attach/detach operations. + take care of the action itself via Cyborg driver. Althrough currently there + is the generic driver to accomplish the job, we should consider a os-brick + like standalone lib for accelerator attach/detach operations. Alternatives ------------ * For attaching an accelerator to a VM, we could let Cyborg perform the action -itself, however it runs into the risk of tight-coupling with Nova of which Cyborg -needs to get instance related information. -* For attaching an accelerator to a host, we could consider to use Ironic drivers -however it might not bode well with the standalone accelerator rack scenarios where -accelerators are not attached to server at all. + itself, however it runs into the risk of tight-coupling with Nova of which + Cyborg needs to get instance related information. +* For attaching an accelerator to a host, we could consider to use Ironic + drivers however it might not bode well with the standalone accelerator rack + scenarios where accelerators are not attached to server at all. Data model impact ----------------- @@ -177,7 +179,7 @@ Example message body of the response to the GET operation:: } 'GET /accelerators/{uuid}' -************************* +************************** Retrieve a certain accelerator info indetified by '{uuid}' @@ -210,7 +212,7 @@ If the accelerator does not exist a `404 Not Found` must be returned. 'POST /accelerators/{uuid}' -******************* +*************************** Create a new accelerator @@ -252,7 +254,7 @@ A `409 Conflict` response code will be returned if another accelerator exists with the provided name. 'PUT /accelerators/{uuid}/{acc_spec}' -************************* +************************************* Update the spec for the accelerator identified by `{uuid}`. @@ -289,7 +291,7 @@ The returned HTTP response code will be one of the following: 'PUT /accelerators/{uuid}' -************************* +************************** Attach the accelerator identified by `{uuid}`. @@ -322,11 +324,13 @@ The body of the request and the response is empty. The returned HTTP response code will be one of the following: -* `204 No Content` if the request was successful and the accelerator was detached. +* `204 No Content` if the request was successful and the accelerator was + detached. * `404 Not Found` if the accelerator identified by `{uuid}` was not found. * `409 Conflict` if there exist allocations records for any of the - accelerator resource that would be detached as a result of detaching the accelerator. + accelerator resource that would be detached as a result of detaching + the accelerator. 
Security impact
@@ -373,7 +377,7 @@ Work Items
* Implement the APIs specified in this spec
* Proposal to Nova about the new accelerator
-attach/detach api
+  attach/detach api
* Implement the DB specified in this spec
diff --git a/doc/specs/pike/approved/cyborg-conductor.rst b/doc/specs/pike/approved/cyborg-conductor.rst
index a1e8ffcb..d2509642 100644
--- a/doc/specs/pike/approved/cyborg-conductor.rst
+++ b/doc/specs/pike/approved/cyborg-conductor.rst
@@ -30,23 +30,24 @@ Proposed change
Cyborg Conductor will reside on the control node and will be responsible for
stateful actions taken by Cyborg. Acting as both a cache to the database and
as a method of combining reads and writes to the database.
-All other Cyborg components will go through the conductor for database operations.
+All other Cyborg components will go through the conductor for database
+operations.
Alternatives
------------
Having each Cyborg Agent instance hit the database on it's own is a possible
-alternative, and it may even be feasible if the accelerator load monitoring rate is
-very low and the vast majority of operations are reads. But since we intend to store
-metadata about accelerator usage updated regularly this model probably will not scale
-well.
+alternative, and it may even be feasible if the accelerator load monitoring
+rate is very low and the vast majority of operations are reads. But since we
+intend to store metadata about accelerator usage that is updated regularly,
+this model probably will not scale well.
Data model impact
-----------------
-Using the conductor 'properly' will result in little or no per instance state and stateful
-operations moving through the conductor with the exception of some local caching where it
-can be garunteed to work well.
+Using the conductor 'properly' will result in little or no per-instance state
+and stateful operations moving through the conductor with the exception of
+some local caching where it can be guaranteed to work well.
REST API impact
---------------
@@ -120,8 +121,8 @@ CI using the dummy driver.
Documentation Impact
====================
-Some configuration values tuning save out rate and other parameters on the controller
-will need to be documented for end users
+Some configuration values tuning the save-out rate and other parameters on the
+controller will need to be documented for end users.
References
==========
diff --git a/doc/specs/pike/approved/cyborg-driver-proposal.rst b/doc/specs/pike/approved/cyborg-driver-proposal.rst
index 4fabf56d..63c17595 100644
--- a/doc/specs/pike/approved/cyborg-driver-proposal.rst
+++ b/doc/specs/pike/approved/cyborg-driver-proposal.rst
@@ -66,14 +66,15 @@ REST API impact
---------------
This blueprint proposes to add the following APIs:
-*cyborg install-driver
-*cyborg uninstall-driver
-*cyborg attach-instance
-*cyborg detach-instance
-*cyborg service-list
-*cyborg driver-list
-*cyborg update-driver
-*cyborg discover-services
+
+* cyborg install-driver
+* cyborg uninstall-driver
+* cyborg attach-instance
+* cyborg detach-instance
+* cyborg service-list
+* cyborg driver-list
+* cyborg update-driver
+* cyborg discover-services
Security impact
---------------
@@ -119,6 +120,7 @@ Work Items
----------
This change would entail the following:
+
* Add a feature to identify and discover attached accelerator backends.
* Add a feature to list services running on the backend
* Add a feature to attach accelerators to the generic backend.
diff --git a/doc/specs/queens/approved/cyborg-fpga-driver-proposal.rst b/doc/specs/queens/approved/cyborg-fpga-driver-proposal.rst index 4f01da33..26d8c6ba 100755 --- a/doc/specs/queens/approved/cyborg-fpga-driver-proposal.rst +++ b/doc/specs/queens/approved/cyborg-fpga-driver-proposal.rst @@ -17,10 +17,10 @@ Problem description A Field Programmable Gate Array(FPGA) is an integrated circuit designed to be configured by a customer or a designer after manufacturing. The advantage lies -in that they are sometimes significantly faster for some applications because of -their parallel nature and optimality in terms of the number of gates used for a -certain process. Hence, using FPGA for application acceleration in cloud has been -becoming desirable. +in that they are sometimes significantly faster for some applications because +of their parallel nature and optimality in terms of the number of gates used +for a certain process. Hence, using FPGA for application acceleration in cloud +has been becoming desirable. There is a management framwork in Cyborg [1]_ for heterogeneous accelerators, tracking and deploying FPGAs. This spec will add a FPGA driver for Cyborg to @@ -30,20 +30,20 @@ Use Cases --------- * When Cyborg agent starts or does resource checking periodically, the Cyborg -FPGA driver should enumerate the list of the FPGA devices, and report the -details of all available FPGA accelerators on the host, such as BDF(Bus, -Device, Function), PID(Product id) VID(Vendor id), IMAGE_ID and PF(Physical -Function)/VF(Virtual Function) type. + FPGA driver should enumerate the list of the FPGA devices, and report the + details of all available FPGA accelerators on the host, such as BDF(Bus, + Device, Function), PID(Product id) VID(Vendor id), IMAGE_ID and PF(Physical + Function)/VF(Virtual Function) type. * When user uses empty FPGA regions as their accelerators, Cyborg agent will -call driver's program() interface. Cyborg agent should provide BDF -of PF/VF, and local image path to the driver. More details can be found in ref -[2]_. + call driver's program() interface. Cyborg agent should provide BDF + of PF/VF, and local image path to the driver. More details can be found in + ref [2]_. * When there maybe more thant one vendor fpga card on a host, or on different -hosts in the cluster, Cyborg agent can discover the wendors easiy and -intelligently by Cyborg FPGA driver, and call the correct driver to execute -it's operations, such as discover() and program(). + hosts in the cluster, Cyborg agent can discover the wendors easiy and + intelligently by Cyborg FPGA driver, and call the correct driver to execute + it's operations, such as discover() and program(). Proposed changes @@ -54,27 +54,29 @@ discover/program interfaces for FPGA accelerator framework. The driver should include the follow functions: 1. discover() - driver reports devices as following: - [{ - "vendor": "0x8086", - "product": "bcc0", - "pr_num": 1, - "devices": "0000:be:00:0", - "path": "/sys/class/fpga/intel-fpga-dev.0", - "regions": [ - {"vendor": "0x8086", - "product": "bcc1", - "regions": 1, - "devices": "0000:be:00:1", - "path": "/sys/class/fpga/intel-fpga-dev.1" - }] - }] - pr_num: partial reconfiguration region numbers. 
+driver reports devices as following:: + + [{ + "vendor": "0x8086", + "product": "bcc0", + "pr_num": 1, + "devices": "0000:be:00:0", + "path": "/sys/class/fpga/intel-fpga-dev.0", + "regions": [ + {"vendor": "0x8086", + "product": "bcc1", + "regions": 1, + "devices": "0000:be:00:1", + "path": "/sys/class/fpga/intel-fpga-dev.1" + }] + }] + + pr_num: partial reconfiguration region numbers. 2. program(device_path, image) - program the image to a PR region specified by device_path. - device_path: the sys path of accelerator device. - image: The local path of programming image. + program the image to a PR region specified by device_path. + device_path: the sys path of accelerator device. + image: The local path of programming image. Image Format ---------------------------- @@ -161,7 +163,7 @@ Testing * Functional tests will be added to test Cyborg FPGA driver. Documentation Impact -=================== +==================== Document FPGA driver in the Cyborg project diff --git a/doc/specs/queens/approved/cyborg-fpga-model-proposal.rst b/doc/specs/queens/approved/cyborg-fpga-model-proposal.rst index 3a911c16..8184add0 100644 --- a/doc/specs/queens/approved/cyborg-fpga-model-proposal.rst +++ b/doc/specs/queens/approved/cyborg-fpga-model-proposal.rst @@ -11,32 +11,35 @@ Blueprint url is not available yet https://blueprints.launchpad.net/openstack-cyborg/+spec/cyborg-fpga-modelling -This spec proposes the DB modelling schema for tracking reprogrammable resources +This spec proposes the DB modelling schema for tracking reprogrammable +resources Problem description =================== A field-programmable gate array (FPGA) is an integrated circuit designed to be -configured by a customer or a designer after manufacturing. Their advantage lies -in that they are sometimes significantly faster for some applications because of -their parallel nature and optimality in terms of the number of gates used for a -certain process. Hence, using FPGA for application acceleration in cloud has been -becoming desirable. Cyborg as a management framwork for heterogeneous accelerators -,tracking and deploying FPGAs are much needed features. +configured by a customer or a designer after manufacturing. Their advantage +lies in that they are sometimes significantly faster for some applications +because of their parallel nature and optimality in terms of the number of gates +used for a certain process. Hence, using FPGA for application acceleration in +cloud has been becoming desirable. Cyborg as a management framwork for +heterogeneous accelerators, tracking and deploying FPGAs are much needed +features. Use Cases --------- -When user requests FPGA resources, scheduler will use placement agent [1]_ to select -appropriate hosts that have the requested FPGA resources. +When user requests FPGA resources, scheduler will use placement agent [1]_ to +select appropriate hosts that have the requested FPGA resources. -When a FPGA type resource is allocated to a VM, Cyborg needs to track down which -exact device has been assigned in the database. On the other hand, when the -resource is released, Cyborg will need to be detached and free the exact resource. +When a FPGA type resource is allocated to a VM, Cyborg needs to track down +which exact device has been assigned in the database. On the other hand, when +the resource is released, Cyborg will need to be detached and free the exact +resource. 
-When a new device is plugged in to the system(host), Cyborg needs to discover it -and store it into the database +When a new device is plugged in to the system(host), Cyborg needs to discover +it and store it into the database Proposed change =============== @@ -45,13 +48,14 @@ We need to add 2 more tables to Cyborg database, one for tracking all the deployables and one for arbitrary key-value pairs of deplyable associated attirbutes. These tables are named as Deployables and Attributes. -Deployables table consists of all the common attributes columns as well as a parent_id -and a root_id. The parent_id will point to the associated parent deployable and the -root_id will point to the associated root deployable. By doing this, we can form a -nested tree structure to represent different hierarchies. In addition, there will a -foreign key named accelerator_id reference to the accelerators table. For the case -where FPGA has not been loaded any bitstreams on it, they will still be tracked as -a Deployable but no other Deployables referencing to it. For instance, a network of +Deployables table consists of all the common attributes columns as well as +a parent_id and a root_id. The parent_id will point to the associated parent +deployable and the root_id will point to the associated root deployable. +By doing this, we can form a nested tree structure to represent different +hierarchies. In addition, there will a foreign key named accelerator_id +reference to the accelerators table. For the case where FPGA has not been +loaded any bitstreams on it, they will still be tracked as a Deployable but +no other Deployables referencing to it. For instance, a network of FPGA hierarchies can be formed using deployables in following scheme:: ------------------- @@ -71,17 +75,18 @@ FPGA hierarchies can be formed using deployables in following scheme:: ----------------- ----------------- -Attributes table consists of a key and a value columns to represent arbitrary k-v pairs. +Attributes table consists of a key and a value columns to represent arbitrary +k-v pairs. -For instance, bitstream_id and function kpi can be tracked in this table.In addition, -a foreign key deployable_id refers to the Deployables table and a parent_attribute_id -to form nested structured attribute relationships. +For instance, bitstream_id and function kpi can be tracked in this table. +In addition, a foreign key deployable_id refers to the Deployables table and +a parent_attribute_id to form nested structured attribute relationships. -Cyborg needs to have object classes to represent different types of deployables(e.g. -FPGA, Physical Functions, Virtual Functions etc). +Cyborg needs to have object classes to represent different types of +deployables(e.g. FPGA, Physical Functions, Virtual Functions etc). -Cyborg Agent needs to add feature to discover the FPGA resources from FPGA driver -and report them to the Cyborg DB through the conductor. +Cyborg Agent needs to add feature to discover the FPGA resources from FPGA +driver and report them to the Cyborg DB through the conductor. Conductor needs to add couple of sets of APIs for different types of deployable resources. @@ -89,21 +94,23 @@ resources. Alternatives ------------ -Alternativly, instead of having a flat table to track arbitrary hierarchies, we can use -two different tables in Cyborg database, one for physical functions and one for virtual -functions. physical_functions should have a foreign key constraint to reference the id in -Accelerators table. 
In addition, virtual_functions should have a foreign key constraint
-to reference the id in physical_functions.
+Alternatively, instead of having a flat table to track arbitrary hierarchies,
+we can use two different tables in Cyborg database, one for physical functions
+and one for virtual functions. physical_functions should have a foreign key
+constraint to reference the id in Accelerators table. In addition,
+virtual_functions should have a foreign key constraint to reference the id
+in physical_functions.
-The problems with this design are as follows. First, it can only track up to 3 hierarchies
-of resources. In case we need to add another layer, a lot of migaration work will
-be required. Second, even if we only need to add some new attribute to the existing
-resource type, we need to create new migration scripts for them. Overall the maintenance
-work is tedious.
+The problems with this design are as follows. First, it can only track up to
+3 hierarchies of resources. In case we need to add another layer, a lot of
+migration work will be required. Second, even if we only need to add some new
+attribute to the existing resource type, we need to create new migration
+scripts for them. Overall the maintenance work is tedious.
Data model impact
-----------------
-As discussed in previous sections, two tables will be added: Deployables and Attributes::
+As discussed in previous sections, two tables will be added: Deployables and
+Attributes::
CREATE TABLE Deployables
@@ -143,7 +150,8 @@ As discussed in previous sections, two tables will be added: Deployables and Att
RPC API impact
---------------
-Two sets of conductor APIs need to be added. 1 set for physical functions, 1 set for virtual functions
+Two sets of conductor APIs need to be added: one set for physical functions,
+one set for virtual functions.
Physical function apis::
@@ -161,9 +169,9 @@ Virtual function apis::
REST API impact
---------------
-Since these tables are not exposed to users for modifying/adding/deleting, Cyborg
-will only add two extra REST APIs to allow user query information related to
-deployables and their attributes.
+Since these tables are not exposed to users for modifying/adding/deleting,
+Cyborg will only add two extra REST APIs to allow users to query information
+related to deployables and their attributes.
API for retrieving Deployable's information::
diff --git a/doc/specs/queens/approved/cyborg-internal-api.rst b/doc/specs/queens/approved/cyborg-internal-api.rst
index 09662c2d..266ecd40 100644
--- a/doc/specs/queens/approved/cyborg-internal-api.rst
+++ b/doc/specs/queens/approved/cyborg-internal-api.rst
@@ -71,6 +71,8 @@ Driver 'POST /discovery'
Trigger the discovery and setup process for a specific driver
+.. code-block:: none
+
Content-Type: application/json
{
@@ -85,6 +87,8 @@ ready to use entires available by the public API. Hardware are physical
devices on nodes that may or may not be ready to use or even fully supported.
+.. code-block:: none
+
200 OK
Content-Type: application/json
@@ -125,6 +129,8 @@ Driver 'POST /hello'
Registers that a driver has been installed on the machine and is ready to use.
As well as it's endpoint and hardware support.
+.. code-block:: none
+
Content-Type: application/json
{
@@ -154,6 +160,8 @@ Conductor 'POST /hello'
Registers that an Agent has been installed on the machine and is ready to use.
+.. code-block:: none
+
Content-Type: application/json
{
diff --git a/doc/specs/queens/approved/cyborg-nova-interaction.rst b/doc/specs/queens/approved/cyborg-nova-interaction.rst
index 57cc781e..a67af0f4 100644
--- a/doc/specs/queens/approved/cyborg-nova-interaction.rst
+++ b/doc/specs/queens/approved/cyborg-nova-interaction.rst
@@ -13,10 +13,10 @@ https://blueprints.launchpad.net/cyborg/+spec/cyborg-nova-interaction
Cyborg, as a service for managing accelerators of any kind needs to cooperate
with Nova on two planes: Cyborg should be able to inform Nova about the
resources through placement API[1], so that scheduler can leverage user
-requests for particular functionality into assignment of specific resource using
-resource provider which possess an accelerator, and second, Cyborg should be
-able to provide information on how Nova compute can attach particular resource
-to VM.
+requests for particular functionality into assignment of specific resource
+using a resource provider which possesses an accelerator, and second, Cyborg
+should be able to provide information on how Nova compute can attach
+particular resource to VM.
In a nutshell, this blueprint will define how information between Nova and
Cyborg will be exchanged.
@@ -24,14 +24,14 @@ Cyborg will be exchanged.
Problem description
===================
-Currently in OpenStack the use of non-standard accelerator hardware is supported
-in that features exist across many of the core servers that allow these resources
-to be allocated, passed through, and eventually used.
+Currently in OpenStack the use of non-standard accelerator hardware is
+supported in that features exist across many of the core servers that allow
+these resources to be allocated, passed through, and eventually used.
-What remains a challenge though is the lack of an integrated workflow; there is no
-way to configure many of the accelerator features without significant by hand effort
-and service disruptions that go against the goals of having a easy, stable, and
-flexible cloud.
+What remains a challenge though is the lack of an integrated workflow; there
+is no way to configure many of the accelerator features without significant
+by-hand effort and service disruptions that go against the goals of having
+an easy, stable, and flexible cloud.
Cyborg exists to bring these disjoint efforts together into a more standard
workflow. While many components of this workflow already exist, some don't
@@ -53,7 +53,7 @@ used:
Proposed Workflow
-===============
+=================
Using a method not relevant to this proposal Cyborg Agent inspects hardware
and finds accelerators that it is interested in setting up for use.
@@ -66,21 +66,25 @@ One of the primary responsibilities of the Cyborg conductor is to keep the
placement API in sync with reality. For example if here is a device with a
virtual function or a FPGA with a given program Cyborg may be tasked with
changing the virtual function on the NIC or the program on the FPGA. At which
-point the previously specified traits and resources need to be updated. Likewise
-Cyborg will be watching monitoring Nova's instances to ensure that doing this
-doesn't pull resources out from under an allocated instance.
+point the previously specified traits and resources need to be updated.
+Likewise, Cyborg will be monitoring Nova's instances to ensure that
+doing this doesn't pull resources out from under an allocated instance.
At a high level what we need to be able to do is the following
-1.
Add a PCI device to Nova's whitelist live (config only / needs implementation) -2. Add information about this device to the placement API (existing / being worked) -3. Hotplug and unplug PCI devices from instances (existing / not sure how well maintained) +1. Add a PCI device to Nova's whitelist live + (config only / needs implementation) +2. Add information about this device to the placement API + (existing / being worked) +3. Hotplug and unplug PCI devices from instances + (existing / not sure how well maintained) Alternatives ------------ -Don't use Cyborg, struggle with bouncing services and grub config changes yourself. +Don't use Cyborg, struggle with bouncing services and grub config changes +yourself. Data model impact ----------------- @@ -146,8 +150,8 @@ Dependencies This design depends on the changes which may or may not be accepted in Nova project. Other than that is ongoing work on Nested resource providers: http://specs.openstack.org/openstack/nova-specs/specs/ocata/approved/nested-resource-providers.html -Which would be an essential feature in Placement API, which will be leveraged by -Cyborg. +Which would be an essential feature in Placement API, which will be leveraged +by Cyborg. Testing diff --git a/doc/specs/queens/approved/cyborg-spdk-driver-proposal.rst b/doc/specs/queens/approved/cyborg-spdk-driver-proposal.rst index 3d51f661..b7a18795 100755 --- a/doc/specs/queens/approved/cyborg-spdk-driver-proposal.rst +++ b/doc/specs/queens/approved/cyborg-spdk-driver-proposal.rst @@ -24,14 +24,14 @@ Use Cases --------- * When Cinder uses Ceph as its backend, the user should be able to -use the Cyborg SPDK driver to discover the SPDK accelerator backend, -enumerate the list of the Ceph nodes that have installed the SPDK. + use the Cyborg SPDK driver to discover the SPDK accelerator backend, + enumerate the list of the Ceph nodes that have installed the SPDK. * When Cinder directly uses SPDK's BlobStore as its backend, the user -should be able to accomplish the same life cycle management operations -for SPDK as mentioned above. After enumerating the SPDK, the user can -attach (install) SPDK on that node. When the task completes, the user -can also detach the SPDK from the node. Last but not least the user -should be able to update the latest and available SPDK. + should be able to accomplish the same life cycle management operations + for SPDK as mentioned above. After enumerating the SPDK, the user can + attach (install) SPDK on that node. When the task completes, the user + can also detach the SPDK from the node. Last but not least the user + should be able to update the latest and available SPDK. Proposed change =============== @@ -42,18 +42,18 @@ discover/list/update/attach/detach operations for SPDK framework. 
SPDK framework -------------- -The SPDK framework comprises of the following components: +The SPDK framework comprises of the following components:: - +-----------userspace--------+ +--------------+ - | +------+ +------+ +------+ | | +-----------+ | -+---+ | |DPDK | |NVMe | |NVMe | | | | Ceph | | -| N +-+-+NIC | |Target| |Driver+-+-+ |NVMe Device| | -| I | | |Driver| | | | | | | +-----------+ | -| C | | +------+ +------+ +------+ | | +-----------+ | -+---+ | +------------------------+ | | | Blobstore | | - | | DPDK Libraries | | | |NVMe Device| | - | +------------------------+ | | +-----------+ | - +----------------------------+ +---------------+ + +-----------userspace--------+ +--------------+ + | +------+ +------+ +------+ | | +-----------+ | + +---+ | |DPDK | |NVMe | |NVMe | | | | Ceph | | + | N +-+-+NIC | |Target| |Driver+-+-+ |NVMe Device| | + | I | | |Driver| | | | | | | +-----------+ | + | C | | +------+ +------+ +------+ | | +-----------+ | + +---+ | +------------------------+ | | | Blobstore | | + | | DPDK Libraries | | | |NVMe Device| | + | +------------------------+ | | +-----------+ | + +----------------------------+ +---------------+ BlobStore NVMe Device Format ---------------------------- @@ -87,25 +87,25 @@ avoids the filesystem, which improves efficiency. Life Cycle Management Phases ---------------------------- * We should be able to add a judgement whether the backend node has SPDK kit -in generic driver module. If true, initialize the DPDK environment (such as -hugepage). + in generic driver module. If true, initialize the DPDK environment (such as + hugepage). * Import the generic driver module, and then we should be able to -discover (probe) the system for SPDK. + discover (probe) the system for SPDK. * Determined by the backend storage scenario, enumerate (list) the optimal -SPDK node, returning a boolean value to judge whether the SPDK should be -attached. + SPDK node, returning a boolean value to judge whether the SPDK should be + attached. * After the node where SPDK will be running is attached, we can now send a -request about the information of namespaces, and then create an I/O queue -pair to submit read/write requests to a namespace. + request about the information of namespaces, and then create an I/O queue + pair to submit read/write requests to a namespace. * When Ceph is used as the backend, as the latest Ceph (such as Luminous) -uses the BlueStore to be the storage engine, BlueStore and BlobStore are -very similar things. We will not be able to use BlobStore to accelerate -Ceph, but we can use Ioat and poller to boost speed for storage. + uses the BlueStore to be the storage engine, BlueStore and BlobStore are + very similar things. We will not be able to use BlobStore to accelerate + Ceph, but we can use Ioat and poller to boost speed for storage. * When SPDK is used as the backend, we should be able to use BlobStore to -improve performance. + improve performance. * Whenever user requests, we should be able to detach the SPDK device. * Whenever user requests, we should be able to update SPDK to the latest and -stable release. + stable release. Alternatives ------------ @@ -116,19 +116,20 @@ Data model impact ----------------- * The Cyborg SPDK driver will notify Cyborg Agent to update the database -when discover/list/update/attach/detach operations take place. + when discover/list/update/attach/detach operations take place. 
REST API impact --------------- This blueprint proposes to add the following APIs: -*cyborg discover-driver(driver_type) -*cyborg driver-list(driver_type) -*cyborg install-driver(driver_id, driver_type) -*cyborg attach-instance -*cyborg detach-instance -*cyborg uninstall-driver(driver_id, driver_type) -*cyborg update-driver + +* cyborg discover-driver(driver_type) +* cyborg driver-list(driver_type) +* cyborg install-driver(driver_id, driver_type) +* cyborg attach-instance +* cyborg detach-instance +* cyborg uninstall-driver(driver_id, driver_type) +* cyborg update-driver Security impact --------------- @@ -176,7 +177,7 @@ Work Items * Implement the cyborg-spdk-driver in this spec. * Propose SPDK to py-spdk. The py-spdk is designed as a SPDK client -which provides the python binding. + which provides the python binding. Dependencies @@ -192,10 +193,10 @@ Testing * Unit tests will be added to test Cyborg SPDK driver. * Functional tests will be added to test Cyborg SPDK driver. For example: -discover-->list-->attach,whether the workflow can be passed successfully. + discover-->list-->attach,whether the workflow can be passed successfully. Documentation Impact -=================== +==================== Document SPDK driver in the Cyborg project diff --git a/doc/specs/template.rst b/doc/specs/template.rst index 6a1fe372..1f1c1bb7 100644 --- a/doc/specs/template.rst +++ b/doc/specs/template.rst @@ -99,8 +99,8 @@ If this is one part of a larger effort make it clear where this piece ends. In other words, what's the scope of this effort? At this point, if you would like to just get feedback on if the problem and -proposed change fit in Cyborg, you can stop here and post this for review to get -preliminary feedback. If so please say: +proposed change fit in Cyborg, you can stop here and post this for review to +get preliminary feedback. If so please say: Posting to get preliminary feedback on the scope of this spec. Alternatives diff --git a/test-requirements.txt b/test-requirements.txt index 6b0cc3e7..7b7fbc2f 100644 --- a/test-requirements.txt +++ b/test-requirements.txt @@ -20,3 +20,4 @@ sphinxcontrib-seqdiag # BSD reno # Apache-2.0 os-api-ref # Apache-2.0 tempest # Apache-2.0 +doc8>=0.6.0 # Apache-2.0 diff --git a/tox.ini b/tox.ini index 31d6a04d..8e1f4401 100644 --- a/tox.ini +++ b/tox.ini @@ -31,6 +31,7 @@ commands = [testenv:pep8] commands = pep8 {posargs} + doc8 {posargs} [testenv:pep8-constraints] install_command = {[testenv:common-constraints]install_command} @@ -42,6 +43,10 @@ commands = {posargs} [testenv:cover] commands = python setup.py testr --coverage --testr-args='{posargs}' +[doc8] +ignore-path = .venv,.git,.tox,*cyborg/locale*,*lib/python*,*cyborg.egg*,api-ref/build,doc/build,doc/source/contributor/api + + [testenv:docs] commands = python setup.py build_sphinx
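With doc8 added to test-requirements.txt and wired into the pep8 tox
environment above, the documentation style checks can be reproduced locally
before pushing a patch; a minimal sketch, assuming the tox environment and
[doc8] settings introduced by this change::

    # run the combined pep8 + doc8 checks the same way the gate does
    tox -e pep8

    # or invoke doc8 directly against the documentation trees
    pip install doc8
    doc8 doc/source doc/specs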