diff --git a/specs/proposal/cyborg-fpga-model-proposal.rst b/specs/proposal/cyborg-fpga-model-proposal.rst
new file mode 100644
index 00000000..0d146ad0
--- /dev/null
+++ b/specs/proposal/cyborg-fpga-model-proposal.rst
@@ -0,0 +1,335 @@
+..
+ This work is licensed under a Creative Commons Attribution 3.0 Unported
+ License.
+
+ http://creativecommons.org/licenses/by/3.0/legalcode
+
+==========================================
+ Cyborg FPGA Model Proposal
+==========================================
+
+Blueprint url is not available yet
+https://blueprints.launchpad.net/openstack-cyborg/+spec/cyborg-fpga-modelling
+
+This spec proposes the DB modelling schema for tracking reprogrammable resources.
+
+Problem description
+===================
+
+A field-programmable gate array (FPGA) is an integrated circuit designed to be
+configured by a customer or a designer after manufacturing. FPGAs can be
+significantly faster for some applications because of their parallel nature and
+their efficient use of gates for a given process. Hence, using FPGAs for
+application acceleration in the cloud is becoming desirable. Since Cyborg is a
+management framework for heterogeneous accelerators, tracking and deploying FPGAs
+are much-needed features.
+
+
+Use Cases
+---------
+
+When a user requests FPGA resources, the scheduler will use the placement agent [1]_
+to select appropriate hosts that have the requested FPGA resources.
+
+When an FPGA-type resource is allocated to a VM, Cyborg needs to track in the
+database exactly which device has been assigned. Conversely, when the resource is
+released, Cyborg needs to detach and free that exact resource.
+
+When a new device is plugged into the system (host), Cyborg needs to discover it
+and store it in the database.
+
+Proposed change
+===============
+
+We need to add two more tables to the Cyborg database: one for tracking all the
+deployables and one for arbitrary key-value pairs of deployable-associated
+attributes. These tables are named Deployables and Attributes.
+
+The Deployables table consists of all the common attribute columns as well as a
+parent_id and a root_id. The parent_id points to the associated parent deployable and
+the root_id points to the associated root deployable. By doing this, we can form a
+nested tree structure to represent different hierarchies. In addition, there will be a
+foreign key named accelerator_id that references the accelerators table. An FPGA
+that has no bitstream loaded on it is still tracked as a Deployable, but no other
+Deployables reference it. For instance, a network of FPGA hierarchies can be formed
+using deployables in the following scheme::
+
+                        -------------------
+    ------------------->|Deployable - FPGA|<--------------------
+    |                   -------------------                    |
+    |                           /\                             |
+    |                 root_id  /  \  parent_id/root_id         |
+    |                         /    \                           |
+    |         -----------------    -----------------           |
+    |         |Deployable - PF|    |Deployable - PF|           |
+    |         -----------------    -----------------           |
+    |                /\                                        |
+    |               /  \  parent_id                  root_id   |
+    |              /    \                                      |
+    -----------------  -----------------                       |
+    |Deployable - VF|  |Deployable - VF| -----------------------
+    -----------------  -----------------
+
+
+The Attributes table consists of key and value columns to represent arbitrary k-v pairs.
+
+For instance, bitstream_id and function kpi can be tracked in this table. In addition,
+a foreign key deployable_id refers to the Deployables table, and a parent_attribute_id
+is used to form nested structured attribute relationships.
+
+Cyborg needs to have object classes to represent the different types of deployables
+(e.g. FPGA, Physical Functions, Virtual Functions, etc.).
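As a rough illustration of how parent_id and root_id can encode the hierarchy in the diagram, the following Python sketch models a handful of deployables in memory. The `Deployable` dataclass, the helpers `children_of`, `tree_of` and `path_to_root`, and the sample ids are all hypothetical and exist only for this example; they are not the Cyborg object classes. The sketch also assumes that a root deployable stores its own id as root_id, which the spec does not state.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Deployable:
    """Hypothetical in-memory stand-in for one row of the Deployables table."""
    id: int
    type: str                 # e.g. "FPGA", "PF" or "VF"
    parent_id: Optional[int]  # None for the root deployable
    root_id: int              # assumption: a root deployable points at itself


def children_of(rows: List[Deployable], parent_id: int) -> List[Deployable]:
    """Return the direct children of a deployable (one level down)."""
    return [r for r in rows if r.parent_id == parent_id]


def tree_of(rows: List[Deployable], root_id: int) -> List[Deployable]:
    """Return every deployable of one tree using a single root_id filter."""
    return [r for r in rows if r.root_id == root_id]


def path_to_root(rows: List[Deployable], dep_id: int) -> List[int]:
    """Follow parent_id links from a deployable up to its root."""
    by_id: Dict[int, Deployable] = {r.id: r for r in rows}
    path = [dep_id]
    while by_id[path[-1]].parent_id is not None:
        path.append(by_id[path[-1]].parent_id)
    return path


# One FPGA with two physical functions; the first PF exposes two VFs.
rows = [
    Deployable(id=10, type="FPGA", parent_id=None, root_id=10),
    Deployable(id=11, type="PF", parent_id=10, root_id=10),
    Deployable(id=12, type="VF", parent_id=11, root_id=10),
    Deployable(id=13, type="VF", parent_id=11, root_id=10),
    Deployable(id=14, type="PF", parent_id=10, root_id=10),
]
```

Keeping root_id next to parent_id means a whole device tree can be fetched with one equality filter, while parent_id alone would require walking the tree level by level.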
+
+The Cyborg Agent needs a feature to discover FPGA resources from the FPGA driver
+and report them to the Cyborg DB through the conductor.
+
+The conductor needs a couple of sets of APIs for the different types of deployable
+resources.
+
+Alternatives
+------------
+
+Alternatively, instead of having a flat table to track arbitrary hierarchies, we can use
+two different tables in the Cyborg database, one for physical functions and one for virtual
+functions. physical_functions should have a foreign key constraint to reference the id in
+the Accelerators table. In addition, virtual_functions should have a foreign key constraint
+to reference the id in physical_functions.
+
+The problems with this design are as follows. First, it can only track up to 3 levels
+of resource hierarchy. If we need to add another layer, a lot of migration work will
+be required. Second, even if we only need to add a new attribute to an existing
+resource type, we need to create new migration scripts for it. Overall the maintenance
+work is tedious.
+
+Data model impact
+-----------------
+As discussed in previous sections, two tables will be added: Deployables and Attributes::
+
+
+    CREATE TABLE Deployables
+      (
+        id             INTEGER NOT NULL ,   /*Primary Key*/
+        parent_id      INTEGER ,            /*Pointer to the parent deployable's primary key*/
+        root_id        INTEGER ,            /*Pointer to the root deployable's primary key*/
+        name           VARCHAR2 (32 BYTE) , /*Name of the deployable*/
+        pcie_address   VARCHAR2 (32 BYTE) , /*pcie address which can be used for passthrough*/
+        uuid           VARCHAR2 (32 BYTE) , /*uuid v4 format for the deployable itself*/
+        node_id        VARCHAR2 (32 BYTE) , /*uuid v4 format to identify which host this deployable is located on*/
+        board          VARCHAR2 (16 BYTE) , /*Identify the model of the deployable(e.g. KU115)*/
+        vendor         VARCHAR2 (16 BYTE) , /*Identify the vendor of the deployable(e.g. Xilinx)*/
+        version        VARCHAR2 (32 BYTE) , /*Identify the version of the deployable(e.g. 1.2a)*/
+        type           VARCHAR2 (32) ,      /*Identify the type of the deployable(e.g. FPGA/PF/VF)*/
+        assignable     CHAR (1) ,           /*Represent if the deployable can be assigned to users*/
+        instance_id    VARCHAR2 (32 BYTE) , /*Represent which instance this deployable has been assigned to*/
+        availability   INTEGER NOT NULL,    /*enum type to represent the status of the deployable(e.g. allocated/claimed)*/
+        accelerator_id INTEGER NOT NULL     /*foreign key references to the accelerator table*/
+      ) ;
+    ALTER TABLE Deployables ADD CONSTRAINT Deployables_PK PRIMARY KEY ( id ) ;
+    ALTER TABLE Deployables ADD CONSTRAINT Deployables_accelerators_FK FOREIGN KEY ( accelerator_id ) REFERENCES accelerators ( id ) ;
+
+
+    CREATE TABLE Attributes
+      (
+        id                  INTEGER NOT NULL , /*Primary Key*/
+        deployable_id       INTEGER NOT NULL , /*foreign key references to the Deployables table*/
+        KEY                 CLOB ,             /*Attribute Key*/
+        value               CLOB ,             /*Attribute Value*/
+        parent_attribute_id INTEGER            /*Pointer to the parent attribute's primary key*/
+      ) ;
+    ALTER TABLE Attributes ADD CONSTRAINT Attributes_PK PRIMARY KEY ( id ) ;
+    ALTER TABLE Attributes ADD CONSTRAINT Attributes_Deployables_FK FOREIGN KEY ( deployable_id ) REFERENCES Deployables ( id ) ON
+    DELETE CASCADE ;
+
+
+RPC API impact
+---------------
+Two sets of conductor APIs need to be added: one set for physical functions and one set for virtual functions.
+
+Physical function APIs::
+
+    def physical_function_create(context, values)
+    def physical_function_get_all_by_filters(context, filters, sort_key='created_at', sort_dir='desc', limit=None, marker=None, columns_to_join=None)
+    def physical_function_update(context, uuid, values, expected=None)
+    def physical_function_destroy(context, uuid)
+
+Virtual function APIs::
+
+    def virtual_function_create(context, values)
+    def virtual_function_get_all_by_filters(context, filters, sort_key='created_at', sort_dir='desc', limit=None, marker=None, columns_to_join=None)
+    def virtual_function_update(context, uuid, values, expected=None)
+    def virtual_function_destroy(context, uuid)
+
+REST API impact
+---------------
+Since these tables are not exposed to users for modifying/adding/deleting, Cyborg
+will only add three extra REST APIs to allow users to query information related to
+deployables and their attributes.
+
+API for retrieving Deployable's information::
+
+    Url: {base_url}/accelerators/deployable/{uuid}
+    Method: GET
+    URL Params:
+        GET: uuid --> get deployable by uuid
+
+    Data Params:
+        None
+
+    Success Response:
+        GET:
+            Code: 200
+            Content: { deployable: {id : 12, parent_id: 11, root_id: 10, ....}}
+
+    Error Response
+        Code: 401 UNAUTHORIZED
+        Content: { error : "Log in" }
+        OR
+        Code: 422 Unprocessable Entity
+        Content: { error : "deployable uuid invalid" }
+
+    Sample Call:
+        To get the deployable with uuid=2864a139-c2cd-4f9f-abf3-44eb3f09b83c
+        $.ajax({
+          url: "/accelerators/deployable/2864a139-c2cd-4f9f-abf3-44eb3f09b83c",
+          dataType: "json",
+          type : "get",
+          success : function(r) {
+            console.log(r);
+          }
+        });
+
+API for retrieving list of Deployables with filters/attributes::
+
+    Url: {base_url}/accelerators/deployable
+    Method: GET
+    URL Params:
+        None
+
+    Data Params:
+        k-v pairs for filtering
+
+    Success Response:
+        GET:
+            Code: 200
+            Content: { deployables: [{id : 12, parent_id: 11, root_id: 10, ....}]}
+
+    Error Response
+        Code: 401 UNAUTHORIZED
+        Content: { error : "Log in" }
+        OR
+        Code: 422 Unprocessable Entity
+        Content: { error : "deployable uuid invalid" }
+
+    Sample Call:
+        To get a list of FPGAs with no bitstream loaded.
+        $.ajax({
+          url: "/accelerators/deployable",
+          data: {
+            "bitstream_id": null,
+            "type": "FPGA"
+          },
+          dataType: "json",
+          type : "get",
+          success : function(r) {
+            console.log(r);
+          }
+        });
+
+API for retrieving Deployable attributes' information::
+
+    Url: {base_url}/accelerators/deployable/{uuid}/attribute/{key}
+    Method: GET
+    URL Params:
+        GET: uuid --> uuid for the associated deployable
+             key --> key for the associated deployable
+
+    Data Params:
+        None
+
+    Success Response:
+        GET:
+            Code: 200
+            Content: { attribute: {key : value}}
+
+    Error Response
+        Code: 401 UNAUTHORIZED
+        Content: { error : "Log in" }
+        OR
+        Code: 422 Unprocessable Entity
+        Content: { error : "attribute key invalid" }
+
+    Sample Call:
+        To get the value of key=kpi for deployable with uuid=2864a139-c2cd-4f9f-abf3-44eb3f09b83c
+        $.ajax({
+          url: "/accelerators/deployable/2864a139-c2cd-4f9f-abf3-44eb3f09b83c/attribute/kpi",
+          dataType: "json",
+          type : "get",
+          success : function(r) {
+            console.log(r);
+          }
+        });
+
+Security impact
+---------------
+None
+
+Notifications impact
+--------------------
+None
+
+Other end user impact
+---------------------
+None
+
+Performance Impact
+------------------
+None
+
+Other deployer impact
+---------------------
+None
+
+Developer impact
+----------------
+
+New functionality will be available to developers as a result of this work.
+
+
+Implementation
+==============
+
+Assignee(s)
+-----------
+Primary assignee:
+  Li Liu
+
+Work Items
+----------
+* Create migration scripts to add two more tables to the database
+* Create models in sqlalchemy as well as related conductor APIs
+* Create corresponding objects
+* Create Conductor APIs to allow resource reporting
+
+
+Dependencies
+============
+
+Testing
+=======
+* Unit tests will be added to test the Cyborg generic driver.
+
+Documentation Impact
+====================
+Document FPGA Modelling in the Cyborg project
+
+References
+==========
+.. [1] https://docs.openstack.org/nova/latest/user/placement.html
+
+History
+=======
+
+.. list-table:: Revisions
+   :header-rows: 1
+