diff --git a/specs/stein/magnum-nodegroups.rst b/specs/stein/magnum-nodegroups.rst new file mode 100644 index 0000000..22d0391 --- /dev/null +++ b/specs/stein/magnum-nodegroups.rst @@ -0,0 +1,275 @@ +Magnum Nodegroups +================= + +Launchpad blueprint: + +https://blueprints.launchpad.net/magnum/+spec/magnum-nodegroups + +This is a proposal to extend the Magnum API adding support for nodegroups. + +Problem Description +------------------- + +Currently Magnum supports the creation of clusters with all the nodes in the +same availability zone. At the same time, the user has the ability to choose +one flavor for master nodes and one for worker nodes. + +The concept of nodegroups provides users with the ability to specify groups of +nodes with different properties. Within the scope of a group users are able to +define labels, used image, flavor, etc depending on the purpose these nodes are +going to be used for. + +This proposal tries to address the changes needed to support nodegroups with +Magnum. + +Use Cases +--------- + +1. As a user, I want to deploy heterogeneous workloads in the same cluster. + These can include sql databases with high iops requirements, caches + requiring large amounts of memory and batch jobs requiring a larger number + of cpus or even gpus. + +2. As a user I want to create higly available clusters with Magnum. + +Proposed Changes +---------------- + +The proposed change includes: + +* Add a new '/clusters/{cluster_id}/nodegroups' REST API endpoint to Magnum + providing management of the given cluster's nodegroups. This includes + nodegroup creation, update and deletion. + +* Add a new object to the data model to represent a nodegroup. + +* Change the cluster create procedure to create two default nodegroups, one + containing the master node(s) of the cluster and one containing the worker + node(s). + +* Adapt the cluster delete procedure to delete also the nodegroups associated + with the cluster being deleted. + +Check sections `Data Model Impact`_ and `REST API Impact`_ for more details. + + NOTE:: + As a first step, users will be able to create nodegroups containing only + worker nodes. This is because the scripts used for scaling up do not + support adding new master nodes to the cluster. This change is left as + future work and will be handled by another spec. + +Alternatives +------------ + +As an alternative to the proposed solution, a user could create multiple +independent clusters and connect them in one single federated control plane, +acting as one heterogeneous cluster. + +The problem is that there is no feature parity between the cluster and the +federation APIs and for the time being, cluster federation is supported only by +the Kubernetes COE. + +It seems that the concept of nodegroups takes care of the matter at hand, in a +more complete way. + +Data Model Impact +----------------- + +A new entity would be added (corresponding tables will be added): + +* **nodegroup** + + * uuid + * name + * cluster_uuid (the uuid of the cluster where the nodegroup belongs) + * project_id + * docker_volume_size + * labels + * flavor_id + * image_id + * node_addresses + * node_count + * role (shows if the nodegroup contains master or worker nodes for now) + +The project id could be fetched by the cluster, but we add it here also for +future use. This is the scenario where the master nodes belong to an operator +tenant and the cluster nodegroups belong to different projects. + +Adding the nodegroup entity means that some information currently stored in the +the cluster, should be moved to nodegroup table. The cluster columns that need +to be dropped are the following: + +* node_count +* master_count +* node_addresses +* master_addresses + + NOTE:: + It is really important to point out that moving information from the + cluster to the nodegroup table will NOT result in changing the output of + the existing CLIs. The only thing that will change is the way this + information is stored and subsequently fetched from the database. + e.g. The cluster show output will contain the node_count information but it + will be calculated at the API level by summing the node_count of all + the associated worker nodegroups. + +REST API Impact +--------------- + +This change leads to a minor version increase in the Magnum API, the +addition of a new REST endpoint and a new set of CLI commands. + +Below is a description of the commands to manage nodegroups: + +* add a new nodegroup, in an existing cluster:: + + openstack coe node-group create + +* delete an existing nodegroup:: + + openstack coe node-group delete + +* update an existing nodegroup:: + + openstack coe node-group update + +* list existing nodegroups given an existing cluster:: + + openstack coe node-group list + + +------+-------------+-------------+------------+-----------+ + | uuid | name | flavor id | node count | role | + +------+-------------+-------------+------------+-----------+ + | ... | nodegroup1 | flavor-1 | 3 | master | + +------+-------------+-------------+------------+-----------+ + | ... | nodegroup2 | flavor-2 | 5 | worker | + +------+-------------+-------------+------------+-----------+ + +* show details of an existing nodegroup:: + + openstack coe node-group show + + +---------------------+-------------------------------------------+ + | Property | Value | + +---------------------+-------------------------------------------+ + | uuid | 5b2ee3b5-2f85-4917-be7c-11a2c82031ad | + | name | nodegroup1 | + | cluster uuid | | + | project id | | + | docker volume size | 5 | + | labels | , , | + | flavor id | flavor1 | + | node count | 3 | + | node addresses | , , | + | role | master | + +---------------------+-------------------------------------------+ + +Backward Compatibility +---------------------- + +In this section we refer to the clusters created before the introduction of +Magnum Nodegroups as "old clusters". + +During the upgrade, the existing stacks will not be modified. This is the +reason that adding as well as deleting nodegroups to/from old clusters will be +not permitted. + +Showing details for a nodegroup in an old cluster should work correctly. + +Security Impact +--------------- + +There is no keypair added in the nodegroup object as all nodegroups will +inherit the one set to the cluster. This approach was chosen, in order to not +propagate the use of keypairs to the level of nodegroups and complicate further +their removal in the future. + +Notifications Impact +-------------------- + +New notifications will be added for: +* nodegroup creation +* nodegroup deletion +* nodegroup update + +Other End User Impact +--------------------- + +New subcommands will be added to the openstack client as described above. + +At the same time, some of the existing commands for managing clusters have to +be adapted: + +### Cluster Create ### +The existing create cluster cli will result in a cluster with two default +nodegroups, one for the master node(s) and one for the worker(s). + +### Cluster Delete ### +When the user deletes a cluster, all the associated nodegroups will be deleted +as well. There is no point of making the user delete all the nodegroups +separately before deleting the cluster. + +### Cluster Update ### +Cluster update should continue working for the already existing clusters and it +should be deprecated for the new ones. All scaling operations for new clusters +should be done using the "node-group update" command. + +### Cluster Show ### +Firstly, the node count of the cluster should reflect the sum of the node count +fields of all its nodegroups. +Another thing that has to be handled is showing the status of the cluster. The +show cluster cli should summarize the status of its nodegroups since each stack +has its own status. + +Developer Impact +---------------- + +None. + +Implementation +-------------- + +The implementation will be done in 4 phases. + +1. Add the new API endpoint and data model entity, and the corresponding + controller implementation linked to each driver. At this point we will + have all drivers declaring every operation regarding nodegroups as + 'Not Implemented'. At the same step, we need to adapt all the operations + for cluster management. + +2. Implement the nodegroup functionality for all drivers. + +3. Add the new command line tools to the openstack client. + +4. Implement the Magnum nodegroup notifications, for creation, deletion and + update. + +Assignee(s) +----------- + +Primary assignee: + + +Work Items +---------- + +See `Implementation`_. + +Testing +------- + +A new set of unit and functional tests covering creation, deletion and update +of nodegroups is needed. At the same time, the existing tests for cluster +creation, deletion and update should be adapted. + +Documentation Impact +-------------------- + +New documentation will be added to describe the new API endpoint and its +functionality as well as the changes in the existing cluster API. + +References +---------- + +Magnum Nodegroups Blueprint: +https://blueprints.launchpad.net/magnum/+spec/magnum-nodegroups