Add spec for implementing Nova Cells v2
Add a spec for adding support for deploying multiple Nova v2 cells with the Openstack Charms. Change-Id: I570410c33e16bda9e7b06bf7f44c29924b066320
This commit is contained in:
parent
a71e923773
commit
cc81bf49ac
|
@ -0,0 +1,343 @@
|
|||
..
|
||||
Copyright 2018 Canonical UK Ltd
|
||||
|
||||
This work is licensed under a Creative Commons Attribution 3.0
|
||||
Unported License.
|
||||
http://creativecommons.org/licenses/by/3.0/legalcode
|
||||
|
||||
..
|
||||
This template should be in ReSTructured text. Please do not delete
|
||||
any of the sections in this template. If you have nothing to say
|
||||
for a whole section, just write: "None". For help with syntax, see
|
||||
http://sphinx-doc.org/rest.html To test out your formatting, see
|
||||
http://www.tele3.cz/jbar/rest/rest.html
|
||||
|
||||
========
|
||||
Cells V2
|
||||
========
|
||||
|
||||
Nova cells v2 has been introduced over the Ocata and Pike cycles. In fact, all
|
||||
Pike deployments are now deployments using nova cells v2 usually using a single
|
||||
compute cell.
|
||||
|
||||
Problem Description
|
||||
===================
|
||||
|
||||
Nova cells v2 allows for a group of compute nodes to have their own dedicated
|
||||
database, message queue and conductor while still being administered through
|
||||
a central API service. This has the following benefits:
|
||||
|
||||
Reduced pressure on Rabbit and MySQL in large deployments
|
||||
---------------------------------------------------------
|
||||
|
||||
In even moderately sized clouds the database and message broker can quickly
|
||||
become a bottle neck. Cells can be used to alleviate that pressure by having a
|
||||
database and message queue per cell of compute nodes. It is worth noting that
|
||||
the charms already support having traffic for neutron etc in a separate rabbit
|
||||
instance.
|
||||
|
||||
Create multiple failure domains
|
||||
-------------------------------
|
||||
|
||||
Grouping compute cells with their local services allows the creation of
|
||||
discrete failure domains (from a nova POV at least).
|
||||
|
||||
Remote Compute cells (Edge computing)
|
||||
-------------------------------------
|
||||
|
||||
In some deployments a group of compute nodes maybe far removed (from a
|
||||
networking pov) from the central services. In this case it maybe useful to have
|
||||
the compute nodes act as a largely independent group.
|
||||
|
||||
Different SLAs per cell
|
||||
-----------------------
|
||||
|
||||
Different groups of compute nodes can have different levels of performance,
|
||||
HA. etc. A cell could have no local HA for the database, message queue and
|
||||
conductor for a development cell but the production cell could have
|
||||
significantly higher specification servers running clustered services.
|
||||
|
||||
(These use cases were paraphrased from `*4 <https://www.openstack.org/videos/sydney-2017/adding-cellsv2-to-your-existing-nova-deployment>`_.)
|
||||
|
||||
Proposed Change
|
||||
===============
|
||||
|
||||
To facilitate a cells v2 deployment a few relatively simple interfaces and
|
||||
relations need to be added. From a nova perspective the topology looks like `this <https://docs.openstack.org/nova/latest/_images/graphviz-d1099235724e647ca447c7bd6bf703c607ddf68f.png>`_.
|
||||
This spec proposes mapping that to this `charm topology <https://docs.google.com/drawings/d/1v5f8ow0aCGrKRIpg3uXsv2zolWsz3mGVGzLnbgUQpKQ/>`_.
|
||||
|
||||
Superconductor access to Child cells
|
||||
------------------------------------
|
||||
|
||||
The superconductor needs to be able to query the databases of the compute cells
|
||||
and to send and receive messages on the compute_cells message bus. The
|
||||
cleanest way to model this would be to have a direct Juju relation between the
|
||||
superconductor and the compute cells database and message bus. To facilitate
|
||||
this the following relations will be added to the nova-cloud-controller charm:
|
||||
|
||||
.. code-block:: yaml
|
||||
|
||||
requires:
|
||||
shared-db-cell:
|
||||
interface: mysql-shared
|
||||
amqp-cell:
|
||||
interface: rabbitmq
|
||||
|
||||
|
||||
Superconductor configuring child cells
|
||||
--------------------------------------
|
||||
|
||||
With the above change the superconductor has access to the child db and mq but
|
||||
does not know which compute cell name to associate with them. To solve this the
|
||||
nova-cloud-controller charm will have the following new relations:
|
||||
|
||||
|
||||
.. code-block:: yaml
|
||||
|
||||
provides:
|
||||
nova-cell-api:
|
||||
interface: cell
|
||||
requires:
|
||||
nova-cell:
|
||||
interface: cell
|
||||
|
||||
The new cell relation will be used to pass the cell name, db service name and
|
||||
message queue service name.
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
{
|
||||
'amqp-service': 'rabbitmq-server-cell1',
|
||||
'db-service': 'mysql-cell1',
|
||||
'cell-name': 'cell1',
|
||||
}
|
||||
|
||||
Given this information the superconductor can examine the service names that
|
||||
are attached to its shared-db-cell and amqp-cell relations and construct
|
||||
urls for them. The superconductor is then able to create the cell mapping in
|
||||
the api database by running:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
nova-manage cell_v2 create_cell \
|
||||
--name <cell_name> \
|
||||
--transport-url <transport_url> \
|
||||
--database_connection <database_connection>
|
||||
|
||||
The superconductor needs five relations to be in place and their corresponding
|
||||
contexts to be complete before the cell can be mapped. Given the
|
||||
nova-cloud-controller is a non-reactive charm special care will be needed to
|
||||
ensure that the cell mapping happens irrespective of the order in which those
|
||||
relations are completed.
|
||||
|
||||
Compute conductor no longer registering with keystone
|
||||
-----------------------------------------------------
|
||||
|
||||
The compute conductor does not need to register an endpoint with keystone nor
|
||||
does it need service credentials. As such the identity-service relation should
|
||||
not be used for compute cells. A guard should be put in place in the
|
||||
nova-cloud-controller charm to prevent a compute cells nova-cloud-controller
|
||||
from registering an incorrect endpoint in keystone.
|
||||
|
||||
Compute conductor cell name config option
|
||||
-----------------------------------------
|
||||
|
||||
The compute conductor needs to know its own cell name so that it can pass this
|
||||
information up to the superconductor. To allow this a new configuration option
|
||||
will be added to the nova-compute charm:
|
||||
|
||||
.. code-block:: yaml
|
||||
|
||||
options:
|
||||
cell-name:
|
||||
type: string
|
||||
default:
|
||||
description: |
|
||||
Name of the compute cell this controller is associated with. If this is
|
||||
left unset or set to api then it is assumed that this controller will
|
||||
be the top level api and cell0 controller.
|
||||
|
||||
Leaving the cell name unset assumes the current behaviour of associating the
|
||||
nova-cloud-controller with the api service, cell0 and cell1.
|
||||
|
||||
nova-compute service credentials
|
||||
--------------------------------
|
||||
|
||||
The nova-compute charm needs service credentials for RPC calls to the Nova
|
||||
Placement API and the Neutron API service. It currently gets these credentials
|
||||
via its cloud-compute relation which is ugly at best. However, given that the
|
||||
compute cells nova-cloud-controller will no longer have a relation with
|
||||
keystone it will not have any credentials to pass on to nova-compute. This is
|
||||
overcome by adding a cloud-credentials relation to the nova-compute charm.
|
||||
|
||||
.. code-block:: yaml
|
||||
|
||||
requires:
|
||||
cloud-credentials:
|
||||
interface: keystone-credentials
|
||||
|
||||
nova-compute will request a username based on its service name so that users
|
||||
for different cells can be distinguished from one another.
|
||||
|
||||
Bespoke vhosts and db names
|
||||
---------------------------
|
||||
|
||||
The ability to specify a nova db name and a rabbitmq vhost name should either
|
||||
be removed from the nova-cloud-controller charm or the new cell interface needs
|
||||
to support passing those up to the superconductor so that the superconductor
|
||||
can request access to the correct resources from the compute nodes database and
|
||||
message queue.
|
||||
|
||||
Disabling unused services
|
||||
-------------------------
|
||||
|
||||
The compute cells nova-cloud-controller only needs to run the conductor service
|
||||
and possible the console services. Unused services should be disabled by the
|
||||
charm.
|
||||
|
||||
New cell conductor charm?
|
||||
-------------------------
|
||||
|
||||
The nova cloud controller in a compute node only runs a small subset of the
|
||||
nova services and does not require a lot of the complexity that is baked
|
||||
into the current nova-cloud-controller charm. This begs the question of whether
|
||||
a new cut-down reactive charm that just runs the conductor would make sense.
|
||||
Most of the changes outlined above actually impact the superconductor rather
|
||||
than the compute conductor. However, looking at this the other way around the
|
||||
changes needed to allow the nova-cloud-controller charm to act as a child
|
||||
conductor are actually quite small and so probably do not warrant the creation
|
||||
of a new charm. It is probably worth noting some historical context here too,
|
||||
every time the decision has been made to create a charm which can operate in
|
||||
multiple modes that decision has been reversed at some cost at a later data (
|
||||
ceph being a prime example).
|
||||
|
||||
Taking all that into consideration a new charm will not be written and the
|
||||
existing nova-cloud-controller charm will be extended to add support for
|
||||
running as a compute conductor.
|
||||
|
||||
Message Queues
|
||||
--------------
|
||||
|
||||
There is flexibility around which message queue the non-nova services use. A
|
||||
dedicated rabbit instance could be created for them or they could reuse the
|
||||
rabbit instance the nova api service is using.
|
||||
|
||||
Telemetry etc
|
||||
--------------
|
||||
|
||||
This spec does not touch on integration with telemetry. However, this does
|
||||
require further investigation to ensure that message data can be collected.
|
||||
|
||||
Juju service names
|
||||
------------------
|
||||
|
||||
It will be useful, but not required, to embed the cell name in the service name
|
||||
of each component that is cell specific. Eg deploying services for cellN
|
||||
may look like this:
|
||||
|
||||
|
||||
.. code-block:: yaml
|
||||
|
||||
juju deploy nova-compute nova-compute-cellN
|
||||
juju deploy nova-cloud-controller nova-cloud-controller-cellN
|
||||
juju deploy mysql mysql-cellN
|
||||
juju deploy rabbitmq-server rabbitmq-server-cellN
|
||||
|
||||
|
||||
Alternatives
|
||||
------------
|
||||
|
||||
* Do nothing and do not support additional nova v2 cells.
|
||||
* Resurrect support for the deprecated and bug ridden cells v1
|
||||
|
||||
Implementation
|
||||
==============
|
||||
|
||||
Assignee(s)
|
||||
-----------
|
||||
|
||||
Primary assignee:
|
||||
Unknown
|
||||
|
||||
Gerrit Topic
|
||||
------------
|
||||
|
||||
Use Gerrit topic "<topic_name>" for all patches related to this spec.
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
git-review -t cellsv2
|
||||
|
||||
Existing Work
|
||||
-------------
|
||||
|
||||
As part of writing the spec prototype charms and a bundle were created
|
||||
for reference: `Bundle <https://gist.github.com/gnuoy/9ede4e9d426ea56951c664569e7ad957>`_
|
||||
and `charm diffs <https://gist.github.com/gnuoy/aff86d0ad616a890ba731a3cb7deef51>`_
|
||||
|
||||
Work Items
|
||||
----------
|
||||
|
||||
* Remove support for cells v1 from nova-compute and nova-cloud-controller
|
||||
charms
|
||||
* Add identity-context relation to nova-compute and ensure the supplied
|
||||
credentials are used when rendering placement and keystone sections in
|
||||
nova.conf
|
||||
* Add shared-db-cell relation to nova-cloud-controller assuming 'nova'
|
||||
database name when requesting access.
|
||||
* Add amqp-cell relations to nova-cloud-controller assuming 'openstack' vhost
|
||||
name when requesting access.
|
||||
* Add code for registering a cell to nova-cloud-controller. This will use the
|
||||
AMQ and SharedDB contexts from the shared-db-cell and amqp-cell relation
|
||||
to create the cell mapping.
|
||||
* Update nova.conf templates in nova-cloud-controller to only render api db
|
||||
url if the nova-cloud-controller is a superconductor.
|
||||
* Update db initialisation code to only run the relevant cell migration if not
|
||||
a superconductor.
|
||||
* Add nova-cell and nova-cell-api relations and ensure that the shared-db, amqp
|
||||
shared-db-cell, amqp-cell and nova-api-cell relations all attempt to register
|
||||
compute cells.
|
||||
* Write bundles to use cells topology
|
||||
* Check integration with other services (designate and telemetry in particular)
|
||||
|
||||
Repositories
|
||||
------------
|
||||
|
||||
No new repositories needed.
|
||||
|
||||
Documentation
|
||||
-------------
|
||||
|
||||
* READMEs of nova-cloud-controller and nova-compute will need updating to
|
||||
explain new relations and config options.
|
||||
* Blog with deployment walkthrough and explanation.
|
||||
* Update Openstack Charm documentation to explain how to do a multi-cell
|
||||
deployment
|
||||
* Add bundle to charm store.
|
||||
|
||||
Security
|
||||
--------
|
||||
|
||||
No new security risks that I am aware of
|
||||
|
||||
Testing
|
||||
-------
|
||||
|
||||
* A multi-cell topology is probably beyond the scope of amulet tests
|
||||
* Bundles added to openstack-charm-testing
|
||||
* Mojo specs
|
||||
|
||||
Dependencies
|
||||
============
|
||||
|
||||
None that I can think of
|
||||
|
||||
Credits
|
||||
-------
|
||||
|
||||
Much of the benefit of cells etc was lifted from \*4
|
||||
|
||||
\*1 https://docs.openstack.org/nova/pike/cli/nova-manage.html
|
||||
\*2 https://docs.openstack.org/nova/latest/user/cellsv2-layout.html
|
||||
\*3 https://bugs.launchpad.net/nova/+bug/1742421
|
||||
\*4 https://www.openstack.org/videos/sydney-2017/adding-cellsv2-to-your-existing-nova-deployment
|
Loading…
Reference in New Issue