Introduce etcd for service coordination

The spec introduces etcd and tooz for the inspector service
coordination, which is a prerequisite for service split.

Group management will be used to decide which ironic-inspector
conductor service an RPC request is sent to, and distributed locking
support will help avoid races in a concurrent environment.

Change-Id: If2c228c4d2ebaf93d79c4cbf2cc39146f8f74086
Story: 2001842
Task: 30376
Kaifeng Wang 2019-04-08 16:16:32 +08:00
parent 110ec01268
commit 2a157e2630
1 changed files with 175 additions and 0 deletions

specs/etcd-coordination.rst Normal file

@ -0,0 +1,175 @@
..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

========================================
Incorporate ETCD as service coordination
========================================
https://storyboard.openstack.org/#!/story/2001842
This spec is part of the ironic-inspector HA work. To further split the
inspector service, it proposes introducing etcd as the base service for
coordination between the ironic-inspector API and conductor services.
Problem description
===================
As a result of previous work, the single-process ironic-inspector is logically
split into two services, both running under ``oslo.service``, namely
``ironic_inspector`` and ``ironic-inspector-conductor``.

Before the two services can be split into separate executables, we need to
address an existing functional test issue. The functional tests currently use
the fake messaging driver, which only works within a single process. We can
either add RabbitMQ support to the functional test environment or introduce
another messaging mechanism such as ``json-rpc``; the first solution is not
desirable.

Even once the services are split, we still face the challenge of service
coordination: with multiple inspector conductor services, we need a way to
prevent races between concurrent operations on the same node, and a way to
choose which inspector conductor a request should be delivered to.
Proposed change
===============
As etcd is already a base service for the OpenStack platform, this spec
proposes adding ``python-etcd3`` and ``tooz`` as project requirements for
service coordination. ``tooz`` encapsulates several features such as group
management and locking. For etcd, group management is only implemented on
top of API v3, thus ``python-etcd3`` is required.
All proposed work is implemented with tooz interfaces. Each service will
create a coordinator and keep it heartbeating. An example workflow for the
ironic-inspector API service:

#. Create a coordinator with the hostname.
#. Create a group "ironic-inspector-service-group", bypass if the group
   already exists.
#. Query group members upon each API request, randomly pick one conductor,
   generate a topic according to its hostname and send the RPC request.
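
As a rough illustration of the last API-side step, the sketch below picks a
conductor from the group members and derives a topic from it. It assumes an
already started tooz coordinator whose member IDs are the conductor hostnames.
The function name ``pick_conductor_topic``, the base topic and the
``<base>.<hostname>`` topic format are hypothetical and not defined by this
spec. Group creation (step 2) is omitted for brevity.

.. code-block:: python

    import random

    # Group name taken from the workflow above.
    SERVICE_GROUP = b"ironic-inspector-service-group"


    def pick_conductor_topic(coordinator,
                             base_topic="ironic-inspector-conductor",
                             group_id=SERVICE_GROUP):
        """Return an RPC topic for one randomly chosen conductor.

        ``coordinator`` is expected to behave like a started tooz
        coordinator: ``get_members()`` returns an async result whose
        ``get()`` yields the member IDs currently in the group.
        """
        members = coordinator.get_members(group_id).get()
        if not members:
            raise RuntimeError("no ironic-inspector conductor is registered")
        host = random.choice(sorted(members))
        if isinstance(host, bytes):
            # tooz member IDs are byte strings.
            host = host.decode("utf-8")
        # Hypothetical topic format: one topic per conductor host.
        return "%s.%s" % (base_topic, host)

With tooz installed, such a coordinator could be obtained with
``tooz.coordination.get_coordinator()``; the backend URL and member ID passed
to it depend on the deployment.
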

An example workflow for the ironic-inspector conductor service:

#. Create a coordinator with the hostname.
#. Join the group "ironic-inspector-service-group", create and join if the
   group does not exist.
#. Leave the group explicitly when the service shuts down.

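
The conductor-side steps could be sketched as a context manager, shown here
under the assumption that the coordinator is a tooz coordinator created with
the conductor hostname as its member ID. The helper name
``service_group_membership`` is hypothetical; ``join_group_create()`` is the
tooz helper that joins a group and creates it first when it is missing.

.. code-block:: python

    import contextlib

    # Group name taken from the workflow above.
    SERVICE_GROUP = b"ironic-inspector-service-group"


    @contextlib.contextmanager
    def service_group_membership(coordinator, group_id=SERVICE_GROUP):
        """Keep the conductor in the service group while it is running."""
        # start_heart=True asks tooz to run the heartbeat in the background.
        coordinator.start(start_heart=True)
        try:
            # Join the group, creating it first if it does not exist yet.
            coordinator.join_group_create(group_id)
            try:
                yield coordinator
            finally:
                # Leave explicitly on shutdown so the API services stop
                # routing requests to this conductor.
                coordinator.leave_group(group_id).get()
        finally:
            coordinator.stop()

A conductor would wrap its main loop in ``with service_group_membership(...)``
so that the group is left even when the service exits with an error.
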
ironic-inspector currently has no distributed locking support. This spec will
introduce an abstract lock layer and implement locking support based on tooz.
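
One possible shape for such a lock layer is sketched below. The class names
``BaseLock``, ``InternalLock`` and ``ToozLock`` are hypothetical and only
illustrate the idea; the tooz-backed variant assumes a started tooz
coordinator and relies on its ``get_lock()`` method, which takes a byte-string
lock name.

.. code-block:: python

    import abc
    import threading


    class BaseLock(abc.ABC):
        """Minimal interface that every lock implementation provides."""

        @abc.abstractmethod
        def acquire(self, blocking=True):
            """Try to take the lock; return True on success."""

        @abc.abstractmethod
        def release(self):
            """Release the lock."""

        def __enter__(self):
            self.acquire()
            return self

        def __exit__(self, exc_type, exc_value, traceback):
            self.release()


    class InternalLock(BaseLock):
        """Process-local lock, sufficient for a single-process deployment."""

        def __init__(self):
            self._lock = threading.Lock()

        def acquire(self, blocking=True):
            return self._lock.acquire(blocking)

        def release(self):
            self._lock.release()


    class ToozLock(BaseLock):
        """Distributed lock delegating to a tooz coordinator."""

        def __init__(self, coordinator, name):
            # tooz expects lock names as byte strings.
            self._lock = coordinator.get_lock(name.encode("utf-8"))

        def acquire(self, blocking=True):
            return self._lock.acquire(blocking=blocking)

        def release(self):
            self._lock.release()

Callers would only depend on ``BaseLock``, so the backend can be switched
without touching the code that locks a node.
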
Alternatives
------------
Though it is entirely workable to use the database as the coordination
source, just as ironic does, an implementation based on tooz would be much
lighter. tooz also supports multiple backends, which brings more possibilities
in deployment.
Data model impact
-----------------
None.
HTTP API impact
---------------
None.
Client (CLI) impact
-------------------
None.
Ironic python agent impact
--------------------------
None.
Ironic impact
-------------
None.
Performance and scalability impact
----------------------------------
There should be no obvious performance or scalability impact before the
services are actually split.
Security impact
---------------
None.
Deployer impact
---------------
A new configuration section ``etcd`` with the options below will be added to
support etcd operation:

* ``host`` and ``port``: specify the etcd service endpoint.
* ``ca_cert``, ``cert_key`` and ``cert_cert``: specify the SSL-related
  authentication settings.
* ``timeout``: connection timeout per request.
* ``user`` and ``password``: the username and password, if etcd
  authentication is required.
* ``group_path``: the name of the service group used to coordinate inspector
  services. It can be a key path, a key prefix or both. By default, the value
  will be ``/openstack/ironic-inspector/service-group``.
* ``lock_prefix``: a string prefix for lock names. For example, locking a node
  ``fake-node-uuid`` with prefix ``ironic-inspector`` will result in the lock
  name ``ironic-inspector.fake-node-uuid`` being passed to tooz.
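
To make the options more concrete, the sketch below parses a sample ``etcd``
section and builds a lock name from ``lock_prefix``. It uses the standard
library ``configparser`` purely for illustration; the real service would
register these options through its own configuration framework. The endpoint
and timeout values are made-up examples, and the helper names
``load_etcd_options`` and ``lock_name`` are hypothetical.

.. code-block:: python

    import configparser

    DEFAULT_GROUP_PATH = "/openstack/ironic-inspector/service-group"

    # Example values only; adjust to the actual etcd deployment.
    SAMPLE_CONFIG = """
    [etcd]
    host = 127.0.0.1
    port = 2379
    timeout = 30
    lock_prefix = ironic-inspector
    """.replace("\n    ", "\n")


    def load_etcd_options(text):
        """Parse the ``etcd`` section of an INI-style configuration."""
        parser = configparser.ConfigParser()
        parser.read_string(text)
        section = parser["etcd"]
        return {
            "host": section.get("host"),
            "port": section.getint("port"),
            "timeout": section.getint("timeout"),
            # Fall back to the default proposed by this spec.
            "group_path": section.get("group_path",
                                      fallback=DEFAULT_GROUP_PATH),
            "lock_prefix": section.get("lock_prefix"),
        }


    def lock_name(prefix, node_uuid):
        """Build the lock name passed to tooz for a given node."""
        return "%s.%s" % (prefix, node_uuid)

For instance, ``lock_name(options["lock_prefix"], "fake-node-uuid")`` with the
sample section above yields the lock name used in the ``lock_prefix``
description.
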
Developer impact
----------------
None.
Upgrades and Backwards Compatibility
------------------------------------
After this spec is implemented, etcd v3 will be a mandatory requirement for
the inspector service to work properly.
Implementation
==============
Assignee(s)
-----------
Primary assignee:
kaifeng - kaifeng.w@gmail.com
Other contributors:
None
Work Items
----------
Implement proposed work.
Dependencies
============
``python-etcd3`` and ``tooz`` are required libraries.
An etcd v3 service must be running in the same cloud.
Testing
=======
This will be covered by unit tests and bifrost.
References
==========
https://docs.openstack.org/tooz/latest/user/index.html