Server Pools - Manager

Specifications for the new Pool Manager service needed for server pools.

Change-Id: I92c7cfb861980b1345527998b41c9187a8395c38
blueprint: server-pools-service
rjrjr 2014-08-12 01:27:45 -07:00 committed by Graham Hayes
parent b69bc4ad21
commit 4adb499301
1 changed files with 504 additions and 0 deletions

@@ -0,0 +1,504 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported License.
http://creativecommons.org/licenses/by/3.0/legalcode
..
====================
Server Pools Manager
====================
https://blueprints.launchpad.net/designate/+spec/server-pools-service
This specification outlines the Pool Manager, Central, backend driver,
and storage changes needed to support the new Pool Manager service.
Problem description
===================
Coordinating DNS operations across many different backends is difficult,
especially when there is a great number of DNS servers. A Pool Manager
service is needed to manage the changes from the Designate database to
the many DNS servers. A Pool Manager will also track the status of those
changes. When this specification is implemented, a Pool Manager will
be used to manage a pool with multiple DNS servers, even if those DNS
servers are of different types.
Proposed change
===============
API Changes
-----------
None
Pool Manager Changes
--------------------
A new Designate service, called designate-pool-manager, will be created.
This is the Pool Manager. The Pool Manager will get its configuration
from the configuration file when it is instantiated.
The configuration section is called **[service:pool_manager]**. The options
for this section are:
+--------------------------+-------------+--------------+--------------------------------------------------------------------------------------------------------+
| **Parameter** | **Default** | **Required** | **Notes** |
+==========================+=============+==============+========================================================================================================+
| *pool_name* | 'default' | Yes | The pool name of the pool managed by this instance of the Pool Manager |
+--------------------------+-------------+--------------+--------------------------------------------------------------------------------------------------------+
| *threshold_percentage* | 100 | Yes | The percentage of servers requiring a successful update for a domain change to be considered active |
+--------------------------+-------------+--------------+--------------------------------------------------------------------------------------------------------+
| *poll_timeout* | 30 | Yes | The time to wait for a NOTIFY response from a name server |
+--------------------------+-------------+--------------+--------------------------------------------------------------------------------------------------------+
| *poll_retry_interval* | 2 | Yes | The time between retrying to send a NOTIFY request and waiting for a NOTIFY response |
+--------------------------+-------------+--------------+--------------------------------------------------------------------------------------------------------+
| *poll_max_retries* | 3 | Yes | The maximum number of times minidns will retry sending a NOTIFY request and wait for a NOTIFY response |
+--------------------------+-------------+--------------+--------------------------------------------------------------------------------------------------------+
| *periodic_sync_interval* | 120 | Yes | The time between synchronizing the servers with Storage |
+--------------------------+-------------+--------------+--------------------------------------------------------------------------------------------------------+
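As an illustration of the options above, the following minimal sketch parses a sample **[service:pool_manager]** section and returns typed values. It uses the standard library ``configparser`` purely as a stand-in; the function name ``load_pool_manager_options`` and the sample text are assumptions made for this example, not part of the specification.

```python
import configparser

# Sample configuration text. The values are the defaults listed in the
# options table; the layout is an assumption made for illustration.
SAMPLE = """
[service:pool_manager]
pool_name = default
threshold_percentage = 100
poll_timeout = 30
poll_retry_interval = 2
poll_max_retries = 3
periodic_sync_interval = 120
"""


def load_pool_manager_options(text):
    """Return the [service:pool_manager] options as a dict of typed values."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["service:pool_manager"]
    return {
        "pool_name": section.get("pool_name", fallback="default"),
        "threshold_percentage": section.getint(
            "threshold_percentage", fallback=100),
        "poll_timeout": section.getint("poll_timeout", fallback=30),
        "poll_retry_interval": section.getint(
            "poll_retry_interval", fallback=2),
        "poll_max_retries": section.getint("poll_max_retries", fallback=3),
        "periodic_sync_interval": section.getint(
            "periodic_sync_interval", fallback=120),
    }


if __name__ == "__main__":
    for name, value in load_pool_manager_options(SAMPLE).items():
        print(name, "=", value)
```

The fallbacks mirror the defaults in the table, so a section that omits an option still yields a usable value.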
The Pool Manager will contain a map of the servers to instantiated
backend drivers. This map will be created when the Pool Manager is
instantiated. The backend drivers will not read the configuration
themselves. Instead, the Pool Manager will read the global backend driver
and server specific backend driver sections from the configuration file
and pass that configuration to each backend driver at instantiation.
Please refer to the Backend Driver Changes section in the Server Pools -
Storage specification for more information concerning the global backend
driver and server specific backend driver sections.
The methods in the base class for the Pool Manager service include:
create_domain(context, domain)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+---------------+-------------------------------+--------------+
| **Parameter** | **Description** | **Required** |
+===============+===============================+==============+
| *context* | Security context information. | Yes |
+---------------+-------------------------------+--------------+
| *domain* | The designate domain object. | Yes |
+---------------+-------------------------------+--------------+
Return Value
""""""""""""
No return value.
Design Considerations
"""""""""""""""""""""
Loop through each server in the pool and call the backend driver to create
the domain. For each call to the backend driver, the status is stored in the
pool_manager_status table in a row with an action of 'CREATE', and a second
row is created with an action of 'UPDATE'. Successful creations have a status
of 'SUCCESS' and failed creations have a status of 'ERROR'. The 'UPDATE'
action row has no initial status. Check to see if a consensus exists using the
pool_manager_status table. Consensus exists if the percentage of servers for
the domain with a successful creation meets or exceeds the
*threshold_percentage*. If consensus exists, the Central **update_status**
method is called using the serial number used when creating the domain and a
status of 'SUCCESS'. If consensus does not exist, the Central
**update_status** method is called using the serial number used when creating
the domain and a status of 'ERROR'.
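The consensus check described here can be sketched as follows. This is a minimal illustration, not the service's implementation: the function names are hypothetical, and the comparison is written as "at least the threshold", which is an assumption, since a strict "exceeds" could never hold at the default *threshold_percentage* of 100.

```python
def has_consensus(statuses, threshold_percentage=100):
    """Return True when enough servers report 'SUCCESS'.

    statuses holds one status string per server in the pool for a single
    domain and action, as they would be read from pool_manager_status.
    """
    if not statuses:
        return False  # an empty pool cannot reach consensus
    successes = sum(1 for status in statuses if status == "SUCCESS")
    # Integer form of: successes / total >= threshold_percentage / 100
    return successes * 100 >= threshold_percentage * len(statuses)


def status_for_central(statuses, threshold_percentage=100):
    """Status the Pool Manager would pass to the Central update_status."""
    if has_consensus(statuses, threshold_percentage):
        return "SUCCESS"
    return "ERROR"


if __name__ == "__main__":
    pool = ["SUCCESS", "SUCCESS", "ERROR"]
    print(status_for_central(pool, threshold_percentage=66))
    print(status_for_central(pool, threshold_percentage=100))
```

The same check applies to deletions, with the statuses taken from the 'DELETE' rows instead of the 'CREATE' rows.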
Cast vs. Call
"""""""""""""
This is an RPC cast. Communication about the status of the domain
creation will be handled using the Central **update_status** method.
delete_domain(context, domain)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+---------------+-------------------------------+--------------+
| **Parameter** | **Description** | **Required** |
+===============+===============================+==============+
| *context* | Security context information. | Yes |
+---------------+-------------------------------+--------------+
| *domain* | The designate domain object. | Yes |
+---------------+-------------------------------+--------------+
Return Value
""""""""""""
No return value.
Design Considerations
"""""""""""""""""""""
Loop through each server in the pool and call the backend driver to delete
the domain. For each call to the backend driver, the status is stored in the
pool_manager_status table with an action of 'DELETE'. Successful deletions
have a status of 'SUCCESS' and failed deletions have a status of 'ERROR'.
Check to see if a consensus exists using the pool_manager_status table.
Consensus exists if the percentage of servers for the domain with a
successful deletion meets or exceeds the *threshold_percentage*. If consensus
exists, the Central **update_status** method is called using the serial
number used when deleting the domain and a status of 'SUCCESS'. If consensus
does not exist, the Central **update_status** method is called using the
serial number used when deleting the domain and a status of 'ERROR'.
Cast vs. Call
"""""""""""""
This is an RPC cast. Communication about the status of the domain
deletion will be handled using the Central **update_status** method.
update_domain(context, domain)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+---------------+-------------------------------+--------------+
| **Parameter** | **Description** | **Required** |
+===============+===============================+==============+
| *context* | Security context information. | Yes |
+---------------+-------------------------------+--------------+
| *domain* | The designate domain object. | Yes |
+---------------+-------------------------------+--------------+
Return Value
""""""""""""
No return value.
Design Considerations
"""""""""""""""""""""
Loop through each server in the pool and call the minidns
**notify_zone_changed** method. Loop through each server again and call
the minidns **poll_for_serial_number** method.
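The two passes can be sketched as below. The minidns method names come from this specification, but their argument lists are not defined here, so the ``(context, domain, server)`` signature and the ``RecordingMiniDNS`` stand-in are assumptions made only to keep the example runnable.

```python
class RecordingMiniDNS:
    """Stand-in for the minidns API that only records the calls it gets."""

    def __init__(self):
        self.calls = []

    def notify_zone_changed(self, context, domain, server):
        self.calls.append(("notify", server))

    def poll_for_serial_number(self, context, domain, server):
        self.calls.append(("poll", server))


def update_domain(minidns, context, domain, servers):
    """Notify every server first, then start polling every server."""
    for server in servers:
        minidns.notify_zone_changed(context, domain, server)
    for server in servers:
        minidns.poll_for_serial_number(context, domain, server)


if __name__ == "__main__":
    minidns = RecordingMiniDNS()
    update_domain(minidns, None, "example.org.", ["ns1", "ns2"])
    print(minidns.calls)
```

Sending all NOTIFY requests before any polling starts gives every server a chance to begin its transfer before the first serial number check.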
Cast vs. Call
"""""""""""""
This is an RPC cast. Communication about the status of the domain update
will be handled using the Central **update_status** method, which is
called by the Pool Manager **update_status** method. The minidns
**poll_for_serial_number** method invokes the Pool Manager
**update_status** method when it completes.
update_status(context, domain, name_server, status, serial_number)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+-----------------+-----------------------------------------------------------------+--------------+
| **Parameter** | **Description** | **Required** |
+=================+=================================================================+==============+
| *context* | Security context information. | Yes |
+-----------------+-----------------------------------------------------------------+--------------+
| *domain* | The designate domain object. | Yes |
+-----------------+-----------------------------------------------------------------+--------------+
| *name_server* | The name server for which this serial number is applicable. | Yes |
+-----------------+-----------------------------------------------------------------+--------------+
| *status* | The status, 'SUCCESS' or 'ERROR'. | Yes |
+-----------------+-----------------------------------------------------------------+--------------+
| *serial_number* | The serial number received from the name server for the domain. | Yes |
+-----------------+-----------------------------------------------------------------+--------------+
Return Value
""""""""""""
No return value.
Design Considerations
"""""""""""""""""""""
Read the existing serial number from the pool_manager_status table for the
server and domain. If the new serial number is greater than the existing
serial number, update the row and check to see if a consensus exists using
the pool_manager_status table. Consensus exists if the percentage of servers
for the domain with a serial number greater than the existing serial number
meets or exceeds the *threshold_percentage*. Servers are discounted from
participating in the consensus, starting with the servers with the lowest
serial numbers, until only the minimum number of servers needed to achieve
consensus based on the *threshold_percentage* remains. If the existing serial
number is less than all the serial numbers for the remaining servers, the
Central **update_status** method is called using the lowest (consensus)
serial number for those remaining servers and a status of 'SUCCESS'.
If more than 100 - *threshold_percentage* percent of the servers for the
domain have a status of 'ERROR', the Central **update_status** method is
called using the lowest serial number greater than the consensus serial
number (calculated above) and a status of 'ERROR'.
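One way to read the consensus serial number calculation is sketched below: keep only as many of the highest serial numbers as the threshold requires, then take the lowest of those. The function name is hypothetical, *threshold_percentage* is assumed to be an integer, and serial number wraparound is ignored for simplicity.

```python
def consensus_serial(serials, threshold_percentage=100):
    """Return the highest serial number that enough servers have reached.

    serials holds the latest serial number reported by each server in the
    pool for one domain. Returns None for an empty pool.
    """
    if not serials:
        return None
    total = len(serials)
    # Minimum number of servers needed, rounded up, using integer math.
    needed = (total * threshold_percentage + 99) // 100
    needed = min(max(needed, 1), total)
    # Discount the servers with the lowest serial numbers; the lowest
    # serial number among the remaining servers is the consensus value.
    ranked = sorted(serials, reverse=True)
    return ranked[needed - 1]


if __name__ == "__main__":
    reported = [10, 12, 12, 11]
    for threshold in (50, 75, 100):
        print(threshold, consensus_serial(reported, threshold))
```

With the default threshold of 100 no server is discounted, so the consensus serial number is simply the lowest one reported in the pool.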
Cast vs. Call
"""""""""""""
This is an RPC cast.
periodic_sync()
^^^^^^^^^^^^^^^
Return Value
""""""""""""
No return value.
Design Considerations
"""""""""""""""""""""
This method is a thread that is created when the Pool Manager is instantiated.
The intent of this thread is to read the pool_manager_status table and
retry failed create, delete, and update operations. Additionally, the
thread will call the minidns **poll_for_serial_number** method for each
domain and server to ensure the server is synchronized with Storage.

Every *periodic_sync_interval*, this thread will perform the following
operations:

* Read the pool_manager_status table looking for 'CREATE' actions that
  have a status of 'ERROR', grouping by domain and ordering by the row
  create time. Check to see if a consensus already exists for the domain
  creation. Loop through each server with a failed creation, using the
  backend driver to attempt creation. If consensus does not already exist,
  check for consensus and call the Central **update_status** method if
  consensus is achieved.
* Read the pool_manager_status table looking for 'DELETE' actions that
  have a status of 'ERROR', grouping by domain and ordering by the row
  create time. Check to see if a consensus already exists for the domain
  deletion. Loop through each server with a failed deletion, using the
  backend driver to attempt deletion. If consensus does not already exist,
  check for consensus and call the Central **update_status** method if
  consensus is achieved.
* For each domain in the pool, read the domain's serial number from Storage.
  Loop through each server in the pool and read the pool_manager_status
  table looking for 'UPDATE' actions for the domain that have a serial
  number less than the domain's serial number, and call the minidns
  **notify_zone_changed** method.
* Finally, for each domain in the pool, read the domain's serial number
  from Storage. Loop through each server in the pool and call the minidns
  **poll_for_serial_number** method.
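The selection of failed operations to retry can be sketched as a small grouping step. Rows are modeled as plain dictionaries, the function name is hypothetical, and the ``created_at`` key is an assumption: the text orders by row create time, while the pool_manager_status table defined in this specification lists only ``updated_at``.

```python
def failed_servers_by_domain(rows, action):
    """Group the servers with a failed action by domain, oldest row first.

    Returns a dict mapping domain_id to the list of server_ids to retry.
    Domains appear in the order of their oldest failed row.
    """
    failed = [row for row in rows
              if row["action"] == action and row["status"] == "ERROR"]
    failed.sort(key=lambda row: row["created_at"])
    grouped = {}
    for row in failed:
        grouped.setdefault(row["domain_id"], []).append(row["server_id"])
    return grouped


if __name__ == "__main__":
    rows = [
        {"domain_id": "d1", "server_id": "s1", "action": "CREATE",
         "status": "ERROR", "created_at": 2},
        {"domain_id": "d2", "server_id": "s1", "action": "CREATE",
         "status": "ERROR", "created_at": 1},
        {"domain_id": "d1", "server_id": "s2", "action": "CREATE",
         "status": "SUCCESS", "created_at": 3},
    ]
    for domain_id, servers in failed_servers_by_domain(rows, "CREATE").items():
        print(domain_id, servers)
```

The same helper would serve the 'DELETE' pass by passing ``"DELETE"`` as the action.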
Central Changes
---------------
The Central service will be updated to use the Pool Manager instead of the
backend driver. Additionally, the default_pool_name option will be removed
from the **[service:central]** section of the Designate configuration.
All domains will initially have a status of 'PENDING', and calls to the
Central **update_status** method by the Pool Manager will change the status.
When creating, updating, or deleting records, records will have the serial
number field set to the new serial number of the domain. The task will be
'ADD', 'DELETE', or 'UPDATE' corresponding to the operation. The status
will be 'PENDING'.
Valid record states are:
+----------+------------+
| **task** | **status** |
+==========+============+
| 'ADD' | 'PENDING' |
+----------+------------+
| 'ADD' | 'ERROR' |
+----------+------------+
| 'DELETE' | 'PENDING' |
+----------+------------+
| 'DELETE' | 'ERROR' |
+----------+------------+
| 'UPDATE' | 'PENDING' |
+----------+------------+
| 'UPDATE' | 'ERROR' |
+----------+------------+
| 'NONE' | 'ACTIVE' |
+----------+------------+
| 'NONE' | 'DELETED' |
+----------+------------+
Affected code in the Central service will be updated appropriately to align
with these states.
The new method needed to update the status of domains and records is:
update_status(context, domain, status, serial_number)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+-----------------+---------------------------------------------+--------------+
| **Parameter** | **Description** | **Required** |
+=================+=============================================+==============+
| *context* | Security context information. | Yes |
+-----------------+---------------------------------------------+--------------+
| *domain* | The designate domain object. | Yes |
+-----------------+---------------------------------------------+--------------+
| *status* | The status, 'SUCCESS' or 'ERROR'. | Yes |
+-----------------+---------------------------------------------+--------------+
| *serial_number* | The consensus serial number for the domain. | Yes |
+-----------------+---------------------------------------------+--------------+
Return Value
""""""""""""
No return value.
Design Considerations
"""""""""""""""""""""
If the status is 'SUCCESS':

* Check the status of the domain. If it has a status of 'PENDING' or
  'ERROR', set the status to 'ACTIVE'.
* Check the status of records for the domain. If they have a task of
  'ADD' or 'UPDATE' and a status of 'PENDING' or 'ERROR', set the task
  to 'NONE' and the status to 'ACTIVE' if the consensus serial number >=
  serial number field.
* Check the status of records for the domain. If they have a task of
  'DELETE' and a status of 'PENDING' or 'ERROR', set the task to 'NONE' and
  the status to 'DELETED' if the consensus serial number >= serial number
  field.

If the status is 'ERROR':

* Check the status of the domain. If it has a status of 'PENDING', set the
  status to 'ERROR'.
* Check the status of records for the domain. If they have a status of
  'PENDING', set the status to 'ERROR' if the consensus serial number >=
  serial number field.
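The state transitions for the Central **update_status** method can be sketched as follows. Domains and records are modeled as plain dictionaries rather than Designate objects, and the function name ``apply_update_status`` is hypothetical; this is an illustration of the rules, not the Central implementation.

```python
def apply_update_status(domain, records, status, consensus_serial):
    """Apply a Pool Manager status report to a domain and its records.

    domain is a dict with a 'status' key; each record is a dict with
    'task', 'status' and 'serial_number' keys. Both are updated in place.
    """
    if status == "SUCCESS":
        if domain["status"] in ("PENDING", "ERROR"):
            domain["status"] = "ACTIVE"
        for record in records:
            if record["status"] not in ("PENDING", "ERROR"):
                continue
            if consensus_serial < record["serial_number"]:
                continue  # this change has not reached consensus yet
            if record["task"] in ("ADD", "UPDATE"):
                record["task"], record["status"] = "NONE", "ACTIVE"
            elif record["task"] == "DELETE":
                record["task"], record["status"] = "NONE", "DELETED"
    elif status == "ERROR":
        if domain["status"] == "PENDING":
            domain["status"] = "ERROR"
        for record in records:
            if (record["status"] == "PENDING"
                    and consensus_serial >= record["serial_number"]):
                record["status"] = "ERROR"


if __name__ == "__main__":
    domain = {"status": "PENDING"}
    records = [
        {"task": "ADD", "status": "PENDING", "serial_number": 5},
        {"task": "UPDATE", "status": "PENDING", "serial_number": 7},
    ]
    apply_update_status(domain, records, "SUCCESS", 5)
    print(domain, records)
```

In the usage example the record with serial number 7 stays 'PENDING', because the consensus serial number 5 has not yet caught up with it.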
Cast vs. Call
"""""""""""""
This is an RPC call.
Backend Driver Changes
----------------------
The backend driver will now be instantiated with information provided by
the Pool Manager as explained in the Pool Manager Changes section. This is
necessary because of server specific backend driver configurations.
The backend driver will continue to support the same configuration options
it currently does; only the section names will change, gaining a wildcard
qualifier for the server. For example, the backend driver section for
PowerDNS will now be **[backend:powerdns:*]**. This syntax will denote the
global configuration for the backend driver. This is done to allow for
server specific backend driver configurations.
The new server specific backend driver section in the configuration will be
**[backend:powerdns:<uuid>]** where uuid is a universally unique identifier.
The options for this section are:
+---------------+-------------+--------------+-----------------------------------------------+
| **Parameter** | **Default** | **Required** | **Notes** |
+===============+=============+==============+===============================================+
| *host* | None | Yes | The host name or IP address of the DNS server |
+---------------+-------------+--------------+-----------------------------------------------+
| *port* | 53 | Yes | The port of the DNS server |
+---------------+-------------+--------------+-----------------------------------------------+
| *tsig_key* | None | Yes | The TSIG key for the DNS server |
+---------------+-------------+--------------+-----------------------------------------------+
In addition to the above options, the server specific backend driver section
will support the same options as the backend driver global section. If
those options are not included in the server specific backend driver section,
the server configuration will default to using the global configuration
option. These server specific backend driver sections will support
different backends in the same pool.
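The fallback from server specific options to the global section can be sketched with the standard library ``configparser`` as a stand-in. The UUIDs, addresses, the ``example_option`` name, and the function name ``server_options`` are made up for this illustration; only *host*, *port*, and *tsig_key* come from the table above.

```python
import configparser

# Sample configuration; every value here is a placeholder.
SAMPLE = """
[backend:powerdns:*]
example_option = global-value

[backend:powerdns:f278782a-07dc-4502-9177-b5d85c5f7c7e]
host = 192.0.2.10
tsig_key = example-key

[backend:powerdns:a38703f2-b71e-4e5b-ab22-30caaed61dfd]
host = 192.0.2.11
port = 5353
tsig_key = example-key
example_option = server-value
"""


def server_options(text, backend):
    """Return {server_uuid: options}, with global options as fallbacks."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    prefix = "backend:%s:" % backend
    global_name = prefix + "*"
    global_opts = {}
    if parser.has_section(global_name):
        global_opts = dict(parser[global_name])
    servers = {}
    for name in parser.sections():
        if not name.startswith(prefix) or name == global_name:
            continue
        merged = dict(global_opts)       # start from the global section
        merged.update(parser[name])      # server specific values win
        merged.setdefault("port", "53")  # default port from the table
        servers[name[len(prefix):]] = merged
    return servers


if __name__ == "__main__":
    for uuid, options in server_options(SAMPLE, "powerdns").items():
        print(uuid, options)
```

Each resulting dictionary is the kind of per-server configuration the Pool Manager could hand to a backend driver at instantiation.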
The server object will be implemented. The server object encapsulates the
server specific backend driver section in the configuration.
The following methods will not be used in the backend driver:
* create_tsigkey(tsigkey)
* update_tsigkey(tsigkey)
* delete_tsigkey(tsigkey)
This is because the only provisioner supported initially is the 'unmanaged'
provisioner. Those methods will be used by future provisioners.
Storage Changes
---------------
A new table for the Pool Manager status will be needed. Additionally, the
domains and records tables will be modified to support pools. Domains
and records will be 'PENDING' status initially. A new status 'ERROR' will
be possible for domains and records. Finally, a record can also be
'DELETE_PENDING' and 'DELETE_ERROR'.
New Table - pool_manager_status
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+---------------+----------------------------+-----------+---------+---------------------------------+
| Column | Type | Nullable? | Unique? | Notes |
+===============+============================+===========+=========+=================================+
| id | CHAR(32) | No | Yes | PK |
+---------------+----------------------------+-----------+---------+---------------------------------+
| updated_at | DATETIME | No | No | UTC time of last update |
+---------------+----------------------------+-----------+---------+---------------------------------+
| server_id | VARCHAR(32) | No | No | Server ID |
+---------------+----------------------------+-----------+---------+---------------------------------+
| domain_id | CHAR(32) | No | No | FK to ID on domains table |
+---------------+----------------------------+-----------+---------+---------------------------------+
| status | 'SUCCESS','ERROR' | Yes | No | Status |
+---------------+----------------------------+-----------+---------+---------------------------------+
| serial_number | INT(11) | No | No | Serial number at time of status |
+---------------+----------------------------+-----------+---------+---------------------------------+
| action | 'CREATE','DELETE','UPDATE' | No | No | Action resulting in status |
+---------------+----------------------------+-----------+---------+---------------------------------+
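For illustration only, the table above can be expressed as a schema and exercised in an in-memory SQLite database. SQLite has no ENUM type, so the status and action columns are modeled with CHECK constraints; that choice, the one-column ``domains`` stub, and the function name ``create_schema`` are assumptions, and the real service would define the table through its own storage layer.

```python
import sqlite3

SCHEMA = """
CREATE TABLE domains (
    id CHAR(32) NOT NULL PRIMARY KEY   -- stub; the real table has more columns
);

CREATE TABLE pool_manager_status (
    id            CHAR(32)    NOT NULL PRIMARY KEY,
    updated_at    DATETIME    NOT NULL,
    server_id     VARCHAR(32) NOT NULL,
    domain_id     CHAR(32)    NOT NULL REFERENCES domains (id),
    status        TEXT        CHECK (status IN ('SUCCESS', 'ERROR')),
    serial_number INTEGER     NOT NULL,
    action        TEXT        NOT NULL
                  CHECK (action IN ('CREATE', 'DELETE', 'UPDATE'))
);
"""


def create_schema(connection):
    """Create the stub domains table and the pool_manager_status table."""
    connection.executescript(SCHEMA)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    conn.execute("INSERT INTO domains (id) VALUES ('d1')")
    # A NULL status is allowed, matching an 'UPDATE' row with no status yet.
    conn.execute(
        "INSERT INTO pool_manager_status VALUES (?, ?, ?, ?, ?, ?, ?)",
        ("r1", "2014-08-12 00:00:00", "s1", "d1", None, 1, "UPDATE"),
    )
    print(conn.execute("SELECT * FROM pool_manager_status").fetchall())
```

The nullable status column is what allows the 'UPDATE' action row to exist before any status has been reported for it.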
Modify Table - domains
^^^^^^^^^^^^^^^^^^^^^^
+--------+--------------------------------------+-----------+---------+-----------+---------------+--------+
| Column | Type | Nullable? | Unique? | Default | Notes | Action |
+========+======================================+===========+=========+===========+===============+========+
| status | 'ACTIVE','PENDING','DELETED','ERROR' | No | No | 'PENDING' | Domain status | update |
+--------+--------------------------------------+-----------+---------+-----------+---------------+--------+
Modify Table - records
^^^^^^^^^^^^^^^^^^^^^^
+---------------+--------------------------------------+-----------+---------+-----------+----------------------------+--------+
| Column | Type | Nullable? | Unique? | Default | Notes | Action |
+===============+======================================+===========+=========+===========+============================+========+
| serial_number | INT(11) | No | No | | Used for the record status | add |
+---------------+--------------------------------------+-----------+---------+-----------+----------------------------+--------+
| task | 'ADD','DELETE','UPDATE','NONE' | No | No | 'ADD' | Record operation task | add |
+---------------+--------------------------------------+-----------+---------+-----------+----------------------------+--------+
| status | 'ACTIVE','PENDING','DELETED','ERROR' | No | No | 'PENDING' | Record status | update |
+---------------+--------------------------------------+-----------+---------+-----------+----------------------------+--------+
Other Changes
-------------
None
Alternatives
------------
None
Implementation
==============
Assignee(s)
-----------
Primary assignee:
https://launchpad.net/~rjrjr
Additional assignee:
https://launchpad.net/~darshan104
Milestones
----------
Target Milestone for completion:
Kilo-1
Work Items
----------
* Pool Manager changes
* Central changes
* Backend driver changes
* Storage changes
Dependencies
============
This specification relies on the Server Pools - Storage specification.
This specification relies on the Server Pools - MiniDNS Support specification.