A Worker Model for Designate

- Proposal to change Designate's architecture to more of a
  producer-consumer model of work.
- Includes upgrade path

Change-Id: I731bab1b2c24a6f2c9392698914a6b09c12765af
..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.
 http://creativecommons.org/licenses/by/3.0/legalcode

============================
A Worker Model for Designate
============================

Thesis: Gratuitous complexity exists within the Designate code and in the
operation of Designate as a service. Making Designate a producer-worker
style of project will vastly simplify its development and operation, and
align it more closely with the true nature of the service it provides (DNS).

Problem description
===================
``designate-pool-manager`` does a reasonably good job at pushing Create/Delete
changes out to nameservers, but the process gets a bit less shiny after that.
- Polling to see if state is live is done via asynchronous and synchronous RPC
with another component.
- Cache usage is a mess (storing multiple keys, keeping one key around
  forever, storing a different key for each type of operation).
- Periodic Sync/Recovery are very unreliable as the number of changes grows.
- The ``update_status`` logic and the logic for calculating consensus are
  heavy-handed, too eager, and too complex.
- The state machine is very foggy, and the logic for updating status that gets
pushed into central is obfuscated.
- Pool Managers are tied to one pool.

``designate-zone-manager`` does a good job at executing periodic timers for the
zones it manages. However:
- One zone-manager process is responsible for a certain set of zones; if the
  operations for that set of zones get heavy, a single zone-manager process
  can become overwhelmed.
- We rely on ``tooz`` to manage the extremely delicate task of ensuring
balance and coverage of all zones by zone manager processes.
- Certain work (export) that is in the critical path of operations has
  already crept into a component that was never meant for it. In the absence
  of proper workers, the zone-manager has become the default answer.
``designate-mdns`` is a DNS server written in Python. It works well for small
amounts of traffic, but as traffic grows, it may need to become more
specialized, as a DNS server written in Python should be. The logic for
sending NOTIFYs and polling for changes seems unlikely to belong in mdns in
the future. If those bits were removed, ``designate-mdns`` could be rewritten
to use a tool better suited to the problem.

Proposed change
===============
A change to the underlying architecture of executing actual work on DNS servers
and the running of other tasks. Essentially, removing
``designate-pool-manager`` and ``designate-zone-manager``, replacing them with
``designate-worker`` and ``designate-producer`` (names up for debate) and
removing certain logic from ``designate-mdns``. All of the actual "work" would
be put in the scalable ``designate-worker`` process, which has work produced
by the API/Central and ``designate-producer``. ``designate-mdns`` gets back
to its roots and only answers AXFRs. Callbacks over queues that don't involve
the API are eliminated, simplifying the code in all components that deal with
DNS servers.
**No changes** to the API or Database are required, with minimal changes to
``designate-central``.
To the end user, the results of this change would be relatively simple:
- Scalability limited only by DNS servers, datastores, and queues. If at any
  point Designate starts to slow down in some aspect, unless there's an issue
  with one of those services, the problem can be solved by throwing more
  Designate processes at it.
- Fault tolerance. One or more Designate processes dying would be invisible
  in almost every case. An operator wouldn't have to fear a certain process
  dying, because no one process handles responsibilities others will not, as
  long as there is a minimal amount of redundancy.
- Simplicity. The Designate architecture becomes much cleaner and easier to
understand. The questions of "what is the difference between pool-manager and
zone-manager" and "what's synchronous and asynchronous" become less muddy,
or disappear altogether.
- Operationally, this opens the door for simpler scenarios for small
  deployments, where a customer need only scale a couple of components (even
  on the same machine) to get more performance. You could even go so far as
  to deploy app nodes that share only the datastore (db, cache) and have
  their own Designate components and queues.
Architectural Changes
---------------------
These are the services that would remain present:
- ``designate-api`` - To receive JSON and parse it for Designate
- ``designate-central`` - To do validation/storage of zone/record data and
send CRUD tasks to ``designate-worker``.
- ``designate-mdns`` - To **only** perform AXFRs from Designate's database
- ``designate-worker`` - To perform any and all tasks that Designate needs
  in order to produce state on nameservers
- ``designate-producer`` - To run periodic/timed jobs and produce work for
  ``designate-worker`` that is outside the normal path of API operations.
  For example: periodic recovery.
Other necessary components:
- Queue - Usually RabbitMQ
- Database - Usually MySQL
- Cache - (encouraged, although not necessary) Memcached, MySQL, Redis
- A ``tooz`` backend (Zookeeper, Memcached, Redis)
Services/components that are no longer required:
- ``designate-pool-manager``
- ``designate-zone-manager``
Designate Worker
----------------
The scope of ``designate-worker``'s duties is essentially any and all tasks
that require Designate to take action. For example:
- Create, Update, and Delete zones on pool targets via backend plugins
- Poll that the change is live
- Update a cache with the serial number for a zone/target
- Emit zone-exists events for billing
- Flatten Alias Records
- Clean up deleted zones
- Importing/Exporting zones
- Many more
The service essentially exposes a vast RPC API that contains ``tasks``.
An important difference from Designate's current model is that none of
these tasks call back. They are all fire-and-forget tasks that are put on
a queue and await worker action.

``tasks`` are essentially functions that, given relatively simple input, make
the desired outcome happen on either nameservers or the Designate database.
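
As a rough illustration, a fire-and-forget task call over ``oslo.messaging``
might look like the sketch below. The topic, version, and method name are
placeholders rather than a final API.

.. code-block:: python

    import oslo_messaging as messaging
    from oslo_config import cfg

    transport = messaging.get_transport(cfg.CONF)
    target = messaging.Target(topic='worker', version='1.0')
    client = messaging.RPCClient(transport, target)

    def create_zone(context, zone):
        # cast() returns immediately; no reply or callback is expected.
        client.cast(context, 'create_zone', zone=zone)
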
Cache
~~~~~
The cache performs a function similar to that of the current pool manager
cache.
It will store state for each different type of task that a worker can use
to decide if it needs to continue with a ``task`` received from the queue, or
simply drop it and move on to the next task.
This varies by task. Some are relatively simple: whether to perform a zone
update to a certain serial number can be determined by looking at the serial
number of the zone on each target in a pool. For DNSSEC zone signing, a key
would probably be placed to indicate that a certain worker is re-signing a
zone, as that is a longer-running process.
In the absence of such a cache, each worker will act naively and try to
complete every task it receives.
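
As a sketch of what that check could look like for a zone update (the key
layout and cache helper here are hypothetical, not an existing Designate
API):

.. code-block:: python

    def should_update(cache, zone, target, task_serial):
        """Decide whether a zone update task still needs to run."""
        key = 'zone-serial:%s:%s' % (zone.name, target.id)
        cached_serial = cache.get(key)
        if cached_serial is not None and int(cached_serial) >= task_serial:
            # The target already has this serial (or newer); drop the task.
            return False
        return True
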
Tasks
~~~~~
Each task will be idempotent, to the degree that is possible.
As mentioned in the ``Cache`` section, tasks can, to a certain degree, know
whether they need to complete work based on information in the cache. But
they should also make an effort not to duplicate work: for instance, a task
trying to delete a zone that is already gone should interpret the zone's
absence as a sign that the delete succeeded and move on.
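
A minimal sketch of that behavior, assuming a hypothetical backend interface
and exception:

.. code-block:: python

    class ZoneNotFound(Exception):
        """Raised by the (hypothetical) backend when a zone is absent."""

    def delete_zone(backend, zone):
        try:
            backend.delete_zone(zone)
        except ZoneNotFound:
            # The zone is already gone, which is the desired end state,
            # so treat this delete as successful.
            pass
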
On the whole these tasks would simply be lifted from where they currently
exist in the code, and wouldn't change all that much.
A slight change might be that during the course of the task, we may recheck
that the work that is being undertaken still needs to be done.
As an example:
An API customer creates many recordsets very quickly. The work dispatched to
``designate-worker`` processes would be spread across many of them, and one of
the first updates to actually reach a nameserver might contain all the changes
necessary to bring the zone up to date. The other tasks being worked should
check before they send their NOTIFY that the state is still behind, and check
again after they've sent their NOTIFY but before they've begun polling, so
that they can cut down on unnecessary work for themselves and the nameservers.
You could get even smarter about the markers you drop in a cache for these
tasks. For example, on a zone update, you could drop a key in the cache of
the form ``zoneupdate-foo.com.``, and if other zone update tasks for the same
zone see that key, they know to throw away their job and move on.
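
A sketch of that marker idea, with a hypothetical cache client and key
format (a real implementation would want an atomic add to avoid races):

.. code-block:: python

    def claim_zone_update(cache, zone, ttl=60):
        """Return True if this task should proceed with the zone update."""
        key = 'zoneupdate-%s' % zone.name  # e.g. 'zoneupdate-foo.com.'
        if cache.get(key) is not None:
            # Another task is already updating this zone; drop the job.
            return False
        cache.set(key, 'in-progress', ttl)
        return True
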
designate-mdns changes
~~~~~~~~~~~~~~~~~~~~~~
The partitioning of certain elements that Designate had previously
disappears. The worker service will send DNS queries and will do CPU-bound
tasks, but it will be one place to scale. It should be possible to have an
extremely robust Designate architecture by simply scaling these workers.

``designate-mdns`` will have its entire RPC API transferred to
``designate-worker``. This will vastly reduce the amount of work it needs
to do while it sits in the critical path of providing zone transfers to the
nameservers Designate manages.
As a side-note, this would make this service much easier to optimize, or even
rewrite in a faster programming language.
Designate Producer
------------------
``designate-producer`` is the home of jobs that run on some kind of timer and
produce tasks that are outside the normal path of API operations.

The **key** difference from the ``zone-manager`` service is that this service
simply generates work to be done, rather than actually doing the work.
``designate-producer`` decides what needs to be done and sends RPC messages
on the queue to ``designate-worker`` to actually perform the work.

As Designate has grown, we've seen the need for this kind of work grow
vastly, and it will only grow more in the future:
- Deleted zone purging
- Refreshing Secondary Zones
- Emitting zone exists tasks and other billing events
- DNSSEC signing of zones
- Alias record flattening
We could move the ``periodic_sync`` and ``periodic_recovery`` tasks from the
Pool Manager to this service.
The ``periodic_sync`` and ``periodic_recovery`` tasks in the Pool Manager have
been a constant struggle to maintain and get right. This is due to a lot of
factors.
Making the generation of ``tasks`` by periodic processes the job of only one
Designate component simplifies the architecture, and allows us to solve the
problems it presents one time, one way, and generally do one thing well.
Timers
~~~~~~
This service would essentially be a group of timers that wake up on a cadence
and create work to be put on the queue for ``designate-worker`` processes to
pick up.
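
A sketch of one such timer, with illustrative names for the client, the
storage query, and the interval:

.. code-block:: python

    import time

    def run_recovery_timer(client, storage, context, interval=300):
        while True:
            # Only schedule the work; designate-worker performs it.
            for zone in storage.find_zones_needing_recovery(context):
                client.cast(context, 'recover_zone', zone_id=zone.id)
            time.sleep(interval)
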
The overhead is relatively low here, as we're not actually doing the work, but
more just scheduling the work to be done. This way we can focus on the
unexpectedly difficult problem of dividing up the production of work that these
processes will put on the queue.
Dividing work
`````````````
To explain more clearly, the biggest problem we have in this service is
making it fault-tolerant without duplicating work for ``designate-worker``
processes to do. This was solved before in ``designate-zone-manager`` by
``tooz``, using the zone shards in the Designate database, and it seems to
work well.
``designate-worker`` processes, as described above, will do a certain amount of
optimization so that they don't duplicate work. But if we generate too much
cruft, those processes will be bogged down just by the task of seeing if they
need to do work. So we should work to minimize the amount of duplicate work we
produce.
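
One possible shape for dividing production, assuming we keep the 0-4095 zone
shard column the zone-manager relies on today and obtain a member index and
count from ``tooz`` group membership:

.. code-block:: python

    def my_shard_range(member_index, member_count, total_shards=4096):
        """Compute the contiguous range of shards this producer owns."""
        size = total_shards // member_count
        start = member_index * size
        if member_index == member_count - 1:
            end = total_shards - 1  # last member absorbs any remainder
        else:
            end = start + size - 1
        return start, end

Each producer would then emit tasks only for zones whose shard falls within
its range.
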
Queue Priority
--------------
One potential complication of this implementation is that, as the number of
timers and tasks outside Designate's critical path grows, they may get in
the way of ``designate-worker`` processes doing the tasks that are most
important, namely CRUD of zones and records.
We propose having queues/exchanges for each type of task. This would be an
optimal way to monitor the health of different types of tasks, and to isolate
the sometimes long-running tasks that periodic timers will produce from the
relatively quicker, and more important, CRUD operations. The algorithm for
choosing tasks from the various options could be customized by a particular
process if desired, but a good general default would be to handle CRUD
operations from ``designate-central`` first, or to use a weighted random
choice algorithm with the critical-path CRUD operations given higher weights.
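
A sketch of that weighted random variant (queue names and weights are
illustrative only):

.. code-block:: python

    import random

    def pick_queue():
        queues = ['central-crud', 'periodic', 'long-running']
        weights = [8, 3, 1]  # favor critical-path CRUD operations
        r = random.uniform(0, sum(weights))
        for queue, weight in zip(queues, weights):
            r -= weight
            if r <= 0:
                return queue
        return queues[-1]
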
Work Items
----------
- Stand up a ``designate-worker`` service
- Migrate CRUD zone operations to ``designate-worker``, reworking the cache
implementation.
- Stand up ``designate-producer`` service
- Migrate Pool Manager periodic tasks to ``designate-producer``, with small
modifications to ensure they simply generate work for ``designate-worker``
- Move ``designate-mdns``' NOTIFYing and polling to ``designate-worker``
- Fix up the ``update_status`` logic in ``designate-central``
- Migrate all tasks from ``zone-manager`` to a split of ``designate-worker``
  and ``designate-producer``, where ``producer`` creates the work on the
  queue and ``worker`` executes it, ensuring scalable logic for distributed
  work production using the cache or some other method in
  ``designate-producer``
- Deprecate ``pool-manager`` and ``zone-manager``
- Profit!!!
Upgrade Implications
--------------------
Upgrading to the next release with this change would introduce some
operational changes, mostly around the services that need to be deployed.
The deployment need not be a cutover; deploying Newton Designate will work
with or without the worker. This is because of a variety of compatibility
measures taken:
- ``designate-central`` will have a configurable "zone api" that it can swap
  between ``designate-pool-manager`` and ``designate-worker``. If the worker
  process is enabled, central can send create/update/delete zone events to
  the worker instead of the pool manager (see the sketch after this list).
- ``designate-worker``'s ability to send NOTIFYs and poll DNS servers can
  replace a portion of ``designate-mdns``' responsibilities. It's theorized
  that certain DNS servers won't behave well if a NOTIFY comes from a
  different server than the master they transfer zones from. For this
  reason, ``designate-worker``'s ability to send NOTIFYs is a configurable
  element. Since the worker calls into a backend plugin to update zones, the
  NOTIFY-via-mdns logic in those backends can remain, and if the operator so
  chooses, the NOTIFY task in the worker can no-op. This works both ways: an
  operator can also choose to have the MiniDNS NOTIFY calls no-op and allow
  them to be completed by the worker process.
- For those who choose to firewall all DNS traffic between Designate and DNS
  servers, it will be safest to deploy ``designate-worker`` processes in
  close proximity to ``designate-mdns`` processes, so that the DNS polling
  ``designate-worker`` does can be completed where ``designate-mdns`` used
  to do it.
- As periodic-type processes are migrated to ``designate-producer`` and
  ``designate-worker``, they can be marked as "worker tasks" in
  ``designate-zone-manager``, which can be turned off behind a configuration
  flag.
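
A sketch of how the compatibility flags described in this list could be
registered via ``oslo.config``; the option names, group, and defaults are
placeholders rather than a final configuration surface:

.. code-block:: python

    from oslo_config import cfg

    worker_opts = [
        cfg.BoolOpt('enabled', default=False,
                    help='Send zone CRUD from central to designate-worker '
                         'instead of designate-pool-manager.'),
        cfg.BoolOpt('notify', default=False,
                    help='Send NOTIFYs from the worker; when False, the '
                         'worker NOTIFY task no-ops and mdns notifies.'),
    ]

    cfg.CONF.register_opts(worker_opts, group='service:worker')
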
The process for upgrading to the worker model code, after deploying Newton
could look something like this:
1. (To account for firewalling DNS traffic) Start ``designate-worker``
   processes operating on the same IPs as ``designate-mdns`` processes, via
   proximity or proxy. The default configuration will still allow NOTIFYs
   and DNS polling to occur via ``designate-mdns``, and all other operations
   to work in ``designate-pool-manager``. No traffic will reach the worker.
2. Toggle configuration values ``designate-worker::enabled`` and
``designate-worker::notify`` and restart ``designate-worker``.
3. Restart ``designate-central`` and ``designate-mdns`` processes so that mdns
NOTIFY calls no-op, and central starts to use the worker instead of
``designate-pool-manager``.
4. Toggle the ``designate-zone-manager::worker-tasks`` config flag and restart
``designate-zone-manager`` so that it hands off periodic tasks to the
producer/worker.
5. Start the ``designate-producer`` process so that the worker starts doing
recovery and other periodic tasks.
6. Stop the ``designate-pool-manager`` processes, and if all tasks have been
   migrated out of ``designate-zone-manager``, stop that as well.
Milestones
----------
Target Milestone for completion:
Newton
Author(s)
---------
Tim Simmons https://launchpad.net/~timsim
Paul Glass https://launchpad.net/~pnglass
Eric Larson https://launchpad.net/~eric-larson

The spec template also gains a matching section:

Upgrade Implications
====================
Does the spec introduce a change for those running the current, or an older,
version of Designate? If so, describe the change(s).