diff --git a/specs/newton/worker-model.rst b/specs/newton/worker-model.rst
new file mode 100644
index 0000000..e2f0b16
--- /dev/null
+++ b/specs/newton/worker-model.rst
@@ -0,0 +1,373 @@
..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.
 http://creativecommons.org/licenses/by/3.0/legalcode

============================
A Worker Model for Designate
============================

Thesis: Gratuitous, unnecessary complexity exists within the Designate code
and in the operation of Designate as a service. Making Designate a
producer-worker style of project will vastly simplify development and
operation, and align it more closely with the true nature of the service it
provides (DNS).


Problem description
===================

``designate-pool-manager`` does a reasonably good job of pushing Create/Delete
changes out to nameservers, but the process gets a bit less shiny after that.

- Polling to see if state is live is done via asynchronous and synchronous RPC
  with another component.
- Cache usage is a mess (storing multiple keys, keeping one key around
  forever, storing a different key for each type of operation).
- Periodic Sync/Recovery are very unreliable as the number of changes grows.
- The ``update_status`` logic for calculating consensus is heavy-handed, too
  eager, and too complex.
- The state machine is very foggy, and the logic for updating status that gets
  pushed into Central is obfuscated.
- Pool Managers are tied to one pool.

``designate-zone-manager`` does a good job of executing periodic timers for
the zones it manages. However:

- One zone-manager process is responsible for a certain set of zones; if the
  operations for that set of zones get heavy, a single zone-manager process
  could become overwhelmed.
- We rely on ``tooz`` to manage the extremely delicate task of ensuring
  balance and coverage of all zones by zone-manager processes.
- Certain work (export) that is in the critical path of operations has already
  crept into a component that was never really meant for it. As a substitute
  for proper workers, the zone-manager is looking like the current answer.

``designate-mdns`` is a DNS server written in Python. It works well for small
amounts of traffic, but as traffic grows, we may find that it needs to be more
specialized, as a DNS server written in Python should be. The logic for
sending NOTIFYs and polling for changes seems unlikely to belong in mdns in
the future. If those bits were removed, ``designate-mdns`` could be rewritten
to make use of a better tool for the problem.


Proposed change
===============

A change to the underlying architecture for executing actual work on DNS
servers and running other tasks. Essentially: remove
``designate-pool-manager`` and ``designate-zone-manager``, replace them with
``designate-worker`` and ``designate-producer`` (names up for debate), and
remove certain logic from ``designate-mdns``. All of the actual "work" would
be put in the scalable ``designate-worker`` process, which has work produced
for it by the API/Central and by ``designate-producer``. ``designate-mdns``
gets back to its roots and only answers AXFRs. Callbacks over queues that
don't involve the API are eliminated, simplifying the code in all components
that deal with DNS servers.

**No changes** to the API or Database are required, and only minimal changes
to ``designate-central``.
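
To make the dispatch model concrete, here is a minimal sketch of the
fire-and-forget style described above, using ``oslo.messaging``. The topic
and method names are hypothetical placeholders, not the final RPCAPI:

.. code-block:: python

    import oslo_messaging as messaging
    from oslo_config import cfg

    transport = messaging.get_transport(cfg.CONF)
    target = messaging.Target(topic='worker', version='1.0')
    client = messaging.RPCClient(transport, target)


    def create_zone(context, zone):
        # cast() returns immediately and no callback ever comes back over
        # the queue; the worker polls the nameservers itself and writes
        # the resulting status to storage via central.
        client.cast(context, 'create_zone', zone=zone)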

To the end user, the results of this change would be relatively simple:

- Scalability limited only by DNS servers, datastores, and queues. If at any
  point Designate starts to slow down in some aspect, unless there's an issue
  with those services listed above, the problem can be solved by throwing more
  Designate processes at it.
- Fault tolerance. One or more Designate processes dying would be invisible
  in almost every case. An operator wouldn't have to fear a certain process
  dying, because no single process handles responsibilities that others
  cannot, as long as there is a minimal amount of redundancy.
- Simplicity. The Designate architecture becomes much cleaner and easier to
  understand. The questions of "what is the difference between pool-manager
  and zone-manager" and "what's synchronous and what's asynchronous" become
  less muddy, or disappear altogether.
- Operationally, this opens the door to simpler scenarios for small
  deployments, where a customer need only scale a couple of components (even
  on the same machine) to get more performance. You could even go so far as
  to deploy app nodes that share only a datastore (db, cache) and have their
  own Designate components and queues.


Architectural Changes
---------------------

These are the services that would remain present:

- ``designate-api`` - To receive JSON and parse it for Designate
- ``designate-central`` - To do validation/storage of zone/record data and
  send CRUD tasks to ``designate-worker``
- ``designate-mdns`` - To **only** perform AXFRs from Designate's database
- ``designate-worker`` - To run any and all tasks that Designate needs to
  produce state on nameservers
- ``designate-producer`` - To run periodic/timed jobs and produce work
  for ``designate-worker`` that is outside the normal path of API operations.
  For example: periodic recovery.

Other necessary components:

- Queue - Usually RabbitMQ
- Database - Usually MySQL
- Cache - (encouraged, although not necessary) Memcached, MySQL, Redis
- A ``tooz`` backend (ZooKeeper, Memcached, Redis)

Services/components that are no longer required:

- ``designate-pool-manager``
- ``designate-zone-manager``


Designate Worker
----------------

The scope of ``designate-worker``'s duties is essentially any and all tasks
that Designate needs to take action to perform. For example:

- Create, Update, and Delete zones on pool targets via backend plugins
- Poll to confirm that a change is live
- Update a cache with the serial number for a zone/target
- Emit zone-exists events for billing
- Flatten Alias Records
- Clean up deleted zones
- Import/Export zones
- Many more

The service essentially exposes a broad RPCAPI that contains ``tasks``.

An important difference from Designate's current model is that these tasks do
not call back. They are all fire-and-forget tasks that are shoved onto a
queue and await worker action.

``tasks`` are essentially functions that, given relatively simple input, make
the desired outcome happen on either the nameservers or the Designate
database.

Cache
~~~~~

The cache performs a function similar to that of the current Pool Manager
cache.

It will store state for each different type of task, which a worker can use
to decide whether it needs to continue with a ``task`` received from the
queue, or simply drop it and move on to the next task.
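
As an illustration, a minimal sketch of that check, assuming a simple
key/value cache client and a hypothetical key layout (neither is settled by
this spec):

.. code-block:: python

    def should_update_zone(cache, zone, target):
        # The key layout 'zone-serial-<zone>-<target>' is hypothetical.
        key = 'zone-serial-%s-%s' % (zone.id, target.id)
        cached_serial = cache.get(key)
        if cached_serial is not None and int(cached_serial) >= zone.serial:
            # Another task already brought this target up to (at least)
            # this serial, so this task can be dropped.
            return False
        return True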

This varies by task. Some checks are relatively simple: whether a zone update
to a certain serial number still needs to happen can be determined by looking
at the serial number of the zone on each target in a pool. For DNSSEC zone
signing, a key would probably be placed to indicate that a certain worker was
working on re-signing a zone, as that is a longer-running process.

In the absence of such a cache, each worker will act naively and try to
complete each task it receives.

Tasks
~~~~~

Each task will be idempotent, to the degree that it is possible.

As mentioned in the ``Cache`` section, tasks should, to a certain degree, be
able to know whether they need to complete work based on information in the
cache.

But they should also make an effort not to duplicate work. For instance, if a
task is trying to delete a zone that is already gone, it should interpret the
zone being gone as a sign that the delete succeeded and move on.

On the whole these tasks would simply be lifted from where they currently
exist in the code, and wouldn't change all that much.

A slight change might be that during the course of the task, we may recheck
that the work being undertaken still needs to be done.

As an example: an API customer creates many recordsets very quickly. The work
dispatched to ``designate-worker`` processes would go to a lot of different
places, and one of the first updates to actually reach a nameserver might
contain all the changes necessary to bring the zone up to date. The other
tasks being worked should check that the state is still behind before they
send their NOTIFY, and check again after they've sent their NOTIFY but before
they've begun polling, so that they can cut down on unnecessary work for
themselves and for the nameservers.

You could get even smarter about the markers that you drop in a cache for
these tasks. For example, on a zone update, you could drop a key in the cache
of the form ``zoneupdate-foo.com.``, and if other zone-update tasks for the
same zone see that key, they could know to throw away their job and move on.

designate-mdns changes
~~~~~~~~~~~~~~~~~~~~~~

The partitioning of certain responsibilities that Designate had previously
disappears. The worker service will send DNS queries and it will do CPU-bound
tasks, but it will be one place to scale. It should be possible to have an
extremely robust Designate architecture by simply scaling these workers.

``designate-mdns`` will have its entire RPCAPI transferred to
``designate-worker``. This will vastly reduce the amount of work it needs to
do while it sits in the critical path of providing zone transfers to the
nameservers Designate manages.

As a side note, this would make the service much easier to optimize, or even
rewrite in a faster programming language.

Designate Producer
------------------

``designate-producer`` is the home for jobs that run on some kind of timer
and produce tasks outside the normal path of API operations.

The **key** difference from the ``zone-manager`` service is that this service
simply generates work to be done, rather than actually doing the work.
``designate-producer`` decides what needs to be done and sends RPC messages
on the queue to ``designate-worker`` to actually perform the work.

As we've grown Designate, we've seen the need for this kind of job grow
vastly, and it will only grow more in the future.
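
As an illustration of that produce-only pattern, here is a minimal sketch of
a producer timer built on ``oslo.service``'s looping call. The
``find_purgeable_zones()`` storage helper and the ``purge_zone`` task name
are hypothetical placeholders:

.. code-block:: python

    from oslo_service import loopingcall


    def start_purge_timer(client, context, storage, interval=3600):
        """Periodically enqueue purge work; never perform it here."""

        def _produce():
            for zone in storage.find_purgeable_zones(context):
                # Fire-and-forget: a designate-worker process does the
                # actual purge; the producer only schedules it.
                client.cast(context, 'purge_zone', zone=zone)

        timer = loopingcall.FixedIntervalLoopingCall(_produce)
        timer.start(interval=interval)
        return timer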

Examples of jobs that belong in ``designate-producer`` include:

- Deleted zone purging
- Refreshing Secondary Zones
- Emitting zone-exists tasks and other billing events
- DNSSEC signing of zones
- Alias record flattening

We could move the ``periodic_sync`` and ``periodic_recovery`` tasks from the
Pool Manager to this service. Those tasks have been a constant struggle to
maintain and get right, for a number of reasons.

Making the generation of ``tasks`` by periodic processes the job of only one
Designate component simplifies the architecture, and allows us to solve the
problems it presents once, in one way, and generally to do one thing well.

Timers
~~~~~~

This service would essentially be a group of timers that wake up on a cadence
and create work to be put on the queue for ``designate-worker`` processes to
pick up.

The overhead here is relatively low, as we're not actually doing the work,
but merely scheduling it. This way we can focus on the unexpectedly difficult
problem of dividing up the production of work that these processes will put
on the queue.

Dividing work
`````````````

To explain more clearly: the biggest problem we have in this service is
making it fault-tolerant without duplicating the work given to
``designate-worker`` processes. This was solved before in
``designate-zone-manager`` by using ``tooz`` with the zone shards in the
Designate database, and that approach seems to work well.

``designate-worker`` processes, as described above, will do a certain amount
of optimization so that they don't duplicate work. But if we generate too
much cruft, those processes will be bogged down just by the task of checking
whether they need to do work. So we should work to minimize the amount of
duplicate work we produce.

Queue Priority
--------------

One potential complication of this implementation is that, as the number of
timers and tasks outside Designate's critical path grows, they may get in the
way of ``designate-worker`` processes doing the tasks that are most
important, namely CRUD of zones and records.

We propose having queues/exchanges for each type of task. This would be an
optimal way to monitor the health of different types of tasks, and to isolate
the sometimes long-running tasks that periodic timers produce from the
relatively quicker, and more important, CRUD operations. The algorithm for
choosing tasks from the various options could be customized by a particular
process if desired, but a good general default would be to handle CRUD
operations from ``designate-central`` first, or to use a weighted random
choice algorithm with the critical-path CRUD operations given higher weights.


Work Items
----------

- Stand up a ``designate-worker`` service
- Migrate CRUD zone operations to ``designate-worker``, reworking the cache
  implementation
- Stand up a ``designate-producer`` service
- Migrate Pool Manager periodic tasks to ``designate-producer``, with small
  modifications to ensure they simply generate work for ``designate-worker``
- Move ``designate-mdns``' NOTIFYing and polling to ``designate-worker``
- Fix up the ``update_status`` logic in ``designate-central``
- Migrate all tasks from ``zone-manager`` to a split of ``designate-worker``
  and ``designate-producer``, where ``producer`` creates the work on the
  queue and ``worker`` executes it,
  ensuring scalable logic for distributed work production, using the cache or
  some other method, in ``designate-producer``
- Deprecate ``pool-manager`` and ``zone-manager``
- Profit!!!


Upgrade Implications
--------------------

Upgrading to the next release with this change would introduce some
operational changes, mostly around the services that need to be deployed. The
deployment need not be a cutover; deploying Newton Designate will work with
or without the worker. This is because of a variety of compatibility measures
taken:

- ``designate-central`` will have a configurable "zone api" that it can swap
  between ``designate-pool-manager`` and ``designate-worker``. If the worker
  process is enabled, central can send create/update/delete zone events to
  the worker instead of the pool manager.
- ``designate-worker``'s ability to send NOTIFYs and poll DNS servers can
  replace a portion of ``designate-mdns``' responsibilities. It is theorized
  that certain DNS servers won't behave well if a NOTIFY comes from a server
  other than the master they zone transfer from. For this reason,
  ``designate-worker``'s ability to send NOTIFYs is a configurable element.
  Since the worker calls into a backend plugin to update zones, the
  NOTIFY-via-mdns logic in those backends can remain, and if the operator so
  chooses, the NOTIFY task in the worker can no-op. This works both ways: an
  operator can also choose to have the MiniDNS NOTIFY calls no-op, and allow
  them to be completed by the worker process.
- For those who choose to firewall all DNS traffic between Designate and DNS
  servers, it will be safest to deploy ``designate-worker`` processes in
  close proximity to ``designate-mdns`` processes, so that the DNS polling
  that ``designate-worker`` does can be completed from where
  ``designate-mdns`` used to do it.
- As periodic-type processes are migrated to ``designate-producer`` and
  ``designate-worker``, they can be marked as "worker tasks" in
  ``designate-zone-manager``, and turned off behind a configuration flag.

The process for upgrading to the worker model code, after deploying Newton,
could look something like this:

1. (To account for firewalling DNS traffic) Start ``designate-worker``
   processes operating on the same IPs as ``designate-mdns`` processes, via
   proximity or proxy. The default configuration will still allow NOTIFYs
   and DNS polling to occur via ``designate-mdns``, and all other operations
   to work in ``designate-pool-manager``. No traffic will reach the worker.
2. Toggle the configuration values ``designate-worker::enabled`` and
   ``designate-worker::notify`` and restart ``designate-worker``.
3. Restart ``designate-central`` and ``designate-mdns`` processes so that
   mdns NOTIFY calls no-op, and central starts to use the worker instead of
   ``designate-pool-manager``.
4. Toggle the ``designate-zone-manager::worker-tasks`` config flag and
   restart ``designate-zone-manager`` so that it hands off periodic tasks to
   the producer/worker.
5. Start the ``designate-producer`` process so that the worker starts doing
   recovery and other periodic tasks.
6. Stop the ``designate-pool-manager`` processes and, once all tasks are
   migrated out of ``designate-zone-manager``, those processes as well.
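
To make the configuration-driven cutover concrete, here is a minimal sketch
of how central might select its "zone api" behind a flag, using
``oslo.config``. The option name and group are hypothetical, not the final
configuration surface:

.. code-block:: python

    from oslo_config import cfg

    cfg.CONF.register_opts([
        cfg.BoolOpt('enabled', default=False,
                    help='Send zone CRUD to designate-worker instead of '
                         'designate-pool-manager.'),
    ], group='service:worker')


    def get_zone_api(pool_manager_api, worker_api):
        # Both services can run during the upgrade; this flag decides
        # which one receives zone create/update/delete casts.
        if cfg.CONF['service:worker'].enabled:
            return worker_api
        return pool_manager_api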


Milestones
----------

Target Milestone for completion:
  Newton

Author(s)
---------

Tim Simmons https://launchpad.net/~timsim

Paul Glass https://launchpad.net/~pnglass

Eric Larson https://launchpad.net/~eric-larson

diff --git a/specs/template.rst b/specs/template.rst
index 2de630d..994ebb6 100644
--- a/specs/template.rst
+++ b/specs/template.rst
@@ -169,3 +169,9 @@ Dependencies
 - Does this feature require any new library dependencies or code otherwise not
   included in OpenStack? Or does it depend on a specific version of library?
+
+Upgrade Implications
+====================
+
+Does the spec introduce a change for those running the current or an older
+version of Designate? If so, describe the change(s).