diff --git a/doc/source/index.rst b/doc/source/index.rst
index d18f326..5f229b7 100644
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -28,6 +28,14 @@ Liberty approved specs:
 
    specs/liberty/*
 
+Mitaka approved specs:
+
+.. toctree::
+   :glob:
+   :maxdepth: 1
+
+   specs/mitaka/*
+
 
 ==================
 Indices and tables
diff --git a/specs/liberty/deleted-domains-purging.rst b/specs/liberty/deleted-domains-purging.rst
index 340963b..ab9ecf0 100644
--- a/specs/liberty/deleted-domains-purging.rst
+++ b/specs/liberty/deleted-domains-purging.rst
@@ -14,7 +14,7 @@ database.
 Problem description
 ===================
 
-Once deleted, domains are not removed immediatly from the database, mostly for
+Once deleted, domains are not removed immediately from the database, mostly for
 billing reasons. They are flagged as deleted in the "deleted" database column
 and the "deleted_at" column is populated with a timestamp.
 
@@ -45,7 +45,7 @@ plugin. The task will select a group of domains and send a RPC call to Central.
 Central will run a query against the database to purge any deleted domain if
 needed and log the number of purged domains.
 
-Configuration paramenters:
+Configuration parameters:
 
 Purging run frequency. Default: hourly. Users might want to run it frequently
 to minimize the cycle duration.
@@ -122,7 +122,7 @@ Milestones
 ----------
 
 Target Milestone for completion:
-  Libery-3
+  Liberty-3
 
 Work Items
 ----------
diff --git a/specs/mitaka/notify-throttling.rst b/specs/mitaka/notify-throttling.rst
new file mode 100644
index 0000000..e4b1ec5
--- /dev/null
+++ b/specs/mitaka/notify-throttling.rst
@@ -0,0 +1,129 @@
+..
+
+This work is licensed under a Creative Commons Attribution 3.0 Unported License.
+http://creativecommons.org/licenses/by/3.0/legalcode
+
+=============================
+Bulk zone update throttling
+=============================
+
+https://blueprints.launchpad.net/designate/+spec/notify-throttling
+
+Implement a mechanism to throttle the delivery of NOTIFY transactions when
+a large number of zones are updated at the same time.
+
+
+Problem description
+===================
+
+If a large number of zones are updated in a short time, this generates a
+correspondingly large number of NOTIFY transactions, sent to the nameservers
+with no delay, leading to a burst of incoming AXFR requests.
+This might stress bottlenecks in MiniDNS and the storage layer in terms of
+CPU, I/O or network bandwidth.
+
+A typical trigger is the update of an NS record in a Pool containing many zones.
+
+The autonomous refreshing of zones performed by resolvers can also trigger a
+similar burst of AXFR. This can happen on recently started resolvers, where the
+refresh timers can share the same values across many zones.
+
+Related to bug https://bugs.launchpad.net/designate/+bug/1498462
+
+Proposed change
+===============
+
+Implement a mechanism for enqueuing and delayed delivery of notify transactions
+at a configurable throttle speed.
+
+Also, implement staggering of zone refresh requests by randomizing the refresh
+interval.
+
+API Changes
+-----------
+
+Expose the count of zones flagged for delayed notify in the Admin
+API as "/reports/counts/zones_pending_notify".
+
+Central Changes
+---------------
+
+Implement support for a new database column "pending_notify" and set it to
+True every time a Pool NS record is updated.
+
+Storage Changes
+---------------
+
+Add a new boolean database column "pending_notify" on Zones.
+Implement a migration script to add the column to existing databases,
+defaulting to False. In the future, the column might default to True.
+
+Other Changes
+-------------
+
+Implement a Task in Zone Manager to periodically fetch a set of zones that need
+to receive a Notify, starting with the oldest in terms of last update time.
+The task frequency and the maximum set size can be configured to throttle the
+rate of outgoing Notify transactions.
+Zone Manager will reset the "pending_notify" flag once done.
+
+Alternatives
+------------
+
+N/A
+
+Implementation
+==============
+
+The throttling queue is implemented as a new database column containing a
+boolean flag. See Central Changes and Storage Changes.
+
+Also, new zones will be created with a uniformly random refresh time between a minimum and a maximum value.
+
+
+Design considerations
+---------------------
+
+The throttling queue could be implemented outside of the database:
+- No need to create an extra database column
+- No increased database I/O
+
+We propose using the database for the following reasons:
+- Zone Manager is the best candidate to handle the delayed Notify. Currently there is no way for Central to send a list of Zones to Zone Manager other than through the database
+- The queue can support delayed Notify for changes other than Pool NS record updates
+- Ability to monitor the queue size and ETA to inform the user and for debugging
+- A persistent queue can survive Zone Manager unhandled exceptions or restarts
+- The increased database load is negligible compared to the existing traffic
+
+Risk analysis
+-------------
+
+- Zone Manager fails to run the Notify delivery task. The nameservers will eventually refresh the zone anyway. Impact: slow update propagation. Mitigation: expose the notification queue length to the user through the Admin API and by logging.
+- A big notification queue takes a considerable time to be handled. Impact: potentially prevents more urgent changes from being delivered quickly. Mitigation: encourage users to configure the throttling parameters; provide sensible default values. Implementing a concept of notification priority seems unnecessary.
+
+Assignee(s)
+-----------
+
+Primary assignee:
+  Federico Ceratto https://launchpad.net/~federico-ceratto
+
+Milestones
+----------
+
+Target Milestone for completion:
+  Liberty-3
+
+Work Items
+----------
+
+- Implement refresh time staggering
+- Implement Notify throttling
+- Add throttle parameters to configuration files
+- Document throttling mechanism
+- Write unit and functional tests
+- Test throttling and staggering on devstack
+
+Dependencies
+============
+
+N/A
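
Note on the new spec: the Zone Manager task under "Other Changes" and the refresh staggering under "Implementation" can be illustrated with a minimal Python sketch. This is not Designate code. The `Zone` dataclass, the function names, and the `send_notify` callback are hypothetical stand-ins, and an in-memory list takes the place of the zones table; only the "pending_notify" flag, the oldest-first ordering, the maximum set size, and the uniform random refresh time come from the spec.

```python
import random
from dataclasses import dataclass


@dataclass
class Zone:
    """Hypothetical in-memory stand-in for a row of the zones table."""
    name: str
    updated_at: float            # last update time, seconds since the epoch
    pending_notify: bool = False


def next_notify_batch(zones, batch_size):
    """Return up to batch_size zones flagged pending_notify, oldest update first."""
    pending = [zone for zone in zones if zone.pending_notify]
    pending.sort(key=lambda zone: zone.updated_at)
    return pending[:batch_size]


def run_notify_task(zones, batch_size, send_notify):
    """One run of the periodic task: notify a batch, then reset its flags.

    send_notify is a caller-supplied callable (an assumption of this sketch).
    Returns the number of zones handled in this run.
    """
    batch = next_notify_batch(zones, batch_size)
    for zone in batch:
        send_notify(zone)
        zone.pending_notify = False
    return len(batch)


def staggered_refresh(minimum, maximum, rng=random):
    """Pick a uniformly random integer refresh time in [minimum, maximum]."""
    return rng.randint(minimum, maximum)
```

With this shape, the outgoing rate is bounded by the maximum set size divided by the task interval. For example, with illustrative values of 100 zones per run and one run every 5 seconds, at most 20 Notify transactions per second are sent.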