Add spec for active-active

This specification contains a high-level description of a proposed
architecture for handling an active-active topology within Octavia.

Moved Distributor to new document.
Captured the comments from Mitaka mid-cycle.

Updated active-active-topology per latest comments.

Major update to active-active-distributor per latest comments.

More updates per comments

Change-Id: Ifc2d618a979fd0eb822f2cba4b759ab6ade7793f
Co-Authored-By: Eran Raichstein <eranra@il.ibm.com>
Co-Authored-By: Dean Lorenz <dean@il.ibm.com>
Co-Authored-By: Stephen Balukoff <stephen@balukoff.com>
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
.. attention::
Please review the active-active topology blueprint first ("Active-Active,
N+1 Amphorae Setup",
https://review.openstack.org/#/c/234639/7/specs/version1/active-active-topology.rst).
=================================================
Distributor for Active-Active, N+1 Amphorae Setup
=================================================
https://blueprints.launchpad.net/octavia/+spec/active-active-topology
This blueprint describes how Octavia implements a *Distributor* to support the
*active-active* loadbalancer (LB) solution, as described in the blueprint
linked above. It presents the high-level Distributor design and suggests
high-level code changes to the current code base to realize this design.
In a nutshell, in an *active-active* topology, an *Amphora Cluster* of two
or more active Amphorae collectively provide the loadbalancing service.
The service is designed as a two-step loadbalancing process: first, a
lightweight *distribution* of VIP traffic over the Amphora Cluster; then,
full-featured loadbalancing of traffic over the back-end members. Since a single
loadbalancing service, which is addressable by a single VIP address, is
served by several Amphorae at the same time, there is a need to distribute
incoming requests among these Amphorae -- that is the role of the
*Distributor*.
This blueprint uses terminology defined in the Octavia glossary when available,
and defines new terms to describe new components and features as necessary.
.. _P2:
**Note:** Items marked with [P2]_ refer to lower priority features to be
designed / implemented only after initial release.
Problem description
===================
* Octavia shall implement a Distributor to support the active-active
topology.
* The operator should be able to select and configure the Distributor
(e.g., through an Octavia configuration file or [P2]_ through a flavor
framework).
* Octavia shall support a pluggable design for the Distributor, allowing
different implementations. In particular, the Distributor shall be
abstracted through a *driver*, similarly to the current support of
Amphora implementations.
* Octavia shall support different provisioning types for the Distributor;
including VM-based (the default, similar to current Amphorae),
[P2]_ container-based, and [P2]_ external (vendor-specific) hardware.
* The operator shall be able to configure the distribution policies,
including affinity and availability (see below for details).
Architecture
============
High-level Topology Description
-------------------------------
* The following diagram illustrates the Distributor's role in an active-active
topology:
::
Front-End Back-End
Internet Networks Networks
(world) (tenants) (tenants)
║ A B C A B C
┌──╨───┐floating IP ║ ║ ║ ┌────────┬──────────┬────┐ ║ ║ ║
│ ├─ to VIP ──►╢◄──────║───────║──┤f.e. IPs│ Amphorae │b.e.├►╜ ║ ║
│ │ LB A ║ ║ ║ └──┬─────┤ of │ IPs│ ║ ║
│ │ ║ ║ ║ │VIP A│ Tenant A ├────┘ ║ ║
│ GW │ ║ ║ ║ └─────┴──────────┘ ║ ║
│Router│floating IP ║ ║ ║ ┌────────┬──────────┬────┐ ║ ║
│ ├─ to VIP ───║──────►╟◄──────║──┤f.e. IPs│ Amphorae │b.e.├──►╜ ║
│ │ LB B ║ ║ ║ └──┬─────┤ of │ IPs│ ║
│ │ ║ ║ ║ │VIP B│ Tenant B ├────┘ ║
│ │ ║ ║ ║ └─────┴──────────┘ ║
│ │floating IP ║ ║ ║ ┌────────┬──────────┬────┐ ║
│ ├─ to VIP ───║───────║──────►╢◄─┤f.e. IPs│ Amphorae │b.e.├────►╜
└──────┘ LB C ║ ║ ║ └──┬─────┤ of │ IPs│
║ ║ ║ │VIP C│ Tenant C ├────┘
arp─►╢ arp─►╢ arp─►╢ └─────┴──────────┘
┌─┴─┐ ║┌─┴─┐ ║┌─┴─┐ ║
│VIP│┌►╜│VIP│┌►╜│VIP│┌►╜
├───┴┴┐ ├───┴┴┐ ├───┴┴┐
│IP A │ │IP B │ │IP C │
┌┴─────┴─┴─────┴─┴─────┴┐
│ │
│ Distributor │
│ (multi-tenant) │
└───────────────────────┘
* In the above diagram, several tenants (A, B, C, ...) share the
Distributor, yet the Amphorae, and the front- and back-end (tenant)
networks are not shared between tenants. (See also "Distributor Sharing"
below.) Note that in the initial code implementing the distributor, the
distributor will not be shared between tenants, until tests verifying the
security of a shared distributor can be implemented.
* The Distributor acts as a (one-legged) router, listening on each
load balancer's VIP and forwarding to one of its Amphorae.
* Each load balancer's VIP is advertised and answered by the Distributor.
An ``arp`` request for any of the VIP addresses is answered by the
Distributor, hence any traffic sent for each VIP is received by the
Distributor (and forwarded to an appropriate Amphora).
* ARP is disabled on all the Amphorae for the VIP interface.
* The Distributor distributes the traffic of each VIP to an Amphora in the
corresponding load balancer Cluster.
* An example of high-level data flow:
1. Internet clients access a tenant service through an externally visible
floating-IP (IPv4 or IPv6).
2. The GW router maps the floating IP into a loadbalancer's internal VIP on
the tenant's front-end network.
3. (1st packet to the VIP only) the GW sends an ``arp`` request on the VIP
(tenant front-end) network. The Distributor answers the ``arp`` request
with its own MAC address on this network (all the Amphorae on the network
can serve the VIP, but do not answer the ``arp``).
4. The GW router forwards the client request to the Distributor.
5. The Distributor forwards the packet to one of the Amphorae on the
tenant's front-end network (distributed according to some policy,
as described below), without changing the destination IP (i.e., still
using the VIP).
6. The Amphora accepts the packet and continues the flow on the tenant's
back-end network as for other Octavia loadbalancer topologies (non
active-active).
7. The outgoing response packets from the Amphora are forwarded directly
to the GW router (that is, they do not pass through the Distributor).
Affinity of Flows to Amphorae
-----------------------------
- Affinity is required to make sure related packets are forwarded to the
same Amphora. At minimum, since TCP connections are terminated at the
Amphora, all packets that belong to the same flow must be sent to the
same Amphora. Enhanced affinity levels can be used to make sure that flows
with similar attributes are always sent to the same Amphora; this may be
desired to achieve better performance (see discussion below).
- [P2]_ The Distributor shall support different modes of client-to-Amphora
affinity. The operator should be able to select and configure the desired
affinity level.
- Since the Distributor is L3 and the "heavy lifting" is expected to be
done by the Amphorae, this specification proposes implementing two
practical affinity alternatives. Other affinity alternatives may be
implemented at a later time.
*Source IP and source port*
In this mode, the Distributor must always send packets from the same
combination of Source IP and Source port to the same Amphora. Since
the Target IP and Target Port are fixed per Listener, this mode implies
that all packets from the same TCP flow are sent to the same Amphora.
This is the minimal affinity mode, as without it TCP connections will
break.
*Note*: related flows (e.g., parallel client calls from the same HTML
page) will typically be distributed to different Amphorae; however,
these should still be routed to the same back-end. This could be
guaranteed by using cookies and/or by synchronizing the stick-tables.
Also, the Amphorae in the Cluster could be configured to use the same
hashing parameters (avoiding any random seed) to ensure they all make the
same decisions.
*Source IP* (default)
In this mode, the Distributor must always send packets from the same
source IP to the same Amphora, regardless of port. This mode allows TLS
session reuse (e.g., through session ids), where an abbreviated
handshake can be used to improve latency and computation time.
The main disadvantage of sending all traffic from the same source IP to
the same Amphora is that it might lead to poor load distribution for
large workloads that have the same source IP (e.g., workload behind a
single NAT or proxy).
**Note on TLS implications**:
In some (typical) TLS sessions, the additional load incurred for each new
session is significantly larger than the load incurred for each new
request or connection on the same session; namely, the total load on each
Amphora will be more affected by the number of different source IPs it
serves than by the number of connections. Moreover, since the total load
on the Cluster incurred by all the connections depends on the level of
session reuse, spreading a single source IP over multiple Amphorae
*increases* the overall load on the Cluster. Thus, a Distributor that
uniformly spreads traffic without affinity per source IP (e.g., uses
per-flow affinity only) might cause an increase in overall load on the
Cluster that is proportional to the number of Amphorae. For example, in a
scale-out scenario (where a new Amphora is spawned to share the total
load), moving some flows to the new Amphora might increase the overall
Cluster load, negating the benefit of scaling-out.
Session reuse helps with the certificate exchange phase. Improvements
in performance with the certificate exchange depend on the type of keys
used, and are greatest with RSA. Session reuse may be less important with
other schemes; shared TLS session tickets are another mechanism that may
circumvent the problem; also, upcoming versions of HA-Proxy may be able
to obviate this problem by synchronizing TLS state between Amphorae
(similar to stick-table protocol).
- Per the agreement at the Mitaka mid-cycle, the default affinity shall be
based on source-IP only and a consistent hashing function (see below)
shall be used to distribute flows in a predictable manner; however,
abstraction will be used to allow other implementations at a later time.
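The following minimal sketch illustrates how the two affinity modes map a
packet to a ``bucket`` index; the helper names are illustrative only and are
not part of the proposed driver interface::

    import hashlib

    def _stable_hash(key):
        # A stable digest is used rather than Python's built-in hash(), which
        # is randomized per process and would break affinity across restarts.
        return int(hashlib.sha256(key.encode()).hexdigest(), 16)

    def select_bucket(src_ip, src_port, bucket_count, affinity='SOURCE_IP'):
        """Return the bucket (Amphora slot) index for a packet."""
        if affinity == 'SOURCE_IP':
            key = src_ip                          # default affinity mode
        elif affinity == 'SOURCE_IP_AND_PORT':
            key = '%s:%s' % (src_ip, src_port)    # [P2] per-flow affinity
        else:
            raise ValueError('unsupported affinity type: %s' % affinity)
        return _stable_hash(key) % bucket_count

    # With source-IP affinity, two flows from the same client share a bucket.
    assert select_bucket('198.51.100.7', 43001, 4) == \
        select_bucket('198.51.100.7', 55012, 4)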
Forwarding with OVS and OpenFlow Rules
--------------------------------------
* The reference implementation of the Distributor shall use OVS for
forwarding and configure the Distributor through OpenFlow rules.
- OpenFlow rules can be implemented by a software switch (e.g., OVS) that
can run on a VM. Thus, the Distributor can be created and managed by
Octavia similarly to the way Amphora VMs are created and managed.
- OpenFlow rules are supported by several HW switches, so the same
control plane can be used for both SW and HW implementations.
* Outline of Rules
- A ``group`` with the ``select`` method is used to distribute IP traffic
over multiple Amphorae. There is one ``bucket`` per Amphora -- adding
an Amphora adds a new ``bucket`` and deleting an Amphora removes the
corresponding ``bucket``.
- The ``select`` method supports (OpenFlow v1.5) hash-based selection
of the ``bucket``. The hash can be set up to use different fields,
including by source IP only (default) and by source IP and source port.
- All buckets route traffic back on the in-port (i.e., no forwarding
between ports). This ensures that the same front-end network is used
(i.e., the Distributor does not route between front-end networks and
therefore does not mix traffic of different tenants).
- The ``bucket`` actions re-write the outgoing packets: the destination MAC
is re-written to that of the specific Amphora and the source MAC to that
of the Distributor interface (together these MAC re-writes provide the L3
routing functionality).
*Note:* alternative re-write rules can be used to support other forwarding
mechanisms.
- OpenFlow rules are also used to answer ``arp`` requests on the VIP.
``arp`` requests for each VIP are captured, re-written as ``arp``
replies with the MAC address of the particular front-end interface and
sent back on the in-port. Again, there is no routing between interfaces.
(A behavioral sketch of these rules appears after this list.)
* Handling Amphora failure
- Initial implementation will assume a fixed size for each cluster (no
elasticity). The hashing will be "consistent" by virtue of never
changing the number of ``buckets``. If the cluster size is changed on
the fly (there should not be an API to do so) then there are no
guarantees on shuffling.
- If an Amphora fails then remapping cannot be avoided -- all flows of
the failed Amphora must be remapped to a different one. Rather than
mapping these flows to other active Amphorae in the cluster, the reference
implementation will map all flows to the cluster's *standby* Amphora (i.e.,
the "+1" Amphora in this "N+1" cluster). This ensures that the cluster
size does not change. The only change in the OpenFlow rules would be to
replace the MAC of the failed Amphora with that of the standby Amphora.
- This implementation is very similar to Active-Standby fail-over. There
will be a standby Amphora that can serve traffic in case of failure.
The differences from Active-Standby are that a single Amphora acts as a
standby for multiple ones; fail-over re-routing is handled through the
Distributor (rather than by VRRP); and a whole cluster of Amphorae is
active concurrently, to enable support of large workloads.
- Health Manager will trigger re-creation of a failed Amphora. Once the
Amphora is ready it becomes the new *standby* (no changes to OpenFlow
rules).
- [P2]_ Handle concurrent failure of more than a single Amphora
* Handling Distributor failover
- To handle the event of a Distributor failover caused by a catastrophic
failure of a Distributor, and in order to preserve the client to Amphora
affinity when the Distributor is replaced, the Amphora registration process
with the Distributor should preserve positional information. This should
ensure that when a new Distributor is created, Amphorae will be assigned to
the same buckets to which they were previously assigned.
- In the reference implementation, we propose making the Distributor API
return the complete list of Amphorae MAC addresses with positional
information each time an Amphora is registered or unregistered.
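The following self-contained sketch models the behaviour of the rules and
failure handling outlined above (one ``bucket`` per Amphora, MAC re-writes,
``arp`` answering, and standby substitution). It is a behavioral illustration
in Python, not actual OVS/OpenFlow syntax, and the class and method names are
not part of this specification::

    import hashlib

    def _bucket_index(src_ip, bucket_count):
        # Source-IP affinity (the default); see the affinity sketch above.
        return int(hashlib.sha256(src_ip.encode()).hexdigest(), 16) % bucket_count

    class DistributorPort(object):
        """Models the rules attached to one front-end (VIP) interface.

        There is no forwarding between instances, mirroring the requirement
        that the Distributor never routes between front-end networks.
        """

        def __init__(self, vip_address, port_mac, amphora_macs, standby_mac):
            self.vip_address = vip_address
            self.port_mac = port_mac            # MAC of this Distributor interface
            self.buckets = list(amphora_macs)   # one bucket per active Amphora
            self.standby_mac = standby_mac

        def answer_arp(self, requested_ip):
            # arp requests for the VIP are answered with this interface's MAC.
            return self.port_mac if requested_ip == self.vip_address else None

        def forward(self, src_ip):
            # Select a bucket, then re-write MACs; the packet leaves on the
            # in-port with the destination IP (the VIP) unchanged.
            mac = self.buckets[_bucket_index(src_ip, len(self.buckets))]
            return {'dst_mac': mac, 'src_mac': self.port_mac}

        def fail_over(self, failed_mac):
            # Keep the bucket count fixed and substitute the standby Amphora's
            # MAC, so flows to healthy Amphorae are not reshuffled.
            self.buckets = [self.standby_mac if m == failed_mac else m
                            for m in self.buckets]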
Proposed change
===============
**Note:** These are changes on top of the changes described in the
"Active-Active, N+1 Amphorae Setup" blueprint, (see
https://blueprints.launchpad.net/octavia/+spec/active-active-topology)
* Create flow for the creation of an Amphora cluster with N active Amphora
and one extra standby Amphora. Set-up the Amphora roles accordingly.
* Support the creation, connection, and configuration of the various
networks and interfaces as described in the high-level topology diagram above.
The Distributor shall have a separate interface for each loadbalancer and
shall not allow any routing between different ports. In particular, when
a loadbalancer is created the Distributor should:
- Attach the Distributor to the loadbalancer's front-end network by
adding a VIP port to the Distributor (the LB VIP Neutron port).
- Configure OpenFlow rules: create a group with the desired cluster size
and with the given Amphora MACs; create rules to answer ``arp``
requests for the VIP address.
**Notes:**
- [P2]_ It is desirable that the Distributor be considered as a router by
Neutron (to handle port security, network forwarding without ``arp``
spoofing, etc.). This may require changes to Neutron and may also mean
that Octavia will be a privileged user of Neutron.
- The Distributor needs to support IPv6 NDP.
- [P2]_ If the Distributor is implemented as a container then hot-plugging
a port for each VIP might not be possible.
- If DVR is used then routing rules must be used to forward external
traffic to the Distributor rather than relying on ``arp``. In particular,
DVR interferes with ``noarp`` settings.
* Support Amphora failure recovery
- Modify the HM and failure recovery flows to add tasks to notify the ACM
when ACTIVE-ACTIVE topology is in use. If an active Amphora fails then
it needs to be decommissioned on the Distributor and replaced with
the standby.
- Failed Amphorae should be recreated as a standby (in the new
IN_CLUSTER_STANDBY role). The standby Amphora should also be monitored and
recovered on failure.
* Distributor driver and Distributor image
- The Distributor should be supported similarly to an Amphora; namely, have
its own abstract driver.
- The Distributor image (for the reference implementation) should include an
OVS version recent enough to support (OpenFlow 1.5) hash-based bucket
selection.
As is done for Amphorae, Distributor image should be installed with
public keys to allow secure configuration by the Octavia controller.
- Reference implementation shall spawn a new Distributor VM as needed. It
shall monitor its health and handle recovery using heartbeats sent to the
health monitor in a similar fashion to how this is done presently with
Amphorae. [P2]_ Spawn a new Distributor if the number of VIPs exceeds a given
limit (to limit the number of Neutron ports attached to one Distributor).
[P2]_ Add configuration options and/or Operator API to allow operator to
request a dedicated Distributor for a VIP (or per tenant).
* Define a REST API for Distributor configuration (no SSH API).
See below for details.
* Create data-model for Distributor.
Alternatives
------------
TBD
Data model impact
-----------------
Add table ``distributor`` with the following columns:
* id ``(sa.String(36) , nullable=False)``
ID of Distributor instance.
* compute_id ``(sa.String(36), nullable=True)``
ID of compute node running the Distributor.
* lb_network_ip ``(sa.String(64), nullable=True)``
IP of Distributor on management network.
* status ``(sa.String(36), nullable=True)``
Provisioning status
* vip_port_ids (list of ``sa.String(36)``)
List of Neutron port IDs.
New VIFs may be plugged into the Distributor when a new LB is created. We
may need to store the Neutron port IDs in order to support
fail-over from one Distributor instance to another.
Add table ``distributor_health`` with the following columns:
* distributor_id ``(sa.String(36) , nullable=False)``
ID of Distributor instance.
* last_update ``(sa.DateTime, nullable=False)``
Last time distributor heartbeat was received by a health monitor.
* busy ``(sa.Boolean, nullable=False)``
Field indicating a create / delete or other action is being conducted on
the distributor instance (i.e., to prevent a race condition when multiple
health managers are in use).
Add table ``amphora_registration`` with the following columns. This describes
which Amphorae are registered with which Distributors and in which order:
* lb_id ``(sa.String(36) , nullable=False)``
ID of load balancer.
* distributor_id ``(sa.String(36) , nullable=False)``
ID of Distributor instance.
* amphora_id ``(sa.String(36) , nullable=False)``
ID of Amphora instance.
* position ``(sa.Integer, nullable=True)``
Order in which Amphorae are registered with the Distributor.
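A rough SQLAlchemy sketch of these tables is shown below. It follows the
column definitions above; the use of ``declarative_base`` is only for
illustration (real models would follow Octavia's existing model base classes),
and the ``vip_port_ids`` list would most likely be normalized into its own
table::

    import sqlalchemy as sa
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class Distributor(Base):
        __tablename__ = 'distributor'
        id = sa.Column(sa.String(36), primary_key=True, nullable=False)
        compute_id = sa.Column(sa.String(36), nullable=True)
        lb_network_ip = sa.Column(sa.String(64), nullable=True)
        status = sa.Column(sa.String(36), nullable=True)
        # vip_port_ids (list of Neutron port IDs) would live in a separate
        # association table rather than in a single column.

    class DistributorHealth(Base):
        __tablename__ = 'distributor_health'
        distributor_id = sa.Column(sa.String(36), primary_key=True)
        last_update = sa.Column(sa.DateTime, nullable=False)
        busy = sa.Column(sa.Boolean, nullable=False)

    class AmphoraRegistration(Base):
        __tablename__ = 'amphora_registration'
        lb_id = sa.Column(sa.String(36), primary_key=True)
        amphora_id = sa.Column(sa.String(36), primary_key=True)
        distributor_id = sa.Column(sa.String(36), nullable=False)
        position = sa.Column(sa.Integer, nullable=True)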
REST API Impact
---------------
Distributor will be running its own rest API server. This API will be secured
using two-way SSL authentication, and use certificate rotation in the same
way this is done with Amphorae today.
The following API calls will be addressed.
1. Post VIP Plug
Adding a VIP network interface to the Distributor involves tasks which run
outside the Distributor itself. Once these are complete, the Distributor
must be configured to use the new interface. This is a REST call, similar
to what is currently done for Amphorae when connecting to a new member
network.
`lb_id`
An identifier for the particular loadbalancer/VIP. Used for subsequent
register/unregister of Amphorae.
`vip_address`
The IP address of the VIP (the address for which the Distributor should
answer ``arp`` requests).
`subnet_cidr`
Netmask for the VIP's subnet.
`gateway`
The gateway that outbound packets from the VIP IP address should use.
`mac_address`
MAC address of the new interface corresponding to the VIP.
`vrrp_ip`
In the case of HA Distributor, this contains the IP address that will
be used in setting up the allowed address pairs relationship. (See
Amphora VIP plugging under the ACTIVE-STANDBY topology for an example
of how this is used.)
`host_routes`
List of routes that should be added when the VIP is plugged.
`alg_extras`
Extra arguments related to the algorithm that will be used to distribute
requests to the Amphorae that are part of this load balancer configuration. This
consists of an algorithm name and affinity type. In the initial release
of ACTIVE-ACTIVE, the only valid algorithm will be *hash*, and the
affinity type may be ``Source_IP`` or [P2]_ ``Source_IP_AND_port``.
2. Pre VIP unplug
Removing a VIP network interface will involve several tasks on the
Distributor to gracefully roll-back OVS configuration and other details
that were set-up when the VIP was plugged in.
`lb_id`
ID of the VIP's loadbalancer that will be unplugged.
3. Register Amphorae
This adds Amphorae to the configuration for a given load balancer. The
Distributor should respond with a new list of all Amphorae registered with
the Distributor with positional information.
`lb_id`
ID of the loadbalancer with which the Amphorae will be registered.
`amphorae`
List of Amphora MAC addresses and an optional position argument indicating
the order in which they should be registered.
4. Unregister Amphorae
This removes Amphorae from the configuration for a given load balancer. The
Distributor should respond with a new list of all Amphorae registered with
the Distributor with positional information.
`lb_id`
ID of the loadbalancer from which the Amphorae will be unregistered.
`amphorae`
List of Amphora MAC addresses that should be unregistered from the
Distributor.
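As an illustration, a controller-side call to the Register Amphorae endpoint
might look as follows. The URL path, payload layout, and response shape are
assumptions for this sketch (the specification fixes only the parameters
listed above); the two-way SSL handling mirrors what is done for Amphorae
today::

    import requests

    def register_amphorae(base_url, lb_id, amphorae,
                          client_cert, client_key, ca_cert):
        """Register Amphorae (MAC + optional position) for a load balancer."""
        resp = requests.post(
            '%s/loadbalancers/%s/register' % (base_url, lb_id),
            json={'amphorae': amphorae},
            cert=(client_cert, client_key),   # client side of two-way SSL
            verify=ca_cert)                   # validate the Distributor cert
        resp.raise_for_status()
        # Expected to return the full registration list with positions, e.g.
        # [{'mac_address': 'fa:16:3e:aa:bb:01', 'position': 1}, ...]
        return resp.json()

    # Example usage:
    # register_amphorae('https://192.0.2.10:9443', lb_id,
    #                   [{'mac_address': 'fa:16:3e:aa:bb:01', 'position': 1},
    #                    {'mac_address': 'fa:16:3e:aa:bb:02'}],
    #                   '/etc/octavia/client.pem', '/etc/octavia/client.key',
    #                   '/etc/octavia/ca.pem')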
Security impact
---------------
The Distributor is designed to be multi-tenant by default. (Note that the first
reference implementation will not be multi-tenant until tests can be developed
to verify the security of a multi-tenant reference distributor.) Although each
tenant has its own front-end network, the Distributor is connected to all,
which might allow leaks between these networks. The rationale is twofold:
First, the Distributor should be considered as a trusted infrastructure
component. Second, all traffic is external traffic before it reaches the
Amphora. Note that the GW router has exactly the same attributes; in other
words, logically, we can consider the Distributor to be an extension to the GW
(or even use the GW HW to implement the Distributor).
This approach might not be considered secure enough for some cases, such as, if
LBaaS is used for internal tier-to-tier communication inside a tenant network.
Some tenants may want their loadbalancer's VIP to remain private and their
front-end network to be isolated. In these cases, in order to accomplish
active-active for this tenant we would need separate dedicated Distributor
instance(s).
Notifications impact
--------------------
Other end user impact
---------------------
Performance Impact
------------------
Other deployer impact
---------------------
Developer impact
----------------
Implementation
==============
Assignee(s)
-----------
Work Items
----------
Dependencies
============
Testing
=======
* Unit tests with tox.
* Function tests with tox.
Documentation Impact
====================
Further Discussion
==================
.. Note::
This section captures some background, ideas, concerns, and remarks that
were raised by various people. Some of the items here can be considered for
future/alternative design and some will hopefully make their way into, yet
to be written, related blueprints (e.g., auto-scaled topology).
[P2]_ Handling changes in Cluster size (manual or auto-scaled)
----------------------------------------------------------------
- The Distributor shall support different mechanisms for preserving affinity
of flows to Amphorae following a *change in the size* of the Amphorae
Cluster.
- The goal is to minimize shuffling of client-to-Amphora mapping during
cluster size changes:
* When an Amphora is removed from the Cluster (e.g., due to failure or
scale-down action), all its flows are broken; however, flows to other
Amphorae should not be affected. Also, if a drain method is used to empty
the Amphora of client flows (in the case of a graceful removal), this
should prevent disruption.
* When an Amphora is *added* to the Cluster (e.g., recovery of a failed
Amphora), some new flows should be distributed to the new Amphora;
however, most flows should still go to the same Amphora they were
distributed to before the new Amphora was added. For example, if the
affinity of flows to Amphorae is per Source IP and a new Amphora was just
added then the Distributor should forward packets from this IP to one of
only two Amphorae: either the same Amphora as before or the newly added
Amphora.
Using a simple hash to maintain affinity does not meet this goal.
For example, suppose we maintain affinity (for a fixed cluster size) using
a hash (for randomizing key distribution) as
`chosen_amphora_id = hash(sourceIP # port) mod number_of_amphorae`.
When a new Amphora is added or removed, the number of Amphorae changes;
thus, a different Amphora will be chosen for most flows.
- Below are a couple of ways to tackle this shuffling problem.
*Consistent Hashing*
Consistent hashing is a hashing mechanism (regardless if key is based on
IP or IP/port) that preserves most hash mappings during changes in the
size of the Amphorae Cluster. In particular, for a cluster with N
Amphorae that grows to N+1 Amphorae, a consistent hashing function
ensures that, with high probability, only 1/N of the input flows will be
re-hashed (more precisely, K/N keys will be remapped, where K is the total
number of keys). Note that, even
with consistent hashing, some flows will be remapped and there is only
a statistical bound on the number of remapped flows.
The "classic" consistent hashing algorithm maps both server IDs and
keys to hash values and selects for each key the server with the
closest hash value to the key hash value. Lookup generally requires
O(log N) to search for the "closest" server. Achieving good
distribution requires multiple hash values per server (on the order of
tens); although these can be pre-computed, this implies a memory footprint
on the order of tens of entries per server. Other algorithms (e.g.,
Google's Maglev) have better performance, but provide weaker guarantees.
There are several consistent hashing libraries available. None are
supported in OVS.
* Ketama https://github.com/RJ/ketama
* Openstack swift http://docs.openstack.org/developer/swift/ring.html
* Amazon dynamo
http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf
We should also strongly consider making any consistent hashing algorithm
we develop available to all OpenStack components by making it part of an
Oslo library.
*Rendezvous hashing*
This method provides similar properties to Consistent Hashing (i.e., a
hashing function that remaps only 1/N of keys when a cluster with N
Amphorae grows to N+1 Amphorae). A sketch appears at the end of this
section.
For each server ID, the algorithm concatenates the key and server ID and
computes a hash. The server with the largest hash is chosen. This
approach requires O(N) for each lookup, but is much simpler to
implement and has virtually no memory footprint. Through search-tree
encoding of the server IDs it is possible to achieve O(log N) lookup,
but implementation is harder and distribution is not as good. Another
feature is that more than one server can be chosen (e.g., two largest
values) to handle larger loads -- not directly useful for the
Distributor use case.
*Hybrid, Permutation-based approach*
This is an alternative implementation of consistent hashing that may be
simpler to implement. Keys are hashed to a set of buckets; each bucket
is pre-mapped to a random permutation of the server IDs. Lookup is by
computing a hash of the key to obtain a bucket and then going over the
permutation selecting the first server. If a server is marked as "down"
the next server in the list is chosen. This approach is similar to
Rendezvous hashing if each key is directly pre-mapped to a random
permutation (and like it allows more than one server selection). If the
number of failed servers is small then lookup is about O(1); memory is
O(N * #buckets), where the granularity of distribution is improved by
increasing the number of buckets. The permutation-based approach is
useful to support clusters of fixed size that need to handle a few
nodes going down and then coming back up. If there is an assumption on
the number of failures then memory can be reduced to O( max_failures *
#buckets). This approach seems to suit the Distributor Active-Active
use-case for non-elastic workloads.
- Flow tracking is required, even with the above hash functions, to handle
the (relatively few) remapped flows. If an existing flow is remapped, its
TCP connection would break. This is acceptable when an Amphora goes down
and its flows are mapped to a new one. On the other hand, it may be
unacceptable when an Amphora is added to the cluster and 1/N of existing
flows are remapped. The Distributor may support different modes, as follows.
*None / Stateless*
In this mode, the Distributor applies its most recent forwarding rules,
regardless of previous state. Some existing flows might be remapped to a
different Amphora and would be broken. The client would have to recover
and establish a connection with the new Amphora (it would still be
mapped to the same back-end, if possible). Combined with consistent (or
similar) hashing, this may be good enough for many web applications
that are built for failure anyway, and can restore their state upon
reconnect.
*Full flow Tracking*
In this mode, the Distributor tracks existing flows to provide full
affinity, i.e., only new flows can be remapped to different Amphorae.
The Linux connection tracking may be used (e.g., through IPTables or
through OpenFlow); however, this might not scale well. Alternatively,
the Distributor can use an independent mechanism similar to HA-Proxy
sticky-tables to track the flows. Note that the Distributor only needs to
track the mapping per source IP and source port (unlike Linux connection
tracking which follows the TCP state and related connections).
*Use Ryu*
Ryu is a well-supported and tested Python framework for issuing OpenFlow
commands. Especially since Neutron recently moved to using it for
many of the things it does, using this in the Distributor might make
sense for Octavia as well.
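The sketch below implements the Rendezvous hashing approach referenced above
and demonstrates the limited-remapping property; the helper names and the
cluster/key values are illustrative only::

    import hashlib

    def _weight(key, amphora_id):
        data = ('%s|%s' % (key, amphora_id)).encode()
        return int(hashlib.sha256(data).hexdigest(), 16)

    def rendezvous_pick(key, amphora_ids):
        """Pick the Amphora with the largest hash for this key (O(N) lookup)."""
        return max(amphora_ids, key=lambda amp: _weight(key, amp))

    # Growing the cluster from N to N+1 moves only the keys whose new maximum
    # is the added Amphora -- roughly 1/(N+1) of them. A plain
    # hash(key) % N scheme would remap most keys instead.
    cluster = ['amp-1', 'amp-2', 'amp-3']
    grown = cluster + ['amp-4']
    keys = ['203.0.113.%d' % i for i in range(256)]
    moved = sum(1 for k in keys
                if rendezvous_pick(k, cluster) != rendezvous_pick(k, grown))
    assert moved < len(keys) / 2   # in practice close to len(keys) / 4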
Forwarding Data-path Implementation Alternatives
------------------------------------------------
The current design uses L2 forwarding based only on L3 parameters and uses
Direct Return routing (one-legged). The rationale behind this approach is
to keep the Distributor as light as possible and have the Amphorae do the
bulk of the work. This allows one (or a few) Distributor instance(s) to
serve all traffic even for very large workloads. Other approaches are
possible.
2-legged Router
^^^^^^^^^^^^^^^
- Distributor acts as router, being in-path on both directions.
- New network between Distributor and Amphorae -- Only Distributor on VIP
subnet.
- No need to use MAC forwarding -- use routing rules
LVS
^^^
Use LVS for Distributor.
DNS
^^^
Use DNS for the Distributor.
- Use DNS to map clients to particular Amphorae. Distribution will be by
domain name rather than by VIP.
- No problem with per-flow affinity, as client will use same IP for entire
TCP connection.
- Need a different public IP for each Amphora (no VIP)
Pure SDN
^^^^^^^^
- Implement the OpenFlow rules directly in the network, without a
Distributor instance.
- If the network infrastructure supports this then the Distributor can
become more robust and very lightweight, making it practical to have a
dedicated Distributor per VIP (only the rules will be dedicated as the
network and SDN controller are shared resources)
Distributor Sharing
-------------------
- The initial implementation of the Distributor will not be shared between
tenants until tests can be written to verify the security of this solution.
- The implementation should support different Distributor sharing and
cardinality configurations. This includes single-shared Distributor,
multiple-dedicated Distributors, and multiple-shared Distributors. In
particular, an abstraction layer should be used and the data-model should
include an association between the load balancer and Distributor.
- A shared Distributor uses the least amount of resources, but may not meet
isolation requirements (performance and/or security) or might become a
bottleneck.
Distributor High-Availability
-----------------------------
- The Distributor should be highly-available (as this is one of the
motivations for the active-active topology). Once the initial active-active
functionality is delivered, developing a highly available distributor should
take a high priority.
- A mechanism similar to the VRRP used on ACTIVE-STANDBY topology Amphorae
can be used.
- Since the Distributor is stateless (for fixed cluster sizes and if no
connection tracking is used) it is possible to set up an active-active
configuration and advertise more than one Distributor (e.g., for ECMP).
- As a first step, the initial implementation will use a single Distributor
instance (i.e., will not be highly-available). Health Manager will monitor
the Distributor health and initiate recovery if needed.
- The implementation should support plugging-in a hardware-based
implementation of the Distributor that may have its own high-availability
support.
- In order to preserve client to Amphora affinity in the case of a failover,
a VRRP-like HA Distributor has several options. We could potentially push
Amphora registrations to the standby Distributor with the position
arguments specified, in order to guarantee the active and standby Distributor
always have the same configuration. Or, we could invent and utilize a
synchronization protocol between the active and standby Distributors. This
will be explored and decided when an HA Distributor specification is
written and approved.
References
==========
.. [1] https://blueprints.launchpad.net/octavia/+spec/base-image
.. [2] https://blueprints.launchpad.net/octavia/+spec/controller-worker
.. [3] https://blueprints.launchpad.net/octavia/+spec/amphora-driver-interface
.. [4] https://blueprints.launchpad.net/octavia/+spec/controller
.. [5] https://blueprints.launchpad.net/octavia/+spec/operator-api
.. [6] doc/main/api/haproxy-amphora-api.rst
.. [7] https://blueprints.launchpad.net/octavia/+spec/active-active-topology

..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
=================================
Active-Active, N+1 Amphorae Setup
=================================
https://blueprints.launchpad.net/octavia/+spec/active-active-topology
This blueprint describes how Octavia implements an *active-active*
loadbalancer (LB) solution that is highly-available through redundant
Amphorae. It presents the high-level service topology and suggests
high-level code changes to the current code base to realize this scenario.
In a nutshell, an *Amphora Cluster* of two or more active Amphorae
collectively provide the loadbalancing service.
The Amphora Cluster shall be managed by an *Amphora Cluster Manager* (ACM).
The ACM shall provide an abstraction that allows different types of
active-active features (e.g., failure recovery, elasticity, etc.). The
initial implementation shall not rely on external services, but the
abstraction shall allow for interaction with external ACMs (to be developed
later).
This blueprint uses terminology defined in the Octavia glossary when available,
and defines new terms to describe new components and features as necessary.
.. _P2:
**Note:** Items marked with [P2]_ refer to lower priority features to be
designed / implemented only after initial release.
Problem description
===================
A tenant should be able to start a highly-available loadbalancer for the
tenant's back-end services as follows:
* The operator should be able to configure an active-active topology
through an Octavia configuration file or [P2]_ through a Neutron flavor;
the loadbalancer shall support this topology. Octavia shall support active-active
topologies in addition to the topologies that it currently supports.
* In an active-active topology, a cluster of two or more amphorae shall
host a replicated configuration of the load-balancing services. Octavia
will manage this *Amphora Cluster* as a highly-available service using a
pool of active resources.
* The Amphora Cluster shall provide the load-balancing services and support
the configurations that are supported by a single Amphora topology,
including L7 load-balancing, SSL termination, etc.
* The active-active topology shall support various Amphora types and
implementations; including, virtual machines, [P2]_ containers, and
bare-metal servers.
* The operator should be able to configure the high-availability
requirements for the active-active load-balancing services. The operator
shall be able to specify the number of healthy Amphorae that must exist
in the load-balancing Amphora Cluster. If the number of healthy Amphorae
drops under the desired number, Octavia shall automatically and
seamlessly create and configure a new Amphora and add it to the Amphora
Cluster. [P2]_ The operator should be further able to define that the
Amphora Cluster shall be allocated on separate physical resources.
* An Amphora Cluster will collectively act to serve as a single logical
loadbalancer as defined in the Octavia glossary. Octavia will seamlessly
distribute incoming external traffic among the Amphorae in the Amphora
Cluster. To that end, Octavia will employ a *Distributor* component that
will forward external traffic towards the managed amphora instances.
Conceptually, the Distributor provides an extra level of load-balancing
for an active-active Octavia application, albeit a simplified one.
Octavia should be able to support several Distributor implementations
(e.g., software-based and hardware-based) and different affinity models
(at minimum, flow-affinity should be supported to allow TCP connectivity
between clients and Amphorae).
* The detailed design of the Distributor component will be described in a
separate document (see "Distributor for Active-Active, N+1 Amphorae
Setup", active-active-distributor.rst).
High-level Topology Description
-------------------------------
Single Tenant
~~~~~~~~~~~~~
* The following diagram illustrates the active-active topology:
::
Front-End Back-End
Internet Network Network
(world) (tenant) (tenant)
║ ║ ║
┌─╨────┐ floating IP ║ ║ ┌────────┐
│Router│ to LB VIP ║ ┌────┬─────────┬────┐ ║ │ Tenant │
│ GW ├──────────────►╫◄─┤ IP │ Amphora │ IP ├─►╫◄─┤Service │
└──────┘ ║ └┬───┤ (1) │back│ ║ │ (1) │
║ │VIP├─┬──────┬┴────┘ ║ └────────┘
║ └───┘ │ MGMT │ ║ ┌────────┐
╓◄───────────────────║─────────┤ IP │ ║ │ Tenant │
║ ┌─────────┬────┐ ║ └──────┘ ╟◄─┤Service │
║ │ Distri- │ IP├►╢ ║ │ (2) │
║ │ butor ├───┬┘ ║ ┌────┬─────────┬────┐ ║ └────────┘
║ └─┬──────┬┤VIP│ ╟◄─┤ IP │ Amphora │ IP ├─►╢ ┌────────┐
║ │ MGMT │└─┬─┘ ║ └┬───┤ (2) │back│ ║ │ Tenant │
╟◄────┤ IP │ └arp►╢ │VIP├─┬──────┬┴────┘ ╟◄─┤Service │
║ └──────┘ ║ └───┘ │ MGMT │ ║ │ (3) │
╟◄───────────────────║─────────┤ IP │ ║ └────────┘
║ ┌───────────────┐ ║ └──────┘ ║
║ │ Octavia LBaaS │ ║ ••• ║ •
╟◄─┤ Controller │ ║ ┌────┬─────────┬────┐ ║ •
║ └┬─────────────┬┘ ╙◄─┤ IP │ Amphora │ IP ├─►╢
║ │ Amphora │ └┬───┤ (k) │back│ ║ ┌────────┐
║ │ Cluster Mgr.│ │VIP├─┬──────┬┴────┘ ║ │ Tenant │
║ └─────────────┘ └───┘ │ MGMT │ ╙◄─┤Service │
╟◄─────────────────────────────┤ IP │ │ (m) │
║ └──────┘ └────────┘
Management Amphora Cluster Back-end Pool
Network 1..k 1..m
* An example of high-level data-flow:
1. Internet clients access a tenant service through an externally visible
floating-IP (IPv4 or IPv6).
2. If IPv4, a gateway router maps the floating IP into a loadbalancer's
internal VIP on the tenant's front-end network.
3. The (multi-tenant) Distributor receives incoming requests to the
loadbalancer's VIP. It acts as a one-legged direct return LB,
answering ``arp`` requests for the loadbalancer's VIP (see Distributor
spec.).
4. The Distributor distributes incoming connections over the tenant's
Amphora Cluster, by forwarding each new connection opened with a
loadbalancer's VIP to a front-end MAC address of an Amphora in the
Amphora Cluster (layer-2 forwarding). *Note*: the Distributor may
implement other forwarding schemes to support more complex routing
mechanisms, such as DVR (see Distributor spec.).
5. An Amphora receives the connection and accepts traffic addressed to
the loadbalancer's VIP. The front-end IPs of the Amphorae are
allocated on the tenant's front-end network. Each Amphora accepts VIP
traffic, but does not answer ``arp`` requests for the VIP address.
6. The Amphora load-balances the incoming connections to the back-end
pool of tenant servers, by forwarding each external request to a
member on the tenant network. The Amphora also performs SSL
termination if configured.
7. Outgoing traffic traverses from the back-end pool members, through
the Amphora and directly to the gateway (i.e., not through the
Distributor).
Multi-tenant Support
~~~~~~~~~~~~~~~~~~~~
* The following diagram illustrates the active-active topology with
multiple tenants:
::
Front-End Back-End
Internet Networks Networks
(world) (tenant) (tenant)
║ B A A
║ floating IP ║ ║ ║ ┌────────┐
┌─╨────┐ to LB VIP A ║ ║ ┌────┬─────────┬────┐ ║ │Tenant A│
│Router├───────────────║─►╫◄─┤A IP│ Amphora │A IP├─►╫◄─┤Service │
│ GW ├──────────────►╢ ║ └┬───┤ (1) │back│ ║ │ (1) │
└──────┘ floating IP ║ ║ │VIP├─┬──────┬┴────┘ ║ └────────┘
to LB VIP B ║ ║ └───┘ │ MGMT │ ║ ┌────────┐
╓◄───────────────────║──║─────────┤ IP │ ║ │Tenant A│
║ ║ ║ └──────┘ ╟◄─┤Service │
M B A ┌────┬─────────┬────┐ ║ │ (2) │
║ ║ ╟◄─┤A IP│ Amphora │A IP├─►╢ └────────┘
║ ║ ║ └┬───┤ (2) │back│ ║ ┌────────┐
║ ║ ║ │VIP├─┬──────┬┴────┘ ║ │Tenant A│
║ ║ ║ └───┘ │ MGMT │ ╟◄─┤Service │
╟◄───────────────────║──║─────────┤ IP │ ║ │ (3) │
║ ║ ║ └──────┘ ║ └────────┘
║ B A ••• B •
║ ┌─────────┬────┐ ║ ║ ┌────┬─────────┬────┐ ║ •
║ │ │IP A├─╢─►╫◄─┤A IP│ Amphora │A IP├─►╢ ┌────────┐
║ │ ├───┬┘ ║ ║ └┬───┤ (k) │back│ ║ │Tenant A│
║ │ Distri- │VIP├─arp►╜ │VIP├─┬──────┬┴────┘ ╙◄─┤Service │
║ │ butor ├───┘ ║ └───┘ │ MGMT │ │ (m) │
╟◄─ │ │ ─────║────────────┤ IP │ └────────┘
║ │ ├────┐ ║ └──────┘
║ │ │IP B├►╢ tenant A
║ │ ├───┬┘ ║ = = = = = = = = = = = = = = = = = = = = =
║ │ │VIP│ ║ ┌────┬─────────┬────┐ B tenant B
║ └─┬──────┬┴─┬─┘ ╟◄────┤B IP│ Amphora │B IP├─►╢ ┌────────┐
║ │ MGMT │ └arp►╢ └┬───┤ (1) │back│ ║ │Tenant B│
╟◄────┤ IP │ ║ │VIP├─┬──────┬┴────┘ ╟◄─┤Service │
║ └──────┘ ║ └───┘ │ MGMT │ ║ │ (1) │
╟◄───────────────────║────────────┤ IP │ ║ └────────┘
║ ┌───────────────┐ ║ └──────┘ ║
M │ Octavia LBaaS │ B ••• B •
╟◄─┤ Controller │ ║ ┌────┬─────────┬────┐ ║ •
║ └┬─────────────┬┘ ╙◄────┤B IP│ Amphora │B IP├─►╢
║ │ Amphora │ └┬───┤ (q) │back│ ║ ┌────────┐
║ │ Cluster Mgr.│ │VIP├─┬──────┬┴────┘ ║ │Tenant B│
║ └─────────────┘ └───┘ │ MGMT │ ╙◄─┤Service │
╟◄────────────────────────────────┤ IP │ │ (r) │
║ └──────┘ └────────┘
Management Amphora Clusters Back-end Pool
Network A(1..k), B(1..q) A(1..m),B(1..r)
* Both tenants A and B share the Distributor, but each has a different
front-end network. The Distributor listens on both loadbalancers' VIPs
and forwards to either A's or B's Amphorae.
* The Amphorae and the back-end (tenant) networks are not shared between
tenants.
Problem Details
---------------
* Octavia should support different Distributor implementations, similar
to its support for different Amphora types. The operator should be able
to configure different types of algorithms for the Distributor. All
algorithms should provide flow-affinity to allow TLS termination at the
amphora. See Distributor spec. for details.
* Octavia controller shall seamlessly configure any newly created Amphora
([P2]_ including peer state synchronization, such as sticky-tables, if
needed) and shall reconfigure the other solution components (e.g.,
Neutron) as needed. The controller shall further manage all Amphora
life-cycle events.
* Since it is impractical at scale for peer state synchronization to occur
between all the Amphorae of a single load balancer, the Amphorae that are
part of a single load balancer configuration need to be divided into smaller
peer groups (consisting of 2 or 3 Amphorae) within which they synchronize
state information.
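A minimal sketch of such a peer-group split is shown below; the group size of
three and the function name are illustrative, not part of this
specification::

    def build_peer_groups(amphora_ids, group_size=3):
        """Split a load balancer's Amphorae into small stick-table peer groups."""
        groups = [amphora_ids[i:i + group_size]
                  for i in range(0, len(amphora_ids), group_size)]
        # Avoid a trailing single-member group, which would have no peer to
        # replicate its session_persistence data to.
        if len(groups) > 1 and len(groups[-1]) == 1:
            groups[-2].extend(groups.pop())
        return groups

    # e.g. 7 Amphorae -> [[a1, a2, a3], [a4, a5, a6, a7]]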
Proposed change
===============
Required changes
----------------
The active-active loadbalancers require the following high-level changes:
Amphora related changes
~~~~~~~~~~~~~~~~~~~~~~~
* Updated Amphora image to support active-active topology. The front-end
still has both a unique IP (to allow direct addressing on front-end
network) and a VIP; however, it should not answer ARP requests for the
VIP address (all Amphorae in a single Amphora Cluster concurrently serve
the same VIP). Amphorae should continue to have a management IP on the LB
Network so Octavia can configure them. Amphorae should also generally
support hot-plugging interfaces into back-end tenant networks as they do
in the current implementation. [P2]_ Finally, the Amphora configuration
may need to be changed to randomize the member list, in order to prevent
synchronized decisions by all Amphorae in the Amphora Cluster.
* Extend data model to support active-active Amphora. This is somewhat
similar to active-passive (VRRP) support. Each Amphora needs to store its
IP and port on its front-end network (similar to ha_ip and ha_port_id
in the current model) and its role should indicate it is in a cluster.
The provisioning status should be interpreted as referring to an Amphora
only and not the load-balancing service. The status of the load balancer
should correspond to the number of ``ONLINE`` Amphorae in the Cluster.
If all Amphorae are ``ONLINE``, the load balancer is also ``ONLINE``. If a
small number of Amphorae are not ``ONLINE``, then the load balancer is
``DEGRADED``. If enough Amphorae are not ``ONLINE`` (past a threshold), then
the load balancer is ``DOWN`` (see the status sketch after this list).
* Rework some of the controller worker flows to support creation and
deletion of Amphorae by the ACM in an asynchronous manner. The compute
node may be created/deleted independently of the corresponding Amphora
flow, triggered as events by the ACM logic (e.g., node update). The flows
do not need much change (beyond those implied by the changes in the data
model), since the post-creation/pre-deletion configuration of each
Amphora is unchanged. This is also similar to the failure recovery flow,
where a recovery flow is triggered asynchronously.
* Create a flow (or task) for the controller worker for (de-)registration
of Amphorae with Distributor. The Distributor has to be aware of the
current ``ONLINE`` Amphorae, to which it can forward traffic. [P2]_ The
Distributor can do very basic monitoring of the Amphorae health (primarily
to make sure network connectivity between the Distributor and Amphorae is
working). Monitoring pool member health will remain the purview of the
pool health monitors.
* All the Amphorae in the Amphora Cluster shall replicate the same
listeners, pools, and TLS configuration, as they do now. We assume all
Amphorae in the Amphora Cluster can perform exactly the same
load-balancing decisions and can be treated as equivalent by the
Distributor (except for affinity considerations).
* Extend the Amphora (REST) API and/or *Plug VIP* task to allow disabling
of ``arp`` on the VIP.
* In order to prevent losing session_persistence data in the event of an
Amphora failure, the Amphorae will need to be configured to share
session_persistence data (via stick tables) with a subset of other
Amphorae that are part of the same load balancer configuration (i.e., a
peer group).
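The status aggregation described above could look roughly like the following
sketch; the exact threshold at which the load balancer is reported ``DOWN`` is
an assumption, since this specification only requires that "enough"
non-``ONLINE`` Amphorae mark it ``DOWN``::

    def cluster_operating_status(amphora_statuses, online_threshold=0.5):
        """Derive the load balancer status from its Amphorae's statuses."""
        if not amphora_statuses:
            return 'DOWN'
        online = sum(1 for s in amphora_statuses if s == 'ONLINE')
        if online == len(amphora_statuses):
            return 'ONLINE'
        if online >= len(amphora_statuses) * online_threshold:
            return 'DEGRADED'
        return 'DOWN'

    # e.g. ['ONLINE', 'ONLINE', 'ERROR'] -> 'DEGRADED'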
Amphora Cluster Manager driver for the active-active topology (*new*)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* Add an active-active topology to the topology types.
* Add a new driver to support creation/deletion of an Amphora Cluster via
an ACM. This will re-use existing controller-worker flows as much as
possible. The reference ACM will call the existing drivers to create
compute nodes for the Amphorae and configure them.
* The ACM shall orchestrate creation and deletion of Amphora instances to
meet the availability requirements. Amphora failover will utilize the
existing health monitor flows, with hooks to notify the ACM when
ACTIVE-ACTIVE topology is used. [P2]_ ACM shall handle graceful amphora
removal via draining (delay actual removal until existing connections are
terminated or some timeout has passed).
* Change the flow of LB creation. The ACM driver shall create an Amphora
Cluster instance for each new loadbalancer. It should maintain the
desired number of Amphorae in the Cluster and meet the
high-availability configuration given by the operator. *Note*: a base
functionality is already supported by the Health Manager; it may be
enough to support a fixed or dynamic cluster size. In any case, existing
flows to manage Amphora life cycle will be re-used in the reference ACM
driver.
* The ACM shall be responsible for providing health, performance, and
life-cycle management at the Cluster-level rather than at Amphora-level.
Maintaining the loadbalancer status (as described above) by some function
of the collective status of all Amphorae in the Cluster is one example.
Other examples include tracking configuration changes, providing Cluster
statistics, monitoring and maintaining compute nodes for the Cluster,
etc. The ACM abstraction would also support pluggable ACM implementations
that may provide more advanced capabilities (e.g., elasticity, AZ-aware
availability, etc.). The reference ACM driver will re-use existing
components and/or code which currently handle health, life-cycle, etc.
management for other load balancer topologies.
* New data model for an Amphora Cluster which has a one-to-one mapping with
the loadbalancer. This defines the common properties of the Amphora
Cluster (e.g., id, min. size, desired size, etc.) and additional
properties for the specific implementation.
* Add configuration file options to support configuration of an
active-active Amphora Cluster. Add default configuration. [P2]_ Add
Operator API. (A sketch of possible options appears after this list.)
* Add or update documentation for new components added and new or changed
functionality.
* Communication between the ACM and Distributors should be secured using
two-way SSL certificate authentication much the same way this is accomplished
between other Octavia controller components and Amphorae today.
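The configuration options mentioned above might be expressed with
``oslo.config`` roughly as follows; the option names, defaults, and group name
are assumptions for illustration::

    from oslo_config import cfg

    cluster_opts = [
        cfg.IntOpt('cluster_desired_size', default=3,
                   help='Number of active Amphorae to maintain in the Cluster.'),
        cfg.IntOpt('cluster_min_size', default=2,
                   help='Minimum ONLINE Amphorae for the Cluster to be ACTIVE.'),
        cfg.IntOpt('cluster_cooldown', default=300,
                   help='Seconds to wait between add/remove Amphora actions.'),
    ]

    cfg.CONF.register_opts(cluster_opts, group='active_active_cluster')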
Network driver changes
~~~~~~~~~~~~~~~~~~~~~~
* Support the creation, connection, and configuration of the various
networks and interfaces as described in the high-level topology diagram.
* Adding a new loadbalancer requires attaching the Distributor to the
loadbalancer's front-end network, adding a VIP port to the Distributor,
and configuring the Distributor to answer ``arp`` requests for the VIP.
The Distributor shall have a separate interface for each loadbalancer and
shall not allow any routing between different ports; in particular,
Amphorae of different tenants must not be able to communicate with each
other. In the reference implementation, this will be accomplished by using
separate OVS bridges per load balancer.
* Adding a new Amphora requires attaching it to the front-end and back-end
networks (similar to current implementation), adding the VIP (but with
``arp`` disabled), and registering the Amphora with the Distributor. The
tenant's front-end and back-end networks must allow attachment of
dynamically created Amphorae by involving the ACM (e.g., when the health
monitor replaces a failed Amphora). ([P2]_ extend the LBaaS API to allow
specifying an address range for new Amphorae usage, e.g., a subnet pool).
Amphora health-monitoring support
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* Modify Health Manager to manage the health for an Amphora Cluster through
the ACM; namely, forward Amphora health change events to the ACM, so it
can decide when the Amphora Cluster is considered to be in healthy state.
This should be done in addition to managing the health of each Amphora.
[P2]_ Monitor the Amphorae also on their front-end network (i.e., from
the Distributor).
Distributor support
~~~~~~~~~~~~~~~~~~~
* **Note:** as mentioned above, the detailed design of the Distributor
component is described in a separate document. Some design
considerations are highlighted below.
* The Distributor should be supported similarly to an Amphora; namely, have
its own abstract driver (a sketch of such an interface appears after this
list).
* For a reference implementation, add support for a Distributor image.
* Define a REST API for Distributor configuration (no SSH API). The API
shall support:
- Add and remove a VIP (loadbalancer) and specify distribution parameters
(e.g., affinity, algorithm, etc.).
- Registration and de-registration of Amphorae.
- Status
- [P2]_ Macro-level stats
* Spawn Distributors (if using on demand Distributor compute nodes) and/or
attach to existing ones as needed. Manage health and life-cycle of the
Distributor(s). Create, connect, and configure Distributor networks as
necessary.
* Create data model for the Distributor.
* Add Distributor driver and flows to (re-)configure the Distributor on
creation/destruction of a new loadbalancer (add/remove loadbalancer VIP)
and [P2]_ configure the distribution algorithm for the loadbalancer's
Amphora Cluster.
* Add flows to Octavia to (re-)configure the Distributor on adding/removing
Amphorae from the Amphora Cluster.
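An abstract Distributor driver matching the API outlined above might look
roughly as follows; the class and method names are illustrative and would be
finalized with the Distributor specification::

    import abc

    class DistributorDriver(abc.ABC):
        """Abstract driver for the Distributor (names are illustrative)."""

        @abc.abstractmethod
        def post_vip_plug(self, distributor, load_balancer, vip_address,
                          subnet_cidr, gateway, mac_address, alg_extras):
            """Configure a newly plugged VIP interface on the Distributor."""

        @abc.abstractmethod
        def pre_vip_unplug(self, distributor, load_balancer):
            """Roll back Distributor configuration before a VIP is unplugged."""

        @abc.abstractmethod
        def register_amphorae(self, distributor, load_balancer, amphorae):
            """Add Amphorae; return the registration list with positions."""

        @abc.abstractmethod
        def unregister_amphorae(self, distributor, load_balancer, amphorae):
            """Remove Amphorae; return the remaining registrations."""

        @abc.abstractmethod
        def get_status(self, distributor):
            """Return the Distributor's status."""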
Packaging
~~~~~~~~~
* Extend Octavia installation scripts to create an image for the Distributor.
Alternatives
------------
* Use external services to manage the cluster directly.
This utilizes functionality that already exists in OpenStack (e.g.,
like Heat and Ceilometer) rather than replicating it. This approach
would also benefit from future extensions to these services. On the
other hand, this adds undesirable dependencies on other projects (and
their corresponding teams), complicates handling of failures, and
require defensive coding around service calls. Furthermore, these
services cannot handle the LB-specific control configuration.
* Implement a nested Octavia
Use another layer of Octavia to distribute traffic across the Amphora
Cluster (i.e., the Amphorae in the Cluster are back-end members of
another Octavia instance). This approach has the potential to provide
greater flexibility (e.g., provide NAT and/or more complex distribution
algorithms). It also potentially reuses existing code. However, we do
not want the Distributor to proxy connections so HA-Proxy cannot be
used. Furthermore, this approach might significantly increase the
overhead of the solution.
Data model impact
-----------------
* loadbalancer table
- `cluster_id`: associated Amphora Cluster (no changes to table, 1-1
relationship from Cluster data-model)
* lb_topology table
- new value: ``ACTIVE_ACTIVE``
* amphora_role table
- new value: ``IN_CLUSTER``
* Distributor table (*new*): Distributor information, similar to Amphora.
See Distributor spec.
* Cluster table (*new*): an extension to loadbalancer (i.e., one-to-one
mapping to load-balancer)
- `id` (primary key)
- `cluster_name`: identifier of Cluster instance for Amphora Cluster
Manager
- `desired_size`: required number of Amphorae in Cluster. Octavia will
create this many active-active Amphorae in the Amphora Cluster.
- `min_size`: number of ``ACTIVE`` Amphorae in Cluster must be above this
number for Amphora Cluster status to be ``ACTIVE``
- `cooldown`: cooldown period between successive add/remove Amphora
operations (to avoid thrashing)
- `load_balancer_id`: 1:1 relationship to loadbalancer
- `distributor_id`: N:1 relationship to Distributor. Support multiple
Distributors
- `provisioning_status`
- `operating_status`
- `enabled`
- `cluster_type`: type of Amphora Cluster implementation
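A rough SQLAlchemy sketch of the new Cluster table is shown below; as with the
Distributor tables, ``declarative_base`` and the referenced table names are
used only for illustration and the real model would follow Octavia's existing
base classes::

    import sqlalchemy as sa
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class Cluster(Base):
        __tablename__ = 'cluster'
        id = sa.Column(sa.String(36), primary_key=True)
        cluster_name = sa.Column(sa.String(255), nullable=True)
        desired_size = sa.Column(sa.Integer, nullable=False)
        min_size = sa.Column(sa.Integer, nullable=False)
        cooldown = sa.Column(sa.Integer, nullable=True)
        # 1:1 with the load balancer; N:1 with the Distributor.
        load_balancer_id = sa.Column(sa.String(36),
                                     sa.ForeignKey('load_balancer.id'),
                                     unique=True)
        distributor_id = sa.Column(sa.String(36),
                                   sa.ForeignKey('distributor.id'))
        provisioning_status = sa.Column(sa.String(36))
        operating_status = sa.Column(sa.String(36))
        enabled = sa.Column(sa.Boolean, nullable=False, default=True)
        cluster_type = sa.Column(sa.String(36))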
REST API Impact
---------------
* Distributor REST API -- This is a new internal API that will be secured
via two-way SSL certificate authentication. See Distributor spec.
* Amphora REST API -- support configuration of disabling ``arp`` on VIP.
* [P2]_ LBaaS API -- support configuration of desired availability, perhaps
by selecting a flavor (e.g., gold is a minimum of 4 Amphorae, platinum is
a minimum of 10 Amphorae).
* Operator API --
- Topology to use
- Cluster type
- Default availability parameters for the Amphora Cluster
Security impact
---------------
* See the Distributor spec. for Distributor related security impact.
Notifications impact
--------------------
None.
Other end user impact
---------------------
None.
Performance Impact
------------------
ACTIVE-ACTIVE should be able to deliver significantly higher performance than
SINGLE or ACTIVE-STANDBY topology. It will consume more resources to deliver
this higher performance.
Other deployer impact
---------------------
The reference ACM becomes a new process that is part of the Octavia control
components (like the controller worker, health monitor and housekeeper). If
the reference implementation is used, a new Distributor image will need to be
created and stored in glance much the same way the Amphora image is created
and stored today.
Developer impact
----------------
None.
Implementation
==============
Assignee(s)
-----------
@TODO
Work Items
----------
@TODO
Dependencies
============
@TODO
Testing
=======
* Unit tests with tox.
* Function tests with tox.
* Scenario tests.
Documentation Impact
====================
Need to document all new APIs and API changes, new ACTIVE-ACTIVE topology
design and features, and new instructions for operators seeking to deploy
Octavia with ACTIVE-ACTIVE topology.
References
==========
.. [1] https://blueprints.launchpad.net/octavia/+spec/base-image
.. [2] https://blueprints.launchpad.net/octavia/+spec/controller-worker
.. [3] https://blueprints.launchpad.net/octavia/+spec/amphora-driver-interface
.. [4] https://blueprints.launchpad.net/octavia/+spec/controller
.. [5] https://blueprints.launchpad.net/octavia/+spec/operator-api
.. [6] doc/main/api/haproxy-amphora-api.rst
.. [7] https://blueprints.launchpad.net/octavia/+spec/active-active-topology