Add spec for active-active
This specification contains a high-level description of a proposed
architecture for handling an active-active topology within Octavia.

Moved Distributor to new document.
Captured the comments from Mitaka mid-cycle.
Updated active-active-topology per latest comments.
Major update to active-active-distributor per latest comments.
More updates per comments.

Change-Id: Ifc2d618a979fd0eb822f2cba4b759ab6ade7793f
Co-Authored-By: Eran Raichstein <eranra@il.ibm.com>
Co-Authored-By: Dean Lorenz <dean@il.ibm.com>
Co-Authored-By: Stephen Balukoff <stephen@balukoff.com>
..
  This work is licensed under a Creative Commons Attribution 3.0 Unported
  License.

  http://creativecommons.org/licenses/by/3.0/legalcode

.. attention::
  Please review the active-active topology blueprint first ("Active-Active,
  N+1 Amphorae Setup",
  https://review.openstack.org/#/c/234639/7/specs/version1/active-active-topology.rst).


=================================================
Distributor for Active-Active, N+1 Amphorae Setup
=================================================

https://blueprints.launchpad.net/octavia/+spec/active-active-topology

This blueprint describes how Octavia implements a *Distributor* to support the
*active-active* loadbalancer (LB) solution, as described in the blueprint
linked above. It presents the high-level Distributor design and suggests
high-level code changes to the current code base to realize this design.

In a nutshell, in an *active-active* topology, an *Amphora Cluster* of two
or more active Amphorae collectively provide the loadbalancing service.
It is designed as a 2-step loadbalancing process; first, a lightweight
*distribution* of VIP traffic over an Amphora Cluster; then, full-featured
loadbalancing of traffic over the back-end members. Since a single
loadbalancing service, which is addressable by a single VIP address, is
served by several Amphorae at the same time, there is a need to distribute
incoming requests among these Amphorae -- that is the role of the
*Distributor*.

This blueprint uses terminology defined in the Octavia glossary when available,
and defines new terms to describe new components and features as necessary.

.. _P2:

**Note:** Items marked with [P2]_ refer to lower priority features to be
designed / implemented only after initial release.

Problem description
===================

* Octavia shall implement a Distributor to support the active-active
  topology.

* The operator should be able to select and configure the Distributor
  (e.g., through an Octavia configuration file or [P2]_ through a flavor
  framework).

* Octavia shall support a pluggable design for the Distributor, allowing
  different implementations. In particular, the Distributor shall be
  abstracted through a *driver*, similarly to the current support of
  Amphora implementations.

* Octavia shall support different provisioning types for the Distributor;
  including VM-based (the default, similar to current Amphorae),
  [P2]_ container-based, and [P2]_ external (vendor-specific) hardware.

* The operator shall be able to configure the distribution policies,
  including affinity and availability (see below for details).

Architecture
============

High-level Topology Description
-------------------------------

* The following diagram illustrates the Distributor's role in an active-active
  topology:

::

    Front-End Back-End
    Internet Networks Networks
    (world) (tenants) (tenants)
    ║ A B C A B C
    ┌──╨───┐floating IP ║ ║ ║ ┌────────┬──────────┬────┐ ║ ║ ║
    │ ├─ to VIP ──►╢◄──────║───────║──┤f.e. IPs│ Amphorae │b.e.├►╜ ║ ║
    │ │ LB A ║ ║ ║ └──┬─────┤ of │ IPs│ ║ ║
    │ │ ║ ║ ║ │VIP A│ Tenant A ├────┘ ║ ║
    │ GW │ ║ ║ ║ └─────┴──────────┘ ║ ║
    │Router│floating IP ║ ║ ║ ┌────────┬──────────┬────┐ ║ ║
    │ ├─ to VIP ───║──────►╟◄──────║──┤f.e. IPs│ Amphorae │b.e.├──►╜ ║
    │ │ LB B ║ ║ ║ └──┬─────┤ of │ IPs│ ║
    │ │ ║ ║ ║ │VIP B│ Tenant B ├────┘ ║
    │ │ ║ ║ ║ └─────┴──────────┘ ║
    │ │floating IP ║ ║ ║ ┌────────┬──────────┬────┐ ║
    │ ├─ to VIP ───║───────║──────►╢◄─┤f.e. IPs│ Amphorae │b.e.├────►╜
    └──────┘ LB C ║ ║ ║ └──┬─────┤ of │ IPs│
    ║ ║ ║ │VIP C│ Tenant C ├────┘
    arp─►╢ arp─►╢ arp─►╢ └─────┴──────────┘
    ┌─┴─┐ ║┌─┴─┐ ║┌─┴─┐ ║
    │VIP│┌►╜│VIP│┌►╜│VIP│┌►╜
    ├───┴┴┐ ├───┴┴┐ ├───┴┴┐
    │IP A │ │IP B │ │IP C │
    ┌┴─────┴─┴─────┴─┴─────┴┐
    │ │
    │ Distributor │
    │ (multi-tenant) │
    └───────────────────────┘

* In the above diagram, several tenants (A, B, C, ...) share the
  Distributor, yet the Amphorae, and the front- and back-end (tenant)
  networks are not shared between tenants. (See also "Distributor Sharing"
  below.) Note that in the initial code implementing the distributor, the
  distributor will not be shared between tenants, until tests verifying the
  security of a shared distributor can be implemented.

* The Distributor acts as a (one-legged) router, listening on each
  load balancer's VIP and forwarding to one of its Amphorae.

* Each load balancer's VIP is advertised and answered by the Distributor.
  An ``arp`` request for any of the VIP addresses is answered by the
  Distributor, hence any traffic sent for each VIP is received by the
  Distributor (and forwarded to an appropriate Amphora).

* ARP is disabled on all the Amphorae for the VIP interface.

* The Distributor distributes the traffic of each VIP to an Amphora in the
  corresponding load balancer Cluster.

* An example of high-level data flow:

  1. Internet clients access a tenant service through an externally visible
     floating-IP (IPv4 or IPv6).

  2. The GW router maps the floating IP into a loadbalancer's internal VIP on
     the tenant's front-end network.

  3. (1st packet to VIP only) the GW sends an ``arp`` request on the VIP
     (tenant front-end) network. The Distributor answers the ``arp`` request
     with its own MAC address on this network (all the Amphorae on the network
     can serve the VIP, but do not answer the ``arp``).

  4. The GW router forwards the client request to the Distributor.

  5. The Distributor forwards the packet to one of the Amphorae on the
     tenant's front-end network (distributed according to some policy,
     as described below), without changing the destination IP (i.e., still
     using the VIP).

  6. The Amphora accepts the packet and continues the flow on the tenant's
     back-end network as for other Octavia loadbalancer topologies (non
     active-active).

  7. The outgoing response packets from the Amphora are forwarded directly
     to the GW router (that is, they do not pass through the Distributor).

Affinity of Flows to Amphorae
-----------------------------

- Affinity is required to make sure related packets are forwarded to the
  same Amphora. At minimum, since TCP connections are terminated at the
  Amphora, all packets that belong to the same flow must be sent to the
  same Amphora. Enhanced affinity levels can be used to make sure that flows
  with similar attributes are always sent to the same Amphora; this may be
  desired to achieve better performance (see discussion below).

- [P2]_ The Distributor shall support different modes of client-to-Amphora
  affinity. The operator should be able to select and configure the desired
  affinity level.

- Since the Distributor is L3 and the "heavy lifting" is expected to be
  done by the Amphorae, this specification proposes implementing two
  practical affinity alternatives. Other affinity alternatives may be
  implemented at a later time.

  *Source IP and source port*
    In this mode, the Distributor must always send packets from the same
    combination of Source IP and Source port to the same Amphora. Since
    the Target IP and Target Port are fixed per Listener, this mode implies
    that all packets from the same TCP flow are sent to the same Amphora.
    This is the minimal affinity mode, as without it TCP connections will
    break.

    *Note*: related flows (e.g., parallel client calls from the same HTML
    page) will typically be distributed to different Amphorae; however,
    these should still be routed to the same back-end. This could be
    guaranteed by using cookies and/or by synchronizing the stick-tables.
    Also, the Amphorae in the Cluster could be configured to use the same
    hashing parameters (avoid any random seed) to ensure all make similar
    decisions.

  *Source IP* (default)
    In this mode, the Distributor must always send packets from the same
    source IP to the same Amphora, regardless of port. This mode allows TLS
    session reuse (e.g., through session ids), where an abbreviated
    handshake can be used to improve latency and computation time.

    The main disadvantage of sending all traffic from the same source IP to
    the same Amphora is that it might lead to poor load distribution for
    large workloads that have the same source IP (e.g., a workload behind a
    single NAT or proxy).

  **Note on TLS implications**:
  In some (typical) TLS sessions, the additional load incurred for each new
  session is significantly larger than the load incurred for each new
  request or connection on the same session; namely, the total load on each
  Amphora will be more affected by the number of different source IPs it
  serves than by the number of connections. Moreover, since the total load
  on the Cluster incurred by all the connections depends on the level of
  session reuse, spreading a single source IP over multiple Amphorae
  *increases* the overall load on the Cluster. Thus, a Distributor that
  uniformly spreads traffic without affinity per source IP (e.g., uses
  per-flow affinity only) might cause an increase in overall load on the
  Cluster that is proportional to the number of Amphorae. For example, in a
  scale-out scenario (where a new Amphora is spawned to share the total
  load), moving some flows to the new Amphora might increase the overall
  Cluster load, negating the benefit of scaling-out.

  Session reuse helps with the certificate exchange phase. Improvements
  in performance with the certificate exchange depend on the type of keys
  used, and are greatest with RSA. Session reuse may be less important with
  other schemes; shared TLS session tickets are another mechanism that may
  circumvent the problem; also, upcoming versions of HA-Proxy may be able
  to obviate this problem by synchronizing TLS state between Amphorae
  (similar to the stick-table protocol).

- Per the agreement at the Mitaka mid-cycle, the default affinity shall be
  based on source-IP only and a consistent hashing function (see below)
  shall be used to distribute flows in a predictable manner; however,
  abstraction will be used to allow other implementations at a later time.
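
The two affinity modes above differ only in which header fields feed the
Distributor's hash. A minimal, illustrative sketch of the mapping (Python;
the helper name and the plain modulo step are assumptions for illustration
only -- the reference implementation uses hash-based bucket selection in OVS,
and a plain modulo breaks down when the cluster is resized, as discussed
under "Handling changes in Cluster size" below)::

    import hashlib

    def select_bucket(src_ip, src_port, n_buckets, affinity="source_ip"):
        """Map a flow to one of n_buckets according to the affinity mode."""
        if affinity == "source_ip":
            key = src_ip                        # default; TLS-session friendly
        else:                                   # "source_ip_and_port"
            key = "%s:%s" % (src_ip, src_port)
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % n_buckets

With a fixed number of buckets this keeps all packets of a flow (and, in the
default mode, all flows of a client) on the same Amphora.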

Forwarding with OVS and OpenFlow Rules
--------------------------------------

* The reference implementation of the Distributor shall use OVS for
  forwarding and configure the Distributor through OpenFlow rules.

  - OpenFlow rules can be implemented by a software switch (e.g., OVS) that
    can run on a VM. Thus, the switch can be created and managed by Octavia
    similarly to the creation and management of Amphora VMs.

  - OpenFlow rules are supported by several HW switches, so the same
    control plane can be used for both SW and HW implementations.

* Outline of Rules (see the sketch after this list)

  - A ``group`` with the ``select`` method is used to distribute IP traffic
    over multiple Amphorae. There is one ``bucket`` per Amphora -- adding
    an Amphora adds a new ``bucket`` and deleting an Amphora removes the
    corresponding ``bucket``.

  - The ``select`` method supports (OpenFlow v1.5) hash-based selection
    of the ``bucket``. The hash can be set up to use different fields,
    including source IP only (the default) and source IP and source port.

  - All buckets route traffic back on the in-port (i.e., no forwarding
    between ports). This ensures that the same front-end network is used
    (i.e., the Distributor does not route between front-end networks;
    therefore, it does not mix traffic of different tenants).

  - The ``bucket`` actions do a re-write of the outgoing packets. It
    supports re-write of the destination MAC to that of the specific
    Amphora and re-write of the source MAC to that of the Distributor
    interface (together these MAC re-writes provide L3 routing
    functionality).

    *Note:* alternative re-write rules can be used to support other
    forwarding mechanisms.

  - OpenFlow rules are also used to answer ``arp`` requests on the VIP.
    ``arp`` requests for each VIP are captured, re-written as ``arp``
    replies with the MAC address of the particular front-end interface and
    sent back on the in-port. Again, there is no routing between interfaces.

* Handling Amphora failure

  - The initial implementation will assume a fixed size for each cluster (no
    elasticity). The hashing will be "consistent" by virtue of never
    changing the number of ``buckets``. If the cluster size is changed on
    the fly (there should not be an API to do so) then there are no
    guarantees on shuffling.

  - If an Amphora fails then remapping cannot be avoided -- all flows of
    the failed Amphora must be remapped to a different one. Rather than
    mapping these flows to other active Amphorae in the cluster, the reference
    implementation will map all flows to the cluster's *standby* Amphora (i.e.,
    the "+1" Amphora in this "N+1" cluster). This ensures that the cluster
    size does not change. The only change in the OpenFlow rules would be to
    replace the MAC of the failed Amphora with that of the standby Amphora
    (see the sketch after this list).

  - This implementation is very similar to Active-Standby fail-over. There
    will be a standby Amphora that can serve traffic in case of failure.
    The differences from Active-Standby are that a single Amphora acts as a
    standby for multiple ones; fail-over re-routing is handled through the
    Distributor (rather than by VRRP); and a whole cluster of Amphorae is
    active concurrently, to enable support of large workloads.

  - The Health Manager will trigger re-creation of a failed Amphora. Once the
    Amphora is ready it becomes the new *standby* (no changes to the OpenFlow
    rules).

  - [P2]_ Handle concurrent failure of more than a single Amphora.

* Handling Distributor failover

  - To handle the event of a Distributor failover caused by a catastrophic
    failure of a Distributor, and in order to preserve the client to Amphora
    affinity when the Distributor is replaced, the Amphora registration
    process with the Distributor should preserve positional information. This
    should ensure that when a new Distributor is created, Amphorae will be
    assigned to the same buckets to which they were previously assigned.

  - In the reference implementation, we propose making the Distributor API
    return the complete list of Amphorae MAC addresses with positional
    information each time an Amphora is registered or unregistered.
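
For illustration, the rules outlined above could be pushed with ``ovs-ofctl``
roughly as follows. This is a sketch only: the bridge name, group id, MAC and
IP addresses, and the exact group-property syntax are assumptions rather than
the reference implementation (which would drive OVS from the Distributor
agent)::

    import subprocess

    DIST_MAC = "fa:16:3e:00:00:01"                       # example addresses
    AMPHORA_MACS = ["fa:16:3e:00:00:11", "fa:16:3e:00:00:12"]
    STANDBY_MAC = "fa:16:3e:00:00:99"                    # the "+1" Amphora
    VIP = "203.0.113.10"

    def ofctl(*args):
        subprocess.check_call(("ovs-ofctl", "-O", "OpenFlow15") + args)

    def buckets(macs):
        # One bucket per Amphora: rewrite the destination MAC to the Amphora,
        # the source MAC to the Distributor port, and send the packet back
        # out the port it arrived on.
        return ",".join(
            "bucket=bucket_id:%d,actions=set_field:%s->eth_dst,"
            "set_field:%s->eth_src,in_port" % (i, mac, DIST_MAC)
            for i, mac in enumerate(macs))

    # Select group hashing on source IP only (the default affinity).
    ofctl("add-group", "br-vip-a",
          "group_id=1,type=select,selection_method=hash,fields(ip_src),"
          + buckets(AMPHORA_MACS))

    # All IP traffic addressed to the VIP goes through the group.
    ofctl("add-flow", "br-vip-a",
          "priority=10,ip,nw_dst=%s,actions=group:1" % VIP)

    # Amphora failure: only the failed bucket's destination MAC changes to
    # the standby's MAC, so the other buckets (and client affinity) are kept.
    ofctl("mod-group", "br-vip-a",
          "group_id=1,type=select,selection_method=hash,fields(ip_src),"
          + buckets([STANDBY_MAC, AMPHORA_MACS[1]]))

The ``arp`` responder rules for the VIP would be installed on the same bridge
in a similar way.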

Proposed change
===============

**Note:** These are changes on top of the changes described in the
"Active-Active, N+1 Amphorae Setup" blueprint (see
https://blueprints.launchpad.net/octavia/+spec/active-active-topology).

* Create a flow for the creation of an Amphora cluster with N active Amphorae
  and one extra standby Amphora. Set up the Amphora roles accordingly.

* Support the creation, connection, and configuration of the various
  networks and interfaces as described in the `high-level topology` diagram.
  The Distributor shall have a separate interface for each loadbalancer and
  shall not allow any routing between different ports. In particular, when
  a loadbalancer is created the Distributor should:

  - Attach the Distributor to the loadbalancer's front-end network by
    adding a VIP port to the Distributor (the LB VIP Neutron port).

  - Configure OpenFlow rules: create a group with the desired cluster size
    and with the given Amphora MACs; create rules to answer ``arp``
    requests for the VIP address.

  **Notes:**
  [P2]_ It is desirable that the Distributor be considered as a router by
  Neutron (to handle port security, network forwarding without ``arp``
  spoofing, etc.). This may require changes to Neutron and may also mean
  that Octavia will be a privileged user of Neutron.

  The Distributor needs to support IPv6 NDP.

  [P2]_ If the Distributor is implemented as a container then hot-plugging
  a port for each VIP might not be possible.

  If DVR is used then routing rules must be used to forward external
  traffic to the Distributor rather than relying on ``arp``. In particular,
  DVR messes up ``noarp`` settings.

* Support Amphora failure recovery

  - Modify the HM and failure recovery flows to add tasks to notify the ACM
    when the ACTIVE-ACTIVE topology is in use. If an active Amphora fails then
    it needs to be decommissioned on the Distributor and replaced with
    the standby.

  - Failed Amphorae should be recreated as a standby (in the new
    IN_CLUSTER_STANDBY role). The standby Amphora should also be monitored and
    recovered on failure.

* Distributor driver and Distributor image

  - The Distributor should be supported similarly to an Amphora; namely, have
    its own abstract driver.

  - The Distributor image (for the reference implementation) should include a
    recent version of OVS (>1.5) that supports hash-based bucket selection.
    As is done for Amphorae, the Distributor image should be installed with
    public keys to allow secure configuration by the Octavia controller.

  - The reference implementation shall spawn a new Distributor VM as needed.
    It shall monitor its health and handle recovery using heartbeats sent to
    the health monitor in a similar fashion to how this is done presently with
    Amphorae. [P2]_ Spawn a new Distributor if the number of VIPs exceeds a
    given limit (to limit the number of Neutron ports attached to one
    Distributor). [P2]_ Add configuration options and/or an Operator API to
    allow the operator to request a dedicated Distributor for a VIP (or per
    tenant).

* Define a REST API for Distributor configuration (no SSH API).
  See below for details.

* Create a data model for the Distributor.

Alternatives
------------

TBD

Data model impact
-----------------

Add table ``distributor`` with the following columns:

* id ``(sa.String(36), nullable=False)``
  ID of the Distributor instance.

* compute_id ``(sa.String(36), nullable=True)``
  ID of the compute node running the Distributor.

* lb_network_ip ``(sa.String(64), nullable=True)``
  IP of the Distributor on the management network.

* status ``(sa.String(36), nullable=True)``
  Provisioning status.

* vip_port_ids (list of ``sa.String(36)``)
  List of Neutron port IDs.
  New VIFs may be plugged into the Distributor when a new LB is created. We
  may need to store the Neutron port IDs in order to support
  fail-over from one Distributor instance to another.

Add table ``distributor_health`` with the following columns:

* distributor_id ``(sa.String(36), nullable=False)``
  ID of the Distributor instance.

* last_update ``(sa.DateTime, nullable=False)``
  Last time a distributor heartbeat was received by a health monitor.

* busy ``(sa.Boolean, nullable=False)``
  Field indicating a create / delete or other action is being conducted on
  the distributor instance (i.e., to prevent a race condition when multiple
  health managers are in use).

Add table ``amphora_registration`` with the following columns. This describes
which Amphorae are registered with which Distributors and in which order:

* lb_id ``(sa.String(36), nullable=False)``
  ID of the load balancer.

* distributor_id ``(sa.String(36), nullable=False)``
  ID of the Distributor instance.

* amphora_id ``(sa.String(36), nullable=False)``
  ID of the Amphora instance.

* position ``(sa.Integer, nullable=True)``
  Order in which Amphorae are registered with the Distributor.
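
For illustration, these tables could be expressed with SQLAlchemy roughly as
follows (a sketch only; real models would follow the conventions of the
existing Octavia data model, e.g. common base classes and an association
table for ``vip_port_ids``)::

    import sqlalchemy as sa
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class Distributor(Base):
        __tablename__ = 'distributor'
        id = sa.Column(sa.String(36), primary_key=True)
        compute_id = sa.Column(sa.String(36), nullable=True)
        lb_network_ip = sa.Column(sa.String(64), nullable=True)
        status = sa.Column(sa.String(36), nullable=True)

    class AmphoraRegistration(Base):
        __tablename__ = 'amphora_registration'
        lb_id = sa.Column(sa.String(36), primary_key=True)
        distributor_id = sa.Column(sa.String(36), primary_key=True)
        amphora_id = sa.Column(sa.String(36), primary_key=True)
        # Bucket position; preserved across Distributor failover so that
        # Amphorae map back to the same buckets.
        position = sa.Column(sa.Integer, nullable=True)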

REST API Impact
---------------

The Distributor will be running its own REST API server. This API will be
secured using two-way SSL authentication, and will use certificate rotation
in the same way this is done with Amphorae today.

The following API calls will be supported:

1. Post VIP Plug

   Adding a VIP network interface to the Distributor involves tasks which run
   outside the Distributor itself. Once these are complete, the Distributor
   must be configured to use the new interface. This is a REST call, similar
   to what is currently done for Amphorae when connecting to a new member
   network.

   `lb_id`
     An identifier for the particular loadbalancer/VIP. Used for subsequent
     register/unregister of Amphorae.

   `vip_address`
     The IP of the VIP (i.e., the IP for which to answer ``arp`` requests).

   `subnet_cidr`
     Netmask for the VIP's subnet.

   `gateway`
     Gateway that outbound packets from the VIP IP address should use.

   `mac_address`
     MAC address of the new interface corresponding to the VIP.

   `vrrp_ip`
     In the case of an HA Distributor, this contains the IP address that will
     be used in setting up the allowed address pairs relationship. (See
     Amphora VIP plugging under the ACTIVE-STANDBY topology for an example
     of how this is used.)

   `host_routes`
     List of routes that should be added when the VIP is plugged.

   `alg_extras`
     Extra arguments related to the algorithm that will be used to distribute
     requests to the Amphorae that are part of this load balancer
     configuration. This consists of an algorithm name and affinity type. In
     the initial release of ACTIVE-ACTIVE, the only valid algorithm will be
     *hash*, and the affinity type may be ``Source_IP`` or [P2]_
     ``Source_IP_AND_port``.

2. Pre VIP unplug

   Removing a VIP network interface will involve several tasks on the
   Distributor to gracefully roll back the OVS configuration and other
   details that were set up when the VIP was plugged in.

   `lb_id`
     ID of the VIP's loadbalancer that will be unplugged.

3. Register Amphorae

   This adds Amphorae to the configuration for a given load balancer. The
   Distributor should respond with a new list of all Amphorae registered with
   the Distributor, with positional information (see the example request
   after this list).

   `lb_id`
     ID of the loadbalancer with which the Amphorae will be registered.

   `amphorae`
     List of Amphorae MAC addresses and an (optional) position argument in
     which they should be registered.

4. Unregister Amphorae

   This removes Amphorae from the configuration for a given load balancer.
   The Distributor should respond with a new list of all Amphorae registered
   with the Distributor, with positional information.

   `lb_id`
     ID of the loadbalancer from which the Amphorae will be unregistered.

   `amphorae`
     List of Amphorae MAC addresses that should be unregistered from the
     Distributor.
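
For illustration, a Register Amphorae call might look roughly like the
following. The URL layout, port, and field names are assumptions for the
sketch only; the actual contract would be fixed when the Distributor API is
defined in detail::

    import requests

    lb_id = "9a7a366e-0011-4c3c-a474-1d11fb3e0de1"   # example UUID

    resp = requests.put(
        "https://192.0.2.10:9443/loadbalancer/%s/register_amphorae" % lb_id,
        cert=("controller-client.pem", "controller-client.key"),
        verify="distributor-server-ca.pem",
        json={"amphorae": [
            {"mac_address": "fa:16:3e:00:00:11", "position": 0},
            {"mac_address": "fa:16:3e:00:00:12", "position": 1},
        ]})
    # Expected response body: the complete ordered registration list, e.g.
    # [{"mac_address": "fa:16:3e:00:00:11", "position": 0}, ...]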

Security impact
---------------

The Distributor is designed to be multi-tenant by default. (Note that the
first reference implementation will not be multi-tenant until tests can be
developed to verify the security of a multi-tenant reference distributor.)
Although each tenant has its own front-end network, the Distributor is
connected to all of them, which might allow leaks between these networks. The
rationale is two-fold: first, the Distributor should be considered a trusted
infrastructure component; second, all traffic is external traffic before it
reaches the Amphora. Note that the GW router has exactly the same attributes;
in other words, logically, we can consider the Distributor to be an extension
of the GW (or even use the GW hardware to implement the Distributor).

This approach might not be considered secure enough for some cases, such as
when LBaaS is used for internal tier-to-tier communication inside a tenant
network. Some tenants may want their loadbalancer's VIP to remain private and
their front-end network to be isolated. In these cases, in order to
accomplish active-active for this tenant we would need separate dedicated
Distributor instance(s).

Notifications impact
--------------------

Other end user impact
---------------------

Performance Impact
------------------

Other deployer impact
---------------------

Developer impact
----------------

Implementation
==============

Assignee(s)
-----------

Work Items
----------

Dependencies
============


Testing
=======

* Unit tests with tox.
* Functional tests with tox.


Documentation Impact
====================

Further Discussion
==================

.. Note::
  This section captures some background, ideas, concerns, and remarks that
  were raised by various people. Some of the items here can be considered for
  future/alternative design and some will hopefully make their way into, yet
  to be written, related blueprints (e.g., auto-scaled topology).

[P2]_ Handling changes in Cluster size (manual or auto-scaled)
----------------------------------------------------------------

- The Distributor shall support different mechanisms for preserving affinity
  of flows to Amphorae following a *change in the size* of the Amphorae
  Cluster.

- The goal is to minimize shuffling of the client-to-Amphora mapping during
  cluster size changes:

  * When an Amphora is removed from the Cluster (e.g., due to failure or
    a scale-down action), all its flows are broken; however, flows to other
    Amphorae should not be affected. Also, if a drain method is used to empty
    the Amphora of client flows (in the case of a graceful removal), this
    should prevent disruption.

  * When an Amphora is *added* to the Cluster (e.g., recovery of a failed
    Amphora), some new flows should be distributed to the new Amphora;
    however, most flows should still go to the same Amphora they were
    distributed to before the new Amphora was added. For example, if the
    affinity of flows to Amphorae is per source IP and a new Amphora was just
    added, then the Distributor should forward packets from this IP to only
    one of two Amphorae: either the same Amphora as before or the
    Amphora that was added.

  Using a simple hash to maintain affinity does not meet this goal.

  For example, suppose we maintain affinity (for a fixed cluster size) using
  a hash (for randomizing key distribution) as
  `chosen_amphora_id = hash(source_ip # source_port) mod number_of_amphorae`
  (where ``#`` denotes concatenation). When a new Amphora is added or
  removed, the number of Amphorae changes; thus, a different Amphora will be
  chosen for most flows.

- Below are a couple of ways to tackle this shuffling problem.

  *Consistent Hashing*
    Consistent hashing is a hashing mechanism (regardless of whether the key
    is based on IP or IP/port) that preserves most hash mappings during
    changes in the size of the Amphorae Cluster. In particular, for a cluster
    with N Amphorae that grows to N+1 Amphorae, a consistent hashing function
    ensures that, with high probability, only 1/N of input flows will be
    re-hashed (more precisely, K/N keys will be rehashed). Note that, even
    with consistent hashing, some flows will be remapped and there is only
    a statistical bound on the number of remapped flows.

    The "classic" consistent hashing algorithm maps both server IDs and
    keys to hash values and selects for each key the server with the
    closest hash value to the key hash value. Lookup generally requires
    O(log N) to search for the "closest" server. Achieving good
    distribution requires multiple hashes per server (~10s) -- although
    these can be pre-computed there is an ~10s*N memory footprint. Other
    algorithms (e.g., Google's Maglev) have better performance, but provide
    weaker guarantees.

    There are several consistent hashing libraries available. None are
    supported in OVS.

    * Ketama https://github.com/RJ/ketama

    * OpenStack Swift http://docs.openstack.org/developer/swift/ring.html

    * Amazon Dynamo
      http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf

    We should also strongly consider making any consistent hashing algorithm
    we develop available to all OpenStack components by making it part of an
    Oslo library.

  *Rendezvous hashing*
    This method provides similar properties to Consistent Hashing (i.e., a
    hashing function that remaps only 1/N of keys when a cluster with N
    Amphorae grows to N+1 Amphorae); a minimal sketch appears after this
    list.

    For each server ID, the algorithm concatenates the key and server ID and
    computes a hash. The server with the largest hash is chosen. This
    approach requires O(N) for each lookup, but is much simpler to
    implement and has virtually no memory footprint. Through search-tree
    encoding of the server IDs it is possible to achieve O(log N) lookup,
    but implementation is harder and distribution is not as good. Another
    feature is that more than one server can be chosen (e.g., the two largest
    values) to handle larger loads -- not directly useful for the
    Distributor use case.

  *Hybrid, Permutation-based approach*
    This is an alternative implementation of consistent hashing that may be
    simpler to implement. Keys are hashed to a set of buckets; each bucket
    is pre-mapped to a random permutation of the server IDs. Lookup is by
    computing a hash of the key to obtain a bucket and then going over the
    permutation selecting the first server. If a server is marked as "down"
    the next server in the list is chosen. This approach is similar to
    Rendezvous hashing if each key is directly pre-mapped to a random
    permutation (and like it allows more than one server selection). If the
    number of failed servers is small then lookup is about O(1); memory is
    O(N * #buckets), where the granularity of distribution is improved by
    increasing the number of buckets. The permutation-based approach is
    useful to support clusters of fixed size that need to handle a few
    nodes going down and then coming back up. If there is an assumption on
    the number of failures then memory can be reduced to O(max_failures *
    #buckets). This approach seems to suit the Distributor Active-Active
    use-case for non-elastic workloads.

- Flow tracking is required, even with the above hash functions, to handle
  the (relatively few) remapped flows. If an existing flow is remapped, its
  TCP connection would break. This is acceptable when an Amphora goes down
  and its flows are mapped to a new one. On the other hand, it may be
  unacceptable when an Amphora is added to the cluster and 1/N of existing
  flows are remapped. The Distributor may support different modes, as
  follows.

  *None / Stateless*
    In this mode, the Distributor applies its most recent forwarding rules,
    regardless of previous state. Some existing flows might be remapped to a
    different Amphora and would be broken. The client would have to recover
    and establish a connection with the new Amphora (it would still be
    mapped to the same back-end, if possible). Combined with consistent (or
    similar) hashing, this may be good enough for many web applications
    that are built for failure anyway, and can restore their state upon
    reconnect.

  *Full flow Tracking*
    In this mode, the Distributor tracks existing flows to provide full
    affinity, i.e., only new flows can be remapped to different Amphorae.
    Linux connection tracking may be used (e.g., through IPTables or
    through OpenFlow); however, this might not scale well. Alternatively,
    the Distributor can use an independent mechanism similar to HA-Proxy
    stick-tables to track the flows. Note that the Distributor only needs to
    track the mapping per source IP and source port (unlike Linux connection
    tracking, which follows the TCP state and related connections).

  *Use Ryu*
    Ryu is a well supported and tested python binding for issuing OpenFlow
    commands. Especially since Neutron recently moved to using this for
    many of the things it does, using this in the Distributor might make
    sense for Octavia as well.
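
As a concrete illustration of the Rendezvous hashing alternative described
above, a minimal sketch (Python; the key format and hash choice are
assumptions for illustration only)::

    import hashlib

    def rendezvous_choose(flow_key, amphora_ids):
        """Return the Amphora whose hash(flow_key, amphora_id) is largest.

        Removing an Amphora only remaps the keys that had chosen it; all
        other keys keep their previous winner, which is the affinity
        property needed when the Cluster changes by one node.
        """
        def score(amphora_id):
            data = ("%s|%s" % (flow_key, amphora_id)).encode("utf-8")
            return int(hashlib.sha256(data).hexdigest(), 16)
        return max(amphora_ids, key=score)

    # e.g. rendezvous_choose("198.51.100.7", ["amp-1", "amp-2", "amp-3"])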

Forwarding Data-path Implementation Alternatives
------------------------------------------------

The current design uses L2 forwarding based only on L3 parameters and uses
Direct Return routing (one-legged). The rationale behind this approach is
to keep the Distributor as light as possible and have the Amphorae do the
bulk of the work. This allows one (or a few) Distributor instance(s) to
serve all traffic even for very large workloads. Other approaches are
possible.

2-legged Router
^^^^^^^^^^^^^^^

- The Distributor acts as a router, being in-path in both directions.

- New network between the Distributor and Amphorae -- only the Distributor
  is on the VIP subnet.

- No need to use MAC forwarding -- use routing rules.

LVS
^^^

Use LVS for the Distributor.

DNS
^^^

Use DNS for the Distributor.

- Use DNS to map to particular Amphorae. Distribution will be of the
  domain name rather than the VIP.

- No problem with per-flow affinity, as a client will use the same IP for an
  entire TCP connection.

- Need a different public IP for each Amphora (no VIP).

Pure SDN
^^^^^^^^

- Implement the OpenFlow rules directly in the network, without a
  Distributor instance.

- If the network infrastructure supports this then the Distributor can
  become more robust and very lightweight, making it practical to have a
  dedicated Distributor per VIP (only the rules will be dedicated, as the
  network and SDN controller are shared resources).

Distributor Sharing
-------------------

- The initial implementation of the Distributor will not be shared between
  tenants until tests can be written to verify the security of this solution.

- The implementation should support different Distributor sharing and
  cardinality configurations. This includes a single shared Distributor,
  multiple dedicated Distributors, and multiple shared Distributors. In
  particular, an abstraction layer should be used and the data model should
  include an association between the load balancer and the Distributor.

- A shared Distributor uses the least amount of resources, but may not meet
  isolation requirements (performance and/or security) or might become a
  bottleneck.

Distributor High-Availability
-----------------------------

- The Distributor should be highly available (as this is one of the
  motivations for the active-active topology). Once the initial active-active
  functionality is delivered, developing a highly available Distributor
  should take a high priority.

- A mechanism similar to the VRRP used for ACTIVE-STANDBY topology Amphorae
  can be used.

- Since the Distributor is stateless (for fixed cluster sizes and if no
  connection tracking is used) it is possible to set up an active-active
  configuration and advertise more than one Distributor (e.g., for ECMP).

- As a first step, the initial implementation will use a single Distributor
  instance (i.e., it will not be highly available). The Health Manager will
  monitor the Distributor's health and initiate recovery if needed.

- The implementation should support plugging in a hardware-based
  implementation of the Distributor that may have its own high-availability
  support.

- In order to preserve client-to-Amphora affinity in the case of a failover,
  a VRRP-like HA Distributor has several options. We could potentially push
  Amphora registrations to the standby Distributor with the position
  arguments specified, in order to guarantee the active and standby
  Distributors always have the same configuration. Or, we could invent and
  utilize a synchronization protocol between the active and standby
  Distributors. This will be explored and decided when an HA Distributor
  specification is written and approved.

References
==========

.. [1] https://blueprints.launchpad.net/octavia/+spec/base-image
.. [2] https://blueprints.launchpad.net/octavia/+spec/controller-worker
.. [3] https://blueprints.launchpad.net/octavia/+spec/amphora-driver-interface
.. [4] https://blueprints.launchpad.net/octavia/+spec/controller
.. [5] https://blueprints.launchpad.net/octavia/+spec/operator-api
.. [6] doc/main/api/haproxy-amphora-api.rst
.. [7] https://blueprints.launchpad.net/octavia/+spec/active-active-topology

..
  This work is licensed under a Creative Commons Attribution 3.0 Unported
  License.

  http://creativecommons.org/licenses/by/3.0/legalcode


=================================
Active-Active, N+1 Amphorae Setup
=================================

https://blueprints.launchpad.net/octavia/+spec/active-active-topology

This blueprint describes how Octavia implements an *active-active*
loadbalancer (LB) solution that is highly available through redundant
Amphorae. It presents the high-level service topology and suggests
high-level code changes to the current code base to realize this scenario.
In a nutshell, an *Amphora Cluster* of two or more active Amphorae
collectively provide the loadbalancing service.

The Amphora Cluster shall be managed by an *Amphora Cluster Manager* (ACM).
The ACM shall provide an abstraction that allows different types of
active-active features (e.g., failure recovery, elasticity, etc.). The
initial implementation shall not rely on external services, but the
abstraction shall allow for interaction with external ACMs (to be developed
later).

This blueprint uses terminology defined in the Octavia glossary when
available, and defines new terms to describe new components and features as
necessary.

.. _P2:

**Note:** Items marked with [P2]_ refer to lower priority features to be
designed / implemented only after initial release.


Problem description
===================

A tenant should be able to start a highly available loadbalancer for the
tenant's back-end services as follows:

* The operator should be able to configure an active-active topology
  through an Octavia configuration file or [P2]_ through a Neutron flavor,
  which the loadbalancer shall support. Octavia shall support active-active
  topologies in addition to the topologies that it currently supports.

* In an active-active topology, a cluster of two or more Amphorae shall
  host a replicated configuration of the load-balancing services. Octavia
  will manage this *Amphora Cluster* as a highly-available service using a
  pool of active resources.

* The Amphora Cluster shall provide the load-balancing services and support
  the configurations that are supported by a single Amphora topology,
  including L7 load-balancing, SSL termination, etc.

* The active-active topology shall support various Amphora types and
  implementations; including virtual machines, [P2]_ containers, and
  bare-metal servers.

* The operator should be able to configure the high-availability
  requirements for the active-active load-balancing services. The operator
  shall be able to specify the number of healthy Amphorae that must exist
  in the load-balancing Amphora Cluster. If the number of healthy Amphorae
  drops under the desired number, Octavia shall automatically and
  seamlessly create and configure a new Amphora and add it to the Amphora
  Cluster. [P2]_ The operator should be further able to define that the
  Amphora Cluster shall be allocated on separate physical resources.

* An Amphora Cluster will collectively act to serve as a single logical
  loadbalancer as defined in the Octavia glossary. Octavia will seamlessly
  distribute incoming external traffic among the Amphorae in the Amphora
  Cluster. To that end, Octavia will employ a *Distributor* component that
  will forward external traffic towards the managed Amphora instances.
  Conceptually, the Distributor provides an extra level of load-balancing
  for an active-active Octavia application, albeit a simplified one.
  Octavia should be able to support several Distributor implementations
  (e.g., software-based and hardware-based) and different affinity models
  (at minimum, flow-affinity should be supported to allow TCP connectivity
  between clients and Amphorae).

* The detailed design of the Distributor component will be described in a
  separate document (see "Distributor for Active-Active, N+1 Amphorae
  Setup", active-active-distributor.rst).


High-level Topology Description
-------------------------------

Single Tenant
~~~~~~~~~~~~~

* The following diagram illustrates the active-active topology:

::

    Front-End Back-End
    Internet Network Network
    (world) (tenant) (tenant)
    ║ ║ ║
    ┌─╨────┐ floating IP ║ ║ ┌────────┐
    │Router│ to LB VIP ║ ┌────┬─────────┬────┐ ║ │ Tenant │
    │ GW ├──────────────►╫◄─┤ IP │ Amphora │ IP ├─►╫◄─┤Service │
    └──────┘ ║ └┬───┤ (1) │back│ ║ │ (1) │
    ║ │VIP├─┬──────┬┴────┘ ║ └────────┘
    ║ └───┘ │ MGMT │ ║ ┌────────┐
    ╓◄───────────────────║─────────┤ IP │ ║ │ Tenant │
    ║ ┌─────────┬────┐ ║ └──────┘ ╟◄─┤Service │
    ║ │ Distri- │ IP├►╢ ║ │ (2) │
    ║ │ butor ├───┬┘ ║ ┌────┬─────────┬────┐ ║ └────────┘
    ║ └─┬──────┬┤VIP│ ╟◄─┤ IP │ Amphora │ IP ├─►╢ ┌────────┐
    ║ │ MGMT │└─┬─┘ ║ └┬───┤ (2) │back│ ║ │ Tenant │
    ╟◄────┤ IP │ └arp►╢ │VIP├─┬──────┬┴────┘ ╟◄─┤Service │
    ║ └──────┘ ║ └───┘ │ MGMT │ ║ │ (3) │
    ╟◄───────────────────║─────────┤ IP │ ║ └────────┘
    ║ ┌───────────────┐ ║ └──────┘ ║
    ║ │ Octavia LBaaS │ ║ ••• ║ •
    ╟◄─┤ Controller │ ║ ┌────┬─────────┬────┐ ║ •
    ║ └┬─────────────┬┘ ╙◄─┤ IP │ Amphora │ IP ├─►╢
    ║ │ Amphora │ └┬───┤ (k) │back│ ║ ┌────────┐
    ║ │ Cluster Mgr.│ │VIP├─┬──────┬┴────┘ ║ │ Tenant │
    ║ └─────────────┘ └───┘ │ MGMT │ ╙◄─┤Service │
    ╟◄─────────────────────────────┤ IP │ │ (m) │
    ║ └──────┘ └────────┘
    ║
    Management Amphora Cluster Back-end Pool
    Network 1..k 1..m

* An example of high-level data-flow:

  1. Internet clients access a tenant service through an externally visible
     floating-IP (IPv4 or IPv6).

  2. If IPv4, a gateway router maps the floating IP into a loadbalancer's
     internal VIP on the tenant's front-end network.

  3. The (multi-tenant) Distributor receives incoming requests to the
     loadbalancer's VIP. It acts as a one-legged direct return LB,
     answering ``arp`` requests for the loadbalancer's VIP (see Distributor
     spec.).

  4. The Distributor distributes incoming connections over the tenant's
     Amphora Cluster, by forwarding each new connection opened with a
     loadbalancer's VIP to a front-end MAC address of an Amphora in the
     Amphora Cluster (layer-2 forwarding). *Note*: the Distributor may
     implement other forwarding schemes to support more complex routing
     mechanisms, such as DVR (see Distributor spec.).

  5. An Amphora receives the connection and accepts traffic addressed to
     the loadbalancer's VIP. The front-end IPs of the Amphorae are
     allocated on the tenant's front-end network. Each Amphora accepts VIP
     traffic, but does not answer ``arp`` requests for the VIP address.

  6. The Amphora load-balances the incoming connections to the back-end
     pool of tenant servers, by forwarding each external request to a
     member on the tenant network. The Amphora also performs SSL
     termination if configured.

  7. Outgoing traffic traverses from the back-end pool members, through
     the Amphora and directly to the gateway (i.e., not through the
     Distributor).

Multi-tenant Support
~~~~~~~~~~~~~~~~~~~~

* The following diagram illustrates the active-active topology with
  multiple tenants:

::

    Front-End Back-End
    Internet Networks Networks
    (world) (tenant) (tenant)
    ║ B A A
    ║ floating IP ║ ║ ║ ┌────────┐
    ┌─╨────┐ to LB VIP A ║ ║ ┌────┬─────────┬────┐ ║ │Tenant A│
    │Router├───────────────║─►╫◄─┤A IP│ Amphora │A IP├─►╫◄─┤Service │
    │ GW ├──────────────►╢ ║ └┬───┤ (1) │back│ ║ │ (1) │
    └──────┘ floating IP ║ ║ │VIP├─┬──────┬┴────┘ ║ └────────┘
    to LB VIP B ║ ║ └───┘ │ MGMT │ ║ ┌────────┐
    ╓◄───────────────────║──║─────────┤ IP │ ║ │Tenant A│
    ║ ║ ║ └──────┘ ╟◄─┤Service │
    M B A ┌────┬─────────┬────┐ ║ │ (2) │
    ║ ║ ╟◄─┤A IP│ Amphora │A IP├─►╢ └────────┘
    ║ ║ ║ └┬───┤ (2) │back│ ║ ┌────────┐
    ║ ║ ║ │VIP├─┬──────┬┴────┘ ║ │Tenant A│
    ║ ║ ║ └───┘ │ MGMT │ ╟◄─┤Service │
    ╟◄───────────────────║──║─────────┤ IP │ ║ │ (3) │
    ║ ║ ║ └──────┘ ║ └────────┘
    ║ B A ••• B •
    ║ ┌─────────┬────┐ ║ ║ ┌────┬─────────┬────┐ ║ •
    ║ │ │IP A├─╢─►╫◄─┤A IP│ Amphora │A IP├─►╢ ┌────────┐
    ║ │ ├───┬┘ ║ ║ └┬───┤ (k) │back│ ║ │Tenant A│
    ║ │ Distri- │VIP├─arp►╜ │VIP├─┬──────┬┴────┘ ╙◄─┤Service │
    ║ │ butor ├───┘ ║ └───┘ │ MGMT │ │ (m) │
    ╟◄─ │ │ ─────║────────────┤ IP │ └────────┘
    ║ │ ├────┐ ║ └──────┘
    ║ │ │IP B├►╢ tenant A
    ║ │ ├───┬┘ ║ = = = = = = = = = = = = = = = = = = = = =
    ║ │ │VIP│ ║ ┌────┬─────────┬────┐ B tenant B
    ║ └─┬──────┬┴─┬─┘ ╟◄────┤B IP│ Amphora │B IP├─►╢ ┌────────┐
    ║ │ MGMT │ └arp►╢ └┬───┤ (1) │back│ ║ │Tenant B│
    ╟◄────┤ IP │ ║ │VIP├─┬──────┬┴────┘ ╟◄─┤Service │
    ║ └──────┘ ║ └───┘ │ MGMT │ ║ │ (1) │
    ╟◄───────────────────║────────────┤ IP │ ║ └────────┘
    ║ ┌───────────────┐ ║ └──────┘ ║
    M │ Octavia LBaaS │ B ••• B •
    ╟◄─┤ Controller │ ║ ┌────┬─────────┬────┐ ║ •
    ║ └┬─────────────┬┘ ╙◄────┤B IP│ Amphora │B IP├─►╢
    ║ │ Amphora │ └┬───┤ (q) │back│ ║ ┌────────┐
    ║ │ Cluster Mgr.│ │VIP├─┬──────┬┴────┘ ║ │Tenant B│
    ║ └─────────────┘ └───┘ │ MGMT │ ╙◄─┤Service │
    ╟◄────────────────────────────────┤ IP │ │ (r) │
    ║ └──────┘ └────────┘
    ║
    Management Amphora Clusters Back-end Pool
    Network A(1..k), B(1..q) A(1..m),B(1..r)

* Both tenants A and B share the Distributor, but each has a different
  front-end network. The Distributor listens on both loadbalancers' VIPs
  and forwards to either A's or B's Amphorae.

* The Amphorae and the back-end (tenant) networks are not shared between
  tenants.


Problem Details
---------------

* Octavia should support different Distributor implementations, similar
  to its support for different Amphora types. The operator should be able
  to configure different types of algorithms for the Distributor. All
  algorithms should provide flow-affinity to allow TLS termination at the
  Amphora. See the Distributor spec. for details.

* The Octavia controller shall seamlessly configure any newly created Amphora
  ([P2]_ including peer state synchronization, such as stick-tables, if
  needed) and shall reconfigure the other solution components (e.g.,
  Neutron) as needed. The controller shall further manage all Amphora
  life-cycle events.

* Since it is impractical at scale for peer state synchronization to occur
  between all Amphorae that are part of a single load balancer, the Amphorae
  of a single load balancer configuration need to be divided into smaller
  peer groups (consisting of 2 or 3 Amphorae) with which they should
  synchronize state information (a trivial partition sketch follows this
  list).
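
A trivial sketch of how such peer groups could be derived from a load
balancer's Amphora list (the group size and the simple slicing policy are
illustrative assumptions only)::

    def peer_groups(amphora_ids, group_size=3):
        """Split the Cluster into small groups that replicate state
        (e.g., HAProxy stick-table peers) only among themselves."""
        return [amphora_ids[i:i + group_size]
                for i in range(0, len(amphora_ids), group_size)]

    # peer_groups(["amp-%d" % i for i in range(7)])
    # -> [['amp-0', 'amp-1', 'amp-2'], ['amp-3', 'amp-4', 'amp-5'], ['amp-6']]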

Proposed change
===============


Required changes
----------------

The active-active loadbalancers require the following high-level changes:


Amphora related changes
~~~~~~~~~~~~~~~~~~~~~~~

* Update the Amphora image to support the active-active topology. The
  front-end still has both a unique IP (to allow direct addressing on the
  front-end network) and a VIP; however, it should not answer ARP requests
  for the VIP address (all Amphorae in a single Amphora Cluster concurrently
  serve the same VIP). Amphorae should continue to have a management IP on
  the LB Network so Octavia can configure them. Amphorae should also
  generally support hot-plugging interfaces into back-end tenant networks as
  they do in the current implementation. [P2]_ Finally, the Amphora
  configuration may need to be changed to randomize the member list, in
  order to prevent synchronized decisions by all Amphorae in the Amphora
  Cluster.

* Extend the data model to support active-active Amphorae. This is somewhat
  similar to active-passive (VRRP) support. Each Amphora needs to store its
  IP and port on its front-end network (similar to ha_ip and ha_port_id
  in the current model) and its role should indicate it is in a cluster.

  The provisioning status should be interpreted as referring to an Amphora
  only and not the load-balancing service. The status of the load balancer
  should correspond to the number of ``ONLINE`` Amphorae in the Cluster.
  If all Amphorae are ``ONLINE``, the load balancer is also ``ONLINE``. If a
  small number of Amphorae are not ``ONLINE``, then the load balancer is
  ``DEGRADED``. If enough Amphorae are not ``ONLINE`` (past a threshold),
  then the load balancer is ``DOWN`` (a sketch of one possible mapping
  appears after this list).

* Rework some of the controller worker flows to support creation and
  deletion of Amphorae by the ACM in an asynchronous manner. The compute
  node may be created/deleted independently of the corresponding Amphora
  flow, triggered as events by the ACM logic (e.g., node update). The flows
  do not need much change (beyond those implied by the changes in the data
  model), since the post-creation/pre-deletion configuration of each
  Amphora is unchanged. This is also similar to the failure recovery flow,
  where a recovery flow is triggered asynchronously.

* Create a flow (or task) for the controller worker for (de-)registration
  of Amphorae with the Distributor. The Distributor has to be aware of the
  current ``ONLINE`` Amphorae, to which it can forward traffic. [P2]_ The
  Distributor can do very basic monitoring of the Amphorae health (primarily
  to make sure network connectivity between the Distributor and Amphorae is
  working). Monitoring pool member health will remain the purview of the
  pool health monitors.

* All the Amphorae in the Amphora Cluster shall replicate the same
  listeners, pools, and TLS configuration, as they do now. We assume all
  Amphorae in the Amphora Cluster can perform exactly the same
  load-balancing decisions and can be treated as equivalent by the
  Distributor (except for affinity considerations).

* Extend the Amphora (REST) API and/or *Plug VIP* task to allow disabling
  of ``arp`` on the VIP (one possible mechanism is sketched after this
  list).

* In order to prevent losing session_persistence data in the event of an
  Amphora failure, the Amphorae will need to be configured to share
  session_persistence data (via stick tables) with a subset of other
  Amphorae that are part of the same load balancer configuration (i.e., a
  peer group).
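
For the load balancer status rule described above, one possible mapping (the
threshold value is an assumption for illustration; the real value would be an
operator configuration option)::

    def cluster_operating_status(amphora_statuses, down_threshold=0.5):
        """Derive the load balancer status from its Amphorae's statuses."""
        total = len(amphora_statuses)
        online = sum(1 for s in amphora_statuses if s == 'ONLINE')
        if online == total:
            return 'ONLINE'
        if online >= total * down_threshold:
            return 'DEGRADED'
        return 'DOWN'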
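
For the ``arp``-disabled VIP plug mentioned above, one possible mechanism is
the usual direct-return technique of holding the VIP on a non-advertising
interface while tightening the kernel's ARP behaviour. A sketch (the
interface choice and the exact mechanism are assumptions for illustration
only)::

    import subprocess

    def plug_vip_without_arp(vip_cidr):
        """Accept traffic for the VIP but never answer ARP for it."""
        run = subprocess.check_call
        # Reply to ARP only for addresses configured on the receiving
        # interface, and prefer the primary address when sourcing ARP.
        run(["sysctl", "-w", "net.ipv4.conf.all.arp_ignore=1"])
        run(["sysctl", "-w", "net.ipv4.conf.all.arp_announce=2"])
        # Keep the unique front-end IP on the front-end interface; hold the
        # VIP on loopback so the Amphora accepts it without advertising it.
        run(["ip", "addr", "add", vip_cidr, "dev", "lo"])

    # e.g. plug_vip_without_arp("203.0.113.10/32")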
|
||||
|
||||

Amphora Cluster Manager driver for the active-active topology (*new*)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Add an active-active topology to the topology types.

* Add a new driver to support creation/deletion of an Amphora Cluster via
  an ACM. This will re-use existing controller-worker flows as much as
  possible. The reference ACM will call the existing drivers to create
  compute nodes for the Amphorae and configure them.

* The ACM shall orchestrate creation and deletion of Amphora instances to
  meet the availability requirements. Amphora failover will utilize the
  existing health monitor flows, with hooks to notify the ACM when the
  ACTIVE-ACTIVE topology is used. [P2]_ The ACM shall handle graceful
  Amphora removal via draining (delay actual removal until existing
  connections are terminated or some timeout has passed).

* Change the flow of LB creation. The ACM driver shall create an Amphora
  Cluster instance for each new loadbalancer. It should maintain the
  desired number of Amphorae in the Cluster and meet the high-availability
  configuration given by the operator (a rough reconciliation sketch
  appears at the end of this list). *Note*: base functionality is already
  supported by the Health Manager; it may be enough to support a fixed or
  dynamic cluster size. In any case, existing flows to manage the Amphora
  life cycle will be re-used in the reference ACM driver.

* The ACM shall be responsible for providing health, performance, and
  life-cycle management at the Cluster level rather than at the Amphora
  level. Maintaining the loadbalancer status (as described above) as some
  function of the collective status of all Amphorae in the Cluster is one
  example. Other examples include tracking configuration changes, providing
  Cluster statistics, monitoring and maintaining compute nodes for the
  Cluster, etc. The ACM abstraction would also support pluggable ACM
  implementations that may provide more advanced capabilities (e.g.,
  elasticity, AZ-aware availability, etc.). The reference ACM driver will
  re-use existing components and/or code which currently handle health,
  life-cycle, etc. management for other load balancer topologies.

* New data model for an Amphora Cluster, which has a one-to-one mapping
  with the loadbalancer. This defines the common properties of the Amphora
  Cluster (e.g., id, minimum size, desired size, etc.) and additional
  properties for the specific implementation.

* Add configuration file options to support configuration of an
  active-active Amphora Cluster, and add default configuration (example
  options are sketched at the end of this list). [P2]_ Add an Operator API.

* Add or update documentation for new components added and for new or
  changed functionality.

* Communication between the ACM and Distributors should be secured using
  two-way SSL certificate authentication, much the same way this is
  accomplished between other Octavia controller components and Amphorae
  today.
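
As a rough illustration of the reconciliation responsibility described above
(maintain the desired number of Amphorae, with a cooldown to avoid
thrashing), consider the sketch below. All names are hypothetical, and a real
driver would re-use the existing Amphora creation/deletion flows:

.. code-block:: python

    import time

    def reconcile_cluster(cluster, acm_driver):
        """One pass of a hypothetical ACM reconciliation loop (sketch)."""
        now = time.time()
        if now - cluster.last_change < cluster.cooldown:
            return  # still inside the cooldown window; avoid thrashing

        online = [a for a in cluster.amphorae if a.status == 'ONLINE']
        missing = cluster.desired_size - len(online)
        if missing > 0:
            # Re-use the existing Amphora creation flow via the driver.
            for _ in range(missing):
                acm_driver.add_amphora(cluster)
            cluster.last_change = now
        elif missing < 0:
            # [P2] graceful draining before removing surplus Amphorae.
            for amphora in online[cluster.desired_size:]:
                acm_driver.remove_amphora(cluster, amphora)
            cluster.last_change = now

The configuration file options could be exposed through oslo.config like
other Octavia settings; the option names, group, and defaults below are
placeholders only, not a committed interface:

.. code-block:: python

    from oslo_config import cfg

    # Placeholder option names -- shown only to illustrate the shape of
    # the configuration.
    cluster_opts = [
        cfg.IntOpt('cluster_desired_size', default=3,
                   help='Number of Amphorae to keep in an active-active '
                        'Amphora Cluster.'),
        cfg.IntOpt('cluster_min_size', default=2,
                   help='Minimum number of ACTIVE Amphorae for the Cluster '
                        'to be considered ACTIVE.'),
        cfg.IntOpt('cluster_cooldown', default=60,
                   help='Cooldown period (seconds) between successive '
                        'add/remove Amphora operations.'),
    ]

    cfg.CONF.register_opts(cluster_opts, group='active_active_cluster')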

Network driver changes
~~~~~~~~~~~~~~~~~~~~~~

* Support the creation, connection, and configuration of the various
  networks and interfaces as described in the 'high-level topology' diagram.

* Adding a new loadbalancer requires attaching the Distributor to the
  loadbalancer's front-end network, adding a VIP port to the Distributor,
  and configuring the Distributor to answer ``arp`` requests for the VIP.
  The Distributor shall have a separate interface for each loadbalancer and
  shall not allow any routing between different ports; in particular,
  Amphorae of different tenants must not be able to communicate with each
  other. In the reference implementation, this will be accomplished by using
  separate OVS bridges per load balancer.

* Adding a new Amphora requires attaching it to the front-end and back-end
  networks (similar to the current implementation), adding the VIP (but with
  ``arp`` disabled), and registering the Amphora with the Distributor. The
  tenant's front-end and back-end networks must allow attachment of
  dynamically created Amphorae by involving the ACM (e.g., when the health
  monitor replaces a failed Amphora). ([P2]_ Extend the LBaaS API to allow
  specifying an address range for new Amphorae, e.g., a subnet pool.)

Amphora health-monitoring support
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Modify the Health Manager to manage the health of an Amphora Cluster
  through the ACM; namely, forward Amphora health change events to the ACM,
  so it can decide when the Amphora Cluster is considered to be in a healthy
  state (a sketch of such a hook follows below). This should be done in
  addition to managing the health of each Amphora. [P2]_ Monitor the
  Amphorae also on their front-end network (i.e., from the Distributor).
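
A minimal sketch of the hook mentioned above, assuming a hypothetical ACM
notification interface (``acm_client``); the existing per-Amphora health
update path would otherwise remain unchanged:

.. code-block:: python

    def on_amphora_health_update(amphora, healthy, acm_client):
        """Forward an Amphora health change to the ACM (sketch).

        ``acm_client`` is a hypothetical interface to the Amphora Cluster
        Manager; only Amphorae belonging to an ACTIVE_ACTIVE load balancer
        are reported.
        """
        lb = amphora.load_balancer
        if lb is not None and lb.topology == 'ACTIVE_ACTIVE':
            acm_client.report_amphora_health(lb.id, amphora.id, healthy)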

Distributor support
~~~~~~~~~~~~~~~~~~~

* **Note:** as mentioned above, the detailed design of the Distributor
  component is described in a separate document. Some design considerations
  are highlighted below.

* The Distributor should be supported similarly to an Amphora; namely, it
  should have its own abstract driver.

* For a reference implementation, add support for a Distributor image.

* Define a REST API for Distributor configuration (no SSH API). The API
  shall support the following (a rough driver-interface sketch mirroring
  these operations appears at the end of this section):

  - Add and remove a VIP (loadbalancer) and specify distribution parameters
    (e.g., affinity, algorithm, etc.).

  - Registration and de-registration of Amphorae.

  - Status.

  - [P2]_ Macro-level statistics.

* Spawn Distributors (if using on-demand Distributor compute nodes) and/or
  attach to existing ones as needed. Manage the health and life cycle of the
  Distributor(s). Create, connect, and configure Distributor networks as
  necessary.

* Create a data model for the Distributor.

* Add a Distributor driver and flows to (re-)configure the Distributor on
  creation/destruction of a loadbalancer (add/remove the loadbalancer VIP)
  and [P2]_ configure the distribution algorithm for the loadbalancer's
  Amphora Cluster.

* Add flows to Octavia to (re-)configure the Distributor on adding/removing
  Amphorae from the Amphora Cluster.
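
To make the pluggable Distributor driver concrete, the following abstract
interface sketch mirrors the REST operations listed above. The class and
method names are illustrative assumptions; the authoritative definition
belongs to the Distributor specification:

.. code-block:: python

    import abc

    import six

    @six.add_metaclass(abc.ABCMeta)
    class DistributorDriverSketch(object):
        """Hypothetical Distributor driver interface (illustrative only)."""

        @abc.abstractmethod
        def add_vip(self, loadbalancer, distribution_params):
            """Plug a VIP and set distribution parameters (affinity, etc.)."""

        @abc.abstractmethod
        def remove_vip(self, loadbalancer):
            """Remove the VIP of a deleted load balancer."""

        @abc.abstractmethod
        def register_amphora(self, loadbalancer, amphora):
            """Start forwarding VIP traffic to this Amphora."""

        @abc.abstractmethod
        def unregister_amphora(self, loadbalancer, amphora):
            """Stop forwarding VIP traffic to this Amphora."""

        @abc.abstractmethod
        def get_status(self, loadbalancer):
            """Return the Distributor's view of the load balancer status."""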

Packaging
~~~~~~~~~

* Extend the Octavia installation scripts to create an image for the
  Distributor.

Alternatives
------------

* Use external services to manage the cluster directly.

  This utilizes functionality that already exists in OpenStack (e.g., Heat
  and Ceilometer) rather than replicating it. This approach would also
  benefit from future extensions to these services. On the other hand, it
  adds undesirable dependencies on other projects (and their corresponding
  teams), complicates handling of failures, and requires defensive coding
  around service calls. Furthermore, these services cannot handle the
  LB-specific control configuration.

* Implement a nested Octavia.

  Use another layer of Octavia to distribute traffic across the Amphora
  Cluster (i.e., the Amphorae in the Cluster are back-end members of
  another Octavia instance). This approach has the potential to provide
  greater flexibility (e.g., provide NAT and/or more complex distribution
  algorithms). It also potentially reuses existing code. However, we do
  not want the Distributor to proxy connections, so HAProxy cannot be
  used. Furthermore, this approach might significantly increase the
  overhead of the solution.

Data model impact
-----------------

* loadbalancer table

  - `cluster_id`: associated Amphora Cluster (no changes to the table; 1:1
    relationship from the Cluster data model)

* lb_topology table

  - new value: ``ACTIVE_ACTIVE``

* amphora_role table

  - new value: ``IN_CLUSTER``

* Distributor table (*new*): Distributor information, similar to Amphora.
  See the Distributor spec.

* Cluster table (*new*): an extension to the loadbalancer (i.e., one-to-one
  mapping to a load balancer); a sketch of a possible model follows this
  list.

  - `id` (primary key)

  - `cluster_name`: identifier of the Cluster instance for the Amphora
    Cluster Manager

  - `desired_size`: required number of Amphorae in the Cluster. Octavia will
    create this many active-active Amphorae in the Amphora Cluster.

  - `min_size`: the number of ``ACTIVE`` Amphorae in the Cluster must be
    above this number for the Amphora Cluster status to be ``ACTIVE``

  - `cooldown`: cooldown period between successive add/remove Amphora
    operations (to avoid thrashing)

  - `load_balancer_id`: 1:1 relationship to the loadbalancer

  - `distributor_id`: N:1 relationship to the Distributor (supports multiple
    Distributors)

  - `provisioning_status`

  - `operating_status`

  - `enabled`

  - `cluster_type`: type of Amphora Cluster implementation
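
The Cluster table could be rendered as a SQLAlchemy model roughly as follows.
The column types, lengths, and referenced table names are placeholders for
illustration, not the final migration:

.. code-block:: python

    import sqlalchemy as sa
    from sqlalchemy.ext import declarative

    Base = declarative.declarative_base()

    class Cluster(Base):
        """Sketch of the proposed Cluster table (not the final schema)."""

        __tablename__ = 'cluster'

        id = sa.Column(sa.String(36), primary_key=True)
        cluster_name = sa.Column(sa.String(255))
        desired_size = sa.Column(sa.Integer, nullable=False)
        min_size = sa.Column(sa.Integer, nullable=False)
        cooldown = sa.Column(sa.Integer, nullable=False)
        # 1:1 with the load balancer, hence the unique constraint.
        load_balancer_id = sa.Column(sa.String(36),
                                     sa.ForeignKey('load_balancer.id'),
                                     unique=True)
        # N:1 -- several Clusters may share one Distributor.
        distributor_id = sa.Column(sa.String(36),
                                   sa.ForeignKey('distributor.id'))
        provisioning_status = sa.Column(sa.String(16))
        operating_status = sa.Column(sa.String(16))
        enabled = sa.Column(sa.Boolean(), nullable=False)
        cluster_type = sa.Column(sa.String(36))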

REST API Impact
---------------

* Distributor REST API -- This is a new internal API that will be secured
  via two-way SSL certificate authentication. See the Distributor spec.

* Amphora REST API -- support configuration of disabling ``arp`` on the VIP.

* [P2]_ LBaaS API -- support configuration of the desired availability,
  perhaps by selecting a flavor (e.g., gold is a minimum of 4 Amphorae,
  platinum is a minimum of 10 Amphorae).

* Operator API --

  - Topology to use

  - Cluster type

  - Default availability parameters for the Amphora Cluster

Security impact
---------------

* See the Distributor spec for Distributor-related security impact.

Notifications impact
--------------------

None.


Other end user impact
---------------------

None.


Performance Impact
------------------

ACTIVE-ACTIVE should be able to deliver significantly higher performance than
the SINGLE or ACTIVE-STANDBY topology. It will consume more resources to
deliver this higher performance.

Other deployer impact
---------------------

The reference ACM becomes a new process that is part of the Octavia control
components (like the controller worker, health manager, and housekeeper). If
the reference implementation is used, a new Distributor image will need to be
created and stored in Glance, much the same way the Amphora image is created
and stored today.


Developer impact
----------------

None.

Implementation
==============

Assignee(s)
-----------

@TODO


Work Items
----------

@TODO


Dependencies
============

@TODO


Testing
=======

* Unit tests with tox.
* Functional tests with tox.
* Scenario tests.

Documentation Impact
====================

Need to document all new APIs and API changes, new ACTIVE-ACTIVE topology
design and features, and new instructions for operators seeking to deploy
Octavia with ACTIVE-ACTIVE topology.

References
==========

.. [1] https://blueprints.launchpad.net/octavia/+spec/base-image
.. [2] https://blueprints.launchpad.net/octavia/+spec/controller-worker
.. [3] https://blueprints.launchpad.net/octavia/+spec/amphora-driver-interface
.. [4] https://blueprints.launchpad.net/octavia/+spec/controller
.. [5] https://blueprints.launchpad.net/octavia/+spec/operator-api
.. [6] doc/main/api/haproxy-amphora-api.rst
.. [7] https://blueprints.launchpad.net/octavia/+spec/active-active-topology