Add a spec for basic high availability
This basic design replicates the entire Congress server and uses the database for shared state.

Change-Id: I28cf36c1874d91e5541afc34f55eb4dbdf96c8b8
Parent: 36fc55675b
Commit: 2e63f71ec0

..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

===============================
Support basic high availability
===============================

Include the URL of your launchpad blueprint:

https://blueprints.launchpad.net/congress/+spec/basic-high-availability

Congress needs to support high availability (HA) for API requests so that
a client can continue to make successful API requests to Congress even if a
Congress server becomes unavailable. This proposal describes a basic HA
solution that replicates the entire Congress server as-is. Each replica runs
the policy engine, holds all the table data, and runs the datasource drivers.


Problem description
===================

Today, Congress runs as a single standalone server. That server handles all
API queries and is a single point of failure: if it fails, clients that
integrate with Congress experience downtime.


Proposed change
===============

This spec proposes to:

* Replicate the entire Congress server.
* Use an off-the-shelf load balancer to distribute requests to the replicas
  and to avoid failed replicas.
* Have write API calls modify the shared database.
* Have each replica periodically check the database for changes to policy or
  datasources.

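The proposal leaves the choice of load balancer open. Purely as an illustration, the behaviour the replicas rely on (rotate requests across replicas and skip any replica that fails a health check) can be sketched in Python. ``RoundRobinBalancer`` and the ``is_healthy`` callback are hypothetical names, not part of Congress or of any particular load balancer.

```python
import itertools
from typing import Callable, List


class RoundRobinBalancer:
    """Hypothetical stand-in for an off-the-shelf load balancer.

    Requests rotate across the replicas in order; a replica whose health
    check fails is skipped until it passes again.
    """

    def __init__(self, replicas: List[str],
                 is_healthy: Callable[[str], bool]) -> None:
        self.replicas = list(replicas)
        self.is_healthy = is_healthy
        self._cycle = itertools.cycle(self.replicas)

    def pick(self) -> str:
        # Try each replica at most once per request.
        for _ in range(len(self.replicas)):
            candidate = next(self._cycle)
            if self.is_healthy(candidate):
                return candidate
        raise RuntimeError("no healthy Congress replica available")


# Health state that a real balancer would learn from periodic health checks.
up = {"replica-a": True, "replica-b": True}
lb = RoundRobinBalancer(list(up), lambda name: up[name])
print([lb.pick() for _ in range(4)])  # ['replica-a', 'replica-b', 'replica-a', 'replica-b']
up["replica-a"] = False  # replica-a crashes
print([lb.pick() for _ in range(2)])  # ['replica-b', 'replica-b']
```

A real deployment would get this behaviour from the load balancer's own configuration and health checks; the sketch only shows the failover that the replicated design depends on.
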
Alternatives
------------

A more advanced design separates the policy engine from the datasource
drivers, replicates the policy engine N times, and runs the datasource drivers
in a master-standby configuration. This way only the master datasource driver
talks to the datasources, thus reducing the load on them. The master driver
communicates incoming data changes to the policy engine replicas over a
message bus. This design would require more code changes to separate the
engine from the datasource drivers and to change how the message bus works.

Yet another proposal is to funnel all datasource updates to a central machine
that precomputes materialized views of all tables. This has the advantage of
giving the replicated Congress API consistency, but it relies on a single
machine to compute the materialized views. Materializing the views also stores
all intermediate table content in memory, which can consume an unmanageable
amount of memory when there are many intermediate tables. This alternative
would also require a significant amount of code changes.


Policy
------

None


Policy actions
--------------

None


Data sources
------------

We need to ensure that each data service (such as Nova or Neutron) can accept
and handle requests from more than one datasource driver instance at the same
time, since each replica fetches data from each data service. In other words,
if there are N replicas, each data service must return all of its data N
separate times, and it must be able to cope with that higher request load.

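To make the N-fold load concrete, the following sketch estimates how many API calls one data service receives per minute. It assumes that each datasource driver polls its data service on a fixed interval and makes a fixed number of calls per poll; the function name and the numbers in the example are hypothetical.

```python
def data_service_calls_per_minute(num_replicas: int, poll_interval_s: float,
                                  calls_per_poll: int = 1) -> float:
    """Estimate API calls per minute hitting one data service when every
    replica runs its own copy of the datasource driver (hypothetical model)."""
    if poll_interval_s <= 0:
        raise ValueError("poll_interval_s must be positive")
    polls_per_minute = 60.0 / poll_interval_s
    # Replicas poll independently, so the load grows linearly with N.
    return num_replicas * polls_per_minute * calls_per_poll


# Hypothetical driver: 3 API calls every 10 seconds.
print(data_service_calls_per_minute(1, 10, 3))  # 18.0
print(data_service_calls_per_minute(3, 10, 3))  # 54.0
```
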
Data model impact
-----------------

None


REST API impact
---------------

None


Security impact
---------------

None


Notifications impact
--------------------

None


Other end user impact
---------------------

Two API calls may return different data if different replicas serve them,
because both the datasource data and the policy rules may be out of sync
between replicas. Increasing the rate at which each replica checks the
database for updates narrows the window for policy rules, but data skew will
still affect the replicas, since each replica fetches datasource data on its
own.

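The following sketch illustrates the inconsistency window for policy rules. It assumes that each replica learns about policy changes only by polling the database on a fixed interval, starting at its own offset; ``visible_at`` is a hypothetical helper, not Congress code.

```python
import math


def visible_at(write_time: float, poll_interval: float, offset: float) -> float:
    """Return when a replica first sees a change written at ``write_time``.

    Hypothetical model: the replica polls the database at
    offset, offset + poll_interval, offset + 2 * poll_interval, ...
    """
    # Smallest k >= 0 such that offset + k * poll_interval >= write_time.
    k = max(0, math.ceil((write_time - offset) / poll_interval))
    return offset + k * poll_interval


# Two replicas polling every 10 seconds, with different phases.
write_time = 12
seen_by_a = visible_at(write_time, poll_interval=10, offset=0)
seen_by_b = visible_at(write_time, poll_interval=10, offset=7)
# Between t=17 and t=20, replica B has the new rule and replica A does not,
# so the same query can get different answers depending on the replica.
print(seen_by_a, seen_by_b)  # 20 17
```

In this model a replica lags a policy write by less than one polling interval, which is why polling more often narrows the window for policy rules. Datasource data is outside the model: each replica fetches it from the data services on its own schedule.
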
Performance impact
------------------

This change should improve throughput for the Congress API, since multiple
replicas rather than a single server can serve requests. However, it may
increase the load on datasources, since each replica requests data from each
datasource. The periodic database requests that check for updates should have
a minimal impact on performance.


Other deployer impact
---------------------

None


Developer impact
----------------

All shared state must be stored in the database and periodically checked by
all replicas.

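A minimal sketch of that pattern, assuming a single version counter that every write bumps in the same transaction as the change. Each replica compares the counter with the last value it saw and reloads policy only when it differs, which keeps the periodic check to one single-row query. The table names, the ``Replica`` class, and the use of ``sqlite3`` as a stand-in for the shared database are all hypothetical; this is not how Congress's database layer is written.

```python
import sqlite3
from typing import List

# Hypothetical schema: the rules themselves plus a one-row version counter.
SCHEMA = """
CREATE TABLE IF NOT EXISTS policy_rules (
    id INTEGER PRIMARY KEY,
    rule TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS state_version (
    id INTEGER PRIMARY KEY CHECK (id = 1),
    version INTEGER NOT NULL
);
INSERT OR IGNORE INTO state_version (id, version) VALUES (1, 0);
"""


class Replica:
    """Keeps policy rules in memory; the shared database is the source of truth."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.executescript(SCHEMA)
        self.seen_version = -1  # forces a full load on the first check
        self.rules: List[str] = []

    def add_rule(self, rule: str) -> None:
        # Write path: change the shared state and bump the version in one
        # transaction, so the two are committed together.
        with self.conn:
            self.conn.execute("INSERT INTO policy_rules (rule) VALUES (?)", (rule,))
            self.conn.execute(
                "UPDATE state_version SET version = version + 1 WHERE id = 1")

    def sync_once(self) -> bool:
        """Periodic check; returns True only if the rules were reloaded."""
        (version,) = self.conn.execute(
            "SELECT version FROM state_version WHERE id = 1").fetchone()
        if version == self.seen_version:
            return False  # nothing changed: one cheap single-row query
        rows = self.conn.execute("SELECT rule FROM policy_rules ORDER BY id").fetchall()
        self.rules = [rule for (rule,) in rows]
        self.seen_version = version
        return True


# One in-memory connection stands in for the shared database server.
conn = sqlite3.connect(":memory:")
a, b = Replica(conn), Replica(conn)
a.add_rule("p(x) :- q(x)")
print(b.rules)        # []  (b has not checked yet)
print(b.sync_once())  # True
print(b.rules)        # ['p(x) :- q(x)']
print(b.sync_once())  # False
```

In a running replica, ``sync_once`` would be called on a timer; the polling interval is the knob that trades database load against how stale a replica's policy can be.
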
Implementation
==============

Assignee(s)
-----------

Primary assignee:
  ayip@vmware.com


Work items
----------

* Add a periodic check for database updates, with a test
* Add a test that starts two replicas and queries both


Dependencies
============

None


Testing
=======

Start two replicas that use the same database. Write a policy change through
one replica and check that the change appears on the second replica.

Start two replicas and kill one. Make sure the second replica can still serve
requests. Start the first replica again and make sure it can serve requests
too.

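The two test scenarios above can be sketched with Python's ``unittest``. The sketch is self-contained and hypothetical: a file-backed SQLite database stands in for the shared database, the minimal ``Replica`` class stands in for a Congress server, and killing a replica is modelled by closing its database connection rather than stopping a real process.

```python
import os
import sqlite3
import tempfile
import unittest


class Replica:
    """Hypothetical stand-in for one Congress server."""

    def __init__(self, db_path):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS policy_rules "
            "(id INTEGER PRIMARY KEY, rule TEXT NOT NULL)")
        self.conn.commit()
        self.rules = []

    def add_rule(self, rule):
        # Write API call: persist the rule to the shared database.
        with self.conn:
            self.conn.execute(
                "INSERT INTO policy_rules (rule) VALUES (?)", (rule,))

    def sync_once(self):
        # Periodic check: reload all rules from the shared database.
        rows = self.conn.execute(
            "SELECT rule FROM policy_rules ORDER BY id").fetchall()
        self.rules = [rule for (rule,) in rows]

    def kill(self):
        # Simulated failure: the replica loses its database connection.
        self.conn.close()


class TwoReplicaTest(unittest.TestCase):
    def setUp(self):
        tmp = tempfile.TemporaryDirectory()
        self.addCleanup(tmp.cleanup)  # runs last, after connections close
        self.db_path = os.path.join(tmp.name, "congress.db")

    def start_replica(self):
        replica = Replica(self.db_path)
        # Closing an already-closed connection is harmless.
        self.addCleanup(replica.conn.close)
        return replica

    def test_policy_change_reaches_second_replica(self):
        first, second = self.start_replica(), self.start_replica()
        first.add_rule("p(x) :- q(x)")
        second.sync_once()
        self.assertEqual(second.rules, ["p(x) :- q(x)"])

    def test_surviving_replica_keeps_serving(self):
        first, second = self.start_replica(), self.start_replica()
        first.add_rule("p(x) :- q(x)")
        first.kill()
        with self.assertRaises(sqlite3.ProgrammingError):
            first.sync_once()
        second.sync_once()
        self.assertEqual(second.rules, ["p(x) :- q(x)"])
        # Restart the first replica against the same database.
        restarted = self.start_replica()
        restarted.sync_once()
        self.assertEqual(restarted.rules, ["p(x) :- q(x)"])
```

Saved as, for example, ``test_ha_sketch.py``, it runs with ``python -m unittest test_ha_sketch``. Real tests would start actual Congress processes behind a load balancer; the sketch only pins down what each scenario asserts.
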
Documentation impact
====================

We should add a description of how to configure Congress in HA mode, with a
load balancer and a shared database.


References
==========

None