Merge "Mechanism to Prevent Race Conditions Spec"

This commit is contained in:
Jenkins 2016-12-15 21:06:30 +00:00 committed by Gerrit Code Review
commit 3c764415e0
2 changed files with 291 additions and 0 deletions


@@ -23,6 +23,7 @@ sys.path.insert(0, os.path.abspath('../..'))
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom ones.
extensions = [
'sphinx.ext.autodoc',
'sphinx.ext.graphviz',
'oslosphinx',
'yasfb',
]


@@ -0,0 +1,290 @@
..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

====================================
Mechanism to Prevent Race Conditions
====================================

https://blueprints.launchpad.net/manila/+spec/eliminate-race-conditions

This spec proposes a general mechanism for preventing race conditions that
works across services and in deployments where multiple copies of the same
service run concurrently (commonly known as Active/Active HA deployments).

The focus is on keeping all state in the database and protecting changes to
database state with briefly held locks. Mutually exclusive concurrent
operations should fail as early as possible, with a helpful error code, to
simplify the retry logic of upper layers.

Problem description
===================

Certain operations in Manila should not be allowed to proceed in parallel,
because the result of one operation would prevent the other operation from
completing successfully.

For example, taking a snapshot of a share cannot happen while that share is
being deleted. Either the snapshot must occur first, which prevents the
delete, or the delete must occur first, which prevents the snapshot.

Unfortunately, not enough state is stored in the database to prevent these
operations from racing with each other, so in practice two API calls can both
proceed through the API service to the share manager, where eventually an
error will occur and one or both operations will fail mysteriously.

There are multiple scenarios like the above where undefined behavior results.
This specification does not attempt to enumerate all of them, because the
goal is to describe a mechanism for fixing these kinds of issues rather than
to explicitly fix all such issues. Generally speaking, race conditions should
be treated as bugs, but until now Manila has lacked the tools to fix these
bugs reliably.

Use cases
=========

Specific cases:

* Two snapshot operations should not be able to occur at the same time. One
  must complete before the second can begin. This ensures that snapshots
  occur in a known order, and prevents the useless situation of having two
  identical snapshots.
* Taking a snapshot of a share should prevent a delete of that share.
* Valid changes to access rules should always be accepted, regardless of the
  state of the existing access rules. Although rules are applied to the
  backend asynchronously, it is valid to add multiple rules faster than the
  system can apply them and expect Manila to catch up.

General cases:

* These guarantees must be enforceable using a database running in a
  clustered configuration such as Galera. This rules out obvious solutions
  such as relying on DB row-level locking.
* These guarantees must be enforceable while running multiple copies of
  Manila services, including multiple API, scheduler, and share manager
  services. This rules out obvious solutions like in-process locks.
* These guarantees must be enforceable while running in a distributed
  configuration where cooperating services are on different nodes (physical,
  VMs, or containers). This rules out our existing approach of using file
  locks. Even though network-based file locking solutions exist, they
  represent a single point of failure and are unacceptable in properly
  distributed environments.
* The Manila services should be able to automatically and gracefully recover
  from crashes and other unplanned downtime. This means that implicit state
  should be avoided, and because long-held locks are implicit state, they
  should be avoided in favor of briefly held locks with explicit state.

Proposed change
===============

More transitional states will be added so that operations which can conflict
with other operations can be detected explicitly by looking at the share
state. This spec proposes only one specific new state, to address the races
involving snapshots, but more generally it provides a framework for resolving
similar races as they are discovered.

Transitions between states will always be done while holding a distributed
lock, that is, a lock implemented by a distributed lock manager (DLM). Using
distributed locks ensures that all services see the same locking state even
if services run on different nodes, and even across transient failures such
as node failures and network partitions. Each lock will be held only for the
duration of the database test-and-set operation, to minimize lock contention.

No locks will be held during calls from the share manager to the share
driver; mutual exclusion between driver calls will be achieved with state
checks. No locks will be held during RPC calls or casts, either.
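
The pattern looks roughly like the following. This is a minimal sketch,
assuming a tooz coordinator is already configured; the function, state, and
``db_api`` names are illustrative, not Manila's actual code:

.. code-block:: python

    from tooz import coordination

    # Backend URL and member id are deployment-specific; etcd3 is just
    # one of the tooz backends that could serve as the DLM.
    coordinator = coordination.get_coordinator(
        'etcd3://localhost:2379', b'manila-api-host-1')
    coordinator.start()

    def begin_snapshot(share_id, db_api):
        """Move a share from 'available' to 'snapshotting' atomically."""
        lock = coordinator.get_lock(('share-%s' % share_id).encode())
        # The lock is held only for the database test-and-set, never
        # across driver calls, RPC calls, or casts.
        with lock:
            share = db_api.share_get(share_id)
            if share['status'] != 'available':
                # Fail early with a helpful error instead of racing.
                raise RuntimeError('Share %s is busy (status=%s)' %
                                   (share_id, share['status']))
            db_api.share_update(share_id, {'status': 'snapshotting'})
        # Once 'snapshotting' is committed, the state itself excludes
        # conflicting operations, so the driver call needs no lock.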

Alternatives
------------

The approach used by Cinder, which relies on elaborate SQL calls to
compare-and-swap fields, was considered but rejected for the following
reasons:

* The code in Cinder cannot be shared with Manila, because it relies on OVO
  (Oslo Versioned Objects).
* Not enough people understand how it works, so it is likely to be hard to
  maintain.
* Cinder's compare-and-swap approach limits the kinds of state changes that
  can be made, because updating multiple tables atomically is impossible.
  Locks do not suffer from this restriction.
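
For context, the rejected compare-and-swap style looks roughly like this.
The sketch is illustrative only; the table and column names are hypothetical
and this is not Cinder's actual code:

.. code-block:: python

    from sqlalchemy import create_engine, text

    engine = create_engine('mysql+pymysql://user:secret@dbhost/manila')

    def begin_delete(share_id):
        """Claim a share for deletion with a single conditional UPDATE."""
        with engine.begin() as conn:
            result = conn.execute(
                text("UPDATE shares SET status = 'deleting' "
                     "WHERE id = :id AND status = 'available'"),
                {'id': share_id})
            # rowcount reveals whether the swap happened: if another
            # operation changed the status first, zero rows were updated.
            if result.rowcount == 0:
                raise RuntimeError('Share %s is not available' % share_id)

Note that such an UPDATE can only test and set columns within its own table,
which is the multi-table limitation described above.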

Data model impact
-----------------

New states will be added:

* Snapshotting
* States for access rules, covered in the `Access rules spec`_

.. graphviz::

   digraph share_states {
      label="Share States"

      // Transitional states
      creating[shape=hexagon];
      manage_starting[shape=hexagon];
      deleting[shape=hexagon];
      snapshotting[shape=hexagon, color=gold4, fontcolor=gold4];
      migrating[shape=hexagon];
      shrinking[shape=hexagon];
      extending[shape=hexagon];
      unmanage_starting[shape=hexagon];
      replication_change[shape=hexagon];

      // Error states
      error[color=red4, fontcolor=red4];
      shrinking_error[color=red4, fontcolor=red4];
      shrinking_possible_data_loss_error[color=red4, fontcolor=red4];
      extending_error[color=red4, fontcolor=red4];
      unmanage_error[color=red4, fontcolor=red4];
      manage_error[color=red4, fontcolor=red4];
      error_deleting[color=red4, fontcolor=red4];

      // Other states
      new[color=blue, fontcolor=blue];
      available[color=darkgreen, fontcolor=darkgreen];
      deleted[shape=box, color=navy, fontcolor=navy];
      unmanaged[shape=box, color=navy, fontcolor=navy];

      // User-requested transitions
      new -> creating[label="create"];
      new -> manage_starting[label="manage"];
      available -> deleting[label="delete"];
      available -> snapshotting[label="create snapshot", color=gold4, fontcolor=gold4];
      available -> migrating[label="migrate"];
      available -> shrinking[label="shrink"];
      available -> extending[label="extend"];
      available -> unmanage_starting[label="unmanage"];
      available -> replication_change[label="add replica"];

      // Automatic transitions
      creating -> available[label="success", color=darkgreen, fontcolor=darkgreen];
      deleting -> deleted[label="success", color=darkgreen, fontcolor=darkgreen];
      snapshotting -> available[label="success", color=darkgreen, fontcolor=darkgreen];
      manage_starting -> available[label="success", color=darkgreen, fontcolor=darkgreen];
      unmanage_starting -> unmanaged[label="success", color=darkgreen, fontcolor=darkgreen];
      extending -> available[label="success", color=darkgreen, fontcolor=darkgreen];
      shrinking -> available[label="success", color=darkgreen, fontcolor=darkgreen];
      replication_change -> available[label="success", color=darkgreen, fontcolor=darkgreen];

      // Reset transitions
      error -> available[label="reset"];
      shrinking_error -> available[label="reset"];
      extending_error -> available[label="reset"];
      unmanage_error -> available[label="reset"];
      manage_error -> available[label="reset"];
      error_deleting -> available[label="reset"];

      // Error transitions
      creating -> error[label="fail", color=red4, fontcolor=red4];
      migrating -> error[label="fail", color=red4, fontcolor=red4];
      shrinking -> shrinking_error[label="fail", color=red4, fontcolor=red4];
      shrinking -> shrinking_possible_data_loss_error[label="fail", color=red4, fontcolor=red4];
      extending -> extending_error[label="fail", color=red4, fontcolor=red4];
      unmanage_starting -> unmanage_error[label="fail", color=red4, fontcolor=red4];
      manage_starting -> manage_error[label="fail", color=red4, fontcolor=red4];
      snapshotting -> error[label="fail", color=red4, fontcolor=red4];
      deleting -> error_deleting[label="fail", color=red4, fontcolor=red4];
   }

REST API impact
---------------

New states will be visible through any API that shows states. New error
conditions will also become possible, because races will be detected earlier
and reported directly.

The behavioral changes related to locking will not be microversioned, since
it will be neither possible nor desirable to emulate the old behavior once
the changes are implemented. However, in cases where new states are added,
those changes will be microversioned so that clients which depend on the new
states can detect that the server supports them.

Driver impact
-------------

None

Security impact
---------------

None

Notifications impact
--------------------

None

Other end user impact
---------------------

None

Performance Impact
------------------

Distributed locking is expected to moderately slow down state changes.
Adding more state changes will also slow down the operations that require
them.

Other deployer impact
---------------------

Deployers will be required to deploy and configure a suitable `Tooz`_
backend. Since Manila will depend on tooz for correctness, tooz backends
that fail to meet the API contract will not be suitable.

Developer impact
----------------

The developer impact will be significant. Developers will need to follow the
new model for all features that involve state changes. Care will also be
needed with locks, to avoid deadlock situations. Holding locks for a very
limited time will help avoid deadlocks, but if two locks are ever held at the
same time, they must be deadlock-proofed by establishing a lock order.
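
A minimal sketch of that lock-ordering rule, reusing the hypothetical tooz
coordinator from the earlier example: acquiring locks in a globally
consistent order (here, sorted by name) ensures that two services can never
grab the same pair of locks in opposite orders.

.. code-block:: python

    import contextlib

    @contextlib.contextmanager
    def ordered_locks(coordinator, *names):
        """Acquire several tooz locks in sorted order, release in reverse."""
        locks = [coordinator.get_lock(name) for name in sorted(names)]
        with contextlib.ExitStack() as stack:
            for lock in locks:
                stack.enter_context(lock)
            yield

    # Every caller that needs both locks acquires them through the same
    # helper, so the order is always 'share-a' then 'share-b':
    # with ordered_locks(coordinator, b'share-b', b'share-a'):
    #     ...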

Implementation
==============

Assignee(s)
-----------

bswartz

Work Items
----------

* Add the snapshotting state
* Complete tooz integration
* Wrap state changes with tooz locks

Dependencies
============

`Tooz`_

Testing
=======

Existing tests will help ensure there are no regressions, but to detect race
conditions we need Rally tests or similarly high-concurrency tests.
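
As an illustration of the kind of test intended here, the following sketch
races a snapshot against a delete of the same share; ``client`` stands in
for a Manila API client and is hypothetical:

.. code-block:: python

    import concurrent.futures

    def race_snapshot_against_delete(client, share_id):
        """Fire a snapshot and a delete concurrently; neither may hang
        or leave the share in an undefined state."""
        with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
            futures = [pool.submit(client.create_snapshot, share_id),
                       pool.submit(client.delete_share, share_id)]
            outcomes = []
            for future in futures:
                try:
                    future.result()
                    outcomes.append('ok')
                except Exception:
                    # The losing operation should fail fast with a clear
                    # error code rather than failing mysteriously later.
                    outcomes.append('rejected')
        assert 'ok' in outcomes, 'at least one operation should succeed'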

Documentation Impact
====================

Admin guide - need to document the tooz requirements.

Developer reference - need to document the state machines and the locking
protocol.

References
==========

.. _`Access rules spec`: https://review.openstack.org/#/c/399049/
.. _`Tooz`: https://review.openstack.org/#/c/318336/