Document RBD retype fix

This is really a bug and not a feature, but I'm using this spec as a place to record the relevant information.

Change-Id: Ifa4249912691dafe21aed2031b472553358b9298

parent 25e2145f0b
commit 2f56d4b2a7

..
 This work is licensed under a Creative Commons Attribution 3.0 Unported
 License.

 http://creativecommons.org/licenses/by/3.0/legalcode

==========================================
RBD retype fix
==========================================

https://bugs.launchpad.net/cinder/+bug/2019190

Problem description
===================

RBD retype of an in-use volume leads to data loss/corruption.

Reference: https://bugs.launchpad.net/cinder/+bug/2019190

Cinder commit 5edc77a18 ("Driver assisted migration on retype when it's
safe", https://review.opendev.org/c/openstack/cinder/+/739548) enabled RBD
retype migration to use the optimized path, which it did not use
previously.

This revealed an underlying flaw in the Cinder and Nova migration code.
Cinder's _migrate_volume_generic() calls compute.nova_api.update_server_volume
to tell Nova that the location of a volume has changed once migration is
done. (This call results in a swap_volume in Nova.)

The optimized path makes no such call. The volume is migrated, but Nova
keeps using the volume's old location, so the instance appears to work
normally until it is rebooted or another operation re-attaches the volume.
At that point the volume is attached from its new location, but that copy
holds only the data present at migration time: everything written since
the migration went to the old location.

Use Cases
=========

What use cases does this address? What impact on actors does this change have?
Ensure you're clear about the actors in each use case: Developer, end user,
deployer, etc.

Proposed change
===============

In the short term, this can be mitigated by reverting
https://review.opendev.org/c/openstack/cinder/+/739548.

In the longer term, Cinder should be fixed to call update_server_volume()
in the optimized migration path as well, so that Nova is told about the
volume's new location.
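
A minimal sketch of that change, assuming a simplified model rather than
Cinder's real code: Volume, migrate_volume() and its three callbacks are
hypothetical names invented for illustration. driver_migrate returns
(moved, new_location), a simplification of the (moved, model_update) pair
that a Cinder volume driver's migrate_volume() returns, and notify_server is
a stand-in for update_server_volume(); this spec does not say exactly which
arguments the optimized path should pass to it.

.. code-block:: python

    from dataclasses import dataclass, field
    from typing import Callable, List, Optional, Tuple


    @dataclass
    class Volume:
        """Hypothetical stand-in for a Cinder volume (not cinder.objects.Volume)."""
        id: str
        location: str
        # IDs of the servers the volume is attached to; empty when "available".
        attached_servers: List[str] = field(default_factory=list)


    # Hypothetical callback types used only by this sketch.
    DriverMigrate = Callable[[Volume], Tuple[bool, Optional[str]]]
    GenericMigrate = Callable[[Volume], None]
    NotifyServer = Callable[[str, Volume, str], None]


    def migrate_volume(volume: Volume,
                       driver_migrate: DriverMigrate,
                       generic_migrate: GenericMigrate,
                       notify_server: NotifyServer) -> str:
        """Try the optimized (driver-assisted) path, else fall back to generic.

        driver_migrate returns (moved, new_location); notify_server stands in
        for update_server_volume() and is called once per attached server.
        """
        old_location = volume.location
        moved, new_location = driver_migrate(volume)
        if not moved:
            # Per the spec, the generic path already notifies Nova itself.
            generic_migrate(volume)
            return "generic"
        if new_location is not None:
            volume.location = new_location
        # The step the optimized path was missing: tell Nova about each
        # attachment so it stops using the old location.
        for server_id in volume.attached_servers:
            notify_server(server_id, volume, old_location)
        return "optimized"


    if __name__ == "__main__":
        calls = []
        vol = Volume("vol-1", "pool-a/vol-1", attached_servers=["server-1"])
        path = migrate_volume(
            vol,
            driver_migrate=lambda v: (True, "pool-b/vol-1"),
            generic_migrate=lambda v: None,
            notify_server=lambda server, v, old: calls.append((server, old, v.location)),
        )
        print(path, calls)  # optimized [('server-1', 'pool-a/vol-1', 'pool-b/vol-1')]

Injecting the driver and Nova calls as callbacks keeps the sketch free of any
real driver or client API. The point is the ordering: once the optimized path
has moved the volume, every attached server is notified, as already happens
on the generic path.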

Alternatives
------------

Data model impact
-----------------

REST API impact
---------------

Security impact
---------------

Active/Active HA impact
-----------------------

Notifications impact
--------------------

Other end user impact
---------------------

Performance Impact
------------------

Other deployer impact
---------------------

Developer impact
----------------

Implementation
==============

Assignee(s)
-----------

Primary assignee:
  eharney

Work Items
----------

Dependencies
============

Testing
=======

The test scenario from https://bugs.launchpad.net/cinder/+bug/2019190 is:

- Attach the volume to an instance
- Write data (A) to the volume
- Retype the volume so that it is migrated
- Write newer data (B) to the volume
- Reboot the instance
- Attach the volume to the instance
- Observe that data B is missing
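
The steps above can be mirrored against a toy model that shows why data B
disappears. FakeCloud and data_b_survives() are hypothetical, as are the pool
names; the model ignores how RBD and Nova actually copy data and tracks only
where Cinder records the volume and which location the instance writes to.

.. code-block:: python

    class FakeCloud:
        """Hypothetical toy deployment, not an OpenStack API.

        It keeps a list of written data per storage location, plus the
        location Cinder records and the location the instance is using.
        """

        def __init__(self, migration_notifies_nova):
            self.migration_notifies_nova = migration_notifies_nova
            self.storage = {"old-pool": []}    # location -> data written there
            self.cinder_location = "old-pool"  # where Cinder says the volume is
            self.instance_location = None      # where the instance writes

        def attach_volume(self):
            # Attaching uses whatever location Cinder currently reports.
            self.instance_location = self.cinder_location

        def write(self, data):
            self.storage[self.instance_location].append(data)

        def retype_migrate(self):
            # Copy the data to the new location and update Cinder's record.
            self.storage["new-pool"] = list(self.storage[self.cinder_location])
            self.cinder_location = "new-pool"
            if self.migration_notifies_nova:
                # Models update_server_volume() / swap_volume switching the
                # instance over to the new location.
                self.instance_location = "new-pool"

        def reboot_instance(self):
            # The instance drops its connection; the next attach re-resolves it.
            self.instance_location = None

        def read(self):
            return list(self.storage[self.instance_location])


    def data_b_survives(migration_notifies_nova):
        """Run the spec's test steps and report whether data B is present."""
        cloud = FakeCloud(migration_notifies_nova)
        cloud.attach_volume()    # Attach the volume to an instance
        cloud.write("A")         # Write data (A) to the volume
        cloud.retype_migrate()   # Retype the volume so that it is migrated
        cloud.write("B")         # Write newer data (B) to the volume
        cloud.reboot_instance()  # Reboot the instance
        cloud.attach_volume()    # Attach the volume to the instance
        return "B" in cloud.read()


    if __name__ == "__main__":
        print(data_b_survives(migration_notifies_nova=False))  # False
        print(data_b_survives(migration_notifies_nova=True))   # True

With migration_notifies_nova=False the model follows the flawed optimized
path, and data B ends up only in the old location. With True, where the
migration switches the instance over as update_server_volume() is meant to,
data B survives the re-attach.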

Documentation Impact
====================

References
==========