Document RBD retype fix

This is really a bug and not a feature, but
I'm using this spec as a place to record the
relevant information.

Change-Id: Ifa4249912691dafe21aed2031b472553358b9298
Eric Harney 2023-10-20 13:35:09 -04:00
parent 25e2145f0b
commit 2f56d4b2a7
1 changed files with 136 additions and 0 deletions


@@ -0,0 +1,136 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
==========================================
RBD retype fix
==========================================
https://bugs.launchpad.net/cinder/+bug/2019190
Problem description
===================
RBD retype of an in-use volume leads to data loss/corruption.
Reference: https://bugs.launchpad.net/cinder/+bug/2019190
Cinder commit 5edc77a18, "Driver assisted migration on retype when it's safe"
(https://review.opendev.org/c/openstack/cinder/+/739548), made RBD retype
migration use the optimized path, which it had not used previously.
This revealed an underlying flaw in the Cinder and Nova migration code.
Cinder's _migrate_volume_generic() calls
compute.nova_api.update_server_volume() to tell Nova that the location of
a volume has changed when migration is done. (This call results in a
swap_volume in Nova.)
The optimized path never makes this call, so the volume is migrated but
Nova continues to use the old location of the volume. The instance appears
to work fine until it is rebooted or another operation re-attaches the
volume. At that point the volume in the new location is attached, but it
holds stale data, because everything written since the migration went to
the old volume in the old location.
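The failure mode described above can be illustrated with a small toy model.
This is only a sketch and does not use any Cinder or Nova code: the
``Backend`` and ``Instance`` classes, the ``optimized_migrate()`` helper, and
the location names are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Backend:
    """Toy storage backend: maps a location name to the data written there."""
    images: dict = field(default_factory=dict)

    def write(self, location, data):
        self.images.setdefault(location, []).append(data)

    def read(self, location):
        return list(self.images.get(location, []))

    def copy(self, src, dst):
        self.images[dst] = list(self.images.get(src, []))


@dataclass
class Instance:
    """Toy instance that remembers the volume location from attach time."""
    backend: Backend
    attached_location: str = ""

    def attach(self, location):
        self.attached_location = location

    def write(self, data):
        self.backend.write(self.attached_location, data)

    def read(self):
        return self.backend.read(self.attached_location)


def optimized_migrate(backend, volume, new_location):
    """Copy the data and update the volume record, but never notify compute."""
    backend.copy(volume["location"], new_location)
    volume["location"] = new_location


def run_scenario():
    backend = Backend()
    volume = {"location": "old-pool/vol"}  # hypothetical location names
    instance = Instance(backend)

    instance.attach(volume["location"])
    instance.write("A")
    optimized_migrate(backend, volume, "new-pool/vol")
    instance.write("B")                  # still written to old-pool/vol
    instance.attach(volume["location"])  # reboot/re-attach uses the new location
    return instance.read()
```

In this model ``run_scenario()`` returns only the data written before the
migration, which mirrors the stale-data symptom reported in the bug.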
Use Cases
=========
What use cases does this address? What impact on actors does this change have?
Ensure you're clear about the actors in each use case: Developer, end user,
deployer, etc.
Proposed change
===============
In the short term, this can be mitigated by reverting
https://review.opendev.org/c/openstack/cinder/+/739548.
In the longer term, Cinder should be fixed so that the optimized migration
path also calls update_server_volume().
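The shape of the longer-term fix might look roughly like the sketch below.
It is a toy model under stated assumptions, not the actual Cinder change:
``ToyCompute``, its ``swap_volume()`` method, the ``optimized_migrate()``
helper, and the dictionary-based volume record are hypothetical, and
``swap_volume()`` merely stands in for the effect that
update_server_volume() has on the Nova side.

```python
class ToyCompute:
    """Stand-in for the compute side; tracks the location the instance uses."""

    def __init__(self, location):
        self.location = location
        self.swap_calls = []

    def swap_volume(self, old_location, new_location):
        # Stand-in for the effect of update_server_volume(): the instance
        # is switched from the old location to the new one.
        self.swap_calls.append((old_location, new_location))
        self.location = new_location


def optimized_migrate(volume, new_location, compute=None):
    """Toy optimized migration that also notifies compute for in-use volumes."""
    old_location = volume["location"]
    # The backend-specific data copy would happen here.
    volume["location"] = new_location
    if volume["status"] == "in-use" and compute is not None:
        compute.swap_volume(old_location, new_location)


def demo():
    volume = {"location": "old-pool/vol", "status": "in-use"}
    compute = ToyCompute(volume["location"])
    optimized_migrate(volume, "new-pool/vol", compute)
    return compute.location, compute.swap_calls
```

After ``demo()`` runs, the toy compute side points at the new location, so
later writes and any re-attach would use the same copy of the data. How the
real optimized path should obtain the arguments for update_server_volume()
is left open here.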
Alternatives
------------
Data model impact
-----------------
REST API impact
---------------
Security impact
---------------
Active/Active HA impact
-----------------------
Notifications impact
--------------------
Other end user impact
---------------------
Performance Impact
------------------
Other deployer impact
---------------------
Developer impact
----------------
Implementation
==============
Assignee(s)
-----------
Primary assignee:
eharney
Work Items
----------
Dependencies
============
Testing
=======
Test scenario, as described in https://bugs.launchpad.net/cinder/+bug/2019190:
- Attach the volume to an instance
- Write data (A) to the volume
- Retype the volume so that it is migrated
- Write newer data (B) to the volume
- Reboot the instance
- Attach the volume to the instance
- Observe that data B is missing
Documentation Impact
====================
References
==========