When rebuilding a volume-backed instance, we preserve sparseness
while copying the new image to the existing volume.
This is problematic since we don't write the zero blocks of
the new image, so data from the old image can persist,
leading to a data leak scenario.
To prevent this, we are using the `-S 0`[1][2] option with the
`qemu-img convert` command to write all the zero bytes into the volume.
In the testing done, this doesn't seem to be a problem with known 'raw'
images, but it is good to handle the case anyway.
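Conceptually, the command built for the copy looks like the following (a minimal sketch; the helper name and paths are illustrative, not the actual Nova code):

```python
def build_convert_command(src, dest, out_format="raw", sparse_size="0"):
    """Build a qemu-img convert command that writes zero blocks.

    Passing -S 0 disables sparse detection, so zero areas of the new
    image overwrite any stale data left on the existing volume.
    """
    cmd = ["qemu-img", "convert"]
    if sparse_size is not None:
        cmd += ["-S", sparse_size]      # "0" => do not skip zero blocks
    cmd += ["-O", out_format, src, dest]
    return cmd
```

Dropping the `-S 0` pair restores qemu-img's default behavior of skipping runs of zero bytes, which is what preserved sparseness (and the old data) before this change.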
The following testing was performed with three images:
1. CIRROS QCOW2 to RAW
======================
Volume size: 1 GiB
Image size (raw): 112 MiB
CREATE VOLUME FROM IMAGE (without -S 0)
LVS (10.94% allocated)
volume-91ea43ef-684c-402f-896e-63e45e5f4fff stack-volumes-lvmdriver-1 Vwi-a-tz-- 1.00g stack-volumes-lvmdriver-1-pool 10.94
REBUILD (with -S 0)
LVS (10.94% allocated)
volume-91ea43ef-684c-402f-896e-63e45e5f4fff stack-volumes-lvmdriver-1 Vwi-aotz-- 1.00g stack-volumes-lvmdriver-1-pool 10.94
Conclusion:
Same space is consumed on the disk with and without preserving sparseness.
2. DEBIAN QCOW2 to RAW
======================
Volume size: 3 GiB
Image size (raw): 2 GiB
CREATE VOLUME FROM IMAGE (without -S 0)
LVS (66.67% allocated)
volume-edc42b6a-df5d-420e-85d3-b3e52bcb735e stack-volumes-lvmdriver-1 Vwi-a-tz-- 3.00g stack-volumes-lvmdriver-1-pool 66.67
REBUILD (with -S 0)
LVS (66.67% allocated)
volume-edc42b6a-df5d-420e-85d3-b3e52bcb735e stack-volumes-lvmdriver-1 Vwi-aotz-- 3.00g stack-volumes-lvmdriver-1-pool 66.67
Conclusion:
Same space is consumed on the disk with and without preserving sparseness.
3. FEDORA QCOW2 TO RAW
======================
CREATE VOLUME FROM IMAGE (without -S 0)
Volume size: 6 GiB
Image size (raw): 5 GiB
LVS (83.33% allocated)
volume-efa1a227-a30d-4385-867a-db22a3e80ad7 stack-volumes-lvmdriver-1 Vwi-a-tz-- 6.00g stack-volumes-lvmdriver-1-pool 83.33
REBUILD (with -S 0)
LVS (83.33% allocated)
volume-efa1a227-a30d-4385-867a-db22a3e80ad7 stack-volumes-lvmdriver-1 Vwi-aotz-- 6.00g stack-volumes-lvmdriver-1-pool 83.33
Conclusion:
Same space is consumed on the disk with and without preserving sparseness.
Another test was performed to check whether the `-S 0` option
actually works in an OpenStack setup.
Note that we are converting a qcow2 image to a qcow2 image, which
won't happen in a real-world deployment and is done only for test
purposes.
DEBIAN QCOW2 TO QCOW2
=====================
CREATE VOLUME FROM IMAGE (without -S 0)
LVS (52.61% allocated)
volume-de581f84-e722-4f4a-94fb-10f767069f50 stack-volumes-lvmdriver-1 Vwi-a-tz-- 3.00g stack-volumes-lvmdriver-1-pool 52.61
REBUILD (with -S 0)
LVS (66.68% allocated)
volume-de581f84-e722-4f4a-94fb-10f767069f50 stack-volumes-lvmdriver-1 Vwi-aotz-- 3.00g stack-volumes-lvmdriver-1-pool 66.68
Conclusion:
We can see that the space allocation increased, hence we are not
preserving sparseness when using the -S 0 option.
[1] https://qemu-project.gitlab.io/qemu/tools/qemu-img.html#cmdoption-qemu-img-common-opts-S
[2] abf635ddfe/qemu-img.c (L182-L186)
Closes-Bug: #2045431
Change-Id: I5be7eaba68a5b8e1c43f0d95486b5c79c14e1b95
We decided that H301 makes no sense for the "typing" module,
so we just set that in tox.ini instead of marking it every
time the module is used.
Change-Id: Id983fb0a9feef2311bf4b2e6fd70386ab60e974a
This patch allows delete_volume and delete_snapshot calls
to fail less often when using RBD volume clones and snapshots.
RBD clone v2 support allows remove() to pass in situations
where it would previously fail, but it still fails with an
ImageBusy error in some situations. For example:
volume1
-> snapshot s1 of volume 1
-> volume2 cloned from snapshot 1
Deleting snapshot s1 would fail with ImageBusy.
This is fixed by using RBD flatten operations to break
dependencies between volumes/snapshots and other RBD volumes
or snapshots.
Delete now works as follows:
1. Attempt RBD remove()
This is the "fast path" for removing a simple volume
that involves no extra overhead.
2. If busy and the volume has child dependencies,
flatten those dependencies with RBD flatten()
3. Attempt RBD remove() again
This will succeed in more cases than (1) would have.
4. If remove() failed, use trash_move() to move
the image to the trash.
This allows Cinder deletion of a volume (volume1) to proceed
in the scenario where volume2 was cloned from snapshot s1 of
volume1, and snapshot s1 has been trashed and not fully
deleted from the RBD backend. (Snapshots in the trash
namespace are no longer visible but are still in the
dependency chain.)
This allows Cinder deletions to succeed in most scenarios where
they would previously fail.
In cases where a .clone_snap snapshot is present, we still do a
rename to .deleted instead of deleting/trashing the volume. This
should be worked on further in a follow-up as it is likely not
necessary most of the time.
A new configuration option, rbd_concurrent_flatten_operations, was
introduced to limit how many flatten calls can be made at the same time.
This is to prevent overloading the backend. The default value is 3.
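The four-step delete flow described above can be sketched roughly like this (the `rbd_api` object and its method names are illustrative stand-ins for the rbd python bindings, not Cinder's actual code):

```python
import threading

class ImageBusy(Exception):
    """Stand-in for rbd.ImageBusy raised by librbd."""

# Limit concurrent flattens to avoid overloading the backend
# (mirrors the new rbd_concurrent_flatten_operations option, default 3).
_flatten_sem = threading.BoundedSemaphore(3)

def delete_rbd_volume(rbd_api, pool, name):
    # 1. Fast path: plain remove() for a simple volume, no overhead.
    try:
        rbd_api.remove(pool, name)
        return
    except ImageBusy:
        pass
    # 2. Flatten child dependencies so remove() can succeed.
    for child in rbd_api.list_children(pool, name):
        with _flatten_sem:
            rbd_api.flatten(pool, child)
    # 3. Retry remove(); 4. fall back to moving the image to the trash.
    try:
        rbd_api.remove(pool, name)
    except ImageBusy:
        rbd_api.trash_move(pool, name)
```

The semaphore is shared across delete calls, so at most three flatten operations hit the Ceph cluster at once regardless of how many deletes are in flight.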
Co-Authored-By: Eric Harney <eharney@redhat.com>
Co-Authored-By: Sofia Enriquez <lsofia.enriquez@gmail.com>
Closes-Bug: #1969643
Change-Id: I009d0748fdc829ca0b4f99bc9b70dadd19717d04
librbd raises an error when update_features is called with
features = 0 -- when this situation would occur,
skip calling update_features.
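The guard amounts to something like this (a minimal sketch; `image` is assumed to expose the `rbd.Image.update_features(features, enabled)` signature):

```python
def maybe_update_features(image, features, enable):
    """Skip the librbd call entirely when there is nothing to change.

    librbd errors out if update_features() is invoked with
    features == 0, so guard the call instead of making it.
    """
    if not features:
        return False    # nothing to update; avoid the librbd error
    image.update_features(features, enable)
    return True
```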
Closes-Bug: #1997980
Change-Id: Iab6a990ce7dee2c13deb4f46aeec0f46ffb7cd62
Currently the `rbd_secret_uuid` configuration option has no default
value, but at the same time the RBD driver doesn't complain if it's
missing, and will only fail on the Nova side if there is no
`rbd_secret_uuid` configured in the `[LIBVIRT]` section of `nova.conf`.
Using Cinder's `rbd_secret_uuid` has been the preferred way since the
Ocata release.
Most deployments set the `rbd_secret_uuid` value in libvirt's secret to
the Ceph cluster FSID value, and that's what this patch does when it is
not defined.
It doesn't change how replication locations are configured, and those
will still require the secret uuid to be defined.
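The fallback behaves roughly as follows (a sketch; `get_fsid` stands in for the driver's call that asks the Ceph cluster for its fsid, and the names are illustrative):

```python
def effective_secret_uuid(configured_uuid, get_fsid):
    """Return the libvirt secret uuid to report for attachments.

    Falls back to the Ceph cluster FSID when rbd_secret_uuid is not
    set, matching the common deployment convention of naming the
    libvirt secret after the cluster FSID.
    """
    if configured_uuid:
        return configured_uuid
    return get_fsid()
```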
Change-Id: I739ae6ae5b4d9b074d610f6a70371b294a3e70f8
Currently snapshot delete requires access to the source volume and
the operation fails if the source volume doesn't exist in the backend.
This prevents some snapshots from being deleted when the source volume
image is deleted from the backend for some reason (for example, after
cluster format).
This change makes the rbd driver skip updating the source volume
if it doesn't exist. A warning is logged so that operators can be
aware of any skip event.
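The skip logic can be sketched like this (`get_image` and `ImageNotFound` are illustrative stand-ins for opening the backing RBD image and for `rbd.ImageNotFound`):

```python
class ImageNotFound(Exception):
    """Stand-in for rbd.ImageNotFound."""

def delete_snapshot(get_image, volume_name, snap_name, warnings):
    """Delete a snapshot, tolerating a missing source volume image."""
    try:
        image = get_image(volume_name)
    except ImageNotFound:
        # Source volume is gone (e.g. after a cluster format); warn
        # and skip the source-volume update instead of failing.
        warnings.append(
            f"source volume {volume_name} not found, skipping its "
            f"update while deleting snapshot {snap_name}")
        return
    image.remove_snap(snap_name)
```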
Closes-Bug: #1957073
Change-Id: Icd9dad9ad7b3ad71b3962b078e5b94670ac41c87
QoS support for the Ceph Cinder driver
- Supports injecting QoS metadata into Ceph when creating a volume
- Supports updating QoS parameters when a retype operation is performed
Note(s):
1) The version history added to cinder/volume/drivers/rbd.py is incomplete due to a lack of prior knowledge regarding the driver versioning.
Signed-off-by: Danny Webb <danny.webb@thehutgroup.com>
Co-Authored-By: Sergey Drozdov <sergey.drozdov.dev@gmail.com, sergey.drozdov93@thehutgroup.com>
Implements: rbd-backend-qos
Blueprint: https://blueprints.launchpad.net/cinder/+spec/rbd-backend-qos
Change-Id: I25862085074d15e6cebb7f69c258fa9bcafe6d59
These are no longer needed.
This is achieved by assuming that fields like
Volume.name, Snapshot.name, and Snapshot.volume_name
are already Unicode strings in our objects.
Also removes some Python 2 compat cruft.
Change-Id: I018f97a4aace1f536ab816e866b67ce23576609c
This works in Python 3.7 or greater and is
cleaner looking.
See PEP-585 for more info.
https://peps.python.org/pep-0585/
Change-Id: I4c9da881cea1a3638da504c4b79ca8db13851b06
There might be a case where the exception raised by `op_features`
is something other than AttributeError. When any exception other than
AttributeError is raised, we should log it and avoid raising an
exception.
Closes-Bug: #1942210
Co-Authored-By: Gorka Eguileor <geguileo@redhat.com>
Change-Id: I513abe980b73d7e7b1a3cd9c7ff89490f7fd6b08
When using the LVM cinder driver the cacheable capability is not being
reported by the backend to the scheduler when the transport protocol is
NVMe-oF (nvmet target driver), but it is properly reported if it's the
LIO target driver.
This also happens with other drivers that should be reporting that they
are cacheable.
This happens because even if the volume manager correctly uses the
"storage_protocol" reported by the drivers on their stats to add the
"cacheable" capability for iSCSI, FC, and NVMe-oF protocols, it isn't
taking into account all the variants these have:
- FC, fc, fibre_channel
- iSCSI, iscsi
- NVMe-oF, nvmeof, NVMeOF
The same thing happens for the shared_targets of the volumes,
where the check is missing an iSCSI variant.
This patch creates constants for the different storage protocols to try
to avoid these variants (as agreed on the PTG) and also makes the
cacheable and shared_targets checks run against all the existing variants.
This change facilitates identifying NVMe-oF drivers (for bug 1961102)
for the shared_targets part.
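The variant handling described above can be sketched like this (constant and set names here are illustrative, not necessarily the ones the patch adds):

```python
# Canonical protocol constants.
ISCSI = "iSCSI"
FC = "FC"
NVMEOF = "NVMe-oF"

# All spellings seen in driver-reported stats.
ISCSI_VARIANTS = {"iSCSI", "iscsi"}
FC_VARIANTS = {"FC", "fc", "fibre_channel"}
NVMEOF_VARIANTS = {"NVMe-oF", "nvmeof", "NVMeOF"}

# Protocols whose volumes get the "cacheable" capability.
CACHEABLE_PROTOCOLS = ISCSI_VARIANTS | FC_VARIANTS | NVMEOF_VARIANTS

def is_cacheable(storage_protocol):
    """Check the reported protocol against every known variant."""
    return storage_protocol in CACHEABLE_PROTOCOLS
```

Checking against the full variant sets is what fixes the nvmet case: "NVMe-oF" alone would not have matched a driver reporting "nvmeof".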
Closes-Bug: #1969366
Related-Bug: #1961102
Change-Id: I1333b0471974e94eb2b3b79ea70a06e0afe28cd9
Convert methods to static methods where possible.
This helps with code maintenance by delineating which
methods do not depend on the driver state.
Change-Id: I2525be56926400beee520b4a2abef14060372af0
Ceph has changed the meaning of the ``bytes_used`` column in the pools
reported by the ``df`` command, which means that in some deployments the
rbd driver is not reporting the expected information to the schedulers.
The information we should use for the calculations is returned in
the ``stored`` field on those systems.
This patch uses ``stored`` when present and falls back to ``bytes_used``
if not.
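The fallback is essentially a dictionary lookup over the parsed ``ceph df`` pool stats (a sketch; the function name is illustrative):

```python
def pool_usage_bytes(pool_stats):
    """Return used bytes for capacity calculations.

    Newer Ceph releases report the data actually stored in the
    ``stored`` field of ``ceph df``; older ones only provide
    ``bytes_used``. Prefer ``stored`` and fall back when absent.
    """
    return pool_stats.get("stored", pool_stats.get("bytes_used", 0))
```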
Closes-Bug: #1960206
Change-Id: I0ca25789a0b279d82f766091235f24f429405da6
This is cleaner if we catch OSError specifically, and
also alleviates the need for pylint/type checking skips.
Change-Id: Ief59eea8ccdd8d263e262ba04d209829321ac6d1
In cases where we don't need to modify the image,
open rbd images in read-only mode.
Closes-Bug: #1947518
Change-Id: I8287460b902dd525aa5313861142f5fb8490e60a
Currently RBD doesn't allow deleting volumes with snapshots or volume
dependencies. This causes Cinder API errors on delete calls that should
succeed.
When using the RBD v2 clone api, deleting a volume that has a snapshot
in the trash space raises a busy exception.
In order to solve this, this patch removes the proactive VolumeIsBusy
exception raise and calls the trash operation which should succeed when
the volume has dependencies.
In addition to this code change, it's important to enable the Ceph trash
auto purge; otherwise Ceph may end up with a couple of images in the
trash namespace for a while. However, this approach is the lesser of two
evils, because the user will be able to delete volumes with dependencies
while the operator can check the trash namespace and manually purge
the images. It is definitely better to potentially trouble one person
(the operator) who didn't read the release notes once than to trouble
every single user.
Closes-Bug: #1941815
Co-Authored-By: Eric Harney <eharney@redhat.com>
Change-Id: I5dbbcca780017b358600016afca8a9424aa137fd
There are instances where cinder needs to create a temporary volume and
this can trigger a flatten of the new temporary volume, which will make
the operation take a lot longer.
In some cases this means slower operations, but in others it leads to
rpc timeout failures.
A case where we see timeout failures is when doing a backup of a
snapshot and we have rbd_flatten_volume_from_snapshot=true.
This patch ensures that we don't flatten temporary volumes.
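The guard can be sketched like this (a minimal sketch; the `temporary` flag and the callable parameters are illustrative stand-ins for the driver's clone/flatten logic):

```python
def create_volume_from_snapshot(clone, flatten, volume,
                                flatten_from_snapshot=True):
    """Clone a snapshot, flattening only for non-temporary volumes.

    `volume` is assumed to carry a `temporary` flag, as Cinder marks
    its internal temporary volumes; `flatten_from_snapshot` mirrors
    the rbd_flatten_volume_from_snapshot option.
    """
    clone(volume)
    if flatten_from_snapshot and not volume.get("temporary", False):
        flatten(volume)   # potentially very slow; skipped for temps
```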
Closes-Bug: #1916843
Change-Id: I8f55c3beb2f8df5b2227506f82ddf6ee57c951ae
The recent release of Ceph Pacific saw a change to the clone() logic
where invalid values of stripe unit would cause an error to be returned
where previous versions would correct the value at runtime. This
becomes a problem when creating a volume from an image, where the source
RBD image may have a larger stripe unit than cinder's RBD driver is
configured for. When this happens, clone() is called with a stripe unit
that is too small given that of the source image and the clone fails.
The RBD driver in Cinder has a configuration parameter
'rbd_store_chunk_size' that stores the preferred object size for cloned
images. If clone() is called without a stripe_unit passed in, the
stripe unit defaults to the object size, which is 4MB by default. The
issue arises when creating a volume from a Glance image, where Glance is
creating images with a default stripe unit of 8MB (distinctly larger
than that of Cinder). If we do not consider the incoming stripe unit
and select the larger of the two, Ceph cannot clone an RBD image with a
smaller stripe unit and raises an error.
This patch adds a function in our driver's clone logic to select the
larger of the two stripe unit values so that the appropriate stripe unit
is chosen.
It should also be noted that we're determining the correct stripe unit
but using the 'order' argument to clone(). Ceph will set the stripe
unit equal to the object size (order) by default, and we rely on this
behaviour for the following reason: passing stripe-unit alone or with
an order argument causes an invalid argument exception to be raised in
pre-Pacific releases of Ceph, as its argument parsing appears to have
limitations.
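Assuming both values are powers of two (as RBD object sizes are), the selection can be sketched as follows (function and parameter names are illustrative, not the driver's actual code):

```python
import math

def clone_order(src_stripe_unit, rbd_store_chunk_size_mb):
    """Pick the order argument for clone().

    Use the larger of the source image's stripe unit and the
    configured rbd_store_chunk_size so Ceph never has to clone with
    a stripe unit smaller than the parent's. Passing only order lets
    Ceph set stripe unit == object size (2**order), which keeps
    pre-Pacific releases happy.
    """
    configured = rbd_store_chunk_size_mb * 1024 * 1024
    stripe_unit = max(src_stripe_unit, configured)
    return int(math.log2(stripe_unit))
```

With Glance's 8 MiB stripe unit and Cinder's default 4 MiB chunk size, this yields an order of 23 (2**23 bytes = 8 MiB), so the clone matches the larger parent value.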
Closes-Bug: #1931004
Change-Id: Iec111ab83e9ed8182c9679c911e3d90927d5a7c3
Usually the source volume would be the same size or smaller
than the destination volume and they must share the same
volume-type. In particular, when the destination volume is
same size as the source volume, creating an encrypted volume
from a snapshot of an encrypted volume truncates the data in
the new volume.
In order to fix this the RBD workflow would be something
like this:
A source luks volume would be 1026M; we write some data
and create a snap from it. We would like to create a new luks
volume from the snapshot, so the create_volume_from_snapshot()
method performs an RBD clone first and then a resize if needed.
In addition the _clone() method creates a clone
(copy-on-write child) of the parent snapshot. Object size
will be identical to that of the parent image unless specified
(we don't in cinder) so size will be the same as the parent
snapshot.
If the desired size of the destination luks volume is 1G,
create_volume_from_snapshot() won't perform any resize and the
volume will be 1026M like the parent. This solves bug #1922408
because we don't force a resize, and because of that we no longer
truncate the data.
The second scenario is when we would like to increase
the size of the destination volume. As far as I can tell, this
won't face the encryption header problem, but we still need to
calculate the size difference to provide the size that the
user is expecting.
That's why the proposed fix calculates the new_size based on:
size difference = desired size - size of source volume
new size = current size + size difference
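The calculation above can be sketched as follows (sizes in MiB; the function name and the None-means-no-resize convention are illustrative):

```python
def resize_target(desired_size, src_volume_size, current_clone_size):
    """Compute the size to resize a cloned encrypted volume to.

    size difference = desired size - size of the source volume
    new size        = current clone size + size difference

    Returns None when no resize is needed, avoiding the truncation
    that resizing down to the nominal size would cause.
    """
    diff = desired_size - src_volume_size
    if diff <= 0:
        return None    # keep the clone's size (e.g. 1026M for a 1G volume)
    return current_clone_size + diff
```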
Closes-Bug: #1922408
Co-Authored-By: Sofia Enriquez <lsofia.enriquez@gmail.com>
Change-Id: I220b5e3b01d115262a8b1dd45758f0531aea0edf
Starting with the Mimic release and the Ceph RBD clone v2 feature,
it is no longer necessary to manage snapshot protection status.
The usage of clone v2 is controlled via the Ceph "rbd default
clone format" configuration option, which defaults to "auto" but
can be overridden to "1" to force the legacy clone v1 behavior or
"2" to force the new clone v2 behavior.
This must be configured in Ceph, it's not something that can be
configured on the Cinder side.
As an alternative, clone v2 can be enabled by setting the minimum
compatible client to mimic:
"$ ceph osd set-require-min-compat-client mimic"
With Ceph RBD clone v2 support enabled, image snapshots
can be cloned without marking the snapshot as protected, because
RBD automatically tracks the usage of the snapshot by clones. As
a result, the Ceph RBD clone v2 API improves performance.
This proposed patch adds logging messages for the operator
indicating whether Ceph has been configured to use the Ceph RBD
v2 clone feature.
In this way the operator can consult with the Ceph administrator
to take appropriate action.
You can determine which volumes are using the clone v2 feature
via volume.op_features(), which returns a value depending
on whether the v2 clone API is enabled or not.
Co-authored-by: Eric Harney <eharney@redhat.com>
Change-Id: Ib52879a270a4ae4cdd3cb5fc18f2b7bdbccd8ab5
The driver requires the new rbd-iscsi-client package, which is used
to talk to the rbd-target-api on the ceph iscsi gateway node.
The rbd-target-api is a python script meant to keep ceph iscsi gateway
nodes in sync with each other, but the API also works for creating
iscsi targets.
This is a new driver that makes heavy use of the ceph-iscsi project's
rbd-target-api python REST client here:
https://github.com/ceph/ceph-iscsi
The driver is a derivation of the rbd driver, and the intention is to
reuse as much of the base rbd driver as possible and only do the
iSCSI-specific code here.
Change-Id: Iff0e4d1137851c8f0b8ec25632d1186c2859b2fc
In Cinder we always try to have sane defaults, but the current RBD
default for rbd_exclusive_cinder_pools may lead to issues on deployments
with a large number of volumes:
- Cinder taking a long time to start.
- Cinder becoming non-responsive.
- Cinder stats gathering taking longer than the gathering period.
This is caused by the driver making an independent request to get
detailed information on each image to accurately calculate the space
used by the Cinder volumes.
With this patch we change the default to make sure that these issues
don't happen in the most common deployment case (the exclusive Cinder
pool).
Related-Bug: #1704106
Change-Id: I839441a71238cdad540ba8d9d4d18b1f0fa3ee9d
Cinder can fail to create an image-based volume if RBD mirroring
is enabled. With the journaling-based approach to RBD mirroring,
ceph will still create a snapshot as a result of volume creation.
The volume create in _create_from_image_download() results in
a snapshot getting created, leading to a race where delete_volume()
gets a VolumeIsBusy exception.
Change-Id: Ib80e04512ec34a390e9e17af2f3544e18cad8598
Closes-Bug: #1900775
Removed a sanity check in the code that raised an exception
if the clone depth of a volume to be cloned exceeded the
rbd_max_clone_depth config value. A consequence of this
check was that if an operator lowered the value, volumes
whose clone depth was greater than the new value (as would
be allowed by the previous, higher setting) could no longer
be cloned.
Change-Id: I8c445058a25c2eca2fda91bdeb6befedae34ccf2
Closes-bug: #1901241
The current implementation of create_cloned_volume calls flatten
directly, and this blocks the whole cinder-volume thread during the
flatten call. This causes heartbeat timeouts in RabbitMQ when cloning
a volume with the rbd backend.
This patch makes sure that flatten is executed in a different thread,
to allow the heartbeat thread to run while flattening an rbd image.
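The idea can be illustrated with a plain native-thread executor (Cinder itself runs green threads and would hand the blocking librbd call to eventlet's tpool, but the principle is the same; names here are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

# A dedicated native thread for blocking librbd operations, so the
# service's other threads (including the RabbitMQ heartbeat) keep
# running while a long flatten is in progress.
_executor = ThreadPoolExecutor(max_workers=1)

def flatten_without_blocking(image):
    """Run the potentially long image.flatten() in another thread."""
    future = _executor.submit(image.flatten)
    # The caller still waits for the result, but it no longer
    # monopolizes the thread the heartbeat depends on.
    return future.result()
```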
Closes-Bug: #1898918
Change-Id: I9f28260008117abcebfc96dbe69bf892f5cd14fe
In the last cycle we deprecated the RBD configuration option as per
OSSN-0085, and it was removed for victoria by Change
I3cc58b2d74d82ab6b2186e9ea7cfafeb4c3de822
This patch modifies the RBD driver to support cinderlib use cases that
are not affected by the security vulnerability.
Even if we have the configuration option in cinder.conf, it will not be
seen by the Cinder RBD driver; the driver will only see it if we skip
the Oslo Config mechanism and set it directly on the instance as an
attribute, like cinderlib does.
Related-Bug: #1849624
Implements: blueprint rbd-remove-option-causing-ossn-0085
Change-Id: Iae63aee973932b2eef26d832a7f413a04bad0df1
This option was deprecated in the Ussuri release and is now removed.
See OSSN-0085 for details.
Change-Id: I3cc58b2d74d82ab6b2186e9ea7cfafeb4c3de822
Implements: bp rbd-remove-option-causing-ossn-0085