Add file to the reno documentation build to show release notes for
stable/2024.1.
Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/2024.1.
Sem-Ver: feature
Change-Id: I8c58db8c491fb9c66933ea44be60cadadd7b66b1
When fetching the target value (T in HCTL) for the storage HBAs,
we use the /sys/class/fc_transport path to find available targets.
However, this path only contains targets that already have a LUN
attached to the host.
Scenario:
Suppose we have 2 controllers on the backend side with 4 target HBAs
each (8 in total).
For the first LUN mapping from controller1, we will do a wildcard
scan and find the 4 targets from controller1 which will get
populated in the /fc_transport path.
If we try mapping a LUN from controller2, we try to find targets in the
fc_transport path but the path only contains targets from controller1 so
we will not be able to discover the LUN from controller2 and fail with
NoFibreChannelVolumeDeviceFound exception.
Solution:
In each rescan attempt, we will first search for targets in the
fc_transport path: "/sys/class/fc_transport/target<host>*".
If the target is not found, we will then search in the fc_remote_ports
path: "/sys/class/fc_remote_ports/rport-<host>*".
If a [c,t,l] combination is found from either path, we add it to
the list of CTLs that we later use for scanning.
This way we don't alter the current "working" scanning mechanism, but
add an additional way of discovering targets in each rescan attempt,
improving the scan to avoid failure scenarios.
Closes-Bug: #2051237
Change-Id: Ia74b0fc24e0cf92453e65d15b4a76e565ed04d16
The nvme CLI has changed its behavior: it no longer differentiates
between errors by returning different exit codes.
Exit code 1 is now used for all errors and 0 for success.
This patch fixes the detection of race conditions to also look for the
message in case it's a newer CLI version.
Together with change I318f167baa0ba7789f4ca2c7c12a8de5568195e0 we are
ready for nvme CLI v2.
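The detection described above can be sketched as a pure check (the
message string here is a placeholder for illustration, not the literal
text the connector matches):

```python
def is_race_error(exit_code, stderr, race_msg='already connected'):
    """Decide whether an nvme-cli failure was a connect race.

    Simplified sketch: with nvme CLI v2 every error returns exit
    code 1, so the stderr text must be inspected as well.
    """
    if exit_code == 0:
        return False
    return race_msg in stderr
```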
Closes-Bug: #1961222
Change-Id: Idf4d79527e1f03cec754ad708d069b2905b90d3f
Attaching NVMe-oF no longer works in CentOS 9 Stream using nvme 2.4
and libnvme 1.4.
The reason is that the 'address' file in sysfs now has the 'src_addr'
information.
Before we had:
traddr=127.0.0.1,trsvcid=4420
Now we have:
traddr=127.0.0.1,trsvcid=4420,src_addr=127.0.0.1
This patch fixes the issue and future-proofs against any additional
information that may be added, by parsing the contents and searching for
the parts we care about: the destination address and port.
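The tolerant parsing described above can be sketched like this
(illustrative helper names, not the connector's actual functions):

```python
def parse_address(sysfs_address):
    """Parse an nvme sysfs 'address' file into a dict.

    Splitting on commas and '=' means extra fields such as src_addr
    are simply carried along and ignored by callers that only look
    at traddr and trsvcid.
    """
    result = {}
    for part in sysfs_address.strip().split(','):
        if '=' in part:
            key, value = part.split('=', 1)
            result[key] = value
    return result

def matches(sysfs_address, traddr, trsvcid):
    # Compare only the destination address and port.
    addr = parse_address(sysfs_address)
    return addr.get('traddr') == traddr and addr.get('trsvcid') == trsvcid
```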
Closes-Bug: #2035811
Change-Id: I7a33f38fb1b215dd23e2cff3ffa79025cf19def7
When an nvme subsystem has all portals in connecting state and we try
to attach a new volume to that same subsystem it will fail.
We can reproduce it with LVM+nvmet if we configure it to share targets
and then:
- Create instance
- Attach 2 volumes
- Delete instance (this leaves the subsystem in connecting state [1])
- Create instance
- Attach volume <== FAILS
The problem comes from the '_connect_target' method that ignores
subsystems in 'connecting' state, so if they are all in that state it
considers it equivalent to all portals being inaccessible.
This patch changes this behavior: if we cannot connect to a target
but we have portals in 'connecting' state, we wait for the next retry of
the nvme Linux driver. Specifically, we wait 10 more seconds than the
interval between retries.
[1]: https://bugs.launchpad.net/nova/+bug/2035375
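The decision described above can be sketched as follows (names and the
error handling are illustrative, not the actual '_connect_target' code):

```python
import time

def handle_connect_failure(portal_states, retry_interval):
    """Decide what to do when no portal could be connected.

    Sketch: if every portal is truly inaccessible we give up, but if
    any portal is in 'connecting' state we wait a bit longer than the
    Linux nvme driver's retry interval so its next attempt has a
    chance to succeed.
    """
    if 'connecting' not in portal_states:
        raise RuntimeError('all portals inaccessible')
    # Wait 10 seconds more than the interval between driver retries.
    time.sleep(retry_interval + 10)
```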
Closes-Bug: #2035695
Change-Id: Ife710f52c339d67f2dcb160c20ad0d75480a1f48
Dell Powerflex 4.x changed the error code of VOLUME_NOT_MAPPED_ERROR
to 4039. This patch adds that error code.
Closes-Bug: #2046810
Change-Id: I76aa9e353747b1651480efb0f3de11c707fe5abe
This patch improves the creation of the /etc/nvme/hostnqn file by using
the system UUID value we usually already know.
This saves us one or two calls to the nvme-cli command. It also allows
older nvme-cli versions (those that don't have the `show-hostnqn`
command, or have it but can only read the value from a file) to generate
the same value every time, which may be useful when running inside a
container under some circumstances.
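Building the hostnqn from a known system UUID can be sketched with the
NVMe spec's UUID-based NQN format (the real code additionally falls
back to nvme-cli when no system UUID is available):

```python
def hostnqn_from_uuid(system_uuid):
    """Build a host NQN from a system UUID.

    Uses the standard UUID-based NQN prefix, so the same system UUID
    always yields the same hostnqn.
    """
    return 'nqn.2014-08.org.nvmexpress:uuid:%s' % system_uuid.strip().lower()
```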
Change-Id: Ib250d213295695390dbdbb3506cb297a86e95218
The Dell PowerFlex (scaleio) connector maintains a token cache
for PowerFlex OS.
The cache was overwritten with None by mistake
in Change-Id I6f01a178616b74ed9a86876ca46e7e46eb360518.
This patch fixes the broken cache to avoid unnecessary login.
Closes-Bug: #2004630
Change-Id: I2399b0b2af8254cd5697b44dcfcec553c2845bec
This patch fixes the issue of the password getting written in plain
text in the logs while creating a new volume from an image. It creates a
new logger with the default log level set to error.
Closes-Bug: #2003179
Change-Id: I0292a30f402e5acddd8bbc31dfaef12ce24bf0b9
Dell Powerflex 4.x changed the error code of VOLUME_ALREADY_MAPPED_ERROR
to 4037. This patch adds that error code.
Closes-Bug: #2013749
Change-Id: I928c97ea977f6d0a0b654f15c80c00523c141406
In some old nvme-cli versions the NVMe-oF create_hostnqn method fails.
This happens specifically on versions between not having the
show-hostnqn command and having it always return a value. On those
versions the command only returns the value present in the file and
never tries to return an idempotent or random value.
This patch adds handling for that specific case, which is identified by
the stderr message:
hostnqn is not available -- use nvme gen-hostnqn
Closes-Bug: #2035606
Change-Id: Ic57d0fd85daf358e2b23326022fc471f034b0a2f
Add file to the reno documentation build to show release notes for
stable/2023.2.
Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/2023.2.
Sem-Ver: feature
Change-Id: I25bef272ded6c7c963c6ad0f95103fe421fa8fe7
In a multipath enabled deployment, when we try to extend a volume
and some paths are down, we fail to extend the multipath device and
leave the environment in an inconsistent state. See LP Bug #2032177
for more details.
To handle this, we check if all the paths are up before trying to
extend the device and fail fast if any path is down. This ensures
we don't partially extend some paths and leave the others at their
original size, leading to an inconsistent state in the environment.
Closes-Bug: 2032177
Change-Id: I5fc02efc5e9657821a1335f1c1ac5fe036e9329a
The NVMe-oF connector currently creates the `/etc/nvme/hostnqn` file if
it doesn't exist, but it may still be missing the `/etc/nvme/hostid`
value.
Some distribution packages create the file on installation but others
may not.
It is recommended for the file to be present so that nvme doesn't
randomly generate it.
Randomly generating it means that the value will be different for the
same storage array and the same volume if we connect, disconnect, and
connect the volume again.
This patch ensures that the file will exist and will try to use the
system's UUID as reported by DMI or a randomly generated one.
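The approach described above can be sketched as follows (paths and
error handling are simplified, and the helper name is illustrative):

```python
import os
import uuid

def ensure_hostid(path='/etc/nvme/hostid'):
    """Make sure the nvme hostid file exists and return its value.

    Prefer the DMI-reported system UUID; fall back to a random one
    only when DMI is not readable.
    """
    if os.path.exists(path):
        with open(path) as f:
            return f.read().strip()
    try:
        with open('/sys/class/dmi/id/product_uuid') as f:
            host_id = f.read().strip().lower()
    except OSError:
        host_id = str(uuid.uuid4())
    with open(path, 'w') as f:
        f.write(host_id + '\n')
    return host_id
```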
Closes-Bug: #2016029
Change-Id: I0b60f9078f23f8464d8234841645ed520e8ba655
This patch adds support for additional SAM addressing modes for LUNs,
specifically SAM-2 and SAM-3 flat addressing.
Code has been manually verified on Pure storage systems that use SAM-2
addressing mode, because it's unusual for CI jobs to have more than
256 LUNs on a single target.
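For illustration, decoding the first 16-bit level of a SAM LUN field
can be sketched like this (the real connector works on the full 64-bit
LUN value reported by the target):

```python
def decode_first_level_lun(word):
    """Decode the first 16-bit level of a SAM LUN field.

    The top 2 bits select the addressing method: 00 is peripheral
    device addressing (8-bit LUN), 01 is SAM-2/SAM-3 flat space
    addressing (14-bit LUN).
    """
    method = (word >> 14) & 0b11
    if method == 0b00:   # peripheral device addressing
        return word & 0xFF
    if method == 0b01:   # flat space addressing
        return word & 0x3FFF
    raise ValueError('unsupported addressing method: %d' % method)
```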
Closes-Bug: #2006960
Change-Id: If32d054e8f944f162bdc8700d842134a80049877
When multipath is enabled and friendly names are ON and we try
to extend a volume, we pass the SCSI WWID to the multipath_resize_map
method which is not correct.
There are 2 things we can pass to the multipathd resize map command:
1) Multipath device (eg: dm-0)
2) Alias (eg: mpath1) or UUID (eg: 36005076303ffc56200000000000010aa)
The value should be an alias (mpath1) when friendly names are ON
and UUID (36005076303ffc56200000000000010aa) when friendly names
are OFF. However, we only pass the UUID irrespective of the value
set for friendly names.
This patch passes the multipath device path (to multipathd resize
map command) which is the real path of the multipath device (/dev/dm-*).
This fixes the issue as it passes the same value irrespective of
whether friendly names are ON or OFF.
-> self.multipath_resize_map(os.path.realpath(mpath_device))
(Pdb) mpath_device
'/dev/disk/by-id/dm-uuid-mpath-3600140522774ce73be84f9cb9537e0c9'
(Pdb) os.path.realpath(mpath_device)
'/dev/dm-5'
Closes-Bug: 1609753
Change-Id: I1c60af19c2ebaa9de878cd07cfc0077c5ea56fe3
The -v arg suppresses printing of [/dir] with the
device for bind mounts and btrfs volumes, which is
what we want for this usage.
This fixes _get_host_uuid() failing when using
a btrfs rootfs.
Closes-Bug: #2026257
Change-Id: I2d8f24193ecf821843bf8f4ea14b445561d6225c
This patch adds support for the force and ignore_errors on the
disconnect_volume of the FC connector like we have in the iSCSI
connector.
Related-Bug: #2004555
Change-Id: Ia74ecfba03ba23de9d30eb33706245a7f85e1d66
Add file to the reno documentation build to show release notes for
stable/2023.1.
Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/2023.1.
Sem-Ver: feature
Change-Id: Ic3ac791f083aa097412dc9075a6b20f2b148db02
Currently we don't have os_brick DEBUG log levels in Nova when setting
the service to debug mode.
That happens because Nova is forcefully setting oslo.privsep.daemon
levels to INFO to prevent leaking instance XML details (bug #1784062).
Oslo Privsep now supports per-context debug log levels, so this patch
sets the log level name for its only os_brick privsep context to
"os_brick.privileged" to differentiate it from the service it runs under
which uses the default "oslo_privsep.daemon".
This way, even though Nova still forces its own privsep logging to
INFO, it won't affect os-brick privileged calls, allowing us to properly
debug block device attach/detach operations.
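The effect can be sketched with plain stdlib logging: with a dedicated
logger name for the os-brick context, a service can pin the daemon's
default logger to INFO while leaving os-brick's at DEBUG (logger names
shown as described above; this is an illustration, not the oslo.privsep
wiring itself):

```python
import logging

# The service forces the shared privsep daemon logger to INFO...
logging.getLogger('oslo_privsep.daemon').setLevel(logging.INFO)
# ...while the os-brick context's own logger stays tunable to DEBUG.
logging.getLogger('os_brick.privileged').setLevel(logging.DEBUG)
```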
Closes-Bug: #1922052
Related-Bug: #1784062
Change-Id: I0de32021eb90ca045845a6c7c7e3d27e52895948
Add file to the reno documentation build to show release notes for
stable/zed.
Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/zed.
Sem-Ver: feature
Change-Id: Ia775c42636307fa35d0612937e4c949c0cc2193b
Cinder microversion 3.69 adds an additional value to shared_targets
beyond true and false. Now null/None is also a valid value that can be
used to force locking.
So we now have 3 possible values:
- True ==> Lock if iSCSI initiator doesn't support manual scans
- False ==> Never lock.
- None ==> Always lock.
This patch updates the guard_connection context manager to support the
3 possible values.
With this Cinder can now force locking for NVMe-oF drivers that share
subsystems.
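The three-valued decision can be sketched as a pure function
(illustrative names, not the guard_connection implementation):

```python
def must_lock(shared_targets, supports_manual_scans):
    """Decide whether guard_connection needs to take the lock.

    None always locks, False never locks, and True locks only when
    the iSCSI initiator cannot do manual scans.
    """
    if shared_targets is None:
        return True
    if shared_targets is False:
        return False
    return not supports_manual_scans
```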
Closes-Bug: #1961102
Depends-On: I8cda6d9830f39e27ac700b1d8796fe0489fd7c0a
Change-Id: Id872543ce08c934cefbbbdaff6ddc61e3828b1f1
We currently have 2 different connection information formats: The
initial one and the new one.
Within the new format we have 2 variants: Replicated and Not replicated.
Currently the nvmeof connector has multiple code paths and we end up
finding bugs that affect one path and not the other, and bugs that are
present in both may end up getting fixed in only one of them.
This patch consolidates both formats by converting the connection
information into a common internal representation, thus reducing the
number of code paths.
Thanks to this consolidation the Cinder drivers are less restricted on
how they can identify a volume in the connection information. They are
no longer forced to pass the NVMe UUID (in case they cannot get that
information or the backend doesn't support it) and they can provide
other information instead (nguid or namespace id).
This connection properties format consolidation is explained in the new
NVMeOFConnProps class docstring.
By consolidating the code paths a number of bugs get fixed
automatically, because they were broken in one path but not in the
other. As examples, the old code path didn't have rescans but the new
one did, and the old code path had device flush but the new one didn't.
Given the big refactoring needed to consolidate everything this patch
also accomplishes the following things:
- Uses Portal, Target, and NVMeOFConnProps classes instead of using the
harder to read and error prone dictionaries and tuples.
- Adds method annotations.
- Documents most methods (except the raid ones, which I'm not familiar
with).
- Adds the connector to the docs.
- Makes method signatures conform with Cinder team standards: no more
static methods passing self and no more calling class methods using
the class name instead of self.
Closes-Bug: #1964590
Closes-Bug: #1964395
Closes-Bug: #1964388
Closes-Bug: #1964385
Closes-Bug: #1964380
Closes-Bug: #1964383
Closes-Bug: #1965954
Closes-Bug: #1903032
Related-Bug: #1961102
Change-Id: I6c2a952f7e286f3e3890e3f2fcb2fdd1063f0c17
Patch fixing bug #1861071 resolved the issue of extending LUKS v1
volumes when nova connects them via libvirt instead of through os-brick,
but the Nova side still fails to extend LUKSv2 in-use volumes when they
don't go through libvirt.
The logs will show a very similar error, but the user won't know that
this has happened and Cinder will show the new size:
libvirt.libvirtError: internal error: unable to execute QEMU command
'block_resize': Cannot grow device files
There are 2 parts to this problem:
- The device mapper device is not automatically extended.
- Nova tries to use the encrypted block device size as the size of the
decrypted device.
This patch adds new functionality to the encryptors so that they can
extend decrypted volumes to match the size of the encrypted device.
The new method added to the encryptors is called "extend_volume" and
should be called after the homonymous method in the connector. The value
returned by the encryptor's extend_volume method is the real size of the
decrypted volume (the encrypted volume minus the headers).
The patch only adds functionality for LUKS and LUKSv2 volumes, not to
cryptsetup volumes.
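The call order described above can be sketched like this (a hypothetical
caller-side helper, with stub objects standing in for the connector and
encryptor):

```python
def extend_attached_encrypted(connector, encryptor, connection_info):
    """Extend an in-use encrypted volume in the required order.

    The connector grows the encrypted device first; the encryptor then
    grows the decrypted mapping and returns the usable size, which is
    the value the caller must report.
    """
    connector.extend_volume(connection_info)
    return encryptor.extend_volume(connection_info)
```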
Related-Bug: #1967157
Change-Id: I351f1a7769c9f915e4cd280f05a8b8b87f40df84