45 Commits

Author SHA1 Message Date
Gorka Eguileor 59961647d3 SCSI: Support non SAM LUN addressing
This patch adds non SAM addressing modes for LUNs, specifically for
SAM-2 and SAM-3 flat addressing.

Code has been manually verified on Pure storage systems that use SAM-2
addressing mode, because it's unusual for CI jobs to have more than
256 LUNs on a single target.

Closes-Bug: #2006960
Change-Id: If32d054e8f944f162bdc8700d842134a80049877
2023-08-23 12:19:11 +02:00
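
A minimal sketch of the addressing idea in 59961647d3, not the os-brick code. It relies on the SCSI flat space addressing method setting the top two bits of the 16-bit LUN field to 01, which is the same as adding 0x4000. The function name, the mode strings, and the rule that SAM-2 mode only switches to flat addressing above LUN 255 while SAM-3 flat mode always uses it are assumptions made for illustration.

```python
# Address method bits "01" (flat space addressing) in a 16-bit LUN field.
FLAT_ADDRESS_METHOD = 0x4000


def lun_for_addressing(lun: int, mode: str = "SAM") -> int:
    """Return the LUN value under the given (hypothetical) mode name."""
    if mode == "SAM":
        # Plain addressing: the LUN is used as-is.
        return lun
    if mode == "SAM2":
        # Assumption: flat addressing only for LUNs that do not fit in 8 bits.
        return lun + FLAT_ADDRESS_METHOD if lun > 255 else lun
    if mode == "SAM3-flat":
        # Assumption: flat addressing for every LUN.
        return lun + FLAT_ADDRESS_METHOD
    raise ValueError(f"unknown addressing mode: {mode}")
```

With these assumptions, LUN 256 in SAM-2 mode maps to 0x4100, while LUN 1 stays 1.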
Brian Rosmaita 1b739ed2d5 Revert "Fix iSCSI disconnect_volume when flush fails"
This reverts commit 8070ac3bd9.

Reason for revert: This requires some more discussion, I should not have ninja-approved it.

Change-Id: I25917b95a32da4fd831d669cd21988f400f258e0
2023-04-27 12:19:16 +00:00
Rajat Dhasmana 8070ac3bd9 Fix iSCSI disconnect_volume when flush fails
The purpose of providing force=True and ignore_errors=True
is to tell os-brick that we want to disconnect the volume
even if flush fails (force) and not raise any exceptions
(ignore_errors). Currently, in an iSCSI multipath environment,
disconnect_volume can fail when both parameters are True.

The current flow when disconnecting an iSCSI volume is
that if flushing a multipath device fails, we manually
remove the device, logout from the target portals,
and try the flush again.

There are two problems here:

1) code problem: The second flush is not wrapped by
ExceptionChainer. This causes it to raise the exception
immediately after flush fails irrespective of the value
of the ignore_errors flag.

2) conceptual problem: In this situation, there is no point
in making the second flush attempt. Instead, we should just
remove the multipath map from multipathd monitoring since
we have already removed the paths manually.

This patch fixes the conceptual problem: we no longer make a second
flush call, and we ignore any errors from executing ``multipathd del map``,
which also fixes the code problem.

Closes-Bug: #2012251
Change-Id: I828911495a2de550ea997e6f51cc039a7b7fa8cd
2023-04-21 06:32:09 +00:00
Takashi Natsume e7b1426e31 Fix wrong assertion methods
Replace 'called_once_with' with 'assert_called_once_with'.

Change-Id: I18b55da0d1f142818e7ea14f6eebcc0f0f0cd23f
Closes-Bug: 1989280
Signed-off-by: Takashi Natsume <takanattie@gmail.com>
2023-02-05 22:33:35 +00:00
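
As an illustration of why e7b1426e31 matters: only the `assert_`-prefixed methods of `unittest.mock.Mock` verify anything. The sketch below uses a hypothetical `flush` mock and shows `assert_called_once_with` passing for the matching call and raising `AssertionError` for a mismatching one.

```python
from unittest import mock

# Hypothetical mock standing in for a function under test.
flush = mock.Mock()
flush("/dev/sda")

# Passes silently: the mock was called exactly once with this argument.
flush.assert_called_once_with("/dev/sda")

# A mismatching expectation raises AssertionError, which is what a test needs.
try:
    flush.assert_called_once_with("/dev/sdb")
    mismatch_detected = False
except AssertionError:
    mismatch_detected = True
```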
Eric Harney a519dd8d07 mypy: initiator
Change-Id: I96d0a1b1276b3c666e8deb251e5bb71c68fc9a30
2022-08-18 14:34:01 -04:00
Gorka Eguileor 1583a5038d Fix encryption symlink issues
This patch fixes 2 issues related to the symlinks, or lack of, that
connectors' connect_volume methods return.

Some connectors always return the block device instead of a symlink for
encrypted volumes, and other connectors return a symlink that is owned
by the system's udev rules.  Both cases are problematic.

Returning the real device can prevent the encryptor's connect_volume
from completing successfully, and in other cases (such as nvmeof) it
completes, but the connector's disconnect_volume will then leave the
device behind (i.e., /dev/nvme0n1), preventing new connections that would
use that same device name.

Returning a symlink owned by the system's udev rules means that they can
be reclaimed by those rules at any time.  This can happen with
cryptsetup, because when it creates a new mapping it triggers udev rules
for the device that can reclaim the symlink after os-brick has replaced
it.

This patch creates a couple of decorators to facilitate this for all
connectors. These decorators transform the paths so that the caller
gets the expected symlink, but the connector doesn't need to worry about
it and will always see the value it returns regardless of what symlink
the caller gets.

From this moment onwards we use our own custom symlink that starts with
"/dev/disk/by-id/os-brick".

The patch fixes bugs in other connectors (such as the RBD local
connection), but since there are no open bugs they have not been
reported.

Closes-Bug: #1964379
Closes-Bug: #1967790
Change-Id: Ie373ab050dcc0a35c749d9a53b6cf5ca060bcb58
2022-05-24 15:01:00 +02:00
Sophie Huang 8832c53899 multipath/iscsi: iSCSI connections are not reinitiated after reboot
After compute host reboot, in an iSCSI/multipath environment, some
of the connections to the iSCSI portal are not reinitiated and missing
iSCSI devices are observed. This patchset introduced retries for this
particular scenario.

Closes-Bug: #1944474
Change-Id: I60ee7421f7b792e8324286908a9fdd8fb53e433e
2021-10-06 14:34:26 +00:00
Gorka Eguileor d4205bd0be iSCSI: Fix flushing after multipath cfg change
OS-Brick disconnect_volume code assumes that the use_multipath parameter
that is used to instantiate the connector has the same value as the
connector that was used on the original connect_volume call.

Unfortunately this is not necessarily true, because Nova can attach a
volume, then its multipath configuration can be enabled or disabled, and
then a detach can be issued.

This leads to a series of serious issues such as:

- Not flushing the single path on disconnect_volume (possible data loss)
  and leaving it as a leftover device on the host when Nova calls
  terminate-connection on Cinder.

- Not flushing the multipath device (possible data loss) and leaving it
  as a leftover device similarly to the other case.

This patch changes how we do disconnects: we now assume we are always
disconnecting multipaths, and fall back to doing the single path
disconnect we used to do if we can't go that route.

The case when we cannot do a multipathed detach is mostly when we did
the connect as a single path and the Cinder driver doesn't provide
portal_targets and portal_iqns in the connection info for non
multipathed initialize-connection calls.

This change introduces an additional call when working with single
paths (checking the discoverydb), but it should be an acceptable
trade-off to not lose data.

Closes-Bug: #1921381
Change-Id: I066d456fb1fe9159d4be50ffd8abf9a6d8d07901
2021-03-26 11:07:45 +01:00
Takashi Kajinami 4478433550 Avoid unhandled exceptions during connecting to iSCSI portals
Currently we don't properly catch some possible exceptions while
connecting to iSCSI portals, like failures in "iscsiadm -m session".
Because of this, _connect_vol threads can abort unexpectedly in some
failure patterns, and this abort causes a hang in subsequent steps
waiting for results from _connect_vol threads.

This change ensures that any exceptions during connecting to iSCSI
portals are handled correctly in the _connect_vol method, to avoid
unexpected aborts without updating thread results.

Closes-Bug: #1915678
Change-Id: I0278c502806b99f8ec65cb146e3852e43031e9b8
2021-03-04 12:01:41 +09:00
Jon Bernard 4fabe1b33d Improve error handling on target query
The command 'iscsiadm -m node' will return entries for corrupt targets
in the form '[]:port,-1' instead of the expected format.  This causes an
IndexError exception during parsing.  This patch skips invalid entries.

Closes-bug: #1886855
Change-Id: I9a1746658474c0f1be7ec29a36767085aaf2ab7f
2020-11-21 14:30:44 -05:00
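
The skipping described in 4fabe1b33d could look roughly like the sketch below. It is not the os-brick parser: the function name is hypothetical, and it assumes that a valid `iscsiadm -m node` line has the form `ip:port,tpgt iqn` and that a corrupt entry shows up with an empty `[]` host, with or without a trailing IQN.

```python
def parse_node_entries(output: str) -> list[tuple[str, str]]:
    """Return (portal, iqn) pairs, skipping entries that look corrupt."""
    entries = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 2:
            # Blank or truncated line: nothing usable to index into.
            continue
        portal_field, iqn = fields
        portal = portal_field.split(",")[0]   # drop the ",tpgt" suffix
        host = portal.rsplit(":", 1)[0]       # text before the port
        if host in ("", "[]"):
            # Corrupt target reported as '[]:port,-1'.
            continue
        entries.append((portal, iqn))
    return entries
```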
Gorka Eguileor 0cdd9bbbe2 Leverage the iSCSI mpath to get the WWN
Now that we search the multipath device even if we haven't been able to
find the WWN in the sysfs we can leverage the multipath daemon
information on sysfs to get the WWN.

Pass the mpath to "get_sysfs_wwn" method where we check the sysfs to get
the WWN.

Change-Id: Id1905bc174b8f2f3a345664d8a0a05284ca69927
2020-08-13 22:10:09 -04:00
Zuul 6974808cb5 Merge "iSCSI detect multipath DM with no WWN" 2020-08-13 23:08:39 +00:00
Dirk Mueller 188cbed313 Switch from unittest2 compat methods to Python 3.x methods
With the removal of Python 2.x we can remove the unittest2 compat
wrappers and switch to assertCountEqual instead of assertItemsEqual.

We have been able to use them since then, because
testtools required unittest2, which still included it. With testtools
removing Python 2.7 support [3][4], we will lose support for
assertItemsEqual, so we should switch to use assertCountEqual.

[1] - https://bugs.python.org/issue17866
[2] - https://hg.python.org/cpython/rev/d9921cb6e3cd
[3] - testing-cabal/testtools#286
[4] - testing-cabal/testtools#277

Change-Id: I7767abc2ed4317288303fc9a2235e869f46a63b0
2020-06-23 14:14:34 +02:00
Gorka Eguileor 63f52be546 iSCSI detect multipath DM with no WWN
If udev rules are slow (or don't happen) and they don't generate the
symlinks we use to detect the WWN, or if we just fail to find the WWN we
end up not detecting the iSCSI multipath even if it is present in the
system.

With this patch we no longer wait to find the WWN before trying to
locate the multipath device.

This means that as long as the multipath daemon is able to generate the
multipath we will be able to return a multipath device path to the
caller.

Closes-Bug: 1881619
Change-Id: Ic48bd9ac408c56073e58168df7e74e4b949ac2f2
2020-06-01 18:43:49 +02:00
Sean McGinnis 3f1314674d Switch from retrying to tenacity
This switches our retry mechanism over from the retrying library to the
tenacity library. Retrying has been inactive for a few years now and
appears to be no longer maintained.

This has a small behavior change in that before we were applying an
exponential backoff to the first time a retry was needed. This no longer
happens, but retries will exponentially back off with each retry.

Also cleaned up some minor nits with unit test assert argument order.

Closes-bug: #1635397

Change-Id: I24cab206b16e63859d4886c55d40a03d398ce30d
Signed-off-by: Sean McGinnis <sean.mcginnis@gmail.com>
2020-05-06 09:08:01 -05:00
Sean McGinnis 58fb18e1c2 Use unittest.mock instead of third party mock
Now that we no longer support py27, we can use the standard library
unittest.mock module instead of the third party mock lib.

Change-Id: I4927ff7bbdb737884055dd584bc37bd9e93827be
Signed-off-by: Sean McGinnis <sean.mcginnis@gmail.com>
2020-04-18 16:22:50 -05:00
Lee Yarwood 331316827a iscsi: Add _get_device_link retry when waiting for /dev/disk/by-id/ to populate
Bug #1820007 documents failures to find /dev/disk/by-id/ symlinks
associated with encrypted volumes both in real world and CI
environments. These failures appear to be due to udev on these slow or
overloaded hosts failing to populate the required /dev/disk/by-id/
symlinks in time after the iSCSI volume has been connected.

This change seeks to avoid such failures by simply decorating
_get_device_link with the @utils.retry to hopefully allow udev time to
create the required symlinks under /dev/disk/by-id/.

Closes-Bug: #1820007
Change-Id: Ib9c8ebae7a6051e18538920139fecd123682a474
2019-11-29 16:23:18 +00:00
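
The retry approach in 331316827a can be pictured with a small standard-library decorator. This is a sketch only: it is not os-brick's `utils.retry`, and the parameter names, defaults, and the injectable `sleep` argument are assumptions chosen so the example is easy to test.

```python
import functools
import time


def retry(exceptions, retries=3, interval=1, backoff_rate=2, sleep=time.sleep):
    """Retry the wrapped callable on `exceptions`, backing off exponentially."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            delay = interval
            for attempt in range(1, retries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == retries:
                        # Out of attempts: let the last error propagate.
                        raise
                    sleep(delay)
                    delay *= backoff_rate
        return wrapper
    return decorator
```

Decorating a lookup that raises while the symlink is still missing gives udev a few seconds to catch up before the caller sees a failure.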
Chris M 81f26f822d Fix bad argument to iscsiadm in iSCSI discovery
Fix a call to _iscsiadm_update() in which a list was being passed as
the target_iqn connection property.  This property is used directly as
an argument to the iscsiadm -T option, so it must be a plain string.

Change-Id: I9c2ff1de1f89fb49dd6c5a90679d5c4238d5476a
Closes-bug: 1838820
2019-08-06 19:19:29 +00:00
Gorka Eguileor 1c07f221f2 iSCSI single path: Don't fail if there's no WWN
This patch relaxes the single-pathed connections and allows them to
complete even when we cannot detect the WWN on the sysfs, just like
multipath connections do.

Closes-Bug: #1834443
Change-Id: Iae5a304329a2b172bc6b7f310623fad18956ae45
2019-06-28 10:00:52 +02:00
Gorka Eguileor 71a1e22418 Context manager to handle shared_targets
We introduced "shared_targets" and "service_uuid" fields in volumes to
allow volume consumers to protect themselves from unintended leftover
devices when handling iSCSI connections with shared targets.

The way they protect themselves from the automatic scans that happen on
detach/map race conditions is by locking and only allowing one attach or
one detach operation for each server to happen at a given time.

When using an up to date Open iSCSI initiator we don't need to use
locks, as it has the possibility to disable automatic LUN scans (which
are the real cause of the leftover devices), and OS-Brick already
supports this feature.

This is currently not the case, since Nova is blindly locking whenever
"shared_targets" is set to True.

Thanks to the context manager introduced in this patch we can improve
our separation of concerns (Nova doesn't have to care when we have to
lock), and we can fine tune the locking to only lock when the iSCSI
initiator doesn't support disabling automatic scans.

The context manager "guard_connection" lives in
"os_brick.initiator.utils" and needs a cinder volume.  The volume can be
passed as a dictionary or OVO instance.

Change-Id: I4970363301d5d1f4e7d0f07e09b34d15ee6884c3
Related-Bug: #1800515
2018-10-30 16:55:01 +01:00
Zuul 995daed9fe Merge "The validation of iscsi session should be case insensitive" 2018-10-02 23:22:47 +00:00
imacdonn d398fa8233 'iscsiadm -m session' failure handling
Sometimes 'iscsiadm -m session' outputs warnings on stderr, but
still serves its purpose (usable stdout). We should not give up on
it when stderr is non-empty - instead, rely on the exit status, log
the stderr as a warning, and move on.

This change also removes 1 (ISCSI_ERR) from the list of acceptable
exit codes for the iscsiadm command, to ensure that failures get
caught.

Change-Id: Id8183cf3d8baf2f8ba6a00a47fa2ad7cc2a96aa5
Closes-Bug: #1732199
2018-09-25 17:29:42 +00:00
Yong Huang 8782c4accf The validation of iscsi session should be case insensitive
The validation of iSCSI sessions should be case insensitive, since IPv6
addresses are case insensitive. If the portal address is all lower case
and the address obtained from 'iscsiadm -m session' contains upper case
characters, the validation of the iSCSI session will fail and the loop
will be endless.

Change-Id: Ief5108e40f13cc53d8d9e9c5d1a5054f58f72aa0
Closes-bug: #1793627
2018-09-20 16:46:32 -04:00
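
A minimal sketch of the comparison 8782c4accf calls for, assuming sessions are available as `(portal, iqn)` pairs. The function name and data shape are hypothetical; only the portal is compared case-insensitively here.

```python
def session_exists(portal: str, iqn: str, sessions) -> bool:
    """Check for a session, ignoring case in the portal address."""
    wanted = portal.lower()
    return any(p.lower() == wanted and i == iqn for p, i in sessions)
```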
Gorka Eguileor aecf9c968b Improve iSCSI device detection speed
Current iSCSI device detection checks for the presence of devices based
on the scan time, so it checks for presence, sleeps, scans, and checks
again.  This means that if the device becomes available while we sleep
to send another scan we won't detect the device.

This patch changes this, making the searching and the rescanning
independent operations operating at different cadences.

Checking for the device will happen every second, and the rescans will
happen after 4, 9, and 16 seconds.

Change-Id: I716a3ea8583e289819cc37b6b5dd9730dd59406b
2018-08-28 16:49:52 +02:00
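
The two cadences from aecf9c968b (a presence check every second, rescans after 4, 9, and 16 seconds) can be sketched as a single loop. The function name, the callbacks, the 20 second timeout, and the injectable `sleep` are assumptions for illustration; elapsed time is approximated by counting one-second sleeps.

```python
import time


def wait_for_device(device_present, rescan, rescan_at=(4, 9, 16),
                    timeout=20, sleep=time.sleep):
    """Poll once per second; trigger rescans at fixed elapsed times."""
    for elapsed in range(timeout + 1):
        if device_present():
            return True
        if elapsed in rescan_at:
            # Rescans follow their own schedule, independent of the polling.
            rescan()
        if elapsed < timeout:
            sleep(1)
    return False
```

Because the check runs every second, a device that appears between two rescans is picked up on the next poll instead of waiting for the next scan.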
Gorka Eguileor d866ee75c2 Fix multipath disconnect with path failure
Under certain conditions detaching a multipath device may result in
failure when flushing one of the individual paths, but the disconnect
should have succeeded, because there were other paths available to flush
all the data.

OS-Brick is currently following the standard recommended disconnect
mechanism for multipath devices:

- Release all device holders
- Flush multipath
- Flush single paths
- Delete single devices

The problem is that this procedure does an unnecessary step, flushing
individual single paths, that may result in an error.

Originally it was thought that the individual flushes were necessary to
prevent data loss, but upon further study of the multipath-tools and the
device-mapper code it was discovered that this is not really the case.

After the multipath flushing has been completed we can be sure that the
data has been successfully sent and acknowledged by the device.

Closes-Bug: #1785669
Change-Id: I10f7fea2d69d5d9011f0d5486863a8d9d8a9696e
2018-08-13 11:02:01 +02:00
iain MacDonnell 296887a59e Accept ISCSI_ERR_NO_OBJS_FOUND from iscsiadm
iscsiadm exits with code 21 (ISCSI_ERR_NO_OBJS_FOUND) when no nodes
exist. We must accept this as an empty result when attempting to get
node startup info, instead of producing a ProcessExecutionError.

Change-Id: I55f4b3b075bd7779e96777dee64bf577c45fddf1
Closes-Bug: 1756206
2018-03-22 08:53:45 -07:00
Rikimaru Honjo 3266fb51a5 Recover node.startup values after discovering
os_brick updates node.startup values from the default value to
"automatic" when it creates an iSCSI connection.
However, when multipath is used, the node.startup values of existing
targets are reverted from "automatic" to the default value during
connection creation.
When using multipath with a discovery type of backend, the
"iscsiadm -m discovery -t sendtargets -p ..." command recreates
all target information of the specified node [1], which reverts the
node.startup value of existing targets to the default.
As a result, "automatic" targets and default value targets will be
mixed on the host.

So this patch recovers node.startup values after discovering.

[1]
This behavior was explained in following page:
https://github.com/open-iscsi/open-iscsi/issues/58

Change-Id: I30b736ae3b916f77fc0778f5364c5f6ed6fecc60
closes-bug: #1670237
2017-12-01 23:23:22 +00:00
Mathieu Gagné ea4375981d Fix ISCSIConnector._get_potential_volume_paths logic
Previous implementation assumed that _get_iscsi_sessions()
returned raw entries like _get_iscsi_sessions_full().

The _get_iscsi_sessions() function returns a list of portals.
There is no need to filter it further, just use the returned value as-is.

Closes-bug: #1707296
Change-Id: I2c65bb1d93d79636387ee8db9072291df903fb88
2017-07-28 16:38:40 -04:00
yuyafei b291c2bbd4 Get the right portal from output of iscsiadm command
The output of the iscsiadm command is:
# iscsiadm -m discovery -t sendtargets -I default -p 172.168.101.36:3260
172.168.101.36:3260,129 iqn.2099-01.cn.com.zte:usp.spr11-4c:09:b4:b0:55:91
From this output we get ips as "172.168.101.36:3260,129", but we want
"172.168.101.36:3260".

Change-Id: Ie31fc43483da97e1351232f5aa19907617ee558a
Co-Authored-By: Bin Zhou <zhou.bin9@zte.com.cn>
Closes-Bug: #1705674
2017-07-22 01:53:41 +00:00
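
Using the sample output quoted in b291c2bbd4, stripping the trailing `,129` (the target portal group tag) can be sketched as follows. The function name is hypothetical and the parsing is deliberately simple: it assumes each result line is `ip:port,tpgt iqn` and ignores anything else.

```python
def parse_sendtargets(output: str):
    """Return (portal, iqn) pairs with the ',tpgt' suffix removed."""
    results = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 2:
            # Skips blank lines and the echoed command line.
            continue
        portal_with_tag, iqn = fields
        portal = portal_with_tag.split(",")[0]
        results.append((portal, iqn))
    return results
```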
Gorka Eguileor 8e4adda001 Don't obscure logs on iSCSI sendtargets failure
On iSCSI connections that make use of discovery we will be obscuring the
logs if the sendtargets command fails, because when we try to do the
cleanup another exception will be raised (TargetPortalsNotFound).

The original exception is still logged since we are using
excutils.save_and_reraise_exception(), but it will be misleading if we
don't pay close attention.

This patch fixes this by ignoring the 'Unable to find target portals
information on discoverydb.' error, but logging a debug log message,
when doing the cleanup because this exception means that the sendtargets
failed and therefore we don't have anything to cleanup.

Change-Id: I7ddf827c7f2285acd72fd5a2fcd351928cb5d2df
2017-07-13 14:38:08 +02:00
Gorka Eguileor 66520dcf6c Return WWN in multipath_id
When we refactored the iSCSI connect mechanism we inadvertently changed
the value returned for the "multipath_id" key.

This patch fixes this and returns the WWN as the value again.

This value is used by the encryption mechanism, which expects it to be
the WWN.

Related-Bug: #1703954
Change-Id: Ia6d96a1e3a71488b44b3ca2323610a8f0a7cf675
2017-07-13 14:33:05 +02:00
Gorka Eguileor f341e9c3ed Return symlinks for encrypted volumes
When connecting encrypted volumes we need to return a symbolic link or
we will break all future attachments after detaching the volume.

OS-Brick 1.14 and 1.15 return real paths instead of returning symbolic
links, which results in the encryption attach_volume call replacing the
real device with a link to the crypt dm.

The issue comes from the Nova flow when attaching an encrypted volume:

1- Attach volume
2- Generate libvirt configuration with path from step 1
3- Encrypt attach volume

Since step 2 has already generated the config with the path from step 1 then
step 3 must preserve this path.

When step 1 returns a symbolic link we just forcefully replace it with a link
to the crypt dm and everything is OK, but when we return a real path it
does the same thing, which means we'll be replacing for example /dev/sda
with a symlink, which will then break the detach process, and all future
attachments.

Until Nova, Cinder, and OS-Brick are changed to have a different flow
(1, 3, 2) we need a workaround to make it work.

The workaround this patch introduces is to return a symbolic link when
the volume is encrypted.

It will try to return the symlink that always exists, but if it's not
there it will just look for ANY link to the device in '/dev/disk/by-id'.

Related-Bug: #1703954
Change-Id: If4461c3dcd67e5d948be9d9a3643c1eb81aaace9
2017-07-13 14:31:40 +02:00
Jenkins 6cdfd2a9a3 Merge "Fix manual scan for discovery type backends" 2017-07-02 00:11:05 +00:00
Gorka Eguileor be37c2e040 Fix iSCSI cleanup fix on discovery backends
On Change-Id Iada5d4fbeb07aeaf3afb953a289b6b89778c382c we tried to fix
an issue with the multipath detach of backends that used discovery on
attach, but contrary to the commit message and the docstrings it didn't
look for ip,port in the discoverydb but ip:port instead, which meant
that it would never find what it was looking for.

This patch fixes that fix to make it search for the right regex.

TrivialFix
Closes-Bug: #1699061

Change-Id: Ibfa1a78a555e984c662f668677451f5a3ed55602
2017-06-29 13:59:04 +02:00
Gorka Eguileor d310e06caf Fix manual scan for discovery type backends
When we added open-iscsi manual scan support we made it so that it
wouldn't modify the scan mode of already existing nodes, and this is a
problem for backends that use discovery, since the discovery will add
the nodes with the default automatic scan mode.

This patch makes sure that we always try to change the mode to manual
scans and only reports scans to be automatic if the initiator doesn't
support this feature.

TrivialFix

Change-Id: Id9e2f539d5273a20038399d9af6017652a4f987f
2017-06-29 13:41:13 +02:00
Gorka Eguileor 1905398f61 Fix iSCSI cleanup issue when using discovery
With the latest iSCSI disconnect refactoring [1] we have introduced a
regression where not all devices are properly cleaned up on a multipath
connection if we are using target discovery.

On connection we do the discovery so we get the targets, but on
disconnection  the code is expecting to have all the ips, iqns, and luns
information on the connection properties, so it will only flush the
multipath and disconnect volumes that were connected using the
information from the connection property.

This patch fixes this retrieving the targets information from the
discoverydb that was gathered during the connection phase.

Output from discoverydb query looks like this:

 SENDTARGETS:
 DiscoveryAddress: 192.168.1.33,3260
 DiscoveryAddress: 192.168.1.2,3260
 Target: iqn.2004-04.com.qnap:ts-831x:iscsi.cinder-20170531114245.9eff88
     Portal: 192.168.1.3:3260,1
         Iface Name: default
     Portal: 192.168.1.2:3260,1
         Iface Name: default
 Target: iqn.2004-04.com.qnap:ts-831x:iscsi.cinder-20170531114447.9eff88
     Portal: 192.168.1.3:3260,1
         Iface Name: default
     Portal: 192.168.1.2:3260,1
         Iface Name: default
 DiscoveryAddress: 192.168.1.38,3260
 iSNS:
 No targets found.
 STATIC:
 No targets found.
 FIRMWARE:
 No targets found.

[1] https://review.openstack.org/455392

Change-Id: Iada5d4fbeb07aeaf3afb953a289b6b89778c382c
Closes-Bug: #1699061
2017-06-20 15:57:22 +02:00
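
Given discoverydb output shaped like the sample in 1905398f61, the targets recorded under one discovery address could be extracted roughly as below. This is a sketch, not os-brick's implementation: the function name is hypothetical, and it assumes section headers are single words ending in a colon and that the address is printed as `ip,port` (the detail the follow-up fix be37c2e040 corrects).

```python
import re


def targets_for_discovery_address(output: str, ip: str, port: str):
    """Return (portal, iqn) pairs listed under 'DiscoveryAddress: ip,port'."""
    results = []
    in_section = False
    iqn = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("DiscoveryAddress:"):
            # The discoverydb prints the address as "ip,port", not "ip:port".
            in_section = line.split(":", 1)[1].strip() == f"{ip},{port}"
            iqn = None
        elif re.fullmatch(r"[A-Za-z]+:", line):
            # Section header such as SENDTARGETS:, iSNS:, STATIC:, FIRMWARE:
            in_section = False
        elif in_section and line.startswith("Target:"):
            iqn = line.split(":", 1)[1].strip()
        elif in_section and line.startswith("Portal:") and iqn:
            portal = line.split(":", 1)[1].strip().split(",")[0]
            results.append((portal, iqn))
    return results
```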
Gorka Eguileor f67d46c538 Add open-iscsi manual scan support
The functionality to disable automatic LUN scans on iscsid start, on
login, and on reception of AEN/AER messages reporting that LUN data has
changed was recently added to open-iscsi.

Those 3 cases were among the reasons why Nova-CPU and Cinder-Volumes
nodes would have unexpected devices.  With this new feature we can
prevent them from appearing unexpectedly.

This patch adds the mechanism required to configure our sessions for
manual scans in a backward compatible way.

Manual scans are enabled by setting `node.session.scan` to `manual`.

Change-Id: I146a74f9f79c68a89677b9b26a324e06a35886f2
2017-06-16 16:09:35 +02:00
Gorka Eguileor 56c8665d3d Refactor iSCSI connect
This patch refactors iSCSI connect code changing the approach to one
that relies primarily on sysfs, instead of CLI tools, to retrieve all
the required information: devices from the connection, multipath system
device name, multipath name, the WWN for the block devices...

By doing so, not only do we fix a good number of bugs, but we also
improve the reliability and speed of the mechanism.

Some of the improvements and benefits achieved by this patch are:

- Clean all leftovers on exceptions on a connection.

- Parallelize logins on multipath to increase speed on flaky network
  connections.

- As long as there is at least one good path when working with multipath
  we will return a multipath device instead of a single path device,
  which helps with temporary flaky connections.

- Don't use the rescan retry parameter as log in retry on multipath
  connections so both single and multipath cases are consistent.

- We no longer rely on just one device to get the wwn and look for the
  multipath.  This would be problematic with flaky connections.

- No more querying iSCSI devices for their WWN (page 0x83) removing
  delays and issues on flaky connections.

- A device that exists but is not accessible is no longer a problem for
  the mechanism.

- We use links in `/dev/disk/by-id` to get the WWID on connect, so we
  make sure there are no leftovers on disconnect, but we no longer use
  symlinks from `/dev/disk/by-path`, `/dev/disk/by-id`, or `/dev/mapper`
  to find devices.

- We no longer need to rely on the WWN to determine the multipath, we
  have the session and the LUN, so we trace the devices and from those
  we get if they belong to a multipath.

- Stop polluting the logs with unnecessary exceptions from checking if
  the node or session exist.

- Action retries will now only log the final exception instead of
  logging all the exceptions.

- Warn when a multipath could not be formed and a single device is being
  used, as performance will be degraded.

- We no longer do global rescans on single connections that could be
  bringing in unrelated and unwanted devices (`iscsiadm -T iqn -p portal
  --rescan`).

- Fix scan mechanism that would not request all scans when the same iqn
  was shared between portals and could send a scan request to the wrong
  IP if they shared the same iqn.

- When doing single pathed connections we could end with a multipath
  because we didn't clean up unfruitful connections.

It's worth mentioning that this patch does not touch the extend
operation.

Change-Id: Ia1c47bfaa7bc3544a5feba6a8a30faf2f132b8d7
2017-06-16 16:09:35 +02:00
Gorka Eguileor 400ca5d6db Refactor iSCSI disconnect
This patch refactors iSCSI disconnect code changing the approach to one
that just uses `iscsiadm -m session` and sysfs to get all the required
information: devices from the connection, multipath system device name,
multipath name, the WWN for the block devices...

By doing so, not only do we fix a good number of bugs, but we also
improve the reliability and speed of the mechanism.

Some of the improvements and benefits achieved by this patch are:

- Common code for multipath and single path disconnects.

- No more querying iSCSI devices for their WWN (page 0x83) removing
  delays and issues on flaky connections.

- All devices are properly cleaned even if they are not part of the
  multipath.

- We wait for device removal and do it in parallel if there are
  multiple.

- Removed usage of `multipath -l` to find devices which is really slow
  with flaky connections and didn't work when called with a device from
  a path that is down.

- Prevent losing data when detaching, currently if the multipath flush
  fails for any other reason than "in use" we silently continue with the
  removal.  That is the case when all paths are momentarily down.

- Adds a new mechanism for the caller of the disconnect to specify that
  it's acceptable to lose data and that it's more important to leave a
  clean system.  That is the case if we are creating a volume from an
  image, since the volume will just be set to error, but we don't want
  leftovers.  Optionally we can tell os-brick to ignore errors and not
  raise an exception if the flush fails.

- Add a warning when we could be leaving leftovers behind due to
  disconnect issues.

- Action retries (like multipath flush) will now only log the final
  exception instead of logging all the exceptions.

- Flushes of individual paths now use exponential backoff retries
  instead of random retries between 0.2 and 2 seconds (from oslo
  library).

- We no longer use symlinks from `/dev/disk/by-path`, `/dev/disk/by-id`,
  or `/dev/mapper` to find devices or multipaths, as they could be
  leftovers from previous runs.

- With high failure rates (above 30%) some CLI calls will enter into a
  weird state where they wait forever, so we add a timeout mechanism in
  our `execute` method and add it to those specific calls.

Closes-Bug: #1502534
Change-Id: I058ff0a0e5ad517507dc3cda39087c913558561d
2017-05-31 15:31:20 +02:00
Walter A. Boring IV e9f318e9b6 Mask logging of connection info for iSCSI connector
The iSCSI Connector object could possibly log CHAP passwords
to the log file.  This patch uses the oslo strutils to mask out
any passwords that may get logged.

Change-Id: I3496377874bf5820afd919923282c846a956ef67
2017-04-17 21:15:39 +00:00
Gorka Eguileor a6e789f27e Fix iSCSI multipath rescan
iSCSI multipath rescan uses iscsiadm --rescan option for nodes and
sessions, which can end up recreating devices that had just been removed
if there's a race condition between the removal of a SCSI device and the
connection of a volume.

The race condition happens if a rescan done when attaching happens right
between us removing the path and removing the exported lun, because the
rescan will add not only the new path we are attaching, but the old path
we are removing, since the lun still hasn't been removed.

This would leave orphaned devices that unnecessarily pollute our
environment.

This patch narrows the rescan to only rescan for the specific target id,
channel, and lun number if we can find this information.

When we cannot find this information we do the scan as we were doing it
before.

Closes-Bug: #1664032
Change-Id: I1b3bd34db260165a6ea9ca061f946d6dfcf8553f
2017-02-27 16:56:28 +01:00
Patrick East e591bc78cc Stop calling multipath -r when attaching/detaching iSCSI volumes
Looking into this more there isn't any documented reason why we do this,
and on Ubuntu 16.04 there are issues with timing and devices/symlinks
getting messed up when we do the reload of device maps. We shouldn't
need to be forcing multipathd to do this, it loads devices on its own.

We'll leave in the one in 'wait_for_rw(..)' for now because there is
some evidence that you may need to call it to update the rw state of
the multipath devices, see:
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/ch37s04s02.html

Change-Id: Iec58284abdc9bcbf99df5d07289bb9d60a3554d7
Closes-Bug: #1623700
2016-09-21 15:32:45 -07:00
Michael Price cef2880afe Fix iSCSI discovery with ISER transport
The iSCSI discovery is not working with the ISER transport type. By default,
iscsiadm will use the tcp transport type. This patch changes the iscsiadm
invocation to provide the '-I' option with a value of iser where
ISER is in use, and a value of default otherwise.

Closes-Bug: #1594892
Change-Id: I1f0e01c4c2e928022080cfc0475fb5190066df23
2016-08-08 17:16:22 +00:00
Gorka Eguileor 45184cb9b0 Fix the mocking mess
Our tests' mocking is a little messy: we have tests that start a mock
but don't stop it, others that try to stop it but use a mock object
instead of a patcher object, so they don't stop it, and we keep using
stopall to stop the patches instead of individual stops, which we've
seen in Cinder can be problematic.

This patch fixes these issues, most of them just by using the base
TestCase mock_object method, and some by calling the right stop method
(i.e., VolumeEncryptorTestCase).

Change-Id: I545abfa8749e778e923c37e0b908efc578f70619
2016-08-03 18:40:42 +02:00
Kendall Nelson c5e3d8affb Splitting Out Connectors from connector.py
This is a larger refactor of the connector.py file. The goal is to
simplify the file by moving the vendor connector classes to their own
files, and keep only the InitiatorConnector in the connector.py file.
The vendor specific connector tests are also split out into their own
files.

Change-Id: I020e75ca8cd8bec2ad1b38f3ade5cc1f63a4fee5
Implements: bp connector-refactor
2016-08-02 15:54:15 -05:00