Commit Graph

78 Commits

Author SHA1 Message Date
Jianjian Huo a53270a15a swift-manage-shard-ranges repair: check for parent-child overlaps.
Stuck shard ranges have been seen in production; the root cause was
traced to s-m-s-r failing to detect parent-child relationships in
overlaps, so it either shrank child shard ranges into their parents or
the other way around. A patch has been added to check a minimum age
before s-m-s-r performs repairs, which will most likely prevent this
from happening again, but we also need to check explicitly for
parent-child relationships in overlaps during repairs. This patch does
that: it removes parent or child shard ranges from the donors and
prevents s-m-s-r from shrinking them into acceptor shard ranges.
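
Conceptually, the donor filtering amounts to something like this sketch
(filter_donors and is_parent_child are hypothetical names, not the
actual s-m-s-r code):

    def filter_donors(acceptor, donors, is_parent_child):
        # is_parent_child(a, b) is a hypothetical predicate that returns
        # True when one shard range is an ancestor of the other; the real
        # repair logic lives in swift-manage-shard-ranges.
        kept, excluded = [], []
        for donor in donors:
            if is_parent_child(acceptor, donor):
                excluded.append(donor)  # never shrink a parent/child into the acceptor
            else:
                kept.append(donor)
        return kept, excluded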

Drive-by 1: fixup gap repair probe test.
The probe test is no longer appropriate because we're no longer
allowed to repair parent-child overlaps, so replace the test with a
manually created gap.

Drive-by 2: address probe test TODOs.
The commented assertion would fail because the node filtering
comparison failed to account for the same node having different indexes
when generated for the root versus the shard. Adding a new iterable
function filter_nodes makes the node filtering behave as expected.

Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>

Change-Id: Iaa89e94a2746ba939fb62449e24bdab9666d7bab
2022-09-09 11:04:43 -07:00
Pete Zaitcev 85d0211279 Get rid of port to node assumptions and their modulo kludges
We have had get_server_number for many years now; there's no reason
to continue using the node_id=port%100//10 thing.

Change-Id: I5357a095110e8e4889c0468c154611209c6e8c07
2021-09-30 00:42:24 -05:00
Pete Zaitcev bcff1282b5 Band-aid and test the crash of the account server
We have a better fix in the works; see the change
Ic53068867feb0c18c88ddbe029af83a970336545. But it is
taking too long to coalesce, and users are unhappy right now.

Related: rhbz#1838242, rhbz#1965348
Change-Id: I3f7bfc2877355b7cb433af77c4e2dfdfa94ff14d
2021-08-12 16:26:48 -05:00
Alistair Coles 40aace89f0 Capture logs when running custom daemons in probe tests
Previously a DebugLogger was used when probe tests ran 'custom
daemons', which provided a means to inspect captured logs but logged
to console. This patch replaces that DebugLogger with a 'normal' swift
logger that is adapted to also capture logs and support log
inspection.

Change-Id: I25da3aa81018c5de7b63e5584ac6a9dbb73243db
2021-06-24 09:32:38 +01:00
Alistair Coles 46ea3aeae8 Quarantine stale EC fragments after checking handoffs
If the reconstructor finds a fragment that appears to be stale then it
will now quarantine the fragment.  Fragments are considered stale if
insufficient fragments at the same timestamp can be found to rebuild
missing fragments, and the number found is less than or equal to a new
reconstructor 'quarantine_threshold' config option.

Before quarantining a fragment the reconstructor will attempt to fetch
fragments from handoff nodes in addition to the usual primary nodes.
The handoff requests are limited by a new 'request_node_count'
config option.

'quarantine_threshold' defaults to zero, i.e. no fragments will be
quarantined. 'request_node_count' defaults to '2 * replicas'.

Closes-Bug: 1655608

Change-Id: I08e1200291833dea3deba32cdb364baa99dc2816
2021-05-10 20:45:17 +01:00
Alistair Coles 122840cc04 probe test: use helper functions more widely
Use the recently added assert_subprocess_success [1] helper function
more widely.

Add run_custom_sharder helper.

Add container-sharder key to the ProbeTest.configs dict.

[1] Related-Change: I9ec411462e4aaf9f21aba6c5fd7698ff75a07de3

Change-Id: Ic2bc4efeba5ae5bc8881f0deaf4fd9e10213d3b7
2021-04-08 12:18:40 +01:00
Alistair Coles 6ed82b106c Run garbage collector during probe test setUp
DatabaseBrokers cache opened connections.  If a probe test
instantiates a DatabaseBroker, or any other class that in turn
instantiates a DatabaseBroker, such as a ContainerSharder, then
connections may hold db files open until the DatabaseBroker is garbage
collected. This can cause subsequent probe tests to fail during their
setUp() because resetswift is unable to unmount device directories
while db files are open.

A call to gc.collect() is added during setUp() to ensure db files are
closed before resetswift() is called.
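
A minimal sketch of the change (ExampleProbeTest is illustrative, not
the real ProbeTest class):

    import gc
    import unittest

    class ExampleProbeTest(unittest.TestCase):
        def setUp(self):
            # Collect any DatabaseBroker instances left over from a previous
            # test so their cached connections (and the db files they hold
            # open) are closed before resetswift tries to unmount devices.
            gc.collect()
            # ... resetswift() and the rest of the fixture would follow here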

Closes-Bug: 1917050
Change-Id: Ifda4407c9ecff4c636fe07e013c3ebcebd0df018
2021-02-26 15:51:06 +00:00
Alistair Coles 1dceafa7d5 ssync: sync non-durable fragments from handoffs
Previously, ssync would neither sync nor clean up non-durable data
fragments on handoffs. When the reconstructor is syncing objects from
a handoff node (a 'revert' reconstructor job) it may be useful, and is
not harmful, to also send non-durable fragments if the receiver has
older or no fragment data.

Several changes are made to enable this. On the sending side:

  - For handoff (revert) jobs, the reconstructor instantiates
    SsyncSender with a new 'include_non_durable' option.
  - If configured with the include_non_durable option, the SsyncSender
    calls the diskfile yield_hashes function with options that allow
    non-durable fragments to be yielded.
  - The diskfile yield_hashes function is enhanced to include a
    'durable' flag in the data structure yielded for each object.
  - The SsyncSender includes the 'durable' flag in the metadata sent
    during the missing_check exchange with the receiver.
  - If the receiver requests the non-durable object, the SsyncSender
    includes a new 'X-Backend-No-Commit' header when sending the PUT
    subrequest for the object.
  - The SsyncSender includes the non-durable object in the collection
    of synced objects returned to the reconstructor so that the
    non-durable fragment is removed from the handoff node.

On the receiving side:

  - The object server includes a new 'X-Backend-Accept-No-Commit'
    header in its response to SSYNC requests. This indicates to the
    sender that the receiver has been upgraded to understand the
    'X-Backend-No-Commit' header.
  - The SsyncReceiver is enhanced to consider non-durable data when
    determining if the sender's data is wanted or not.
  - The object server PUT method is enhanced to check for an
    'X-Backend-No-Commit' header before committing a diskfile.

If a handoff sender has both a durable and newer non-durable fragment
for the same object and frag-index, only the newer non-durable
fragment will be synced and removed on the first reconstructor
pass. The durable fragment will be synced and removed on the next
reconstructor pass.
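
On the receiving side, the effect of the new header amounts to
something like this sketch (finish_put and the writer object are
illustrative stand-ins; only the header name comes from this change):

    def finish_put(headers, writer):
        # 'writer' stands in for a diskfile writer with put()/commit()
        # methods; the X-Backend-No-Commit header name is from this
        # change, the rest is illustrative.
        writer.put()             # write the (non-durable) fragment data
        if headers.get('X-Backend-No-Commit', 'false').lower() not in ('true', '1'):
            writer.commit()      # mark the fragment durable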

Change-Id: I1d47b865e0a621f35d323bbed472a6cfd2a5971b
Closes-Bug: 1778002
2021-01-20 12:00:10 +00:00
Alistair Coles 128f199508 Refactor reconstructor probe tests
Refactor the reconstructor probe test to share common setup and helper
methods.

Change-Id: If75803648169f85b854c3d5d8784aaebbd93805b
2021-01-11 13:57:55 +00:00
Samuel Merritt b971280907 Let developers/operators add watchers to object audit
Swift operators may find it useful to operate on each object in their
cluster in some way. This commit provides them a way to hook into the
object auditor with a simple, clearly-defined boundary so that they
can iterate over their objects without additional disk IO.

For example, a cluster operator may want to ensure semantic
consistency, with all SLO segments accounted for in their manifests,
or to locate objects that aren't in container listings. Now that Swift
has encryption support, this could be used to locate unencrypted
objects. The list goes on.

This commit makes the auditor locate, via entry points, the watchers
named in its config file.

A watcher is a class with at least these four methods:

   __init__(self, conf, logger, **kwargs)

   start(self, audit_type, **kwargs)

   see_object(self, object_metadata, data_file_path, **kwargs)

   end(self, **kwargs)

The auditor will call watcher.start(audit_type) at the start of an
audit pass, watcher.see_object(...) for each object audited, and
watcher.end() at the end of an audit pass. All method arguments are
passed as keyword args.
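
For illustration, a minimal watcher that just counts objects could
look like this (ExampleWatcher is not part of the patch; only the
four-method interface above is):

    class ExampleWatcher(object):
        # The four-method interface matches the list above; the counting
        # logic is purely illustrative.
        def __init__(self, conf, logger, **kwargs):
            self.logger = logger
            self.seen = 0

        def start(self, audit_type, **kwargs):
            self.seen = 0
            self.logger.info('starting %s audit pass', audit_type)

        def see_object(self, object_metadata, data_file_path, **kwargs):
            self.seen += 1

        def end(self, **kwargs):
            self.logger.info('audit pass saw %d objects', self.seen)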

This version of the API is implemented in the context of the
auditor itself, without spawning any additional processes.
If the plugins are not working well -- hanging, crashing, or leaking --
it's easier to debug them when there's no additional complication
of processes that run by themselves.

In addition, we include a reference implementation of a plugin for
the watcher API as a help to plugin writers.

Change-Id: I1be1faec53b2cdfaabf927598f1460e23c206b0a
2020-12-26 17:16:14 -06:00
Ade Lee 5320ecbaf2 replace md5 with swift utils version
md5 is not an approved algorithm in FIPS mode, and trying to
instantiate a hashlib.md5() will fail when the system is running in
FIPS mode.

md5 is allowed when in a non-security context.  There is a plan to
add a keyword parameter (usedforsecurity) to hashlib.md5() to annotate
whether or not the instance is being used in a security context.

In the case where it is not, the instantiation of md5 will be allowed.
See https://bugs.python.org/issue9216 for more details.

Some downstream python versions already support this parameter.  To
support these versions, a new encapsulation of md5() is added to
swift/common/utils.py.  This encapsulation is identical to the one being
added to oslo.utils, but is recreated here to avoid adding a dependency.
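
The wrapper has roughly this shape (a sketch, not the exact code added
to swift/common/utils.py):

    import hashlib

    def md5(string=b'', usedforsecurity=True):
        # Pass the usedforsecurity annotation through where hashlib
        # supports it, and fall back to plain hashlib.md5 otherwise.
        try:
            return hashlib.md5(string, usedforsecurity=usedforsecurity)  # nosec
        except TypeError:
            # this interpreter's hashlib.md5 has no usedforsecurity parameter
            return hashlib.md5(string)  # nosec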

This patch is to replace the instances of hashlib.md5() with this new
encapsulation, adding an annotation indicating whether the usage is
a security context or not.

While this patch seems large, it is really just the same change over
and over again.  Reviewers need to pay particular attention to whether
the keyword parameter (usedforsecurity) is set correctly.  Right now,
none of them appears to be used in a security context.

Now that all the instances have been converted, we can update the bandit
run to look for these instances and ensure that new invocations do not
creep in.

With this latest patch, the functional and unit tests all pass
on a FIPS enabled system.

Co-Authored-By: Pete Zaitcev
Change-Id: Ibb4917da4c083e1e094156d748708b87387f2d87
2020-12-15 09:52:55 -05:00
Tim Burke 5bd95cf2b7 probe tests: Get rid of `server` arg for device_dir() and storage_dir()
It's not actually *used* anywhere.

Change-Id: I8f9b5cf7f5749481ef391a2029b0c4263443a89b
2020-07-16 13:50:58 -07:00
Tim Burke 630c9ef809 probe tests: Work when fronted by a TLS terminator
* Add a new config option, proxy_base_url
* Support HTTPS as well as HTTP connections
* Monkey-patch eventlet early so we never import an unpatched version
  from swiftclient

Change-Id: I4945d512966d3666f2738058f15a916c65ad4a6b
2020-05-04 10:54:01 -07:00
Clay Gerrard 2759d5d51c New Object Versioning mode
This patch adds a new object versioning mode. This new mode provides
a new set of APIs for users to interact with older versions of an
object. It also changes the naming scheme of older versions and adds
a version-id to each object.

This new mode is not backwards compatible or interchangeable with the
other two modes (i.e., stack and history), especially due to the changes
in the naming scheme of older versions. This new mode will also serve
as a foundation for adding S3 versioning compatibility in the s3api
middleware.

Note that this does not (yet) support using a versioned container as
a source in container-sync. Container sync should be enhanced to sync
previous versions of objects.

Change-Id: Ic7d39ba425ca324eeb4543a2ce8d03428e2225a1
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Co-Authored-By: Tim Burke <tim.burke@gmail.com>
Co-Authored-By: Thiago da Silva <thiagodasilva@gmail.com>
2020-01-24 17:39:56 -08:00
Clay Gerrard 698717d886 Allow internal clients to use reserved namespace
Reserve the namespace starting with the NULL byte for internal
use-cases.  Backend services will allow path names to include the NULL
byte in urls and validate names in the reserved namespace.  Database
services will filter all names starting with the NULL byte from
responses unless the request includes the header:

    X-Backend-Allow-Reserved-Names: true

The proxy server will not allow path names to include the NULL byte in
urls unless a middleware has set the X-Backend-Allow-Reserved-Names
header.  Middlewares can use the reserved namespace to create objects
and containers that cannot be directly manipulated by clients.  Any
objects and bytes created in the reserved namespace will be aggregated
to the user's account totals.

When deploying internal proxies, developers and operators may configure
the gatekeeper middleware to translate the X-Allow-Reserved-Names header
to the Backend header so they can manipulate the reserved namespace
directly through the normal API.
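
The listing-side behaviour amounts to something like this sketch
(filter_listing is an illustrative name, not the actual database-server
code):

    RESERVED = '\x00'   # reserved-namespace names start with a NULL byte

    def filter_listing(names, req_headers):
        # Hide reserved-namespace entries from listings unless the request
        # carries the backend override header.
        if req_headers.get('X-Backend-Allow-Reserved-Names', '').lower() == 'true':
            return list(names)
        return [name for name in names if not name.startswith(RESERVED)]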

UpgradeImpact: it's not safe to rollback from this change

Change-Id: If912f71d8b0d03369680374e8233da85d8d38f85
2019-11-27 11:22:00 -06:00
Tim Burke 1d7e1558b3 py3: (mostly) port probe tests
There's still one problem, though: since swiftclient on py3 doesn't
support non-ASCII characters in metadata names, none of the tests in
TestReconstructorRebuildUTF8 will pass.

Change-Id: I4ec879ade534e09c3a625414d8aa1f16fd600fa4
2019-09-04 10:17:45 -07:00
Clay Gerrard 771963c926 Increase node_timeout in gate
Give storage nodes more time to complete requests for multi-node upgrade
and probetests.

Also slightly decouple probetests from default configs.

Change-Id: I334ef517d833916a3b7be3151a812d4f9c66a6e1
2019-02-12 10:39:17 -06:00
Alistair Coles 9d742b85ad Refactoring, test infrastructure changes and cleanup
...in preparation for the container sharding feature.

Co-Authored-By: Matthew Oliver <matt@oliver.net.au>
Co-Authored-By: Tim Burke <tim.burke@gmail.com>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>

Change-Id: I4455677abb114a645cff93cd41b394d227e805de
2018-05-15 18:18:25 +01:00
Alistair Coles 1f4ebbc990 kill orphans during probe test setup
orphan processes sometimes cause probe test failures, so
get rid of them before each test.

Change-Id: I4ba6748d30fbb28371f13aa95387c49bc8223402
2018-02-08 16:43:18 -08:00
Clay Gerrard 1d5cf3e730 add symlink to probetest for reconciler
Change-Id: Ib2c5616f2965ab92b1c76d573e869206c91464c6
2017-12-14 12:16:39 -08:00
Steve Kowalik 5a06e3da3b No longer import nose
Since Python 2.7, unittest in the standard library has included multiple
facilities for skipping tests, via decorators as well as an exception.
Switch to using those directly, rather than importing nose.
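
For reference, the standard-library facilities now used directly look
like this (ExampleTest is illustrative):

    import sys
    import unittest

    class ExampleTest(unittest.TestCase):
        @unittest.skipIf(sys.version_info < (2, 7), 'needs unittest skip support')
        def test_skipped_by_decorator(self):
            pass

        def test_skipped_by_exception(self):
            raise unittest.SkipTest('skipping from inside the test body')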

Change-Id: I4009033473ea24f0d0faed3670db844f40051f30
2017-11-07 15:39:25 +11:00
Clay Gerrard feee399840 Use check_drive consistently
We added check_drive to the account/container servers to unify how all
the storage wsgi servers treat device dirs/mounts.  This pushes that
unification down into the consistency engine.

Drive-by:
 * use FakeLogger less
 * clean up some repetition in the probe utility for device re-"mounting"

Related-Change-Id: I3362a6ebff423016bb367b4b6b322bb41ae08764
Change-Id: I941ffbc568ebfa5964d49964dc20c382a5e2ec2a
2017-11-01 16:33:40 +00:00
Kota Tsuyuzaki 1e79f828ad Remove all post_as_copy related code and configs
It was deprecated, and we discussed this topic at the Denver PTG
for the Queens cycle. The main motivation for this work is that the
deprecated post_as_copy option and its gate block future symlink work.

Change-Id: I411893db1565864ed5beb6ae75c38b982a574476
2017-09-16 05:50:41 +00:00
Thiago da Silva d0bfd036af ready yet? nope, please wait!
Related-Change: Iab923c4f48ac7a5dd41237761ed91d01a59dc77c

Change-Id: Id4e17569e9ec856663e1539eaf72872296698367
Signed-off-by: Thiago da Silva <thiago@redhat.com>
2017-07-25 17:12:11 -04:00
Tim Burke 675145ef4a Remove deprecated vm_test_mode option
This was deprecated in the 2.5.0 release (i.e. Liberty cycle), and we've
been warning about it ever since. A year and a half seems like a long
enough time.

Change-Id: I5688e8f7dedb534071e67d799252bf0b2ccdd9b6
Related-Change: Iad91df50dadbe96c921181797799b4444323ce2e
2017-05-25 13:02:42 -07:00
Clay Gerrard d062af836c DRY out probe.common
Specifically, to facilitate reuse of the retry check server
function to fill in the creds for the test2 account, which is required
for probetests after the related change.

Change-Id: I9729faa4c8c8d6d65a481bc2ea3f0566d511034c
Related-Change: I8d503419b7996721a671ed6b2795224775a7d8c6
2016-09-14 10:12:38 -07:00
Alistair Coles f679ed0cc8 Make container sync copy SLO manifests
Currently the container sync daemon fails to copy
an SLO manifest, and the error will stall progress
of the sync process on that container. There are
several reasons why the sync of an SLO manifest
may fail:

1. The GET of the manifest from the source
   container returns an X-Static-Large-Object header
   that is not allowed to be included with a PUT
   to the destination container.

2. The format of the manifest object that is read
   from the source is not in the syntax required
   for an SLO manifest PUT.

3. Assuming 2 were fixed, the PUT of the manifest
   includes an ETag header which will not match the
   md5 of the manifest generated by the receiving
   proxy's SLO middleware.

4. If the manifest is being synced to a different
   account and/or cluster, then the SLO segments may
   not have been synced and so the validation of the
   PUT manifest will fail.

This patch addresses all of these obstacles by
enabling the destination container-sync middleware to
cause the SLO middleware to be bypassed by setting a
swift.slo_override flag in the request environ. This
flag is only set for requests that have been validated
as originating from a container sync peer.

This is justified by noting that an SLO manifest PUT from
a container sync peer can be assumed to have valid syntax
because it has already been validated when written to
the source container.

Furthermore, we must allow SLO manifests to be synced
without requiring the semantic of their content to be
re-validated because we have no way to enforce or check
that segments have been synced prior to the manifest, nor
to check that the semantic of the manifest is still valid
at the source.

This does mean that GETs to synced SLO manifests may fail
if segments have not been synced. This is however
consistent with the expectation for synced DLO manifests
and indeed for the source SLO manifest if segments have
been deleted since it was written.
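
The bypass amounts to something like this sketch (the helper callables
are illustrative; only the swift.slo_override environ key is from this
change):

    def handle_manifest_put(env, validate_and_put, passthrough_put):
        # Skip SLO validation when the container-sync middleware has
        # flagged the request as coming from a validated sync peer.
        if env.get('swift.slo_override'):
            return passthrough_put(env)
        return validate_and_put(env)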

Co-Authored-By: Oshrit Feder <oshritf@il.ibm.com>
Change-Id: I8d503419b7996721a671ed6b2795224775a7d8c6
Closes-Bug: #1605597
2016-09-14 13:32:00 +01:00
Kota Tsuyuzaki 95a5a4a7ec Don't run probe tests if resetswift failed
The probe tests clean up the swift environment for each test in the setUp
method. However, the probe tests will run even if we cannot use the
resetswift script for some reason (e.g. not permitted, or the script not
found), and will then probably fail only after a long execution time.

To prevent such an unfortunate situation and to make the reason easy to
find, this patch adds an exit code check for "resetswift": if it fails,
the test raises AssertionError with the captured stdout and stderr.
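
A sketch of the kind of check added (details here are illustrative):

    import subprocess

    def resetswift():
        # Fail fast, with output attached, if the resetswift script
        # exits non-zero.
        proc = subprocess.Popen(['resetswift'], stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE)
        stdout, stderr = proc.communicate()
        if proc.returncode:
            raise AssertionError(
                'resetswift failed with %s: stdout=%r stderr=%r'
                % (proc.returncode, stdout, stderr))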

Closes-Bug: #1613494

Change-Id: Id80d56ab6b71402ead4fe22c120064d78c1e74ac
2016-08-16 18:02:58 -07:00
Tim Burke 6b0e9a3e24 Remove unused (but defaulted) args
Every time we call start_server, check is True.
Every time we call check_server, we use the default timeout.

Change-Id: Id38182f15bcbfbb145b57cee179a8fd47ec8e2b7
2016-06-02 16:49:32 +00:00
Kota Tsuyuzaki e56a1a550a pids in probe is no longer used
Change-Id: I1fd76004257a8c05ce8bb1f3ca0e45000509f833
2016-06-01 23:53:35 -07:00
Jenkins 2a0935e9e3 Merge "Send correct size in POST async update for EC object" 2016-06-01 22:15:31 +00:00
Tim Burke a821dd42de Don't include holes when reporting how many devices a ring has
Change-Id: I9b933051aec009c6108ee9d2dd5c0978772bf699
2016-05-26 13:42:12 -07:00
Alistair Coles c1b1a5a0ee Send correct size in POST async update for EC object
When a PUT request is made to an EC object the resulting container
update must include the override values for the actual object
etag and size, as opposed to the fragment etag and size. When a POST
request is made the same override values should be included in the
container update, but currently the update includes the incorrect EC
fragment size (but the correct body etag).

This is ok so long as the update for the object PUT request arrives at
the container server first (whether by direct update or replication)
because the etag and size values in an update due to an object POST
will not have a newer timestamp than the PUT and will therefore be
ignored at the container server.

However, if the update due to the object PUT request has not arrived
at the container server when the update due to the object POST
arrives, then the etag and incorrect size sent with the POST update
will be recorded in the container server. If the update due to the PUT
subsequently arrives it will not fix this error because the timestamp
of its etag and size values is not greater than that of the already
recorded values.

Fortunately the correct object body size is persisted with the object
as X-Backend-Container-Update-Override-Size sysmeta so this patch
fixes the container update due to a POST to use that value instead of
the Content-Length metadata.
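
A sketch of the size selection described above (container_update_size
is an illustrative helper, not the actual code path):

    def container_update_size(metadata):
        # Prefer the whole-object size persisted as sysmeta over the EC
        # fragment's Content-Length when building the container update
        # for a POST.
        override = metadata.get('X-Backend-Container-Update-Override-Size')
        return override if override is not None else metadata['Content-Length']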

Closes-Bug: #1582723
Change-Id: Ide7c9c59eb41aa09eaced2acfd0700f882c6eab1
2016-05-17 15:00:21 +01:00
Alistair Coles e91de49d68 Update container on fast-POST
This patch makes a number of changes to enable content-type
metadata to be updated when using the fast-POST mode of
operation, as proposed in the associated spec [1].

* the object server and diskfile are modified to allow
  content-type to be updated by a POST and the updated value
  to be stored in .meta files.

* the object server accepts PUTs and DELETEs with older
  timestamps than existing .meta files. This is to be
  consistent with replication that will leave a later .meta
  file in place when replicating a .data file.

* the diskfile interface is modified to provide accessor
  methods for the content-type and its timestamp.

* the naming of .meta files is modified to encode two
  timestamps when the .meta file contains a content-type value
  that was set prior to the latest metadata update; this
  enables consistency to be achieved when rsync is used for
  replication.

* ssync is modified to sync meta files when content-type
  differs between local and remote copies of objects.

* the object server issues container updates when handling
  POST requests, notifying the container server of the current
  immutable metadata (etag, size, hash, swift_bytes),
  content-type with their respective timestamps, and the
  mutable metadata timestamp.

* the container server maintains the most recently reported
  values for immutable metadata, content-type and mutable
  metadata, each with their respective timestamps, in a single
  db row.

* new probe tests verify that replication achieves eventual
  consistency of containers and objects after discrete updates
  to content-type and mutable metadata, and that container-sync
  syncs objects after fast-post updates.

[1] spec change-id: I60688efc3df692d3a39557114dca8c5490f7837e

Change-Id: Ia597cd460bb5fd40aa92e886e3e18a7542603d01
2016-03-03 14:25:10 +00:00
Christian Schwede c30ceec6f1 Fix ring device checks in probetests
If a device has been removed from one of the rings, it is actually set to None
within the ring. In that case the device count is only correct if the None
devices are filtered out. However, if the count matched the condition but
included a removed device, the probetests would fail with a TypeError.

This fix could also be made in swift/common/ring/ring.py, but it seems to
affect only probetests right now, so it is fixed there without changing the
current behavior.
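
The probe-test check boils down to something like this sketch
(present_devices is an illustrative helper, assuming the ring exposes
its device list as ring.devs):

    def present_devices(ring):
        # Removed devices show up as None in the ring's device list, so
        # filter them out before counting or indexing.
        return [dev for dev in ring.devs if dev is not None]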

Change-Id: I8ccf9b32a51957e040dd370bc9f711d4328d17b1
2015-10-07 19:59:15 +00:00
Romain LE DISEZ 71f6fd025e Allow configuring the rsync modules where the replicators will send data
Currently, the rsync module where the replicators send data is static. It
prevents administrators from setting the rsync configuration based on their
current deployment or needs.

As an example, the rsyncd configuration example encourages setting a
connection limit for the account, container and object modules. This
protects devices from excessive parallel connections, which would hurt
performance.

On a server with many devices, it is tempting to increase this number
proportionally, but nothing guarantees that the distribution of the
connections will be balanced. In the worst scenario, a single device can
receive all the connections, which severely impacts performance.

This commit adds a new option named 'rsync_module' to the *-replicator sections
of the *-server configuration file. This configuration variable can be
extrapolated with device attributes like ip, port, device, zone, ... by using
the format {NAME}. eg:
    rsync_module = {replication_ip}::object_{device}

With this configuration, an administrator can solve the problem of
connection distribution by creating one module per device in the rsyncd
configuration.
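
Conceptually, the extrapolation is a plain template substitution over
the device dict; a sketch (expand_rsync_module is an illustrative name,
not the actual helper):

    def expand_rsync_module(template, device):
        # Substitute device attributes into the rsync_module template,
        # e.g. '{replication_ip}::object_{device}' with
        # {'replication_ip': '10.0.0.1', 'device': 'sdb1'}
        # yields '10.0.0.1::object_sdb1'.
        return template.format(**device)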

The default values are backward compatible:
    {replication_ip}::account
    {replication_ip}::container
    {replication_ip}::object

Option vm_test_mode is deprecated by this commit, but backward compatibility is
maintained. The option is only effective when rsync_module is not set. In that
case, {replication_port} is appended to the default value of rsync_module.

Change-Id: Iad91df50dadbe96c921181797799b4444323ce2e
2015-09-07 08:00:18 +02:00
paul luse 893f30c61d EC GET path: require fragments to be of same set
And if they are not, exhaust the node iter to go get more.  The
problem without this implementation is a simple overwrite where a GET
arrives before the handoff has put the newer object back on the 'alive
again' node, so the proxy gets n-1 fragments of the newest set and 1
of the older.

This patch bucketizes the fragments by etag and, if it doesn't have
enough, continues to exhaust the node iterator until it has a large
enough matching set.
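
A sketch of the bucketing idea (usable_fragment_set is an illustrative
helper; ec_ndata stands for the number of fragments needed to decode):

    from collections import defaultdict

    def usable_fragment_set(responses, ec_ndata):
        # Bucket fragment responses by etag and only return a set large
        # enough to decode; the caller keeps exhausting the node iterator
        # until some bucket reaches ec_ndata fragments.
        buckets = defaultdict(list)
        for resp in responses:
            buckets[resp['etag']].append(resp)
        for frags in buckets.values():
            if len(frags) >= ec_ndata:
                return frags
        return None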

Change-Id: Ib710a133ce1be278365067fd0d6610d80f1f7372
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Co-Authored-By: Alistair Coles <alistair.coles@hp.com>
Closes-Bug: 1457691
2015-08-27 21:09:41 -07:00
janonymous 923238aa1b test/(functional/probe):Replace python print operator with print function (pep H233, py33)
The 'print' function is compatible with both 2.x and 3.x python versions.
Link : https://www.python.org/dev/peps/pep-3105/

Python 2.6 has a __future__ import that removes print as language syntax,
letting you use the functional form instead

Change-Id: I416c6ac21ccbfb91ec328ffb1ed21e492ef52d58
2015-08-20 11:42:58 +09:00
Clay Gerrard 768d7ab074 Add a probetest for HUP/reload
This would have been enough to catch the regression, and we can extend
them as we work on any future enhancements to our process management.

Change-Id: I9a1b57aa15663380c45cf783afc8212ab4ffbace
2015-07-30 15:49:23 -07:00
Victor Stinner e24d7c36fa Use six to fix imports on Python 3
Get configparser, queue, http_client modules from six.moves.

Patch generated by the six_moves operation of the sixer tool:
https://pypi.python.org/pypi/sixer

Change-Id: I666241ab50101b8cc6f992dd80134ce27327bd7d
2015-07-24 11:48:28 +02:00
Victor Stinner e70b66586e Replace dict.iteritems() with dict.items()
The iteritems() of Python 2 dictionaries has been renamed to items() on
Python 3. According to a discussion on the openstack-dev mailing list,
the overhead of creating a temporary list using dict.items() on Python 2
is very low because most dictionaries are small:

http://lists.openstack.org/pipermail/openstack-dev/2015-June/066391.html

Patch generated by the following command:

    sed -i 's,iteritems,items,g' \
      $(find swift -name "*.py") \
      $(find test -name "*.py")

Change-Id: I6070bb6c684be76e8e77222a7d280ec6edd43496
2015-06-24 09:39:55 +02:00
Darrell Bishop df134df901 Allow 1+ object-servers-per-disk deployment
Enabled by a new > 0 integer config value, "servers_per_port" in the
[DEFAULT] config section for object-server and/or replication server
configs.  The setting's integer value determines how many different
object-server workers handle requests for any single unique local port
in the ring.  In this mode, the parent swift-object-server process
continues to run as the original user (i.e. root if low-port binding
is required), binds to all ports as defined in the ring, and forks off
the specified number of workers per listen socket.  The child, per-port
servers drop privileges and behave pretty much how object-server workers
always have, except that because the ring has unique ports per disk, the
object-servers will only be handling requests for a single disk.  The
parent process detects dead servers and restarts them (with the correct
listen socket), starts missing servers when an updated ring file is
found with a device on the server with a new port, and kills extraneous
servers when their port is found to no longer be in the ring.  The ring
files are stat'ed at most every "ring_check_interval" seconds, as
configured in the object-server config (same default of 15s).

Immediately stopping all swift-object-worker processes still works by
sending the parent a SIGTERM.  Likewise, a SIGHUP to the parent process
still causes the parent process to close all listen sockets and exit,
allowing existing children to finish serving their existing requests.
The drop_privileges helper function now has an optional param to
suppress the setsid() call, which otherwise screws up the child workers'
process management.

The class method RingData.load() can be told to only load the ring
metadata (i.e. everything except replica2part2dev_id) with the optional
kwarg, header_only=True.  This is used to keep the parent and all
forked off workers from unnecessarily having full copies of all storage
policy rings in memory.

A new helper class, swift.common.storage_policy.BindPortsCache,
provides a method to return a set of all device ports in all rings for
the server on which it is instantiated (identified by its set of IP
addresses).  The BindPortsCache instance will track mtimes of ring
files, so they are not opened more frequently than necessary.
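
The port collection amounts to something like this sketch
(local_bind_ports is an illustrative stand-in for BindPortsCache,
without the ring-file mtime tracking):

    def local_bind_ports(rings, my_ips):
        # Collect every port that any ring assigns to a device on this
        # server's IP addresses.
        ports = set()
        for ring in rings:
            for dev in ring.devs:
                if dev and dev['ip'] in my_ips:
                    ports.add(dev['port'])
        return ports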

This patch includes enhancements to the probe tests and
object-replicator/object-reconstructor config plumbing to allow the
probe tests to work correctly both in the "normal" config (same IP but
unique ports for each SAIO "server") and a server-per-port setup where
each SAIO "server" must have a unique IP address and unique port per
disk within each "server".  The main probe tests only work with 4
servers and 4 disks, but you can see the difference in the rings for the
EC probe tests where there are 2 disks per server for a total of 8
disks.  Specifically, swift.common.ring.utils.is_local_device() will
ignore the ports when the "my_port" argument is None.  Then,
object-replicator and object-reconstructor both set self.bind_port to
None if server_per_port is enabled.  Bonus improvement for IPv6
addresses in is_local_device().

This PR for vagrant-swift-all-in-one will aid in testing this patch:
https://github.com/swiftstack/vagrant-swift-all-in-one/pull/16/

Also allow SAIO to answer is_local_device() better; common SAIO setups
have multiple "servers" all on the same host with different ports for
the different "servers" (which happen to match the IPs specified in the
rings for the devices on each of those "servers").

However, you can configure the SAIO to have different localhost IP
addresses (e.g. 127.0.0.1, 127.0.0.2, etc.) in the ring and in the
servers' config files' bind_ip setting.

This new whataremyips() implementation combined with a little plumbing
allows is_local_device() to accurately answer, even on an SAIO.

In the default case (an unspecified bind_ip defaults to '0.0.0.0') as
well as an explict "bind to everything" like '0.0.0.0' or '::',
whataremyips() behaves as it always has, returning all IP addresses for
the server.

Also updated probe tests to handle each "server" in the SAIO having a
unique IP address.

For some (noisy) benchmarks that show servers_per_port=X is at least as
good as the same number of "normal" workers:
https://gist.github.com/dbishop/c214f89ca708a6b1624a#file-summary-md

Benchmarks showing the benefits of I/O isolation with a small number of
slow disks:
https://gist.github.com/dbishop/fd0ab067babdecfb07ca#file-results-md

If you were wondering what the overhead of threads_per_disk looks like:
https://gist.github.com/dbishop/1d14755fedc86a161718#file-tabular_results-md

DocImpact

Change-Id: I2239a4000b41a7e7cc53465ce794af49d44796c6
2015-06-18 12:43:50 -07:00
Clay Gerrard a3559edc23 Exclude local_dev from sync partners on failure
If the primary left or right hand partners are down, the next best thing
is to validate the rest of the primary nodes, where 'the rest' should
exclude not just the left and right hand partners but ourself as well.

This fixes an accidental noop when a partner node is unavailable and
another node is missing data.

Validation:

Add probetests to cover ssync failures for the primary sync_to nodes for
sync jobs.

Drive-by:

Make additional plumbing for the check_mount and check_dir constraints into
the remaining daemons.

Change-Id: I4d1c047106c242bca85c94b569d98fd59bb255f4
2015-05-26 12:50:31 -07:00
Clay Gerrard 52b102163e Don't apply the wrong Etag validation to rebuilt fragments
Because of the object-server's interaction with the ssync sender's
X-Backend-Replication-Headers, when an object (or fragment archive) is
pushed unmodified to another node its ETag value is duped into the
receiving end's metadata as Etag.  This interacts poorly with the
reconstructor's RebuildingECDiskFileStream, which cannot know ahead of
time the ETag of the fragment archive being rebuilt.

Don't send the Etag from the local source fragment archive being used as
the basis for the rebuilt fragment archive's metadata along to ssync.

Change-Id: Ie59ad93a67a7f439c9a84cd9cff31540f97f334a
2015-04-15 23:33:32 +01:00
paul luse 647b66a2ce Erasure Code Reconstructor
This patch adds the erasure code reconstructor. It follows the
design of the replicator but:
  - There is no notion of update() or update_deleted().
  - There is a single job processor
  - Jobs are processed partition by partition.
  - At the end of processing a rebalanced or handoff partition, the
    reconstructor will remove successfully reverted objects if any.

It also includes various ssync changes, such as the addition of a
reconstruct_fa() function, called from ssync_sender, which performs the
actual reconstruction while sending the object to the receiver.

Co-Authored-By: Alistair Coles <alistair.coles@hp.com>
Co-Authored-By: Thiago da Silva <thiago@redhat.com>
Co-Authored-By: John Dickinson <me@not.mn>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Co-Authored-By: Tushar Gohad <tushar.gohad@intel.com>
Co-Authored-By: Samuel Merritt <sam@swiftstack.com>
Co-Authored-By: Christian Schwede <christian.schwede@enovance.com>
Co-Authored-By: Yuan Zhou <yuan.zhou@intel.com>
blueprint ec-reconstructor
Change-Id: I7d15620dc66ee646b223bb9fff700796cd6bef51
2015-04-14 00:52:17 -07:00
Martin Kletzander 76b106fc01 Fix common misspellings
Wikipedia's list of common misspellings [1] has a machine-readable
version.  This patch fixes those misspellings mentioned in the list
which don't have multiple right variants (as e.g. "accension", which can
be both "accession" and "ascension"), such misspellings are left
untouched.  The list of changes was manually re-checked for false
positives.

[1] https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines

Change-Id: Ic9a5438629664f7cea216413a28acc0e8992da05
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
2015-03-24 11:07:56 +01:00
Leah Klearman ca0fce8542 more probe test refactoring
* move get_to_final_state into ProbeTest
* get rid of kill_servers
* add replicators manager and updaters manager to ProbeTest

(this is all going someplace, i promise)

Change-Id: I8393a2ebc0d04051cae48cc3c49580f70818dbf2
2015-02-13 16:55:45 -08:00
Leah Klearman 2c1b5af062 refactor probe tests
* refactor probe tests to use probe.common.ProbeTest
* move reset_environment functionality to ProbeTest.setUp()
* choose rings and policies that meet the criteria - raise SkipTest if
nothing matches
* replace all AssertionErrors in setup with SkipTest

Change-Id: Id56c497d58083f5fd55f5283cdd346840df039d3
2015-02-12 11:30:21 -08:00
Alistair Coles 22b65846aa Make probe tests tolerate deprecated policies
A deprecated policy in swift.conf causes errors in
probe tests that may attempt to use that policy.

This patch introduces a list ENABLED_POLICIES in
test/probe/common.py and changes probe tests to only
use policies contained in that list.

Change-Id: Ie65477c15d631fcfc3a4a5772fbe6d7d171b22b0
2014-09-09 13:09:37 +01:00
Yuan Zhou ad2a9cefe5 Fixes probe tests with non-zero default storage policy
Add headers param to direct_client.direct_get_object, which is used in
probetests to pass through the X-Storage-Policy-Index header.

DocImpact
Implements: blueprint storage-policies
Change-Id: I19adbbcefbc086c8467bd904a275d55cde596412
2014-06-18 21:09:53 -07:00