Commit Graph

181 Commits

Author SHA1 Message Date
Shreeya Deshpande bc3a59bdd3 Refactor utils
- Move statsd client into its own module
- Move all logging functions into their own module
- Move all config functions into their own module
- Move all helper functions into their own module

Partial-Bug: #2015274
Change-Id: Ic4b5005e3efffa8dba17d91a41e46d5c68533f9a
2024-04-30 20:27:47 +00:00
Tim Burke c522f5676e Add ClosingIterator class; be more explicit about closes
... in document_iters_to_http_response_body.

We seemed to be relying a little too heavily upon prompt garbage
collection to log client disconnects, leading to failures in
test_base.py::TestGetOrHeadHandler::test_disconnected_logging
under python 3.12.

Closes-Bug: #2046352
Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Change-Id: I4479d2690f708312270eb92759789ddce7f7f930
2024-02-12 11:16:09 +00:00
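A minimal sketch of the idea (illustrative only, not Swift's actual
implementation): wrap an iterable, delegate iteration, and forward
close() explicitly instead of relying on garbage collection.

    class ClosingIterator(object):
        def __init__(self, iterable, other_closeables=None):
            self.iterator = iter(iterable)
            # remember everything that may need closing
            self.closeables = [iterable] + list(other_closeables or [])
            self.closed = False

        def __iter__(self):
            return self

        def __next__(self):
            return next(self.iterator)

        def close(self):
            # idempotent, explicit close of all wrapped closeables
            if not self.closed:
                for closeable in self.closeables:
                    close_method = getattr(closeable, 'close', None)
                    if close_method:
                        close_method()
                self.closed = True
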
Alistair Coles dc3eda7e89 proxy: don't send multi-part terminator when no parts sent
If the proxy timed out while reading a replicated policy multi-part
response body, it would transform the ChunkReadTimeout to a
StopIteration. This masks the fact that the backend read has
terminated unexpectedly. The document_iters_to_multipart_byteranges
would complete iterating over parts and send a multipart terminator
line, even though no parts may have been sent.

This patch removes the conversion of ChunkReadTimeout to StopIteration.
The ChunkReadTimeout that is now raised prevents the
document_iters_to_multipart_byteranges 'for' loop completing and
therefore stops the multi-part terminator line being sent. It is
raised from the GetOrHeadHandler similar to other scenarios that raise
ChunkReadTimeouts while the resp body is being read.

A ChunkReadTimeout exception handler is removed in the
_iter_parts_from_response method. This handler was previously never
reached (because StopIteration rather than ChunkReadTimeout was raised
from _get_next_response_part), but if it were reached (i.e. with this
change) then it would repeat logging of the error and repeat
incrementing the node's error counter.

This change in the GetOrHeadHandler mimics a similar change in the
ECFragGetter [1].

[1] Related-Change: I0654815543be3df059eb2875d9b3669dbd97f5b4
Co-Authored-By: Tim Burke <tim.burke@gmail.com>
Change-Id: I6dd53e239f5e7eefcf1c74229a19b1df1c989b4a
2024-02-05 10:28:40 +00:00
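A toy illustration (not Swift's code) of why this matters: when the
parts iterator raises, the 'for' loop below aborts and the terminator
line is never yielded; if the error were swallowed as StopIteration,
the loop would complete and emit a spurious terminator.

    class ChunkReadTimeout(Exception):
        pass

    def parts_iter():
        # simulate the backend read timing out before any part arrives
        raise ChunkReadTimeout()
        yield  # unreachable; makes this a generator

    def multipart_body(parts):
        for part in parts:
            yield part
        yield b'--boundary--'  # multipart terminator line

    try:
        body = b''.join(multipart_body(parts_iter()))
    except ChunkReadTimeout:
        print('read aborted; no spurious terminator sent')
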
Alistair Coles 72ac5b3be0 proxy: refactor to share namespace cache helpers
Create new helper functions to set and get namespaces in cache. Use
these in both the object and container controllers when caching
namespaces for the updating and listing shard range states,
respectively.

Add unit tests for the new helper functions.

No intentional behavioural changes.

Change-Id: I6833ec64540fa19f658f0ee78952ecb43b49f169
2023-11-21 10:30:32 +00:00
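A rough sketch of the shape such shared helpers might take (key
format and helper names assumed, not necessarily Swift's):

    def make_namespace_cache_key(account, container, state):
        # state is e.g. 'updating' or 'listing'
        return 'shard-%s/%s/%s' % (state, account, container)

    def set_namespaces_in_cache(memcache, key, namespaces, ttl):
        if memcache:
            memcache.set(key, namespaces, time=ttl)

    def get_namespaces_from_cache(memcache, key):
        return memcache.get(key) if memcache else None
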
Alistair Coles f8c94d6bbc proxy-server: add replicated GET path tests
Improve test coverage for the resuming multipart replicated GET path.

Change-Id: I7de34f443399f645f5021ed392e515f795ed7249
2023-09-21 12:02:20 +01:00
Alistair Coles 369a72c4cf proxy: remove client_chunk_size and skip_bytes from GetOrHeadHandler
The client_chunk_size attribute was introduced into GetOrHeadHandler
for EC support [1]. It was only ever not None for an
ECObjectController. The ECObjectController stopped using
GetOrHeadHandler for Object GET when the ECFragGetter class was
introduced [2], but the EC specific code was not expunged from
GetOrHeadHandler. In [3] the ECFragGetter client_chunk_size was renamed
to fragment_size to better reflect what it represented.

The skip_bytes attribute was similarly introduced for EC support. It
is only ever non-zero if client_chunk_size is an int. For EC,
skip_bytes is used to undo the effect of expanding the backend
range(s) to fetch whole fragments: the range(s) of decoded bytes
returned to the client may need to be narrower than the backend
ranges. There is no equivalent requirement for replicated GETs.

The elimination of client_chunk_size and skip_bytes simplifies the
yielding of chunks from the GetOrHeadHandler response iter.

Related-Change:
[1] I9c13c03616489f8eab7dcd7c5f21237ed4cb6fd2
[2] I0dc5644a84ededee753e449e053e6b1786fdcf32
[3] Ie1efaab3bd0510275d534b5c023cb73c98bec90d

Change-Id: I31ed36d32682469e3c5ca8bf9a2b383568d63c72
2023-07-24 09:15:12 -05:00
Shreeya Deshpande 647ee83906 Unit test for keepalive timeout
Create a unit test to verify client timeout for multiple requests

Change-Id: I974e01cd2cb18f4ea87c3966dbf4b06bff22ed39
2023-05-10 09:01:41 -07:00
Clay Gerrard c95f8e6c05 tests for wsgi/daemon config parsing
Change-Id: Ibb82555830b88962cc765fc88281ca42a9ce9d9c
2023-04-14 14:51:23 -05:00
Tim Burke be16d6c4fd tests: Get rid of test.unit.SkipTest
unittest.SkipTest suffices.

Change-Id: I11eb73f7dc4a8598fae85d1efca721f69067fb4f
2023-02-16 23:59:53 -08:00
Zuul 6994200026 Merge "Remove :memory: from DatabaseBrokers and unittests" 2023-02-09 08:46:13 +00:00
Tim Burke 69b18e3c50 tests: Remove references to soft_lock
As best as I can tell, this has *never* been an interface.

Change-Id: I42e4b82a7af8a81e497e68ad25ac3bc4d0d74970
2023-01-11 14:05:34 -08:00
Matthew Oliver c4e00eb89f Sharder: Fall back to local device in get_shard_broker
If the sharder is processing a node whose devices all have 0 weight,
`find_local_handoff_for_part` can fail because there will be no local
handoff devices available: it uses the replica2part2dev_id to find a
device, and a 0-weighted device won't appear in the replica2part2dev
table.

This patch extends `find_local_handoff_for_part`: if it fails to find
a node from the ring, it falls back to a local device identified by
the `_local_device_ids` that are built up while the replicator or
sharder is identifying local devices. This uses ring.devs, so it does
include 0-weighted devices. This allows the sharder to find a
location to write the shard_broker in a handoff location while
sharding.

Co-Authored-By: Tim Burke <tim.burke@gmail.com>
Change-Id: Ic38698e9ca0397770c7362229baef1101a72788f
2022-07-29 15:02:26 +01:00
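The fallback described above, sketched with assumed ring and device
shapes (not the actual method body):

    import random

    def find_local_handoff_for_part(ring, part, local_dev_ids):
        # preferred: a local device from the ring's handoff chain,
        # which only covers devices in replica2part2dev (weight > 0)
        for node in ring.get_more_nodes(part):
            if node['id'] in local_dev_ids:
                return node
        # fallback: any local device from ring.devs, which does
        # include 0-weighted devices
        local_devs = [dev for dev in ring.devs
                      if dev and dev['id'] in local_dev_ids]
        return random.choice(local_devs) if local_devs else None
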
Matthew Oliver a548da916f Remove :memory: from DatabaseBrokers and unittests
The SQLite in-memory databases have been great for testing, but as
the swift DatabaseBrokers have become more complex, the limitations
of in-memory databases are being reached, mostly due to the
introduction of container sharding, where a broker sometimes needs
to make multiple connections to the same database at the same time.

Rather than rework the real broker logic to better support in-memory
testing, it's actually easier to just remove the in-memory broker tests
and use a "real" broker in a tempdir. This allows us to better test how
brokers behave in real life, pending files and all.

This patch replaces all the :memory: brokers in the tests with real ones
placed in a tempdir. To achieve this, a new base unittest class
`TestDBBase` has been added that creates and cleans up the db, and
provides some helper methods to manage the db path and location.

Further, all references to :memory: in the Database brokers have been
removed.

Change-Id: I5983132f776b84db634fef39c833d5cfdce11980
2022-07-12 12:30:43 +10:00
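A minimal sketch of what a TestDBBase-style base class could look
like (details assumed):

    import os
    import shutil
    import tempfile
    import unittest

    class TestDBBase(unittest.TestCase):
        def setUp(self):
            # a real db file in a tempdir instead of ':memory:'
            self.tempdir = tempfile.mkdtemp()
            self.db_path = os.path.join(self.tempdir, 'test_db.db')

        def tearDown(self):
            shutil.rmtree(self.tempdir, ignore_errors=True)
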
Zuul eeb5533457 Merge "memcached: Give callers the option to accept errors" 2022-05-13 16:44:57 +00:00
Alistair Coles 2f607cd319 Round s3api listing LastModified to integer resolution
s3api bucket listing elements currently have LastModified values with
millisecond precision. This is inconsistent with the value of the
Last-Modified header returned with an object GET or HEAD response
which has second precision. This patch reduces the precision to
seconds in bucket listings and upload part listings. This is also
consistent with observed AWS listing responses.

The last modified values in the swift native listing are rounded *up*
to the nearest second to be consistent with the seconds-precision
Last-Modified time header that is returned with an object GET or HEAD.
However, we continue to include millisecond digits set to 0 in the
last-modified string, e.g.: '2014-06-10T22:47:32.000Z'.

Also, fix the last modified time returned in an object copy response
to be consistent with the last modified time of the object that was
created. Previously it was rounded down, but it should be rounded up.

Change-Id: I8c98791a920eeedfc79e8a9d83e5032c07ae86d3
2022-05-10 11:26:27 +01:00
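The rounding in sketch form (helper name assumed): round the stored
timestamp up to whole seconds, then render zeroed milliseconds.

    import math
    from datetime import datetime, timezone

    def listing_last_modified(ts_float):
        dt = datetime.fromtimestamp(math.ceil(ts_float),
                                    tz=timezone.utc)
        return dt.strftime('%Y-%m-%dT%H:%M:%S.000Z')

    # 1402440452.123 rounds up to '2014-06-10T22:47:33.000Z'
    print(listing_last_modified(1402440452.123))
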
Tim Burke 9bed525bfb memcached: Give callers the option to accept errors
Auth middlewares in particular may want to *know* when there's a
communication breakdown as opposed to a cache miss.

Update our shard-range cache stats to acknowledge the distinction.

Drive-by: Log an error if all memcached servers are error-limited.

Change-Id: Ic8d0915235d11124d06ec940c5be9a2edbe85c83
2022-04-28 13:20:44 -07:00
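A hedged sketch of the caller-facing idea (exact kwarg name assumed):
an opt-in flag so middlewares can tell an error from a miss.

    class MemcacheConnectionError(Exception):
        pass

    def cache_get(client, key, raise_on_error=False):
        try:
            return client.get(key)
        except Exception:
            if raise_on_error:
                # communication breakdown, not a cache miss
                raise MemcacheConnectionError('memcached unavailable')
            return None  # legacy behavior: errors look like misses
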
Tim Burke 874a5865b8 tests: Improve FakeMemcache call tracking
Make it much more like mock tracking, so we can easily add new kwargs.

Change-Id: Ib29816c4626bb0d914929783bd676e8b6cb19bbf
2022-01-07 13:09:43 -08:00
Tim Burke f7101f3795 tests: Unify FakeMemcaches
Change-Id: I114d1628bb6dea04f246ff3ab12f4ccfdc4ec358
2022-01-06 10:13:15 -08:00
Tim Burke 1eaf7474fe Fix some imports for py310
Between this and the (unreleased) pyeclib fix, I see unit and func tests
passing on py310. Haven't tried probe tests, yet.

Change-Id: Iacf66eda75fed6bf96900107250f393227c57ae5
2021-11-25 14:54:17 -08:00
Zuul b04e7c2e53 Merge "Switch get(full)argspec function according to python version" 2021-07-15 18:21:40 +00:00
Takashi Kajinami e00ae03370 Switch get(full)argspec function according to python version
inspect.getargspec has been deprecated since Python 3.0, and
inspect.getfullargspec is its replacement, with correct handling of
function annotations and keyword-only parameters[1].

This change ensures that inspect.getfullargspec is used in Python 3.

[1] https://docs.python.org/3/library/inspect.html#inspect.getargspec

Change-Id: I63a4fda4f5da00c0f752e58f2e7192baea5012bb
2021-07-15 23:51:23 +09:00
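The compatibility shim pattern this implies, as a sketch (not the
exact code):

    import inspect
    import sys

    if sys.version_info[0] >= 3:
        getargspec = inspect.getfullargspec
    else:
        getargspec = inspect.getargspec

    def arg_names(func):
        # .args is present on both spec flavors
        return getargspec(func).args
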
Matthew Oliver e491693e36 reconciler: PPI aware reconciler
This patch makes the reconciler PPI (partition power increase) aware.
It does this by adding a
helper method `can_reconcile_policy` that is used to check that the
policies used for the source and destination aren't in the middle of a
PPI (their ring doesn't have next_part_power set).

In order to accomplish this, the reconciler has had to include the
POLICIES singleton and has grown swift_dir and ring_check_interval
config options.

Closes-Bug: #1934314
Change-Id: I78a94dd1be90913a7a75d90850ec5ef4a85be4db
2021-07-13 13:55:13 +10:00
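A sketch of the guard described above (attribute shapes assumed):
refuse to reconcile while either policy's ring is mid-PPI.

    def can_reconcile_policy(policies, policy_index):
        policy = policies.get_by_index(policy_index)
        if policy is None or policy.object_ring is None:
            return False
        # a ring with next_part_power set is mid-PPI
        return policy.object_ring.next_part_power is None
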
Matthew Oliver 4ce907a4ae relinker: Add /recon/relinker endpoint and drop progress stats
To further improve stats capturing for the relinker, drop partition
progress into a new relinker.recon recon cache and add a new recon
endpoint:

  GET /recon/relinker

To get live relinking progress data:

  $ curl http://127.0.0.3:6030/recon/relinker |python -mjson.tool
  {
      "devices": {
          "sdb3": {
              "parts_done": 523,
              "policies": {
                  "1": {
                      "next_part_power": 11,
                      "start_time": 1618998724.845616,
                      "stats": {
                          "errors": 0,
                          "files": 1630,
                          "hash_dirs": 1630,
                          "linked": 1630,
                          "policies": 1,
                          "removed": 0
                      },
                      "timestamp": 1618998730.24672,
                      "total_parts": 1029,
                      "total_time": 5.400741815567017
                  }},
              "start_time": 1618998724.845946,
              "stats": {
                  "errors": 0,
                  "files": 836,
                  "hash_dirs": 836,
                  "linked": 836,
                  "removed": 0
              },
              "timestamp": 1618998730.24672,
              "total_parts": 523,
              "total_time": 5.400741815567017
          },
          "sdb7": {
              "parts_done": 506,
              "policies": {
                  "1": {
                      "next_part_power": 11,
                      "part_power": 10,
                      "parts_done": 506,
                      "start_time": 1618998724.845616,
                      "stats": {
                          "errors": 0,
                          "files": 794,
                          "hash_dirs": 794,
                          "linked": 794,
                          "removed": 0
                      },
                      "step": "relink",
                      "timestamp": 1618998730.166175,
                      "total_parts": 506,
                      "total_time": 5.320528984069824
                  }
              },
              "start_time": 1618998724.845616,
              "stats": {
                  "errors": 0,
                  "files": 794,
                  "hash_dirs": 794,
                  "linked": 794,
                  "removed": 0
              },
              "timestamp": 1618998730.166175,
              "total_parts": 506,
              "total_time": 5.320528984069824
          }
      },
      "workers": {
          "100": {
              "drives": ["sda1"],
              "return_code": 0,
              "timestamp": 1618998730.166175}
      }}

Also, add a constant DEFAULT_RECON_CACHE_PATH to help fix failing tests
by mocking recon_cache_path, so that errors are not logged due
to dump_recon_cache exceptions.

Mock recon_cache_path more widely and assert no error logs more
widely.

Change-Id: I625147dadd44f008a7c48eb5d6ac1c54c4c0ef05
2021-05-10 16:13:32 +01:00
Alistair Coles 29418998b7 Fix shrinking making acceptors prematurely active
During sharding a shard range is moved to CLEAVED state when cleaved
from its parent. However, during shrinking an acceptor shard should
not be moved to CLEAVED state when the shrinking shard cleaves to it,
because the shrinking shard is not the acceptor's parent and does not
know if the acceptor has yet been cleaved from its parent.

The existing attempt to prevent a shrinking shard updating its
acceptor state relied on comparing the acceptor namespace to the
shrinking shard namespace: if the acceptor namespace fully enclosed
the shrinking shard then it was inferred that shrinking was taking
place. That check is sufficient for normal shrinking of one shard into
an expanding acceptor, but is not sufficient when shrinking in order
to fix overlaps, when a shard might shrink into more than one
acceptor, none of which completely encloses the shrinking shard.

Fortunately, since [1], it is possible to determine that a shard is
shrinking from its own shard range state being either SHRINKING or
SHRUNK.

It is still advantageous to delete and merge the shrinking shard range
into the acceptor when the acceptor fully encloses the shrinking shard
because that increases the likelihood of the root being updated with
the deleted shard range in a timely manner.

[1] Related-Change: I9034a5715406b310c7282f1bec9625fe7acd57b6
Change-Id: I91110bc747323e757d8b63003ad3d38f915c1f35
2021-04-29 09:38:46 +01:00
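The check that [1] makes possible, in sketch form (using Swift's
ShardRange states):

    from swift.common.utils import ShardRange

    def is_shrinking(own_shard_range):
        # a shard's own range state now identifies shrinking
        return own_shard_range.state in (ShardRange.SHRINKING,
                                         ShardRange.SHRUNK)
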
Clay Gerrard 2a312d1cd5 Cleanup tests' import of debug_logger
Change-Id: I19ca860deaa6dbf388bdcd1f0b0f77f72ff19689
2021-04-27 12:04:41 +01:00
Clay Gerrard 4a4d899680 Refactor EC multipart/byteranges control flow
The multipart document handling in the proxy is consumed via iteration,
but the error handling code is inconsistent in how it converts IO
errors/timeouts and retry failures to StopIteration.

In an effort to make the code more obvious and easier to debug and
maintain I've added comments and additional tests as well as tightening
up StopIteration exception handling.

Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Change-Id: I0654815543be3df059eb2875d9b3669dbd97f5b4
2021-04-21 12:45:20 -05:00
Alistair Coles 8f4200791b Move DebugLogger to its own module
Move DebugLogger and associated classes to their own module under test
so that it can be imported (for example in probe tests) without
requiring all the dependencies in test/unit/__init__.py.

Change-Id: I0ea3c26e54d91f27159805a45e49ad7f8f0e0431
2021-01-22 10:45:01 -06:00
Tim Burke 6f813f6bfa Fix __exit__ calls
The context manager protocol requires that __exit__ be called with three
args: type, value, and traceback. In some places, we didn't include any
args at all, leading to test failures during clean-up.

Change-Id: I2998830e6eac685b1f753937d12cf5346a4eb081
2021-01-13 12:42:23 -08:00
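The fix in miniature: a manual __exit__ call must pass the three
exc_info arguments (all None on the success path).

    ctxmgr = open('example.txt', 'w')
    ctxmgr.__enter__()
    try:
        ctxmgr.write('hello')
    finally:
        # was: ctxmgr.__exit__()  -- TypeError: missing 3 arguments
        ctxmgr.__exit__(None, None, None)
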
Alistair Coles 077ba77ea6 Use cached shard ranges for container GETs
This patch makes four significant changes to the handling of GET
requests for sharding or sharded containers:
  - container server GET requests may now result in the entire list of
    shard ranges being returned for the 'listing' state regardless of
    any request parameter constraints.
  - the proxy server may cache that list of shard ranges in memcache
    and the request's environ infocache dict, and subsequently use the
    cached shard ranges when handling GET requests for the same
    container.
  - the proxy now caches more container metadata so that it can
    synthesize a complete set of container GET response headers from
    cache.
  - the proxy server now enforces more container GET request validity
    checks that were previously only enforced by the backend server,
    e.g. checks for valid request parameter values

With this change, when the proxy learns from container metadata
that the container is sharded then it will cache shard
ranges fetched from the backend during a container GET in memcache.
On subsequent container GETs the proxy will use the cached shard
ranges to gather object listings from shard containers, avoiding
further GET requests to the root container until the cached shard
ranges expire from cache.

Cached shard ranges are most useful if they cover the entire object
name space in the container. The proxy therefore uses a new
X-Backend-Override-Shard-Name-Filter header to instruct the container
server to ignore any request parameters that would constrain the
returned shard range listing i.e. 'marker', 'end_marker', 'includes'
and 'reverse' parameters.  Having obtained the entire shard range
listing (either from the server or from cache) the proxy now applies
those request parameter constraints itself when constructing the
client response.

When using cached shard ranges the proxy will synthesize response
headers from the container metadata that is also in cache. To enable
the full set of container GET response headers to be synthezised in
this way, the set of metadata that the proxy caches when handling a
backend container GET response is expanded to include various
timestamps.

The X-Newest header may be used to disable looking up shard ranges
in cache.

Change-Id: I5fc696625d69d1ee9218ee2a508a1b9be6cf9685
2021-01-06 16:28:49 +00:00
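A much-simplified sketch of the proxy-side filtering this enables
(dict shapes assumed; the real code works with shard range objects):

    def filter_namespaces(namespaces, marker='', end_marker=''):
        # apply listing constraints to the full cached set
        results = []
        for ns in namespaces:
            if end_marker and ns['lower'] >= end_marker:
                continue  # entirely at or beyond end_marker
            if marker and ns['upper'] <= marker:
                continue  # entirely at or before marker
            results.append(ns)
        return results
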
Alistair Coles 5e33026495 Use CloseableChain when creating iterator of SLO response
When handling a GET response ProxyLoggingMiddleware will try to close
a reiterated [1] proxy response iterator if, for example, there is a
client disconnect.

The reiterate function encapsulates the result of calling iter() on
the proxy response. In the case of an SLO response, the iter method
returned an instance of itertools.chain, rather than the response
itself, which is an instance of SegmentedIterable. As a result the
SegmentedIterable.close() method would not be called and object server
connections would not be closed.

This patch replaces the itertools.chain with a CloseableChain which
encapsulates the SegmentedIterable and closes it when
CloseableChain.close() is called.

[1] The use of reiterate was introduced by the Related-Change.

Closes-Bug: #1909588
Related-Change: I27feabe923a6520e983637a9c68a19ec7174a0df
Change-Id: Ib7450a85692114973782525004466db49f63066d
2020-12-29 16:14:28 +00:00
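A minimal sketch of a CloseableChain (close to, though not
necessarily identical to, Swift's helper): chain like itertools.chain,
but keep references so close() can be propagated down to the wrapped
SegmentedIterable.

    import itertools

    class CloseableChain(object):
        def __init__(self, *iterables):
            self.iterables = iterables
            self.chained = itertools.chain(*iterables)

        def __iter__(self):
            return iter(self.chained)

        def close(self):
            for it in self.iterables:
                close_method = getattr(it, 'close', None)
                if close_method:
                    close_method()
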
Ade Lee 5320ecbaf2 replace md5 with swift utils version
md5 is not an approved algorithm in FIPS mode, and trying to
instantiate a hashlib.md5() will fail when the system is running in
FIPS mode.

md5 is allowed when in a non-security context.  There is a plan to
add a keyword parameter (usedforsecurity) to hashlib.md5() to annotate
whether or not the instance is being used in a security context.

In the case where it is not, the instantiation of md5 will be allowed.
See https://bugs.python.org/issue9216 for more details.

Some downstream python versions already support this parameter.  To
support these versions, a new encapsulation of md5() is added to
swift/common/utils.py.  This encapsulation is identical to the one being
added to oslo.utils, but is recreated here to avoid adding a dependency.

This patch is to replace the instances of hashlib.md5() with this new
encapsulation, adding an annotation indicating whether the usage is
a security context or not.

While this patch seems large, it is really just the same change over
and over again.  Reviewers need to pay particular attention as to
whether the keyword parameter (usedforsecurity) is set correctly.
Right now, none of them appear to be used in a security context.

Now that all the instances have been converted, we can update the bandit
run to look for these instances and ensure that new invocations do not
creep in.

With this latest patch, the functional and unit tests all pass
on a FIPS enabled system.

Co-Authored-By: Pete Zaitcev
Change-Id: Ibb4917da4c083e1e094156d748708b87387f2d87
2020-12-15 09:52:55 -05:00
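The shape of the described encapsulation, as a sketch (Swift's actual
helper in swift/common/utils.py handles more cases): forward
usedforsecurity where the interpreter supports it, fall back where it
doesn't.

    import hashlib

    def md5(string=b'', usedforsecurity=True):
        try:
            return hashlib.md5(string,
                               usedforsecurity=usedforsecurity)
        except TypeError:
            # interpreter without the keyword
            return hashlib.md5(string)

    # non-security usage, e.g. etag computation
    etag = md5(b'object body', usedforsecurity=False).hexdigest()
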
Clay Gerrard 5f95e1bece Use bigger GreenPool for concurrent EC
We're getting some blockage trying to feed backup requests in waterfall
EC because the pool_size was limited to the initial batch of requests.
This was (un?)fortunately working out in practice because there were
lots of initial primary fragment requests and some would inevitably be
quick enough to make room for the pending feeder requests.  But when
enough of the initial requests were slow (network issue at the proxy?)
we wouldn't have the expected number of pending backup requests
in-flight.  Since concurrent EC should never make extra requests to
non-primaries (at least not until an existing primary request
completes) ec_n_unique_fragments makes a reasonable cap for the pool.

Drive-bys:

 * Don't make concurrent_ec_extra_requests unless you have enabled
   concurrent_gets.
 * Improved mock_http_connect extra requests tracking formatting
 * FakeStatus __repr__'s w/ status code in AssertionErrors

Change-Id: Iec579ed874ef097c659dc80fff1ba326b6da05e9
2020-09-25 09:47:40 -05:00
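The sizing fix in miniature (example values; eventlet's GreenPool):

    from eventlet import GreenPool

    ec_n_unique_fragments = 14  # example value for a 10+4 policy

    # size the pool for all possible unique fragment requests, not
    # just the initial batch, so pending feeder requests aren't starved
    pool = GreenPool(size=ec_n_unique_fragments)
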
Tim Burke d5625abf60 proxy: Include thread_locals when spawning _fragment_GET_request
Otherwise, we miss out on transaction id and client IP information when
timeouts pop.

Closes-Bug: #1892421
Change-Id: I6dea3ccf780bcc703db8447a2ef13c33838ff12d
2020-09-08 15:00:02 -07:00
Tim Burke b7b45eadcd ec: Close down some unused responses more quickly
These should get GC'ed eventually, but sooner is probably better than later.

Change-Id: I4daa18c36235e6df65e8b1c00a12dbf10677ca61
2020-09-03 11:20:07 -07:00
Tim Burke 2a6dfae2f3 Allow direct and internal clients to use the replication network
A new header `X-Backend-Use-Replication-Network` is added; if true, use
the replication network instead of the client-data-path network.

Several background daemons are updated to use the replication network:

  * account-reaper
  * container-reconciler
  * container-sharder
  * container-sync
  * object-expirer

Note that if container-sync is being used to sync data within the same
cluster, the replication network will only be used when communicating
with the "source" container; the "destination" traffic will continue to
use the configured realm endpoint.

The direct and internal client APIs still default to using the
client-data-path network; this maintains backwards compatibility for
external tools written against them.

UpgradeImpact
=============

Until recently, servers configured with

  replication_server = true

would only handle REPLICATE (and, in the case of object servers, SSYNC)
requests, and would respond 405 Method Not Allowed to other requests.
When upgrading from Swift 2.25.0 or earlier, remove the config option
and restart services prior to upgrade to avoid a flood of background
daemon errors in logs.

Note that some background daemons find work by querying Swift rather
than walking local drives, and so should themselves have access to the
replication network:

  * container-reconciler
  * object-expirer

Previously these may have been configured without access to the
replication network; ensure they have access before upgrading.

Closes-Bug: #1883302
Related-Bug: #1446873
Related-Change: Ica2b41a52d11cb10c94fa8ad780a201318c4fc87
Change-Id: Ieef534bf5d5fb53602e875b51c15ef565882fbff
2020-08-04 21:22:04 +00:00
Zuul a495f1e327 Merge "pep8: Turn on E305" 2020-04-10 11:55:07 +00:00
Tim Burke 668242c422 pep8: Turn on E305
Change-Id: Ia968ec7375ab346a2155769a46e74ce694a57fc2
2020-04-03 21:22:38 +02:00
Romain LE DISEZ 804776b379 Optimize obj replicator/reconstructor healthchecks
The DaemonStrategy class calls the Daemon.is_healthy() method every
0.1 seconds to ensure that all workers are running as wanted.

On object replicator/reconstructor daemons, is_healthy() checks if the
rings changed to decide if workers must be created/killed. With large
rings, this operation can be CPU intensive, especially on low-end CPUs.

This patch:
- increases the check interval to 5 seconds by default, because none of
  these daemons are critical for performance (they are not in the datapath).
  But it allows each daemon to change this value if necessary
- ensures that before doing a computation of all devices in the ring,
  object replicator/reconstructor checks that the ring really changed
  (by checking the mtime of the ring.gz files)

On an Atom N2800 processor, this patch reduced the CPU usage of the main
object replicator/reconstructor from 70% of a core to 0%.

Change-Id: I2867e2be539f325778e2f044a151fd0773a7c390
2020-04-01 08:03:32 -04:00
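The cheap staleness check described above, sketched (names assumed):

    import os

    class RingCheck(object):
        def __init__(self, ring_path):
            self.ring_path = ring_path
            self.last_mtime = None

        def ring_changed(self):
            # only do the expensive recomputation of devices when the
            # on-disk ring.gz actually changed
            try:
                mtime = os.path.getmtime(self.ring_path)
            except OSError:
                return False
            changed = mtime != self.last_mtime
            self.last_mtime = mtime
            return changed
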
Sean McGinnis 5b26b749b5 Drop use of unittest2
unittest2 was needed for Python version <= 2.6, so it hasn't been needed
for quite some time. See the unittest2 note at:

https://docs.python.org/2.7/library/unittest.html

This drops unittest2 in favor of the standard unittest module.

Change-Id: I2e787cfbf1709b7f9c889230a10c03689e032957
Signed-off-by: Sean McGinnis <sean.mcginnis@gmail.com>
2020-01-12 03:13:41 -06:00
Clay Gerrard 286082222d Use less responses from handoffs
Since we don't use 404s from handoffs anymore, we also need to keep
errors on handoffs from overwhelming primary responses.

Change-Id: I2624e113c9d945542f787e5f18f487bd7be3d32e
Closes-Bug: #1857909
2020-01-02 16:44:05 -08:00
Tim Burke d270596b67 Consistently use io.BytesIO
Change-Id: Ic41b37ac75b5596a8307c4962be86f2a4b0d9731
2019-10-15 15:09:46 +02:00
Thomas Goirand 12a7b42062 Fix test_parse_get_node_args
Looks like xattr_supported_check was missing ERANGE

Change-Id: I82263e48e836f38f77d81593c8435f64a4728b5d
2019-07-19 01:32:25 +02:00
Clay Gerrard 563e1671cf Return 503 when primary containers can't respond
Closes-Bug: #1833612

Change-Id: I53ed04b5de20c261ddd79c98c629580472e09961
2019-06-25 12:23:12 -05:00
Tim Burke e8e7106d14 py3: port obj/reconstructor tests
All of the swift changes we needed for this were already done elsewhere.

Change-Id: Ib2c26fdf7bd36ed1cccd5dbd1fa208f912f4d8d5
2019-06-10 08:31:41 -07:00
Tim Burke 2e35376c6d py3: symlink follow-up
- Have the unit tests use WSGI strings, like a real system.
- Port the func tests.

Change-Id: I3a6f409208de45ebf9f55f7f59e4fe6ac6fbe163
2019-05-30 16:25:17 -07:00
Tim Burke b8284538be py3: start porting for unit/proxy/test_server.py
Mostly this amounts to

    Exception.message -> Exception.args[0]
    '...' -> b'...'
    StringIO -> BytesIO
    makefile() -> makefile('rwb')
    iter.next() -> next(iter)
    bytes[n] -> bytes[n:n + 1]
    integer division

Note that the versioning tests are mostly untouched; they seemed to get
a little hairy.

Change-Id: I167b5375e7ed39d4abecf0653f84834ea7dac635
2019-05-04 20:35:05 -07:00
Pete Zaitcev 575538b55b py3: port the container
This started with ShardRanges and its CLI. The sharder is at the
bottom of the dependency chain. Even container backend needs it.
Once we started tinkering with the sharder, it all snowballed to
include the rest of the container services.

Beware, this does affect some of the Python 2 code. Mostly it's trivial
and obviously correct, but needs checking by reviewers.

About killing the stray "from __future__ import unicode_literals":
we do not do it in general. The specific problem it caused was
a failure of functional tests because unicode leaked into a field
that was supposed to be encoded. It is just too hard to track the
types when rules change from file to file, so off with its head.

Change-Id: Iba4e65d0e46d8c1f5a91feb96c2c07f99ca7c666
2019-02-20 21:30:46 -06:00
Zuul 64e5fd364a Merge "Stop using duplicate dev IDs in write_fake_ring" 2019-02-09 07:08:21 +00:00
Clay Gerrard ea8e545a27 Rebuild frags for unmounted disks
Change the behavior of the EC reconstructor to perform a fragment
rebuild to a handoff node when a primary peer responds with 507 to the
REPLICATE request.

Each primary node in an EC ring will sync with exactly three primary
peers: in addition to the left & right nodes, we now select a third
node from the far side of the ring.  If any of these partners respond
unmounted, the reconstructor will rebuild its fragments to a handoff
node with the appropriate index.

To prevent ssync (which is uninterruptible) receiving a 409 (Conflict)
we must give the remote handoff node the correct backend_index for the
fragments it will receive.  In the common case we will use
deterministically different handoffs for each fragment index to prevent
multiple unmounted primary disks from forcing a single handoff node to
hold more than one rebuilt fragment.

Handoff nodes will continue to attempt to revert rebuilt handoff
fragments to the appropriate primary until it is remounted or
rebalanced.  After a rebalance of EC rings (potentially removing
unmounted/failed devices), it's most IO efficient to run in
handoffs_only mode to avoid unnecessary rebuilds.

Closes-Bug: #1510342

Change-Id: Ief44ed39d97f65e4270bf73051da9a2dd0ddbaec
2019-02-08 18:04:55 +00:00
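One way to realize "deterministically different handoffs per fragment
index" (purely illustrative):

    def handoff_for_frag_index(handoff_nodes, frag_index):
        # map each fragment index to a distinct handoff so one node
        # doesn't accumulate multiple rebuilt fragments
        return handoff_nodes[frag_index % len(handoff_nodes)]
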
Tim Burke 8a6159f67b Stop using duplicate dev IDs in write_fake_ring
This would cause some weird issues where get_more_nodes() would actually
yield out something, despite us only having two drives.

Change-Id: Ibf658d69fce075c76c0870a542348f220376c87a
2019-02-08 09:36:35 -08:00