- Move statsd client into its own module
- Move all logging functions into their own module
- Move all config functions into their own module
- Move all helper functions into their own module
Partial-Bug: #2015274
Change-Id: Ic4b5005e3efffa8dba17d91a41e46d5c68533f9a
... in document_iters_to_http_response_body.
We seemed to be relying a little too heavily upon prompt garbage
collection to log client disconnects, leading to failures in
test_base.py::TestGetOrHeadHandler::test_disconnected_logging
under python 3.12.
Closes-Bug: #2046352
Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Change-Id: I4479d2690f708312270eb92759789ddce7f7f930
If the proxy timed out while reading a replicated policy multi-part
response body, it would transform the ChunkReadTimeout to a
StopIteration. This masks the fact that the backend read has
terminated unexpectedly. The document_iters_to_multipart_byteranges
would complete iterating over parts and send a multipart terminator
line, even though no parts may have been sent.
This patch removes the conversion of ChunkReadTimeout to StopIteration.
The ChunkReadTimeout that is now raised prevents the
document_iters_to_multipart_byteranges 'for' loop completing and
therefore stops the multi-part terminator line being sent. It is
raised from the GetOrHeadHandler similar to other scenarios that raise
ChunkReadTimeouts while the resp body is being read.
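The masking can be sketched with a toy version of the 'for' loop (hypothetical generator names; the real loop is in document_iters_to_multipart_byteranges):

```python
class ChunkReadTimeout(Exception):
    pass

def backend_parts_old():
    # old behaviour: the timeout was swallowed, so iteration simply
    # ended as if the body had completed normally
    yield b'part-1'
    try:
        raise ChunkReadTimeout()
    except ChunkReadTimeout:
        return

def backend_parts_new():
    # new behaviour: the timeout propagates out of the generator
    yield b'part-1'
    raise ChunkReadTimeout()

def multipart_response(parts):
    for part in parts:
        yield part
    yield b'--terminator--'   # only reached if the 'for' loop completes

# old: the terminator is sent despite the broken backend read
assert list(multipart_response(backend_parts_old())) == \
    [b'part-1', b'--terminator--']

# new: the 'for' loop is aborted and no terminator is sent
received = []
try:
    for chunk in multipart_response(backend_parts_new()):
        received.append(chunk)
except ChunkReadTimeout:
    pass
assert received == [b'part-1']
```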
A ChunkReadTimeout exception handler is removed in the
_iter_parts_from_response method. This handler was previously never
reached (because StopIteration rather than ChunkReadTimeout was raised
from _get_next_response_part), but if it were reached (i.e. with this
change) then it would repeat logging of the error and repeat
incrementing the node's error counter.
This change in the GetOrHeadHandler mimics a similar change in the
ECFragGetter [1].
[1] Related-Change: I0654815543be3df059eb2875d9b3669dbd97f5b4
Co-Authored-By: Tim Burke <tim.burke@gmail.com>
Change-Id: I6dd53e239f5e7eefcf1c74229a19b1df1c989b4a
Create new helper functions to set and get namespaces in cache. Use
these in both the object and container controllers when caching
namespaces for shard ranges in the updating and listing states
respectively.
Add unit tests for the new helper functions.
No intentional behavioural changes.
Change-Id: I6833ec64540fa19f658f0ee78952ecb43b49f169
The client_chunk_size attribute was introduced into GetOrHeadHandler
for EC support [1]. It was only ever not None for an
ECObjectController. The ECObjectController stopped using
GetOrHeadHandler for Object GET when the ECFragGetter class was
introduced [2], but the EC specific code was not expunged from
GetOrHeadHandler. In [3] the ECFragGetter client_chunk_size was renamed
to fragment_size to better reflect what it represented.
The skip_bytes attribute was similarly introduced for EC support. It
is only ever non-zero if client_chunk_size is an int. For EC,
skip_bytes is used to undo the effect of expanding the backend
range(s) to fetch whole fragments: the range(s) of decoded bytes
returned to the client may need to be narrower than the backend
ranges. There is no equivalent requirement for replicated GETs.
The elimination of client_chunk_size and skip_bytes simplifies the
yielding of chunks from the GetOrHeadHandler response iter.
Related-Change:
[1] I9c13c03616489f8eab7dcd7c5f21237ed4cb6fd2
[2] I0dc5644a84ededee753e449e053e6b1786fdcf32
[3] Ie1efaab3bd0510275d534b5c023cb73c98bec90d
Change-Id: I31ed36d32682469e3c5ca8bf9a2b383568d63c72
If the sharder is processing a node whose devices all have 0 weight,
`find_local_handoff_for_part` can fail because there will be no local
handoff devices available: it uses the replica2part2dev_id table to
find a device, and a 0-weighted device won't appear in that table.
This patch extends `find_local_handoff_for_part`: if it fails to find
a node from the ring, it falls back to a local device identified by
`_local_device_ids`, which is built up while the replicator or
sharder identifies local devices. That collection is built from
ring.devs, so it does include 0-weighted devices. This allows the
sharder to find a
location to write the shard_broker in a handoff location while
sharding.
Co-Authored-By: Tim Burke <tim.burke@gmail.com>
Change-Id: Ic38698e9ca0397770c7362229baef1101a72788f
The SQLite in-memory databases have been great for testing but
as the swift DatabaseBrokers have become more complex, the limitations
of in-memory databases are being reached, mostly due
to the introduction of container sharding, where a broker sometimes needs
to make multiple connections to the same database at the same time.
Rather than rework the real broker logic to better support in-memory
testing, it's actually easier to just remove the in-memory broker tests
and use a "real" broker in a tempdir. This allows us to better test how
brokers behave in real life, pending files and all.
This patch replaces all the :memory: brokers in the tests with real ones
placed in a tempdir. To achieve this, a new base unittest class `TestDBBase`
has been added that creates, cleans up and provides some helper methods
to manage the db path and location.
Further, all references to :memory: in the Database brokers have been
removed.
Change-Id: I5983132f776b84db634fef39c833d5cfdce11980
s3api bucket listing elements currently have LastModified values with
millisecond precision. This is inconsistent with the value of the
Last-Modified header returned with an object GET or HEAD response
which has second precision. This patch reduces the precision to
seconds in bucket listings and upload part listings. This is also
consistent with observed AWS listing responses.
The last modified values in the swift native listing are rounded *up*
to the nearest second to be consistent with the seconds-precision
Last-Modified time header that is returned with an object GET or HEAD.
However, we continue to include millisecond digits set to 0 in the
last-modified string, e.g.: '2014-06-10T22:47:32.000Z'.
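A minimal sketch of the listing-time rounding, with a hypothetical helper name:

```python
import math
from datetime import datetime, timezone

def listing_last_modified(timestamp):
    # Round *up* to the next whole second, then render with the
    # millisecond digits kept but zeroed, matching the format above.
    whole = math.ceil(timestamp)
    dt = datetime.fromtimestamp(whole, tz=timezone.utc)
    return dt.strftime('%Y-%m-%dT%H:%M:%S.000Z')

# an object created part-way through a second lists under the next one
assert listing_last_modified(0.337) == '1970-01-01T00:00:01.000Z'
```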
Also, fix the last modified time returned in an object copy response
to be consistent with the last modified time of the object that was
created. Previously it was rounded down, but it should be rounded up.
Change-Id: I8c98791a920eeedfc79e8a9d83e5032c07ae86d3
Auth middlewares in particular may want to *know* when there's a
communication breakdown as opposed to a cache miss.
Update our shard-range cache stats to acknowledge the distinction.
Drive-by: Log an error if all memcached servers are error-limited.
Change-Id: Ic8d0915235d11124d06ec940c5be9a2edbe85c83
Between this and the (unreleased) pyeclib fix, I see unit and func tests
passing on py310. Haven't tried probe tests, yet.
Change-Id: Iacf66eda75fed6bf96900107250f393227c57ae5
inspect.getargspec was deprecated since Python 3.0 and
inspect.getfullargspec is its replacement with correct handling of
function annotations and keyword-only parameters[1].
This change ensures that inspect.getfullargspec is used in Python 3.
[1] https://docs.python.org/3/library/inspect.html#inspect.getargspec
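For example, getfullargspec handles a keyword-only parameter that getargspec could not describe:

```python
import inspect

def handler(req, *, timeout=10):
    return req

# getargspec() choked on keyword-only parameters (and was later
# removed entirely); getfullargspec() reports them:
spec = inspect.getfullargspec(handler)
assert spec.args == ['req']
assert spec.kwonlyargs == ['timeout']
assert spec.kwonlydefaults == {'timeout': 10}
```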
Change-Id: I63a4fda4f5da00c0f752e58f2e7192baea5012bb
This patch makes the reconciler PPI aware. It does this by adding a
helper method `can_reconcile_policy` that is used to check that the
policies used for the source and destination aren't in the middle of a
PPI (their ring doesn't have next_part_power set).
In order to accomplish this, the reconciler has had to include the
POLICIES singleton and has grown swift_dir and ring_check_interval
config options.
Closes-Bug: #1934314
Change-Id: I78a94dd1be90913a7a75d90850ec5ef4a85be4db
During sharding a shard range is moved to CLEAVED state when cleaved
from its parent. However, during shrinking an acceptor shard should
not be moved to CLEAVED state when the shrinking shard cleaves to it,
because the shrinking shard is not the acceptor's parent and does not
know if the acceptor has yet been cleaved from its parent.
The existing attempt to prevent a shrinking shard updating its
acceptor state relied on comparing the acceptor namespace to the
shrinking shard namespace: if the acceptor namespace fully enclosed
the shrinking shard then it was inferred that shrinking was taking
place. That check is sufficient for normal shrinking of one shard into
an expanding acceptor, but is not sufficient when shrinking in order
to fix overlaps, when a shard might shrink into more than one
acceptor, none of which completely encloses the shrinking shard.
Fortunately, since [1], it is possible to determine that a shard is
shrinking from its own shard range state being either SHRINKING or
SHRUNK.
It is still advantageous to delete and merge the shrinking shard range
into the acceptor when the acceptor fully encloses the shrinking shard
because that increases the likelihood of the root being updated with
the deleted shard range in a timely manner.
[1] Related-Change: I9034a5715406b310c7282f1bec9625fe7acd57b6
Change-Id: I91110bc747323e757d8b63003ad3d38f915c1f35
The multipart document handling in the proxy is consumed via iteration,
but the error handling code is not consistent with how it applies
conversions of IO errors/timeouts and retry failures to StopIteration.
In an effort to make the code more obvious and easier to debug and
maintain I've added comments and additional tests as well as tightening
up StopIteration exception handling.
Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Change-Id: I0654815543be3df059eb2875d9b3669dbd97f5b4
Move DebugLogger and associated classes to its own module under test
so that it can be imported (for example in probe tests) without
requiring all the dependencies in test/unit/__init__.py.
Change-Id: I0ea3c26e54d91f27159805a45e49ad7f8f0e0431
The context manager protocol requires that __exit__ be called with three
args: type, value, and traceback. In some places, we didn't include any
args at all, leading to test failures during clean-up.
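A minimal illustration of the protocol:

```python
class Resource:
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # the protocol always supplies three arguments
        return False   # do not suppress any exception

res = Resource()
res.__exit__(None, None, None)   # correct explicit clean-up call

try:
    res.__exit__()               # the bug: no arguments at all
    raised = False
except TypeError:
    raised = True
assert raised
```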
Change-Id: I2998830e6eac685b1f753937d12cf5346a4eb081
This patch makes four significant changes to the handling of GET
requests for sharding or sharded containers:
- container server GET requests may now result in the entire list of
shard ranges being returned for the 'listing' state regardless of
any request parameter constraints.
- the proxy server may cache that list of shard ranges in memcache
and the requests environ infocache dict, and subsequently use the
cached shard ranges when handling GET requests for the same
container.
- the proxy now caches more container metadata so that it can
synthesize a complete set of container GET response headers from
cache.
- the proxy server now enforces more container GET request validity
checks that were previously only enforced by the backend server,
e.g. checks for valid request parameter values.
With this change, when the proxy learns from container metadata
that the container is sharded then it will cache shard
ranges fetched from the backend during a container GET in memcache.
On subsequent container GETs the proxy will use the cached shard
ranges to gather object listings from shard containers, avoiding
further GET requests to the root container until the cached shard
ranges expire from cache.
Cached shard ranges are most useful if they cover the entire object
name space in the container. The proxy therefore uses a new
X-Backend-Override-Shard-Name-Filter header to instruct the container
server to ignore any request parameters that would constrain the
returned shard range listing i.e. 'marker', 'end_marker', 'includes'
and 'reverse' parameters. Having obtained the entire shard range
listing (either from the server or from cache) the proxy now applies
those request parameter constraints itself when constructing the
client response.
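Roughly, applying those request parameters against a cached listing might look like this (hypothetical helper; real shard ranges are objects with lower/upper attributes, not tuples):

```python
def filter_namespaces(namespaces, marker='', end_marker='', reverse=False):
    # each namespace is a (lower, upper] pair; '' as upper means unbounded
    kept = []
    for lower, upper in namespaces:
        if marker and upper and upper <= marker:
            continue            # entirely before the marker
        if end_marker and lower >= end_marker:
            continue            # entirely at/after the end_marker
        kept.append((lower, upper))
    return list(reversed(kept)) if reverse else kept

ranges = [('', 'c'), ('c', 'p'), ('p', '')]
assert filter_namespaces(ranges, marker='d') == [('c', 'p'), ('p', '')]
assert filter_namespaces(ranges, end_marker='c') == [('', 'c')]
```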
When using cached shard ranges the proxy will synthesize response
headers from the container metadata that is also in cache. To enable
the full set of container GET response headers to be synthesized in
this way, the set of metadata that the proxy caches when handling a
backend container GET response is expanded to include various
timestamps.
The X-Newest header may be used to disable looking up shard ranges
in cache.
Change-Id: I5fc696625d69d1ee9218ee2a508a1b9be6cf9685
When handling a GET response ProxyLoggingMiddleware will try to close
a reiterated [1] proxy response iterator if, for example, there is a
client disconnect.
The reiterate function encapsulates the result of calling iter() on
the proxy response. In the case of an SLO response, the iter method
returned an instance of itertools.chain, rather than the response
itself, which is an instance of SegmentedIterable. As a result the
SegmentedIterable.close() method would not be called and object server
connections would not be closed.
This patch replaces the itertools.chain with a CloseableChain which
encapsulates the SegmentedIterable and closes it when
CloseableChain.close() is called.
[1] The use of reiterate was introduced by the Related-Change.
Closes-Bug: #1909588
Related-Change: I27feabe923a6520e983637a9c68a19ec7174a0df
Change-Id: Ib7450a85692114973782525004466db49f63066d
md5 is not an approved algorithm in FIPS mode, and trying to
instantiate a hashlib.md5() will fail when the system is running in
FIPS mode.
md5 is allowed when in a non-security context. There is a plan to
add a keyword parameter (usedforsecurity) to hashlib.md5() to annotate
whether or not the instance is being used in a security context.
In the case where it is not, the instantiation of md5 will be allowed.
See https://bugs.python.org/issue9216 for more details.
Some downstream python versions already support this parameter. To
support these versions, a new encapsulation of md5() is added to
swift/common/utils.py. This encapsulation is identical to the one being
added to oslo.utils, but is recreated here to avoid adding a dependency.
This patch is to replace the instances of hashlib.md5() with this new
encapsulation, adding an annotation indicating whether the usage is
a security context or not.
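A sketch of such an encapsulation (the real helper lives in swift/common/utils.py; this is an approximation):

```python
import hashlib

def md5(data=b'', usedforsecurity=True):
    # Pass the annotation through on interpreters that accept it;
    # fall back to plain hashlib.md5 on those that do not.
    try:
        return hashlib.md5(data, usedforsecurity=usedforsecurity)
    except TypeError:
        return hashlib.md5(data)

# e.g. an ETag calculation is not a security context
etag = md5(b'object body', usedforsecurity=False).hexdigest()
```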
While this patch seems large, it is really just the same change over and
again. Reviewers need to pay particular attention as to whether the
keyword parameter (usedforsecurity) is set correctly. Right now, all
of them appear to be not used in a security context.
Now that all the instances have been converted, we can update the bandit
run to look for these instances and ensure that new invocations do not
creep in.
With this latest patch, the functional and unit tests all pass
on a FIPS enabled system.
Co-Authored-By: Pete Zaitcev
Change-Id: Ibb4917da4c083e1e094156d748708b87387f2d87
We're getting some blockage trying to feed backup requests in waterfall
EC because the pool_size was limited to the initial batch of requests.
This was (un?)fortunately working out in practice because there were
lots of initial primary fragment requests and some would inevitably be
quick enough to make room for the pending feeder requests. But when
enough of the initial requests were slow (network issue at the proxy?)
we wouldn't have the expected number of pending backup requests
in-flight. Since concurrent EC should never make extra requests to
non-primaries (at least not until an existing primary request
completes) ec_n_unique_fragments makes a reasonable cap for the pool.
Drive-bys:
* Don't make concurrent_ec_extra_requests unless you have enabled
concurrent_gets.
* Improved mock_http_connect extra requests tracking formatting
* FakeStatus __repr__'s w/ status code in AssertionErrors
Change-Id: Iec579ed874ef097c659dc80fff1ba326b6da05e9
Otherwise, we miss out on transaction id and client IP information when
timeouts pop.
Closes-Bug: #1892421
Change-Id: I6dea3ccf780bcc703db8447a2ef13c33838ff12d
A new header `X-Backend-Use-Replication-Network` is added; if true, use
the replication network instead of the client-data-path network.
Several background daemons are updated to use the replication network:
* account-reaper
* container-reconciler
* container-sharder
* container-sync
* object-expirer
Note that if container-sync is being used to sync data within the same
cluster, the replication network will only be used when communicating
with the "source" container; the "destination" traffic will continue to
use the configured realm endpoint.
The direct and internal client APIs still default to using the
client-data-path network; this maintains backwards compatibility for
external tools written against them.
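Ring devices carry separate client and replication endpoints; choosing between them looks roughly like this sketch (not the actual Swift helper, though the `replication_ip`/`replication_port` device keys are real):

```python
def node_endpoint(node, use_replication=False):
    # Each ring device dict carries both networks; pick one based on
    # whether the caller opted into the replication network.
    if use_replication:
        return node['replication_ip'], node['replication_port']
    return node['ip'], node['port']

node = {'ip': '10.0.0.1', 'port': 6200,
        'replication_ip': '10.1.0.1', 'replication_port': 6200}
assert node_endpoint(node) == ('10.0.0.1', 6200)
assert node_endpoint(node, use_replication=True) == ('10.1.0.1', 6200)
```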
UpgradeImpact
=============
Until recently, servers configured with
replication_server = true
would only handle REPLICATE (and, in the case of object servers, SSYNC)
requests, and would respond 405 Method Not Allowed to other requests.
When upgrading from Swift 2.25.0 or earlier, remove the config option
and restart services prior to upgrade to avoid a flood of background
daemon errors in logs.
Note that some background daemons find work by querying Swift rather
than walking local drives, and so now need access to the replication
network:
* container-reconciler
* object-expirer
Previously these may have been configured without access to the
replication network; ensure they have access before upgrading.
Closes-Bug: #1883302
Related-Bug: #1446873
Related-Change: Ica2b41a52d11cb10c94fa8ad780a201318c4fc87
Change-Id: Ieef534bf5d5fb53602e875b51c15ef565882fbff
DaemonStrategy class calls Daemon.is_healthy() method every 0.1 seconds
to ensure that all workers are running as wanted.
On object replicator/reconstructor daemons, is_healthy() checks whether
the rings have changed to decide if workers must be created/killed. With
large rings, this operation can be CPU intensive, especially on low-end
CPUs.
This patch:
- increases the check interval to 5 seconds by default, because none of
these daemons are critical for performance (they are not in the datapath).
But it allows each daemon to change this value if necessary
- ensures that before doing a computation of all devices in the ring,
object replicator/reconstructor checks that the ring really changed
(by checking the mtime of the ring.gz files)
On an Atom N2800 processor, this patch reduced the CPU usage of the main
object replicator/reconstructor from 70% of a core to 0%.
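The mtime short-circuit can be sketched like this (hypothetical class; the real check lives in the replicator/reconstructor daemons):

```python
import os

class RingCheck:
    """Skip the expensive device comparison unless the ring file changed."""

    def __init__(self, ring_path):
        self.ring_path = ring_path
        self._mtime = None

    def ring_changed(self):
        mtime = os.path.getmtime(self.ring_path)
        if mtime == self._mtime:
            return False       # cheap path, taken on almost every check
        self._mtime = mtime
        return True            # only now run the CPU-heavy ring rescan
```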
Change-Id: I2867e2be539f325778e2f044a151fd0773a7c390
unittest2 was needed for Python version <= 2.6, so it hasn't been needed
for quite some time. See the unittest2 note at:
https://docs.python.org/2.7/library/unittest.html
This drops unittest2 in favor of the standard unittest module.
Change-Id: I2e787cfbf1709b7f9c889230a10c03689e032957
Signed-off-by: Sean McGinnis <sean.mcginnis@gmail.com>
Since we don't use 404s from handoffs anymore, we need to not let errors
on handoffs overwhelm primary responses either
Change-Id: I2624e113c9d945542f787e5f18f487bd7be3d32e
Closes-Bug: #1857909
Mostly this amounts to
Exception.message -> Exception.args[0]
'...' -> b'...'
StringIO -> BytesIO
makefile() -> makefile('rwb')
iter.next() -> next(iter)
bytes[n] -> bytes[n:n + 1]
integer division
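A few of those conversions in miniature:

```python
# bytes indexing returns an int on py3; slice to keep the bytes type
b = b'abc'
assert b[1] == 98
assert b[1:2] == b'b'

# iter.next() -> next(iter)
it = iter([1, 2])
assert next(it) == 1

# '/' became true division; use '//' where an int is required
assert 7 // 2 == 3
```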
Note that the versioning tests are mostly untouched; they seemed to get
a little hairy.
Change-Id: I167b5375e7ed39d4abecf0653f84834ea7dac635
This started with ShardRanges and its CLI. The sharder is at the
bottom of the dependency chain. Even container backend needs it.
Once we started tinkering with the sharder, it all snowballed to
include the rest of the container services.
Beware, this does affect some of Python 2 code. Mostly it's trivial
and obviously correct, but needs checking by reviewers.
About killing the stray "from __future__ import unicode_literals":
we do not do it in general. The specific problem it caused was
a failure of functional tests because unicode leaked into a field
that was supposed to be encoded. It is just too hard to track the
types when rules change from file to file, so off with its head.
Change-Id: Iba4e65d0e46d8c1f5a91feb96c2c07f99ca7c666
Change the behavior of the EC reconstructor to perform a fragment
rebuild to a handoff node when a primary peer responds with 507 to the
REPLICATE request.
Each primary node in a EC ring will sync with exactly three primary
peers, in addition to the left & right nodes we now select a third node
from the far side of the ring. If any of these partners respond
unmounted the reconstructor will rebuild its fragments to a handoff
node with the appropriate index.
To prevent ssync (which is uninterruptible) receiving a 409 (Conflict)
we must give the remote handoff node the correct backend_index for the
fragments it will receive. In the common case we will use
deterministically different handoffs for each fragment index to prevent
multiple unmounted primary disks from forcing a single handoff node to
hold more than one rebuilt fragment.
Handoff nodes will continue to attempt to revert rebuilt handoff
fragments to the appropriate primary until it is remounted or
rebalanced. After a rebalance of EC rings (potentially removing
unmounted/failed devices), it's most IO efficient to run in
handoffs_only mode to avoid unnecessary rebuilds.
Closes-Bug: #1510342
Change-Id: Ief44ed39d97f65e4270bf73051da9a2dd0ddbaec
This would cause some weird issues where get_more_nodes() would actually
yield out something, despite us only having two drives.
Change-Id: Ibf658d69fce075c76c0870a542348f220376c87a