This change allows individual SLO segments to be downloaded by adding
an extra 'part-number' query parameter to the GET request. You can
also retrieve the Content-Length of an individual segment with a HEAD
request.
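From a client's point of view, a request for one segment could be built
roughly as below. This is only a sketch: the helper name and the example
URL are made up for illustration, and it assumes part numbers start at 1.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_part_number(url, part_number):
    """Return url with a part-number query parameter appended.

    Hypothetical client-side helper; not part of Swift itself.
    """
    if part_number < 1:
        # Assumption: part numbers are 1-based.
        raise ValueError('part numbers are assumed to start at 1')
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(('part-number', str(part_number)))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

The same URL can be used for both GET and HEAD.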
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Change-Id: I7af0dc9898ca35f042b52dd5db000072f2c7512e
Last time we did this was nearly 4 years ago; drag ourselves into
something approaching the present. Address a few new pyflakes issues
that seem reasonable to enforce:
E275 missing whitespace after keyword
E231 missing whitespace after ','
E721 do not compare types, for exact checks use `is` / `is not`,
for instance checks use `isinstance()`
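For illustration, code that satisfies the three checks above might look
like the following. The function names are invented for this example and
do not come from the tree.

```python
def describe(value):
    # E721: use isinstance() for instance checks ...
    if isinstance(value, int):
        return 'int-like'
    # ... and `is` for exact type checks, rather than type(value) == str.
    if type(value) is str:
        return 'exactly str'
    return 'other'


def first_two(items):
    # E275: keep whitespace after a keyword: `assert (`, not `assert(`.
    assert (len(items) >= 2)
    # E231: keep whitespace after ',': (a, b), not (a,b).
    a, b = items[0], items[1]
    return (a, b)
```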
Main motivator is that the old hacking kept us on an old version
of flake8 et al., which no longer work with newer Pythons.
Change-Id: I54b46349fabb9776dcadc6def1cfb961c123aaa0
This patch reorganizes the SLO read response handling. The main goal
was to push the response header replacement for both GET/HEAD SLO and
multipart-manifest=get paths all into a common return path. A new
RespAttrs primitive is used to carry around some metadata details from
requests made in SLO. The authors hope these changes make the code more
easily readable and easier to modify.
Drive-By: add new "friendly_close" function in common.utils so we can
drain empty/error responses more confidently (and use it in swob and
request_helpers).
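The general idea of draining a response before closing it can be
sketched as below. This is not the friendly_close implementation; the
function name, the read_limit parameter and its semantics are assumptions
made for the example.

```python
def drain_then_close(app_iter, read_limit=None):
    """Read and discard chunks from app_iter, then close it.

    Stops early once at least read_limit bytes have been discarded.
    Returns the number of bytes discarded.
    """
    drained = 0
    try:
        for chunk in app_iter:
            drained += len(chunk)
            if read_limit is not None and drained >= read_limit:
                break
    finally:
        # Not every iterable has close(); only call it when present.
        close = getattr(app_iter, 'close', None)
        if callable(close):
            close()
    return drained
```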
Drive-By: the tests added in the Related-Change discovered a 500 on
If-[Un]Modified-Since conditional GET requests - it probably wasn't
important, but this refactor fixed it as a side effect.
Closes-Bug: #2040178
Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Co-Authored-By: Ashwin Nair <nairashwin952013@gmail.com>
Related-Change-Id: I54094f3d2098f56b755ec19cc9315d06a6ca8b15
Change-Id: Idc84e70539fc7480b6ecb86e2f0da904baf2c727
When the proxy app is trying to send back an object and hits an error
(maybe a timeout, maybe an EC decode error) *after* it has sent headers
and started streaming data, it just stops sending data, expecting
clients to notice the discrepancy in Content-Length and retry.
When that happened in SLO while reading a manifest, previously we'd just
assume the manifest is empty and send back an empty response. This would
cause confusion for users (who'd think we lost data or something) and
was clearly wrong.
Now, return a 500 to the client. Retrying is perfectly reasonable.
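A minimal sketch of the idea, assuming the manifest body arrives as a
list of chunks alongside the advertised Content-Length. The exception
class and function names are hypothetical; the real middleware is
structured differently.

```python
import json


class ManifestReadError(Exception):
    """Raised when the manifest body is shorter than advertised."""


def load_manifest(body_chunks, content_length):
    body = b''.join(body_chunks)
    if len(body) != content_length:
        # A short read means the backend gave up mid-stream; do not
        # treat the partial body as an empty manifest.
        raise ManifestReadError(
            'expected %d bytes, got %d' % (content_length, len(body)))
    return json.loads(body)


def status_for_manifest(body_chunks, content_length):
    try:
        load_manifest(body_chunks, content_length)
    except ManifestReadError:
        return 500
    return 200
```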
Change-Id: I7fc923ad0ef37459b7a76ce360dd7f320053d3f7
When clients issue a ?multipart-manifest=delete request to non-SLOs, we
try to fetch the manifest then drain and close the response upon seeing
it wasn't actually an SLO manifest. This could previously cause the extra
transfer (and discard) of several gigabytes of data.
Now, add two extra headers to the request:
* Range: bytes=-1
* X-Backend-Ignore-Range-If-Metadata-Present: X-Static-Large-Object
The first limits how much data we'll be discarding, while the second tells
object servers to ignore the range header if it's an SLO manifest. Note
that object-servers may still need to return more than one byte to the
proxy -- an EC policy will require that we get a full fragment's worth
from each server -- but at least we've got a better cap on our downside.
Why one byte? Because range requests weren't designed to be able to
return no data. Why the last byte (as opposed to the first)? Because
bytes=0-0 will 416 on a zero-byte object, while bytes=-1 will 200.
Note that the backend header was introduced in Swift 2.24.0 -- if we get
a response from an older object-server, it may respect the Range header
even though it's returning an SLO manifest. In that case, retry without
either header.
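The request headers and the fallback check could be sketched as follows.
The function names are hypothetical, and the sketch assumes a plain dict
of response headers in which SLO manifests carry an
X-Static-Large-Object value of "True".

```python
def manifest_fetch_headers(limit_range=True):
    headers = {}
    if limit_range:
        # Ask for only the last byte so non-SLOs are cheap to discard...
        headers['Range'] = 'bytes=-1'
        # ...but have object servers ignore Range for SLO manifests.
        headers['X-Backend-Ignore-Range-If-Metadata-Present'] = \
            'X-Static-Large-Object'
    return headers


def needs_retry_without_range(status, resp_headers):
    """True if an older object server honored Range on a manifest."""
    is_slo = resp_headers.get(
        'X-Static-Large-Object', '').lower() == 'true'
    return is_slo and status == 206
```

When needs_retry_without_range() is true, the manifest would be fetched
again with manifest_fetch_headers(limit_range=False).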
Related-Bug: #1980954
Co-Authored-By: Romain de Joux <romain.de-joux@ovhcloud.com>
Change-Id: If3861e5b9c4f17ab3b82ea16673ddb29d07820a1
In get_slo_segments, a GET subrequest is made to fetch the SLO manifest,
but if the object was not an SLO, the response was not drained and closed.
Closes-Bug: 1980954
Change-Id: I7862c8ef153416c00c8ca7d6bf2f3556a1776d8c
The *_swift_info functions use module-global dicts to provide a
registry mechanism for registering and getting swift info.
This is an abnormal pattern and doesn't quite fit into utils. Further,
we're looking at following this pattern for sensitive info to trim in
the future.
So this patch does some house cleaning and moves this registry to a new
module, swift.common.registry, and updates all the references to it.
For backwards compat we still import the *_swift_info methods into utils
for any 3rd party tools or middleware.
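The pattern in question, reduced to a sketch, looks something like this.
The names register_info and get_info are stand-ins for the example, not
the real *_swift_info signatures.

```python
import copy

# Module-level dict acting as the registry.
_info = {}


def register_info(name, **kwargs):
    """Record capability details under name in the registry."""
    _info.setdefault(name, {}).update(kwargs)


def get_info():
    """Return a deep copy so callers cannot mutate the registry."""
    return copy.deepcopy(_info)
```

Keeping the dict and its accessors together in one small module is what
makes a re-export from utils cheap for backwards compatibility.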
Change-Id: I71fd7f50d1aafc001d6905438f42de4e58af8421
We've had this option for a year now, and it seems to help. Let's enable
it for everyone. Note that Swift clients still need to opt into the
async delete via a query param, while S3 clients get it for free.
Change-Id: Ib4164f877908b855ce354cc722d9cb0be8be9921
md5 is not an approved algorithm in FIPS mode, and trying to
instantiate a hashlib.md5() will fail when the system is running in
FIPS mode.
md5 is allowed when in a non-security context. There is a plan to
add a keyword parameter (usedforsecurity) to hashlib.md5() to annotate
whether or not the instance is being used in a security context.
In the case where it is not, the instantiation of md5 will be allowed.
See https://bugs.python.org/issue9216 for more details.
Some downstream python versions already support this parameter. To
support these versions, a new encapsulation of md5() is added to
swift/common/utils.py. This encapsulation is identical to the one being
added to oslo.utils, but is recreated here to avoid adding a dependency.
This patch is to replace the instances of hashlib.md5() with this new
encapsulation, adding an annotation indicating whether the usage is
a security context or not.
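A minimal sketch of such an encapsulation is below. It is written for
illustration only and may differ from the code added to
swift/common/utils.py; it assumes that a Python without the keyword
raises TypeError when it is passed.

```python
import hashlib


def md5(data=b'', usedforsecurity=True):
    """Return an md5 hash object, annotating the security context."""
    try:
        return hashlib.md5(data, usedforsecurity=usedforsecurity)
    except TypeError:
        # This Python does not know the keyword; fall back to plain md5.
        return hashlib.md5(data)
```

A caller computing an ETag, for example, would use
md5(body, usedforsecurity=False).hexdigest().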
While this patch seems large, it is really just the same change over and
again. Reviewers need to pay particular attention as to whether the
keyword parameter (usedforsecurity) is set correctly. Right now, all
of them appear to be not used in a security context.
Now that all the instances have been converted, we can update the bandit
run to look for these instances and ensure that new invocations do not
creep in.
With this latest patch, the functional and unit tests all pass
on a FIPS enabled system.
Co-Authored-By: Pete Zaitcev
Change-Id: Ibb4917da4c083e1e094156d748708b87387f2d87
Add a new config option to SLO, allow_async_delete, to allow operators
to opt-in to this new behavior. If their expirer queues get out of hand,
they can always turn it back off.
If the option is disabled, handle the delete inline; this matches the
behavior of old Swift.
Only allow an async delete if all segments are in the same container and
none are nested SLOs, that way we only have two auth checks to make.
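The eligibility rule can be sketched as below, assuming each segment is
a dict with a "name" of the form /container/object and an optional
"sub_slo" flag. Both the function name and the key names are assumptions
for the example.

```python
def can_async_delete(segments):
    """True if all segments share one container and none is an SLO."""
    containers = set()
    for seg in segments:
        if seg.get('sub_slo'):
            # Nested SLOs would need their own manifest fetch and auth.
            return False
        container = seg['name'].lstrip('/').split('/', 1)[0]
        containers.add(container)
    return len(containers) <= 1
```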
Have s3api try to use this new mode if the data seems to have been
uploaded via S3 (since it should be safe to assume that the above
criteria are met).
Drive-by: Allow the expirer queue and swift-container-deleter to use
high-precision timestamps.
Change-Id: I0bbe1ccd06776ef3e23438b40d8fb9a7c2de8921
When a client sends a '?multipart-manifest=get&format=raw' request, the
middleware will change the manifest returned from the object server.
This patch makes sure the response etag is updated to reflect
changes to the manifest content.
Change-Id: I0ac6dd0808fb041ba7663f4a472a06ee3f1d9a71
This patch adds a new object versioning mode. This new mode provides
a new set of APIs for users to interact with older versions of an
object. It also changes the naming scheme of older versions and adds
a version-id to each object.
This new mode is not backwards compatible or interchangeable with the
other two modes (i.e., stack and history), especially due to the changes
in the naming scheme of older versions. This new mode will also serve
as a foundation for adding S3 versioning compatibility in the s3api
middleware.
Note that this does not (yet) support using a versioned container as
a source in container-sync. Container sync should be enhanced to sync
previous versions of objects.
Change-Id: Ic7d39ba425ca324eeb4543a2ce8d03428e2225a1
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Co-Authored-By: Tim Burke <tim.burke@gmail.com>
Co-Authored-By: Thiago da Silva <thiagodasilva@gmail.com>
Otherwise, we waste a request on some 416/206 response that won't be
helpful.
To do this, add a new X-Backend-Ignore-Range-If-Metadata-Present header
whose value is a comma-separated list of header names. Middlewares may
include this header to tell object-servers to send the whole object
(rather than a 206 or 416) if *any* of the metadata are present.
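On the object-server side, the decision could be sketched like this. The
function name is hypothetical, and metadata is assumed to be a plain
dict keyed by header name.

```python
def should_ignore_range(header_value, metadata):
    """True if any header named in header_value is in metadata."""
    wanted = [h.strip().lower()
              for h in header_value.split(',') if h.strip()]
    present = {key.lower() for key in metadata}
    return any(h in present for h in wanted)
```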
Have dlo and symlink use it, too; it won't save us any round-trips, but
it should clean up some object-server logging.
Change-Id: I4ff2a178d0456e7e37d561109ef57dd0d92cbd4e
If you set SLO's max_manifest_segments to a value larger than 10000,
then clients are able to create manifests with that many segments, but
unable to use "?multipart-manifest=delete" to delete them.
This is because the SLO middleware has its very own bulk-deleter that
it uses to handle such requests, and that bulk-deleter only allows
10000 deletions per request by default. This commit removes the
limitation so that any SLO manifest can be deleted along with its
segments.
I considered setting max-deletes-per-request to be equal to SLO's
max_manifest_segments, but that only works if max_manifest_segments
has never been decreased.
Note that this commit does not increase max_manifest_segments. Clients
cannot make SLOs any bigger than they could before. Also note that
this commit does not affect user-initiated bulk deletes, i.e. POST
requests with "?bulk-delete=true" set. Those requests are still
limited in their size, and those limits are not changed.
Change-Id: I6a35937e8418f4f2b8e29825fc9c40415e34742f
Closes-Bug: 1746685
This matches the ETag of the underlying swift object, as opposed to the
MD5-of-MD5s that is the large object's ETag.
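For reference, the MD5-of-MD5s in its simplest form can be sketched as
below. This assumes a manifest without segment ranges, where the large
object's ETag is the MD5 of the concatenated segment ETag strings; the
function name is made up for the example.

```python
import hashlib


def slo_etag(segment_etags):
    """Return the MD5 of the concatenated segment ETag strings."""
    joined = ''.join(segment_etags).encode('ascii')
    return hashlib.md5(joined).hexdigest()
```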
Change-Id: Ifab726f63739f62aeef495c970939410341694d1
Previously, we never checked whether the response we get when refetching
is even successful, much less whether it's still coming from an SLO.
Now, if the refetched data is newer, act on it. If it's older, 503.
Closes-Bug: #1837270
Change-Id: I106b94c77da220c762869aa800c31b87c3dffeeb
On py3, if/when you hit an error, you can get very noisy tracebacks like
<traceback coming out of split_path()>
During handling of the above exception, another exception occurred:
<meaningful traceback>
In general, I like this, but when we've used exception handling for
flow-control, it gets difficult to separate the wheat from the chaff.
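The difference can be seen in a small sketch with invented function
names. Raising from inside the except block chains the flow-control
exception onto the real error, while raising after the block does not.

```python
def lookup_noisy(mapping, key):
    try:
        return mapping[key]
    except KeyError:
        # Anything raised here is chained to the KeyError above.
        raise RuntimeError('backend failed for %r' % key)


def lookup_quiet(mapping, key):
    try:
        return mapping[key]
    except KeyError:
        pass
    # Raised outside the except block, so no unrelated context is
    # attached to the traceback.
    raise RuntimeError('backend failed for %r' % key)
```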
Change-Id: I5f3bc6416207cab2c7e3a77ee6689360b55990e7
This adds wsgi_to_str(self.path_info) everywhere we forgot it,
not only in the slo module itself.
Dropping the body=''.join(body) after call_slo() is obvious:
the latter only returns strings of bytes, not lists of such.
Change-Id: I6b4d87e4cda4945bc128dbc9c1edd39e736a59d2
... to allow other middlewares to impose additional constraints on
or make edits to SLO manifests before being written.
The callback takes a single argument: the python list that represents
the manifest to be written. All the normal list operations listed at
https://docs.python.org/2/library/stdtypes.html#mutable-sequence-types
are available to make changes to that before SLO serializes it as JSON.
The callback may return a list of problematic segments; each item in the
list should be a tuple of
(quoted object name, description of problem)
This will be useful both for s3api minimum segment size validation and
creating tar large objects.
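A callback in the shape described above might look like the sketch
below. The "name" and "bytes" keys, the size limit and the function name
are all assumptions for the example; how the callback is handed to SLO
is not shown.

```python
from urllib.parse import quote

# Hypothetical minimum size for every segment except the last.
MIN_SEGMENT_SIZE = 5 * 1024 * 1024


def validate_min_size(manifest):
    """Return (quoted object name, description of problem) tuples."""
    problems = []
    for seg in manifest[:-1]:
        if seg['bytes'] < MIN_SEGMENT_SIZE:
            problems.append((quote(seg['name']), 'Segment is too small'))
    return problems
```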
Change-Id: I198c5196e0221a72b14597a06e5ce3c4b2bbf436
Related-Bug: #1636663
Container servers will store an etag like
<MD5 of manifest on disk>; slo_etag=<MD5 of concatenated ETags>
which the SLO middleware will break out into separate
"hash": "<MD5 of manifest on disk>",
"slo_etag": "\"<MD5 of concatenated ETags>\"",
keys for JSON listings. Text and XML listings are unaffected.
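Splitting the stored value into the two listing keys could be sketched
as follows. The function name is hypothetical and the real middleware
may parse the parameters differently.

```python
def split_listing_etag(stored):
    """Split '<md5>; slo_etag=<md5>' into JSON listing keys."""
    md5_part, _, params = stored.partition(';')
    entry = {'hash': md5_part.strip()}
    for param in params.split(';'):
        key, sep, value = param.strip().partition('=')
        if sep and key == 'slo_etag':
            # The SLO ETag is reported wrapped in double quotes.
            entry['slo_etag'] = '"%s"' % value
    return entry
```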
If a middleware left of SLO already specified a container update
override, the slo_etag parameter will be appended. If the base header
value was blank, the MD5 of the manifest will be inserted.
SLOs that were created on previous versions of Swift will continue to
just have the MD5 of the manifest in container listings.
Closes-Bug: 1618573
Change-Id: I67478923619b00ec1a37d56b6fec6a218453dafc
Instead, require that callers provide an encoding.
Related-Change: I31408f525ba9836f634a35581d4aee6fa2c9428f
Change-Id: I3e5ed9e4401eea76c375bb43ad4afc58b1d8006a
If an account name contains non-ASCII characters, SLO delete code will
currently fail: the get_slo_segments() method receives a unicode object
name but a UTF-8 encoded account name. Attempting to concatenate the
strings fails with a UnicodeError, as it tries to use the ASCII codec to
decode the UTF-8 encoded account name.
This patch allows accounts with non-ASCII characters in their names to
delete SLOs.
Change-Id: I619d41e62c16b25bd5f58d300a3dc71aa4dc75c2
If a middleware left of SLO wants to override the ETag for a large
object, it will need to send a X-Backend-Etag-Is-At on GETs if it wants
to be at all performant. This would work fine coming out of the object
controller (which would look at the headers in the response, figure out
what's the real conditional etag, and pass it to swob.Response), and
even encryption (which would do the same), but at SLO, we'd just replace
the ETag, flag it as a conditional response, and let swob assume the
*SLO* ETag is the conditional one.
Now, SLO will jump through the same resolve_backend_etag_is_at hoops that
other parts of the proxy have to deal with. This allows If-Match and
If-None-Match to work correctly if/when swift3 stores an S3-compatible
multipart-upload ETag.
Change-Id: Ibbf59d38d7bcc9c485b1d5305548144025d77441
This patch updates the SLO middleware and SegmentedIterable to add
support for user-specified inlined-data segments. Such segments will
contain base64-encoded data to be added before/after an object-backed
segment within an SLO. To accommodate the potential extra data we
increase the default SLO maximum manifest size from 2MiB to 8MiB.
The default maximum number of segments remains 1000, but this will
only be enforced for object-backed segments.
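How inlined-data and object-backed segments interleave can be sketched
as below. The "data" and "path" key names and the fetch_object callable
are assumptions for the example, not the SegmentedIterable code.

```python
import base64


def manifest_payload(manifest, fetch_object):
    """Concatenate inline and object-backed segments in order."""
    out = []
    for seg in manifest:
        if 'data' in seg:
            # Inlined segment: the bytes live in the manifest itself.
            out.append(base64.b64decode(seg['data']))
        else:
            # Object-backed segment: read it from the cluster.
            out.append(fetch_object(seg['path']))
    return b''.join(out)
```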
This patch is a prerequisite for a future patch enabling the
download of large objects as tarballs. The TLO patch will be added
as a dependent patch later.
UpgradeImpact
=============
During a rolling upgrade, an updated proxy may write a manifest that
out-of-date proxies will not be able to read. This will resolve itself
once the upgrade completes on all nodes.
Change-Id: Ib8dc216a84d370e6da7d6b819af79582b671d699
Why weren't we doing that before?? The etag should be the same as for
GET/HEAD, and by sending it, we can assure resuming clients that they're
downloading the same object even if they didn't include an If-Match
header.
Change-Id: I4ccbd1ae3a909ecb4606ef18211d1b868f5cad86
Related-Change: Ic11662eb5c7176fbf422a6fc87a569928d6f85a1
An SLO PUT requires that we HEAD every referenced object; as a result, it
can be a very time-intensive operation. This makes it difficult as a
client to differentiate between a proxy-server that's still doing work and
one that's crashed but left the socket open.
Now, clients can opt-in to receiving heartbeats during long-running PUTs
by including the query parameter
heartbeat=on
With heartbeating turned on, the proxy will start its response immediately
with 202 Accepted then send a single whitespace character periodically
until the request completes. At that point, a final summary chunk will be
sent which includes a "Response Status" key indicating success or failure
and (if successful) an "Etag" key indicating the Etag of the resulting SLO.
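A client reading such a response could handle it roughly as below. The
sketch assumes the final summary chunk is JSON and that the "Response
Status" value starts with the numeric status code; the function name is
made up for the example.

```python
import json


def parse_heartbeat_body(body):
    """Return (succeeded, etag) from a heartbeat response body."""
    # Drop the whitespace heartbeats that precede the summary.
    summary = json.loads(body.lstrip())
    succeeded = summary['Response Status'].startswith('2')
    etag = summary.get('Etag') if succeeded else None
    return succeeded, etag
```

Because the HTTP status is always 202 in this mode, the client has to
inspect the summary rather than the status line to learn the outcome.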
This mechanism is very similar to the way bulk extractions and deletions
work, and even the way SLO behaves for ?multipart-manifest=delete requests.
Note that this is opt-in: this prevents us from sending the 202 response
to existing clients that may mis-interpret it as an immediate indication
of success.
Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Related-Bug: 1718811
Change-Id: I65cee5f629c87364e188aa05a06d563c3849c8f3
The error messages alone should provide plenty enough information.
Plus, running functional tests really shouldn't cause tracebacks.
Also, tighten up tests for log messages.
Change-Id: I55136484d342af756fa153d971dcb9159a435f13
Since we used to allow zero-byte last segments but now we don't, it
can be difficult to deal with some old SLO manifests.
Imagine you're writing some code to sync objects from Swift cluster A
to Swift cluster B. You start off with just a GET from A piped into a
PUT to B, and that works great until you hit a SLO manifest and B
won't accept a 500GB object. So, you write some code to detect SLO
manifests, sync their segments, then take the JSON manifest
(?multipart-manifest=get) and sync *that* over.
Now, life is good... until one day you get an exception notification
that there's this manifest on cluster A that cluster B won't
accept. Turns out that, back when Swift would take zero-byte final
segments on SLOs (before commit 7f636a5), someone uploaded such a SLO
to cluster A. Now, however, zero-byte final segments are invalid, so
that SLO that exists over in cluster A can't just be copied to cluster
B.
A little coding later, your sync tool detects zero-byte final segments
and removes them when copying a manifest. But now your ETags don't
match between clusters, so you have to figure out some way to deal
with that, and so you put it in metadata, but then you realize that
your syncer might encounter a SLO which contains a sub-SLO which has a
zero-byte final segment, and it's right about then that you start
thinking about giving up on programming and getting a job as an
elevator mechanic.
This commit makes life easier for developers of such applications by
allowing SLOs to have zero-byte segments again.
Change-Id: Ia37880bbb435e269ec53b2963eb1b9121696d479
Probably the most common format for documenting arguments is reST field
lists [1]. This change updates some docstrings to comply with the field
lists syntax.
[1] http://sphinx-doc.org/domains.html#info-field-lists
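As an illustration, a docstring using field lists looks like this. The
function is invented for the example and is not one of the docstrings
touched here.

```python
def get_segment_size(manifest, index):
    """
    Look up the size of one segment in a parsed manifest.

    :param manifest: list of segment dicts, each with a ``bytes`` key
    :param index: zero-based position of the segment
    :returns: the segment size in bytes
    :raises IndexError: if there is no segment at ``index``
    """
    return manifest[index]['bytes']
```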
Change-Id: I0c35c6b4df840018534737bca2ca32dc977b0e05