The existing FakeSwift implementation already supports using registered
GET responses for GET requests with a query param. It also supports
using registered GET responses for HEAD requests (if they either both
had the exact SAME matching query params, or both did not have ANY query
params). But it did not support using registered GET responses w/o
query params for HEAD requests with a query param, even though a GET
with the same query param would work.
This change makes it a little more consistent: any client or test that
makes a GET request, should be able to make a similar HEAD request and
expect consistent response headers and status.
This test infra improvement is needed as we're going to be extending
test_slo with a bunch of tests that assert consistent response headers
for both GET and HEAD requests w/ new query params.
Change-Id: Idb4020fdeee87a9164312dc9647ab0820b098ff8
md5 is not an approved algorithm in FIPS mode, and trying to
instantiate a hashlib.md5() will fail when the system is running in
FIPS mode.
md5 is allowed when in a non-security context. There is a plan to
add a keyword parameter (usedforsecurity) to hashlib.md5() to annotate
whether or not the instance is being used in a security context.
In the case where it is not, the instantiation of md5 will be allowed.
See https://bugs.python.org/issue9216 for more details.
Some downstream python versions already support this parameter. To
support these versions, a new encapsulation of md5() is added to
swift/common/utils.py. This encapsulation is identical to the one being
added to oslo.utils, but is recreated here to avoid adding a dependency.
This patch is to replace the instances of hashlib.md5() with this new
encapsulation, adding an annotation indicating whether the usage is
a security context or not.
While this patch seems large, it is really just the same change over and
again. Reviewers need to pay particular attention as to whether the
keyword parameter (usedforsecurity) is set correctly. Right now, all
of them appear to be not used in a security context.
Now that all the instances have been converted, we can update the bandit
run to look for these instances and ensure that new invocations do not
creep in.
With this latest patch, the functional and unit tests all pass
on a FIPS enabled system.
Co-Authored-By: Pete Zaitcev
Change-Id: Ibb4917da4c083e1e094156d748708b87387f2d87
With DLO and SLO, we validate that we can read the first segment before
sending data to the client; this helps catch auth errors where the user
has access to read the manifest but not the segments.
Sometimes, though, that validation fails for transient reasons; if the
proxy couldn't get enough responses from primaries to determine whether
the object exists (for example), we should send back a 503 to indicate
to the client that it should retry the request.
Change-Id: Ice5358ff85ee2d5fe60785b73b67dea493044a2c
...and check for it *before* doing the MD5 check. We see this happen
ocassionally, but as best I can tell, it's always due to a
ChunkReadTimeout popping in the proxy that it can't recover from.
Change-Id: If238725bbec4fc3f6c8d000599c735a7c4972f7d
Previously, logs would often show 499s in places where some other status
would be more appropriate.
Change-Id: I68dbb8593101cd3b5b64a1a947c68e340e36ce02
We already had a unit test for this, but it was inadvertently retesting
what test_get_suffix_range_many_segments tested.
Change-Id: I43eee7029ca884268c3f2ad74300b94b299fd8d2
Closes-Bug: #1831790
We don't actually need so many py2/py3 branches, and once we
clean those up, there's hardly any reason to import six.
Change-Id: Ia3b4f02e7eb99ad1a76aa35c39dc198528fd39ad
If a WSGI application produces the header "Content-Length: <N>" but
does not produce exactly N bytes of response, then that is an error
and an exception should be thrown so that the WSGI server can take the
correct action (close the TCP connection for HTTP <= 1.1, something
else for HTTP 2.0).
As part of this, I also fixed a bug in DLOs where a HEAD response
might have a body. The way it works is this:
* user makes HEAD request for DLO manifest
* DLO middleware makes GET request for container
* authorize callback (e.g. from tempurl) replies 401 for container
GET; response has a nonempty body (it's a GET response; that's
fine)
* DLO notes that response is non-2xx, returns it as-is
* client gets response with nonempty body to a HEAD request
The fix there was simple; if the original request method was HEAD,
clear out the response body.
Change-Id: I74d8c13eba2a4917b5a116875b51a781b33a7abf
Related-Bug: 1568650
There were two middlewares using a common pattern to load
the proxy-server app config section. The existing pattern
fails to recognise option overrides that are declared using
paste-deploy's 'set' notation, as illustrated by the change
to test_dlo.py in this patch.
This patch replaces the existing code with a helper function
that loads the proxy-server config using the paste-deploy loader.
The resulting config dict is therefore exactly the same as that
used to initialise the proxy-server app.
Change-Id: Ib58ce03e2010f41e7eb11f1a6dc78b0b7f55d466
The error messages alone should provide plenty enough information.
Plus, running functional tests really shouldn't cause tracebacks.
Also, tighten up tests for log messages.
Change-Id: I55136484d342af756fa153d971dcb9159a435f13
Make some json -> (text, xml) stuff in a common module, reference that in
account/container servers so we don't break existing clients (including
out-of-date proxies), but have the proxy controllers always force a json
listing.
This simplifies operations on listings (such as the ones already happening in
decrypter, or the ones planned for symlink and sharding) by only needing to
consider a single response type.
There is a downside of larger backend requests for text/plain listings, but
it seems like a net win?
Change-Id: Id3ce37aa0402e2d8dd5784ce329d7cb4fbaf700d
Use assertIsNone() instead of assertEqual(), because assertEqual()
still fails on false values when compared to None
Change-Id: Ic52c319e3e55135df834fdf857982e1721bc44bb
If a user sends a Range header with no satisfiable ranges, we send back
a 416 Requested Range Not Satisfiable response. Previously however,
there would be no indication of the size of the object they were
requesting, so they wouldn't know how to craft a satisfiable range. We
*do* send a Content-Length, but it is (correctly) the length of the
error message.
The RFC [1] has an answer for this:
> A server generating a 416 (Range Not Satisfiable) response to a
> byte-range request SHOULD send a Content-Range header field with an
> unsatisfied-range value, as in the following example:
>
> Content-Range: bytes */1234
>
> The complete-length in a 416 response indicates the current length of
> the selected representation.
Now, we'll send a Content-Range header for all 416 responses, including
those coming from the object server as well as those generated on a
proxy because of the Range mangling required to support EC policies.
[1] RFC 7233, section 4.2, although similar language was used in RFC
2616, sections 10.4.17 and 14.16
Change-Id: I80c7390fc6f84a10a212b0641bb07a64dfccbd45
Rewrite server side copy and 'object post as copy' feature as middleware to
simplify the PUT method in the object controller code. COPY is no longer
a verb implemented as public method in Proxy application.
The server side copy middleware is inserted to the left of dlo, slo and
versioned_writes middlewares in the proxy server pipeline. As a result,
dlo and slo copy_hooks are no longer required. SLO manifests are now
validated when copied so when copying a manifest to another account the
referenced segments must be readable in that account for the manifest
copy to succeed (previously this validation was not made, meaning the
manifest was copied but could be unusable if the segments were not
readable).
With this change, there should be no change in functionality or existing
behavior. This is asserted with (almost) no changes required to existing
functional tests.
Some notes (for operators):
* Middleware required to be auto-inserted before slo and dlo and
versioned_writes
* Turning off server side copy is not configurable.
* object_post_as_copy is no longer a configurable option of proxy server
but of this middleware. However, for smooth upgrade, config option set
in proxy server app is also read.
DocImpact: Introducing server side copy as middleware
Co-Authored-By: Alistair Coles <alistair.coles@hpe.com>
Co-Authored-By: Thiago da Silva <thiago@redhat.com>
Change-Id: Ic96a92e938589a2f6add35a40741fd062f1c29eb
Signed-off-by: Prashanth Pai <ppai@redhat.com>
Signed-off-by: Thiago da Silva <thiago@redhat.com>
There was a function in swift.common.utils that was importing
swob.HeaderKeyDict at call time. It couldn't import it at compilation
time since utils can't import from swob or else it blows up with a
circular import error.
This commit just moves HeaderKeyDict into swift.common.header_key_dict
so that we can remove the inline import.
Change-Id: I656fde8cc2e125327c26c589cf1045cb81ffc7e5
contextlib.nested() is missing completely in Python 3.
Since 2.7, we can use multiple context managers in a 'with' statement,
like so:
with thing1() as t1, thing2() as t2:
do_stuff()
Now, if we had some code that needed to nest an arbitrary number of
context managers, there's stuff we could do with contextlib.ExitStack
and such... but we don't. We only use contextlib.nested() in tests to
set up bunches of mocks without crazy-deep indentation, and all that
stuff fits perfectly into multiple-context-manager 'with' statements.
Change-Id: Id472958b007948f05dbd4c7fb8cf3ffab58e2681
Rewrite object versioning as middleware to simplify the PUT method
in the object controller.
The functionality remains basically the
same with the only major difference being the ability to now
version slo manifest files. dlo manifests are still not
supported as part of this patch.
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
DocImpact
Change-Id: Ie899290b3312e201979eafefb253d1a60b65b837
Signed-off-by: Thiago da Silva <thiago@redhat.com>
Signed-off-by: Prashanth Pai <ppai@redhat.com>
PEP 333 (WSGI) says: "If the iterable returned by the application has
a close() method, the server or gateway must call that method upon
completion of the current request[.]"
There's a bunch of places where we weren't doing that; some of them
matter more than others. Calling .close() can prevent a connection
leak in some cases. In others, it just provides a certain pedantic
smugness. Either way, we should do what WSGI requires.
Noteworthy goofs include:
* If a client is downloading a large object and disconnects halfway
through, a proxy -> obj connection may be leaked. In this case,
the WSGI iterable is a SegmentedIterable, which lacked a close()
method. Thus, when the WSGI server noticed the client disconnect,
it had no way of telling the SegmentedIterable about it, and so
the underlying iterable for the segment's data didn't get
closed.
Here, it seems likely (though unproven) that the object server
would time out and kill the connection, or that a
ChunkWriteTimeout would fire down in the proxy server, so the
leaked connection would eventually go away. However, a flurry of
client disconnects could leave a big pile of useless connections.
* If a conditional request receives a 304 or 412, the underlying
app_iter is not closed. This mostly affects conditional requests
for large objects.
The leaked connections were noticed by this patch's co-author, who
made the changes to SegmentedIterable. Those changes helped, but did
not completely fix, the issue. The rest of the patch is an attempt to
plug the rest of the holes.
Co-Authored-By: Romain LE DISEZ <romain.ledisez@ovh.net>
Change-Id: I168e147aae7c1728e7e3fdabb7fba6f2d747d937
Closes-Bug: #1466549
* Get FakeConn ready for expect 100 continue
* Use debug_logger more and with better interfaces
* Fix patch_policies to be less annoying
Co-Authored-By: Alistair Coles <alistair.coles@hp.com>
Co-Authored-By: Thiago da Silva <thiago@redhat.com>
Co-Authored-By: Tushar Gohad <tushar.gohad@intel.com>
Co-Authored-By: Paul Luse <paul.e.luse@intel.com>
Co-Authored-By: Samuel Merritt <sam@swiftstack.com>
Co-Authored-By: Christian Schwede <christian.schwede@enovance.com>
Co-Authored-By: Yuan Zhou <yuan.zhou@intel.com>
Change-Id: I28c0a3539d994cbb8e6b94d63a23ed4ea6cb956d
This patch changes the response code from Internal Server Error to
Conflict when there is a missing segment and the position is first.
Co-Authored-By: Samuel Merritt <sam@swiftstack.com>
Closes-Bug: #1386568
Change-Id: Iac175b4dc6ac9081436738697a27fe669acce0eb
This patch fixes the unit tests to remove the temporary directories
created during run of unit tests. Some of unit tests did not tear down
correctly, whatever it had set it up for running. This would over period
of time bloat up the tmp directory. As on date, there were around 49 tmp
directories left uncleared per round of unit tests. This patch fixes it.
Change-Id: If591375ca9cc87d52c7c9c6dc16c9fb4b49e99fc
Prior to this patch both mainline code and testing modules imported
and used constraints directly into their own namespace, or relied on
the namespace of other modules that were not the constraints
module. This meant that if a unit test wanted to change a constraint
for its operation, it had to know how that module was using the
constraint, instead of referencing the constraint module itself.
This patch unifies the use of constraints so that all constraints are
referenced via the constraints module. In turn, this allows a test to
leverage the re-loadable nature of the constraints in the constraints
module.
It addition, a number of functional tests where using the default
values for constraints, instead of the configured value discovered in
a test.conf or in an existing swift.conf. This patch removes those
direct references in favor of the load_constraint() method from the
test/functional/tests.py module.
Change-Id: Ia5313d653c667dd9ca800786de59b59334c34eaa
Now if a client is downloading a large object and one of the segment
fetches gets bad data, we'll abort the request right away instead of
letting the client carry on. We do this by checking each segment's
body against the etag in the GET response's headers, like we tell
clients to do.
To make sure that the etag is the MD5, we add ?multipart-manifest=get
to all segment GET requests. That way, a segment won't be another
large object, because a large object's etag isn't its MD5. This does
mean that clients can't make, say, a DLO manifest that points to a
bunch of SLO manifests. However, that capability was only introduced
in Swift 1.13.0 (post-Havana); prior to that, clients couldn't mix
large-object types. I'm more than happy to trade that capability away
in exchange for data-integrity checks.
Change-Id: I98c78a291ee1a863a36d54a22867bd01c126644e
If auth is setup in the env then it needs to be copied over with the
make_request wsgi helper. Also renamed make_request to
make_subrequest- when I grepped for make_request I got > 250 results,
this'll make it easier to find references to this function in the
future.
Updated docs and sample confs to show tempurl needs to be before dlo and
slo as well as auth.
Change-Id: I9750555727f520a7c9fedd5f4fd31ff0f63d8088
If a client issues a COPY request for a DLO with the query param
multipart-manifest=get, then the expectation is a new
manifest. However, this wasn't the case; X-Object-Manifest wasn't
being set on the new object, so the result was a normal object with
the same contents as the old manifest (typically 0 bytes).
There was already a mechanism by which middlewares could modify COPY
requests; this commit extends that so they can set headers on the new
object.
Note that this has nothing to do with a "normal" DLO copy, i.e. one
without multipart-manifest=get. That one makes a new object that's the
concatenation of the segments, and it was working just fine.
Change-Id: I1073af9fee6e34ebdfad7b1a89aeb05e4523a151
The GET response for a DLO has a Content-Length that's computed from
the container listing, but the response body's length is determined by
the segments. If a segment grows or shrinks between when the headers
are sent and when the segment is requested, this confuses clients.
For example, if the DLO is longer than its Content-Length and the
client sends another request on the same TCP connection, then it can
get leftover object bytes instead of an HTTP status line.
Alternately, if the headers it sends fill up the TCP buffers since
Swift won't read them until the first response is done, then deadlock
hilarity ensues.
If the DLO is shorter than its Content-Length, you're pretty much
guaranteed a deadlock: the client is waiting for the rest of the
response, and the server is waiting for a new request.
Now SegmentedIterable detects both these conditions and raises an
exception so that the TCP connection gets torn down. It can't save
this request, but it can stop the next one from getting hosed too.
Change-Id: Icf79ba046ef7aaaab49ce6d0b33147332c967afc
I moved the checking of If-Match and If-None-Match out of the object
server's GET method and into swob so that everyone can use it. The
interface is similar to the Range handling; make a response with
conditional_response=True, and you get handing of If-Match and
If-None-Match.
Since the only users of conditional_response are object GET, object
HEAD, SLO, and DLO, this has the effect of adding support for If-Match
and If-None-Match to just the latter three places and nowhere
else. This makes object GET and HEAD consistent for any kind of
object, large or small.
This also fixes a bug where various conditional headers (If-*) were
passed through to the object server on segment requests, which could
cause segment requests to fail with a 304 or 412 response. Now only
certain headers are copied to the segment requests, and that doesn't
include the conditional ones, so they can't goof up the segment
retrieval.
Note that I moved SegmentedIterable to swift.common.request_helpers
because it sprouted a transitive dependency on swob, and leaving it in
utils caused a circular import.
Bonus fix: unified the handling of DiskFileQuarantined and
DiskFileNotFound in object server GET and HEAD. Now in either case, a
412 will be returned if the client said "If-Match: *". If not, the
response is a 404, just like before.
Closes-Bug: 1279076
Closes-Bug: 1280022
Closes-Bug: 1280028
Change-Id: Id2ee78346244d516b980202e990aa38ce6812de5
SLO was setting it sometimes (the PUT path), but not others (the GET
path), and DLO just wasn't doing it at all. One could already
distinguish DLO/SLO internal requests by the user agent, but we should
set swift.source too.
Change-Id: I5d5b7dd49bd1522a9c830d0abd21fff92ae79a39
This is for the same reason that SLO got pulled into middleware, which
includes stuff like automatic retry of GETs on broken connection and
the multi-ring storage policy stuff.
The proxy will automatically insert the dlo middleware at an
appropriate place in the pipeline the same way it does with the
gatekeeper middleware. Clusters will still support DLOs after upgrade
even with an old config file that doesn't mention dlo at all.
Includes support for reading config values from the proxy server's
config section so that upgraded clusters continue to work as before.
Bonus fix: resolve 'after' vs. 'after_fn' in proxy's required filters
list. Having two was confusing, so I kept the more-general one.
DocImpact
blueprint multi-ring-large-objects
Change-Id: Ib3b3830c246816dd549fc74be98b4bc651e7bace