Reserve the namespace starting with the NULL byte for internal
use-cases. Backend services will allow path names to include the NULL
byte in urls and validate names in the reserved namespace. Database
services will filter all names starting with the NULL byte from
responses unless the request includes the header:
X-Backend-Allow-Reserved-Names: true
The proxy server will not allow path names to include the NULL byte in
urls unless a middlware has set the X-Backend-Allow-Reserved-Names
header. Middlewares can use the reserved namespace to create objects
and containers that can not be directly manipulated by clients. Any
objects and bytes created in the reserved namespace will be aggregated
to the user's account totals.
When deploying internal proxys developers and operators may configure
the gatekeeper middleware to translate the X-Allow-Reserved-Names header
to the Backend header so they can manipulate the reserved namespace
directly through the normal API.
UpgradeImpact: it's not safe to rollback from this change
Change-Id: If912f71d8b0d03369680374e8233da85d8d38f85
Previously, we were storing the WSGI-style UTF-8-bytes-decoded-as-Latin-1
strings in the JSON field, and sending them back to eventlet directly.
If running in a mixed py2/py3 cluster, replication would eventually get
that back to the py2 server, and worse, the native-string version would
get back to the py3 server! Then on GET or HEAD, eventlet would barf
if any characters were outside the Latin-1 range.
Closes-Bug: #1837805
Change-Id: I31d942e72fd7bfbb1db4dbb1dd522dff69969e5d
This version scatters the cancer of WSGI strings around, but
reduces the size of the patch. In particular, we can continue
to iterate strings.
Change-Id: Ia5815602d05925c5de110accc4dfb1368203bd8d
Make some json -> (text, xml) stuff in a common module, reference that in
account/container servers so we don't break existing clients (including
out-of-date proxies), but have the proxy controllers always force a json
listing.
This simplifies operations on listings (such as the ones already happening in
decrypter, or the ones planned for symlink and sharding) by only needing to
consider a single response type.
There is a downside of larger backend requests for text/plain listings, but
it seems like a net win?
Change-Id: Id3ce37aa0402e2d8dd5784ce329d7cb4fbaf700d
Often, we want the current timestamp. May as well improve the ergonomics
a bit and provide a class method for it.
Change-Id: I3581c635c094a8c4339e9b770331a03eab704074
For now, last modified timestamp is supported only on
object listing. (i.e. GET container)
For example:
GET container with json format results in like as:
[{"hash": "d41d8cd98f00b204e9800998ecf8427e", "last_modified":
"2015-06-10T04:58:23.460230", "bytes": 0, "name": "object",
"content_type": "application/octet-stream"}]
However, container listing (i.e. GET account) shows just a dict
consists of ("name", "bytes", "name") for each container.
For example:
GET accounts with json format result in like as:
[{"count": 0, "bytes": 0, "name": "container"}]
This patch is for supporting last_modified key in the container
listing results as well as object listing like as:
[{"count": 0, "bytes": 0, "name": "container", "last_modified":
"2015-06-10T04:58:23.460230"}]
This patch is changing just output for listing. The original
timestamp to show the last modified is already in container table
of account.db as a "put_timestamp" column.
Note that this patch *DOESN'T* change the put_timestamp semantics.
i.e. the last_modified timestamp will be changed only at both PUT
container and POST container.
(PUT object doesn't affect the timestamp)
Note that the tuple format of returning value from
swift.account.backend.AccountBroker.list_containers is now
(name, object_count, bytes_used, put_timestamp, 0)
* put_timestamp is added *
Original discussion was in working session at Vancouver Summit.
Etherpads are around here:
https://etherpad.openstack.org/p/liberty-swift-contributors-meetuphttps://etherpad.openstack.org/p/liberty-container-listing-update
DocImpact
Change-Id: Iba0503916f1481a20c59ae9136436f40183e4c5b
This change adds the ability to tell the container or account server to
reverse their listings. This is done by sending a reverse=TRUE_VALUE,
Where TRUE_VALUE is one of the values true can be in common/utils:
TRUE_VALUES = set(('true', '1', 'yes', 'on', 't', 'y'))
For example:
curl -i -X GET -H "X-Auth-Token: $TOKEN" $STORAGE_URL/c/?reverse=on
I borrowed the swapping of the markers code from Kevin's old change,
thanks Kevin. And Tim Burke added some real nuggets of awesomeness.
DocImpact
Co-Authored-By: Kevin McDonald <kmcdonald@softlayer.com>
Co-Authored-By: Tim Burke <tim.burke@gmail.com>
Implements: blueprint reverse-object-listing
Change-Id: I5eb655360ac95042877da26d18707aebc11c02f6
a1c32702, 736cf54a, and 38787d0f remove uses of `simplejson` from
various parts of Swift in favor of the standard libary `json`
module (introduced in Python 2.6). This commit performs the remaining
`simplejson` to `json` replacements, removes two comments highlighting
quirks of simplejson with respect to Unicode, and removes the references
to it in setup documentation and requirements.txt.
There were a lot of places where we were importing json from
swift.common.utils, which is less intuitive than a direct `import json`,
so that replacement is made as well.
(And in two more tiny drive-bys, we add some pretty-indenting to an XML
fragment and use `super` rather than naming a base class explicitly.)
Change-Id: I769e88dda7f76ce15cf7ce930dc1874d24f9498a
The iteritems() of Python 2 dictionaries has been renamed to items() on
Python 3. According to a discussion on the openstack-dev mailing list,
the overhead of creating a temporary list using dict.items() on Python 2
is very low because most dictionaries are small:
http://lists.openstack.org/pipermail/openstack-dev/2015-June/066391.html
Patch generated by the following command:
sed -i 's,iteritems,items,g' \
$(find swift -name "*.py") \
$(find test -name "*.py")
Change-Id: I6070bb6c684be76e8e77222a7d280ec6edd43496
The normalized form of the X-Timestamp header looks like a float with a fixed
width to ensure stable string sorting - normalized timestamps look like
"1402464677.04188"
To support overwrites of existing data without modifying the original
timestamp but still maintain consistency a second internal offset
vector is append to the normalized timestamp form which compares and
sorts greater than the fixed width float format but less than a newer
timestamp. The internalized format of timestamps looks like
"1402464677.04188_0000000000000000" - the portion after the underscore
is the offset and is a formatted hexadecimal integer.
The internalized form is not exposed to clients in responses from Swift.
Normal client operations will not create a timestamp with an offset.
The Timestamp class in common.utils supports internalized and normalized
formatting of timestamps and also comparison of timestamp values. When the
offset value of a Timestamp is 0 - it's considered insignificant and need not
be represented in the string format; to support backwards compatibility during
a Swift upgrade the internalized and normalized form of a Timestamp with an
insignificant offset are identical. When a timestamp includes an offset it
will always be represented in the internalized form, but is still excluded
from the normalized form. Timestamps with an equivalent timestamp portion
(the float part) will compare and order by their offset. Timestamps with a
greater timestamp portion will always compare and order greater than a
Timestamp with a lesser timestamp regardless of it's offset. String
comparison and ordering is guaranteed for the internalized string format, and
is backwards compatible for normalized timestamps which do not include an
offset.
The reconciler currently uses a offset bump to ensure that objects can move to
the wrong storage policy and be moved back. This use-case is valid because
the content represented by the user-facing timestamp is not modified in way.
Future consumers of the offset vector of timestamps should be mindful of HTTP
semantics of If-Modified and take care to avoid deviation in the response from
the object server without an accompanying change to the user facing timestamp.
DocImpact
Implements: blueprint storage-policies
Change-Id: Id85c960b126ec919a481dc62469bf172b7fb8549
This change updates the account HEAD handler to report out per
policy object and byte usage for the account. Cumulative values
are still reported and policy names are used in the report
out (unless request is sent to an account server directly in
which case policy indexes are used for easier accounting).
Below is an example of the relevant HEAD response for a cluster
with 3 policies and just a few small objects:
X-Account-Container-Count: 3
X-Account-Object-Count: 3
X-Account-Bytes-Used: 21
X-Storage-Policy-Bronze-Object-Count: 1
X-Storage-Policy-Bronze-Bytes-Used: 7
X-Storage-Policy-Silver-Object-Count: 1
X-Storage-Policy-Silver-Bytes-Used: 7
X-Storage-Policy-Gold-Object-Count: 1
X-Storage-Policy-Gold-Bytes-Used: 7
Set a DEFAULT storage_policy_index for existing container rows during
migration.
Copy existing object_count and bytes_used in policy_stat table during
migration.
DocImpact
Implements: blueprint storage-policies
Change-Id: I5ec251f9a8014dd89764340de927d09466c72221
Place all the methods related to on-disk layout and / or configuration
into a new common module that can be shared by the various modules
using the same on-disk layout.
Change-Id: I27ffd4665d5115ffdde649c48a4d18e12017e6a9
Signed-off-by: Peter Portante <peter.portante@redhat.com>
There were a few different places where we had some repeated code to
figure out what format an account or container listing response should
be in (text, JSON, or XML). Now that's been pulled into a single
function.
As part of this, you can now raise HTTPException subclasses in proxy
controllers instead of laboriously plumbing error responses all the
way back up to swift.proxy.server.Application.handle_request(). This
lets us avoid certain ugly patterns, like the one where a method
returns a tuple of (x, y, z, error) and the caller has to see if it
got (value, value, value, None) or (None, None, None, errorvalue). Now
we can just raise the error.
Change-Id: I316873df289160d526487ad116f6fbb9a575e3de
This lets us get rid of some really repetitive exception conversion
code, like everybody that called common.utils.get_param() had to catch
a UnicodeDecodeError and turn that into returning HTTPBadRequest. Now
get_param() just raises HTTPBadRequest directly, and the __call__
methods in the account/container/object servers catch and return
it. All that "except UnicodeDecodeError" stuff goes away.
Refactored the path splitting and device validation in the object
server too.
There are other things that can benefit from this as well, but this
patch is big enough.
Change-Id: I2be96ef757d04bfd6af180cd9c92393c841db21f
Commit 8f9b135 fixed a bug where an XML attribute could have arbitrary
characters jammed into it, resulting in a document with arbitrary
tags... and it did remove the ability to get an arbitrary XML document
out of the object server. However, it left a couple of ways to get a
malformed XML document, one example of which is an account named ".
This fixes up the remaining ways and ensures you always get a
well-formed XML document in the account-listing response. Also, it
adds a unit test for the escaping of the container name; this was
already working, just untested.
If you look in the discussion for bug 1183884, you'll see that the
review comments there are basically "seems good, but could use a unit
test". (The astute reader will note that I am one of the guilty
parties in that review.)
I found this bug while writing the missing unit test.
The moral of this story is left as an exercise for the reader.
Fixes bug 1183884 harder.
Change-Id: I84b24dd930ba1bb6c4f674f8d3996639dedbce15
If account autocreation is on and the proxy receives a GET request for
a nonexistent account, it'll fake up a response that makes it look as
if the account exists, but without reifying that account into sqlite
DB files.
That faked-out response was just fine as long as you wanted a
text/plain response, but it didn't handle the case of format=json or
format=xml; in those cases, the response would still be
text/plain. This can break clients, and certainly causes crashes in
swift3. Now, those responses match as closely as possible.
The code for generating an account-listing response has been pulled
into (the new) swift.account.utils module, and both the fake response
and the real response use it, thus ensuring that they can't
accidentally diverge. There's also a new probe test for that
non-divergence.
Also, cleaned up a redundant matching of the Accept header in the code
for generating the account listing.
Note that some of the added tests here pass with or without this code
change; they were added because the code I was changing (parts of the
real account GET) wasn't covered by tests.
Bug 1183169
Change-Id: I2a3b8e5d9053e4d0280a320f31efa7c90c94bb06