This lock was taken two places:
1. when cleaving into a shard broker
2. when removing misplaced objects from a shard broker
These two operations would not execute concurrently if there were just
one sharder process running. If there were more than one sharder
process, each processing different nodes, then they might visit the
same db when one is visiting the parent container and cleaving to the
shard and one is visiting the shard container and handling misplaced
objects. However, any objects merged by the cleaving process will not
be removed by the misplaced object process because the removal is
limited to a max row count that is sampled at the start of the
misplaced objects handling. It is therefore not necessary to protect
these operations with a lock.
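The bounded removal described above can be sketched as follows. This is a toy model, not the real ContainerBroker API; the class and method names here are hypothetical:

```python
class FakeBroker:
    """Toy in-memory stand-in for a shard container broker."""
    def __init__(self):
        self.rows = []       # list of (row_id, object_name)
        self.next_row = 1

    def merge(self, name):
        self.rows.append((self.next_row, name))
        self.next_row += 1

    def get_max_row(self):
        return self.next_row - 1

    def remove_up_to(self, max_row):
        # only rows at or below the sampled max row are removed
        self.rows = [(r, n) for r, n in self.rows if r > max_row]

def handle_misplaced(broker, concurrent_cleave=None):
    # Sample the max row id once, at the start of misplaced-object
    # handling; any rows merged afterwards (e.g. by a concurrent
    # cleave) get higher row ids and so survive the removal.
    max_row = broker.get_max_row()
    if concurrent_cleave:
        concurrent_cleave()
    broker.remove_up_to(max_row)
```

Because the max row is sampled up front, a concurrent cleave that merges rows mid-removal cannot have those rows deleted out from under it, which is why no lock is needed.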
Change-Id: Icb3f9d8843b0fe601a32006adb4dcb29779c8a06
To avoid using a 'deleted in (0, 1)' condition in the SQL query that
fetches all deleted and undeleted objects, yield the undeleted objects
followed by the deleted objects in each shard range.
This is an alternative to [1].
[1] Related-Change: I67159e5ae6a114298cfd61dec692e5a0235df10e
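The query pattern can be sketched like this (table and column names simplified from Swift's actual object table schema):

```python
import sqlite3

def yield_objects(conn, lower, upper):
    # For one shard range, yield the undeleted rows first, then the
    # deleted rows: two simple equality queries instead of one query
    # with a 'deleted IN (0, 1)' condition.
    for deleted in (0, 1):
        for name, d in conn.execute(
                'SELECT name, deleted FROM object '
                'WHERE deleted = ? AND name > ? AND name <= ? '
                'ORDER BY name', (deleted, lower, upper)):
            yield name, d
```

Each pass is a plain indexed equality lookup on `deleted`, and callers still see every object in the range, just grouped by deletion status.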
Change-Id: I5e802c627f15e239023f954c5c12e5367d2bd4a0
Early versions of the proof of concept needed to include deleted
objects in container listings in order to merge multiple listing
sources in the proxy. This is not currently required.
Change-Id: I0fe405d2b1638467d7f6d669692f1c51fa9d3c85
Previously the local devices list was restricted to devices that were
included for sharding, which in the extreme case of only one device
would force all handoff shard dbs to be created on the same device as
the sharding db.
Now, all devices are included for the purposes of choosing a device on
which to create a shard db.
Also add some unit tests for partition and device filtering.
Change-Id: Id4dfa83103f89a0398ed389a39bfa7c79e2e9a09
Updating the lower attribute of the source shard range *copy* is fine,
but is only a few steps away from a bug if the copy operation was
removed [1]. There is no need to mutate the shard range, just use a
marker variable.
[1] The shard range timestamp would have to also be updated for the
mutated shard range to ever merge, but nevertheless this is not a good
pattern.
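The marker-variable pattern can be illustrated with a minimal sketch; the class and function here are hypothetical stand-ins, not the sharder's actual code:

```python
class ShardRange:
    """Minimal stand-in for a shard range."""
    def __init__(self, lower, upper):
        self.lower = lower
        self.upper = upper

def plan_cleaves(source_range, cut_points):
    # Track cleaving progress in a local marker variable rather than
    # mutating (a copy of) the source shard range's lower bound, so
    # there is no mutated object to leak if a copy is ever removed.
    marker = source_range.lower
    bounds = []
    for cut in list(cut_points) + [source_range.upper]:
        bounds.append((marker, cut))
        marker = cut
    return bounds
```

The source range object is never written to, only read, so the fragile copy-then-mutate step disappears entirely.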
Change-Id: I1b9e5e7d20ce628e6f13f50763c72131effc3862
Previously, if a shard db was cleaved but failed to replicate, the
cleaving would be repeated, i.e. all rows in the shard range would be
merged from the retiring source db to the shard db. This is
unnecessary, and inefficient, if the shard db is still intact from the
previous attempt to cleave and replicate.
Now, once rows have been cleaved, a sync point is stored in the shard
db under the retiring source db id. This sync point is checked before
rows are merged. If the shard db was re-created or replaced by another
out-of-sync copy, the sync point will not be found and all rows will
be merged. If the shard db is unchanged, or is a copy to which the
original shard db has been replicated, the sync point will be found
and no rows will be merged.
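The sync-point bookkeeping can be sketched as below; this is a simplified toy model with hypothetical names, not the actual broker sync API:

```python
class ShardDb:
    """Toy shard db with per-source sync points."""
    def __init__(self):
        self.rows = []
        self.syncs = {}   # retiring source db id -> last merged row id

def cleave_into(shard_db, source_id, source_rows):
    # Check the sync point stored under the retiring source db's id.
    # A re-created or out-of-sync shard db has no sync point, so all
    # rows are merged; an intact db skips already-merged rows.
    sync_point = shard_db.syncs.get(source_id, -1)
    new_rows = [(row_id, obj) for row_id, obj in source_rows
                if row_id > sync_point]
    shard_db.rows.extend(new_rows)
    if source_rows:
        shard_db.syncs[source_id] = max(row_id for row_id, _ in source_rows)
    return len(new_rows)
```

A retried cleave against the intact shard db finds the sync point and merges nothing, which is exactly the redundant work the change avoids.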
Change-Id: Ic9a5e525afcbc265b06d30c8897f7ffa5b3f95a7
There is no need to conditionally update shard brokers when they are
created. The 'force' flag was a leftover from when own shard range
state was stored in sysmeta and the most recent writer of state would
take precedence. We consequently needed to avoid updating the
sharding meta-timestamp in sysmeta as a side-effect of getting the
shard broker.
Since [1] the meta-timestamp is managed by the shard range merging
logic so that the most recent time takes precedence rather than the
most recent writer.
[1] Related-Change: Ie65cd4ce0abc98452163f6acdfad3cce5cdd216f
Change-Id: I3f3d13cf1caa684260f2ddfdb731ca941336f319
Actually test that the reclaim method does reclaim metadata.
Replace class invocation of _reclaim_metadata() with instance
invocation.
Fix reclaim() docstring.
Change-Id: I7a473e164c8c14b26b195db9a91fea1d2cd5b267
Related-Change: Ied1373362c38bbe7bab84fe4958888b0145e68ba
This attempts to import openstack/swift3 package into swift upstream
repository and namespace. This is mostly a straightforward port, except
for the following items:
1. Rename swift3 namespace to swift.common.middleware.s3api
1.1 Rename also some conflicted class names (e.g. Request/Response)
2. Port unittests to test/unit/s3api dir to be able to run on the gate.
3. Port functests to test/functional/s3api and setup in-process testing
4. Port docs to doc dir, then address the namespace change.
5. Use get_logger() instead of global logger instance
6. Avoid global conf instance
Also fix various minor issues in those steps (e.g. packages,
dependencies, deprecated things).
The details and patch references in the work on feature/s3api are listed
at https://trello.com/b/ZloaZ23t/s3api (completed board)
Note that, because this is just a port, no new features have been
developed since the last swift3 release. In future work, Swift
upstream may continue to work on the remaining items for further
improvements and the best possible compatibility with Amazon S3.
Please read the new docs for your deployment and follow future
releases to keep track of what changes.
Change-Id: Ib803ea89cfee9a53c429606149159dd136c036fd
Co-Authored-By: Thiago da Silva <thiago@redhat.com>
Co-Authored-By: Tim Burke <tim.burke@gmail.com>
Shard ranges *can't* be in the pending file, so no need to try to clear it
when fetching shard ranges. And we already try to commit down in _empty, so
is_reclaimable was previously trying to do it *twice*.
Change-Id: Ia5867b163eb4e9b516a0306a505cd2bc3dd49d43
...but never reclaim own_shard_range, because we always want to know
what the own_shard_range state is up until the db is unlinked.
The sharder audit process will delete a shard container if its own
shard range has been deleted for > reclaim_age.
The replicator will unlink the db if the shard container has been
deleted for > reclaim_age.
So there is potentially a window of more than 2 * reclaim_age between
the own_shard_range being deleted and the db being unlinked.
Change-Id: Ied1373362c38bbe7bab84fe4958888b0145e68ba
The LogAdapter txn_id is stored in a threading.local object that is a
class attribute, so all instances of LogAdapter in the same thread
will pick up the txn_id set in another instance. That means that after
the internal client has been used to make a request, all subsequent
sharder logs include the request txn_id.
The txn_id might be useful if an error occurs in _fetch_shard_ranges
but is annoying elsewhere.
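The leak is easy to demonstrate: a threading.local held as a class attribute is shared by every instance in the same thread. A simplified illustration (not Swift's actual LogAdapter, whose details differ):

```python
import logging
import threading

class LogAdapter(logging.LoggerAdapter):
    # The thread-local is a *class* attribute, so every LogAdapter
    # instance in the same thread reads and writes the same txn_id.
    _thread_local = threading.local()

    @property
    def txn_id(self):
        return getattr(self._thread_local, 'txn_id', None)

    @txn_id.setter
    def txn_id(self, value):
        self._thread_local.txn_id = value

internal_client_log = LogAdapter(logging.getLogger('client'), {})
sharder_log = LogAdapter(logging.getLogger('sharder'), {})
internal_client_log.txn_id = 'tx-abc'   # set while handling a request
print(sharder_log.txn_id)               # prints 'tx-abc': it leaked
```

So once the internal client stamps a txn_id during a request, every other adapter in that thread reports it until it is explicitly cleared.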
Change-Id: I4e1961c13be301381907885579d4137cd0a9b16a
We don't need to log the fact that we created zero ranges, and we don't need
to log at info for *every* container we find.
It *is* handy to write down the time elapsed when we finish cleaving a shard,
though, so I don't have to correlate log lines, parse and diff dates, etc.
Change-Id: I2b2b8a6b8801082f3068ec0f264ab5d4086aa2c8
The object reconstructor will now fork all available worker processes
when operating on a subset of local devices.
Example:
A system has 24 disks, named "d1" through "d24"
reconstructor_workers = 8
invoked with --override-devices=d1,d2,d3,d4,d5,d6
In this case, the reconstructor will now use 6 worker processes, one
per disk. The old behavior was to use 2 worker processes, one for d1,
d3, and d5 and the other for d2, d4, and d6 (because 24 / 8 = 3, so we
assigned 3 disks per worker before creating another).
I think the new behavior better matches operators' expectations. If I
give a concurrent program six tasks to do and tell it to operate on up
to eight at a time, I'd expect it to do all six tasks at once, not run
two concurrent batches of three tasks apiece.
This has no effect when --override-devices is not specified. When
operating on all local devices instead of a subset, the new and old
code produce the same result.
The reconstructor's behavior now matches the object replicator's
behavior.
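The new assignment policy amounts to capping the worker count at the device count before dealing out devices. A hypothetical helper (not the reconstructor's actual worker-args code) showing the arithmetic:

```python
def assign_devices(devices, max_workers):
    # Fork min(max_workers, len(devices)) workers and deal devices out
    # round-robin, so 6 override devices with max_workers = 8 yields
    # 6 single-device workers rather than 2 workers of 3 devices each.
    n = min(max_workers, len(devices)) or 1
    workers = [[] for _ in range(n)]
    for i, dev in enumerate(devices):
        workers[i % n].append(dev)
    return workers
```

With all 24 local devices and max_workers = 8, the same helper still produces 8 workers of 3 devices each, matching the unchanged full-node behavior.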
Change-Id: Ib308c156c77b9b92541a12dd7e9b1a8ea8307a30
Refactor to provide module-level functions for finding sharding and
shrinking candidates, so that these can be used by other callers.
Add unit tests.
Change-Id: Iada00e63f14238b67aaa818314fa6601eeec624e
Remove this TODO since we do not plan to take a lock before deleting a
retiring db. A lock might be required if there was a risk of the
retiring db being modified between the cleaving context being checked
and the db being deleted in _complete_sharding(). The following steps
have been taken to avoid any such modification:
1. No object updates are committed to the retiring db from the pending
updates file [1].
2. No objects are replicated to the retiring db (nor the fresh db)
after sharding begins [2].
3. Multiple attempts are made to abort any rsync_then_merge process
that started before sharding began [3].
[1] Related-Change: I268d01a373491c693b793748065d212f9703ffab
[2] Related-Change: I289f558381d028b4d3129e4e51549d3f1a58dc2f
[3] Related-Change: Ib285efbadb222b7c843fc212e5ae912ccd7b7ead
Change-Id: Ia53aa7a04b483ff91ca03de1a723602eb777a289
Otherwise, we lose track of sharding candidates and currently-sharding
containers in recon when the sharding cycle time goes past an hour.
Change-Id: I8c6721d9f03c0f738254db37478e2813976fdf1a