Commit Graph

122 Commits

Author SHA1 Message Date
Matthew Oliver 00bfc425ce Add FakeStatsdClient to unit tests
Currently we simply mock statsd calls in the FakeLogger, and there are
also some helper methods for counting and collating the metrics that were
called. This FakeLogger is overloaded and doesn't simulate the real world.
In real life we use a StatsdClient that is attached to the logger.

We've been in the situation where unit tests pass but the statsd client
stacktraces, because we don't actually fake the StatsdClient based on the
real one and let it use its internal logic.

This patch creates a new FakeStatsdClient that is based on the real one;
it can then be used (like the real statsd client) and attached to the
FakeLogger.
There is quite a bit of churn in tests to make this work, because we now
have to look into the fake statsd client to check the faked calls made.
The FakeStatsdClient does everything the real one does, except that it
overrides the _send method and socket creation so that no actual statsd
metrics are emitted.

Change-Id: I9cdf395e85ab559c2b67b0617f898ad2d6a870d4
2023-08-07 10:10:45 +01:00
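
A minimal sketch of the FakeStatsdClient approach described above, in Python.
The StatsdClient import path and the _open_socket/_send names and signatures
are assumptions and may differ from Swift's actual code:

    from swift.common.utils import StatsdClient  # import path assumed

    class FakeStatsdClient(StatsdClient):
        """Behaves like the real client but never touches the network."""

        def __init__(self, *args, **kwargs):
            super(FakeStatsdClient, self).__init__(*args, **kwargs)
            self.sent = []  # recorded metrics, for assertions in tests

        def _open_socket(self):
            # the real client would create a UDP socket here
            return None

        def _send(self, m_name, m_value, m_type, sample_rate):
            # record the metric instead of emitting it over the socket
            self.sent.append((m_name, m_value, m_type, sample_rate))

A FakeLogger can then hold an instance of this class in place of a real
statsd client, so tests exercise the client's own prefixing and sampling
logic instead of a mock.
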
Takashi Natsume 6fd523947a Fix misuse of assertTrue
Fix misuse of assertTrue in
test/unit/obj/test_reconstructor.py.

Change-Id: I9c55bb16421ec85a20d3d4a0e6be43ce20c08b3c
Closes-Bug: 1986776
Signed-off-by: Takashi Natsume <takanattie@gmail.com>
2022-08-17 18:05:26 +09:00
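
The specific assertions fixed live in the test file above; the general
failure mode is sketched below with illustrative values:

    import unittest

    class Example(unittest.TestCase):
        def test_compare(self):
            expected = actual = 'frag-7'
            # Misuse: assertTrue(actual, expected) only checks that 'actual'
            # is truthy and treats 'expected' as the failure message, so it
            # would pass even if the two values differed.
            # Correct: compare the values explicitly.
            self.assertEqual(expected, actual)
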
Clay Gerrard 12bc79bf01 Add ring_ip option to object services
This will be used by the object services when finding their own devices in
rings, defaulting to the bind_ip.

Notably, this allows services to be containerized while servers_per_port
is enabled:

* For the object-server, the ring_ip should be set to the host ip and
  will be used to discover which ports need binding. Sockets will still
  be bound to the bind_ip (likely 0.0.0.0), with the assumption that the
  host will publish ports 1:1.

* For the replicator and reconstructor, the ring_ip will be used to
  discover which devices should be replicated. While bind_ip could
  previously be used for this, it would have required a separate config
  from the object-server.

Also rename the object daemons' bind_ip attribute to ring_ip so that it's
more obvious wherever we're using the IP for ring lookups instead of
socket binding.

Co-Authored-By: Tim Burke <tim.burke@gmail.com>
Change-Id: I1c9bb8086994f7930acd8cda8f56e766938c2218
2022-06-02 16:31:29 -05:00
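
A sketch of the fallback described above, assuming the daemon's options
arrive as a plain conf dict; the helper name is illustrative, not Swift's
exact code:

    def resolve_ring_ip(conf):
        bind_ip = conf.get('bind_ip', '0.0.0.0')
        # ring_ip is only consulted for ring lookups; sockets still bind to
        # bind_ip, so a containerized service can publish its ports 1:1
        return conf.get('ring_ip', bind_ip)
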
Tim Burke 1907594bd8 reconstructor: Abort just the changed policies
We've already walked the disks looking for work, so we may as well continue
with the work that's definitely still valid.

Change-Id: I4c33ed5f5a66d89d259761b5ce12fb6652b28c40
2022-01-10 09:05:28 -08:00
Alistair Coles 1b3879e0da reconstructor: include partially reverted handoffs in handoffs_remaining
For a reconstructor revert job, if sync'd to sufficient other nodes,
the handoff partition is considered done and handoffs_remaining is not
incremented. With the new max_objects_per_revert option [1], an ssync
job may appear to be complete even though not all objects have yet been
reverted, so handoffs_remaining should be incremented.

[1] Related-Change: If81760c80a4692212e3774e73af5ce37c02e8aff
Change-Id: I59572f75b9b0ba331369eb7358932943b7935ff0
2021-12-03 14:37:59 +00:00
Alistair Coles 8ee631ccee reconstructor: restrict max objects per revert job
Previously the ssync Sender would attempt to revert all objects in a
partition within a single SSYNC request. With this change the
reconstructor daemon option max_objects_per_revert can be used to limit
the number of objects reverted inside a single SSYNC request for revert
type jobs i.e. when reverting handoff partitions.

If more than max_objects_per_revert are available, the remaining objects
will remain in the sender partition and will not be reverted until the
next call to ssync.Sender, which would currently be the next time the
reconstructor visits that handoff partition.

Note that the option only applies to handoff revert jobs, not to sync
jobs.

Change-Id: If81760c80a4692212e3774e73af5ce37c02e8aff
2021-12-03 12:43:23 +00:00
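
An illustrative way to apply such a cap, assuming the revert job's objects
arrive as an iterator and that a value of 0 means "no limit" (both are
assumptions; Swift plumbs the option through ssync.Sender):

    import itertools

    def limit_revert_objects(object_iter, max_objects_per_revert=0):
        if max_objects_per_revert <= 0:
            return object_iter  # assumed: 0 disables the limit
        # anything beyond the cap stays in the sender partition until the
        # next call to ssync.Sender
        return itertools.islice(object_iter, max_objects_per_revert)
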
Alistair Coles ada9f0eeb0 reconstructor: purge meta files in pure handoffs
Previously, after reverting handoff files, the reconstructor would
only purge tombstones and data files for the reverted fragment
index. Any meta files were not purged because the partition might
also be on a primary node for a different fragment index.

For example, if, before the reconstructor visits, the object hash dir
contained:

  t1#1#d.data
  t1#2#d.data
  t2.meta

where frag index 1 is a handoff and gets reverted, then, after the
reconstructor has visited, the hash dir should still contain:

  t1#2#d.data
  t2.meta

If, before the reconstructor visits, the object hash dir contained:

  t1#1#d.data
  t2.meta

then, after the reconstructor has visited, the hash dir would still
contain:

  t2.meta

The retention of meta files is undesirable when the partition is a
"pure handoff" i.e. the node is not a primary for the partition for
any fragment index. With this patch the meta files are purged after
being reverted if the reconstructor has no sync job for the partition
(i.e. the partition is a "pure handoff") and there are no more
fragments to revert.

Change-Id: I107af3bc2d62768e063ef3176645d60ef22fa6d4
Co-Authored-By: Tim Burke <tim.burke@gmail.com>
2021-11-24 12:20:52 +00:00
Alistair Coles 092d409c4b reconstructor: silence traceback when purging
Catch DiskFileNotExist exceptions when attempting to purge files. The
file may have passed its reclaim age since being reverted and will be
cleaned up when the reconstructor opens it for purging, raising a
DiskFileNotExist. The exception is OK - the diskfile was about to be
purged.

Change-Id: I5dfdf5950c6bd7fb130ab557347fbe959270c6e9
2021-11-24 12:11:50 +00:00
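
The pattern in miniature: DiskFileNotExist is Swift's real exception class,
but the helper and the purge() signature shown here are simplified
assumptions:

    from swift.common.exceptions import DiskFileNotExist

    def purge_quietly(df, timestamp, frag_index):
        try:
            df.purge(timestamp, frag_index)
        except DiskFileNotExist:
            # the file passed its reclaim age and was cleaned up when the
            # diskfile was opened; it is gone, which is what we wanted
            pass
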
Alistair Coles e3069e6f7e reconstructor: remove non-durable files on handoffs
When a non-durable EC data fragment has been reverted from a handoff
node, it should be removed if its mtime is older than the
commit_window. The test on the mtime was broken [1]: an incomplete
file path was given and the test always returned False i.e. the file
was never considered old enough to remove. As a result, non-durable
files would remain on handoff nodes until their reclaim age had
passed.

[1] Related-Change: I0d519ebaaade35249fb7b17bd5f419ffdaa616c0
Change-Id: I7f6458af3ed753ef8700a456d5a977b847f17ee8
Closes-Bug: 1951598
2021-11-19 14:39:24 +00:00
Tim Burke a5fbe6ca41 ec: Use replication network to get frags for reconstruction
Closes-Bug: #1946267
Change-Id: Idb4fe7478275f71b4032024d6116181766ac6759
2021-10-06 15:17:01 -07:00
Alistair Coles 2696a79f09 reconstructor: retire nondurable_purge_delay option
The nondurable_purge_delay option was introduced in [1] to prevent the
reconstructor removing non-durable data files on handoffs that were
about to be made durable. The DiskFileManager commit_window option has
since been introduced [2] which specifies a similar time window during
which non-durable data files should not be removed. The commit_window
option can be re-used by the reconstructor, making the
nondurable_purge_delay option redundant.

The nondurable_purge_delay option has not been available in any tagged
release and is therefore removed with no backwards compatibility.

[1] Related-Change: I0d519ebaaade35249fb7b17bd5f419ffdaa616c0
[2] Related-Change: I5f3318a44af64b77a63713e6ff8d0fd3b6144f13
Change-Id: I1589a7517b7375fcc21472e2d514f26986bf5079
2021-07-19 21:18:06 +01:00
Alistair Coles bbaed18e9b diskfile: don't remove recently written non-durables
DiskFileManager will remove any stale files during
cleanup_ondisk_files(): these include tombstones and nondurable EC
data fragments whose timestamps are older than reclaim_age. It can
usually be safely assumed that a non-durable data fragment older than
reclaim_age is not going to become durable. However, if an agent PUTs
objects with specified older X-Timestamps (for example the reconciler
or container-sync) then there is a window of time during which the
object server has written an old non-durable data file but has not yet
committed it to make it durable.

Previously, if another process (for example the reconstructor) called
cleanup_ondisk_files during this window then the non-durable data file
would be removed. The subsequent attempt to commit the data file would
then result in a traceback due to there no longer being a data file to
rename, and of course the data file is lost.

This patch modifies cleanup_ondisk_files to not remove old, otherwise
stale, non-durable data files that were only written to disk in the
preceding 'commit_window' seconds. 'commit_window' is configurable for
the object server and defaults to 60.0 seconds.

Closes-Bug: #1936508
Related-Change: I0d519ebaaade35249fb7b17bd5f419ffdaa616c0
Change-Id: I5f3318a44af64b77a63713e6ff8d0fd3b6144f13
2021-07-19 21:18:02 +01:00
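
A sketch of the guard this adds, assuming the check is a straightforward
comparison of the non-durable .data file's mtime against commit_window
(the helper name is illustrative):

    import os
    import time

    def old_enough_to_reclaim(data_file_path, commit_window=60.0):
        try:
            mtime = os.stat(data_file_path).st_mtime
        except OSError:
            return False  # already gone; nothing to reclaim
        # a stale non-durable .data file is only reclaimable once it has
        # been on disk for longer than commit_window seconds
        return (time.time() - mtime) > commit_window
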
Alistair Coles 2fd5b87dc5 reconstructor: make quarantine delay configurable
Previously the reconstructor would quarantine isolated durable
fragments that were more than reclaim_age old. This patch adds a
quarantine_age option for the reconstructor which defaults to
reclaim_age but can be used to configure the age that a fragment must
reach before quarantining.

Change-Id: I867f3ea0cf60620c576da0c1f2c65cec2cf19aa0
2021-07-06 16:41:08 +01:00
Alistair Coles 2934818d60 reconstructor: Delay purging reverted non-durable datafiles
The reconstructor may revert a non-durable datafile on a handoff
concurrently with an object server PUT that is about to make the
datafile durable.  This could previously lead to the reconstructor
deleting the recently written datafile before the object-server
attempts to rename it to a durable datafile, and consequently a
traceback in the object server.

The reconstructor will now only remove reverted nondurable datafiles
that are older (according to mtime) than a period set by a new
nondurable_purge_delay option (defaults to 60 seconds). More recent
nondurable datafiles may be made durable or will remain on the handoff
until a subsequent reconstructor cycle.

Change-Id: I0d519ebaaade35249fb7b17bd5f419ffdaa616c0
2021-06-24 09:33:06 +01:00
Alistair Coles 46ea3aeae8 Quarantine stale EC fragments after checking handoffs
If the reconstructor finds a fragment that appears to be stale then it
will now quarantine the fragment.  Fragments are considered stale if
insufficient fragments at the same timestamp can be found to rebuild
missing fragments, and the number found is less than or equal to a new
reconstructor 'quarantine_threshold' config option.

Before quarantining a fragment the reconstructor will attempt to fetch
fragments from handoff nodes in addition to the usual primary nodes.
The handoff requests are limited by a new 'request_node_count'
config option.

'quarantine_threshold' defaults to zero, i.e. no fragments will be
quarantined. 'request_node_count' defaults to '2 * replicas'.

Closes-Bug: 1655608

Change-Id: I08e1200291833dea3deba32cdb364baa99dc2816
2021-05-10 20:45:17 +01:00
Alistair Coles eeaac713fd reconstructor: gather rebuild fragments by x-data-timestamp
Fix the reconstructor fragment rebuild to gather other fragments in
buckets keyed by x-backend-data-timestamp rather than
x-backend-timestamp. The former is the actual .data file timestamp;
the latter can vary when .meta files have been written to some but not
all fragment hash dirs, causing rebuild to fail.

Change-Id: I8bbed8cb80b2796907492a39cd5b2d7069e1ca55
Closes-Bug: 1927720
2021-05-07 13:39:03 +01:00
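
The bucketing in miniature, assuming each candidate response exposes the
backend headers named above (the helper is illustrative):

    from collections import defaultdict

    def bucket_frag_responses(responses):
        buckets = defaultdict(list)
        for resp in responses:
            # key on the .data file timestamp rather than X-Backend-Timestamp,
            # which moves whenever a .meta file lands in a frag hash dir
            data_ts = resp.headers.get('X-Backend-Data-Timestamp')
            buckets[data_ts].append(resp)
        return buckets
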
Zuul 020a13ed3c Merge "reconstructor: log more details when rebuild fails" 2021-04-28 23:07:28 +00:00
Clay Gerrard 2a312d1cd5 Cleanup tests' import of debug_logger
Change-Id: I19ca860deaa6dbf388bdcd1f0b0f77f72ff19689
2021-04-27 12:04:41 +01:00
Alistair Coles 7960097f02 reconstructor: log more details when rebuild fails
When the reconstructor fails to gather enough fragments to rebuild a
missing fragment, log more details about the responses that it *did*
get:

  - log total number of ok responses, as well as the number of useful
    responses, to reveal if, for example, there might have been
    duplicate frag indexes or mixed etags.

  - log the mix of error status codes received to reveal if, for
    example, they were all 404s.

Also refactor reconstruct_fa to track all state related to a timestamp
in a small data encapsulation class rather than in multiple dicts.

Related-Bug: 1655608
Change-Id: I3f87933f788685775ce59f3724f17d5db948d502
2021-04-27 11:54:35 +01:00
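
A hedged sketch of the extra detail, assuming each error response exposes a
status attribute; the names and message format are illustrative:

    from collections import Counter

    def log_rebuild_failure(logger, path, useful_responses, error_responses):
        status_counts = Counter(resp.status for resp in error_responses)
        error_summary = ', '.join(
            '%dx %s' % (count, status)
            for status, count in sorted(status_counts.items()))
        logger.error(
            'Unable to get enough responses (%d useful) to rebuild %s; '
            'errors: %s', len(useful_responses), path, error_summary)
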
Alistair Coles 1dceafa7d5 ssync: sync non-durable fragments from handoffs
Previously, ssync would neither sync nor clean up non-durable data
fragments on handoffs. When the reconstructor is syncing objects from
a handoff node (a 'revert' reconstructor job) it may be useful, and is
not harmful, to also send non-durable fragments if the receiver has
older or no fragment data.

Several changes are made to enable this. On the sending side:

  - For handoff (revert) jobs, the reconstructor instantiates
    SsyncSender with a new 'include_non_durable' option.
  - If configured with the include_non_durable option, the SsyncSender
    calls the diskfile yield_hashes function with options that allow
    non-durable fragments to be yielded.
  - The diskfile yield_hashes function is enhanced to include a
    'durable' flag in the data structure yielded for each object.
  - The SsyncSender includes the 'durable' flag in the metadata sent
    during the missing_check exchange with the receiver.
  - If the receiver requests the non-durable object, the SsyncSender
    includes a new 'X-Backend-No-Commit' header when sending the PUT
    subrequest for the object.
  - The SsyncSender includes the non-durable object in the collection
    of synced objects returned to the reconstructor so that the
    non-durable fragment is removed from the handoff node.

On the receiving side:

  - The object server includes a new 'X-Backend-Accept-No-Commit'
    header in its response to SSYNC requests. This indicates to the
    sender that the receiver has been upgraded to understand the
    'X-Backend-No-Commit' header.
  - The SsyncReceiver is enhanced to consider non-durable data when
    determining if the sender's data is wanted or not.
  - The object server PUT method is enhanced to check for an
    'X-Backend-No-Commit' header before committing a diskfile.

If a handoff sender has both a durable and newer non-durable fragment
for the same object and frag-index, only the newer non-durable
fragment will be synced and removed on the first reconstructor
pass. The durable fragment will be synced and removed on the next
reconstructor pass.

Change-Id: I1d47b865e0a621f35d323bbed472a6cfd2a5971b
Closes-Bug: 1778002
2021-01-20 12:00:10 +00:00
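
Receiver-side handling in miniature: config_true_value is a real Swift
helper, but the writer API and the function shown are simplified
assumptions, not the object server's actual PUT path:

    from swift.common.utils import config_true_value

    def finish_put(request, writer, metadata, timestamp):
        no_commit = config_true_value(
            request.headers.get('X-Backend-No-Commit', 'false'))
        writer.put(metadata)  # write the .data file
        if not no_commit:
            # only make the fragment durable when the sender did not mark
            # the subrequest as a non-durable ssync revert
            writer.commit(timestamp)
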
Ade Lee 5320ecbaf2 replace md5 with swift utils version
md5 is not an approved algorithm in FIPS mode, and trying to
instantiate a hashlib.md5() will fail when the system is running in
FIPS mode.

md5 is allowed when in a non-security context.  There is a plan to
add a keyword parameter (usedforsecurity) to hashlib.md5() to annotate
whether or not the instance is being used in a security context.

In the case where it is not, the instantiation of md5 will be allowed.
See https://bugs.python.org/issue9216 for more details.

Some downstream python versions already support this parameter.  To
support these versions, a new encapsulation of md5() is added to
swift/common/utils.py.  This encapsulation is identical to the one being
added to oslo.utils, but is recreated here to avoid adding a dependency.

This patch is to replace the instances of hashlib.md5() with this new
encapsulation, adding an annotation indicating whether the usage is
a security context or not.

While this patch seems large, it is really just the same change over and
over again.  Reviewers need to pay particular attention to whether the
keyword parameter (usedforsecurity) is set correctly.  Right now, none of
the instances appears to be used in a security context.

Now that all the instances have been converted, we can update the bandit
run to look for these instances and ensure that new invocations do not
creep in.

With this latest patch, the functional and unit tests all pass
on a FIPS enabled system.

Co-Authored-By: Pete Zaitcev
Change-Id: Ibb4917da4c083e1e094156d748708b87387f2d87
2020-12-15 09:52:55 -05:00
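
The shape of the wrapper described above; the real one lives in
swift/common/utils.py, so this is only a simplified sketch:

    import hashlib

    def md5(string=b'', usedforsecurity=True):
        try:
            # newer interpreters accept the annotation directly
            return hashlib.md5(string, usedforsecurity=usedforsecurity)
        except TypeError:
            # older interpreters do not know the keyword at all
            return hashlib.md5(string)
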
Tim Burke 3c3cab2645 Stop invalidating suffixes post-SSYNC
We only need the invalidation post-rsync, since rsync was changing data
on disk behind Swift's back. Move the REPLICATE call down into the
rsync() helper function and drop it from the reconstructor entirely.

Change-Id: I576901344f1f3abb33b52b36fde0b25b43e54c8a
Closes-Bug: #1818709
2020-11-16 08:30:07 -06:00
Romain LE DISEZ 8c0a1abf74 Fix a race condition in case of cross-replication
In a situation where two nodes do not have the same version of a ring
and they both think the other node is the primary node of a partition,
a race condition can lead to the loss of some of the objects of the
partition.

The following sequence leads to the loss of some of the objects:

  1. A gets and reloads the new ring
  2. A starts to replicate/revert the partition P to node B
  3. B (with the old ring) starts to replicate/revert the (partial)
     partition P to node A
     => replication should be fast as all objects are already on node A
  4. B finished replication of (partial) partition P to node A
  5. B removes the (partial) partition P after replication succeeded
  6. A finishes replication of partition P to node B
  7. A removes the partition P
  8. B gets and reloads the new ring

All data transferred between steps 2 and 5 will be lost, as it is no longer
on node B and has also been removed from node A.

This commit makes the replicator/reconstructor hold a replication_lock
on partition P so that the remote node cannot start an opposite replication.

Change-Id: I29acc1302a75ed52c935f42485f775cd41648e4d
Closes-Bug: #1897177
2020-10-14 19:16:18 -04:00
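
A minimal per-partition lock in the spirit of the fix, using a lock file and
flock; Swift's real replication_lock helper in the diskfile layer differs in
detail (timeouts, lock counts), so treat this as a sketch:

    import fcntl
    import os
    from contextlib import contextmanager

    @contextmanager
    def replication_lock(partition_dir):
        lock_path = os.path.join(partition_dir, '.lock-replication')
        fd = os.open(lock_path, os.O_CREAT | os.O_WRONLY)
        try:
            # non-blocking: fail fast if another job (for example one
            # started by the remote node) already holds the partition
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            yield
        finally:
            os.close(fd)
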
Zuul 3cceec2ee5 Merge "Update hacking for Python3" 2020-04-09 15:05:28 +00:00
Andreas Jaeger 96b56519bf Update hacking for Python3
The repo is using both Python 2 and 3 now, so update hacking to
version 2.0, which supports Python 2 and 3. Note that the latest hacking
release, 3.0, only supports Python 3.

Fix problems found.

Remove hacking and friends from lower-constraints, they are not needed
for installation.

Change-Id: I9bd913ee1b32ba1566c420973723296766d1812f
2020-04-03 21:21:07 +02:00
Romain LE DISEZ 804776b379 Optimize obj replicator/reconstructor healthchecks
The DaemonStrategy class calls the Daemon.is_healthy() method every 0.1
seconds to ensure that all workers are running as wanted.

On the object replicator/reconstructor daemons, is_healthy() checks whether
the rings have changed to decide if workers must be created/killed. With
large rings, this operation can be CPU intensive, especially on low-end CPUs.

This patch:
- increases the check interval to 5 seconds by default, because none of
  these daemons is critical for performance (they are not in the datapath),
  but allows each daemon to change this value if necessary
- ensures that before doing a computation of all devices in the ring,
  object replicator/reconstructor checks that the ring really changed
  (by checking the mtime of the ring.gz files)

On an Atom N2800 processor, this patch reduced the CPU usage of the main
object replicator/reconstructor from 70% of a core to 0%.

Change-Id: I2867e2be539f325778e2f044a151fd0773a7c390
2020-04-01 08:03:32 -04:00
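
The cheap pre-check in miniature, assuming the daemon remembers the last
seen mtime of each ring.gz file; class and attribute names are illustrative:

    import os

    class RingMtimes(object):
        def __init__(self, ring_paths):
            self.mtimes = {path: os.path.getmtime(path) for path in ring_paths}

        def changed(self):
            # only when an mtime has moved is the expensive walk of all
            # ring devices worth doing
            changed = False
            for path, last in self.mtimes.items():
                current = os.path.getmtime(path)
                if current != last:
                    self.mtimes[path] = current
                    changed = True
            return changed
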
Tim Burke ff5ea003b3 ec: log durability of frags that fail to reconstruct
Whether the frag is durable or non-durable greatly affects how much I
care whether I can reconstruct it.

Change-Id: Ie6f46267d4bb567ecc0cc195d1fd7ce55c8cb325
2019-08-20 22:23:00 -07:00
Tim Burke e8e7106d14 py3: port obj/reconstructor tests
All of the swift changes we needed for this were already done elsewhere.

Change-Id: Ib2c26fdf7bd36ed1cccd5dbd1fa208f912f4d8d5
2019-06-10 08:31:41 -07:00
Kuan-Lin Chen 37fa12cd83 Do not sync suffixes when remote rejects reconstructor sync
Commit a0fcca1e made the reconstructor not sync suffixes when the remote
rejects a reconstructor revert. However, the exact same logic should
be applied to SYNC jobs as well. REPLICATE requests aren't generally
needed when using SSYNC (which the reconstructor always does).

If an ssync_sender fails to finish a sync, the reconstructor should skip
the REPLICATE call entirely and move on to the next partition without
causing any useless remote IO.

Change-Id: Ida50539e645ea7e2950ba668c7f031a8d10da787
Closes-Bug: #1665141
2019-06-03 18:39:51 +08:00
Clay Gerrard 585bf40cc0 Simplify empty suffix handling
We really only need to have one way to clean up empty suffix dirs, and
that's normally during suffix hashing which only happens when invalid
suffixes get rehashed.

When we iterate a suffix tree using yield hashes, we may discover an
expired or otherwise reapable hashdir - when this happens we will now
simply invalidate the suffix so that the next rehash can clean it up.

This simplification removes a misbehavior in the handling between the
normal suffix rehashing cleanup and what was implemented in ssync.

Change-Id: I5629de9f2e9b2331ed3f455d253efc69d030df72
Related-Change-Id: I2849a757519a30684646f3a6f4467c21e9281707
Closes-Bug: 1816501
2019-03-18 15:09:54 -05:00
Clay Gerrard ea8e545a27 Rebuild frags for unmounted disks
Change the behavior of the EC reconstructor to perform a fragment
rebuild to a handoff node when a primary peer responds with 507 to the
REPLICATE request.

Each primary node in an EC ring will sync with exactly three primary
peers: in addition to the left & right nodes we now select a third node
from the far side of the ring.  If any of these partners responds
unmounted, the reconstructor will rebuild its fragments to a handoff
node with the appropriate index.

To prevent ssync (which is uninterruptible) receiving a 409 (Conflict)
we must give the remote handoff node the correct backend_index for the
fragments it will receive.  In the common case we will use
deterministically different handoffs for each fragment index to prevent
multiple unmounted primary disks from forcing a single handoff node to
hold more than one rebuilt fragment.

Handoff nodes will continue to attempt to revert rebuilt handoff
fragments to the appropriate primary until that primary is remounted or
the ring is rebalanced.  After a rebalance of EC rings (potentially removing
unmounted/failed devices), it's most IO efficient to run in
handoffs_only mode to avoid unnecessary rebuilds.

Closes-Bug: #1510342

Change-Id: Ief44ed39d97f65e4270bf73051da9a2dd0ddbaec
2019-02-08 18:04:55 +00:00
Clay Gerrard fb0e7837af Cleanup EC and SSYNC frag index parameters
An object node should reject a PUT with 409 when the timestamp is less
than or equal to the timestamp of an existing version of the object.

However, if the PUT is part of an SSYNC, and the fragment archive has a
different index than the one on disk, we may store it.

We should store it if we're the primary holder for that fragment index.

Back before the related change we used to revert fragments to handoffs
and it caused a lot of problems.  Mainly multiple frag indexes piling up
on one handoff node.  Eventually we settled on handoffs only reverting
to primaries but there was some crufty flailing left over.

When EC frag duplication (multi-region EC) came in we also added a new
complexity because a node's primary index (the index in part_nodes list)
was no longer universally equal to the EC frag index (the storage
policy backend index).  There were a few places where we assumed
node_index == frag_index, some of which caused bugs which we've fixed.

This change tries to clean all that up.

Related-Change-Id: Ie351d8342fc8e589b143f981e95ce74e70e52784

Change-Id: I3c5935e2d5f1cd140cf52df779596ebd6442686c
2019-02-04 17:02:17 -06:00
Tim Burke 1d4309dd71 misc test cleanup
Change-Id: I21823e50af6d60bb5ee02427ddc499d700c43577
Related-Change: Ib33ff305615b2d342f0d673ded5ed8f11b663feb
Related-Change: I0855d8a549d1272d056963abed03338f80d68a53
2019-01-18 18:09:56 +00:00
Clay Gerrard 1d9204ac43 Use remote frag index to calculate suffix diff
... instead of the node index, which is different in multi-region EC and
wrongly leads us to always think we're out of sync.

Closes-Bug: #1811268

Change-Id: I0855d8a549d1272d056963abed03338f80d68a53
2019-01-11 14:32:14 -06:00
Tim Burke 3420921a33 Clean up HASH_PATH_* patching
Previously, we'd sometimes shove strings into HASH_PATH_PREFIX or
HASH_PATH_SUFFIX, which would blow up on py3. Now, always use bytes.

Change-Id: Icab9981e8920da505c2395eb040f8261f2da6d2e
2018-11-01 20:52:33 +00:00
Zuul 614e85d479 Merge "Remove empty directories after a revert job" 2018-11-01 04:34:04 +00:00
Clay Gerrard 441df4fc93 Use correct headers in reconstructor requests
As long as the reconstructor collects parts from all policies, each job
must be considered to have its own storage policy index, and we can't use
global state for policy-specific headers.  It's good hygiene to avoid
mutating the global state regardless.

Under load with multiple policies we observed essentially empty handoff
parts "re-appearing" on nodes until adding these changes.

Closes-Bug: #1671180
Change-Id: Id0e5f2743e05d81da7b26b2f05c90ba3c68e4d72
2018-10-31 08:41:56 -05:00
Alexandre Lécuyer d306345ddd Remove empty directories after a revert job
Currently, the reconstructor will not remove empty object and suffix
directories after processing a revert job. This will only happen during
its next run.

This patch will attempt to remove these empty directories immediately,
while we have the inodes cached.

Change-Id: I5dfc145b919b70ab7dae34fb124c8a25ba77222f
2018-10-26 09:29:14 +02:00
Zuul 3de21d945b Merge "Remove empty part dirs during ssync replication" 2018-06-23 02:19:18 +00:00
Samuel Merritt ecf47553b5 Make final stats dump after reconstructor runs once
When running in multiprocess mode, the object reconstructor would
periodically aggregate its workers' recon data into a single recon
measurement. However, at the end of the run, all that was left in
recon was the last periodic measurement; any work that took place
after that point was not recorded in the aggregate. It was, however,
recorded in the per-disk stats that the worker processes emitted.

This commit adds a final recon aggregation after the worker processes
have finished.

Change-Id: Ia6a3a931e9e7a23824765b2ab111a5492e509be8
2018-06-04 15:24:45 -07:00
Samuel Merritt a19548b3e6 Remove empty part dirs during ssync replication
When we're pushing data to a remote node using ssync, we end up
walking the entire partition's directory tree. We were already
removing reclaimable (i.e. old) tombstones and non-durable EC data
files plus their containing hash dirs, but we were leaving the suffix
dirs around for future removal, and we weren't cleaning up partition
dirs at all. Now we remove as much of the directory structure as we
can, even up to the partition dir, as soon as we observe that it's
empty.

Change-Id: I2849a757519a30684646f3a6f4467c21e9281707
Closes-Bug: 1706321
2018-05-01 17:18:22 -07:00
Samuel Merritt 26538d3f62 Make multiprocess reconstructor's logs more readable.
Much like the multiprocess object replicator, the reconstructor runs
multiple concurrent worker processes who all log to the same
destination. We re-use the same solution: prepend a prefix with the
worker index and the pid to all the logs emitted from each worker
process.

Example log line:

    [worker 12/24 pid=8539] I did a thing

Change-Id: Ie2f98201193952be4d387bbb01c7c6fccc017a8a
2018-04-25 11:18:35 -07:00
Samuel Merritt c4751d0d55 Make reconstructor go faster with --override-devices
The object reconstructor will now fork all available worker processes
when operating on a subset of local devices.

Example:
  A system has 24 disks, named "d1" through "d24"
  reconstructor_workers = 8
  invoked with --override-devices=d1,d2,d3,d4,d5,d6

In this case, the reconstructor will now use 6 worker processes, one
per disk. The old behavior was to use 2 worker processes, one for d1,
d3, and d5 and the other for d2, d4, and d6 (because 24 / 8 = 3, so we
assigned 3 disks per worker before creating another).

I think the new behavior better matches operators' expectations. If I
give a concurrent program six tasks to do and tell it to operate on up
to eight at a time, I'd expect it to do all six tasks at once, not run
two concurrent batches of three tasks apiece.

This has no effect when --override-devices is not specified. When
operating on all local devices instead of a subset, the new and old
code produce the same result.

The reconstructor's behavior now matches the object replicator's
behavior.

Change-Id: Ib308c156c77b9b92541a12dd7e9b1a8ea8307a30
2018-04-25 11:18:35 -07:00
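
The arithmetic of the example above as a hedged sketch; the real dispatch
logic lives in the reconstructor's worker setup and differs in detail:

    def workers_to_fork(reconstructor_workers, override_devices, all_devices):
        devices = override_devices or all_devices
        if not reconstructor_workers:
            return 1  # assumed: multiprocess mode disabled
        # one worker per (overridden) device, capped by reconstructor_workers
        return min(reconstructor_workers, len(devices))

    # e.g. workers_to_fork(8, ['d1', 'd2', 'd3', 'd4', 'd5', 'd6'], all_devices)
    # now yields 6 workers, one per overridden device
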
Samuel Merritt 728b4ba140 Add checksum to object extended attributes
Currently, our integrity checking for objects is pretty weak when it
comes to object metadata. If the extended attributes on a .data or
.meta file get corrupted in such a way that we can still unpickle it,
we don't have anything that detects that.

This could be especially bad with encrypted etags; if the encrypted
etag (X-Object-Sysmeta-Crypto-Etag or whatever it is) gets some bits
flipped, then we'll cheerfully decrypt the cipherjunk into plainjunk,
then send it to the client. Net effect is that the client sees a GET
response with an ETag that doesn't match the MD5 of the object *and*
Swift has no way of detecting and quarantining this object.

Note that, with an unencrypted object, if the ETag metadatum gets
mangled, then the object will be quarantined by the object server or
auditor, whichever notices first.

As part of this commit, I also ripped out some mocking of
getxattr/setxattr in tests. It appears to be there to allow unit tests
to run on systems where /tmp doesn't support xattrs. However, since
the mock is keyed off of inode number and inode numbers get re-used,
there's lots of leakage between different test runs. On a real FS,
unlinking a file and then creating a new one of the same name will
also reset the xattrs; this isn't the case with the mock.

The mock was pretty old; Ubuntu 12.04 and up all support xattrs in
/tmp, and recent Red Hat / CentOS releases do too. The xattr mock was
added in 2011; maybe it was to support Ubuntu Lucid Lynx?

Bonus: now you can pause a test with the debugger, inspect its files
in /tmp, and actually see the xattrs along with the data.

Since this patch now uses a real filesystem for testing filesystem
operations, tests are skipped if the underlying filesystem does not
support setting xattrs (e.g. tmpfs, or more than 4k of xattrs on ext4).

References to "/tmp" have been replaced with calls to
tempfile.gettempdir(). This will allow setting the TMPDIR envvar in
test setup and getting an XFS filesystem instead of ext4 or tmpfs.

THIS PATCH SIGNIFICANTLY CHANGES TESTING ENVIRONMENTS

With this patch, every test environment will require TMPDIR to be
using a filesystem that supports at least 4k of extended attributes.
Neither ext4 nor tmpfs supports this. XFS is recommended.

So why all the SkipTests? Why not simply raise an error? We still need
the tests to run on the base image for OpenStack's CI system. Since
we were previously mocking out xattr, there wasn't a problem, but we
also weren't actually testing anything. This patch adds functionality
to validate xattr data, so we need to drop the mock.

`test.unit.skip_if_no_xattrs()` is also imported into `test.functional`
so that functional tests can import it from the functional test
namespace.

The related OpenStack CI infrastructure changes are made in
https://review.openstack.org/#/c/394600/.

Co-Authored-By: John Dickinson <me@not.mn>

Change-Id: I98a37c0d451f4960b7a12f648e4405c6c6716808
2017-11-03 13:30:05 -04:00
Pavel Kvasnička 163fb4d52a Always require device dir for containers
For test purposes (e.g. saio probetests) even if mount_check is False,
still require check_dir for account/container server storage when real
mount points are not used.

This behavior is consistent with the object-server's checks in diskfile.

Co-Author: Clay Gerrard <clay.gerrard@gmail.com>
Related lp bug #1693005
Related-Change-Id: I344f9daaa038c6946be11e1cf8c4ef104a09e68b
Depends-On: I52c4ecb70b1ae47e613ba243da5a4d94e5adedf2
Change-Id: I3362a6ebff423016bb367b4b6b322bb41ae08764
2017-09-01 10:32:12 -07:00
Clay Gerrard 63ca3a74ef Drop reconstructor stats when worker has no devices
If you're watching a (new) node's reconstruction_last time to ensure a
cycle has finished since the last ring rebalance, you won't ever see
reconstructors with no devices drop their recon stats.

Change-Id: I84c07fc6841119b00d1a74078fe53f4ce637187b
2017-08-21 17:50:10 +01:00
Romain LE DISEZ 69df458254 Allow to rebuild a fragment of an expired object
When a fragment of an expired object was missing, the reconstructor
ssync job would send a DELETE sub-request. This leads to a situation
where, for the same object and timestamp, some nodes have a data file,
while others can have a tombstone file.

This patch forces the reconstructor to reconstruct a data file, even
for expired objects. DELETE requests are only sent for tombstoned
objects.

Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Closes-Bug: #1652323
Change-Id: I7f90b732c3268cb852b64f17555c631d668044a8
2017-08-04 23:05:08 +02:00
Tim Burke 8d05325f03 Test reconstruct() with no EC policies
We have a test for get_local_devices, but let's make some broader
assertions as well.

Related-Bug: #1707595
Change-Id: Ifa696207ffdb3b39650dfeaa3e7c6cfda94050db
2017-08-01 09:18:07 +01:00
Kota Tsuyuzaki 45cc1d02d0 Fix reconstructor to be able to run in a non-EC policy environment
Since the related change, the object-reconstructor gathers the local devices
for EC policies via the get_local_devices method, but that method raises a
TypeError when attempting to *reduce* an empty list of sets. The list can be
empty when no EC config is found in swift.conf.

This patch fixes get_local_devices to return an empty set, without errors,
even when there is no EC config in swift.conf.

Co-Authored-By: Kirill Zaitsev <k.zaitsev@me.com>
Change-Id: Ic121fb547966787a43f9eae83c91bb2bf640c4be
Related-Change: 701a172afa
Closes-Bug: #1707595
2017-07-31 18:46:22 +09:00
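
The failure mode and its fix in miniature: reduce() over an empty sequence
raises TypeError unless an initial value is supplied (the helper name is
illustrative):

    from functools import reduce

    def union_of_device_sets(device_sets):
        # device_sets is empty when swift.conf defines no EC policies; the
        # initial empty set keeps reduce() from raising TypeError
        return reduce(lambda a, b: a | b, device_sets, set())
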
Alistair Coles 56a18ac9b7 Add unit test for ObjectReconstructor.is_healthy
Add a test that verifies that get_all_devices does
fetch devices from the ring.

Related-Change: I28925a37f3985c9082b5a06e76af4dc3ec813abe

Change-Id: Ie2f83694f14f9a614b5276bbb859b9a3c0ec5dcb
2017-07-27 14:14:26 +01:00