The existing test fails on macOS because the value of errno.ENODATA is
platform dependent. On macOS ENODATA is 96:
% man 2 intro|grep ENODATA
96 ENODATA No message available.
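A hedged sketch of the portable idiom, assuming the test only needs to
simulate the error (the real test lives in the unit test suite):

    import errno

    # Sketch: rather than hard-coding 61 (Linux) or 96 (macOS), build the
    # simulated failure from errno.ENODATA so it works on either platform.
    exc = IOError(errno.ENODATA, 'No message available')
    assert exc.errno == errno.ENODATA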
Change-Id: Ibc760e641d4351ed771f2321dba27dc4e5b367c1
This has been available since py32 and was backported to py27; there
is no point in us continuing to carry the old idiom forward.
Change-Id: I21f64b8b2970e2dd5f56836f7f513e7895a5dc88
This reverts the fallocate- and punch_hole-related parts of commit
c78a5962b5.
Closes-Bug: #2031035
Related-Change: I3e26f8d4e5de0835212ebc2314cac713950c85d7
Change-Id: I8050296d6982f70bb64a63765b25d287a144cb8d
Replicated, unencrypted metadata is written to disk differently on py2
vs py3, and has been since we started supporting py3. Fortunately,
we can inspect the raw xattr bytes to determine whether the pickle
was written using py2 or py3, so we can properly read legacy py2 meta
under py3 rather than hitting a unicode error.
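A minimal sketch of the fallback idea, assuming the metadata is a pickled
dict read straight from the xattr (the real check in diskfile inspects the
raw bytes more carefully):

    import pickle

    def read_metadata(raw_xattr_bytes):
        # Hedged sketch: a pickle written under py3 loads cleanly; one
        # written under py2 with non-ASCII str values can raise
        # UnicodeDecodeError, so fall back to decoding those bytes as
        # latin1 instead of erroring out.
        try:
            return pickle.loads(raw_xattr_bytes)
        except UnicodeDecodeError:
            return pickle.loads(raw_xattr_bytes, encoding='latin1')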
Closes-Bug: #2012531
Change-Id: I5876e3b88f0bb1224299b57541788f590f64ddd4
`much_older` has to be much older than `older`, or the test gets
flaky. See
- test_cleanup_ondisk_files_reclaim_non_data_files,
- test_cleanup_ondisk_files_reclaim_with_data_files, and
- test_cleanup_ondisk_files_reclaim_with_data_files_legacy_durable
for a more standard definition of "much_older".
Closes-Bug: #2017024
Change-Id: I1eaa501827f4475ddc0c20d82cf0a6d4a5e98f75
pytest still complains about some 20k warnings, but the vast majority
are actually because of eventlet, and a lot of those will get cleaned up
when upper-constraints picks up v0.33.2.
Change-Id: If48cda4ae206266bb41a4065cd90c17cbac84b7f
Historically, we've allowed objects to get on disk that may not have had
all of their required EC metadata. Add a new method to sanity-check
metadata in the auditor, and quarantine such invalid data.
Additionally, call validation early in the reconstructor's
reconstruct_fa() method so we do not have to attempt reconstruction for
invalid EC fragments.
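A minimal sketch of the kind of check involved, assuming a small set of
required EC sysmeta keys (the real validation covers more cases):

    REQUIRED_EC_KEYS = (
        'X-Object-Sysmeta-Ec-Frag-Index',
        'X-Object-Sysmeta-Ec-Etag',
        'X-Object-Sysmeta-Ec-Content-Length',
    )

    def validate_ec_metadata(metadata):
        # Hedged sketch: a fragment whose metadata is missing any required
        # EC key cannot be used for reconstruction and should be quarantined.
        missing = [key for key in REQUIRED_EC_KEYS if key not in metadata]
        if missing:
            raise ValueError('missing EC metadata keys: %s' % ', '.join(missing))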
Change-Id: I73551007843d27041f594923c59e6fd89caf17e5
Currently, when the replicator or auditor hits an ENODATA corruption,
we bail. This patch allows them to skip it and continue, in an effort
to make progress on the work that needs to be done.
Change-Id: Id4c6e31332a93eb64a7660cec2a42ef686a84d22
We've seen a disk corruption take place that throws -ENODATA when a diskfile
is opened. We currently don't quarantine on this IOError.
This patch catches the error and quarantines the datafile. The catch happens in
diskfile.open in order to group similar quarantines in the same place
for easier maintenance.
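In sketch form, with hypothetical helper names (the real catch sits in
diskfile.open alongside the other quarantine paths):

    import errno

    def open_datafile(df):
        # Hedged sketch: treat an ENODATA IOError from a corrupted datafile
        # the same way other unreadable files are handled -- quarantine it
        # and move on.
        try:
            return df.open()
        except IOError as err:
            if err.errno == errno.ENODATA:
                df.quarantine('Failed to read due to ENODATA')  # hypothetical API
                return None
            raise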
Change-Id: I729e3ba5f19160aa09dd8eb983d9594cb102e85b
Previously, after reverting handoff files, the reconstructor would
only purge tombstones and data files for the reverted fragment
index. Any meta files were not purged because the partition might
also be on a primary node for a different fragment index.
For example, if, before the reconstructor visits, the object hash dir
contained:
t1#1#d.data
t1#2#d.data
t2.meta
where frag index 1 is a handoff and gets reverted, then, after the
reconstructor has visited, the hash dir should still contain:
t1#2#d.data
t2.meta
If, before the reconstructor visits, the object hash dir contained:
t1#1#d.data
t2.meta
then, after the reconstructor has visited, the hash dir would still
contain:
t2.meta
The retention of meta files is undesirable when the partition is a
"pure handoff" i.e. the node is not a primary for the partition for
any fragment index. With this patch the meta files are purged after
being reverted if the reconstructor has no sync job for the partition
(i.e. the partition is a "pure handoff") and there are no more
fragments to revert.
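In sketch form, the new condition amounts to something like this (names are
hypothetical; the real check is in the reconstructor's revert handling):

    def should_purge_meta(has_sync_job, remaining_frag_indexes):
        # Hedged sketch: purge .meta files only when the partition is a
        # "pure handoff" (no sync job for it) and nothing is left to revert.
        return not has_sync_job and not remaining_frag_indexes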
Change-Id: I107af3bc2d62768e063ef3176645d60ef22fa6d4
Co-Authored-By: Tim Burke <tim.burke@gmail.com>
DiskFileManager will remove any stale files during
cleanup_ondisk_files(): these include tombstones and non-durable EC
data fragments whose timestamps are older than reclaim_age. It can
usually be safely assumed that a non-durable data fragment older than
reclaim_age is not going to become durable. However, if an agent PUTs
objects with specified older X-Timestamps (for example the reconciler
or container-sync) then there is a window of time during which the
object server has written an old non-durable data file but has not yet
committed it to make it durable.
Previously, if another process (for example the reconstructor) called
cleanup_ondisk_files during this window then the non-durable data file
would be removed. The subsequent attempt to commit the data file would
then result in a traceback due to there no longer being a data file to
rename, and of course the data file is lost.
This patch modifies cleanup_ondisk_files to not remove old, otherwise
stale, non-durable data files that were only written to disk in the
preceding 'commit_window' seconds. 'commit_window' is configurable for
the object server and defaults to 60.0 seconds.
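A minimal sketch of the guard, with hypothetical names (the real check is in
DiskFileManager.cleanup_ondisk_files and uses the file's mtime):

    import os
    import time

    COMMIT_WINDOW = 60.0  # seconds, matching the documented default

    def ok_to_remove(non_durable_data_file, is_reclaimable):
        # Hedged sketch: even when the fragment's timestamp is older than
        # reclaim_age (is_reclaimable), leave the file alone if it was
        # written to disk within the last commit_window seconds -- a commit
        # making it durable may still be in flight.
        age_on_disk = time.time() - os.stat(non_durable_data_file).st_mtime
        return is_reclaimable and age_on_disk > COMMIT_WINDOW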
Closes-Bug: #1936508
Related-Change: I0d519ebaaade35249fb7b17bd5f419ffdaa616c0
Change-Id: I5f3318a44af64b77a63713e6ff8d0fd3b6144f13
If a previous partition power increase failed to cleanup all files in
their old partition locations, then during the next partition power
increase the relinker may find the same file to relink in more than
one source partition. This currently leads to an error log due to the
second relink attempt getting an EEXIST error.
With this patch, when an EEXIST is raised, the relinker will attempt
to create/verify a link from older partition power locations to the
next part power location, and if such a link is found then suppress
the error log.
During the relink step, if an alternative link is verified and if a
file is found that is neither linked to the next partition power
location nor in the current part power location, then the file is
removed during the relink step. That prevents the same EEXIST from occurring
again during the cleanup step when it may no longer be possible to
verify that an alternative link exists.
For example, consider identical filenames in the N+1th, Nth and N-1th
partition power locations, with the N+1th being linked to the Nth:
- During relink, the Nth location is visited and its link is
verified. Then the N-1th location is visited and an EEXIST error
is encountered, but the new check verifies that a link exists to
the Nth location, which is OK.
- During cleanup the locations are visited in the same order, but
files are removed so that the Nth location file no longer exists
when the N-1th location is visited. If the N-1th location still
has a conflicting file then existence of an alternative link to
the Nth location can no longer be verified, so an error would be
raised. Therefore, the N-1th location file must be removed during
relink.
The error is only suppressed for tombstones. The number of partition
power locations that the relinker will look back over may be configured
using the link_check_limit option in a conf file or --link-check-limit
on the command line, and defaults to 2.
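A minimal sketch of the alternative-link check (hypothetical helper; the
real relinker logic also respects link_check_limit):

    import os

    def link_already_exists_elsewhere(filename, older_part_dirs, new_path):
        # Hedged sketch: if the file in any older partition power location
        # is already a hard link to the file at the next partition power
        # location, the EEXIST is benign and the error log can be suppressed.
        for part_dir in older_part_dirs:
            candidate = os.path.join(part_dir, filename)
            try:
                if os.path.samefile(candidate, new_path):
                    return True
            except OSError:
                continue
        return False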
Closes-Bug: 1921718
Change-Id: If9beb9efabdad64e81d92708f862146d5fafb16c
When the replicator update method calls diskfile._get_hashes on a
local partition the partition path can be unicode [1]. If _get_hashes
does a listdir then os.listdir will return unicode values, and any new
suffixes will be written to the hashes.pkl as unicode. When the
hashes.pkl is next read, these unicode suffixes would previously fail
the valid_suffix check and read_hashes would return {'valid': False},
causing a rehash.
When _get_hashes is called from an object server REPLICATE handler,
the device path is bytes and any listdir would return suffixes as
bytes, so the suffix would then be written to hashes.pkl as bytes, and
not cause an error when read_hashes was next called.
This patch modifies diskfile.valid_suffix to allow suffixes to be
unicode or byte strings.
[1] the replication job device is set from the ring devs, and rings
can have unicode device values.
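A minimal sketch of a suffix check that tolerates both types (hypothetical
reimplementation, not the exact code of diskfile.valid_suffix):

    def valid_suffix(value):
        # Hedged sketch: accept unicode or byte strings, then apply the
        # usual "three lowercase hex characters" rule.
        if isinstance(value, bytes):
            try:
                value = value.decode('ascii')
            except UnicodeDecodeError:
                return False
        elif not isinstance(value, str):
            return False
        return len(value) == 3 and all(c in '0123456789abcdef' for c in value)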
Change-Id: I5291aea2b4f1b3dbb6906f943ccf63ad8e716ef7
The relinker currently blindly calls get_hashes on partitions in the
new, upper half of the partition space, which can cause extra handoffs
to be created if you're rerunning the relink step for safety.
Even if the relinker were smarter, though, there's no reason that a
curious operator should end up creating empty handoffs just because
they're poking devices with REPLICATE requests.
Drive-by: be a little more strict about the "suffixes" we're willing to
put in hashes.invalid and read out of hashes.pkl.
Change-Id: I9aace80088cd00d02c418fe4d782b662fb5c8bcf
Refactor to make more use of the utils.get_partition_for_hash() helper
function that was added in [1].
[1] Related-Change: I5fbf8671f2ee63467d46f651e2c24529a2697626
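For reference, the helper boils down to something like the following (a
hedged sketch, not the exact code in swift.common.utils):

    import binascii
    import struct

    def get_partition_for_hash(hex_hash, part_power):
        # Sketch: take the top part_power bits of the binary hash to pick
        # the partition, mirroring how the ring maps hashes to partitions.
        raw = binascii.unhexlify(hex_hash)
        part_shift = 32 - int(part_power)
        return struct.unpack_from('>I', raw)[0] >> part_shift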
Change-Id: I0d3c4cae88ed50a09e892e6d77b5aaa96e6195a5
Previously, when a diskfile was relinked in a new partition,
diskfile.relink_paths() would test for the existence of the link
before attempting to create it. This 'test then set' might race with
another process creating the same diskfile (e.g. a concurrent PUT
which created the very same object in an old partition that the
relinker is also relinking). The race might lead to an attempt to
create a link that already exists (despite the pre-check) and an
EEXIST exception being raised.
This patch modifies relink_paths to tolerate EEXIST exceptions but
only if the existing file is a link to the target file. Otherwise the
EEXIST exception is still raised.
The 'check_existing' argument for relink_paths is removed since it is
no longer relevant.
relink_paths() might also raise an ENOENT exception if, while a
diskfile is being relinked in the 'new' partition dir, another process
PUTs a newer version of the same object and as a result cleans up the
older diskfile in the 'old' partition before the first process has
called os.link().
This patch modifies relink_paths() to tolerate ENOENT exceptions from
os.link() but only if the target path (i.e. the file in the 'old'
partition) no longer exists. Otherwise the ENOENT will still be
raised.
This patch also modifies relink_paths() to return a boolean indicating
if the hard link was created by the call to the method (True) or not
(False).
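A minimal sketch of the tolerant behaviour described above (a hypothetical
simplification of diskfile.relink_paths):

    import errno
    import os

    def relink_paths(target_path, new_target_path):
        # Hedged sketch. Returns True if this call created the hard link,
        # False if a racing process already did the equivalent work.
        try:
            os.link(target_path, new_target_path)
        except OSError as err:
            if err.errno == errno.EEXIST:
                # Another process linked the same diskfile first; that is
                # fine as long as the existing file links to our target.
                if not os.path.samefile(target_path, new_target_path):
                    raise
                return False
            if err.errno == errno.ENOENT and not os.path.exists(target_path):
                # A newer PUT cleaned up the old diskfile before we linked it.
                return False
            raise
        return True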
Closes-Bug: 1917658
Change-Id: I65d4b83c56699ed566fbfb7068f9d2681ca67aa3
Add unit tests for diskfile.relink_paths to cover the new
object dir existing but not being a dir.
Change-Id: I5fbf8671f2ee63467d46f651e2c24529a2697626
Related-Change: Ia041f7b7dbd4e264072325b4b52718fef9966708
When a diskfile is written in a new diskfile dir, then in
diskfile.relink_paths the existence of the diskfile dir is tested
before a call to makedirs. If more than one process is taking the same
action concurrently, they may race such that one tests for existence,
then another calls makedirs, then the first calls makedirs and an
exception is raised. The first process therefore fails to relink the
object file, which is particularly problematic if the first process
was writing a newer object file.
This patch changes relink_paths to unconditionally attempt the
makedirs but catch and ignore any EEXIST errors.
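In sketch form, the change is the classic "just try it" pattern (hedged;
the real call sits inside diskfile.relink_paths):

    import errno
    import os

    def makedirs_tolerate_eexist(path):
        # Hedged sketch: attempt the makedirs unconditionally and treat
        # EEXIST as success, so concurrent writers cannot race a separate
        # existence check.
        try:
            os.makedirs(path)
        except OSError as err:
            if err.errno != errno.EEXIST:
                raise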
Change-Id: Ia041f7b7dbd4e264072325b4b52718fef9966708
Previously, ssync would neither sync nor clean up non-durable data
fragments on handoffs. When the reconstructor is syncing objects from
a handoff node (a 'revert' reconstructor job) it may be useful, and is
not harmful, to also send non-durable fragments if the receiver has
older or no fragment data.
Several changes are made to enable this. On the sending side:
- For handoff (revert) jobs, the reconstructor instantiates
SsyncSender with a new 'include_non_durable' option.
- If configured with the include_non_durable option, the SsyncSender
calls the diskfile yield_hashes function with options that allow
non-durable fragments to be yielded.
- The diskfile yield_hashes function is enhanced to include a
'durable' flag in the data structure yielded for each object.
- The SsyncSender includes the 'durable' flag in the metadata sent
during the missing_check exchange with the receiver.
- If the receiver requests the non-durable object, the SsyncSender
includes a new 'X-Backend-No-Commit' header when sending the PUT
subrequest for the object.
- The SsyncSender includes the non-durable object in the collection
of synced objects returned to the reconstructor so that the
non-durable fragment is removed from the handoff node.
On the receiving side:
- The object server includes a new 'X-Backend-Accept-No-Commit'
header in its response to SSYNC requests. This indicates to the
sender that the receiver has been upgraded to understand the
'X-Backend-No-Commit' header.
- The SsyncReceiver is enhanced to consider non-durable data when
determining if the sender's data is wanted or not.
- The object server PUT method is enhanced to check for an
'X-Backend-No-Commit' header before committing a diskfile.
If a handoff sender has both a durable and newer non-durable fragment
for the same object and frag-index, only the newer non-durable
fragment will be synced and removed on the first reconstructor
pass. The durable fragment will be synced and removed on the next
reconstructor pass.
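A minimal sketch of the receiving side's decision (hypothetical and
simplified; the real logic is in the object server PUT path):

    def should_commit(headers):
        # Hedged sketch: an SSYNC sender that is only transferring a
        # non-durable fragment sets X-Backend-No-Commit on its PUT
        # subrequest; the receiver then writes the .data file but skips
        # making it durable.
        value = headers.get('X-Backend-No-Commit', 'false')
        return str(value).lower() not in ('true', 't', '1', 'yes', 'on')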
Change-Id: I1d47b865e0a621f35d323bbed472a6cfd2a5971b
Closes-Bug: 1778002
md5 is not an approved algorithm in FIPS mode, and trying to
instantiate a hashlib.md5() will fail when the system is running in
FIPS mode.
md5 is allowed when in a non-security context. There is a plan to
add a keyword parameter (usedforsecurity) to hashlib.md5() to annotate
whether or not the instance is being used in a security context.
In the case where it is not, the instantiation of md5 will be allowed.
See https://bugs.python.org/issue9216 for more details.
Some downstream python versions already support this parameter. To
support these versions, a new encapsulation of md5() is added to
swift/common/utils.py. This encapsulation is identical to the one being
added to oslo.utils, but is recreated here to avoid adding a dependency.
This patch is to replace the instances of hashlib.md5() with this new
encapsulation, adding an annotation indicating whether the usage is
a security context or not.
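The encapsulation is roughly as follows (a hedged sketch; the real helper in
swift/common/utils.py handles a few more cases):

    import hashlib

    def md5(string=b'', usedforsecurity=True):
        # Hedged sketch: pass the annotation through where the interpreter
        # supports it, and fall back to the plain call where it does not.
        try:
            return hashlib.md5(string, usedforsecurity=usedforsecurity)
        except TypeError:
            return hashlib.md5(string)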
While this patch seems large, it is really just the same change over and
over again. Reviewers need to pay particular attention to whether the
keyword parameter (usedforsecurity) is set correctly. Right now, all
of them appear to be not used in a security context.
Now that all the instances have been converted, we can update the bandit
run to look for these instances and ensure that new invocations do not
creep in.
With this latest patch, the functional and unit tests all pass
on a FIPS enabled system.
Co-Authored-By: Pete Zaitcev
Change-Id: Ibb4917da4c083e1e094156d748708b87387f2d87
This only applies to post-sync REPLICATE calls, none of which actually
look at the response anyway.
Change-Id: I1de62140e7eb9a23152bb9fdb1fa0934e827bfda
In a situation where two nodes do not have the same version of a ring
and they both think the other node is the primary node of a partition,
a race condition can lead to the loss of some of the objects in the
partition.
The following sequence leads to the loss of some of the objects:
1. A gets and reloads the new ring
2. A starts to replicate/revert the partition P to node B
3. B (with the old ring) starts to replicate/revert the (partial)
partition P to node A
=> replication should be fast as all objects are already on node A
4. B finished replication of (partial) partition P to node A
5. B removes the (partial) partition P after replication succeeded
6. A finishes replication of partition P to node B
7. A removes the partition P
8. B gets and reloads the new ring
All data transferred between steps 2 and 5 will be lost, as it is no
longer on node B and has also been removed from node A.
This commit makes the replicator/reconstructor hold a replication_lock
on partition P so that the remote node cannot start an opposite replication.
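The shape of the fix is roughly this (a hedged sketch with a hypothetical
helper; Swift's real code extends its lock_path() utility on the device and
partition directories):

    import fcntl
    import os
    from contextlib import contextmanager

    @contextmanager
    def partition_replication_lock(part_path):
        # Hedged sketch: hold an exclusive, non-blocking lock in the
        # partition directory so an opposite replication started by the
        # remote node fails fast instead of racing the local job.
        fd = os.open(os.path.join(part_path, '.lock-replication'),
                     os.O_CREAT | os.O_WRONLY)
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            yield
        finally:
            os.close(fd)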
Change-Id: I29acc1302a75ed52c935f42485f775cd41648e4d
Closes-Bug: #1897177
Ordinarily, an ENOENT in _finalize_durable should mean something's gone
off the rails -- we expected to be able to mark data durable, but
couldn't!
If there are concurrent writers, though, it might actually be OK:
Client A writes .data
Client B writes .data
Client B finalizes .data *and cleans up on-disk files*
Client A tries to finalize but .data is gone
Previously, the above would cause the object server to 500, and if
enough of them did this, the client may see a 503. Now, call it good so
clients get 201s.
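A minimal sketch of the tolerant path (hypothetical and simplified from
_finalize_durable):

    import errno
    import os

    def finalize_durable(data_file_path, durable_data_file_path):
        # Hedged sketch: if the rename fails with ENOENT but some .data
        # file is already present in the hash dir, a concurrent writer
        # finished (and cleaned up) first -- call it good so the client
        # still gets a 201.
        try:
            os.rename(data_file_path, durable_data_file_path)
        except OSError as err:
            hash_dir = os.path.dirname(durable_data_file_path)
            survivors = [f for f in os.listdir(hash_dir)
                         if f.endswith('.data')]
            if err.errno == errno.ENOENT and survivors:
                return
            raise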
Change-Id: I4e322a7be23870a62aaa6acee8435598a056c544
Closes-Bug: #1719860
The repo is now using both Python 2 and 3, so update hacking to
version 2.0, which supports Python 2 and 3. Note that the latest hacking
release, 3.0, only supports Python 3.
Fix problems found.
Remove hacking and friends from lower-constraints, they are not needed
for installation.
Change-Id: I9bd913ee1b32ba1566c420973723296766d1812f
We recently started trying to use O_TMPFILE without the kernel version
check; we gracefully degrade when it isn't available, and we don't give
operators any way to avoid the O_TMPFILE attempt -- logging at warning
seems way too noisy.
Change-Id: I8f42127fd6eb930282161930c65e8614f3e3175f
Related-Change: I3599e2ab257bcd99467aee83b747939afac639d8
If the data in a hashes.pkl is corrupted but still de-serializes without
errors, it will mess up replication and never get fixed. This
happens, for example, if one of the keys is a NULL byte.
This patch checks if the dict keys in hashes.pkl are valid strings and
invalidates it if not.
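A minimal sketch of the check (hypothetical helper; the real validation
happens when read_hashes loads the pickle):

    def hashes_look_valid(hashes):
        # Hedged sketch: only trust a hashes.pkl whose keys are all either
        # the bookkeeping fields or three-character lowercase hex suffixes.
        for key in hashes:
            if key in ('valid', 'updated'):
                continue
            if not (isinstance(key, str) and len(key) == 3 and
                    all(c in '0123456789abcdef' for c in key)):
                return False
        return True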
Closes-Bug: 1830881
Change-Id: I84b062d062ff49935feed0aee3e1963bb72eb5ea
I'm pretty sure I botched the intent of the test when I touched it in
https://github.com/openstack/swift/commit/36c4297
As a side-effect, clean up a warning on py2:
UnicodeWarning: Unicode equal comparison failed to convert
both arguments to Unicode - interpreting them as being unequal
Change-Id: I4da8264f85e688bc5109016c197378ca6dfe06a5
In the diskfile module, there are a couple of cases where we would
quarantine the suffix dir if a single object within is not a directory.
Change the code so that we quarantine only the object.
Change-Id: I35eb16f568c00d973eb089aedd0e5a5619c25a2f
We really only need to have one way to clean up empty suffix dirs, and
that's normally during suffix hashing which only happens when invalid
suffixes get rehashed.
When we iterate a suffix tree using yield hashes, we may discover an
expired or otherwise reapable hashdir - when this happens we will now
simply invalidate the suffix so that the next rehash can clean it up.
This simplification removes a misbehavior in the handling between the
normal suffix rehashing cleanup and what was implemented in ssync.
Change-Id: I5629de9f2e9b2331ed3f455d253efc69d030df72
Related-Change-Id: I2849a757519a30684646f3a6f4467c21e9281707
Closes-Bug: 1816501
Previously, O_TMPFILE support was detected by checking the
kernel version, as it was officially introduced in XFS in 3.15.
The problem is that RHEL has backported the support to at least
RHEL 7.6 but the kernel version is not updated.
This patch changes the detection so that O_TMPFILE support is determined
by actually attempting to open a file with the O_TMPFILE flag, and keeps
the result cached in DiskFileManager so that the check only happens once
while the process is running.
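The detection amounts to a probe like this (a hedged sketch; the real check
is cached on the DiskFileManager):

    import errno
    import os

    def supports_o_tmpfile(dir_path):
        # Hedged sketch: try to open an anonymous temp file in dir_path
        # with O_TMPFILE; failure with EINVAL/EISDIR/EOPNOTSUPP means the
        # kernel or filesystem does not support it.
        if not hasattr(os, 'O_TMPFILE'):
            return False
        try:
            fd = os.open(dir_path, os.O_WRONLY | os.O_TMPFILE)
        except OSError as err:
            if err.errno in (errno.EINVAL, errno.EISDIR, errno.EOPNOTSUPP):
                return False
            raise
        os.close(fd)
        return True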
Change-Id: I3599e2ab257bcd99467aee83b747939afac639d8
Note that we also want to prevent any IndexErrors that might arise from
pickled data already on disk.
Change-Id: If56190134edddb4b778e41a2cdd360008fbfb0e4
Closes-Bug: 1772378
Commit e199192cae introduced the ability
to have multiple SSYNC requests running on a single device. It lacks a
safeguard to ensure that only one SSYNC request can be running on a
given partition.
This commit updates replication_lock to allow the device to be locked
up to N times, then lock the partition related to an SSYNC request once.
Change-Id: Id053ed7dd355d414d7920dda79a968a1c6677c14