swift

Commit Graph

Author	SHA1	Message	Date
Zuul	0a3c46e6c4	Merge "Refactor utils"	2024-05-02 00:27:02 +00:00
Shreeya Deshpande	bc3a59bdd3	Refactor utils - Move statsd client into it's own module - Move all logging functions into their own module - Move all config functions into their own module - Move all helper functions into their own module Partial-Bug: #2015274 Change-Id: Ic4b5005e3efffa8dba17d91a41e46d5c68533f9a	2024-04-30 20:27:47 +00:00
indianwhocodes	11eb17d3b2	support x-open-expired header for expired objects If the global configuration option 'enable_open_expired' is set to true in the config, then the client will be able to make a request with the header 'x-open-expired' set to true in order to access an object that has expired, provided it is in its grace period. If this config flag is set to false, the client will not be able to access any expired objects, even with the header, which is the default behavior unless the flag is set. When a client sets a 'x-open-expired' header to a true value for a GET/HEAD/POST request the proxy will forward x-backend-open-expired to storage server. The storage server will allow clients that set x-backend-open-expired to open and read an object that has not yet been reaped by the object-expirer, even after the x-delete-at time has passed. The header is always ignored when used with temporary URLs. Co-Authored-By: Anish Kachinthaya <akachinthaya@nvidia.com> Related-Change: I106103438c4162a561486ac73a09436e998ae1f0 Change-Id: Ibe7dde0e3bf587d77e14808b169c02f8fb3dddb3	2024-04-26 10:13:40 +01:00
Alistair Coles	dc3eda7e89	proxy: don't send multi-part terminator when no parts sent If the proxy timed out while reading a replicated policy multi-part response body, it would transform the ChunkReadTimeout to a StopIteration. This masks the fact that the backend read has terminated unexpectedly. The document_iters_to_multipart_byteranges would complete iterating over parts and send a multipart terminator line, even though no parts may have been sent. This patch removes the conversion of ChunkReadTmeout to StopIteration. The ChunkReadTimeout that is now raised prevents the document_iters_to_multipart_byteranges 'for' loop completing and therefore stops the multi-part terminator line being sent. It is raised from the GetOrHeadHandler similar to other scenarios that raise ChunkReadTimeouts while the resp body is being read. A ChunkReadTimeout exception handler is removed in the _iter_parts_from_response method. This handler was previously never reached (because StopIteration rather than ChunkReadTimeout was raised from _get_next_response_part), but if it were reached (i.e. with this change) then it would repeat logging of the error and repeat incrementing the node's error counter. This change in the GetOrHeadHandler mimics a similar change in the ECFragGetter [1]. [1] Related-Chage: I0654815543be3df059eb2875d9b3669dbd97f5b4 Co-Authored-By: Tim Burke <tim.burke@gmail.com> Change-Id: I6dd53e239f5e7eefcf1c74229a19b1df1c989b4a	2024-02-05 10:28:40 +00:00
Alistair Coles	a16e1f55a7	Improve unit tests for proxy GET ChunkReadTimeouts Unit test changes only: - Add tests for some resuming replicated GET scenarios. - Add test to cover resuming GET fast_forward "failing" when range read is complete. - Add test to verify different node_timeout for account and container vs object controller getters. - Refactor proxy.test_server.py tests to split out different scenarios. Drive-by: remove some ring device manipulation setup that's not needed. Change-Id: I38c7fa648492c9bd2173ecf92f89e423bee4abf3 Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>	2024-01-25 14:13:48 +00:00
Zuul	966340aeed	Merge "Remove per-service auto_create_account_prefix"	2023-12-01 01:48:57 +00:00
indianwhocodes	e375ce5bf0	replication network aware helper for container-updater in proxy Change-Id: I303a1ff97325654478c2a47af056deda37696b7b	2023-11-22 17:47:38 +00:00
Takashi Kajinami	49b19613d2	Remove per-service auto_create_account_prefix The per-service option was deprecated almost 4 years ago[1]. [1] `4601548dab` Change-Id: I45f7678c9932afa038438ee841d1b262d53c9da8	2023-11-22 01:58:03 +09:00
Alistair Coles	965908d753	proxy: emit metric when updating namespaces memcache set fails Previously, if a memcache set command had an unexpected response then an error was logged but no exception raised from MemcacheRing.set. With this patch, when MemcacheRing.set fails in this way, a MemcacheConnectionError will be raised if 'raise_on_error' is True. This exception is now handled when the proxy tries to set updating namespaces in memcache, and a counter metric emitted with name 'object.shard_updating.cache.set_error'. If the set is successful an 'object.shard_updating.cache.set' counter metric is emitted. The logging of MemcacheRing errors is improved to include the memcache key being set, in line with exception logging. Change-Id: Icfc97751c80f4bb4a3373796c01bff6ed5843937	2023-10-26 18:15:08 +01:00
Tim Burke	9f385c07f3	proxy: Get rid of MetricsPrefixLoggerAdapter It adds another layer of indirection and state for the sake of labeling; longer term it'll be easier to be explicit at the point of emission. Related-Change: I0522b1953722ca96021a0002cf93432b973ce626 Change-Id: Ieebafb19c3fa60334aff2914ab1ae70b8f140342	2023-08-21 13:50:24 -07:00
Tim Burke	e5d730dc56	proxy: Get rid of iter_nodes helper All it did was proxy through to NodeIter, anyway. Change-Id: Ifec8d3a40f00141a73f6e50efe0b53b382ab2ef3	2023-08-16 10:16:18 -07:00
Zuul	a57479a023	Merge "Add FakeStatsdClient to unit tests"	2023-08-08 20:40:33 +00:00
Zuul	e81f0657db	Merge "Save some effort when discovering used_source_etag"	2023-08-08 13:11:23 +00:00
Tim Burke	fe09ef2581	Save some effort when discovering used_source_etag Change-Id: I35d682e58419f141429877ac6a6964132f7e658b	2023-08-07 12:16:32 +01:00
Matthew Oliver	00bfc425ce	Add FakeStatsdClient to unit tests Currently we simply mock calls in the FakeLogger for calls statsd calls, and there are also some helper methods for counting and collating metrics that were called. This Fakelogger is overloaded and doesn't simulate the real world. In real life we use a Statsdclient that is attached to the logger. We've been in the situation where unit tests pass but the statsd client stacktraces because we don't actually fake the statsdclient based off the real one and let it's use its internal logic. This patch creates a new FakeStatsdClient that is based off the real one, this can then be used (like the real statsd client) and attached to the FakeLogger. There is quite a bit of churn in tests to make this work, because we now have to looking into the fake statsd client to check the faked calls made. The FakeStatsdClient does everything the real one does, except overrides the _send method and socket creation so no actual statsd metrics are emitted. Change-Id: I9cdf395e85ab559c2b67b0617f898ad2d6a870d4	2023-08-07 10:10:45 +01:00
Jianjian Huo	bc300a516b	proxy: add new metrics to account/container_info cache for skip/miss This patch will add more granularity to metrics of account_info or container_info cache and related backend lookups. Before this patch, related metrics are: 1.account.info.cache.[hit\|miss\|skip] 2.container.info.cache.[hit\|miss\|skip] With this patch, they are going to become: 1.account/container.info.infocache.hit cache hits with infocache. 2.account/container.info.cache.hit cache hits with memcache. 3.account/container.info.cache.[miss\|skip\|disabled] .<status_int> Those are operations made to backend due to below reasons. miss: cache misses. skip: the selective skips per skip percentage config. disabled: memcache is disabled. For each kind of operation metrics, suffix <status_int> will count operations with different status. Then a sum of all status sub-metrics will the total metrics of that operation. UpgradeImpact ============= Metrics dashboard will need updates to display those changed metrics correctly, also some infocache metrics are newly added, please see above message for all changes needed. Change-Id: I60a9f1c349b4bc78ecb850fb26ae56eb20fa39c6	2023-08-02 11:12:01 -07:00
Tim Burke	c51e81f640	proxy: Bring back logging/metrics for get__info requests A while back, we changed get_account_info and get_container_info to call the proxy-server app directly, rather than whatever was right of the current middleware. This reduced backend request amplification on cache misses. However, it also* meant that we stopped emitting logs or metrics in the proxy for these backend requests. This was an unfortunate and unintended break from earlier behavior. Now, extend the middleware decorating we're doing in loadapp() to include a "logged app", i.e., the app wrapped by it's right-most proxy-logging middleware. If there is not logging middleware (such as would happen for the backend servers), the "logged app" will be the main app. Make account and container info requests through that app, so we get logging and metrics again. Closes-Bug: #2027726 Related-Change: I49447c62abf9375541f396f984c91e128b8a05d5 Change-Id: I3f531f904340e4c8407185ed64b41d7d614a308a	2023-08-01 15:58:58 -07:00
Zuul	42f76159ac	Merge "Fix handling of non-ASCII accounts"	2023-07-05 22:48:12 +00:00
Zuul	d952951347	Merge "move test_GET_pipeline to BaseTestObjectController"	2023-06-23 23:41:03 +00:00
Tim Burke	b46b735a3e	Fix handling of non-ASCII accounts Related-Change: I4ecfae2bca6ffa08ad15e584579ebce707f4628d Related-Change: I1e244c231753b8f4b6f1cf95cb0ae4c3c959ae0f Change-Id: Ia386736b9b283858931794690538871b6e1ad9c8	2023-06-13 15:28:41 -07:00
Jianjian Huo	f150a5035c	proxy: print logs when write new shard range cache into memcache. Change-Id: I0e13e0148a6e19fd6d0a9438604aa791f948502a	2023-06-13 11:18:18 -07:00
Clay Gerrard	c4c9d5a40b	move test_GET_pipeline to BaseTestObjectController Change-Id: I1f26483dbc5935052ec8df0c874ee38070b84d5b	2023-05-12 12:24:42 -05:00
Shreeya Deshpande	647ee83906	Unit test for keepalive timeout Create a unit test to verify client timeout for multiple requests Change-Id: I974e01cd2cb18f4ea87c3966dbf4b06bff22ed39	2023-05-10 09:01:41 -07:00
Jianjian Huo	6ff90ea73e	Proxy: restructure cached updating shard ranges Restructure the shard ranges that are stored in memcache for object updating to only persist the essential attributes of shard ranges in memcache (lower bounds and names), so the aggregate of memcache values is much smaller and retrieval will be much faster too. Co-Authored-By: Alistair Coles <alistairncoles@gmail.com> Co-Authored-By: Tim Burke <tim.burke@gmail.com> UpgradeImpact ============= The cache key for updating shard ranges in memcached is renamed from 'shard-updating/<account>/<container>' to 'shard-updating-v2/<account>/<container>', and cache data is changed to be a list of [lower bound, name]. As a result, this will invalid all existing updating shard ranges stored in the memcache cluster. Change-Id: If98af569f99aa1ac79b9485ce9028fdd8d22576b	2023-03-06 22:20:02 -08:00
Tim Burke	b68cc893f7	proxy: Reduce round-trips to memcache and backend on info misses Following a memcache restart in a SAIO, I've seen the following happen during an object HEAD: - etag_quoter wants to get account/container info to decide whether to quote-wrap or not - account info is a cache miss, so we make a no-auth'ed HEAD to the next filter in the pipeline - eventually this gets down to ratelimit, which also wants to get account info - still a cache miss, so we make a separate HEAD that eventually talks to the backend and populates cache - ratelimit realizes it can't ratelimit the request and lets the original HEAD through to the backend There's a related bug about how something similar can happen when the backend gets overloaded, but everything is working -- we just ought to be talking straight to the proxy app. Note that there's likely something similar going on with container info, but the hardcoded 10% sampling rate makes it harder to see if you're monitoring raw metric streams. I thought I fixed this in the related change, but no :-/ Change-Id: I49447c62abf9375541f396f984c91e128b8a05d5 Related-Change: If9249a42b30e2a2e7c4b0b91f947f24bf891b86f Related-Bug: #1883214	2023-03-01 15:02:42 -08:00
indianwhocodes	9ec90d4d56	proxy-server exception logging shows replication_ip/port Adding a "use_replication" field to the node dict, a helper function to set use_replication dict value for a node copy by looking up the header value for x-backend-use-replication-network Change-Id: Ie05af464765dc10cf585be851f462033fc6bdec7	2023-02-10 09:34:59 +00:00
Alistair Coles	37ba5577a7	Delete unused FakeObjectController The FakeObjectController class has not been used for since [1]. [1] Related-Change: Ib3b3830c246816dd549fc74be98b4bc651e7bace Change-Id: I338f89a7862b3b423f229655dcf83ddd3b0b2758	2023-02-09 13:33:37 +00:00
Jianjian Huo	2f32d07707	swift_proxy: add memcache skip success/error stats for shard range. This patch will add more granularity to shard operation cache or backend metrics, and then remove some of existing and duplicated metrics. Before this patch, related metrics are: 1.shard_<op>.cache.[hit\|miss\|skip] 2.shard_<op>.backend.<status_int> where op is 'listing' or 'updating'. With this patch, they are going to become: 1.shard_<op>.infocache.hit cache hits with infocache. 2.shard_<op>.cache.hit cache hits with memcache. 3.shard_<op>.cache.[miss\|bypass\|skip\|force_skip\|disabled\|error] .<status_int> Those are operations made to backend due to below reasons. miss: cache misses. bypass: metadata didn't support a cache lookup skip: the selective skips per skip percentage config. force_skip: the request with 'x-newest' header. disabled: memcache is disabled. error: memcache connection error. For each kind of operation metrics, suffix <status_int> will count operations with different status. Then a sum of all status sub-metrics will the total metrics of that operation. UpgradeImpact ============= Metrics dashboard will need updates to display those changed metrics correctly, also infocache metrics are newly added, please see above message for all changes needed. Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com> Change-Id: Ib8be30d3969b4b4808664c43e94db53d10e6ef4c	2023-01-20 08:37:06 -08:00
Tim Burke	69b18e3c50	tests: Remove references to soft_lock As best as I can tell, this has never been an interface. Change-Id: I42e4b82a7af8a81e497e68ad25ac3bc4d0d74970	2023-01-11 14:05:34 -08:00
Tim Burke	f6ac7d4491	Tolerate absolute-form request targets We've seen S3 clients expecting to be able to send request lines like GET https://cluster.domain/bucket/key HTTP/1.1 instead of the expected GET /bucket/key HTTP/1.1 Testing against other, independent servers with something like ( echo -n $'GET https://www.google.com/ HTTP/1.1\r\nHost: www.google.com\r\nConnection: close\r\n\r\n' ; sleep 1 ) \| openssl s_client -connect www.google.com:443 suggests that it may be reasonable to accept them; the RFC even goes so far as to say > To allow for transition to the absolute-form for all requests in some > future version of HTTP, a server MUST accept the absolute-form in > requests, even though HTTP/1.1 clients will only send them in > requests to proxies. (See https://datatracker.ietf.org/doc/html/rfc7230#section-5.3.2) Fix it at the protocol level, so everywhere else we can mostly continue to assume that PATH_INFO starts with a / like we always have. Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com> Change-Id: I04012e523f01e910f41d5a41cdd86d3d2a1b9c59	2023-01-03 12:49:30 -08:00
Tim Burke	597887dedc	Extract SwiftHttpProtocol to its own module Change-Id: I35cade2c46eb6acb66c064cde75d78173f46864c	2022-12-06 11:15:53 -08:00
Alistair Coles	7a0118fca0	proxy-server: include suppression time in error limit log Also, add some more test coverage and tidy up some tests. Change-Id: Icf3a1df87d52dc5f95e15e676da229a3323326a2	2022-11-21 10:52:11 +00:00
Jianjian Huo	38124221d7	Proxy: add metrics related to error limiter. There are three new metrics added: 1. 'error_limiter.incremented_limit', when one node has accumulated enough errors to trigger error suppression. 2. 'error_limiter.forced_limit', when one node runs into serious error like Insufficient Storage. 3. 'error_limiter.is_limited', when one node is error limited during node selection process due to suppression_limit. UpgradeImpact: There is change of log format when 'ERROR Insufficient Storage' is raised, from '%(msg)s %(node)s' to 'Node will be error limited from now: %s, error: %s'. Change-Id: Id9bec9b46dad82163517fd71cfdda5b751464588	2022-11-14 17:08:17 -08:00
Alistair Coles	623024848e	proxy: extract response error handling to single method This patch introduces a check_response method to proxy server Application class. The method provides common checking and reporting of backend response errors (i.e. responses with status code >= 500). This checking was previously duplicated in proxy controller classes. The formatting and content of error logging messages is now more unform, which does result in changes to some logged error messages. Change-Id: I4b935a47f39642c11364f2a8d50a5f2ebb3b84f1	2022-11-07 20:22:39 +00:00
Alistair Coles	51730f1273	proxy: refactor error limiter to a class This patch pulls the proxy's error limiting code into an ErrorLimiter class that can be used to provide error limiting. Co-Authored-By: Matthew Oliver <matt@oliver.net.au> Change-Id: Ie15d831278a55ffab156ddce04544317e451407d	2022-10-26 11:01:03 +01:00
Zuul	8ab6af27c5	Merge "proxy: Add a chance to skip memcache for get_*_info calls"	2022-09-26 19:08:11 +00:00
indianwhocodes	9bdb77a388	Fixes Storage-Policy AttributeError in proxy Closes-Bug: #1989140 Change-Id: I99e4a1daf550d492b28c8a82c0b7a81b2f73dffe	2022-09-09 22:12:32 -07:00
Tim Burke	5c6407bf59	proxy: Add a chance to skip memcache for get__info calls If you've got thousands of requests per second for objects in a single container, you basically NEVER want that container's info to ever fall out of memcache. If it does*, all those clients are almost certainly going to overload the container. Avoid this by allowing some small fraction of requests to bypass and refresh the cache, pushing out the TTL as long as there continue to be requests to the container. The likelihood of skipping the cache is configurable, similar to what we did for shard range sets. Change-Id: If9249a42b30e2a2e7c4b0b91f947f24bf891b86f Closes-Bug: #1883324	2022-08-30 18:49:48 +10:00
Tim Burke	9bed525bfb	memcached: Give callers the option to accept errors Auth middlewares in particular may want to know when there's a communication breakdown as opposed to a cache miss. Update our shard-range cache stats to acknowlegde the distinction. Drive-by: Log an error if all memcached servers are error-limited. Change-Id: Ic8d0915235d11124d06ec940c5be9a2edbe85c83	2022-04-28 13:20:44 -07:00
Tim Burke	bab7f93223	cors: Include `Vary: Origin` when using the request's Origin Otherwise, multiple frontends attempting to use the same data may get denials because the browser served a cached response from when it used a different origin. Change-Id: I6ec8b8ceb8c6a58e74772e57e6fe5700f6ff8db1	2022-03-09 22:18:55 -08:00
Alistair Coles	eda7d5fe3c	Deprecate LogAdapter.set_statsd_prefix Previously, the set_statsd_prefix method was used to mutate a logger's StatsdClient tail prefix after a logger was instantiated. This pattern had led to unexpected mutations (see Related-Change). The tail_prefix can now be passed as an argument to get_logger(), and is then forwarded to the StatsdClient constructor, for a more explicit assignment pattern. The set_statsd_prefix method is left in place for backwards compatibility. A DeprecationWarning will be raised if it is used to mutate the StatsdClient tail prefix. Change-Id: I7692860e3b741e1bc10626e26bb7b27399c325ab Related-Change: I0522b1953722ca96021a0002cf93432b973ce626	2022-02-07 17:46:06 +00:00
Zuul	e8cecf7fcc	Merge "Move *_swift_info functions into a new registry module"	2022-02-04 07:01:15 +00:00
Zuul	8ea49a1def	Merge "Fix statsd prefix mutation in proxy controllers"	2022-02-04 06:56:53 +00:00
Matthew Oliver	589ac355f3	Move _swift_info functions into a new registry module The _swift_info functions use in module global dicts to provide a registry mechanism for registering and getting swift info. This is an abnormal pattern and doesn't quite fit into utils. Further we looking at following this pattern for sensitive info to trim in the future. So this patch does some house cleaning and moves this registry to a new module swift.common.registry. And updates all the references to it. For backwards compat we still import the *_swift_info methods into utils for any 3rd party tools or middleware. Change-Id: I71fd7f50d1aafc001d6905438f42de4e58af8421	2022-02-03 14:41:13 +00:00
Tim Burke	9bc1c008a5	Get rid of pipeline_property Instead, ensure every middleware gets a reference to the final WSGI application. Note that this reimplements much of paste.deploy's pipeline handling, but that code hasn't changed in years, anyway. Change-Id: I2fbb21cabf72849ce84760a6d2607aa2af67f286	2022-01-27 14:40:27 -08:00
Zuul	4d48004483	Merge "proxy: Add a chance to skip memcache when looking for shard ranges"	2022-01-27 21:37:03 +00:00
Alistair Coles	6942b25cc1	Fix statsd prefix mutation in proxy controllers Swift loggers encapsulate a StatsdClient that is typically initialised with a prefix, equal to the logger name (e.g. 'proxy_server'), that is prepended to metrics names. The proxy server would previously mutate its logger's prefix, using its set_statsd_prefix method, each time a controller was instantiated, extending it with the controller type (e.g. changing the prefix 'proxy_server.object'). As a result, when an object request spawned container subrequests, for example, the statsd client would be left with a 'proxy_server.container' prefix part for subsequent object request related metrics. The proxy server logger is now wrapped with a new MetricsPrefixLoggerAdapter each time a controller is instantiated, and the adapter applies the correct prefix for the controller type for the lifetime of the controller. Change-Id: I0522b1953722ca96021a0002cf93432b973ce626	2022-01-27 09:06:15 -08:00
Tim Burke	8c6ccb5fd4	proxy: Add a chance to skip memcache when looking for shard ranges By having some small portion of calls skip cache and go straight to disk, we can ensure the cache is always kept fresh and never expires (at least, for active containers). Previously, when shard ranges fell out of cache there would frequently be a thundering herd that could overwhelm the container server, leading to 503s served to clients or an increase in async pendings. Include metrics for hit/miss/skip rates. Change-Id: I6d74719fb41665f787375a08184c1969c86ce2cf Related-Bug: #1883324	2022-01-26 18:15:09 +00:00
Zuul	4606911010	Merge "Modify log_name in internal clients' pipeline configs"	2022-01-26 12:32:43 +00:00
Alistair Coles	035d91dce5	Modify log_name in internal clients' pipeline configs Modify the 'log_name' option in the InternalClient wsgi config for the following services: container-sharder, container-reconciler, container-deleter, container-sync and object-expirer. Previously the 'log_name' value for all internal client instances sharing a single internal-client.conf file took the value configured in the conf file, or would default to 'swift'. This resulted in no distinction between logs from each internal client, and no association with the service using a particular internal client. With this change the 'log_name' value will typically be <log_route>-ic where <log_route> is the service's conf file section name. For example, 'container-sharder-ic'. Note: any 'log_name' value configured in an internal client conf file will now be ignored for these services unless the option key is preceded by 'set'. Note: by default, the logger's StatdsClient uses the log_name as its tail_prefix when composing metrics' names. However, the proxy-logging middleware overrides the tail_prefix with the hard-coded value 'proxy-server'. This change to log_name therefore does not change the statsd metric names emitted by the internal client's proxy-logging. This patch does not change the logging of the services themselves, just their internal clients. Change-Id: I844381fb9e1f3462043d27eb93e3fa188b206d05 Related-Change: Ida39ec7eb02a93cf4b2aa68fc07b7f0ae27b5439	2022-01-12 11:07:25 +00:00

1 2 3 4 5 ...

582 Commits