Commit Graph

580 Commits

Author SHA1 Message Date
indianwhocodes 11eb17d3b2 support x-open-expired header for expired objects
If the global configuration option 'enable_open_expired' is set
to true in the config, then the client will be able to make a
request with the header 'x-open-expired' set to true in order
to access an object that has expired, provided it is in its
grace period. If this config flag is set to false, the client
will not be able to access any expired objects, even with the
header, which is the default behavior unless the flag is set.

When a client sets a 'x-open-expired' header to a true value for a
GET/HEAD/POST request the proxy will forward x-backend-open-expired to
storage server. The storage server will allow clients that set
x-backend-open-expired to open and read an object that has not yet
been reaped by the object-expirer, even after the x-delete-at time
has passed.

The header is always ignored when used with temporary URLs.

Co-Authored-By: Anish Kachinthaya <akachinthaya@nvidia.com>
Related-Change: I106103438c4162a561486ac73a09436e998ae1f0
Change-Id: Ibe7dde0e3bf587d77e14808b169c02f8fb3dddb3
2024-04-26 10:13:40 +01:00
Alistair Coles dc3eda7e89 proxy: don't send multi-part terminator when no parts sent
If the proxy timed out while reading a replicated policy multi-part
response body, it would transform the ChunkReadTimeout to a
StopIteration. This masks the fact that the backend read has
terminated unexpectedly. The document_iters_to_multipart_byteranges
would complete iterating over parts and send a multipart terminator
line, even though no parts may have been sent.

This patch removes the conversion of ChunkReadTmeout to StopIteration.
The ChunkReadTimeout that is now raised prevents the
document_iters_to_multipart_byteranges 'for' loop completing and
therefore stops the multi-part terminator line being sent. It is
raised from the GetOrHeadHandler similar to other scenarios that raise
ChunkReadTimeouts while the resp body is being read.

A ChunkReadTimeout exception handler is removed in the
_iter_parts_from_response method. This handler was previously never
reached (because StopIteration rather than ChunkReadTimeout was raised
from _get_next_response_part), but if it were reached (i.e. with this
change) then it would repeat logging of the error and repeat
incrementing the node's error counter.

This change in the GetOrHeadHandler mimics a similar change in the
ECFragGetter [1].

[1] Related-Chage: I0654815543be3df059eb2875d9b3669dbd97f5b4
Co-Authored-By: Tim Burke <tim.burke@gmail.com>
Change-Id: I6dd53e239f5e7eefcf1c74229a19b1df1c989b4a
2024-02-05 10:28:40 +00:00
Alistair Coles a16e1f55a7 Improve unit tests for proxy GET ChunkReadTimeouts
Unit test changes only:

- Add tests for some resuming replicated GET scenarios.

- Add test to cover resuming GET fast_forward "failing" when range
  read is complete.

- Add test to verify different node_timeout for account and container
  vs object controller getters.

- Refactor proxy.test_server.py tests to split out different
  scenarios.

Drive-by: remove some ring device manipulation setup that's not needed.

Change-Id: I38c7fa648492c9bd2173ecf92f89e423bee4abf3
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
2024-01-25 14:13:48 +00:00
Zuul 966340aeed Merge "Remove per-service auto_create_account_prefix" 2023-12-01 01:48:57 +00:00
indianwhocodes e375ce5bf0 replication network aware helper for container-updater in proxy
Change-Id: I303a1ff97325654478c2a47af056deda37696b7b
2023-11-22 17:47:38 +00:00
Takashi Kajinami 49b19613d2 Remove per-service auto_create_account_prefix
The per-service option was deprecated almost 4 years ago[1].

[1] 4601548dab

Change-Id: I45f7678c9932afa038438ee841d1b262d53c9da8
2023-11-22 01:58:03 +09:00
Alistair Coles 965908d753 proxy: emit metric when updating namespaces memcache set fails
Previously, if a memcache set command had an unexpected response then
an error was logged but no exception raised from MemcacheRing.set.
With this patch, when MemcacheRing.set fails in this way, a
MemcacheConnectionError will be raised if 'raise_on_error' is
True. This exception is now handled when the proxy tries to set
updating namespaces in memcache, and a counter metric emitted with
name 'object.shard_updating.cache.set_error'. If the set is successful
an 'object.shard_updating.cache.set' counter metric is emitted.

The logging of MemcacheRing errors is improved to include the memcache
key being set, in line with exception logging.

Change-Id: Icfc97751c80f4bb4a3373796c01bff6ed5843937
2023-10-26 18:15:08 +01:00
Tim Burke 9f385c07f3 proxy: Get rid of MetricsPrefixLoggerAdapter
It adds another layer of indirection and state for the sake of labeling;
longer term it'll be easier to be explicit at the point of emission.

Related-Change: I0522b1953722ca96021a0002cf93432b973ce626
Change-Id: Ieebafb19c3fa60334aff2914ab1ae70b8f140342
2023-08-21 13:50:24 -07:00
Tim Burke e5d730dc56 proxy: Get rid of iter_nodes helper
All it did was proxy through to NodeIter, anyway.

Change-Id: Ifec8d3a40f00141a73f6e50efe0b53b382ab2ef3
2023-08-16 10:16:18 -07:00
Zuul a57479a023 Merge "Add FakeStatsdClient to unit tests" 2023-08-08 20:40:33 +00:00
Zuul e81f0657db Merge "Save some effort when discovering used_source_etag" 2023-08-08 13:11:23 +00:00
Tim Burke fe09ef2581 Save some effort when discovering used_source_etag
Change-Id: I35d682e58419f141429877ac6a6964132f7e658b
2023-08-07 12:16:32 +01:00
Matthew Oliver 00bfc425ce Add FakeStatsdClient to unit tests
Currently we simply mock calls in the FakeLogger for calls statsd calls,
and there are also some helper methods for counting and collating
metrics that were called. This Fakelogger is overloaded and doesn't
simulate the real world.
In real life we use a Statsdclient that is attached to the logger.

We've been in the situation where unit tests pass but the statsd client
stacktraces because we don't actually fake the statsdclient based off
the real one and let it's use its internal logic.

This patch creates a new FakeStatsdClient that is based off the real
one, this can then be used (like the real statsd client) and attached to
the FakeLogger.
There is quite a bit of churn in tests to make this work, because we now
have to looking into the fake statsd client to check the faked calls
made.
The FakeStatsdClient does everything the real one does, except overrides
the _send method and socket creation so no actual statsd metrics are
emitted.

Change-Id: I9cdf395e85ab559c2b67b0617f898ad2d6a870d4
2023-08-07 10:10:45 +01:00
Jianjian Huo bc300a516b proxy: add new metrics to account/container_info cache for skip/miss
This patch will add more granularity to metrics of account_info or
container_info cache and related backend lookups.

Before this patch, related metrics are:
  1.account.info.cache.[hit|miss|skip]
  2.container.info.cache.[hit|miss|skip]

With this patch, they are going to become:
  1.account/container.info.infocache.hit
    cache hits with infocache.
  2.account/container.info.cache.hit
    cache hits with memcache.
  3.account/container.info.cache.[miss|skip|disabled]
                    .<status_int>
    Those are operations made to backend due to below reasons.
      miss: cache misses.
      skip: the selective skips per skip percentage config.
      disabled: memcache is disabled.
    For each kind of operation metrics, suffix <status_int> will
    count operations with different status. Then a sum of all
    status sub-metrics will the total metrics of that operation.

UpgradeImpact
=============
Metrics dashboard will need updates to display those changed metrics
correctly, also some infocache metrics are newly added, please see
above message for all changes needed.

Change-Id: I60a9f1c349b4bc78ecb850fb26ae56eb20fa39c6
2023-08-02 11:12:01 -07:00
Tim Burke c51e81f640 proxy: Bring back logging/metrics for get_*_info requests
A while back, we changed get_account_info and get_container_info to
call the proxy-server app directly, rather than whatever was right
of the current middleware. This reduced backend request amplification
on cache misses.

However, it *also* meant that we stopped emitting logs or metrics in
the proxy for these backend requests. This was an unfortunate and
unintended break from earlier behavior.

Now, extend the middleware decorating we're doing in loadapp() to
include a "logged app", i.e., the app wrapped by it's right-most
proxy-logging middleware. If there is not logging middleware (such
as would happen for the backend servers), the "logged app" will be
the main app. Make account and container info requests through
*that* app, so we get logging and metrics again.

Closes-Bug: #2027726
Related-Change: I49447c62abf9375541f396f984c91e128b8a05d5
Change-Id: I3f531f904340e4c8407185ed64b41d7d614a308a
2023-08-01 15:58:58 -07:00
Zuul 42f76159ac Merge "Fix handling of non-ASCII accounts" 2023-07-05 22:48:12 +00:00
Zuul d952951347 Merge "move test_GET_pipeline to BaseTestObjectController" 2023-06-23 23:41:03 +00:00
Tim Burke b46b735a3e Fix handling of non-ASCII accounts
Related-Change: I4ecfae2bca6ffa08ad15e584579ebce707f4628d
Related-Change: I1e244c231753b8f4b6f1cf95cb0ae4c3c959ae0f
Change-Id: Ia386736b9b283858931794690538871b6e1ad9c8
2023-06-13 15:28:41 -07:00
Jianjian Huo f150a5035c proxy: print logs when write new shard range cache into memcache.
Change-Id: I0e13e0148a6e19fd6d0a9438604aa791f948502a
2023-06-13 11:18:18 -07:00
Clay Gerrard c4c9d5a40b move test_GET_pipeline to BaseTestObjectController
Change-Id: I1f26483dbc5935052ec8df0c874ee38070b84d5b
2023-05-12 12:24:42 -05:00
Shreeya Deshpande 647ee83906 Unit test for keepalive timeout
Create a unit test to verify client timeout for multiple requests

Change-Id: I974e01cd2cb18f4ea87c3966dbf4b06bff22ed39
2023-05-10 09:01:41 -07:00
Jianjian Huo 6ff90ea73e Proxy: restructure cached updating shard ranges
Restructure the shard ranges that are stored in memcache for
object updating to only persist the essential attributes of
shard ranges in memcache (lower bounds and names), so the
aggregate of memcache values is much smaller and retrieval
will be much faster too.

Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Co-Authored-By: Tim Burke <tim.burke@gmail.com>

UpgradeImpact
=============
The cache key for updating shard ranges in memcached is renamed
from 'shard-updating/<account>/<container>' to
'shard-updating-v2/<account>/<container>', and cache data is
changed to be a list of [lower bound, name]. As a result, this
will invalid all existing updating shard ranges stored in the
memcache cluster.

Change-Id: If98af569f99aa1ac79b9485ce9028fdd8d22576b
2023-03-06 22:20:02 -08:00
Tim Burke b68cc893f7 proxy: Reduce round-trips to memcache and backend on info misses
Following a memcache restart in a SAIO, I've seen the following happen
during an object HEAD:

- etag_quoter wants to get account/container info to decide whether to
  quote-wrap or not
- account info is a cache miss, so we make a no-auth'ed HEAD to the next
  filter in the pipeline
- eventually this gets down to ratelimit, which *also* wants to get
  account info
- still a cache miss, so we make a *separate* HEAD that eventually talks
  to the backend and populates cache
- ratelimit realizes it can't ratelimit the request and lets the
  original HEAD through to the backend

There's a related bug about how something similar can happen when the
backend gets overloaded, but *everything is working* -- we just ought to
be talking straight to the proxy app.

Note that there's likely something similar going on with container info,
but the hardcoded 10% sampling rate makes it harder to see if you're
monitoring raw metric streams.

I thought I fixed this in the related change, but no :-/

Change-Id: I49447c62abf9375541f396f984c91e128b8a05d5
Related-Change: If9249a42b30e2a2e7c4b0b91f947f24bf891b86f
Related-Bug: #1883214
2023-03-01 15:02:42 -08:00
indianwhocodes 9ec90d4d56 proxy-server exception logging shows replication_ip/port
Adding a "use_replication" field to the node dict, a helper function to
set use_replication dict value for a node copy by looking up the header
value for x-backend-use-replication-network

Change-Id: Ie05af464765dc10cf585be851f462033fc6bdec7
2023-02-10 09:34:59 +00:00
Alistair Coles 37ba5577a7 Delete unused FakeObjectController
The FakeObjectController class has not been used for since [1].

[1] Related-Change: Ib3b3830c246816dd549fc74be98b4bc651e7bace

Change-Id: I338f89a7862b3b423f229655dcf83ddd3b0b2758
2023-02-09 13:33:37 +00:00
Jianjian Huo 2f32d07707 swift_proxy: add memcache skip success/error stats for shard range.
This patch will add more granularity to shard operation cache or
backend metrics, and then remove some of existing and duplicated
metrics.

Before this patch, related metrics are:
  1.shard_<op>.cache.[hit|miss|skip]
  2.shard_<op>.backend.<status_int>
where op is 'listing' or 'updating'.

With this patch, they are going to become:
  1.shard_<op>.infocache.hit
    cache hits with infocache.
  2.shard_<op>.cache.hit
    cache hits with memcache.
  3.shard_<op>.cache.[miss|bypass|skip|force_skip|disabled|error]
                    .<status_int>
    Those are operations made to backend due to below reasons.
      miss: cache misses.
      bypass: metadata didn't support a cache lookup
      skip: the selective skips per skip percentage config.
      force_skip: the request with 'x-newest' header.
      disabled: memcache is disabled.
      error: memcache connection error.
    For each kind of operation metrics, suffix <status_int> will
    count operations with different status. Then a sum of all
    status sub-metrics will the total metrics of that operation.

UpgradeImpact
=============
Metrics dashboard will need updates to display those changed metrics
correctly, also infocache metrics are newly added, please see above
message for all changes needed.

Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Change-Id: Ib8be30d3969b4b4808664c43e94db53d10e6ef4c
2023-01-20 08:37:06 -08:00
Tim Burke 69b18e3c50 tests: Remove references to soft_lock
As best as I can tell, this has *never* been an interface.

Change-Id: I42e4b82a7af8a81e497e68ad25ac3bc4d0d74970
2023-01-11 14:05:34 -08:00
Tim Burke f6ac7d4491 Tolerate absolute-form request targets
We've seen S3 clients expecting to be able to send request lines like

    GET https://cluster.domain/bucket/key HTTP/1.1

instead of the expected

    GET /bucket/key HTTP/1.1

Testing against other, independent servers with something like

    ( echo -n $'GET https://www.google.com/ HTTP/1.1\r\nHost: www.google.com\r\nConnection: close\r\n\r\n' ; sleep 1 ) | openssl s_client -connect www.google.com:443

suggests that it may be reasonable to accept them; the RFC even goes so
far as to say

> To allow for transition to the absolute-form for all requests in some
> future version of HTTP, a server MUST accept the absolute-form in
> requests, even though HTTP/1.1 clients will only send them in
> requests to proxies.

(See https://datatracker.ietf.org/doc/html/rfc7230#section-5.3.2)

Fix it at the protocol level, so everywhere else we can mostly continue
to assume that PATH_INFO starts with a / like we always have.

Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Change-Id: I04012e523f01e910f41d5a41cdd86d3d2a1b9c59
2023-01-03 12:49:30 -08:00
Tim Burke 597887dedc Extract SwiftHttpProtocol to its own module
Change-Id: I35cade2c46eb6acb66c064cde75d78173f46864c
2022-12-06 11:15:53 -08:00
Alistair Coles 7a0118fca0 proxy-server: include suppression time in error limit log
Also, add some more test coverage and tidy up some tests.

Change-Id: Icf3a1df87d52dc5f95e15e676da229a3323326a2
2022-11-21 10:52:11 +00:00
Jianjian Huo 38124221d7 Proxy: add metrics related to error limiter.
There are three new metrics added:
 1. 'error_limiter.incremented_limit', when one node has accumulated
    enough errors to trigger error suppression.
 2. 'error_limiter.forced_limit', when one node runs into serious
    error like Insufficient Storage.
 3. 'error_limiter.is_limited', when one node is error limited
    during node selection process due to suppression_limit.

UpgradeImpact:
There is change of log format when 'ERROR Insufficient Storage' is
raised, from '%(msg)s %(node)s' to
'Node will be error limited from now: %s, error: %s'.

Change-Id: Id9bec9b46dad82163517fd71cfdda5b751464588
2022-11-14 17:08:17 -08:00
Alistair Coles 623024848e proxy: extract response error handling to single method
This patch introduces a check_response method to proxy server
Application class. The method provides common checking and reporting
of backend response errors (i.e. responses with status code >=
500). This checking was previously duplicated in proxy controller
classes.

The formatting and content of error logging messages is now more
unform, which does result in changes to some logged error messages.

Change-Id: I4b935a47f39642c11364f2a8d50a5f2ebb3b84f1
2022-11-07 20:22:39 +00:00
Alistair Coles 51730f1273 proxy: refactor error limiter to a class
This patch pulls the proxy's error limiting code into an ErrorLimiter
class that can be used to provide error limiting.

Co-Authored-By: Matthew Oliver <matt@oliver.net.au>
Change-Id: Ie15d831278a55ffab156ddce04544317e451407d
2022-10-26 11:01:03 +01:00
Zuul 8ab6af27c5 Merge "proxy: Add a chance to skip memcache for get_*_info calls" 2022-09-26 19:08:11 +00:00
indianwhocodes 9bdb77a388 Fixes Storage-Policy AttributeError in proxy
Closes-Bug: #1989140
Change-Id: I99e4a1daf550d492b28c8a82c0b7a81b2f73dffe
2022-09-09 22:12:32 -07:00
Tim Burke 5c6407bf59 proxy: Add a chance to skip memcache for get_*_info calls
If you've got thousands of requests per second for objects in a single
container, you basically NEVER want that container's info to ever fall
out of memcache. If it *does*, all those clients are almost certainly
going to overload the container.

Avoid this by allowing some small fraction of requests to bypass and
refresh the cache, pushing out the TTL as long as there continue to be
requests to the container. The likelihood of skipping the cache is
configurable, similar to what we did for shard range sets.

Change-Id: If9249a42b30e2a2e7c4b0b91f947f24bf891b86f
Closes-Bug: #1883324
2022-08-30 18:49:48 +10:00
Tim Burke 9bed525bfb memcached: Give callers the option to accept errors
Auth middlewares in particular may want to *know* when there's a
communication breakdown as opposed to a cache miss.

Update our shard-range cache stats to acknowlegde the distinction.

Drive-by: Log an error if all memcached servers are error-limited.

Change-Id: Ic8d0915235d11124d06ec940c5be9a2edbe85c83
2022-04-28 13:20:44 -07:00
Tim Burke bab7f93223 cors: Include `Vary: Origin` when using the request's Origin
Otherwise, multiple frontends attempting to use the same data may get
denials because the browser served a cached response from when it used a
different origin.

Change-Id: I6ec8b8ceb8c6a58e74772e57e6fe5700f6ff8db1
2022-03-09 22:18:55 -08:00
Alistair Coles eda7d5fe3c Deprecate LogAdapter.set_statsd_prefix
Previously, the set_statsd_prefix method was used to mutate a logger's
StatsdClient tail prefix after a logger was instantiated. This pattern
had led to unexpected mutations (see Related-Change). The tail_prefix
can now be passed as an argument to get_logger(), and is then
forwarded to the StatsdClient constructor, for a more explicit
assignment pattern.

The set_statsd_prefix method is left in place for backwards
compatibility. A DeprecationWarning will be raised if it is used
to mutate the StatsdClient tail prefix.

Change-Id: I7692860e3b741e1bc10626e26bb7b27399c325ab
Related-Change: I0522b1953722ca96021a0002cf93432b973ce626
2022-02-07 17:46:06 +00:00
Zuul e8cecf7fcc Merge "Move *_swift_info functions into a new registry module" 2022-02-04 07:01:15 +00:00
Zuul 8ea49a1def Merge "Fix statsd prefix mutation in proxy controllers" 2022-02-04 06:56:53 +00:00
Matthew Oliver 589ac355f3 Move *_swift_info functions into a new registry module
The *_swift_info functions use in module global dicts to provide a
registry mechanism for registering and getting swift info.

This is an abnormal pattern and doesn't quite fit into utils. Further
we looking at following this pattern for sensitive info to trim in the
future.
So this patch does some house cleaning and moves this registry to a new
module swift.common.registry. And updates all the references to it.

For backwards compat we still import the *_swift_info methods into utils
for any 3rd party tools or middleware.

Change-Id: I71fd7f50d1aafc001d6905438f42de4e58af8421
2022-02-03 14:41:13 +00:00
Tim Burke 9bc1c008a5 Get rid of pipeline_property
Instead, ensure every middleware gets a reference to the final WSGI
application. Note that this reimplements much of paste.deploy's pipeline
handling, but that code hasn't changed in years, anyway.

Change-Id: I2fbb21cabf72849ce84760a6d2607aa2af67f286
2022-01-27 14:40:27 -08:00
Zuul 4d48004483 Merge "proxy: Add a chance to skip memcache when looking for shard ranges" 2022-01-27 21:37:03 +00:00
Alistair Coles 6942b25cc1 Fix statsd prefix mutation in proxy controllers
Swift loggers encapsulate a StatsdClient that is typically initialised
with a prefix, equal to the logger name (e.g. 'proxy_server'), that is
prepended to metrics names. The proxy server would previously mutate
its logger's prefix, using its set_statsd_prefix method, each time a
controller was instantiated, extending it with the controller type
(e.g. changing the prefix 'proxy_server.object'). As a result, when an
object request spawned container subrequests, for example, the statsd
client would be left with a 'proxy_server.container' prefix part for
subsequent object request related metrics.

The proxy server logger is now wrapped with a new
MetricsPrefixLoggerAdapter each time a controller is instantiated, and
the adapter applies the correct prefix for the controller type for the
lifetime of the controller.

Change-Id: I0522b1953722ca96021a0002cf93432b973ce626
2022-01-27 09:06:15 -08:00
Tim Burke 8c6ccb5fd4 proxy: Add a chance to skip memcache when looking for shard ranges
By having some small portion of calls skip cache and go straight to
disk, we can ensure the cache is always kept fresh and never expires (at
least, for active containers). Previously, when shard ranges fell out of
cache there would frequently be a thundering herd that could overwhelm
the container server, leading to 503s served to clients or an increase
in async pendings.

Include metrics for hit/miss/skip rates.

Change-Id: I6d74719fb41665f787375a08184c1969c86ce2cf
Related-Bug: #1883324
2022-01-26 18:15:09 +00:00
Zuul 4606911010 Merge "Modify log_name in internal clients' pipeline configs" 2022-01-26 12:32:43 +00:00
Alistair Coles 035d91dce5 Modify log_name in internal clients' pipeline configs
Modify the 'log_name' option in the InternalClient wsgi config for the
following services: container-sharder, container-reconciler,
container-deleter, container-sync and object-expirer.

Previously the 'log_name' value for all internal client instances
sharing a single internal-client.conf file took the value configured
in the conf file, or would default to 'swift'. This resulted in no
distinction between logs from each internal client, and no association
with the service using a particular internal client.

With this change the 'log_name' value will typically be <log_route>-ic
where <log_route> is the service's conf file section name. For
example, 'container-sharder-ic'.

Note: any 'log_name' value configured in an internal client conf file
will now be ignored for these services unless the option key is
preceded by 'set'.

Note: by default, the logger's StatdsClient uses the log_name as its
tail_prefix when composing metrics' names. However, the proxy-logging
middleware overrides the tail_prefix with the hard-coded value
'proxy-server'. This change to log_name therefore does not change the
statsd metric names emitted by the internal client's proxy-logging.

This patch does not change the logging of the services themselves,
just their internal clients.

Change-Id: I844381fb9e1f3462043d27eb93e3fa188b206d05
Related-Change: Ida39ec7eb02a93cf4b2aa68fc07b7f0ae27b5439
2022-01-12 11:07:25 +00:00
Tim Burke f7101f3795 tests: Unify FakeMemcaches
Change-Id: I114d1628bb6dea04f246ff3ab12f4ccfdc4ec358
2022-01-06 10:13:15 -08:00
Alistair Coles 40e0f577a9 Add stats for shard range cache hits/misses
Add some stats to indicate when the proxy fails to get shard ranges
into memcache for object updates.

Change-Id: I63332b3794a1a61a512a70aad240459fec4810f3
2021-12-17 17:45:13 -08:00