100 Commits

Author SHA1 Message Date
Tim Burke 307315bde2 docs: Move metric name/description tables out to separate page(s)
Offer it both by service and as a single, more easily searchable, page.

That admin guide is *still* too long, but this should help a bit.

Change-Id: I946c72f40dce2f33ef845a0ca816038727848b3a
2023-05-30 11:38:42 -07:00
Tim Burke 52a4fe37aa Various doc formatting cleanups
* Get rid of a bunch of accidental blockquote formatting
* Always declare a lexer to use for ``.. code::`` blocks

Change-Id: I8940e75b094843e542e815dde6b6be4740751813
2022-08-02 14:28:36 -07:00
Matthew Oliver 7a105b5ef0 Add and pipe reconstructor stats through recon
This patch plumbs the object-reconstructor stats that are dropped
into recon cache out through the middleware and swift-recon tool.

This adds a '/recon/reconstruction/object' to the middleware. As such
the swift-recon tool has grown a '-R' or '--reconstruction' option
to access this data from each node.

Plus some tests and documentation updates.

Change-Id: I98582732ca5ccb2e7d2369b53abf9aa8c0ede00c
2021-08-20 00:03:40 +00:00
Tim Burke 314347a3cb Update SAIO & docker image to use 62xx ports
Note that existing SAIOs with 60xx ports should still work fine.

Change-Id: If5dd79f926fa51a58b3a732b212b484a7e9f00db
Related-Change: Ie1c778b159792c8e259e2a54cb86051686ac9d18
2020-07-20 15:17:12 -07:00
Darrell Bishop 1107f24179 Seamlessly reload servers with SIGUSR1
Swift servers can now be seamlessly reloaded by sending them a SIGUSR1
(instead of a SIGHUP).  The server forks off a synchronized child to
wait to close the old listen socket(s) until the new server has started
up and bound its listen socket(s).  The new server is exec'ed from the
old one so its PID doesn't change.  This makes Systemd happier, so a
ExecReload= stanza can now be used.

The seamless part means that incoming connections will always get
accepted either by the old server or the new one.  This eliminates
client-perceived "downtime" during server reloads, while allowing the
server to fully reload, re-reading configuration, becoming a fresh
Python interpreter instance, etc.  The SO_REUSEPORT socket option has
already been getting used, so nothing had to change there.

This patch also includes a non-invasive fix for a current eventlet bug;
see https://github.com/eventlet/eventlet/pull/590
That bug prevents a SIGHUP "reload" from properly servicing existing
requests before old worker processes close sockets and exit.  The
existing probe tests missed this, but the new ones, in this patch, caught
it.

New probe tests cover both old SIGHUP "reload" behavior as well as the
new SIGUSR1 seamless reload behavior.

Change-Id: I3e5229d2fb04be67e53533ff65b0870038accbb7
2019-11-07 10:15:26 -08:00
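
As a rough illustration of the operator side of commit 1107f24179, the sketch below sends SIGUSR1 to a running server process. Only the choice of SIGUSR1 comes from the commit message; the helper name `request_seamless_reload` and its injectable `kill` parameter are assumptions made for this example and are not part of Swift. It assumes a POSIX platform, where `signal.SIGUSR1` is defined.

```python
import os
import signal


def request_seamless_reload(pid, kill=os.kill):
    """Ask the server with the given PID to reload by sending SIGUSR1.

    `kill` is injectable (a hypothetical convenience) so the helper can
    be exercised without signalling a real process.
    """
    kill(pid, signal.SIGUSR1)
    return signal.SIGUSR1
```

In practice the PID would come from the service manager or a PID file; that lookup is left out here because the commit message does not describe it.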
Alexandra Settle 0c16fd9536 Fixing broken links
Small changes, but helpful, mostly.

Backport: stein rocky

Change-Id: Ic4b6524d7804d2f74b2973b6acdb9e2679209cd4
2019-08-16 11:45:52 +00:00
Samuel Merritt 8e651a2d3d Add fallocate_reserve to account and container servers.
The object server can be configured to leave a certain amount of disk
space free; default is 1%. This is useful in avoiding 100%-full
filesystems, as those can get Swift in a state where the filesystem is
too full to write tombstones, so you can't delete objects to free up
space.

When a cluster has accounts/containers and objects on the same disks,
then you can wind up with a 100%-full disk since account and container
servers don't respect fallocate_reserve. This commit makes account and
container servers respect fallocate_reserve so that disks shared
between account/container and object rings won't get 100% full.

When a disk's free space falls below the configured reserve, account
and container PUT, POST, and REPLICATE requests will fail with a 507
status code. These are the operations that can significantly increase
the disk space used by a given database.

I called the parameter "fallocate_reserve" for consistency with the
object server. No actual fallocate() call happens under Swift's
control in the account or container servers (sqlite3 might make such a
call, but it's out of our hands).

Change-Id: I083442eef14bf83c0ea717b1decb3e6b56dbf1d0
2018-07-18 17:27:11 +10:00
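
The reserve check described in commit 8e651a2d3d can be sketched as follows. This is not Swift's implementation: the function names are hypothetical, the reserve is treated as a percentage only, and the 1% default is taken from the object-server default mentioned in the commit message.

```python
import shutil

HTTP_INSUFFICIENT_STORAGE = 507


def reserve_status(free_bytes, total_bytes, reserve_percent=1.0):
    """Return 507 if free space is below the reserve, else None."""
    if total_bytes <= 0:
        # Treat an unreadable or empty filesystem as full (assumption).
        return HTTP_INSUFFICIENT_STORAGE
    free_percent = 100.0 * free_bytes / total_bytes
    if free_percent < reserve_percent:
        return HTTP_INSUFFICIENT_STORAGE
    return None


def reserve_status_for_path(path, reserve_percent=1.0):
    """Apply the same check to the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    return reserve_status(usage.free, usage.total, reserve_percent)
```

A server following the commit's description would run a check like this before handling PUT, POST, and REPLICATE requests and fail the request when it returns 507.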
chengebj5238 222df91857 Modify redirection URL and broken URL
Change-Id: I9a04cb2fbe61e1fbd8185ab2fac9abbcea4d55cc
2018-01-18 17:05:10 +08:00
Clay Gerrard 7013e70ca6 Represent dispersion worse than one replicanth
With a sufficiently undispersed ring it's possible to move an entire
replica's worth of parts and yet the value of dispersion may not get any
better (even though in reality dispersion has dramatically improved).
The problem is dispersion will currently only represent up to one whole
replica worth of parts being undispersed.

However, with EC rings it's possible for more than one whole replica's
worth of partitions to be undispersed; in these cases the builder will
require multiple rebalance operations to fully disperse replicas - but
the dispersion value should improve with every rebalance.

N.B. with this change it's possible for rings with a bad dispersion
value to measure as having a significantly smaller dispersion value
after a rebalance (even though they may not have had their dispersion
change) because the total amount of bad dispersion we can measure has
been increased but we're normalizing within a similar range.

Closes-Bug: #1697543

Change-Id: Ifefff0260deac0c3e8b369a1e158686c89936686
2017-12-28 11:16:17 -08:00
junboli df00122e74 doc migration: update the doc link address[2/3]
Update the doc links affected by the doc migration.
Although some effort was made to fix these, lots of bad doc links
remain. These changes are split into 3 patches that aim to fix all of
them; this is the 2nd patch, for doc/manpages.

Change-Id: Id426c5dd45a812ef801042834c93701bb6e63a05
2017-09-15 06:31:00 +00:00
shangxiaobj c93c0c0c6e [Trivialfix]Fix typos in swift
Fix typos found in swift.

Change-Id: I52fad1a4882cec4456f22174b46d54e42ec66d97
2017-08-04 07:50:10 +00:00
Tim Burke 13a07aa77a Misc doc cleanup
* Change some absolute URLs to internal links
* Fix some bulleted list indentation
* Choose a better lexer for some syntax highlighting
* Use ``inline code`` instead of `italics` for some example command
  lines
* Change some quoted paragraphs that only included inlined code to be
  proper code blocks

Change-Id: Iaaa7eefb690122f5af9dcb1c871358c22335c743
2017-07-12 12:14:45 -07:00
lijunbo 21396bc106 keep consistent naming convention of swift and urls
Change-Id: Iddd4f69abf77a5c643ce8b164fc6cfd72c068229
2017-03-23 02:28:41 +00:00
Jenkins 8ed8077a04 Merge "Add missing expirer recon metric to admin_guide" 2016-12-01 17:52:27 +00:00
Alistair Coles 463e22a314 Add missing expirer recon metric to admin_guide
Add expirer/object to recon metrics, which reports output such as:

$ curl -s http://localhost:6010/recon/expirer/object
{"object_expiration_pass": 0.19765901565551758, "expired_last_pass": 1}

Change-Id: Ia9a171c09efebe5ad56c9de2952a8f29188c4970
2016-12-01 10:32:19 +00:00
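
A response like the one quoted in commit 463e22a314 is plain JSON, so it can be summarized with the standard library alone. The sketch below uses the sample body from the commit message; the helper name is hypothetical, and reading `object_expiration_pass` as a duration in seconds is an assumption.

```python
import json

# Sample body copied from the commit message above.
SAMPLE = '{"object_expiration_pass": 0.19765901565551758, "expired_last_pass": 1}'


def summarize_expirer(body):
    """Turn an expirer recon response into a one-line summary."""
    stats = json.loads(body)
    return "expired %d object(s) in %.2fs" % (
        stats["expired_last_pass"],
        stats["object_expiration_pass"],
    )
```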
Jenkins b4fd962cad Merge "Add missing recon metrics to admin_guide" 2016-12-01 10:04:44 +00:00
Ondřej Nový 9847796f01 Set owner of drive-audit recon cache to swift user
Fixes this problem:
* swift-drive-audit needs to be run by root, because only root has
  "umount" permission
* swift-object servers typically run as user swift
* if swift-drive-audit is run by root, /var/cache/swift/drive.recon is
  owned by root, with 0o600
* recon middleware (inside swift-object-server) can't read this cache
  file: swift-object: Error reading recon cache file

This patch adds "user" option to drive-audit config file. Recon cache
is chowned to this user.

Change-Id: Ibf20543ee690b7c5a37fabd1540fd5c0c7b638c9
2016-10-19 17:16:42 +00:00
Jenkins 6daa382c34 Merge "Revises 'url' to 'URL' and 'json' to 'JSON'" 2016-10-06 00:23:41 +00:00
Yushiro FURUKAWA 9b98c89983 Revises 'url' to 'URL' and 'json' to 'JSON'
Change-Id: I44743fbb9bcbce3a50ed6770264ba0f4b17803d7
2016-09-30 22:21:03 +09:00
zheng yin 05642d2958 fix word spelling mistake
Change-Id: Ia7b03e52b8d6a334fc2b67c94912effe0e659941
2016-09-30 16:43:54 +08:00
Kota Tsuyuzaki dfa5523d8c Add Pros/Cons docs for global cluster consideration
This comes from discussion in Bristol Hackathon (Feb 2016).
Currently Swift has a couple of choices (Global Cluster and Container
Sync) to sync the stored data into geographically distributed locations.

This patch adds a summary of the discussion comparing
Global Cluster and Container Sync to help operators decide which
functionality fits their own use case.

And, to be fair to container-sync, this patch moves global
cluster docs into overview_global_cluster.rst from admin_guide.rst.

Co-Authored-By: Alistair Coles <alistair.coles@hpe.com>

Change-Id: I624eb519503ae71dbc82245c33dab6e8637d0f8b
2016-08-17 12:52:25 +01:00
Christian Schwede 699953508a Add doc entry to check partition count
A high or increasing partition count due to storing handoffs can have
some severe side-effects, and replication might never be able to catch
up. This patch adds a note to the admin_guide on how to check this.

Change-Id: Ib4e161d68f1a82236dbf5fac13ef9a13ac4bbf18
2016-07-26 12:23:54 +02:00
Christian Schwede b5a16beb38 Add missing recon metrics to admin_guide
Change-Id: Ibd484e088c915269a46f5fffe3ce627a80b3418e
2016-07-17 14:31:37 +00:00
Jenkins 11c5ef7d22 Merge "[Docs] Document prevention of disk full scenarios" 2016-06-08 21:51:02 +00:00
Nelson Almeida daae74ca65 Adding sorting_method to admin_guide
Change-Id: I1162f154e3a577a95f9f5ea0e0f723b7df5a4baf
2016-06-01 17:29:10 -03:00
Clay Gerrard b52eccb3b1 Clarify overload best practices in admin guide
Change-Id: Ib7c08bdeab6374771bb8e2b05053e7e16973524d
2016-05-25 11:21:25 -07:00
Christian Schwede f1fd50723b Add dispersion --verbose example to admin guide
Change-Id: I5f9cacedde2a329332ccf744800b6f2453e8b28e
2016-05-25 09:53:33 +02:00
Matthew Oliver b3ab715c05 Add ring-builder dispersion command to admin guide
This change updates the admin guide to point out the dispersion command
in swift-ring-builder and mentions the dispersion verbose table to make
it more obvious to operators.

Change-Id: I72b4c8b2d718e6063de0fdabbaf4f2b73694e0a4
2016-05-25 14:35:54 +10:00
Andy McCrae efdf123a40 [Docs] Document prevention of disk full scenarios
Adds section to detail how to prevent disk full scenarios from
occurring.

Change-Id: Iafb4a47fa4892f6067252f3a80de87cd76506a40
2016-05-16 10:09:33 +00:00
Shashirekha Gundur cf48e75c25 change default ports for servers
Changing the recommended ports for Swift services
from ports 6000-6002 to unused ports 6200-6202;
so they do not conflict with X-Windows or other services.

Updated SAIO docs.

DocImpact
Closes-Bug: #1521339
Change-Id: Ie1c778b159792c8e259e2a54cb86051686ac9d18
2016-04-29 14:47:38 -04:00
Donagh McCabe e38b53393f Cleanup of Swift Ops Runbook
This patch cleans up some rough edges that were left (due to
time constraints) in the original commit.

Change-Id: Id4480be8dc1b5c920c19988cb89ca8b60ace91b4
Co-Authored-By: Gerry Drudy gerry.drudy@hpe.com
2016-03-10 17:39:54 +00:00
Christian Schwede 043fbca6d0 Remove Erasure Coding beta status from docs
This removes notes stating support for Erasure coding as beta. Questions
regarding the stability of EC are coming up regularly, and are often referring
to the docs that state EC as still in beta.

Besides this, a note marking statsd support as beta has been removed as well.

Change-Id: If4fb6a5c4cb741d42953db3cee8cb17a1d774e15
2016-03-04 14:27:23 +00:00
Jenkins eaf6af3179 Merge "Allow IPv6 addresses/hostnames in StatsD target" 2016-02-04 03:23:01 +00:00
Darrell Bishop 26327e1e8b Allow IPv6 addresses/hostnames in StatsD target
The log_statsd_host value can now be an IPv6 address or a hostname
which only resolves to an IPv6 address.  In both cases, the new
behavior is to use an AF_INET6 socket on which .sendto() is called
with the originally-configured hostname (or IP).  This means the
Swift process is not caching a DNS resolution for the lifetime of
the process (a good thing).

If a hostname resolves to both an IPv6 and an IPv4 address, an AF_INET
socket is used (i.e. only the IPv4 address will receive the UDP
packet).

The old behavior is preserved: any invalid IP address literals and
failures in DNS resolution or actual StatsD packet sending do not
halt the process or bubble up; they are caught, logged, and
otherwise ignored.

Change-Id: Ibddddcf140e2e69b08edf3feed3e9a5fa17307cf
2016-02-03 00:26:31 -08:00
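
The address-family choice described in commit 26327e1e8b can be sketched as below: prefer AF_INET when the target has any IPv4 address, and fall back to AF_INET6 only when it resolves to IPv6 alone. This is an illustration, not Swift's code. The function name, the injectable `resolve` parameter, and returning `None` on total resolution failure are all assumptions.

```python
import socket


def pick_statsd_family(host, port, resolve=socket.getaddrinfo):
    """Return AF_INET if `host` has an IPv4 address, else AF_INET6.

    Returns None when neither lookup succeeds, so the caller can log
    the failure and carry on rather than halt (assumed behavior).
    """
    for family in (socket.AF_INET, socket.AF_INET6):
        try:
            if resolve(host, port, family, socket.SOCK_DGRAM):
                return family
        except socket.gaierror:
            # No address of this family; try the next one.
            continue
    return None
```

Passing the originally configured hostname to `sendto()` afterwards, rather than a resolved address, is what avoids caching a DNS result for the lifetime of the process.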
HugoKuo e75888b281 Add more description for write_affinity_node_count parameter in the doc.
Change-Id: Iad410a2be4f9a2cd5c53e860b9f91993aa7f2369
Closes-Bug: #1531173
2016-01-06 14:33:23 +08:00
Ondřej Nový e0430fc74a Compare Swift config checksum in swift-recon --all
Change-Id: I796fe0895f4e5ddeb04c0d79a73579ce8bb9aa40
2015-11-05 21:21:21 +01:00
Paul Dardeau 73e032049f Update admin guide with region.
Added region prefix to example commands for adding devices to ring.
Also updates description to include region prefix.

Change-Id: Ie6d6485b497cea973e37909b5b19b44946c8aa89
2015-10-23 18:20:25 +00:00
Jenkins 63ab40db9a Merge "Improving statistics sent to Graphite." 2015-09-09 07:12:01 +00:00
Carlos Cavanna 4765189ef3 Improving statistics sent to Graphite.
Currently, statistics are organized by command. However, it would also be
useful to display statistics organized by policy. Different policies may be
based on different storage properties (e.g., faster disks).
With this change, all the statistics for object timers will be sent per policy
as well.
Policy statistics reporting will use policy index and the name in Graphite will
show as proxy-server.object.policy.<policy-index>.<verb>, etc.
Updated unit tests for per-policy stat reporting and added new unit tests for
invalid cases.
Updated documentation in the Administrator's Guide to reflect this new
aggregation.

Change-Id: Id70491e4833791a3fb8ff385953d69018514cd9c
2015-08-21 13:45:00 -04:00
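
The per-policy naming pattern given in commit 4765189ef3 can be written out as a small formatter. The function name and the `prefix` parameter are hypothetical; the pattern itself is the one quoted in the commit message, and whatever the message's "etc." stands for after the verb is deliberately left out.

```python
def policy_metric_name(policy_index, verb, prefix="proxy-server"):
    """Build <prefix>.object.policy.<policy-index>.<verb>."""
    return "%s.object.policy.%d.%s" % (prefix, policy_index, verb)
```

For example, `policy_metric_name(1, "GET")` gives `proxy-server.object.policy.1.GET`.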
Hisashi Osanai 79ba4a8598 Enable Object Replicator's failure count in recon
This patch makes the count of object replication failures available in
recon. In addition, "failure_nodes" is added to Account Replicator and
Container Replicator.

Recon shows the count of object replication failure as follows:
$ curl http://<ip>:<port>/recon/replication/object
{
    "replication_last": 1416334368.60865,
    "replication_stats": {
        "attempted": 13346,
        "failure": 870,
        "failure_nodes": {
            "192.168.0.1": {"sdb1": 3},
            "192.168.0.2": {"sdb1": 851,
                            "sdc1": 1,
                            "sdd1": 8},
            "192.168.0.3": {"sdb1": 3,
                            "sdc1": 4}
        },
        "hashmatch": 0,
        "remove": 0,
        "rsync": 0,
        "start": 1416354240.9761429,
        "success": 1908
    },
    "replication_time": 2316.5563162644703,
    "object_replication_last": 1416334368.60865,
    "object_replication_time": 2316.5563162644703
}

Note that 'object_replication_last' and 'object_replication_time' are
considered to be transitional and will be removed in the subsequent
releases. Use 'replication_last' and 'replication_time' instead.

Additionally, this patch adds the count to swift-recon, where it is
shown as follows:
$ swift-recon object -r
===============================================================================
--> Starting reconnaissance on 4 hosts
===============================================================================
[2014-11-27 16:14:09] Checking on replication
[replication_failure] low: 0, high: 0, avg: 0.0, total: 0, Failed: 0.0%, no_result: 0, reported: 4
[replication_success] low: 3, high: 3, avg: 3.0, total: 12, Failed: 0.0%, no_result: 0, reported: 4
[replication_time] low: 0, high: 0, avg: 0.0, total: 0, Failed: 0.0%, no_result: 0, reported: 4
[replication_attempted] low: 1, high: 1, avg: 1.0, total: 4, Failed: 0.0%, no_result: 0, reported: 4
Oldest completion was 2014-11-27 16:09:45 (4 minutes ago) by 192.168.0.4:6002.
Most recent completion was 2014-11-27 16:14:19 (-10 seconds ago) by 192.168.0.1:6002.
===============================================================================

In a cluster where some servers run with this patch and others run
without it, executing swift-recon on a patched server produces
unnecessary output such as [failure], [success] and [attempted],
because the unpatched servers cannot send a response containing the
information that this patch needs.
Therefore, once you apply this patch to one server, also apply it to
the other servers before you execute swift-recon.

DocImpact
Change-Id: Iecd33655ae2568482833131f422679996c374d78
Co-Authored-By: Kenichiro Matsuda <matsuda_kenichi@jp.fujitsu.com>
Co-Authored-By: Brian Cline <bcline@softlayer.com>
Implements: blueprint enable-object-replication-failure-in-recon
2015-08-18 11:40:02 +09:00
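
The "failure_nodes" structure shown in commit 79ba4a8598 maps each remote host to per-device failure counts, so per-host totals are a one-line aggregation. The sketch below uses a trimmed copy of the sample response from the commit message; the helper name is hypothetical.

```python
import json

# Trimmed from the sample response in the commit message above.
SAMPLE = """
{
    "replication_stats": {
        "failure": 870,
        "failure_nodes": {
            "192.168.0.1": {"sdb1": 3},
            "192.168.0.2": {"sdb1": 851, "sdc1": 1, "sdd1": 8},
            "192.168.0.3": {"sdb1": 3, "sdc1": 4}
        }
    }
}
"""


def failures_by_host(body):
    """Sum the per-device failure counts for each remote host."""
    stats = json.loads(body)["replication_stats"]
    nodes = stats.get("failure_nodes", {})
    return {host: sum(devices.values()) for host, devices in nodes.items()}
```

For the sample data the per-host totals add up to the top-level "failure" count of 870.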
Jenkins 617c6b0107 Merge "Time synchronization check in recon." 2015-08-18 01:21:22 +00:00
Jenkins 57791b6cd2 Merge "+Document method to avoid rsync filling root drive" 2015-08-11 08:27:17 +00:00
Ben Martin 89f5906286 +Document method to avoid rsync filling root drive
When rsync pushes to a remote node with an unmounted drive and if
certain steps are not taken, rsync may attempt to write files to
the local drive at the location where the drive was mounted.

There are two suggested solutions for this issue:
  1) Set the permissions for all mount points in /srv/node/
       to root:root 755
  2) Mount the drives elsewhere and symlink the drives to /srv/.../

The first method ensures that only root and not the swift user
can write in the /srv/.../ directories.

The second method will prompt a broken link issue if rsync
attempts to write to an unmounted drive.

Change-Id: I60ce4ed9ef8401768d5f78b6806cbb2e2a65303e
Closes-Bug: #1470576
2015-08-05 09:29:07 -05:00
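
Related to commit 89f5906286, an operator could spot the risky situation (a device directory that exists but has nothing mounted on it) with a check along these lines. This is only a sketch: the function name is hypothetical, and using /srv/node as the devices root is an assumption based on the path named in the commit message.

```python
import os


def unmounted_entries(devices_root):
    """List entries under `devices_root` that are not mount points."""
    return [
        name
        for name in sorted(os.listdir(devices_root))
        if not os.path.ismount(os.path.join(devices_root, name))
    ]
```

Calling `unmounted_entries("/srv/node")` on a storage node would return the directories that rsync could otherwise fill on the root filesystem.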
Jenkins e1683fdb2e Merge "Support keystone v3 domains in swift-dispersion" 2015-07-31 06:59:01 +00:00
Falk Reimann 363a256e58 Support keystone v3 domains in swift-dispersion
This provides the capability to specify a project_name,
project_domain_name and user_domain_name in /etc/swift/dispersion.conf.
If these values are set in dispersion.conf they get populated to the
swift-client.  With this it is possible to have a specific dispersion
project specified, which is not the keystone default domain.  Changes
were applied to swift-dispersion-populate and swift-dispersion-report.
Relevant man pages, the example dispersion.conf and the admin guide were
updated accordingly.

DocImpact
Closes-Bug: #1468374

Change-Id: I0e716f8d281b4d0f510bc568bcee4a13fc480ff7
2015-07-24 13:40:24 -05:00
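
To make the three options from commit 363a256e58 concrete, the sketch below parses a sample configuration and extracts them. The option names come from the commit message; the `[dispersion]` section name, the sample values, and the helper name are assumptions made for illustration.

```python
import configparser

# Hypothetical sample; only the three option names come from the commit.
SAMPLE_CONF = """
[dispersion]
project_name = dispersion-project
project_domain_name = dispersion-domain
user_domain_name = dispersion-domain
"""

KEYSTONE_V3_OPTIONS = ("project_name", "project_domain_name", "user_domain_name")


def keystone_v3_options(conf_text, section="dispersion"):
    """Return the Keystone v3 options that are set in the config text."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    return {
        key: parser.get(section, key)
        for key in KEYSTONE_V3_OPTIONS
        if parser.has_option(section, key)
    }
```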
Ondrej Novy dd2f1be3b1 Time synchronization check in recon.
This change adds a time call to the recon middleware and a --time
parameter to the recon CLI. This is useful for checking whether time in
the cluster is synchronized.

Change-Id: I62373e681f64d0bd71f4aeb287953dd3b2ea5662
2015-07-23 11:35:02 +02:00
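
One plausible way to use a remote time value like the one added in commit dd2f1be3b1 is to check that it falls between the local timestamps taken just before and just after the request. The commit message does not spell out the comparison, so the logic below is an assumption, as are the function name and its parameters.

```python
import time


def clock_in_sync(fetch_remote_time, now=time.time, tolerance=0.0):
    """Return True if the remote clock reading falls inside the local
    request window, widened by `tolerance` seconds on each side."""
    before = now()
    remote = fetch_remote_time()  # e.g. a call to the node's recon time endpoint
    after = now()
    return before - tolerance <= remote <= after + tolerance
```

Here `fetch_remote_time` stands in for whatever performs the HTTP request and returns the node's timestamp as a float.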
paul luse e6165a7879 Add policy support to dispersion tools
The dispersion tools didn't work for anything other than policy 0.
Updated to allow the user to specify the policy name on the command
line (as with object-info), which then makes populate/report work with
3x, 2x, or EC style policies.

Change-Id: Ib7c298f0f6d666b1ecca25315b88539f45cf9f95
Closes-Bug: 1458688
2015-06-23 02:14:02 -07:00
Christian Schwede 55dd705a86 Add missing statsd metrics section for object-reconstructor
Change-Id: Id3f98e5f637ff537a387262b40f21c05876fca91
2015-05-06 19:53:09 +02:00
Samuel Merritt 8d3b3b2ee0 Add some debug output to the ring builder
Sometimes, I get handed a builder file in a support ticket and a
question of the form "why is the balance [not] doing $thing?". When
that happens, I add a bunch of print statements to my local
swift/common/ring/builder.py, figure things out, and then delete the
print statements. This time, instead of deleting the print statements,
I turned them into debug() calls and added a "--debug" flag to the
rebalance command in hopes that someone else will find it useful.

Change-Id: I697af90984fa5b314ddf570280b4585ba0ba363c
2015-03-30 17:47:28 -07:00
Shilla Saebi a1872b0498 Fix 2 typos in admin_guide file
Change-Id: Ibf1e5dbf6ff4747c7f23f6638321ab41bba3021b
2014-11-24 15:38:25 +00:00