51 Commits

Author SHA1 Message Date
Clark Boylan 688dd78a08 Add more info to afs fileserver recovery docs
During the debian buster mirror cleanup we lost a volume backing afs on
afs01.dfw.openstack.org. Our existing docs gave us a good starting point
for recovery, but they could use more specifics. Add that info.

Change-Id: Ib334759314f0fd493e9b1bc8c06a8060ba8917ee
2024-03-04 13:48:25 -08:00
Tony Breeds a06d1281c4 [docs] Use RST url link syntax to improve layout
The URL for upstream's RT is quite long, which causes the rendering to
look kinda strange [1,2].

Snip the URL but preserve the full URL as a hyperlink

[1] https://docs.opendev.org/opendev/system-config/latest/afs.html
[2] https://pasteboard.co/Joxai7GgRoLG.png

Change-Id: I2cd52c376e4935efed8f22779ae1722768bb8a6c
2023-05-19 12:38:24 +10:00
Ian Wienand a6ece2cacc mirror-update: make jobs interactive by default
If you are running these jobs by hand you are doing something that
will be expected to take a long time (initial sync, recovery, etc.).
Make these scripts assume interactivity and default to *not* running
under timeout -- it's too easy to forget NO_TIMEOUT when running
manually and having the job killed.

We already have an UNDER_CRON variable set so that we only send stats
when running ... under cron.  Reuse this here for the timeout flag.
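
A minimal sketch of the behaviour described above, in Python rather than the shell the real scripts use. It assumes that any non-empty UNDER_CRON value means "running under cron"; the function name, the job path and the two-hour limit are hypothetical and are not taken from the actual mirror-update scripts.

```python
def build_command(job, env, limit="2h"):
    """Return the argv to execute for a mirror job.

    Assumption: a non-empty UNDER_CRON means the job was started by cron,
    so it is wrapped in coreutils `timeout`. A manual (interactive) run
    gets no timeout, so a long initial sync or recovery is not killed.
    """
    if env.get("UNDER_CRON"):
        return ["timeout", limit, *job]
    return list(job)


# Hypothetical job path, for illustration only.
manual = build_command(["./mirror-job.sh"], {})
cron = build_command(["./mirror-job.sh"], {"UNDER_CRON": "1"})
```

Keying the timeout off the variable that cron already sets means a manual run needs no extra flag to be safe.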

Change-Id: Ic2d2f39bb18d247c853284512fe0dc37485c00a4
2022-09-14 08:22:14 +10:00
Clark Boylan b400dfcb90 Add note about afs01's mirror-update vos releases to docs
I tripped over this during recent afs fileserver reboots. Note it in the
docs so that we are aware of this in the future when doing maintenance.

Change-Id: Iac20fa6b9ec17f1eb69c50bc8f5736b34967fd83
2021-06-17 09:53:08 -07:00
Clark Boylan c8be6be1b8 Fix some hostnames in afs docs
Noticed this when doing some afs maintenance. We want the bos status of
fileservers when rebooting those servers not the status of the db
servers.

Change-Id: I30f6a2320487c302fda2ffe300daa1d91c7dec45
2021-06-11 14:21:03 -07:00
Ian Wienand 8a1f6d9764 Cleanup eavesdrop puppet references
Clean up documentation, puppet references and the eavesdrop_opendev
group.

Change-Id: I67096d8eced0be54db9b1ee277b24602d8c20f00
2021-06-10 09:02:23 +10:00
Ian Wienand ce7ef6536a openafs-server-config: install UserList
This was missed during recent updates; this UserList needs to be on
all servers to allow bos, vos and backup commands.

Update the documentation to reflect the centralised copy.

Change-Id: I8ada3d5035bb7ef77b19ce6aaffb48335974a124
2021-03-30 09:49:53 +11:00
Ian Wienand 3f1d67b99f Add afsdb03 openstack.org
We are in the process of upgrading the AFS servers to focal.  As
explained by auristor (extracted from IRC below) we need 3 servers to
actually perform HA with the ubik protocol:

 the ubik quorum is defined by the list of voting primary ip addresses
 as specified in the ubik service's CellServDB file.  The server with
 the lowest ip address gets 1.5 votes and the others 1 vote.  To win
 election requires greater than 50% of the votes.  In a two server
 configuration there are a total of 2.5 votes to cast.  1.5 > 2.5/2 so
 afsdb02.openstack.org always wins regardless of what
 afsdb01.openstack.org says.  And afsdb01.openstack.org can never win
 because 1 < 2.5/2.  by adding a third ubik server to the quorum, the
 total votes cast are 3.5 and it always requires the vote of two
 servers to elect a winner ...  if afsdb03 is added with the highest
 ip address, then either afsdb01 or afsdb02 can be elected

Add a third server which is a focal host and related configuration.
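
The vote arithmetic quoted above can be checked with a small Python sketch. The server names come from the quote; the IP addresses are placeholders from the 192.0.2.0/24 documentation range (the real addresses are not given here), chosen only so that afsdb02 sorts lowest and afsdb03 highest. The function names are illustrative and are not part of OpenAFS.

```python
from fractions import Fraction
from ipaddress import ip_address


def ubik_votes(servers):
    """Map each server name to its vote weight.

    Per the quoted explanation: the server with the lowest IP address
    gets 1.5 votes and every other server gets 1 vote.
    """
    lowest = min(servers, key=lambda name: ip_address(servers[name]))
    return {
        name: Fraction(3, 2) if name == lowest else Fraction(1)
        for name in servers
    }


def can_win(servers, voters):
    """True if `voters` together hold strictly more than 50% of all votes."""
    votes = ubik_votes(servers)
    total = sum(votes.values())
    return sum(votes[name] for name in voters) > total / 2


# Placeholder addresses (assumption): only their relative order matters.
two = {"afsdb01": "192.0.2.20", "afsdb02": "192.0.2.10"}
three = dict(two, afsdb03="192.0.2.30")
```

With two servers, `can_win(two, {"afsdb02"})` holds on afsdb02's vote alone and `can_win(two, {"afsdb01"})` never does. With three servers no single server reaches a majority, while any pair does.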

Change-Id: I59e562dd56d6cbabd2560e4205b3bd36045d48c2
2021-03-01 15:51:49 +11:00
Ian Wienand 61e9d0948a Remove AFS puppet
This has all been replaced by Ansible roles and is no longer used

Change-Id: Ic807498ad3ca4f305b168464b86fe197a61b4d13
2021-01-21 07:08:37 +11:00
Ian Wienand f8852b76fb Remove mirror-update server and related puppet
This has all transitioned to Ansible and the mirror-update.opendev.org
server now.

Change-Id: I5f82139c981c2716f568b15b118690e943b02d52
2020-10-28 11:39:54 +11:00
Ian Wienand ceb711e6d9 Swap mirror-update01 for mirror-update02
This is a new Focal based host, which we want for its more recent
rsync which hopefully causes fewer issues resyncing things to AFS
volumes.

See 4918594aa4 for discussion of the
original issues; we have found that without "-t" all new data seems to
be copied continuously.  Empirical testing shows later rsync doesn't
have this issue.

Depends-On: https://review.opendev.org/736859
Change-Id: Iebfffdf8aea6f123e36f264c87d6775771ce2dd8
2020-06-19 08:41:44 +10:00
Ian Wienand d19e567576 AFS: add note on volume creation servers
The inline note describes the problem we hit recently creating wheel
volumes.

Change-Id: I58064288c5cf21342b73e5ceb6aed685b3014578
2020-06-12 16:38:10 +10:00
Monty Taylor ebae022d07 Use project-config from zuul instead of direct clones
We use project-config for gerrit, gitea and nodepool config. That's
cool, because we can clone that from zuul too and make sure that each
prod run we're doing runs with the contents of the patch in question.

Introduce a flag file that can be touched in /home/zuulcd that will
block zuul from running prod playbooks. By default, if the file is
there, zuul will wait for an hour before giving up.
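
A rough Python sketch of the wait-on-flag-file behaviour described above. The one-hour default comes from the message; the function name, the polling interval and the flag file path are assumptions for illustration, and the real playbooks may implement the wait quite differently.

```python
import time
from pathlib import Path


def wait_for_flag_clear(flag, timeout=3600.0, poll=10.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Block while the flag file exists.

    Returns True once the file is gone (safe to run prod playbooks) and
    False if it is still present after `timeout` seconds (give up).
    """
    deadline = clock() + timeout
    while Path(flag).exists():
        if clock() >= deadline:
            return False
        sleep(poll)
    return True
```

A caller would pass the path of the flag file (hypothetical here) and skip the production run when the function returns False.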

Rename zuulcd to zuul

To better align prod and test, name the zuul user zuul.

Change-Id: I83c38c9c430218059579f3763e02d6b9f40c7b89
2020-04-15 12:29:33 -05:00
Zuul 1f67b8ed37 Merge "Add docs for deleting an AFS volume" 2020-02-10 17:09:04 +00:00
James E. Blair cfc1841c06 Add warning about kerberos key rotation
Change-Id: I9e4caf8feeb775c02208a5e5f1627f03a90e4211
2020-01-31 16:22:52 -08:00
James E. Blair 255f996916 Add docs for deleting an AFS volume
Change-Id: I1763eb2bf580591b68bf4e2853378331b8261293
2020-01-20 09:43:34 -08:00
James E. Blair 87fccc8e9b Add docs for recovering an OpenAFS fileserver
This should be a smooth recovery process.

Change-Id: I3c68b077e38a88160286d94e71676c0c4dbb6a51
2019-09-13 10:42:17 -07:00
Ian Wienand 35f1321e14 AFS server restart and audit logging : helper script
This script helps restart the AFS servers, which is useful when
updating parameters.  It can also enable audit logging.

It can also stop and start the servers, although it's unlikely we'd
want all the servers offline at the same time so stopping has a
warning included.

Documentation is updated to refer to the helper script

Change-Id: Idcb3e43a3f6e614cdb787d4334e692a98bffdd15
2019-08-02 16:37:00 +10:00
Ian Wienand 23f4f3989d mirror-update: update docs for mirror-update.opendev.org
Update AFS docs to refer to the new host

Change-Id: Ib6b54729e0b186ceb7d0beffbbd68bcab0e2e1ba
2019-07-04 09:11:40 +10:00
Ian Wienand abf11982ce Raise callbacks for AFS server
As documented in [1]

 If the number next to "GotSomeSpaces" or any of the "GSS*" fields is
 greater than 0, then the fileserver ran out of callback space and had
 to prematurely revoke callback promises from clients in order to free
 up space.

Here's our stats on afs01:

  $ xstat_fs_test localhost -collID 3 -onceonly

  Starting up the xstat_fs service, no debugging, one-shot operation

  ------------------------------------------------------------
            13547865 DeleteFiles
          1849223729 DeleteCallBacks
            45049055 BreakCallBacks
          2098382037 AddCallBack
                 174 GotSomeSpaces
                7800 DeleteAllCallBacks
               20778 nFEs
               21184 nCBs
             1500000 nblks
            43425561 CBsTimedOut
                   0 nbreakers
                   8 GSS1
                   4 GSS2
                   5 GSS3
                 169 GSS4
                   4 GSS5

So as noted, the server ran out of callback spaces a few times.
Raising it takes only a little memory, but will help performance.
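
As a rough illustration, the check described above can be scripted. This Python sketch assumes the "value name" line layout shown in the output above; the function names are hypothetical, and the sample is a subset of the afs01 figures from this message rather than fresh data.

```python
def parse_counters(text):
    """Parse lines of the form '<integer> <name>' into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip banners and separators; keep only 'value name' pairs.
        if len(parts) == 2 and parts[0].isdigit():
            counters[parts[1]] = int(parts[0])
    return counters


def callback_space_exhausted(counters):
    """Return the GotSomeSpaces/GSS* counters that are greater than 0."""
    return {
        name: value
        for name, value in counters.items()
        if (name == "GotSomeSpaces" or name.startswith("GSS")) and value > 0
    }


SAMPLE = """
------------------------------------------------------------
                 174 GotSomeSpaces
               21184 nCBs
                   0 nbreakers
                   8 GSS1
                 169 GSS4
"""

exhausted = callback_space_exhausted(parse_counters(SAMPLE))
```

Any non-empty result means the fileserver ran out of callback space at some point and had to revoke callback promises early.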

Thanks to Jeffrey Altman (auristor) for pointing this out.

[1] https://www.openafs.org/pages/newsletter/newsletter-2013-03-volume004-issue05.html

Change-Id: I2ad33dd8918cb559634d2c5b8c4e4e7f2d6d4051
2019-06-28 12:14:47 +10:00
Monty Taylor d500651367 Rename cgit_file to git_file
In sphinx, we have a :cgit_file: directive that makes links to files.
Thing is - we're not using cgit anymore. So just rename it to git_file.

Change-Id: I80aca5fb3cc84281e29843944fea33e6f4d9fe6f
2019-04-22 11:47:11 +00:00
Monty Taylor e01ed4f066 Update some docs for opendev
There's a lot of these, so doing them in chunks. This fixes
the custom roles.

Remove the git and jjb docs, since we don't use them anymore.

Change-Id: I0c5b74f7b73315dac93bce6be0d920cddb94fb58
2019-04-20 09:41:45 -07:00
François Magimel 46260a79ee Fix spelling mistakes and reST typos in the doc
Change-Id: I61d9780f3f1937c6e8d326a670c40fb6a931dbce
2018-12-08 19:13:53 +01:00
Zuul c7b7801b3b Merge "Add afs client docs for non Debuntu" 2018-10-17 00:22:26 +00:00
Clark Boylan 9a16571f0d Add afs client docs for non Debuntu
Add info on how to kinit and aklog if not using Debuntu deb.conf to set
the correct realm and cell settings.

Change-Id: I80a698649f03863b73399873cf190fda4fa41776
2018-10-16 15:46:44 -07:00
Jeremy Stanley d97ac5e50a Add note about mounting one AFS volume in another
This ate a good chunk of my day before a more AFS-savvy colleague
pointed out that a mountpoint within a volume is just a special kind
of file record and so needed the parent volume released before it
would appear in the read-only path.

Change-Id: Ic3d717d70c8bf2548447550472a52849dd85ffd3
2018-10-05 14:03:30 +00:00
Monty Taylor 7ed39c17f5 Fix AFS and CA docs references to puppetmaster
Also, update the locations that we're told to hieraedit.

Change-Id: I41824ff9dc52b3e70a5e55ae71ef49f29511e8e3
2018-08-19 10:26:10 -05:00
Monty Taylor 1a8c2f66da Move /opt/system-config/production to /opt/system-config
The production directory is a relic from the puppet environment concept,
which we do not use. Remove it.

The puppet apply tests run puppet locally, where the production
environment is still needed, so don't update the paths in the
tools/prep-apply.sh.

Depends-On: https://review.openstack.org/592946
Change-Id: I82572cc616e3c994eab38b0de8c3c72cb5ec5413
2018-08-17 09:41:02 -05:00
Ian Wienand 882b730fdf Update to openstackdocstheme
This modernises the openstack-infra documentation by switching to
openstackdocstheme.  Update dependencies as required.

To remove non-relevant stuff from conf.py, I have just taken the demo
file from openstackdocstheme and lightly modified it.

It seems later sphinx has included its own ":file:" role which now
conflicts.  Change it to ":cgit_file:" in our documentation.  Remove
the custom header template which no longer applies.  Add the
post-2.0-pbr sphinx-based warning-as-error, which fixes the original
problem I actually noticed: errors could slip through the
gate tests :)

Change-Id: Ic7bec57b971bb4c75fc839e7269d1f69a576b85c
2018-06-25 11:19:43 +10:00
Jeremy Stanley 2e92731929 Document an example for deleting content from AFS
A simple walkthrough of using an AFS superuser to perform write
operations under an AFS read-write path, including authenticating
and unauthenticating.

Change-Id: If27376745b43f94f27f104bca9309035d265ee72
2018-06-08 16:00:41 +00:00
Jeremy Stanley 113d455c70 Note missing AAAA records for AFS servers
Document why we don't maintain AAAA resource records in DNS for our
OpenAFS servers.

Change-Id: Ib295e79b32af43f26782e4277464bd130f4318e4
2018-04-09 21:56:05 +00:00
Zuul c3466f398c Merge "Add NO_TIMEOUT flag for reprepro" 2018-03-23 00:24:33 +00:00
Ian Wienand 8cf4b59796 Update AFS fileserver settings
Jeffrey Altman has pointed out that our settings are not optimal for
our use cases.  Turning up threads and callbacks is a start.  We
should evaluate the other settings too.

Add notes on how to apply settings manually

Change-Id: I1405b21f97c1ac2d3bd99ffbba18e5fd0ff959b1
2018-03-20 10:58:09 +11:00
Ian Wienand 09d080a337 Add NO_TIMEOUT flag for reprepro
During manual runs, you want to do this without a timeout.  Add a
flag; I always end up copying the script and manually editing this
stuff in.  Add a quick note in the AFS docs.

Change-Id: I239bc1a0b5928673b42cc67291bb519d5f5d2471
2018-03-08 15:15:48 +11:00
James E. Blair faa31fa404 Add kerberos / afs dns info
Change-Id: Id2cc43f1d67584ac26709d61679b3c6659df8daa
2017-12-15 08:24:26 -08:00
Clark Boylan b712584e53 Add AFS maintenance docs
This adds documentation on how to perform maintenance on the AFS cluster with no
service outages.

Change-Id: Idf9ab67603a1c5e8ac062458f3d17399d807e3a8
2017-04-14 11:16:35 -07:00
Clark Boylan fb5391b142 Add info on reverse proxy caches
This includes some basic info on the new mirror host reverse proxy
caches for resources that aren't simple/easy/practical for proper
mirroring.

Change-Id: If71fa6bf1769ef82ab3a4d2c8a5e78005fc6d7e5
2017-04-06 10:54:52 +10:00
Jenkins f4e3118709 Merge "Add a short note on removing mirror from AFS" 2017-04-06 00:36:52 +00:00
Ian Wienand 4e2e12e1d0 Add a short note on removing mirror from AFS
A short note on getting rid of an old or incorrect volume.

Change-Id: If8aa00773c71eacc58255bfb8a44661728e3084e
2017-04-06 10:20:30 +10:00
Ian Wienand a84488f4a9 Updates to adding mirror documentation
Add some more details and enhance some formatting around adding new
mirror volumes.

Change-Id: I1c8c9432fe0f96bd6be659bdc6facebaf35eb915
2017-04-05 10:01:52 +10:00
Jenkins a919e34237 Merge "Correct new afs fileserver docs" 2016-05-12 09:00:45 +00:00
Jenkins d6418c8793 Merge "Update AFS docs to reference afs02.dfw" 2016-05-12 08:59:48 +00:00
Paul Belanger c3fc2462a4 Fix typo with pts commands for service user
Change-Id: I4c116b6087000e59c084b8ecfb6dd48146f16aee
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2016-05-10 14:37:05 -04:00
James E. Blair 7e67b9f6a4 Correct new afs fileserver docs
Change-Id: I77de4ab498c197f22c4d69b43b83d4a934bf0698
2016-05-10 09:59:14 -07:00
James E. Blair a0a076a506 Update AFS docs to reference afs02.dfw
Change-Id: Iac78791f12b040207e6bfeb3e224a3cc2159a81a
2016-05-10 09:58:36 -07:00
Jenkins b4b4a81de7 Merge "Document read-write afs volumes" 2016-05-08 14:11:39 +00:00
James E. Blair 0a7e70220e Document how mirrors are constructed in AFS
Change-Id: I197be39687c016ba8cbc5023bf6e05cdd8c7e5b1
2016-04-14 16:22:05 -07:00
Paul Belanger 81c9d16567 Document read-write afs volumes
Change-Id: Ie9c9ba9af25582e75581d89cf669b3fdb92c253f
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2016-03-29 18:40:04 -04:00
James E. Blair 16e99c61ad Update afs superuser docs
Super users should be created on all servers

Change-Id: I995085f3716ff93385cb2c3aebfadd79148a0adf
2016-01-25 11:49:01 -08:00
Elizabeth K. Joseph 11a9b7ccce Update documentation with new Puppet modules
Location of our Puppet modules has changed now that they are split
from system-config, update documentation accordingly.

Change-Id: I4d4adc5d41f50dd92fbd642ac30f95c327a416b2
2015-01-28 19:48:10 -08:00