88 Commits

Author SHA1 Message Date
Jeremy Stanley aa3f4d71b0 Document adding Zuul WebUI admins
Step-by-step process for adding your account to the zuul realm in
Keycloak, so that you can access the admin capabilities of our Zuul
WebUI.

Change-Id: I613e3b45316471df2054300a8b115da78debdcb2
2024-02-14 16:54:47 +00:00
Clark Boylan 2e961b1af0 Cleanup force merging docs
At some point we shifted from doing this task using the web UI to
primarily using ssh only admin accounts. The docs ended up in a slightly
confusing place with steps that only make sense when you interact with
the web UI. Update the force merge docs to assume ssh only which is far
more aligned with our admin account expectations.

Change-Id: Ia99afe7ee10927765733891f72bd428e52fa2225
2022-11-16 14:50:11 -08:00
James E. Blair 1ff685488e Combine / reconcile two force-merge docs
The force-merge procedure was documented twice, neither one complete.
Combine them.

Change-Id: If4350a0a90d455b64227befde2f1be7475ac8120
2022-07-28 07:57:21 -07:00
Ian Wienand 3a09bf7e8a gerrit docs: cleanup and use shell-session
A few formatting fixes

* try to more consistently use shell-session formatting for shell
  sessions (makes it easier to copy-paste).
* fix up and use more `` around verbatim/code things.

Fixes:

 * Gerrit Configuration : there's no db to set the ICLA fields in now,
   remove
 * Duplicate Accounts : add required arg "origin" to git fetch command
 * Deactivating account : can not delete comments via sql query,
   remove

Change-Id: Ia481750aa59fc88bef5c00bb0fd9e6f9e23b2777
2022-06-24 15:37:52 +10:00
Ian Wienand 4c86706e5e docs: reorganise around an open infrastructure overview
This introduces an "Open Infrastructure" page which is designed for a
moderately experienced developer with some understanding of Zuul,
Ansible and basic Linux admin skills to have an entrypoint to
navigating the system-config and related repositories.

It is designed to reinforce the idea of open infrastructure, and
explain how development, testing and production come together at a
level high enough to be understood, but with links or descriptions of
specific places in the code to get started.

It moves a little of what was in the sysadmin page into this, and
leaves that page as more low-level descriptions of various tasks.

Change-Id: I60a9299df455b98ad549ac0075a59d381722bc06
2022-03-04 12:18:42 +11:00
Jack Morgan 93e20041f9 Minor update to documentation.
Signed-off-by: Jack Morgan <jack@jento.io>
Change-Id: Ic2b7d71796634ca024dae33547d1b17f349b7f1e
2022-01-11 17:53:16 -08:00
Ian Wienand 3d63b3b8a4 borg-backup-server: log prune output to file
This saves prune output to a log file automatically.  Add a bit more
info on the process too.

Change-Id: I2607ddbc313dfebc122609af78bb5eed63906f6b
2021-08-04 14:47:50 +10:00
James E. Blair ec4baa8bcb Fix typo in gerrit sysadmin doc
The label arguments require "=".

Change-Id: I35442033d26060fa639f414aa1a8c6e508716831
2021-05-19 13:19:26 -07:00
Ian Wienand 116a2ca4a4 doc: update backup instructions
Update the backup instructions for some recent changes.  Make a note
of the streaming backup method, discuss some caveats with append-only
mode and discuss the pruning scripts and when to run
(c.f. I9559bb8aeeef06b95fb9e172a2c5bfb5be5b480e,
I250d84c4a9f707e63fef6f70cfdcc1fb7807d3a7).

Change-Id: Idb04ebfa5666cd3c20bc0132683d187e705da3f1
2021-02-09 12:15:24 +11:00
Zuul 15d579cf31 Merge "Document dual account split for Gerrit admins" 2020-11-05 17:19:50 +00:00
Ian Wienand eb07ab3613 borg-backup: add fuse
Add the FUSE dependencies for our hosts backed up with borg, along
with a small script to make mounting the backups easier.  This is the
best way to recover something quickly in what is sure to be a
stressful situation.

Documentation and testing are updated.

Change-Id: I1f409b2df952281deedff2ff8f09e3132a2aff08
2020-11-05 11:56:46 +11:00
Jeremy Stanley 427ae2a2aa Document dual account split for Gerrit admins
Our Gerrit admins follow this model of access management now, in
order to shield Administrators permission from external identity
provider risks.

Change-Id: I3070c28c26548d364da38d366bfa2ac8b2fb4668
2020-10-28 21:03:20 +00:00
Zuul 083e8b43ea Merge "Add borg-backup roles" 2020-10-01 07:36:47 +00:00
Ian Wienand e3fb7d2be0 docs: Update some of sysadmin details
Give a little more detail on the current ci/cd setup; remove puppet
cruft.

Change-Id: I684df4459cf5940d70b89e4c05103f8a8352af87
2020-09-07 17:14:21 +10:00
Ian Wienand 028d655375 Add borg-backup roles
This adds roles to implement backup with borg [1].

Our current tool "bup" has no Python 3 support and is not packaged for
Ubuntu Focal.  This means it is effectively end-of-life.  borg fits
our model of servers backing themselves up to a central location, is
well documented and seems well supported.  It also has the clarkb seal
of approval :)

As mentioned, borg works in the same manner as bup by doing an
efficient backup over ssh to a remote server.  The core of these
roles is the same as the bup based ones in terms of creating a
separate user for each host and deploying keys and ssh config.

This chooses to install borg in a virtualenv on /opt.  This was chosen
for a number of reasons; firstly reading the history of borg there
have been incompatible updates (although they provide a tool to update
repository formats); it seems important that we both pin the version
we are using and keep clients and server in sync.  Since we have a
heterogeneous distribution collection we don't want to rely on the
packaged tools which may differ.  I don't feel like this is a great
application for a container; we actually don't want it that isolated
from the base system because its goal is to read and copy it offsite
with as little chance of things going wrong as possible.

Borg has a lot of support for encrypting the data at rest in various
ways.  However, that introduces the possibility we could lose both the
key and the backup data.  Really the only thing stopping this is key
management, and if we want to go down this path we can do it as a
follow-on.

The remote end server is configured via ssh command rules to run in
append-only mode.  This means a misbehaving client can't delete its
old backups.  In theory we can prune backups on the server side --
something we could not do with bup.  The documentation has been
updated but is vague on this part; I think we should get some hosts in
operation, see how the de-duplication is working out and then decide
how we want to manage things long term.

Testing is added; a focal and bionic host both run a full backup of
themselves to the backup server.  Pretty cool, the logs are in
/var/log/borg-backup-<host>.log.

No hosts are currently in the borg groups, so this can be applied
without affecting production.  I'd suggest the next steps are to bring
up a borg-based backup server and put a few hosts into this.  After
running for a while, we can add all hosts, and then deprecate the
current bup-based backup server in vexxhost and replace that with a
borg-based one; giving us dual offsite backups.

[1] https://borgbackup.readthedocs.io/en/stable/

Change-Id: I2a125f2fac11d8e3a3279eb7fa7adb33a3acaa4e
2020-07-21 17:36:50 +10:00
Clark Boylan 3beb50a3b3 Add bit more info on disabling ansible runs
We've got a section on using the emergency file and disabled ansible
group. Add info about the special DISABLE-ANSIBLE file there to help
make that info easier to find.

Change-Id: I2e750b9b87ca7a4f800d3ac161a195d49543a7da
2020-06-15 14:41:51 -05:00
Monty Taylor 83ced7f6e6 Split inventory into multiple dirs and move hostvars
Make inventory/service for service-specific things, including the
groups.yaml group definitions, and inventory/base for hostvars
related to the base system, including the list of hosts.

Move the existing host_vars into inventory/service, since most of
them are likely service-specific. Move group_vars/all.yaml into
base/group_vars as almost all of it is related to base things,
with the exception of the gerrit public key.

A followup patch will move host-specific values into equivalent
files in inventory/base.

This should let us override hostvars in gate jobs. It should also
allow us to do better file matchers - and to be able to organize
our playbooks more if we want to.

Depends-On: https://review.opendev.org/731583
Change-Id: Iddf57b5be47c2e9de16b83a1bc83bee25db995cf
2020-06-04 07:44:36 -05:00
Dr. Jens Harbott 46b4053a0a Document the need to use sudo in order to access OSC
Change-Id: I9e80f0b57bc9758e6b0458428315b1087856ddec
2020-05-19 10:09:23 +00:00
Monty Taylor cba5129465 Remove puppet and cron mentions from docs
We've got some old out of date docs in some places. This isn't even
a full reworking, but at least tries to remove some of the more
egregiously wrong things.

Change-Id: I9033acb9572e1ce1b3e4426564b92706a4385dcb
2020-04-16 07:04:14 -07:00
Monty Taylor 8af7b47812 Get rid of all-clouds.yaml
We had the clouds split from back when we used the openstack
dynamic inventory plugin. We don't use that anymore, so we don't
need these to be split. Any other usage we have directly references
a cloud.

Change-Id: I5d95bf910fb8e2cbca64f92c6ad4acd3aaeed1a3
2020-04-09 16:44:20 -05:00
James E. Blair 06d5ce1423 Correct launch readme link
This has a .rst extension now.

Change-Id: Icafdb12f91315f5c37f95755034d216bc4a5c837
2020-03-27 09:45:42 -07:00
Andreas Jaeger 2c0b82e5e8 Update infra-manual location
The infra-manual now lives on docs.opendev.org, update links.

New location is: https://docs.opendev.org/opendev/infra-manual/latest

Change-Id: I7716c68cbff4f3a640d7161f59cfc034a7ccca52
2020-03-20 22:03:09 +01:00
Dr. Jens Harbott c86525ccd3 Update references to IRC channels
With the move from OpenStack governance to our own OpenDev team, we
should also move to use the #opendev IRC channel in preference to
the #openstack-infra channel which will remain in use for OpenStack
specific discussions.

Update the references in our docs accordingly.

Change-Id: I448704f5d2664fd233a69a2ad12578ca24d9878a
2020-03-18 17:33:08 +01:00
Zuul 44935bca39 Merge "Add notes on manual host configuration runs" 2020-01-16 22:53:05 +00:00
Ian Wienand 814e4be128 Ansible roles for backup
This introduces two new roles for managing the backup-server and hosts
that we wish to back up.

Firstly the "backup" role runs on hosts we wish to back up.  This
generates and configures a separate ssh key for running bup and
installs the appropriate cron job to run the backup daily.

The "backup-server" job runs on the backup server (or, indeed
servers).  It creates users for each backup host, accepts the remote
keys mentioned above and initialises bup.  It is then ready to receive
backups from the remote hosts.

This eliminates a fairly long-standing requirement for manual setup of
the backup server users and keys; this section is removed from the
documentation.

testinfra coverage is added.

Change-Id: I9bf74df351e056791ed817180436617048224d2c
2019-08-05 16:59:57 +10:00
Jeremy Stanley 4c04ad5436 Correct emergency file reference in launch script
The launch script is referring to the wrong path for the emergency
inventory. Also correct the references in the sysadmin guide and
update the example for using it.

Change-Id: I80bdbd440ec451bcd6fb1a3eb552ffda32407c44
2019-07-26 14:55:32 +00:00
Jeremy Stanley 861f5e893f Streamline documented bup setup process
Reorder some of the commands used to set up and configure the bup
user on backup servers so the process is more straightforward and
requires fewer mental context switches.

Change-Id: I73cb80a04b8b5a74bb0857b4c8b6fb09030d6306
2019-06-18 23:57:19 +00:00
Monty Taylor d500651367 Rename cgit_file to git_file
In sphinx, we have a :cgit_file: directive that makes links to files.
Thing is - we're not using cgit anymore. So just rename it to git_file.

Change-Id: I80aca5fb3cc84281e29843944fea33e6f4d9fe6f
2019-04-22 11:47:11 +00:00
Monty Taylor eaa74543de Finish updating docs for opendev
The zuul and zuulv3 docs need to be merged, but that seemed like
too much for this. Also, the 3rd party CI doc is out of date, but
in this patch only removed sections that linked to docs or files
that don't exist anymore.

Change-Id: Ie5497edd762d2146165608f3227b0bac88a913df
2019-04-20 18:25:37 +00:00
Ian Wienand d4a6f1269a Backup rotation procedure
Add a backup rotation procedure to the sysadmin documentation

Change-Id: I366198c635c7fd7f8e1876296bf9357dd577bf56
2019-03-19 12:12:16 +11:00
Ian Wienand 1c48bfe327 Enable github shared admin account
This change describes the shared github administrator account.

This is inspired by I0c61f192a6b5164af7babde5c99e5ee2b77a652c.  As
described there, this allows for admins to have private accounts in
the organisation, but requires that 2FA be turned on.  If people wish
to keep this as a single account which they do "real" work with
(commits, etc) that is probably OK, but add a note that you'll end up
with a lot of mostly irrelevant stuff in your feeds.

Change-Id: Ic408250571133796b4b4639715fe8d01f91898f2
2018-12-12 10:48:16 +11:00
Ian Wienand 8a95c976e9 Add a workflow overview for adding a cloud
Add some details about how we integrate a new cloud into the
ecosystem.  I feel like this is an appropriate level of detail given
we're dealing with clueful admins who just need a rough guide on what
to do and can fill in the gaps.

Fix up the formatting a bit while we're here.

Change-Id: Iba3440e67ab798d5018b9dffb835601bb5c0c6c7
2018-10-19 16:38:00 +00:00
Ian Wienand cccbeb781c Add notes on manual host configuration runs
Change-Id: I7cf2ea77a378920eacb35ff7743062966ece1487
2018-09-20 09:53:28 +10:00
Andreas Jaeger 1c6b4876eb Cleanup docs formatting
Fix indents of some pages; the wrong indent led to gray bars beside
them.

Also, fix a typo and add some markup.

Change-Id: I6e7126ef7b782b376efcc7c6d69c6de9a504ddb5
2018-08-24 22:13:37 +02:00
Monty Taylor c716240692 Clean up puppetmaster puppet config handled by ansible
We have a bunch of this handled now in ansible, so remove the old stuff.

Remove puppetmaster group management files. It's confusing for there to
be two files. Remove the old one.

Remove mqtt config. This isn't really a thing currently, and we're
eyeing running things from zuul anyway, so no need to port to ansible.

Change-Id: I8b64d21eadcc4a08bd5e5440fc5f756ae5bcd46b
2018-08-17 11:53:52 -05:00
Monty Taylor bab6fcad3c Remove base.yaml things from openstack_project::server
Now that we've got base server stuff rewritten in ansible, remove the
old puppet versions.

Depends-On: https://review.openstack.org/588326
Change-Id: I5c82fe6fd25b9ddaa77747db377ffa7e8bf23c7b
2018-08-16 17:25:10 -05:00
Zuul 04aac06820 Merge "Update Gerrit project renaming for Zuul v3" 2018-08-01 16:45:10 +00:00
Ian Wienand 882b730fdf Update to openstackdocstheme
This modernises the openstack-infra documentation by switching to
openstackdocstheme.  Update dependencies as required.

To remove non-relevant stuff from conf.py, I have just taken the demo
file from openstackdocstheme and lightly modified it.

It seems later sphinx has included its own ":file:" role which now
conflicts.  Change it to ":cgit_file:" in our documentation.  Remove
the custom header template which no longer applies.  Add the
post-2.0-pbr sphinx-based warning-as-error, which fixes the original
problem that I actually noticed that errors could slip through the
gate tests :)

Change-Id: Ic7bec57b971bb4c75fc839e7269d1f69a576b85c
2018-06-25 11:19:43 +10:00
Jeremy Stanley cbbceb2330 Update Gerrit project renaming for Zuul v3
With the switch to Zuul v3, we need to resolve some configuration
catch-22s where project names and related in-repository job
definitions can't happen without a complex multi-stage removal and
reintroduction process to get it through speculative testing
successfully. For now, just punt and use monolithic changes
bypassing CI in code review. As an up side, the Ansible automation
of this process coupled with Zuul v3's increased resilience to
on-the-fly configuration changes means we can skip stopping/starting
it now and significantly simplify the process.

Since we're here, correct the section heading level for
"Force-Merging a Change" in the sysadmin document.

Change-Id: I335c23abd0b5706f43bbea2dd8cfffa4280dd5db
2018-03-19 15:26:58 +00:00
Ian Wienand 5b2ac45099 Add a note on the shared infra root mail account
Change-Id: Id8ae73f99f46d5f0224c8d9145d5c06ee9ea09da
2018-02-08 12:01:55 +11:00
Zuul 81a86fa41a Merge "Add docs to replace a cinder volume" 2017-11-20 21:13:22 +00:00
Ian Wienand 60b89d662e Remove ci-backup-rs-ord.openstack.org
Migrate backups to new backup01.ord.rax.ci.openstack.org

We decided to start fresh backups on the new server, so this is ready
to go.  I have performed an initial backup on each server so it has
accepted the host key of the new server and been tested (I also fixed
up review-dev.o.o, which was rebuilt but keys not updated ... todo:
add this to puppet, but since it changes so infrequently not high
priority).

Change-Id: I0872f9fcf4a334d32f632b3cb04801deefab4fd1
2017-11-15 09:28:55 +11:00
James E. Blair b8722bc67c Add documentation on force-merging a change
Change-Id: Ie6fd2a7fa968909440ae3a30b64a6b80792dd1c5
2017-10-12 01:50:05 +00:00
Paul Belanger d485cc7e11 Add docs to replace a cinder volume
We usually want to do these steps to avoid volume outages when
rackspace is doing updates.

Change-Id: Ie5de97484dddb9136c240baf46724646e39df67e
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2017-03-23 13:22:47 -04:00
Clark Boylan b61a3eb7a4 Clean up backups documentation
This adds the now required bup init command to the server to be backed
up. Also remove now gone HPCloud backup server and fix quotes around
command for catting public ssh key.

Change-Id: I607a7c079b16d7f1e94d6b0888cd6e302a04f68f
2017-02-08 10:38:27 -08:00
Jenkins 1960078a1d Merge "Use an ordinal server naming pattern" 2016-06-30 20:14:40 +00:00
Jenkins abf31b52e9 Merge "Update cinder mgmt docs to use openstackclient" 2016-06-19 00:54:31 +00:00
Jenkins 4d04652b3f Merge "Add more lvm commands to cinder documentation" 2016-05-26 02:32:39 +00:00
Jeremy Stanley 3ac0a5eb69 Use an ordinal server naming pattern
As discussed during the "Launch Node, Ansible and Puppet" summit
session in Austin, we're making things unnecessarily hard on
ourselves by insisting on having multiple servers in our inventory
with the same name. In order to make server addition and replacement
automation simpler, start using an ordinal suffix on server short
names to differentiate them (we can still easily rely on DNS for
their non-numbered convenience names).

Change-Id: I040a5c3b5e1abc50c3e4676bcab0bf4eaa550f4b
2016-05-23 19:42:18 +00:00
Leif Madsen bdd7085987 Minor documentation tweaks
Change-Id: Iece51871918979875f10eeaac0795c23232832d3
2016-04-27 22:29:05 -04:00