Make inventory/service for service-specific things, including the
groups.yaml group definitions, and inventory/base for hostvars
related to the base system, including the list of hosts.
Move the exisitng host_vars into inventory/service, since most of
them are likely service-specific. Move group_vars/all.yaml into
base/group_vars as almost all of it is related to base things,
with the execption of the gerrit public key.
A followup patch will move host-specific values into equivilent
files in inventory/base.
This should let us override hostvars in gate jobs. It should also
allow us to do better file matchers - and to be able to organize
our playbooks move if we want to.
Depends-On: https://review.opendev.org/731583
Change-Id: Iddf57b5be47c2e9de16b83a1bc83bee25db995cf
We're going to try using this in some other organizations, so
simplify thing.
Add in a flush handlers so that we don't have to split plays.
Remove kubernetes group, this isn't actually a thing right now.
Change-Id: I26b21aa8ffca1ac5112136831aa7664d5c3becac
The jitsi video bridge (jvb) appears to be the main component we'll need
to scale up to handle more users on meetpad. Start preliminary
ansiblification of scale out jvb hosts.
Note this requires each new jvb to run on a separate host as the jvb
docker images seem to rely on $HOSTNAME to uniquely identify each jvb.
Change-Id: If6d055b6ec163d4a9d912bee9a9912f5a7b58125
This adds a new variable for the iptables role that allows us to
indicate all members of an ansible inventory group should have
iptables rules added.
It also removes the unused zuul-executor-opendev group, and some
unused variables related to the snmp rule.
Also, collect the generated iptables rules for debugging.
Change-Id: I48746a6527848a45a4debf62fd833527cc392398
Depends-On: https://review.opendev.org/728952
This autogenerates the list of ssl domains for the ssl-cert-check tool
directly from the letsencrypt list.
The first step is the install-certcheck role that replaces the
puppet-ssl_cert_check module that does the same. The reason for this
is so that during gate testing we can test this on the test
bridge.openstack.org server, and avoid adding another node as a
requirement for this test.
letsencrypt-request-certs is updated to set a fact
letsencrypt_certcheck_domains for each host that is generating a
certificate. As described in the comments, this defaults to the first
host specified for the certificate and the listening port can be
indicated (if set, this new port value is stripped when generating
certs as is not necessary for certificate generation).
The new letsencrypt-config-certcheck role runs and iterates all
letsencrypt hosts to build the final list of domains that should be
checked. This is then extended with the
letsencrypt_certcheck_additional_domains value that covers any hosts
using certificates not provisioned by letsencrypt using this
mechanism.
These additional domains are pre-populated from the openstack.org
domains in the extant check file, minus those openstack.org domain
certificates we are generating via letsencrypt (see
letsencrypt-create-certs/handlers/main.yaml). Additionally, we
update some of the certificate variables in host_vars that are
listening on port !443.
As mentioned, bridge.openstack.org is placed in the new certcheck
group for gate testing, so the tool and config file will be deployed
to it. For production, cacti is added to the group, which is where
the tool currently runs. The extant puppet installation is disabled,
pending removal in a follow-on change.
Change-Id: Idbe084f13f3684021e8efd9ac69b63fe31484606
Remove the separate "mirror_opendev" group and rename it to just
"mirror". Update various parts to reflect that change.
We no longer deploy any mirror hosts with puppet, remove the various
configuration files.
Depends-On: https://review.opendev.org/728345
Change-Id: Ia982fe9cb4357447989664f033df976b528aaf84
We want to replace the current executors with focal executors.
Make sure zuul-executor can run there.
Kubic is apparently the new source for libcontainers stuff:
https://podman.io/getting-started/installation.html
Use only timesyncd on focal
ntp and timesyncd have a hard conflict with each other. Our test
images install ntp. Remove it and just stay with timesyncd.
Change-Id: I0126f7c77d92deb91711f38a19384a9319955cf5
We have two standalone roles, puppet and cloud-launcher, but we
currently install them with galaxy so depends-on patches don't
work. We also install them every time we run anything, even if
we don't need them for the playbook in question.
Add two roles, one to install a set of ansible roles needed by
the host in question, and the other to encapsulate the sequence
of running puppet, which now includes installing the puppet
role, installing puppet, disabling the puppet agent and then
running puppet.
As a followup, we'll do the same thing with the puppet modules,
so that we arent' cloning and rsyncing ALL of the puppet modules
all the time no matter what.
Change-Id: I69a2e99e869ee39a3da573af421b18ad93056d5b
Rather than running a local zookeeper, just run a real zookeeper.
Also, get rid of nb01-test and just use nb04 - what could possibly
go wrong?
Dynamically write zookeeper host information to nodepool.yaml
So that we can run an actual zk using the new zk role on hosts in
ansible inventory, we need to write out the ip addresses of the
hosts that we build in zuul. This means having the info baked in
to the file in project-config isn't going to work.
We can do this in prod too, it shouldn't hurt anything.
Increase timeout for run-service-nodepool
We need to fix the playbook, but we'll do that after we get the
puppet gone.
Change-Id: Ib01d461ae2c5cec3c31ec5105a41b1a99ff9d84a
Zuul is publishing lovely container images, so we should
go ahead and start using them.
We can't use containers for zuul-executor because of the
docker->bubblewrap->AFS issue, so install from pip there.
Don't start any of the containers by default, which should
let us safely roll this out and then do a rolling restart.
For things (like web or mergers) where it's safe to do so,
a followup change will swap the flag.
Change-Id: I37dcce3a67477ad3b2c36f2fd3657af18bc25c40
Make a service playbook, manifest and jobs for codesearch.
Remove openstack_project::server - it doesn't do anything.
Change-Id: I44c140de4ae0b283940f8e23e8c47af983934471
Migration plan:
* add zk* to emergency
* copy data files on each node to a safe place for DR backup
* make a json data backup: zk-shell localhost:2181 --run-once 'mirror / json://!tmp!zookeeper-backup.json/'
* manually run a modified playbook to set up the docker infra without starting containers
* rolling restart; for each node:
* stop zk
* split data and log files and move them to new locations
* remove zk packages
* start zk containers
* remove from emergency; land this change.
Change-Id: Ic06c9cf9604402aa8eb4bb79238021c14c5d9563
Upstream likes building the settings file into the image, but that's
less exciting, let's bind-mount ours in.
Depends-On: https://review.opendev.org/717491/
Change-Id: Ia1894d884ef2a84e1282345b77fe07bf8898f367
These hosts have been removed; remove the old references and
unnecessary groups, add the new host to cacti.
Change-Id: Ibcfd78a37e20e514c190ef801c2d44320c8b3f74
Story: #2006598
This should be mostly a no-op - but we will need to do a shutdown
in emergency mode.
Tell the gerrit role to not run compose up when run as part of
remote_puppet_git.
Change-Id: Id45376c2697656a12afeacf317b6f26c85c08dad
This is a start at ansible-deployed nodepool environments.
We rename the minimal-nodepool element to nodepool-base-legacy, and
keep running that for the old nodes.
The groups are updated so that only the .openstack.org hosts will run
puppet. Essentially they should remain unchanged.
We start a nodepool-base element that will replace the current
puppet-<openstackci|nodepool> deployment parts. For step one, this
grabs project-config and links in the elements and config file.
A testing host is added for gate testing which should trigger these
roles. This will build into a full deployment test of the builder
container.
Change-Id: If0eb9f02763535bf200062c51a8a0f8793b1e1aa
Depends-On: https://review.opendev.org/#/c/710700/
We have LE dns entries for review.o.o, but we're not actually
requesting the cert. Go ahead and request it - it'll make the
apache config easier to sort out.
Get the openstack.org certs for review-dev while we're at it.
Change-Id: I91d06c97993ba37204bd1fc326ae823e1b9c0c1a
Depends-On: https://review.opendev.org/707267
Depends-On: https://review.opendev.org/707255
Add a new review-dev server on the opendev domain with LE support
enabled.
Depends-On: https://review.opendev.org/705661
Change-Id: Ie32124cd617e9986602301f230e83bb138524fdf
The review-dev service playbook should do everything now that
the puppet did. Update how we're running things.
Change-Id: I70303c48328ea6713c24bf9c6f63d4808d30b95c
This adds a new handler to restart the zuul registry to pick up the new
cert. We may want to consider updating zuul registry to accept a reload
of ssl config without restarting the service.
Depends-On: https://review.opendev.org/702050
Change-Id: I23f6bea68285bc7cb0d12224235eaa16f0d07986
This provisions the cert but does not use it yet. We will do the
switchover once the cert is confirmed to be in place.
Depends-On: https://review.opendev.org/701819
Change-Id: I04fee48b9a79758527d8f9e8128c0fa915cd133e
This is the first step in managing the opendev.org cert with LE. We
modify gitea01.opendev.org only to request the cert so that if this
breaks the other 7 giteas can continue to serve opendev.org. When we are
happy with the results we can merge the followup change to update the
other 7 giteas.
Depends-On: https://review.opendev.org/694182
Change-Id: I9587b8c2896975aa0148cc3d9b37f325a0be8970
The backup roles have been debugged and are ready to run.
A note is added about having the backup server in a default disabled
state. This was discussed at an infra meeting where consensus was to
keep it disabled [1].
[1] http://eavesdrop.openstack.org/meetings/infra/2019/infra.2019-06-11-19.01.log.html#l-184
Change-Id: I2a3d2d08a9d1514bf6bdcf15bc5bc95689f3020f
This is a new backup server for use with the roles in
I9bf74df351e056791ed817180436617048224d2c
Restrict the puppet group to only the openstack.org servers as this
new server doesn't need puppet.
Depends-On: https://review.opendev.org/674549
Change-Id: Ia8e2e01f579ed9475830c159bf266b63bed52c36
This can be used in an apache vhost later, but should be fine to
merge now.
Depends-On: https://review.opendev.org/673902
Change-Id: Ic2cb7585433351ec1bdabd88915fa1ca07da44e7
We ended up running into a problem with nodepool built control plane
images (has to do with boot from volume not allowing us to delete images
that are in use by a nova instance). We have decided to clean this up
and go back to not doing this until we can do it more properly.
Note this isn't a revert because having a group for access to control
plane clouds does seem like a good idea in general and I believe there
have been changes we'd have to resolve in the clouds.yaml files anyway.
Depends-On: https://review.opendev.org/#/c/665012/
Change-Id: I5e72928ec2dec37afa9c8567eff30eb6e9c04f1d
Add the new mirror-update server as a follow-on to
I525ac18b55f0e11b0a541b51fa97ee5d6512bf70.
Also ensure that the new mirror server isn't in the puppet groups by
only matching the openstack.org one.
Also remove from the afsadmin group. This group is only used for
keytabs stored on bridge.o.o. I don't think that we need group for
the keytabs -- a keytab should only ever be in use on one host at a
time, so we are better off keeping the keytabs in a specific host_var
for the host they are used on, rather than being in a group and
possibly deployed on servers where they are not used.
Depends-On: https://review.opendev.org/668610
Change-Id: Icda92bb234adc00f6718c1c656e8f069ce2704c4