It should run before other services but after base, so run it early
and add it to the dependency list in the infra-prod-service-base job.
Change-Id: I4f65b0ff0fbf3cf1f98060d2b3d3c77eb3c45ec7
Make a base job for the various service playbooks to capture the
fact that we should run these after update-system-config, and if
they run, after both install-ansible and base.
Attach a semaphore to the base job, because while many of the
playbooks should be independent, some may not be, and we need to
make sure things don't double-run in periodic and promote.
Transition service-bridge from run_all to zuul, basing it on the
new base job.
While we're in here, reduce manage-projects forks to 10, because
let's face it, that's a more sensible number when there aren't
that many hosts.
Change-Id: I22e9edaea75dcfdab56f667f7c93cdd3ee25406c
We should run manage-projects when the manage-projects code on
system-config changes. To do that, we need to run the system-config
playbook so that the system-config content will be updated.
In order to do that properly, we need to run base, which means we
need to run bridge. So we really want to do all three so that we're
doing the correct dependent sequence. Subsequent changes can
then just pick off single service playbooks and make them jobs
that depend on base.
Change-Id: I3560feff4309f6be21b72b30a7a6d61a60829e52
There are two different concerns here. One is configuring the gitea
and gerrit services. This is independent from the management of
projects running inside them.
Make a manage-projects playbook which currently runs gitea-git-repos
but will also get a gerrit-git-repos role in a bit. Make a
service-gitea playbook for deploying gitea itself and update
run_all to take all of that into account. This should make our
future world of turning these into zuul jobs easier.
Add several missing files to the files matchers for run-gitea
and run-review.
Also - nothing about this has anything to do with puppet.
Change-Id: I5eaf75129d76138c61013a3a7ed7c381d567bb8b
The review-dev service playbook should now do everything that
the puppet did. Update how we're running things.
Change-Id: I70303c48328ea6713c24bf9c6f63d4808d30b95c
The backup roles have been debugged and are ready to run.
A note is added about having the backup server in a default disabled
state. This was discussed at an infra meeting where consensus was to
keep it disabled [1].
[1] http://eavesdrop.openstack.org/meetings/infra/2019/infra.2019-06-11-19.01.log.html#l-184
Change-Id: I2a3d2d08a9d1514bf6bdcf15bc5bc95689f3020f
It looks like I forgot to add this in
I525ac18b55f0e11b0a541b51fa97ee5d6512bf70 so the mirror-update
specific roles aren't running automatically.
Change-Id: Iee60906c367c9dec1143ee5ce2735ed72160e13d
Prior to https://review.opendev.org/#/c/656871/ this code was executed
by run_all.sh in every pass but seems to have been missed as part of
656871's base.yaml split up.
Add service-bridge.yaml to run_all.sh to get these updates applying to
bridge again. In particular things like clouds.yaml updates are missing
otherwise.
Note I've not merged bridge.yaml and service-bridge.yaml as it appears
we want all of the service stuff to happen after base.yaml but
bridge.yaml needs to happen before. I think this is why they were split
in the first place.
Change-Id: I0a7ce1a65cd19459bbaf244b94a23ddde360da1a
Fix the reported stat name for the mirror playbook.
Run the mirror job in gate.
Set follow=false so that we're telling Ansible to set the perms
on the link rather than the target (which is the default).
Change-Id: Id594cf3f7ab1dacae423cd2b7e158a701d086af6
We ignore E006, which is line length longer than 79 characters. We don't
actually care about that. Fix E042 in run_all.sh; this represents a
potential real issue in bash as it will hide errors.
This makes the bashate output much cleaner which should make it easier
for people to understand why it fails when it fails in check.
Change-Id: I2249b76e33003b57a1d2ab5fcdb17eda4e5cd7ad
This implements mirrors that live in the opendev.org namespace. The
implementation is Ansible native for deployment on a Bionic node.
The hostname prefix remains the same (mirrorXX.region.provider.) but
the groups.yaml splits the opendev.org mirrors into a separate group.
The matches in the puppet group are also updated so as not to run puppet
on the hosts.
The kerberos and openafs client parts do not need any updating and
work on the Bionic host.
The hosts are set up to provision certificates for themselves from
letsencrypt. Note we've added a new handler for mirror nodes to use
that restarts apache on certificate issue/renewal.
The new "mirror" role is a port of the existing puppet mirror.pp. It
installs apache, sets up some modules, makes some symlinks, sets up a
cleanup cron job and installs the apache vhost configuration.
The vhost configuration is also ported from the extant puppet. It is
simplified somewhat; but the biggest change is that we have extracted
the main port 80 configuration into a macro which is applied to both
port 80 and 443; i.e. the host will have SSL support. The other ports
are left alone for now, but can be updated in due course.
Thus we should be able to CNAME the existing mirrors to new nodes, and
any existing http access can continue. We can update our mirror setup
scripts to point to https resources as appropriate.
Change-Id: Iec576d631dd5b02f6b9fb445ee600be060f9cf1e
This is a first step toward making smaller playbooks which can be
run by Zuul in CD.
Zuul should be able to handle missing projects now, so move it
out of the puppet_git playbook and into puppet.
Make the base playbook be merely the base roles.
Make service playbooks for each service.
Remove the run-docker job because it's covered by service jobs.
Stop testing that puppet is installed in testinfra. It's accidentally
working due to the selection of non-puppeted hosts only being on
bionic nodes and not installing puppet on bionic. Instead, we can now
rely on actually *running* puppet when it's important, such as in the
eavesdrop job. Also remove the installation of puppet on the nodes in
the base job, since all it exercises is a synthetic test of installing
puppet on nodes we don't use.
Don't run remote_puppet_git on gitea for now - it's too slow. A
followup patch will rework gitea project creation to not take hours.
Change-Id: Ibb78341c2c6be28005cea73542e829d8f7cfab08
The server has been removed, remove it from inventory.
While we're here, s/graphite.openstack.org/graphite.opendev.org/
... it's a CNAME redirect but we might as well clean up.
Change-Id: I36c951c85316cd65dde748b1e50ffa2e058c9a88
Our old puppet 4 process was to run the install_puppet.sh script to
transition from puppet 3 to puppet 4 but this ran after base.yaml which
enforces a puppet version.
Unfortunately we were enforcing puppet version 3 in the base.yaml
playbook via the puppet-install role which meant base would install
puppet 3 and our upgrade playbook would install puppet 4 in a loop.
Thankfully we run puppet after the upgrade so we were using the puppet
version we wanted.
To fix this needless reinstall loop we do two things. We move the
upgrade playbook before base.yaml so that we upgrade before we enforce a
version. Then we update group vars for the puppet4 group to enforce the
puppet 4 version.
Change-Id: I97ca81ed5331e664f8e2e65b283793f0919f6033
Most of these playbooks finish much faster than 2 hours. Set
timeouts which are approximately 3x as long as they are currently
running, rounded to the nearest 10m.
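As a rough illustration of the timeout rule above (about 3x the observed runtime, rounded to the nearest 10m), here is a minimal sketch. The function name, the round-half-up choice, and the one-step minimum are assumptions for illustration, not taken from run_all.sh.

```python
def suggested_timeout_minutes(observed_minutes, factor=3.0, step=10):
    """Scale an observed runtime and round it to the nearest `step` minutes.

    Hypothetical helper: `factor` and `step` mirror the "3x" and "10m"
    in the text; the minimum of one step is an added assumption so that
    very short playbooks never get a zero timeout.
    """
    scaled = observed_minutes * factor
    # Round half up to the nearest multiple of `step`.
    rounded = int(scaled / step + 0.5) * step
    return max(step, rounded)


if __name__ == "__main__":
    # A playbook that currently takes about 12 minutes.
    print(suggested_timeout_minutes(12))
```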
Emit the name of the timer to the log at the end of each run so
that it's more clear which playbook just finished.
Correct the timer name for one of the playbooks.
The k8s cluster deployment playbooks are not yet functional --
run times for those are still unknown.
Change-Id: I43a06baaec908cba7d88c4b0932dcc95f1a9a108
First, we need an @ before the extra vars files. Why? Because
an @ is needed.
Second, the rook playbook was stringing all 4 commands on to one
exec call which was working poorly. Instead, make 4 tasks so that
it's slightly better represented in ansible output, each of which
has a (presumably) valid command.
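On the first point: ansible-playbook treats an -e/--extra-vars argument as a file to load only when it is prefixed with @. The sketch below builds such a command line; the helper name and the file names are hypothetical and are not the ones used in this change.

```python
def playbook_command(playbook, extra_vars_files=(), forks=None):
    """Build an ansible-playbook argv list (hypothetical helper).

    Each extra-vars file is passed as `-e @path`; without the leading
    @, ansible would parse the argument as inline key=value or
    JSON/YAML content instead of reading the file.
    """
    argv = ["ansible-playbook"]
    if forks is not None:
        argv += ["-f", str(forks)]
    for path in extra_vars_files:
        argv += ["-e", "@" + path]
    argv.append(playbook)
    return argv


if __name__ == "__main__":
    # File names here are placeholders for illustration only.
    print(" ".join(playbook_command("rook.yaml", ["cluster-vars.yaml"])))
```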
Change-Id: I30efe84d2041237a00da0c0aac02afa92d29c0fb
The current code runs k8s-on-openstack's ansible in an ansible
task. This makes debugging failures especially difficult.
Instead, move the prep task to update-system-config, which will
ensure the repo is cloned, and move the post task to its own
playbook. The cinder storage class k8s action can be removed from
this completely as it's handled in the rook playbook.
Then just run the k8s-on-openstack playbook as usual, but without
the cd first so that our normal ansible.cfg works.
Change-Id: I6015e58daa940914d46602a2cb64ecac5d59fa2e
Since the gitea cluster doesn't appear in any ansible inventory,
we need to create a dedicated file to hold the extra variables.
Change-Id: Ib2365c9204bff549fdc0116243376d6e895f2296
The k8s-on-openstack project produces an opinionated kubernetes
that is correctly set up to be integrated with OpenStack. All of the
patches we've submitted to update it for our environment have been
landed upstream, so just consume it directly.
It's possible we might want to take a more hands-on forky approach in
the future, but for now it seems fairly stable.
Change-Id: I4ff605b6a947ab9b9f3d0a73852dde74c705979f
Add some coarse-grained statsd tracking for the global ansible runs.
Adds a timer for each step, along with an overall timer.
This adds a single argument so that we only try to run stats when
running from the cron job (so if we're debugging by hand or something,
this doesn't trigger). Graphite also needs to accept stats from
bridge.o.o. The plan is to present this via a simple grafana
dashboard.
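For readers unfamiliar with statsd timers, the following is a minimal sketch of the idea: time each step, and emit a timer metric only when explicitly enabled (as from the cron job). The metric name, the host and port defaults, and the `enabled` flag are assumptions for illustration; the actual change does this from run_all.sh.

```python
import socket
import time


def timer_packet(name, elapsed_ms):
    """Format a statsd timer datagram: `<name>:<value>|ms`."""
    return "{}:{}|ms".format(name, elapsed_ms).encode("ascii")


def send_timer(name, elapsed_ms, host="127.0.0.1", port=8125, enabled=False):
    """Send one timer over UDP; do nothing unless stats are enabled."""
    if not enabled:
        return False
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(timer_packet(name, elapsed_ms), (host, port))
    return True


def timed(name, func, enabled=False):
    """Run `func` and report its wall-clock duration as a statsd timer."""
    start = time.monotonic()
    try:
        return func()
    finally:
        elapsed_ms = int((time.monotonic() - start) * 1000)
        send_timer(name, elapsed_ms, enabled=enabled)


if __name__ == "__main__":
    # Hypothetical metric name; stats stay off, as in a manual debug run.
    timed("bridge.ansible.runtime.base", lambda: time.sleep(0.01))
```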
Change-Id: I299c0ab5dc3dea4841e560d8fb95b8f3e7df89f2
In run_all.sh, increase the number of ansible forks to 50 for most
playbooks in an attempt to speed up the process.
Change-Id: I487605fd3b2d20d7b1f19c40d22018deeae9c112
And revert "Set Ansible forks to 50"
This doesn't seem to have helped, and may have made the run longer.
I suspect a problem with the env var, but let's revert back to the
old value and mechanism (cli flag) to re-establish a baseline,
then we'll change the value of the cli flag.
This reverts commit 8419909571.
This reverts commit 97d8f9d0bf.
Change-Id: I825b2b3db26ce6dd7d70fcc8b33e70b511eb52db
20 is working fine with plenty of ram/cpu to spare, increase to 50
to attempt to speed up the runtime.
The environment variable should be used by default, but the "-f"
option will override that, in the one case where we need it.
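The precedence described above can be modelled as follows. This is a sketch of the lookup order only (CLI flag, then ANSIBLE_FORKS, then Ansible's built-in default of 5), not Ansible's actual implementation; the function name is hypothetical.

```python
import os

# Ansible's documented default when nothing else is configured.
ANSIBLE_DEFAULT_FORKS = 5


def effective_forks(cli_forks=None, environ=None):
    """Model which forks value wins: `-f` flag, then env var, then default."""
    environ = os.environ if environ is None else environ
    if cli_forks is not None:
        # An explicit -f on the command line overrides the environment.
        return cli_forks
    env_value = environ.get("ANSIBLE_FORKS")
    if env_value:
        return int(env_value)
    return ANSIBLE_DEFAULT_FORKS


if __name__ == "__main__":
    env = {"ANSIBLE_FORKS": "50"}
    print(effective_forks(environ=env))                # env var applies
    print(effective_forks(cli_forks=10, environ=env))  # -f wins
```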
Change-Id: Ie6a1d991a346702ec58cd716b0b94af5c93554ac