Commit Graph

2865 Commits

Author SHA1 Message Date
Zuul 629dfe2e6f Merge "Cleanup opensuse mirroring configs entirely" 2024-03-26 20:05:01 +00:00
Zuul 336a4ae440 Merge "Switch install-docker playbook to include_tasks" 2024-03-22 22:48:03 +00:00
Zuul a572751996 Merge "Upgrade Refstack's MariaDB to 10.11" 2024-03-22 17:06:26 +00:00
Clark Boylan 515abdec64 Cleanup opensuse mirroring configs entirely
This should clean up our mirror update server so that we no longer have
configs (cron, scripts, logrotate rules, etc) for mirroring opensuse.
It won't clean up the afs volume, but we can get to that later (and it
will probably require manual intervention). This cleanup is written so
that it can be reused for future cleanups too (like when centos 8
stream goes away and everything is centos stream specific).

Change-Id: Ib5d15ce800ff0620187345e1cfec0b7b5d65bee5
2024-03-18 15:49:43 -07:00
Clark Boylan a0ae3481dd Update opensuse mirror script to more completely clean up
There are a number of issues with opensuse mirroring content cleanup
that this change aims to address. First up we fix the prefix for the
CentOS 7 networking content; it needed a repositories/ prefix. At the
same time we don't bother deleting the leaf data and instead delete the
higher-level directory since we're cleaning this all up.

We then apply this top level cleanup to all of the repositories,
distributions, and updates. This is largely a noop (just some directory
removals) except in the case of update/ which still contains leap 15.2
update packages. These were apparently missed in the initial opensuse
cleanup.

After this lands we should end up with a largely empty volume.

Change-Id: Ic854fcecd1a0fabc388640a33da7e4e1f9ec07c0
2024-03-18 15:46:28 -07:00
Zuul 772cd8e2ad Merge "Stop mirroring CentOS 7 packages" 2024-03-18 16:21:22 +00:00
Zuul bbc8116886 Merge "Add backups for the new Keycloak server" 2024-03-18 16:21:16 +00:00
Clark Boylan 6df6c6507f Stop mirroring CentOS 7 packages
Now that we have removed CentOS 7 from nodepool, we can stop mirroring
packages for it. This deletes official CentOS 7 package mirror content
and OBS packages mirrored by the OpenSUSE mirror script for CentOS 7.

A followup change will remove the OpenSUSE mirroring entirely as this
was the last thing it was used for.

Change-Id: I484651b0845eaab933e98106684e0a2a6215b3d7
2024-03-15 15:30:46 -07:00
Jeremy Stanley 68af2b31d4 Deduplicate Rackspace control plane API keys
The clouds.yaml and rackdns config files do not need to use two
different Ansible vars to refer to the same credentials. Note that
the forward DNS account is separate, and so we still keep those
intact.

Change-Id: I9dd657f357d32083f2cfd7f074ba0d122ca803c3
2024-03-12 19:17:09 +00:00
Jeremy Stanley 40dddea014 Clean up unused Rackspace password test values
These are no longer needed since we've switched to API keys.

Change-Id: I06aeef0d6ae5f70faab0147dfb591e8d9e53740e
2024-03-07 19:11:16 +00:00
James E. Blair cf73eda44f Switch rackspace clouds to api key auth
After this merges, the temporary credential set opendevci_rax_*
and opendevzuul_rax_* can be removed from hostvars.

Depends-On: https://review.opendev.org/911163
Change-Id: I2e9067aa2f11100d311c86beb4df5bf15c72db69
2024-03-07 09:05:12 -08:00
Jeremy Stanley 601e4a4a55 Transition to Rackspace API keys
Rackspace is requiring multi-factor authentication for all users
beginning 2024-03-26. Enabling MFA on our accounts will immediately
render password-based authentication inoperable for the API. In
preparation for this switch, add new cloud entries for the provider
which authenticate by API key so that we can test and move more
smoothly between the two while we work out any unanticipated kinks.

Change-Id: I787df458aa048ad80e246128085b252bb5888285
2024-03-05 19:31:09 +00:00
Clark Boylan 7ad66ad0cf Upgrade Refstack's MariaDB to 10.11
We are currently running MariaDB 10.4 for refstack. We use the
MARIADB_AUTO_UPGRADE flag to automatically upgrade the mariadb install
to 10.11 when switching the image version over to 10.11. This was
successfully performed against the lodgeit paste service.

Change-Id: I75262bc8eba3dd59d5869be9bf568fd66dc7f608
2024-03-04 13:27:20 -08:00
Zuul cc8011fe14 Merge "Remove debian buster package mirrors" 2024-02-29 17:18:48 +00:00
Zuul 3cb82cc7ac Merge "Upgrade the lodgeit mariadb to 10.11" 2024-02-28 20:05:13 +00:00
Alfredo Moralejo e70c0a0402 Exclude CentOS automotive SIGs repos from mirror synchronization
These repos are produced by the Automotive SIG [1]. They are not used
by OpenStack and needlessly increase the size of the centos stream
repositories.

[1] https://sigs.centos.org/automotive/

Change-Id: I8a12956aa2079ce851ad0bb5ff60f49677f5b7d3
2024-02-26 13:42:18 +01:00
Clark Boylan 7136db339e Remove debian buster package mirrors
We have successfully removed debian buster from nodepool and zuul at
this point. The last major TODO in debian buster cleanup is to remove it
from our package mirrors. This change is the first step in making that
happen.

For step two we follow the manual process documented in our reprepro
docs [0] for cleaning up mirror components. We will need to perform
these actions against the debian, debian security, and ceph octopus
mirrors.

[0] https://docs.opendev.org/opendev/system-config/latest/reprepro.html#removing-components

Depends-On: https://review.opendev.org/c/openstack/project-config/+/910031
Change-Id: Ic1fc6a45cb7f644d7862312589254b6100e17222
2024-02-23 13:27:17 -08:00
Clark Boylan 8ec8ee66b7 Stop mirroring OpenSUSE Leap 15
This change updates the opensuse mirror script to stop mirroring
opensuse 15. However, we do not entirely remove the opensuse mirroring
script as it is currently mirroring some centos 7 packages from OBS for
kolla. We will clean this up more fully when we remove centos 7.

Depends-On: https://review.opendev.org/c/openstack/project-config/+/909776
Change-Id: I0c3546b79219180b796ca02fa8d82dba2316878a
2024-02-21 09:45:21 -08:00
Clark Boylan 7526de2410 Upgrade the lodgeit mariadb to 10.11
I have tested this upgrade on a held node going straight from 10.4 to
10.11 in one go. The resulting logs can be found in this paste [0].

The resulting backups of system tables are small enough that it seems
reasonable to keep those enabled (though they can be disabled). Also, we
can either land this change and let docker-compose do the upgrade for
us, or we can put the host in the emergency file, do the upgrade by
hand, then merge this change to reflect the new state of the world.
One advantage to doing this by hand is that we can manually run a db
backup with the service turned off to avoid any lost data between the
time the upgrade occurs and the time of our last backup should anything
go wrong.

In either case we should probably double check that db backups look good
in borg before proceeding. Comments on approach are very much welcome.

[0] https://paste.opendev.org/show/bWhZZH97IMLv44eeiWlB/

Change-Id: I1bfcaeb9b90838a80d002732215f45a14a158fed
2024-02-20 14:25:42 -08:00
Zuul 380f64ce07 Merge "Update Zuul auth config for new Keycloak images" 2024-02-13 20:11:06 +00:00
Jeremy Stanley 9ca359a843 Increase Jaeger start timeout to 300
Our deployment tasks wait for Jaeger to be listening on its network
socket, but storage-related delays and slowdowns can sometimes cause
it to take longer than the 120 seconds we budgeted. Increase this to
300 seconds so we can be sure we've given it plenty of time to sort
that out.
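As a rough illustration of what waiting for the socket involves, here is a Python sketch that polls a TCP port until it accepts connections or a deadline passes. The real deployment tasks are Ansible and are not reproduced here; the function name and polling interval are assumptions, and only the 300 second budget comes from this change.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 300.0, interval: float = 1.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout elapses.

    The 300 second default mirrors the new Jaeger budget; everything else
    (name, interval) is a hypothetical choice for this sketch.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the socket.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: give up once the budget is spent.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A larger timeout only changes how long the loop keeps retrying; a service that comes up quickly still returns on the first successful connect.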

Change-Id: I4eaffe2d00fca8b9c10ed9235583fca671413dab
2024-02-12 22:45:39 +00:00
Jeremy Stanley f1ad3c5198 Add backups for the new Keycloak server
We should really be backing this up before it begins to get used by
additional services. Also, since our newer deployment uses a
separate RDBMS, back that up safely.

Change-Id: I4510dd05204f4b0f450d1925ed7be148d7d73e6e
2024-02-09 17:35:02 +00:00
Jeremy Stanley 38e2a00a5b Update Zuul auth config for new Keycloak images
The newer Quarkus-based Keycloak container images no longer include
an "auth/" prefix to all the URL paths by default. Rather than alter
the Keycloak deployment, switch Zuul configuration to use the new
default instead.

Change-Id: I9f7f52e80c39c8bd41c728bf9e2b38dcece29978
2024-02-09 17:34:21 +00:00
Zuul 606229382f Merge "Upgrade to Keycloak 23.0" 2024-02-08 15:09:50 +00:00
Alfredo Moralejo 7586728faf Use centos hosted mirror to sync CentOS content
Instead of using external public mirrors, the CentOS team is allowing
openinfra to sync to the core repo mirrors which should provide a more
reliable source for the repos [1].

In case of issues with the repos, the official way to contact CentOS
team is via a ticket to the centos-infra tracker [2].

[1] https://pagure.io/centos-infra/issue/1354
[2] https://pagure.io/centos-infra/issues

Change-Id: Iec8c664a35157de4527c2b83723f3947af959756
2024-02-07 12:53:48 +01:00
Jeremy Stanley f477e35561 Upgrade to Keycloak 23.0
This includes a switch from the "legacy" style Wildfly-based image
to a new setup using Quarkus.

Because Keycloak maintainers consider H2 databases a test/dev-only
option, there are no good migration and upgrade paths short of
exporting and importing the data. Go ahead and change our deployment
model to rely on a proper RDBMS, run locally from a container on the
same server.

Change-Id: I01f8045563e9f6db6168b92c5a868b8095c0d97b
2024-02-06 05:33:37 +00:00
Zuul af14ca1aba Merge "Increase gitea db connection limit" 2024-02-05 23:21:42 +00:00
Clark Boylan dbe477b205 Increase gitea db connection limit
By default our mariadb database for gitea nodes limits itself to a
maximum of 100 connections. We've seen errors like this:

 ...eb/routing/logger.go:102:func1() [I] router: completed POST /openstack/requirements/git-upload-pack for 127.0.0.1:50562, 500 Internal Server Error in 2.6ms @ context/user.go:17(web.gitHTTPRouters.UserAssignmentWeb)
 ...ules/context/repo.go:467:RepoAssignment() [E] GetUserByName: Error 1040: Too many connections

After reading gitea's source code, this appears to be related to user
lookups that determine whether the user making a request against a repo
owns the repo. To do this gitea makes a db request to look up the user
from the request, and when this hits the connection limit it bubbles up
the mysql "Error 1040: Too many connections" error.

This problem seems infrequent, so we double the limit to 200, which is
much larger but still a reasonable number.

We also modify the test that checks for gitea server errors without an
http 500 return code to avoid it matching this change improperly. This
was happening because the commit message ends up in the rendered pages
for system-config in the test gitea.

Change-Id: If8c72ab277e88ae09a44a64a1571f94e43df23f8
2024-02-05 10:40:03 -08:00
yatinkarel 7ac488d8f4 [centos-stream] Exclude altimages from SIGs
These consist of some iso files which are not
used, and rsync also fails[1] for them, so let's
exclude them.
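The exact exclude pattern used by the mirror script isn't shown here, so the pattern below is a hypothetical one. This Python sketch builds an rsync argument list with an `--exclude` rule that skips any directory named `altimages` at any depth of the transfer.

```python
# Hypothetical exclude list: the real mirror script's patterns are not shown here.
# In rsync, a pattern with a trailing "/" and no other "/" matches any
# directory with that name at any depth of the transfer.
EXCLUDES = ["altimages/"]


def build_rsync_cmd(source: str, dest: str, excludes: list) -> list:
    """Return an rsync argument list that skips paths matching the excludes."""
    cmd = ["rsync", "--archive", "--delete"]
    for pattern in excludes:
        cmd.append(f"--exclude={pattern}")
    cmd.extend([source, dest])
    return cmd
```

The list can be handed to `subprocess.run(..., check=True)`. Note that plain `--delete` leaves already-mirrored excluded files in place; rsync only removes those with `--delete-excluded`.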

[1]
rsync: read error: Connection timed out (110)
rsync error: error in socket IO (code 10) at io.c(801) [receiver=3.1.3]

Change-Id: I3accc16ff8a1e71e499a09e7aae6625e3a183a12
2024-02-05 11:28:11 +05:30
Jeremy Stanley 2891745508 Revert "Switch from legacy to new style keycloak container"
The image change switches from Wildfly to Quarkus, which seems to
come with undocumented impact to H2 databases because Keycloak
maintainers consider that "for development purposes only" and not to
be used in production.

When reintroducing this change, we'll include an actual RDBMS in
order to ease future upgrade work.

Retain the added test that exercises the admin credentials and API,
but adjust it back to the path used by the legacy image.

This reverts commit fb47277a56.

Change-Id: I0908490cea852853f086e594a816343edaf6a454
2024-01-29 20:37:33 +00:00
Zuul d4c209e7a4 Merge "Switch from legacy to new style keycloak container" 2024-01-26 22:07:44 +00:00
Jeremy Stanley fb47277a56 Switch from legacy to new style keycloak container
When moving from DockerHub to Quay in 2022, we had to specify the
legacy container tag because something also changed with the images
themselves at that time in such a way that they no longer worked
with our configs. The legacy images ceased being updated past v19,
so specify the 19.0 tag in order to match the major version we're
running in production, and work through the necessary container
config changes before resuming upgrades to a more current version.

Change-Id: I5bf587fe3d8327c17d71908104c0896f8baf0973
2024-01-25 19:44:27 +00:00
Clark Boylan 7d13452ae9 Disable gitea's update checker cron job
This cron job is part of gitea's extended cron job list, whose jobs are
supposed to be disabled by default. This one is nonetheless enabled by
default and fetches some json data to determine if the current version
of gitea is the latest.

We don't need gitea phoning home to check if it is up to date because we
can (and some of us do) subscribe to github release notifications for
gitea over email. This is better as it isn't a poll that will be
ignored.

Change-Id: Icae1136fbb17a29996e2fa0dbea4b874ae4850dd
2024-01-08 10:28:42 -08:00
Clark Boylan 331ca64055 Enable gitea delete_repo_archives cron job
This enables an undocumented (yay) gitea cron job which will clean up
all repo archives once a week on Sunday at 03:00. We need it because the
default cronjob which clears out older repo archives hasn't kept up, and
we ran out of disk on gitea09. By running this cronjob weekly we'll
hopefully keep disk usage in check.

Note we run this at 03:00 because gitea runs a number of cron jobs daily
at midnight. This offset of a few hours should help avoid conflicts
between cron jobs. A held node confirms that running this cronjob at
midnight results in failures due to conflicts with the cleanup of old
archives cron job.
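To make the timing concrete, here is a small Python sketch that computes the next weekly run at Sunday 03:00, three hours after gitea's midnight batch of daily jobs. It only illustrates the schedule arithmetic; gitea's own cron configuration syntax is not shown here, and the function name is hypothetical.

```python
from datetime import datetime, timedelta


def next_weekly_run(now: datetime, weekday: int = 6, hour: int = 3) -> datetime:
    """Return the first datetime strictly after `now` on `weekday` at hour:00.

    weekday follows datetime.weekday(): Monday=0 ... Sunday=6.
    """
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    # Move forward to the requested day of the week (0 to 6 days ahead).
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    # If that slot is not in the future, the next one is a week later.
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate
```

For example, from Monday 2024-01-08 09:32 the next run is Sunday 2024-01-14 03:00.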

Change-Id: Ib4085a4df220f9c312592e299e7274635434e761
2024-01-08 09:32:40 -08:00
Zuul 8734fa7c6e Merge "Add hints to borg backup error logging" 2024-01-05 02:28:14 +00:00
Zuul cd74ebc2bf Merge "Upgrade to etherpad 1.9.5" 2024-01-04 06:55:08 +00:00
Clark Boylan a0089cfac6 Upgrade to etherpad 1.9.5
This bumps etherpad to 1.9.5. The changelog is minimal for this update,
but upstream switches to nodejs 20 by default so we make the same update
here. We also remove TidyHTML configs from our configs to match upstream
updates that did the same thing. Complete release notes can be found
here:

  https://github.com/ether/etherpad-lite/blob/v1.9.5/CHANGELOG.md

We should hold a node and test functionality before merging this change.

Change-Id: Ib6cd888f35624490f630e091f184946e9c4e48aa
2024-01-02 08:41:39 -08:00
Zuul f7217bed63 Merge "Temporarily pin Grafana to 10.2.2" 2024-01-02 08:02:46 +00:00
Jeremy Stanley 5921ce487b Temporarily pin Grafana to 10.2.2
Switch from the "latest" grafana-oss image to 10.2.2 for now due to
provider-specific Nodepool dashboards coming up "no data" after
upgrading. We can revert this once the cause has been addressed.

Change-Id: Ic13b12212e6063231de5f993fe01ca9e641555f7
2023-12-31 19:15:56 +00:00
Ian Wienand 86dfc9ee25 graphite: add grafana header to CORS allowed list
Honestly, we're not 100% sure where this CORS setup came from; it's
always been like this. grafana recently added a header it sends to
graphite [1] (found by frickler) which we need to white-list so it
works with modern grafana.

[1] 1a281ac49d (diff-70030faa245250908d55db47258ce505d474db1559995683f8df1a951504236fR24)

Change-Id: I21570bd1a350e430bc04a14ad9d2cb1bf652d021
2023-12-31 14:28:49 +00:00
Jeremy Stanley f4a20b0502 Downgrade haproxy image from latest to lts
Starting with the automated update to the haproxy 2.9.1 image at
04:00 today, we noticed the service immediately spiking up to 100%
CPU and quickly filling its session table. Downgrading from the
latest tag to lts (currently 2.8.5) appears to have solved it for
now. This might be https://github.com/haproxy/haproxy/issues/2393 .

Change-Id: I3085e7921f43665118678a660d777601f08debd3
2023-12-20 13:41:53 +00:00
Clark Boylan af653f3371 Pin py docker when installing docker-compose
Py docker 7.0.0 introduced an incompatibility with old python
docker-compose. Pin it to the older version to ensure compatibility.
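The pin itself lives in the install tasks, which aren't shown in this log; a pip specifier such as `docker<7` is one way to express it. As a hedged Python sketch, the check below verifies that the installed docker SDK (the `docker` distribution on PyPI) predates 7.0.0. The helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def docker_sdk_is_compatible(installed: str) -> bool:
    """Return True if a docker SDK version string predates the 7.0.0 release.

    Only the major component is compared, which is all a "<7" style pin needs.
    """
    return int(installed.split(".")[0]) < 7


def installed_docker_sdk_ok() -> Optional[bool]:
    """Check the locally installed "docker" distribution, or None if absent."""
    try:
        return docker_sdk_is_compatible(version("docker"))
    except PackageNotFoundError:
        return None
```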

Notes on the change can be found here:
  https://stackoverflow.com/questions/77641240/getting-docker-compose-typeerror-kwargs-from-env-got-an-unexpected-keyword-ar

And you can see our jobs installing the wrong version here and then
failing later:
  https://zuul.opendev.org/t/openstack/build/6575c9d7fb56463fa6e97b2a12d6f389/log/job-output.txt#16116-16117

Change-Id: I7ecf7c43b762855881c137e218adcc51e3a32444
2023-12-15 09:33:28 -08:00
Clark Boylan dda97505d8 Add hints to borg backup error logging
I spent a number of hours recently trying to debug why mysqldump stopped
working. In reality the problem appears to have been borg-backup's ssh
connectivity, which prevented mysqldump from streaming its output
properly. Add a logging note about checking the other half of backup
streaming when the streaming script fails, to make it more obvious that
this may occur.

Change-Id: Idf53fc9e61f05077b954d730d88beab0cc1db09b
2023-12-11 08:42:04 -08:00
Clark Boylan 4d25261bb6 Force borg backups to run over ipv4
We've recently been unable to backup from gitea09 to the vexxhost backup
server. Testing indicates that ipv6 connectivity between the two servers
is the likely issue. Address this by forcing all backups to run over
ipv4 instead of ipv6. We could restrict this to only gitea09 if we
wanted to and/or only when the vexxhost server is the target, but this
is the simplest way to make the change in the existing configuration
management.
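This log doesn't show how the backup jobs force IPv4. Since borg reaches its server over ssh, one plausible mechanism is OpenSSH's `-4` flag (equivalently `AddressFamily inet` in ssh_config), passed through borg's `BORG_RSH` environment variable; treat that as an assumption. The Python sketch below shows the underlying idea of restricting name resolution to IPv4; the function name is hypothetical.

```python
import socket


def resolve_ipv4_only(host: str, port: int) -> list:
    """Resolve host to its IPv4 addresses only, ignoring any AAAA records.

    This is the same restriction ssh's -4 option applies before connecting.
    """
    infos = socket.getaddrinfo(host, port, family=socket.AF_INET, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); for AF_INET
    # the sockaddr is an (address, port) tuple.
    return sorted({sockaddr[0] for _, _, _, _, sockaddr in infos})
```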

Change-Id: Ic868ded7d923b822d757a57416f879fd59c003e9
2023-12-11 08:32:14 -08:00
Zuul 6ba06de8e4 Merge "Add gerrit 3.8 to 3.9 upgrade testing" 2023-12-07 17:59:08 +00:00
Zuul 21fd65e03f Merge "Add gerrit 3.9 image builds" 2023-12-07 17:59:06 +00:00
Zuul c6b985eb7a Merge "Reapply "Switch Gerrit replication to a larger RSA key"" 2023-12-06 19:19:43 +00:00
Clark Boylan 70589a5a05 Reapply "Switch Gerrit replication to a larger RSA key"
This reverts commit d346d5375f.

We make small edits to the .ssh/config file to make MINA ssh client
happy. In particular we need to use the path to the ssh key within the
Gerrit container and not on the host side.

This exact .ssh/config file has been tested on held nodes and appears
to properly replicate from a test gerrit99 to a test gitea99 after
adding the pubkey to gerrit and accepting the hostkey for gitea on the
gerrit side.

Change-Id: I41caac08f6713ad385c98eea46fb004a414fab5d
2023-12-06 09:02:17 -08:00
Jeremy Stanley b2c9c2f1e8 Increase jaeger startup timeout
Recently, jaeger has started taking around 80 seconds to listen on
its socket, resulting in deployment job failures. Double the timeout
to 120 seconds.

Change-Id: I1c53ba1a9282309d3f1f772221a5bff69f04d134
2023-12-06 16:42:50 +00:00
Jeremy Stanley 9876445f0a Switch install-docker playbook to include_tasks
The old include directive is deprecated in ansible-core 2.15 and
will cease to be recognized in 2.16 according to
https://docs.ansible.com/ansible-core/2.15/user_guide/playbooks_reuse_includes.html
so switch the two remaining uses of it in this repository to
include_tasks instead.

Change-Id: I427e8aa8dd789f13b8501806ec175951db337fec
2023-12-06 13:55:50 +00:00