We've noticed that our uwsgi queues are filling up and a lot of requests
are being made to robots.txt, which end up erroring with 500/503. Add a
robots.txt file which allows crawling of our lists and archives with a
delay value in hopes this will cause bots to cache results and not fill
up the queue with repetitive requests.
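A minimal example of the kind of file this adds (the paths and delay
value here are illustrative, not the exact contents):

```
User-agent: *
Allow: /
Crawl-delay: 2
```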
Change-Id: I660d8d43f6b2d96663212d93ec48e67d86e9e761
Ansible Galaxy appears to be served behind cloudflare and is currently
throwing 504 errors waiting for the backend to respond on /api/
requests. Since we know this is already not working, and it is
preventing other changes from landing, let's just go ahead and
completely disable the testing of this proxy.
We can always add the test back if and when effort is made to proxy the
new version of galaxy's api.
Change-Id: Iad35fc19ce4f8bb8ec3c57eb995b44fc5e61a06c
This should clean up our mirror update server so that we no longer have
configs (cron, scripts, logrotate rules, etc) for mirroring opensuse.
It won't clean up the afs volume, but we can get to that later (and it
will probably require manual intervention). This cleanup is done in a
way that it should be able to be applied to future cleanups too (like
when centos 8 stream goes away and everything is centos stream
specific).
Change-Id: Ib5d15ce800ff0620187345e1cfec0b7b5d65bee5
We are currently running MariaDB 10.4 for etherpad. We use the
MARIADB_AUTO_UPGRADE flag to automatically upgrade the mariadb install
to 10.11 when switching the image version over to 10.11. This was
successfully performed against the lodgeit paste service.
Change-Id: Id7dae260f3611fc1f88858730567455fef782b1c
Trivial cleanup of some variable name copy-paste I overlooked,
making the source code for the test clearer.
Change-Id: I5a15e0733b3cf2ceb26f46a2f3d9a9f059d4f702
This includes a switch from the "legacy" style Wildfly-based image
to a new setup using Quarkus.
Because Keycloak maintainers consider H2 databases a test/dev-only
option, there are no good migration and upgrade paths short of an
export/import of the data. Go ahead and change our deployment model to
rely on a proper RDBMS, run locally from a container on the same server.
Change-Id: I01f8045563e9f6db6168b92c5a868b8095c0d97b
By default our mariadb database for gitea nodes limits itself to a
maximum of 100 connections. We've seen errors like this:
...eb/routing/logger.go:102:func1() [I] router: completed POST /openstack/requirements/git-upload-pack for 127.0.0.1:50562, 500 Internal Server Error in 2.6ms @ context/user.go:17(web.gitHTTPRouters.UserAssignmentWeb)
...ules/context/repo.go:467:RepoAssignment() [E] GetUserByName: Error 1040: Too many connections
And after reading gitea's source code this appears to be related to
user lookups that determine whether the user making a request against
a repo owns the repo. To do this gitea performs a db lookup of the
user from the request, and when this hits the connection limit it
bubbles up mysql error 1040: Too many connections.
This problem seems infrequent, so we double the limit to 200, which is
much larger but still a reasonable number.
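For reference, the sort of server option involved (the exact config
file and section in our deployment may differ):

```ini
[mysqld]
max_connections = 200
```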
We also modify the test that checks for gitea server errors without an
http 500 return code to avoid it matching this change improperly. This
was happening because the commit message ends up in the rendered pages
for system-config in the test gitea.
Change-Id: If8c72ab277e88ae09a44a64a1571f94e43df23f8
The image change switches from Wildfly to Quarkus, which seems to
come with undocumented impact to H2 databases because Keycloak
maintainers consider that "for development purposes only" and not to
be used in production.
When reintroducing this change, we'll include an actual RDBMS in
order to ease future upgrade work.
Retain the added test that exercises the admin credentials and API,
but adjust it back to the path used by the legacy image.
This reverts commit fb47277a56.
Change-Id: I0908490cea852853f086e594a816343edaf6a454
When moving from DockerHub to Quay in 2022, we had to specify the
legacy container tag because something also changed with the images
themselves at that time in such a way that they no longer worked
with our configs. The legacy images ceased being updated past v19,
so specify the 19.0 tag in order to match the major version we're
running in production, and work through the necessary container
config changes before resuming upgrades to a more current version.
Change-Id: I5bf587fe3d8327c17d71908104c0896f8baf0973
When manually testing the gitea 1.21.3 upgrade tonyb discovered "500"
errors on the code search page. The http side reported all 2XX response
codes but the page rendered a giant 500. Turns out the problem was in
template rendering which produces the giant 500 in the page but doesn't
necessarily send a 500 http error code.
Test for this automatically on a number of pages by inspecting the page
content for 500 status page content.
Note this is somewhat fragile because they could change the template
content at any time, but it seems better to do this than do nothing at
all.
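A sketch of the content check; the marker strings are assumptions
about Gitea's error template and may need updating if upstream changes
the template:

```python
def looks_like_rendered_500(body: str) -> bool:
    """Heuristic: detect Gitea's 500 error template rendered into an
    otherwise-2XX HTTP response. Marker text is an assumption based on
    the template content, not a stable API."""
    return "500" in body and "Internal Server Error" in body


# A healthy page passes; a rendered error template is caught even
# though the HTTP status code was 200.
assert not looks_like_rendered_500("<html><title>Code Search</title></html>")
assert looks_like_rendered_500("<html><h1>500 Internal Server Error</h1></html>")
```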
Change-Id: I1964be7be87ef5a6e75c6639a4d75d9090a14db8
Upgrade Gitea to 1.21.3. The changelogs for this release can be found
here:
https://github.com/go-gitea/gitea/blob/v1.21.3/CHANGELOG.md
I have attempted to collect the interesting bits in this commit message
as well as information on why we do or don't make changes to address
these items.
1.21.0
* BREAKING
* Restrict certificate type for builtin SSH server (https://github.com/go-gitea/gitea/pull/26789)
* We don't use the builtin SSH server and don't use certificates
for auth. Nothing to do here.
* Refactor to use urfave/cli/v2 (https://github.com/go-gitea/gitea/pull/25959)
* The major change here updated `gitea` to stop accepting
`gitea web`'s command options. Our dockerfile is set up to use
`CMD ["/usr/local/bin/gitea", "web"]` so we are not affected.
* Move public asset files to the proper directory (https://github.com/go-gitea/gitea/pull/25907)
* We update the testinfra test for robots.txt to more robustly
check file contents. Previously it checked a very generic
prefix which may indicate a generic file being served.
* We move custom/public/img into custom/public/assets/img.
Screenshots should be used to confirm this works as expected.
* Remove commit status running and warning to align GitHub (https://github.com/go-gitea/gitea/pull/25839)
(partially reverted: Restore warning commit status (https://github.com/go-gitea/gitea/pull/27504) (https://github.com/go-gitea/gitea/pull/27529))
* We don't rely on commit statuses as this is a read only replica
of Gerrit.
* Remove "CHARSET" config option for MySQL, always use "utf8mb4" (https://github.com/go-gitea/gitea/pull/25413)
* We don't set [database].CHARSET. Doesn't affect us.
* Set SSH_AUTHORIZED_KEYS_BACKUP to false (https://github.com/go-gitea/gitea/pull/25412)
* We don't set this value explicitly so the default will flip from
true to false for us. I don't think this is an issue because we
keep track of our pubkeys in git.
* SECURITY
* Dont leak private users via extensions (https://github.com/go-gitea/gitea/pull/28023) (https://github.com/go-gitea/gitea/pull/28029)
* We don't use private users.
* Expanded minimum RSA Keylength to 3072 (https://github.com/go-gitea/gitea/pull/26604)
* We have rotated the keys used to replicate from gerrit to gitea to
work around this. Now our keys are long enough to make gitea
happy.
* BUILD
* Dockerfile small refactor (https://github.com/go-gitea/gitea/pull/27757) (https://github.com/go-gitea/gitea/pull/27826)
* I've updated our Dockerfile to mimic these changes. Comment
whitespace as well as how things are copied and chmoded in the
build image have been updated.
* TODO the file copies aren't working for us. I think this is due to
how we ultimately clone the git repo. We use RUN but upstream is
using COPY against the local build dir. I've aligned as best as I
can, but we should see if we can do a similar COPY on our end.
* Fix build errors on BSD (in BSDMakefile) (#27594) (#27608)
* We don't run on BSD.
* Fully replace drone with actions (#27556) (#27575)
* This is how upstream builds their images. Doesn't affect our
builds.
* Enable markdownlint no-duplicate-header (#27500) (#27506)
* Build time linters are something we don't care too much about on
our end.
* Enable production source maps for index.js, fix CSS sourcemaps (https://github.com/go-gitea/gitea/pull/27291) (https://github.com/go-gitea/gitea/pull/27295)
* This emits a source map for index.js which can be used for in
browser debugging. Don't think this is anything we need to take
action on.
* Update snap package (#27021)
* We don't use a snap package.
* Bump go to 1.21 (https://github.com/go-gitea/gitea/pull/26608)
* Our go version is updated in the Dockerfile.
* Bump xgo to go-1.21.x and node to 20 in release-version (https://github.com/go-gitea/gitea/pull/26589)
* Our node version is updated in the Dockerfile.
* Add template linting via djlint (#25212)
* Build time linters are something we don't care too much about on
our end.
1.21.1
* SECURITY
* Fix comment permissions (https://github.com/go-gitea/gitea/pull/28213) (https://github.com/go-gitea/gitea/pull/28216)
* This affects disclosure of private repo content. We don't have
private repos so shouldn't be affected.
1.21.2
* SECURITY
* Rebuild with recently released golang version
* We'll automatically rebuild with newer golang too.
* Fix missing check (https://github.com/go-gitea/gitea/pull/28406) (https://github.com/go-gitea/gitea/pull/28411)
* There is minimal info here but it appears to be related to
issues. We don't use issues so shouldn't affect us.
* Do some missing checks (https://github.com/go-gitea/gitea/pull/28423) (https://github.com/go-gitea/gitea/pull/28432)
* There is minimal info here but it appears to be related to
checks around private repos. We don't use private repos so this
shouldn't affect us.
1.21.3
* SECURITY
* Update golang.org/x/crypto (https://github.com/go-gitea/gitea/pull/28519)
* This addresses recent concerns found in ssh for gitea's built in
ssh implementation. We use openssh as provided by debian so will
rely on our distro to provide fixes.
Finally 1.21.x broke rendering of code search templates. The issue is
here: https://github.com/go-gitea/gitea/issues/28607. To address this
I've vendored the two fixed template files
(https://github.com/go-gitea/gitea/pull/28576/files) into our custom
template dirs. Once upstream makes a release with these fixes we can
drop the custom files entirely as we don't override anything special in
them.
Change-Id: Id714826a9bc7682403afcf90f2761db8c84eacbf
With the haproxy 2.9.x release we saw that once we hit maxconns on the
front end we stopped accepting new requests. Add a testinfra test to
ensure that we can process more than maxconn limit requests.
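The gist of that testinfra check can be sketched as follows; the real
test drives requests against the haproxy frontend, while here the
request function is injected so the counting logic stands alone (names
are illustrative):

```python
def count_successes(fetch, attempts):
    """Issue `attempts` sequential requests via the injected `fetch`
    callable and return how many completed without error."""
    ok = 0
    for _ in range(attempts):
        try:
            fetch()
            ok += 1
        except Exception:
            pass
    return ok


def always_fails():
    raise ConnectionError("frontend stopped accepting connections")


# Sequential requests never hold maxconn connections open at once, so
# issuing more than the 200-connection limit should all succeed on a
# healthy frontend; the regression showed up as refused connections.
assert count_successes(lambda: None, 250) == 250
assert count_successes(always_fails, 10) == 0
```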
Change-Id: Iad70ad8c1d511eb8875ea638d010868c01576426
At some stage in the last 24 hours it looks like galaxy.ansible.com
changed and the current canary we look for, "Ansible NG", is no longer
present in the returned HTML:
$ curl -s https://galaxy.ansible.com/ ; echo
<!doctype html><html lang="en-US"><head><meta charset="UTF-8"><title>Ansible Galaxy</title><link rel="icon" href="/favicon.ico"><script defer="defer" src="/js/App.9bfa0e2736606eaddfe9.js"></script><link href="/css/App.4aff5598f0220c63b019.css" rel="stylesheet"></head><body><div id="root"></div></body></html>
$ curl -s https://mirror01.dfw.rax.opendev.org:4448/ ; echo
<!doctype html><html lang="en-US"><head><meta charset="UTF-8"><title>Ansible Galaxy</title><link rel="icon" href="/favicon.ico"><script defer="defer" src="/js/App.9bfa0e2736606eaddfe9.js"></script><link href="/css/App.4aff5598f0220c63b019.css" rel="stylesheet"></head><body><div id="root"></div></body></html>
The api however still contains "galaxy_ng_version":
$ curl -s https://galaxy.ansible.com/api/ | jq '.galaxy_ng_version'
"4.10.0dev"
Update testinfra to match the current HTML.
Change-Id: I55431311ef742efdd4aa4304692e5096e1bb2895
This project didn't proceed past the test phase,
let's clean it up.
Revert "Add a functional test for registry.zuul-ci.org"
This reverts commit e701fdd3ca.
Revert "Add testinfra for registry.zuul-ci.org"
This reverts commit e00f4e59b3.
Revert "Add static site for registry.zuul-ci.org"
This reverts commit 31b505d3ba.
Revert "Add SSL cert for registry.zuul-ci.org"
This reverts commit d0a8473d42.
Change-Id: I1d39306187c7b2d7a908389f88d1a60e1b29ffe3
This adds a testinfra test that creates a pad and retrieves its
contents. Additionally we add 4 byte UTF-8 characters to the stream to
ensure the database is configured properly.
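Why 4 byte characters are a good canary, in brief: MySQL/MariaDB's
legacy utf8 charset only stores up to 3 bytes per character, so
anything outside the Basic Multilingual Plane (e.g. emoji) fails
unless utf8mb4 is configured. A quick illustration:

```python
# An emoji is a 4-byte UTF-8 sequence; a misconfigured 3-byte "utf8"
# database charset would reject or mangle it, while utf8mb4 stores it.
pad_text = "etherpad test \N{SMILING FACE WITH OPEN MOUTH}"

# The widest character in the string really is 4 bytes in UTF-8.
assert max(len(ch.encode("utf-8")) for ch in pad_text) == 4

# Round-tripping through UTF-8 is lossless.
assert pad_text.encode("utf-8").decode("utf-8") == pad_text
```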
Change-Id: Ie6855e201631ecf4fde6cda0c65941f094ed55d4
The curl manpage explains that port isn't optional:
--resolve <[+]host:port:addr[,addr]...>
Provide a custom address for a specific host and port pair. Using
this, you can make the curl requests(s) use a specified address and
prevent the otherwise normally resolved address to be used. Consider
it a sort of /etc/hosts alternative provided on the command line.
The port number should be the number used for the specific protocol
the host will be used for. It means you need several entries if you
want to provide address for the same host but different ports.
Change-Id: I40117768bbc149678a69905a8f6ecd3519301ce1
Clean up references to lists.openstack.org other than as a virtual
host on the new lists01.opendev.org Mailman v3 server. Update a few
stale references to the old openstack-infra mailing list (and
accompanying stale references to the OpenStack Foundation and
OpenStack Infra team). Update our mailing list service documentation
to reflect the new system rather than the old one. Once this change
merges, we can create an archival image of the old server and delete
it (as well as removing it from our emergency skip list for
Ansible).
Side note, the lists.openstack.org server will be 11.5 years old on
November 1, created 2012-05-01 21:14:53 UTC. Farewell, old friend!
Change-Id: I54eddbaaddc7c88bdea8a1dbc88f27108c223239
Previously we were checking that myid was written to disk in the
expected location. However, it is possible that zk would stop looking in
that location for the myid value. To ensure we actually set the value in
the running service check the logs for the [myid:4] string.
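The log check can be sketched like this (the exact log line format is
an assumption based on ZooKeeper's logging prefix):

```python
import re


def myid_from_logs(log_text):
    """Extract the myid value ZooKeeper reports in its log lines,
    e.g. '... [myid:4] - INFO ...'. Returns None if no such line is
    present, meaning the running service never picked up an id."""
    m = re.search(r"\[myid:(\d+)\]", log_text)
    return int(m.group(1)) if m else None


assert myid_from_logs("2024-01-01 [myid:4] - INFO - binding to port") == 4
assert myid_from_logs("no id reported here") is None
```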
Change-Id: Iee3b126abac13e19dab9ddf4c64ed133d0a98956
Previously we checked that "Ansible Galaxy" shows up in the html result
when requesting the root of the Galaxy proxy. This now fails, and
looking at the fetched results the title of the page is "Galaxy NG".
Update our test to check for "Galaxy NG" instead.
Additionally our content checks of actual collections are affected by an
api bump from v2 to v3. Among other things this appears to be a
completely new implementation that does not have backward compatible
support for v2 and may require authentication to use. I've commented out
our old test content for the content checks and someone will need to fix
this later.
Change-Id: I6b17eea82ac95200ba5069de74e9a7dc30d6fed8
This uncomments the list additions for the lists.openinfra.dev and
lists.starlingx.io sites on the new mailman server, removing the
configuration for them from the lists.openstack.org server and also
cleaning up some benign entries which were missed in the previous
migration change. With this, the old server should only be hosting
specifically lists.openstack.org mailing lists.
Change-Id: I1e2d332cd4addb8970a3759157bbeceddd77ea95
This uncomments the list additions for the lists.airshipit.org and
lists.katacontainers.io sites on the new mailman server, removing
the configuration for them from the lists.opendev.org server and, in
the case of the latter, removing all our configuration management
for the server as it was the only site hosted there.
Change-Id: Ic1c735469583e922313797f709182f960e691efc
This adds a second registry host. We will remove the other once we've
cut over successfully (should just depend on a DNS update).
Depends-On: https://review.opendev.org/c/opendev/zone-opendev.org/+/886874
Change-Id: Ib6be5ef242ed038c23e0007488f2c21ce10f4fcb
The testinfra_hosts didn't match any hosts under test so the
host.is_listening() check was never run. Skip that test for now, but
add a new test to verify that docker-compose has at least defined the
registry container and is restarting it to keep it "Up"
Change-Id: Ic8d3c7833dd0924fc8a7eb4cbd776cf488d0f928
zp01 is older and is in the list of servers to replace. Replace it with
the newly launched jammy zp02 server.
Change-Id: Ic01433e96328f5775f73a97cdbf2ae2a07c1a6fa
Switch the DNS testing names to "99" which helps disambiguate testing
from production, and makes you think harder about ensuring references
are abstracted properly.
The LE zone gets installed on the hidden primary, so it should just
use the inventory_hostname rather than hard-coding. Instead of
hard-coding the secondaries, we grab them from the secondary DNS
group. This should allow us to start up replacement DNS servers which
will be inactive until they are enabled for the domain.
This requires an update to the LE job, as it currently doesn't have a
secondary nameserver as part of the nodes. This means the
"adns-secondary" group is blank there. Even though this node isn't
doing anything, I think it's worth adding to cover this path (I did
consider some sort of dummy host add type thing, but that just makes
things hard to follow). We also use the 99 suffix in that job just
for consistency.
Change-Id: I1a4be41b70180deab51a3cc8a2b3e83ffd0ff1dc
This switches us to running the services against the etherpad group. We
also define vars in a group_vars file rather than a host specific
file. This allows us to switch testing over to etherpad99 to decouple it
from our production hostnames.
A followup change will add a new etherpad production server that will be
deployed alongside the existing one. This refactor makes that a bit
simpler.
Change-Id: I838ad31eb74a3abfd02bbfa77c9c2d007d57a3d4
This tests some canary URLs for the registry. Full functional
testing is handled by a test playbook.
Change-Id: I9a19709df6711e5f4dea21906608c34ac3e8a2b4
The mirror in our Limestone Networks donor environment is now
unreachable, but we ceased using this region years ago due to
persistent networking trouble and the admin hasn't been around for
roughly as long, so it's probably time to go ahead and say goodbye
to it.
Change-Id: Ibad440a3e9e5c210c70c14a34bcfec1fb24e07ce
I noticed on our hosts some logrotate files named '*.1234.conf' --
these are coming from callers of logrotate role specifying
'/var/log/program/*.log', where the '*' is turning into a literal
filename. I didn't really consider this case.
Having a filename starting with '*' may technically be fine, but it is
a bad idea for everyone's sanity, and has the potential to foot-gun
some sort of operation that suddenly wipes out a lot more than you
wanted.
Let's just use the hash of the name to be unambiguous and still
idempotent. Make it more git-ish by using the same 7 digits as a
default short-hash.
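Sketch of the naming scheme, assuming a sha1-based short hash; the
"rotate-" prefix is illustrative, not necessarily what the role emits:

```python
import hashlib


def logrotate_config_name(log_path):
    """Derive an unambiguous, idempotent logrotate config filename from
    a (possibly glob-containing) log path: hash the path and keep the
    first 7 hex digits, like a default git short hash."""
    digest = hashlib.sha1(log_path.encode("utf-8")).hexdigest()[:7]
    return "rotate-{}.conf".format(digest)


name = logrotate_config_name("/var/log/program/*.log")
assert len(name) == len("rotate-") + 7 + len(".conf")
assert "*" not in name  # glob characters never reach the filesystem
# Idempotent: the same path always yields the same config name.
assert name == logrotate_config_name("/var/log/program/*.log")
```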
Change-Id: I13d376f85a25a7b8c3a0bc0dcbabd916e8a9774a
The mix of <Location> and ProxyPass [path] <target> led to some issues.
This patch corrects them and makes the config more consistent.
Until now, the last URI was actually an error page from the main galaxy
website. With this change, we now hit the S3 bucket as we should,
allowing ansible-galaxy to download the archive, validate its checksum,
and install the intended collection/role.
This patch was fully tested locally using the httpd container image, a
minimal configuration adding only the needed modules and the
ansible-galaxy vhost/proxy, and running ansible-galaxy directly.
In addition, this patch also improves testing of the proxy, using
cURL up to the point where we actually download the file.
Since ansible galaxy will provide a file under any condition, we also
assert the downloaded file is really what it should be - a plain
archive. If it's a miss on the S3 side, it would be JSON. And if we
get an ansible galaxy answer, that would be an HTML file.
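The assertion amounts to sniffing the leading bytes of the download;
roughly as below (magic-byte checks here are a simplification of what
the actual test does):

```python
def classify_download(first_bytes):
    """Rough classification of what the proxy handed back: a gzipped
    collection tarball (the expected case), a JSON error from an S3
    miss, or an HTML page from galaxy itself."""
    if first_bytes.startswith(b"\x1f\x8b"):
        return "gzip archive"
    stripped = first_bytes.lstrip()
    if stripped.startswith((b"{", b"[")):
        return "json"
    if stripped.lower().startswith((b"<!doctype", b"<html")):
        return "html"
    return "unknown"


assert classify_download(b"\x1f\x8b\x08\x00rest-of-tarball") == "gzip archive"
assert classify_download(b'{"detail": "Not found"}') == "json"
assert classify_download(b"<!doctype html><html>") == "html"
```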
The following commands were used:
Run the container:
podman run --rm --security-opt label=disable \
-v ./httpd.conf:/usr/local/apache2/conf/httpd.conf:ro \
-p 8080:8080 httpd:2.4
Run ansible-galaxy while ensuring we don't rely on its own internal
cache:
rm -rf operator ~/.ansible/galaxy_cache/api.json
ansible-galaxy collection download -vvvvvvv \
-s http://localhost:8080/ -p ./operator tripleo.operator
Then, the following URI were shown in the ansible-galaxy log:
http://localhost:8080/
http://localhost:8080/api
http://localhost:8080/api/v2/collections/tripleo/operator/
http://localhost:8080/api/v2/collections/tripleo/operator/versions/?page_size=100
http://localhost:8080/api/v2/collections/tripleo/operator/versions/0.9.0/
Then, the actual download:
http://localhost:8080/download/tripleo-operator-0.9.0.tar.gz
Then the checksum validation, and eventually it ended with:
Collection 'tripleo.operator:0.9.0' was downloaded successfully
Change-Id: Ibfe846b59bf987df3f533802cb329e15ce83500b
Take the site configuration for lists.opendev.org and
lists.zuul-ci.org off of the old lists.openstack.org server, and
also clean up tests of the same.
Change-Id: Ic6095889c29d8a34def113204052300558f0a77c
ansible-galaxy CLI makes multiple calls to the remote server, with
various API endpoint, and expects JSON containing fully qualified URI
(scheme://host/path), meaning we must inspect the different files and
ensure we're rewriting the content so that it points to the proxy all
the time.
Also, the remote galaxy.ansible.com has some redirects with absolute
paths, breaking for some reason the ProxyPassReverse - this is why we
get yet a new pair of dedicated ports for this proxy (TLS/non-TLS).
Then, there's the protocol issue: since mod_substitute is apparently
unable to take httpd variables such as the REQUEST_SCHEME, we have to
use some If statement in order to ensure we're passing the correct
scheme, being http or https. Note that ansible-galaxy doesn't understand
the "//host/path".
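For illustration, the scheme-dependent substitution described above
might look roughly like this in the vhost; the hostname and ports are
placeholders, not our actual config:

```apache
# Rewrite absolute URIs in the JSON bodies so clients keep talking to
# the proxy; pick the replacement scheme based on the incoming request.
AddOutputFilterByType INFLATE;SUBSTITUTE;DEFLATE application/json
<If "%{REQUEST_SCHEME} == 'https'">
    Substitute "s|https://galaxy.ansible.com|https://PROXY_HOST:8443|n"
</If>
<Else>
    Substitute "s|https://galaxy.ansible.com|http://PROXY_HOST:8080|n"
</Else>
```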
This patch also adds some more tests in order to ensure the API answers
as expected through the proxy.
Change-Id: Icf6f5c83554b51854fabde6e4cc2d646d120c0e9
Currently "openstack" command on bridge doesn't work, because we need
cinder client pinned to an older version for RAX support. The
upstream container uses the latest versions of everything and it fails
to parse the "volume_api_version: 2" pin for RAX in the config file.
In general, the version of openstackclient we can most likely rely on
to work is the one from the launch-node virtualenv. It also
means we just have one place to manage a broadly-compatible version,
instead of trying to manage versions in separate containers, etc.
This converts the /usr/local/bin/openstack command from calling into
the container, to calling into the launch venv.
Change-Id: I604d5c17268a8219d51d432ba21feeb2e752a693
These images have a number of issues we've identified and worked
around. The current iteration of this change is essentially
identical to upstream but with a minor tweak to allow the latest
mailman version, and adjusts the paths for hyperkitty and postorius
URLs to match those in the upstream mailman-web codebase, but
doesn't try to address the other items. However, we should consider
moving our fixes from ansible into the docker images where possible
and upstream those updates.
Unfortunately upstream hasn't been super responsive so far hence this
fork. For tracking purposes here are the issues/PRs we've already filed
upstream:
https://github.com/maxking/docker-mailman/pull/552
https://github.com/maxking/docker-mailman/issues/548
https://github.com/maxking/docker-mailman/issues/549
https://github.com/maxking/docker-mailman/issues/550
Change-Id: I3314037d46c2ef2086a06dea0321d9f8cdd35c73
This turns launch-node into an installable package. This is not meant
for distribution, we just encapsulate the installation in a virtualenv
on the bastion host. Small updates to documentation and simple
testing are added (also remove some spaces to make test_bridge.py
consistent).
Change-Id: Ibcb4774114d73600753ca155ed277d775964bc79