This reverts commit 9d25a857da.
We revert this as our testing shows the MFA updates to our regular
accounts don't currently seem to affect swift uploads. That may change
when MFA is required across the board on the 26th, but we can reapply
this sort of disablement and testing then; in the meantime we continue
to have two providers of log upload locations.
Change-Id: I9f24f67253934a6a128a6cee3cceb9c1f0bcdf37
Rax is requiring everyone to use multi-factor auth by March 26, 2024.
We're currently transitioning to MFA early to control when and how it
happens. One benefit of doing this early is we can pull rax out of the
log upload destination list to enable us to test it still works after
the MFA switch.
Do this by removing Rax from the prod uploads and removing ovh from the
test uploads so that only rax is used in base-test. After we update to
MFA we can check that uploads for base-test still work then revert this
change.
Change-Id: I8dafb5ea7ad6b10989ca6258c3f56bc8b91d0e06
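A minimal sketch of the kind of provider-list change described above; the variable and region names here are assumptions for illustration, not the actual role parameters:

```yaml
# Hypothetical sketch only: variable and region names are assumed.
# base: upload everywhere except rax; base-test: rax only, so that
# post-MFA uploads can be verified in isolation before reverting.
opendev_base_swift_providers:
  - ovh_bhs
  - ovh_gra
opendev_base_test_swift_providers:
  - rax_ord
```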
This restores our standard log upload config now that the outage
in ovh is resolved.
This reverts commit babd74eb4f.
This reverts commit 7e033243f3.
Change-Id: Ib308489085a6ad2ace501e4ae078606634eda50a
This resets the log upload targets to our full list of options. This
should only be landed after the previous commit has been used to
confirm all is well in rax iad and dfw again.
Change-Id: Icc22521b812bfb10b158c5e1e07665f3e45aaa4e
These were disabled due to service issues. Independent testing seems to
indicate that things are happy again. Set up base-test to upload only to
these two regions to confirm. We'll reset base and base-test jobs to the
full set of upload targets if this goes well.
Change-Id: I90141b8f666101fabbcc0b820b370fe20af73632
We've seen errors with uploads to rax iad and rax dfw resulting in jobs
reporting POST_FAILURE with no logs uploaded. Testing container listings
against iad and dfw results in 500 errors but the same requests succeed
against ord so we leave it in place.
Based on the 500 errors received in testing above we believe this issue
is not on our end (client/sdk updates for example) but rather on the
server/service side and we need to disable the endpoints and ride it
out.
Change-Id: Ic38fb681c1de84bdb96edb1ecaa7c632a5bec706
Since run-buildset-registry depends on the container_command,
but buildset-registry supports only Docker, we need to enforce it.
Change-Id: I8966251030dcb3342befa727b2cc6e20b7229b11
This will ensure that properly defined container images are created in
quay registries for us. You need to do this out of band of the docker
push if you want the image repos to be public. The ensure-quay-repo role
should ignore images that don't have the correct metadata making this
safe for all container base jobs.
Depends-On: https://review.opendev.org/c/zuul/zuul-jobs/+/881521
Change-Id: Ic358f5e2f44c2a1e02140f8c848fe352214ba65a
This is in the zuul-jobs pre-playbook, but we don't actually inherit
from those jobs so we need to duplicate it.
Change-Id: I875df74936736b80dbb2f29bbb474b993f4616ea
These are the analogs of the opendev-build-docker-image jobs,
using the newer container roles.
Change-Id: Ifec8fd7db3b238536b396a9012bdf93d0d19547e
Depends-On: https://review.opendev.org/c/zuul/zuul-jobs/+/878291
This reverts commit 5ce784d816.
This reverts commit 5497c3aa3a.
Revert the two changes that were used to disable rax and then force rax
under base-test for testing. The testing performed after those changes
landed seems to indicate things are working.
Change-Id: Ibc3e71399205895d1508786f1eb40cb13d44817a
We recently disabled rax swift log uploads due to errors. Force all
uploads under the base-test job to go to rax so that we can test this
more now that the immediate fire is contained.
Change-Id: I7cb8b312356fbaf0d8b4db02b6cc9363f3b13c6f
We are seeing failures to these regions. Disable them until we can debug
further to avoid unnecessary job failures.
Change-Id: If47636adf08279f8c691c3e9b6351b08067f3191
These regions are returning 503s for file retrievals and some jobs are
failing to upload logs there. Disable until it stabilizes.
Change-Id: Ic5d75b95bf8e3c71025c7297644e7fb3ed2fd9b3
This adds a copy of the tox-docs related jobs but using nox instead.
Depends-On: https://review.opendev.org/868134
Change-Id: I445202f366c748191fe6a05e145c05cbad1bb8f5
This reverts commit 85e1ff20ea.
The incident [0] has been marked as resolved by our provider. We should
be good to return to our full set of swift backends for log storage.
[0] https://public-cloud.status-ovhcloud.com/incidents/by8279p6sdjd
Change-Id: I46d5ae367412081808c22f6b2626fbb83fe2e34c
There is an incident producing errors with ovh swift object storage [0].
Disable these regions until that incident is resolved.
Note we disable rax on the test job so that we can easily test things
are functional once this incident is resolved. Reverting this change
will re-enable all swift endpoints in base and base-test jobs.
[0] https://public-cloud.status-ovhcloud.com/incidents/by8279p6sdjd
Change-Id: I8f0655f95308a31881680d1b0c25ed6af8f54fb7
This is similar in purpose to
I137ab824b9a09ccb067b8d5f0bb2896192291883 to separate out where we are
talking to the bastion host from the executor, versus the nested
ansible CD run.
Add the host in the "prod_bastion" group, and switch the source setup
playbook to use "prod_bastion[0]". This reduces the number of places
you have to update the bridge name when you change the host.
Change-Id: I66df4057b3990eed2230d894ff42d0a425a2381a
Currently we reset trees to master in two places; here and in
sync-project-config (Ib999731fe132b1e9f197e51d74066fa75cb6c69b). This
is a bit confusing, and requires delegating tasks to the bridge node
which isn't great. Also, as we think about trying to make jobs run in
parallel it's another place to get things wrong.
This merges the update into one place.
Change-Id: I6ffeb6e6562fb34db89f4e475da27b60e30f6fe7
This puts the dynamically added bridge node in the "bastion" group.
This way the production jobs can refer to the generic group name, and
be abstracted from the actual hostname.
Change-Id: Ie35f3f003f21472be2ca87ab962141d17fc2a7b6
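The grouping described above could look roughly like this; the hostname variable and task bodies are assumptions for illustration, not the actual playbook:

```yaml
# Hypothetical sketch: add the dynamically discovered bridge host to a
# generic "bastion" group, then target the group rather than a hostname.
- hosts: localhost
  tasks:
    - name: Add bridge node to the bastion group
      add_host:
        name: "{{ bridge_hostname }}"   # assumed variable name
        groups: bastion

- hosts: bastion
  tasks:
    - name: Run production tasks against the bastion
      debug:
        msg: "Running on {{ inventory_hostname }}"
```

With this in place a rename of the bridge host only touches the add_host step, not every play that targets it.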
Similar to Ie0a0d8f4ae137dc12f4c13f901096ee39d9a088e in system-config;
fix the typo on this variable name.
Change-Id: I579af80831ec6c317aa4c03d68a1e1934c2fe16c
This stops the bridge trying to write out console streaming files that
will never be read, because we don't allow connections to the
streaming port. c.f. Ifbb5b8acb1f231812905cf9643bfec6fbbd08324
Change-Id: I82f194631c2a6d4ed2e46e057a609e5d68ffd2dc
This will enable us to test changes to test-prepare-workspace-git and
ultimately prepare-workspace-git.
Change-Id: Ic6badd58a7021595508cad0d3ecb9c7d80780858
This uses the new argument provided in the dependent change to enable
the extras-common repo for 9-stream. Since this is already running
with the default arguments, it should be low-risk to change them here
and only affect CentOS 9-stream.
Change-Id: I185657987fd1b454db683bd1329a985940014750
In doing some work on the Zuul console, I noticed that none of these
have descriptive names. That looks a bit ugly in the console where
you just get a generic "Play: all". Give them some names as a clue to
what's going on.
Change-Id: I40f2592a316bb8293f91d90be3996a6c697de196
The Keystone API endpoint at identity.api.rackspacecloud.com 443/tcp
is currently serving an X.509 cert with an expiration of
2022-07-28 12:00:00 UTC (roughly 1.5 hours ago), so logs can't be
uploaded to their swift by our base job, resulting in widespread
POST_FAILURE results. Remove them from the round-robin destination
list for now, and we can revert this once the cert has been renewed.
Change-Id: Icfc593196a1176cb41657c277f80cb01cf2eb654
Select the provider in a separate task so that the provider name
can then be included in the task name for the upload. This will
enable the provider to be seen in the job console even if the upload
subsequently fails.
In addition, the upload role can now be called more than once in a
future patch, keeping the provider constant between invocations.
base-test results https://review.opendev.org/c/zuul/zuul-jobs/+/848880
Change-Id: Ie69cbfaebfbe80ad9ce7de789c12b5db7cb6e0c2
We've suddenly started getting errors from OVH's Swift endpoints
saying "Payment Required: Access was denied for financial reasons."
Stop uploading new logs here since this may also be causing
POST_FAILURE results for builds.
Change-Id: I4928ed439a34484ac73a4162d6ab09e5d11de106
Make the variable different from 'item' to avoid any possible
collision with other uses of that name from ansible loops.
Change-Id: I6dfb6f8494538acfdfa4f3f93e02cb955fd2bd9c
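In Ansible this kind of rename is typically done with loop_control; a hedged sketch, with the task file and variable names assumed rather than taken from the actual role:

```yaml
# Hypothetical sketch: rename the loop variable away from the default
# "item" so an outer loop including this file cannot collide with it.
- name: Upload logs to each configured target
  include_tasks: upload.yaml
  loop: "{{ upload_targets }}"
  loop_control:
    loop_var: zj_upload_target
```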
Select the provider in a separate task so that the provider name
can then be included in the task name for the upload. This will
enable the provider to be seen in the job console even if the upload
subsequently fails.
In addition, the upload role can now be called more than once in a
future patch, keeping the provider constant between invocations.
Change-Id: I37ec05125824a0442652e6444369967bc5170aae
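A minimal sketch of the described split, assuming names like swift_providers for the candidate list (not the role's actual variables):

```yaml
# Hypothetical sketch: choose the provider up front with set_fact so
# the upload task's name can display it, even if the upload fails.
- name: Select a swift provider for log upload
  set_fact:
    _log_upload_provider: "{{ swift_providers | random }}"
  when: _log_upload_provider is not defined  # stays constant if re-run

- name: Upload logs to {{ _log_upload_provider }}
  include_tasks: upload.yaml
```

The `when` guard is what allows the role to be invoked more than once while keeping the same provider between invocations.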
After pinning openstacksdk<0.99 in the zuul images with
If1cf1f8c301de09df1d212b6cef151317f6dc6bf, the problem with missing
CORS headers for Rackspace Swift object uploads seems to have
subsided, so we no longer need to limit where we write our logs.
This reverts commit 7b85fb90df.
Change-Id: I7e1a9cc87fea1bd1517b9340342e74ab578c9cb5
Something has broken with CORS headers for logs recently uploaded to
the three Rackspace regions we use. They can still be browsed in raw
form, but the Zuul results page is unable to provide a failure
summary, console breakdown, or deep-linkable logs.
While we work to identify the underlying cause, avoid uploading
further logs to those locations with our base/post-logs playbook,
relying instead exclusively on the two OVH regions configured (which
seem to still work as before). Leave the Rackspace entries in
base-test/post-logs so we'll still be able to easily replicate the
problem for further troubleshooting.
Change-Id: I92ede6bf4717c07e78f43c11fb2b1cd94e1a5478
The openstack health server stopped working a few months ago and we
ended up shutting down the subunit workers and the health api server as
a result. This means we can stop submitting gearman jobs to process
subunit files.
Also about a year ago we indicated to OpenStack that we could keep the
logstash tooling running through the yoga cycle which is now over. We
haven't had any volunteers or help to continue running the ELK stack in
opendev so we're going to shut it down now that yoga is out the door.
OpenStack did end up working with AWS to set up an OpenSearch
replacement which users can look to for log indexing of CI jobs in
OpenStack.
Change-Id: I5f0f3805e191f0cd6354285299ed33c42d3899fd
As noted inline, this fails the job if using the centos-8 label (see
If32d0c4c503e11285fdcb7c45188568a5dc010bf) that actually points at a
centos-8-stream node. This should encourage people to fix the node
usage.
Change-Id: I602f2c48fa4845288d72de0cf1d46149815d1cbc
We don't need to quote the when: statements. Follow-on to
I84fe5bca76884d8f258a292d0814ad43ac7f2be1.
Change-Id: Ieb114467dea3be0a0ec7a96fbd10ba47f7f00cac
As noted inline, this fails the job if using the centos-8 label (see
If32d0c4c503e11285fdcb7c45188568a5dc010bf) that actually points at a
centos-8-stream node. This should encourage people to fix the node
usage.
This adds the job to base-test.
I602f2c48fa4845288d72de0cf1d46149815d1cbc adds it to production.
Change-Id: I84fe5bca76884d8f258a292d0814ad43ac7f2be1
This playbook does not need to actually access the bastion host
(bridge.o.o) so does not need the inventory setup steps here.
It was pointed out in review of the prior change
(I1bbf4f1402938216401dd924da62aa869a08875b) that we could drop this
job and do this known_hosts setup in system-config for each job.
However, I think it's not a bad idea to keep a synchronization point
for the infra-prod jobs here in a trusted playbook.
Depends-On: https://review.opendev.org/c/opendev/system-config/+/807808
Change-Id: I43285bf61a2902851a15929ac3725fe131ef5b1f