The OpenStack Health server stopped working a few months ago and we
ended up shutting down the subunit workers and the health api server as
a result. This means we can stop submitting gearman jobs to process
subunit files.
Also about a year ago we indicated to OpenStack that we could keep the
logstash tooling running through the yoga cycle which is now over. We
haven't had any volunteers or help to continue running the ELK stack in
opendev so we're going to shut it down now that yoga is out the door.
OpenStack did end up working with AWS to set up an OpenSearch
replacement which users can look to for log indexing of CI jobs in
OpenStack.
Change-Id: I5f0f3805e191f0cd6354285299ed33c42d3899fd
Prepopulate the VERSION_ID from ansible's
ansible_distribution_major_version since in some cases it may not be
assigned in /etc/os-release (for example in debian/testing prior to
release). Under most circumstances, this will be overridden when we
later source the /etc/os-release file.
This also reverts commit 7ac676aaca.
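
The prepopulate-then-override order described above can be sketched in
Python as follows. This is only an illustration of the precedence, not
the code in this change; the function name and its arguments are
hypothetical.

```python
def resolve_version_id(os_release_text, ansible_major_version):
    """Return VERSION_ID, falling back to the Ansible-provided value.

    Both argument names are hypothetical; they stand in for the contents
    of /etc/os-release and ansible_distribution_major_version.
    """
    # Prepopulate with the fallback value first.
    values = {"VERSION_ID": ansible_major_version}
    # Any assignment found in os-release overrides the prepopulated default.
    for line in os_release_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key] = value.strip().strip('"').strip("'")
    return values["VERSION_ID"]
```

When the file has no VERSION_ID line, as can happen on debian/testing,
the prepopulated value survives; otherwise the file's value wins.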
Change-Id: Ieb3e068ae715ae2de058dff0d5d526527466f91e
This change was made to test https://review.opendev.org/c/zuul/zuul-jobs/+/773474 using base-test.
Now that the PR is merged, the changes made to test should be reverted.
Change-Id: I0e310dcbdb6c1e47f3575ef9da7d5560267ee3d9
Ansible changed, then reverted, its behavior around file modes, but
being explicit is likely a good idea to handle any future changes
from Ansible.
We set modes generously (to 755 for dirs and 644 for files) to avoid
unexpected access problems. Note that depending on the perms in AFS this
may cause perms to update on existing dirs but that should be fine as
long as we aren't making them more restrictive.
Finally we skip two cases where modes are required by the linting rule
because they are tarball extraction steps and applying a single mode to
all dirs and files in a tarball doesn't make a ton of sense.
Includes bumping linter configuration.
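
As a rough illustration of the mode policy only (the change itself sets
modes on Ansible tasks, not in Python), a sketch assuming a hypothetical
helper that walks a tree could look like this:

```python
import os

# The generous modes named in this commit message.
DIR_MODE = 0o755
FILE_MODE = 0o644


def apply_explicit_modes(root):
    """Set 755 on every directory and 644 on every file under root.

    Hypothetical helper for illustration; it is not part of this change.
    """
    os.chmod(root, DIR_MODE)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames:
            os.chmod(os.path.join(dirpath, name), DIR_MODE)
        for name in filenames:
            os.chmod(os.path.join(dirpath, name), FILE_MODE)
```

Applying one mode to everything is exactly what does not fit the tarball
extraction steps, which is why those two cases are skipped.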
Change-Id: Iacf41549928ba7f05f0f71a79ddef1b6e1154e2a
Co-authored-by: Sorin Sbarnea <ssbarnea@redhat.com>
These logs have grown very large because neutron is doing unconditional
function performance profiling via debug logging. Exclude them until
this is cleaned up as we cannot keep up otherwise and everyone else is
sad as a result.
Change-Id: I2e2d3d5213a2a0dba400e26ea6dcc881dca594ee
The previous vendoring attempt did not work because of the way
Ansible handles imports. Instead, we now rely on the module_utils
method of bundling supporting python code.
Change-Id: I01f57e9eab77f0c39b45bb52b573642ab8f29f22
We use the gear library to submit log processing jobs in the post-run
playbooks. That is installed in the ansible virtualenvs on our vms,
but it is not installed in the virtualenvs on the zuul-executor image.
Aside from our use of it, there isn't a compelling reason to ask the Zuul
project to add that to the zuul-executor image. And we would prefer to
continue using the upstream image rather than make our own. It seems
the simplest way to avoid any further complexity here is just to vendor
the gear library. It's small and does not change often.
Change-Id: I733361be3b7eaaa60dacc0b09112eebaa06e0d9a
We've found the screen-monasca-api.txt logs can be in the multiple
hundreds of megabytes. Unfortunately, this causes problems for our
indexing system. monasca-persister is already excluded; add monasca-api
to the list.
Change-Id: Iafa3dcfcc1714a2560940a4616b08872ae694dc8
As described in the inline comments, this is working around a broken
version of setuptools vendored by virtualenv currently. Specify in a
config file that it should download the latest helper tools, rather
than rely on the inbuilt versions.
Add to the base-test pre-playbook; after testing we can move to the
base playbook.
Change-Id: Ib17017637eae81a3ff57302e7c77945f2045b5ac
This host has migrated to tarballs.opendev.org, but rather than
updating this caching rule, an audit of codesearch and a cursory look
at several mirror logs has determined this wasn't ever really used.
Since it is now on AFS, we could export this directly from the
filesystem on mirror nodes and implicitly use AFS caching if we need
to reinstate this.
Change-Id: I4feb2a2229dcf0626ec8ee8494ad6801468c1440
Story: #2006598
Task: #39014
This will help us better track how often jobs are being rerun. We should
be able to query for numeric values of this field and count the number
of jobs that had more than one attempt. From that we should also be able
to identify if specific jobs hit these problems more than others.
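
A minimal sketch of the kind of count this should enable, assuming the
indexed records are available as dictionaries. The field name "attempts"
and the key "job_name" are hypothetical placeholders, not the names used
by the indexer.

```python
from collections import Counter


def count_retried_jobs(records, field="attempts"):
    """Count, per job name, the runs that needed more than one attempt.

    "attempts" and "job_name" are assumed names for illustration.
    """
    retried = Counter()
    for record in records:
        # Treat a missing field as a single attempt.
        if int(record.get(field, 1)) > 1:
            retried[record["job_name"]] += 1
    return retried
```

Sorting the result with most_common() would show which jobs are rerun
more often than others.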
Change-Id: I6ad21ad3a55140356b9a5f19e5e5bb83663898f3
This is no longer needed after we switch to prepare-workspace-git.
Depends-On: https://review.opendev.org/680703
Change-Id: I3830e4c84c6f8bea465cce22b7e3ae048bf60882
This is currently failing in some cases (I'm sure it is when executor
only jobs end up with things that aren't matched, but can't pinpoint
exactly) but you only get
"details": "IndexError('list index out of range',)",
which doesn't help find out which list index was out of range.
The get_exception() helper is for python 2.6 support (described in [1]) so we
can use the regular format. Then include the full traceback in the
details so we can see the line causing problems.
[1] https://docs.ansible.com/ansible/2.5/dev_guide/developing_python_3.html
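
The pattern is roughly the following sketch. The function and the result
keys other than "details" are hypothetical; only the use of the regular
"except ... as" form and the standard traceback module reflects what is
described above.

```python
import traceback


def run_task(items, index):
    """Return a result dict; on failure include the full traceback."""
    try:
        return {"failed": False, "value": items[index]}
    except Exception as e:
        return {
            "failed": True,
            "msg": "Unexpected failure: %s" % e,
            # format_exc() captures the stack of the exception being
            # handled, so the failing line shows up in the details.
            "details": traceback.format_exc(),
        }
```

With this, an out-of-range index reports where it happened instead of
only the exception repr.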
Change-Id: I20d1d99a48b7a173ab6c792dc3508c27b8047f6a
With our switch to Swift-hosted logs, files ending in .gz are no longer
equivalent to the filenames without the .gz. Swift expects us to be
specific. For this reason normalize on using the actual file names as
stored and reported rather than removing the suffix for human
friendliness.
Note we keep the .gz-less tag value in the tags list. This is because
e-r typically uses tags to identify files rather than pure paths or
filenames.
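
A small sketch of the resulting behavior, under the assumption that each
file is described by its stored name plus a list of tags. The function
name and dictionary keys are hypothetical.

```python
import os


def describe_log_file(path):
    """Keep the stored filename as-is, but tag without the .gz suffix."""
    filename = os.path.basename(path)
    # The tag drops a trailing .gz so tag-based queries keep matching.
    tag = filename[:-3] if filename.endswith(".gz") else filename
    return {"filename": filename, "tags": [tag]}
```

For a path ending in screen-n-api.txt.gz this keeps the .gz in the
filename while the tag stays screen-n-api.txt.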
Change-Id: Ie9063f0ab35317357280690d9ad5e273025e3240
Related-Change: https://review.opendev.org/#/c/677236
https://review.opendev.org/#/c/676120/ adds the ubuntu mirror host,
but it contains the typo NODEPOOLMIRROR_HOST, which breaks the CI while
sourcing /etc/ci/mirror_info.sh. Fixing the typo fixes the issue.
Change-Id: I95fb295d81494782120603e830d60678474a8cfe
Signed-off-by: Chandan kumar <chkumar@redhat.com>
Update task names so that it's clear which task is doing what. This
avoids, e.g., two tasks both named "Set target directory", where you
would need to check whether that is correct.
Add missing task name for one include_role.
Change-Id: I550307bdfdf2815ce7b5001bd89928311ce8cc02
We think we have growing pains in some clouds where ipv6 is available
but we want to force the instances to use ipv4 dns anyway. We can do
this by telling nodepool to use ipv4 for the instance ip which we can
then check in our role to configure unbound.
Do this to rule out ipv6 as a problem in these clouds.
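
The check in the role amounts to something like this sketch, which uses
the standard ipaddress module. The function name and the returned labels
are hypothetical; the role itself is Ansible, not Python.

```python
import ipaddress


def resolver_family(instance_ip):
    """Pick the DNS address family from the instance IP nodepool reports."""
    # If nodepool was told to use ipv4 for the instance ip, the address
    # parses as version 4 and we configure unbound with ipv4 forwarders.
    if ipaddress.ip_address(instance_ip).version == 4:
        return "ipv4"
    return "ipv6"
```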
Change-Id: I354015df64032e70422231f1105807f024fe2393
CloudFlare's public recursive DNS resolvers are available at
multiple anycast addresses. For some reason 1.1.1.1 is unreachable
from parts of OVH's BHS1 region, but 1.0.0.1 seems to be
consistently reachable. Swap this for improved reliability.
Depends-On: https://review.opendev.org/655687
Change-Id: I403961828f4af3f121a6fa2193a933c9fc4a7bc7
NODEPOOL_DOCKER_REGISTRY_V2_PROXY will be:
http://$NODEPOOL_MIRROR_HOST:8082/registry-1.docker/
This is the URL of the mirror for the Docker Registry v2 provided by
the OpenDev Infra team.
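
For illustration, the value is composed from the mirror host as in this
sketch. The function name is hypothetical; the port and path are the
ones given above.

```python
def docker_registry_v2_proxy(mirror_host):
    """Build the registry proxy URL from the mirror host name."""
    return "http://%s:8082/registry-1.docker/" % mirror_host
```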
Change-Id: I46e2bc72ad2eb29e3fdcaf15ce96f8c4e03e406e
Ianw noticed problems on fedora29 with unbound. That resulted in a bug
filed upstream,
https://www.nlnetlabs.nl/bugs-script/show_bug.cgi?id=4226. In this bug
the helpful unbound maintainers point out that OpenDNS servers are
having trouble with RRSIG records which leads to not validating dnssec
which we require in our unbound config.
Address this by switching to CloudFlare DNS which is supposed to be
super localized (aka responsive), and not record queries against it.
Also if we want to we can update our config to do dns over tls against
these servers.
Change-Id: I8137239c2f53381afd87d420a5fe44064c669f87