Merge "policies: Add policy for rechecking failed jobs on Gerrit"

Jenkins 2017-04-13 12:40:08 +00:00 committed by Gerrit Code Review
commit be4571704f
3 changed files with 45 additions and 2 deletions


@ -40,8 +40,10 @@ story is to check for `uncategorized <http://status.openstack.org/elastic-rechec
failures. This is where failures for new (unknown) gate-breaking bugs end up; infra errors that cause job
failures also end up here. It should be the duty of the diligent Neutron developer to ensure the
classification rate for neutron jobs is as close as possible to 100%. To this aim, the diligent Neutron
developer should adopt the following procedure:
developer should adopt the procedure outlined in the following sections.
Troubleshooting Tempest jobs
----------------------------
1. Open logs for failed jobs and look for logs/testr_results.html.gz.
2. If that file is missing, check console.html and see where the job failed.
1. If there is a failure in devstack-gate-cleanup-host.txt it's likely to be an infra issue.
@ -50,10 +52,24 @@ developer should adopt the following procedure:
logstash.
4. On logstash, search for occurrences of this error message, and try to identify the root cause for the failure
(see below).
5. File a bug for this failure, and push a elastic-recheck query for it (see below).
5. File a bug for this failure, and push an `Elastic Recheck Query <http://docs.openstack.org/developer/neutron/policies/gate-failure-triage.html#filing-an-elastic-recheck-query>`_ for it.
6. If you are confident in the area of this bug, and you have time, assign it to yourself; otherwise look for an
assignee or talk to Neutron's bug czar to find an assignee.
Troubleshooting functional/fullstack job
----------------------------------------
1. Go to the job link provided by Jenkins CI.
2. Look at logs/testr_results.html.gz for which particular test failed.
3. More logs from a particular test are stored at
logs/dsvm-functional-logs/<path_of_the_test> (or dsvm-fullstack-logs
for fullstack job).
4. Find the error in the logs and search for similar errors in existing
launchpad bugs. If no bugs were reported, create a new bug report. Don't
forget to put a snippet of the trace into the new launchpad bug. If the
log file for a particular job doesn't contain any trace, pick the one
from testr_results.html.gz.
5. Create an `Elastic Recheck Query <http://docs.openstack.org/developer/neutron/policies/gate-failure-triage.html#filing-an-elastic-recheck-query>`_
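As a rough illustration of step 3, the per-test log location could be built with a small helper like the one below. This is only a sketch: the function name ``per_test_log_path`` and the ``"functional"``/``"fullstack"`` keys are hypothetical, and the layout underneath each directory is not specified here, so the test path is passed through unchanged.

```python
import posixpath

# Directory names come from step 3 above; the dictionary keys are
# hypothetical labels chosen for this sketch.
_LOG_DIRS = {
    "functional": "logs/dsvm-functional-logs",
    "fullstack": "logs/dsvm-fullstack-logs",
}


def per_test_log_path(job_type, test_path):
    """Return the relative log path for one test of a functional/fullstack job.

    ``test_path`` is appended as given, because the exact layout below the
    log directory is an assumption left to the caller.
    """
    try:
        base = _LOG_DIRS[job_type]
    except KeyError:
        raise ValueError("unknown job type: %r" % (job_type,))
    # Job logs are served over HTTP, so always join with forward slashes.
    return posixpath.join(base, test_path.lstrip("/"))
```

For example, ``per_test_log_path("fullstack", "<path_of_the_test>")`` yields ``logs/dsvm-fullstack-logs/<path_of_the_test>``, which is then appended to the job link provided by Jenkins CI.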
Root Causing a Gate Failure
---------------------------
Time-based identification, i.e. find the naughty patch by log scavenging.


@ -0,0 +1,26 @@
Recheck Failed CI jobs in Neutron
=================================
This document provides guidelines on what to do in case your patch fails one of
the Jenkins CI jobs. In order to discover potential bugs hidden in the code or
tests themselves, it's very helpful to check failed scenarios to investigate
the cause of the failure. Sometimes the failure will be caused by the patch
being tested, while other times the failure can be caused by a previously
untracked bug. Such failures are usually related to tests that interact with
a live system, like functional, fullstack and tempest jobs.

Before issuing a recheck on your patch, make sure that the gate failure is not
caused by your patch. A failed job can also be caused by an infra issue, for
example a failure to fetch things from external resources like git or pip due
to an outage. Such failures outside of the OpenStack world are not worth
tracking in launchpad, and you can recheck leaving a couple of words about what
went wrong. Data about gate stability is collected and visualized via
`Grafana <http://grafana.openstack.org/dashboard/db/neutron-failure-rate>`_.

Please do not recheck without providing the bug number for the failed job.
For example, do not just put an empty "recheck" comment but find the related
bug number and put a "recheck bug ######" comment instead. If a bug does not
exist yet, create one so other team members can have a look. It helps us
maintain better visibility of gate failures. You can find how to troubleshoot
gate failures in the `Gate Failure Triage <http://docs.openstack.org/developer/neutron/policies/gate-failure-triage.html#troubleshooting-tempest-job>`_
documentation.
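
The comment convention described above can be checked mechanically. The following is a minimal sketch, not part of any Gerrit or Zuul tooling: the function name ``classify_recheck_comment`` and its return labels are hypothetical, and it assumes the bug reference is written as plain digits, as in "recheck bug ######".

```python
import re

# "recheck bug <digits>"; the digits are assumed to be a Launchpad bug number.
_BUG_RECHECK = re.compile(r"^recheck\s+bug\s+(\d+)\s*$", re.IGNORECASE)


def classify_recheck_comment(comment):
    """Classify a review comment as 'bug', 'reason', 'bare' or 'not-recheck'.

    'bug'    -> "recheck bug ######", the preferred form
    'reason' -> "recheck" followed by a few words (e.g. an infra outage)
    'bare'   -> an empty "recheck", which the policy asks to avoid
    """
    text = comment.strip()
    if not text.lower().startswith("recheck"):
        return "not-recheck"
    if _BUG_RECHECK.match(text):
        return "bug"
    rest = text[len("recheck"):]
    if not rest:
        return "bare"
    # Require whitespace after the keyword so "rechecking ..." is not matched.
    if not rest[0].isspace():
        return "not-recheck"
    return "reason"
```

With this sketch, ``classify_recheck_comment("recheck")`` returns ``'bare'``, while ``classify_recheck_comment("recheck bug 1234567")`` returns ``'bug'``; the bug number in this usage example is a placeholder, not a real report.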


@ -32,3 +32,4 @@ items.
code-reviews
release-checklist
thirdparty-ci
gerrit-recheck