From 6721d89dc0c62ad5f87c3c71ac807f8e1f99585d Mon Sep 17 00:00:00 2001
From: Jakub Libosvar
Date: Mon, 30 Jan 2017 16:26:34 +0100
Subject: [PATCH] policies: Add policy for rechecking failed jobs on Gerrit

In order to get a better habit on how to recheck gate failures, this
patch introduces a new policy and howto related to usage of recheck on
gerrit.

Change-Id: Iaef998108de80fffcf5400e357a51e2f70dc047a
---
 doc/source/policies/gate-failure-triage.rst | 20 ++++++++++++++--
 doc/source/policies/gerrit-recheck.rst      | 26 +++++++++++++++++++++
 doc/source/policies/index.rst               |  1 +
 3 files changed, 45 insertions(+), 2 deletions(-)
 create mode 100644 doc/source/policies/gerrit-recheck.rst

diff --git a/doc/source/policies/gate-failure-triage.rst b/doc/source/policies/gate-failure-triage.rst
index 433ee7cceea..08277619d7c 100644
--- a/doc/source/policies/gate-failure-triage.rst
+++ b/doc/source/policies/gate-failure-triage.rst
@@ -40,8 +40,10 @@ story is to check for `uncategorized `_ for it.
 6. If you are confident with the area of this bug, and you have time, assign
    it to yourself; otherwise look for an assignee or talk to the Neutron's
    bug czar to find an assignee.
+Troubleshooting functional/fullstack job
+----------------------------------------
+1. Go to the job link provided by Jenkins CI.
+2. Look at logs/testr_results.html.gz for which particular test failed.
+3. More logs from a particular test are stored at
+   logs/dsvm-functional-logs/ (or dsvm-fullstack-logs
+   for fullstack job).
+4. Find the error in the logs and search for similar errors in existing
+   launchpad bugs. If no bugs were reported, create a new bug report. Don't
+   forget to put a snippet of the trace into the new launchpad bug. If the
+   log file for a particular job doesn't contain any trace, pick the one
+   from testr_results.html.gz.
+5. Create an `Elastic Recheck Query `_
+
 Root Causing a Gate Failure
 ---------------------------
 Time-based identification, i.e. find the naughty patch by log scavenging.
diff --git a/doc/source/policies/gerrit-recheck.rst b/doc/source/policies/gerrit-recheck.rst
new file mode 100644
index 00000000000..55d1756ae11
--- /dev/null
+++ b/doc/source/policies/gerrit-recheck.rst
@@ -0,0 +1,26 @@
+Recheck Failed CI jobs in Neutron
+=================================
+
+This document provides guidelines on what to do in case your patch fails one of
+the Jenkins CI jobs. In order to discover potential bugs hidden in the code or
+tests themselves, it's very helpful to check failed scenarios to investigate
+the cause of the failure. Sometimes the failure will be caused by the patch
+being tested, while other times the failure can be caused by a previously
+untracked bug. Such failures are usually related to tests that interact with
+a live system, like functional, fullstack and tempest jobs.
+
+Before issuing a recheck on your patch, make sure that the gate failure is not
+caused by your patch. A failed job can also be caused by an infra issue, for
+example being unable to fetch things from external resources like git or pip
+due to an outage. Such failures outside of the OpenStack world are not worth
+tracking in launchpad, and you can recheck, leaving a couple of words about
+what went wrong. Data about gate stability is collected and visualized via
+`Grafana `_.
+
+Please do not recheck without providing the bug number for the failed job.
+For example, do not just put an empty "recheck" comment but find the related
+bug number and put a "recheck bug ######" comment instead. If a bug does not
+exist yet, create one so other team members can have a look. It helps us
+maintain better visibility of gate failures. You can find how to troubleshoot
+gate failures in the `Gate Failure Triage `_
+documentation.
diff --git a/doc/source/policies/index.rst b/doc/source/policies/index.rst
index 95ba2b9ed1f..c40a79f1000 100644
--- a/doc/source/policies/index.rst
+++ b/doc/source/policies/index.rst
@@ -32,3 +32,4 @@ items.
    code-reviews
    release-checklist
    thirdparty-ci
+   gerrit-recheck