Add contributor guide page on using elastic recheck

This commit adds a new contributor guide page on interacting with
elastic-recheck and how to submit new bug fingerprints.

Story: 2001356
Task: 5911

Change-Id: I9870be60a1eadec8e864680a99e2179249c8d215
This commit is contained in:
Matthew Treinish 2018-04-18 14:04:40 -04:00
parent ed3bddb3e3
commit a85f00fb43
No known key found for this signature in database
GPG Key ID: FD12A0F214C9E177
4 changed files with 95 additions and 0 deletions

Binary file not shown.

After

Width:  |  Height:  |  Size: 119 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 244 KiB

View File

@ -0,0 +1,94 @@
#####################
Using Elastic Recheck
#####################
.. note:: This section assumes you have completed :doc:`/common/zuul-status`
What to do if a test job fails
==============================
When you submit a patch to gerrit and zuul returns the test results for the
jobs it ran, sometimes one of those tests fails. Most of the time this
indicates there is an issue with your proposed change and the tests are
catching it. Sometimes the test run might have tripped over an underlying
pre-existing bug in OpenStack. Additionally, some times the infrastructure for
running the tests might have had a failure. To figure this out you'll always
need to look at the logs from the failed job to understand what's happening.
What is elastic-recheck
=======================
elastic-recheck is a tool used to track failures in test jobs. It keeps a
repository of fingerprints for known bugs that are affecting jobs in the gate.
It is then used to both track the rates those bugs are being found and also to
leave comments in gerrit and in IRC when it has found a known bug fingerprint
in a failure.
elastic-recheck is built on top of an ELK (`Elastic Search`_, `Logstash`_,
`Kibana`_) stack where we use Logstash to store all logs from CI jobs in an
Elastic Search cluster. We also host a `Kibana dashboard`_ which can be used
to run queries on the cluster and interacts with the data. elastic-recheck
queries the elastic-search cluster for the fingerprints.
.. _Elastic Search: https://github.com/elastic/elasticsearch
.. _Logstash: https://github.com/elastic/logstash
.. _Kibana: https://github.com/elastic/kibana
.. _Kibana dashboard: http://logstash.openstack.org/
You can see the current status of the bugs being tracked by elastic recheck at:
http://status.openstack.org/elastic-recheck/index.html
.. image:: /_assets/elastic_recheck/er_status.png
:scale: 65
Each graph shows how many matches were found for that fingerprint over the past
10 days. It also provides a link to both the launchpad page for the bug, and
the kibana dashboard for the underlying elastic-search query used for the
fingerprint.
elastic-recheck also has a page to show how many failures were encountered that
do not have a matching fingerprint. Typically the more failures that go
uncategorized the more unstable the gate is (and OpenStack as a whole).
You can find these pages at:
http://status.openstack.org/elastic-recheck/data/integrated_gate.html
and
http://status.openstack.org/elastic-recheck/data/others.html
depending on which jobs you're interested in.
.. image:: /_assets/elastic_recheck/er_uncategorized.png
:scale: 65
If you're interested in more of the theory and history behind the project, this
talk from the Juno OpenStack Summit provides a good overview:
https://www.youtube.com/watch?v=Byo26Pioq1Y
Tracking a new bug in elastic-recheck
=====================================
When you encounter a failure that's not being tracked by elastic-recheck
and you've looked through the logs to determine that it's not being caused
by the proposed change and is affecting other changes you can propose a new
elastic-recheck fingerprint.
This guide won't go into the details of tracing through the logs of a run
and finding a good fingerprint, since that's quite involved, dependent on the
job you're looking at, and already documented in a few places including:
* https://www.openstack.org/videos/vancouver-2015/tales-from-the-gate-how-debugging-the-gate-helps-your-enterprise
* https://docs.openstack.org/infra/elastic-recheck/readme.html#queries
Once you've identified a message in the logs that can be used for
fingerprinting you need to turn that into an elastic-search query. You can
use any of the existing fingerprints as an example:
https://git.openstack.org/cgit/openstack-infra/elastic-recheck/tree/queries
You should also check any elastic search queries using kibana at:
http://logstash.openstack.org/
Once you've constructed a query and checked in on elastic-search you should
create a yaml file in the queries directory of the elastic-recheck git repo.
The file name is the launchpad bug number for the bug and the contents are
the elastic-search query.

View File

@ -17,3 +17,4 @@ Code & Documentation Contributor Guide
/common/governance
/common/zuul-status
/common/patch-best-practices
/code-and-documentation/elastic-recheck