Cleanup and clarify README
* Clarify what elastic-recheck does. Initially the primary goal was to
  report back to gerrit, but we are using elastic-recheck for much more
  than that now, so update docs to reflect that.
* Remove section on dependencies; it was used before we had a proper
  requirements file.
* Remove references to adding the resolved_at option, as that is now
  implemented.

Change-Id: I6aa55bdf02174f13d86ad3309f5ad53110dc647d
parent 487e64f2fd
commit c507fa1892
34
README.rst
@@ -2,35 +2,40 @@
 elastic-recheck
 ===============================
 
-"Classify tempest-devstack failures using ElasticSearch"
+"Use ElasticSearch to classify OpenStack gate failures"
 
 * Open Source Software: Apache license
 * Documentation: http://docs.openstack.org/developer/elastic-recheck
 
 Idea
 ----
-When a tempest job failure is detected, by monitoring gerrit (using
-gerritlib), a collection of logstash queries will be run on the failed
-job to detect what the bug was.
+Identifying the specific bug that is causing a transient error in the gate
+is very hard. Just identifying which tempest test failed is not enough
+because a single bug can potentially cause multiple tempest tests to fail.
+If we can find a fingerprint for a specific bug using logs, then we can use
+ElasticSearch to automatically detect any occurrences of the bug.
 
-Eventually this can be tied into the rechecker tool and launchpad
+Using these fingerprints elastic-recheck can:
 
+* Search ElasticSearch for all occurrences of a bug.
+* Identify bug trends such as: when it started, is the bug fixed, is it
+getting worse, etc.
+* Classify bug failures in real time and report back to gerrit if we find a
+match, so a patch author knows why the test failed.
 
 queries/
 --------
 
 All queries are stored in separate yaml files in a queries directory
-at the top of the elastic_recheck code base. The format of these files
-is ######.yaml (where ###### is the bug number), the yaml should have
+at the top of the elastic-recheck code base. The format of these files
+is ######.yaml (where ###### is the launchpad bug number), the yaml should have
 a ``query`` keyword which is the query text for elastic search.
 
 Guidelines for good queries
 
 - After a bug is resolved and has no more hits in elasticsearch, we
 should flag it with a resolved_at keyword. This will let us keep
-some memory of past bugs, and see if they come back. (Note: this is
-a forward looking statement, sorting out resolved_at will come in
-the future)
+some memory of past bugs, and see if they come back.
 - Queries should get as close as possible to fingerprinting the root cause
 - Queries should not return any hits for successful jobs, this is a
 sign the query isn't specific enough
@@ -69,14 +74,7 @@ Future Work
 - Add debug mode flag
 - Expand gating testing
 - Cleanup and document code better
-- Sort out resolved_at stamping to remove active bugs
+- Add ability to check if any resolved bugs return
 - Move away from polling ElasticSearch to discover if its ready or not
 - Add nightly job to propose a patch to remove bug queries that return
 no hits -- Bug hasn't been seen in 2 weeks and must be closed
-- implement resolved_at in loader
-
-
-Main Dependencies
-------------------
-- gerritlib
-- pyelasticsearch
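
The README's queries/ section says each query lives in its own ######.yaml file, named after the launchpad bug number. That naming convention could be checked with something like the following. This is a minimal sketch, not elastic-recheck's actual loader. The function name `index_query_files` and the digits-only filename pattern are assumptions made for illustration, and the sketch only indexes files by name without parsing the yaml or reading the ``query`` keyword.

```python
import os
import re

# Assumption: a query file is named "<digits>.yaml", where the digits are
# the launchpad bug number. The README only says "######.yaml".
QUERY_FILE_RE = re.compile(r"^(\d+)\.yaml$")


def index_query_files(queries_dir):
    """Map each bug number found in queries_dir to the path of its file."""
    index = {}
    for name in sorted(os.listdir(queries_dir)):
        match = QUERY_FILE_RE.match(name)
        if match is None:
            # Not a bug query file (for example notes or editor backups).
            continue
        index[match.group(1)] = os.path.join(queries_dir, name)
    return index
```

Keying the result by bug number makes it straightforward to spot a stray file in the directory before any query is sent to ElasticSearch.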