Cleanup and clarify README

* Clarify what elastic-recheck does. Initially the primary goal was to
  report back to gerrit, but we are using elastic-recheck for much more
  than that now, so update docs to reflect that.
* Remove section on dependencies; it was used before we had a proper
  requirements file.
* Remove references to adding the resolved_at option, as that is now
  implemented.

Change-Id: I6aa55bdf02174f13d86ad3309f5ad53110dc647d
2013-12-13 17:37:58 +01:00 · c507fa1892
parent 487e64f2fd
commit c507fa1892
1 changed file with 16 additions and 18 deletions
--- a/README.rst
+++ b/README.rst
@@ -2,35 +2,40 @@
 elastic-recheck
 ===============================

-"Classify tempest-devstack failures using ElasticSearch"
+"Use ElasticSearch to classify OpenStack gate failures"

 * Open Source Software: Apache license
 * Documentation: http://docs.openstack.org/developer/elastic-recheck

 Idea
 ----
-When a tempest job failure is detected, by monitoring gerrit (using
-gerritlib), a collection of logstash queries will be run on the failed
-job to detect what the bug was.
+Identifying the specific bug that is causing a transient error in the gate
+is very hard. Just identifying which tempest test failed is not enough
+because a single bug can potentially cause multiple tempest tests to fail.
+If we can find a fingerprint for a specific bug using logs, then we can use
+ElasticSearch to automatically detect any occurrences of the bug.

-Eventually this can be tied into the rechecker tool and launchpad
+Using these fingerprints elastic-recheck can:

+* Search ElasticSearch for all occurrences of a bug.
+* Identify bug trends such as: when it started, is the bug fixed, is it
+  getting worse, etc.
+* Classify bug failures in real time and report back to gerrit if we find a
+  match, so a patch author knows why the test failed.

 queries/
 --------

 All queries are stored in separate yaml files in a queries directory
-at the top of the elastic_recheck code base. The format of these files
-is ######.yaml (where ###### is the bug number), the yaml should have
+at the top of the elastic-recheck code base. The format of these files
+is ######.yaml (where ###### is the launchpad bug number), the yaml should have
 a ``query`` keyword which is the query text for elastic search.

 Guidelines for good queries

 - After a bug is resolved and has no more hits in elasticsearch, we
  should flag it with a resolved_at keyword. This will let us keep
-  some memory of past bugs, and see if they come back. (Note: this is
-  a forward looking statement, sorting out resolved_at will come in
-  the future)
+  some memory of past bugs, and see if they come back.
 - Queries should get as close as possible to fingerprinting the root cause
 - Queries should not return any hits for successful jobs, this is a
  sign the query isn't specific enough
@@ -69,14 +74,7 @@ Future Work
 - Add debug mode flag
 - Expand gating testing
 - Cleanup and document code better
-- Sort out resolved_at stamping to remove active bugs
+- Add ability to check if any resolved bugs return
 - Move away from polling ElasticSearch to discover if its ready or not
 - Add nightly job to propose a patch to remove bug queries that return
  no hits -- Bug hasn't been seen in 2 weeks and must be closed
-- implement resolved_at in loader
-
-
-Main Dependencies
------------------
-- gerritlib
-- pyelasticsearch
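
The query file convention described in the README text above (one ``######.yaml`` file per launchpad bug, each carrying a ``query`` keyword) can be checked with a few lines of Python. The following is only a minimal sketch and not the loader that elastic-recheck ships: the helper names ``bug_number_from_filename`` and ``validate_query_entry`` are hypothetical, and it assumes the bug number is any run of digits and that the YAML file has already been parsed into a dict.

```python
import os
import re

# Assumption: a query file is named <launchpad bug number>.yaml, digits only.
BUG_FILE_RE = re.compile(r"^(\d+)\.yaml$")


def bug_number_from_filename(path):
    """Return the bug number encoded in a query file name, or None."""
    match = BUG_FILE_RE.match(os.path.basename(path))
    return int(match.group(1)) if match else None


def validate_query_entry(entry):
    """Return a list of problems found in one parsed query file.

    `entry` is the already-parsed YAML content; an empty list means the
    entry has the `query` keyword the README asks for.
    """
    if not isinstance(entry, dict):
        return ["top-level YAML value must be a mapping"]
    problems = []
    query = entry.get("query")
    if not isinstance(query, str) or not query.strip():
        problems.append("missing or empty 'query' keyword")
    return problems
```

With a YAML parser such as PyYAML available, the dict passed to ``validate_query_entry`` would typically come from ``yaml.safe_load`` applied to the contents of each file in the queries directory. Files whose names do not match the pattern (for example a README) can be skipped by checking for ``None``.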