diff --git a/README.rst b/README.rst
index 3e1ee58a..92eb0f9a 100644
--- a/README.rst
+++ b/README.rst
@@ -2,35 +2,40 @@
 elastic-recheck
 ===============================
 
-"Classify tempest-devstack failures using ElasticSearch"
+"Use ElasticSearch to classify OpenStack gate failures"
 
 * Open Source Software: Apache license
 * Documentation: http://docs.openstack.org/developer/elastic-recheck
 
 Idea
 ----
-When a tempest job failure is detected, by monitoring gerrit (using
-gerritlib), a collection of logstash queries will be run on the failed
-job to detect what the bug was.
+Identifying the specific bug that is causing a transient error in the gate
+is very hard. Just identifying which tempest test failed is not enough
+because a single bug can potentially cause multiple tempest tests to fail.
+If we can find a fingerprint for a specific bug using logs, then we can use
+ElasticSearch to automatically detect any occurrences of the bug.
 
-Eventually this can be tied into the rechecker tool and launchpad
+Using these fingerprints elastic-recheck can:
 
+* Search ElasticSearch for all occurrences of a bug.
+* Identify bug trends such as: when it started, is the bug fixed, is it
+  getting worse, etc.
+* Classify bug failures in real time and report back to gerrit if we find a
+  match, so a patch author knows why the test failed.
 
 queries/
 --------
 
 All queries are stored in separate yaml files in a queries directory
-at the top of the elastic_recheck code base. The format of these files
-is ######.yaml (where ###### is the bug number), the yaml should have
+at the top of the elastic-recheck code base. The format of these files
+is ######.yaml (where ###### is the launchpad bug number), the yaml should have
 a ``query`` keyword which is the query text for elastic search.
 
 Guidelines for good queries
 
 - After a bug is resolved and has no more hits in elasticsearch, we
   should flag it with a resolved_at keyword. This will let us keep
-  some memory of past bugs, and see if they come back. (Note: this is
-  a forward looking statement, sorting out resolved_at will come in
-  the future)
+  some memory of past bugs, and see if they come back.
 - Queries should get as close as possible to fingerprinting the root cause
 - Queries should not return any hits for successful jobs, this is a sign
   the query isn't specific enough
@@ -69,14 +74,7 @@ Future Work
 - Add debug mode flag
 - Expand gating testing
 - Cleanup and document code better
-- Sort out resolved_at stamping to remove active bugs
+- Add ability to check if any resolved bugs return
 - Move away from polling ElasticSearch to discover if its ready or not
 - Add nightly job to propose a patch to remove bug queries that return no hits
-- Bug hasn't been seen in 2 weeks and must be closed
-- implement resolved_at in loader
-
-
-Main Dependencies
-------------------
-- gerritlib
-- pyelasticsearch