Replaces the static implementation that received a password with a member
function that can make use of the config object.
Change-Id: If9617b6db73eb49c5193f098d45e357a267529dd
- Upgraded hacking (flake8)
- Added a more modern tox linters environment (pep8 alias)
- Temporarily added skips for broken newer rules
- Fixed a few basic rule violations
- Moved flake8 config to setup.cfg (tox.ini is not recommended)
Change-Id: I75b3ce5d2ce965a9dc5bdfaa49b2aacd8f0195ad
Previously we gave every event a 20 minute timeout. This meant that we
could eventually roll over to a new day and start querying current
indexes for data that lives in older indexes. If this happens every
query fails because we are looking in the wrong index, and every
failing query runs the full 20 minute timeout.
All this results in snowballing: we are never able to check if events
are indexed.
Address this by using the gerrit eventCreatedOn timestamp to determine
when our timeout is hit. We will time out 20 minutes from that
timestamp regardless of how long interim processing has taken us. Over
longer periods of time this should ensure we query the current index
for current events.
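A minimal sketch of the deadline check described above, assuming
eventCreatedOn is a Unix timestamp (function and argument names are
hypothetical, not the actual code):

```python
import time

TIMEOUT = 20 * 60  # twenty minutes, matching the old per-event timeout


def deadline_exceeded(event_created_on, now=None):
    """Return True once 20 minutes have passed since the gerrit
    eventCreatedOn timestamp, regardless of how long any interim
    processing has taken."""
    if now is None:
        now = time.time()
    return now > event_created_on + TIMEOUT
```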
Change-Id: Ic9ed7fefae37d2668de5d89e0d06b8326eadfbb9
Before this patch, the daemon was not waiting for all service logs to
upload to logstash, just the console and grenade logs. This meant that
queries targeting service log files sometimes missed a legitimate match.
Change-Id: I96ae09c1be8f1b12117bcfc635589e7c149d5df2
This commit ensures elastic-recheck is able to support zuul v2 and v3
simultaneously:
- Add message queries based on v3 completion messages
- Include job-output.txt where console.html was expected
Change-Id: If3d7990d892a9698a112d9ff1dd5160998a4efe6
Depends-On: I7e34206d7968bf128e140468b9a222ecbce3a8f1
13.3 minutes doesn't seem to be adequate anymore for getting the
console logs from elasticsearch, so this change increases the timeout
to 20 minutes.
Change-Id: I77f9d79833e23f2b9cda3622832d4315ea574f4a
Every time we create a FailEvent for a failed-job
gerrit comment event we're reconstructing a Config
object unnecessarily. We can just pass the config in
from the Stream object to the FailEvent object.
Change-Id: Ibd85a4f0e813bc9bfff69de8f4f42951face88e4
Refactor to use a config class to hold all the
params needed so that they can be more easily
overridden and reused across all the
elastic-recheck tools.
In addition, use the new class to make the
jobs_regex and ci_username configurable.
Change-Id: Ic6f115a6882494bf4c087ded4d7cafa557765c28
Graph counts were looking at all history instead of just 10 days
as intended. Update the search to only look at the most recent 10
days.
Change-Id: I9495888a818986b3ac187bac7fd65fbcad6135a3
This makes debugging code gone wrong a bit simpler.
Also fix the other __str__ functions to use __repr__ as well, making it
consistent that objects which want representations implement __repr__
and not __str__.
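The convention looks roughly like this (a sketch with a hypothetical
class, not the actual code):

```python
class FailEvent(object):
    """Objects that want a representation implement __repr__;
    __str__ just delegates to it."""

    def __init__(self, change, rev):
        self.change = change
        self.rev = rev

    def __repr__(self):
        return '<FailEvent change: %s, rev: %s>' % (self.change, self.rev)

    def __str__(self):
        return repr(self)
```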
Change-Id: I6913da8f3ef6a4632d5f1c9d6ed26a38cdcd5e73
Elastic recheck is about failures; all queries should only include
voting changes. We do this by explicitly adding voting:1 to every
query loaded by the query builder.
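A sketch of adding the voting filter to a loaded lucene query string
(hypothetical helper name, not the actual implementation):

```python
def make_voting_query(query):
    """Append voting:1 to a query string so results only include
    voting jobs."""
    return '(%s) AND voting:1' % query
```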
Change-Id: I4bd4827f72d85bf69bf501be2f5744e71de35a3c
pyelasticsearch>1.0 defaults the port to 9200 but logstash.o.o/es
is on port 80, so update the defaults in code and config samples.
Change-Id: Ibb85cd29e1cbc3ff448aa8470854fe0f8bede260
Currently it is not possible to point to a different database or
elastic search engine. Make these configurable by using the
same configuration file used by bot.py.
Also add a logstash url so that it can be configured separately
from elastic search url.
Change-Id: I77e4215765e32c34b67c38e37e5764c6c0e45c84
This commit adds options to the elastic recheck bot configuration
file. This enables users to specify how to connect to an elasticsearch
server and a subunit2sql database, but things will still default to
using the openstack-infra servers to prevent breaking the running
service.
Change-Id: I10db1a568cc01e137e5f4d8a8814b17201c4c438
This commit adds a new field, test_ids, to the query yaml. It is a
list of test_ids that will be used to query the subunit2sql db to
verify that at least one of them failed for the failed build uuid.
Change-Id: If3668709e3294b5d6bf9e1f082396fbc39c08512
* Similar to suppress-graph
There are some gate failures that are expected and are real errors (such
as global-requirements mismatches in requirements jobs).
suppress-notifications allows us to classify these failures and remove
them from the unclassified page while not telling developers to recheck.
This can be used along with suppress-graph.
Change-Id: I6d905ba65e66e799a65598f8a5d5c3dd684feb8c
I252ae31e7a4cb919e3c98c35591147cc96cfc3cc added the pipeline name to the
zuul gerrit comments. Update the string matching here to work with new
comment format.
Change-Id: I7c09b8f40d594733309660ed76647886653e53ec
This records the current time when the data is constructed, the date
of the last valid-looking piece of data in elasticsearch, and how far
behind we seem to be on indexing. The json payload is adjusted so it
can take additional metadata to support displaying this on the ER
page.
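A sketch of the extra metadata (field names hypothetical, not the
actual payload schema):

```python
import time


def pipeline_metadata(last_indexed_ts):
    """Build the metadata dict: when the data was generated, the
    timestamp of the last valid-looking hit in elasticsearch, and
    how far behind indexing appears to be, in seconds."""
    now = time.time()
    return {
        'generated_at': now,
        'last_indexed': last_indexed_ts,
        'behind': max(0, now - last_indexed_ts),
    }
```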
Change-Id: I0068ca0bbe72943d5d92dea704659ed865fea198
The bug_urls_map method is actually returning a list so just sort the
list and fix the tests that are racing due to random hashseed issues
with the dict.
This also updates the docstring which was incorrect before.
Related-Bug: #1348818
Change-Id: I13ca69b3e685083d4ced2b054e0d42a440259854
This forces whitespace between message parts so that, for example, a
URL at the end of the 'unrecognized' part won't get joined with the
first word of the 'footer'.
This change also fixes a hidden (I guess) bug that would have
produced an UnboundLocalError if FailedEvent.get_all_bugs() returns
None.
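The join itself can be sketched like this (hypothetical helper, not
the actual code):

```python
def join_message(parts):
    """Join non-empty comment sections with a single space so, e.g.,
    a URL at the end of one part is not glued to the first word of
    the next."""
    return ' '.join(p.strip() for p in parts if p and p.strip())
```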
Change-Id: I3a44db0b7018c49f87702d900961ea7119081b12
Instead of having the messages inline, we should define them in the
yaml file so that changing the UX for the bot reporting isn't a
code change.
Depends-On: I9208123a4cb3be02c272cd8a6eba460f4130a960
Change-Id: I8fdb07f9964f616addba6e8f25e5bd9de27d077a
It turns out that we broke indexing of grenade logs entirely. This
will at least give us some warning by looking for them in jobs.
Change-Id: Ic6023b9c2cf64ac57eb023a7c6d60c2d1d731550
Elastic Recheck is really 2 things: real time searching and bulk
offline categorization. While the bulk categorization needs to look
over the entire dataset, the real time portion is deadline oriented,
so it only cares about the last hour's worth of data. As such we
really don't need to search *all* the indexes in ES, only the most
recent one (and possibly the one before that if we are near
rotation).
Implement this via a recent= parameter for our search feature. If set
to true then we specify the most recent logstash index. If it turns
out that we're within an hour of rotation, also search the one before
that.
Adjust all the queries the bot uses to be recent=True. This will
hopefully reduce the load generated by the bot on the ES cluster.
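The index selection can be sketched as follows, assuming daily
logstash-YYYY.MM.DD indexes rotated at UTC midnight (function name and
index naming are assumptions, not the actual code):

```python
import datetime


def recent_indexes(now=None):
    """Return today's logstash index and, if we are within an hour
    of the daily UTC rotation, yesterday's index as well."""
    if now is None:
        now = datetime.datetime.utcnow()
    indexes = ['logstash-%s' % now.strftime('%Y.%m.%d')]
    if now.hour == 0:  # within the first hour after rotation
        yesterday = now - datetime.timedelta(days=1)
        indexes.append('logstash-%s' % yesterday.strftime('%Y.%m.%d'))
    return indexes
```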
Change-Id: I0dfc295dd9b381acb67f192174edd6fdde06f24c
Because of the oslotest join, ironic is now in our main gate and is
causing actual main gate failures. Treat it as such for triage
purposes.
Change-Id: Ib43130c3a0eb970dfda79ec422439340ac36bd5d
We shouldn't be reporting back to users why a non-voting job is failing.
Non-voting jobs are non-voting because they are unstable, so we don't
want folks running recheck on a bug for a non-voting job.
Update the unit tests to cover this case.
Change-Id: I61f4e7bb28235d2974f3dcf70187437c80f918d3
If there is an unclassified failure in the check queue, we want to make
it clear to the user so they will investigate the error, as it's most
likely a valid failure. Also don't include recheck instructions when
there is an unclassified failure, as they shouldn't be running a
recheck in that case.
With us now classifying many failures from non-voting jobs, it is common
to see classified failures and no mention of the job that legitimately
failed.
Partial revert of I52044afb4f3a1bf3f22ba4c0e8d38d76271ffc00
Change-Id: I6b471b9ab9c7f36eeed93993ea086bbc9daa56b0
Recently the elasticsearch schema was updated to include a
build_short_uuid field which has indexed the first 7 chars of the
build_uuid. This field is useful because it allows e-r to filter on that
field instead of searching on build_uuid.
Update e-r to filter on build_short_uuid which should make queries much
more performant. As part of this change replace variables named
short_build_uuid with build_short_uuid for consistency with the
elasticsearch schema.
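The derivation of the short uuid and the filter it enables can be
sketched as (helper names hypothetical, not the actual code):

```python
def build_short_uuid(build_uuid):
    """First 7 chars of the build_uuid, matching the indexed
    build_short_uuid field in the elasticsearch schema."""
    return build_uuid[:7]


def add_build_filter(query, build_uuid):
    """Filter a query on build_short_uuid instead of searching the
    full build_uuid, which is much cheaper for ES."""
    return '%s AND build_short_uuid:%s' % (query, build_short_uuid(build_uuid))
```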
Change-Id: Iae5323f3f5d2fd01f2c69f78b9403baf5ebafe85
We really don't care about check failures for classification,
because those might just be terrible code, which we get a lot of.
So only report unclassified tests to the user on gate failures.
Now with extra tests for this behavior!
Change-Id: I52044afb4f3a1bf3f22ba4c0e8d38d76271ffc00
Use the correct time.sleep argument when sleeping. Also replace a
post-for-loop if check with an else clause to make the code more
readable.
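The for/else pattern looks roughly like this (a sketch with
hypothetical names; the else block runs only when the loop finishes
without hitting break):

```python
import time


def wait_for_file(fetch, retries=3, delay=2):
    """Retry fetch() a few times, sleeping between attempts; the
    else clause replaces a post-loop flag check."""
    for _ in range(retries):
        result = fetch()
        if result is not None:
            break
        time.sleep(delay)  # sleep takes seconds as its argument
    else:
        result = None  # every attempt failed (loop never broke)
    return result
```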
Change-Id: Icdfb41d1436abe930e4f45243ff6fe378ba3f91b
Point users to status.openstack.org/elastic-recheck to find further
information and links on the bugs they hit.
Change-Id: I9e6a70151d4f94c574b2eae55ff8ba0172189d7a
A few bugs have crept into elastic-recheck causing it to fail. This
patch fixes them.
* an update to gerritlib broke FailedEvent.rev and change, since both of
these should always be numbers cast to ints
* We appear to be missing files occasionally; add better logging for
this (also simplify the Exception classes)
* Remove the last usage of skip_resolved (removed in a previous patch)
Change-Id: Ifc180989832be152e08a4873e62857a899835484
This reverts commit e75b996e60.
Change is being reverted because we can't actually use a static LOG
object if we expect setup_logging to do the right thing at runtime.
Otherwise Python logging will load logging objects at import time
using the static LOG object before setup_logging can run.
Conflicts:
elastic_recheck/bot.py
elastic_recheck/elasticRecheck.py
Change-Id: I582c7e9c9b3c2ccab6a695bfba00a61f7c0a04a9
Add in neutron, glance, and n-net logs as required files when
appropriate. This will help ensure that we don't miss a pattern
because we searched before the log was in the system.
Change-Id: Ia8f2cdedfc9964f1d9589fda253174e972fcc770
Instead of just listing which bugs were seen in an entire gerrit event
(multiple jenkins/zuul jobs), list which bugs were seen in which job.
If one of the jobs has an unrecognized error, don't display the comment
about running recheck; just list which bugs were seen on which jobs
(and which has an unrecognized error).
Change-Id: I55b2eb8f0efe43ab22540294150d4bc9f5885510
We are starting to track a decent amount of data per zuul/jenkins job,
so track data in an object instead of assorted variables and
dictionaries. For example bugs are now tracked by job and not
gerrit event. Now, we can support reporting which bug caused which
specific job to fail. This also does some assorted object related
cleanups. This consists of internal changes only, a future patch will
make the gerrit and irc comments take advantage of this.
Change-Id: I2116cd0e10b45617a8d572b27f1672f695fa91d0