elastic-recheck

Commit Graph

Author	SHA1	Message	Date
frenzyfriday	01b6b8299b	Fixing: TypeError: '<' not supported between instances of 'list' and 'str' The above exception [1] occurs for example [2] when elasticsearch returns data with more than one zuul_executor as a list. This is what l#58 is able to sort [(12.5, '5'), (12.5, '4'), (12.5, '3'), (25.0, '6'), (18.75, '2'), (6.25, '13'), (12.5, '1')] This is when it throws exception [(8.13953488372093, 'ze06.opendev.org'), (12.790697674418604, 'ze10.opendev.org'), (5.813953488372093, 'ze05.opendev.org'), (8.13953488372093, 'ze01.opendev.org'), (16.27906976744186, 'ze04.opendev.org'), (4.651162790697675, 'ze03.opendev.org'), (3.488372093023256, 'ze02.opendev.org'), (4.651162790697675, 'ze08.opendev.org'), (12.790697674418604, 'ze09.opendev.org'), (20.930232558139537, 'ze12.opendev.org'), (1.1627906976744187, 'ze11.opendev.org'), (1.1627906976744187, ['ze12.opendev.org', 'ze11.opendev.org'])] [1] https://0050cb9fd8118437e3e0-3c2a18acb5109e625907972e3aa6a592.ssl.cf5.rackcdn.com/790065/7/check/openstack-tox-py38/4968a73/tox/test_results/1449136.yaml.log [2] https://review.opendev.org/c/openstack/tripleo-ci-health-queries/+/787569/6/output/elastic-recheck/1449136.yaml Change-Id: Ie559d5764d9f68420119a7f9608389f0745a9c02	2021-05-19 22:10:24 +02:00
Sorin Sbarnea	5104f277b8	Make file writing atomic This should render the need to use wrappers obsolete as all file writing operations are now atomic, assuring that we either write the entire file or fail. That is important as we do not want to end up serving partial files with the web-server. Change-Id: I696e2474b557e6b5fea707a198f32cea721cc150	2020-11-09 10:02:18 +00:00
Zuul	262ec2c995	Merge "Enable configuration via environment variables"	2020-09-18 07:14:43 +00:00
Sorin Sbarnea	e6b1354d91	Enforce use of safe yaml loader Sorts runtime warning related to use of unsafe loading on yaml files. Change-Id: I11f1332f34fe341b13100d8ae4c263cfbd5b90e0	2020-09-17 15:18:36 +01:00
Sorin Sbarnea	3d1411f3a1	Enable configuration via environment variables Refactors configuration loading in order to simplify it and to allow overriding defaults using environment variables. This behavior is similar to other tools like pip or ansible, which can load any configurable option from env. This step eases migration towards containerized use, where we do not want to keep any secrets inside containers and we may want to avoid volume mounting, especially when testing. Change-Id: I0d3a9f19b0ba8d1604d0ca63db01296a3219fb47	2020-09-17 10:59:38 +01:00
Sorin Sbarnea	3901d2fd93	Made parse_jenkins_failure a non static Replaces static implementation that received password with a member function that can make use of the config object. Change-Id: If9617b6db73eb49c5193f098d45e357a267529dd	2020-09-15 10:04:15 +01:00
Sorin Sbarnea	9d37c88c8f	pylint: 4 more Change-Id: I4cc928d8212a5192927a994b4248f32fe05ca723	2020-09-11 11:19:35 +01:00
Sorin Sbarnea	360c57118c	pylint: 6 more Change-Id: Ic16db7972fe6f9da86592d56f4983572d7c68989	2020-09-10 15:28:52 +01:00
Sorin Sbarnea	c41b9c6fa0	pylint: fixed logging-not-lazy Change-Id: Ic25366a9afdfc67ab2beddbe2b8d02544c51e480	2020-09-10 14:56:52 +01:00
Sorin Sbarnea	ed5296999e	pylint: fixed imports Fixed pylint violations around imports. Implements standard import ordering (isort). Change-Id: Ib89108925487e49109d18ae315cd4892b8b48837	2020-09-10 13:46:38 +01:00
Zuul	bbd3a2f2d5	Merge "pylint fixes"	2020-09-10 12:14:05 +00:00
Sorin Sbarnea	78a8098354	pylint fixes Resolves several code style violations. Change-Id: Id03dad8f8ce141eb1e630a77d0c9ae497de9f2ed	2020-09-10 12:42:12 +01:00
Sorin Sbarnea	c6f07d7f93	Use pytest for queries Switches queries testing to use of pytest which provides the following: - test generator for each query (parametrize) - ability to test a single query test - generate html report with test results, making it easier to investigate failures. - parallel executions - minor bugfix which prevented queries from running with py38 as the config parser requires only strings (None being invalid). Change-Id: I982c694a5160a9ecfd117d177d30b911cfe53425	2020-09-10 10:24:37 +00:00
Sorin Sbarnea	97ca1c24c3	Drop py27 and add py38 jobs - Dropping py27 as it is out of support - Enable py38 testing, already default python on several distros. - removes six as a dependency as it is no longer needed for pure py3 Change-Id: I1e825073abc6cd55aa2fdc363358f2701152c57b	2020-09-08 10:21:02 +01:00
Sorin Sbarnea	8f709c1d67	Resolve unsafe yaml.load use Fixed deprecation warning about unsafe call Change-Id: I474454f438d6345dea76daf788be14c93fee6fb6	2020-08-19 09:52:17 +01:00
Sorin Sbarnea	f68a8719af	Bumped flake8 - Upgraded hacking(flake8) - Added more modern tox linters environment (pep8 alias) - Temporarily added skips for broken newer rules - Fixed a few basic rule violations - Moved flake8 config to setup.cfg (tox.ini is not recommended) Change-Id: I75b3ce5d2ce965a9dc5bdfaa49b2aacd8f0195ad	2020-05-23 08:54:14 +01:00
Clark Boylan	94ab7eb16b	Don't pretty print json files The json file outputs of e-r are loaded by web browsers in order to render our graphs. These json files are actually quite large and part of the reason why is we pretty print them with 4 space indents and they have large nesting. Stop pretty printing (humans can pass the files through a filter if necessary) in order to reduce the size of these files and make browsers happier (less time spent downloading). Change-Id: I19dedc2994169932eb0e90b6cdea3856637f5ef0	2020-01-29 10:05:38 -08:00
Matt Riedemann	62e42f4322	Handle ElasticHttpError in graph generation Getting elasticsearch data for bug 1708704 is failing in the check queue with: pyelasticsearch.exceptions.ElasticHttpError: \ (500, 'ArrayIndexOutOfBoundsException[null]') This might have to do with the size of the resulting messages from the hits on the tripleo and kolla jobs, I'm not sure. What's clear though is the graph generation is blowing up in the check queue on that bug but not the gate queue, maybe due to a smaller result set, so this adds some error handling in the graph generation for when a specific bug query fails so it does not halt the entire build of the graph. Change-Id: Ibe18c9cccc421a6549a18148f1a2ce3c1e4339d4	2019-12-18 15:46:53 -05:00
Matt Riedemann	73a1e85c67	Hard-code os-brick into TestQueries.openstack_projects The elastic-recheck-tox-queries job is failing because there is a query on an os-brick bug and the os-brick project in launchpad is not part of the openstack project group. This change simply hard-codes it since we know os-brick is part of openstack. Change-Id: Ia05c009226f88da427ec6ad9724410cd6ebed859 Story: 2006736 Task: 37197	2019-10-16 16:45:40 -04:00
Matt Riedemann	d753cf0190	Include "Invalid" bugs in cleanup CLI If a bug is invalid in a project then we should probably consider its query for removal in the cleanup command. For example, bug 1663529 and bug 1828244 were both marked Invalid and had no hits but weren't processed by the cleanup command. Change-Id: I7bac9fc169601c86a26565e9fa5b3d72c362a8fc	2019-08-29 16:15:17 -04:00
Matt Riedemann	dbeeceeb8e	Add script to remove queries for fixed bugs This automates the process to remove old queries for fixed bugs. It's a bit conservative to start so it doesn't check for open reviews nor does it filter out affected projects with non-Fix* status on the bug. It can be made more robust once we're confident in how it works and play with it on the open queries. Change-Id: Iaaf17892804453b99a846be27457c88e5a8f8a55	2019-08-19 17:54:47 -04:00
Zuul	af7cce55d7	Merge "Use pyyaml safe_load to avoid deprecation warning"	2019-05-15 18:46:53 +00:00
Matt Riedemann	fbdbbc0f40	Switch gerrit URL to review.opendev.org As of the great renaming of 2019 we need to update the openstack gerrit URL default to review.opendev.org. Change-Id: I2e3f7e7fb03be0deba0c95995265376dbce3c5b6 Story: #2005498 Task: #30599	2019-04-22 11:55:55 -04:00
Matt Riedemann	562187cf85	Use pyyaml safe_load to avoid deprecation warning yaml.load without a Loader is deprecated in pyyaml 5.1 [1] so this change switches to use safe_load. [1] https://github.com/yaml/pyyaml/wiki/PyYAML-yaml.load(input)-Deprecation Change-Id: I7fdda49d27732ae35c468e7f52adf1032d64a441	2019-04-22 09:26:42 -04:00
Jens Harbott	e83d0dd58b	Make the functional queries test work with python3 This seems to have been missed in the py3 conversion and seems to have been broken for a while. Change-Id: I9314ead6b0d14ed79ffd43c0f880daf57a014871	2019-03-05 11:25:28 +00:00
Matt Riedemann	2abdc593e8	Fix ALL_FAILS_QUERY The playbook location changed and is no longer in project-config so just make the query generic on the sub-directory rather than the git repo. Change-Id: I8532c193992adef0e996a3f42e9e84f491000c32	2019-03-04 15:53:24 -05:00
Matt Riedemann	a8694f804a	Log a warning if uncategorized_fails finds nothing Chances are probably 0 that we won't have failures or that we'd have 100% categorization rates, which probably means if we don't get any failures the default ALL_FAILS_QUERY is broken, which can easily happen: I208675c2258b6c635925c7b9ea9fae5afd000565 This logs a warning if a group yields no failures based on the default ALL_FAILS_QUERY. Change-Id: Ib2c12b1fc276389297cf4ac15775e6b2da828fdd	2019-01-17 17:18:05 -05:00
Clark Boylan	cdf6ee031e	Fix all fails query not matching any jobs Monty updated the post-ssh.yaml playbook in project config to do other post tasks and renamed it to post.yaml as a result. Change If01bdd7b7656b1a9ebaa5d5d7d021f82093db8ac has all the details. We need to accommodate that in the all fails query of e-r by updating the all fails query to look for post.yaml instead of post-ssh.yaml. Note that there are no query matches for post-ssh.yaml so we don't need an interim period of matching both. Change-Id: I208675c2258b6c635925c7b9ea9fae5afd000565	2019-01-17 13:40:34 -08:00
Clark Boylan	02d0651f29	Better event checking timeouts Previously we gave every event a 20 minute timeout. This meant that we could eventually rollover on the day and start querying against current indexes for data in older indexes. If this happens every query would fail because we are looking in the wrong index. Every query failing means we run the 20 minute timeout every time. All of this snowballs until we are never able to check if events are indexed. Address this by using the gerrit eventCreatedOn timestamp to determine when our timeout is hit. We will timeout 20 minutes from that timestamp regardless of how long interim processing has taken us. Over longer periods of time this should ensure we query the current index for current events. Change-Id: Ic9ed7fefae37d2668de5d89e0d06b8326eadfbb9	2018-11-30 19:34:54 +00:00
Sorin Sbarnea	6c4f466282	Made elastic-recheck py3 compatible - Adds py36/py37 jobs. - Fixed invalid syntax errors - Bumps dependencies to versions that are py3 compatible Change-Id: I0cebc35993b259cc86470c5ceb1016462a1d649b Related-Bug: #1803402	2018-11-29 20:15:07 +00:00
Chuck Short	3d0ea8cd30	Drop mox usage Mox is not being used anywhere, so remove it. Change-Id: I66bee0fa0d22e99554eae51c74b149d212ede7ed Signed-off-by: Chuck Short <chucks@redhat.com>	2018-08-21 21:40:15 -04:00
Matt Riedemann	320932bbc1	Fix query_builder.result_ready() query for zuulv3 results The first condition does not hit because the message with the path to post-ssh.yaml was not hitting without the .yaml suffix on the path. Assuming the unescaped . was messing with the query. Change-Id: I293e9c6fa215cb3f8638763895fccb4bfcf3c235	2018-05-08 17:57:42 -04:00
Clark Boylan	610841f9a3	Fix parens matching in result ready query There was a missing closing paren (we were short one). Change-Id: Icb87aaa7509fd1fc3b595d20ef2ba29d4456317a	2018-05-08 14:25:16 -07:00
Zuul	521d6442df	Merge "grenade: wait for service logs before matching against patterns"	2018-03-22 16:00:43 +00:00
Ihar Hrachyshka	f3e41b2d72	grenade: wait for service logs before matching against patterns Before the patch, the daemon was not waiting for all service logs to upload to logstash, just console and grenade logs. It means that queries that targeted service log files sometimes missed a legit match. Change-Id: I96ae09c1be8f1b12117bcfc635589e7c149d5df2	2018-01-05 21:39:27 +00:00
Zuul	ef7e3bdb84	Merge "SearchEngine search only supports ints for days"	2017-12-20 22:29:52 +00:00
Clark Boylan	53f45539c0	Fix all fails query There were two problems with the all fails query as sorted out by manually running the query in kibana. First the query didn't properly group the two sides of the job log ending query. They were separated by an OR and were meant to be grouped together as one clause in the query. Second zuul now requires the .yaml suffix on playbook names so the query looking for the post ssh playbook needs to end with .yaml. Change-Id: I951b2824fe6934eca667d1b14f8caf63428da89a	2017-11-30 15:42:57 -08:00
Clark Boylan	69b588cb2b	Balance parens in all fails query Updating uncategorized failures is currently failing on a query parse error in elasticsearch. This appears to be due to unbalanced parens in the new all fails query. Rebalance the parens by removing the extra leading paren. Change-Id: I05626c563a9a053e396782c54dae4c6fa7d6e269	2017-10-27 10:10:19 -07:00
Zuul	c96ced5953	Merge "Add support for Zuulv3-specific parameters in elastic-recheck"	2017-10-27 10:22:54 +00:00
David Moreau-Simard	97f6408b54	Add support for Zuulv3-specific parameters in elastic-recheck This commit ensures elastic-recheck is able to support zuul v2 and v3 simultaneously: - Add message queries based on v3 completion messages - Include job-output.txt where console.html was expected Change-Id: If3d7990d892a9698a112d9ff1dd5160998a4efe6 Depends-On: I7e34206d7968bf128e140468b9a222ecbce3a8f1	2017-10-26 19:18:16 -04:00
Matt Riedemann	57aab15c11	Retry change queries on a 502 response When gerrit is running slow we get 502 responses back which kills the graph builder. We can retry these requests from the client to keep going. Generally a single retry fixes it. Change-Id: I745d7c9b80ab8861972193d82c037df76af69e06	2017-10-09 20:18:58 -04:00
Paul Belanger	1af432d0a5	Switch to jobs_re for config setting To avoid confusion, switch everything to use jobs_re for recheckwatch config. Change-Id: I1a84db6ec346a32f38e00560c1b322e7d377d434 Needed-By: I1e2369225c9bd83296684af0dd9ea0514d9098c4 Signed-off-by: Paul Belanger <pabelanger@redhat.com>	2017-08-07 13:53:53 -04:00
Jenkins	91dbae986c	Merge "Pass config from Stream to FailEvent"	2017-08-01 19:03:39 +00:00
Jenkins	a3ee8216ff	Merge "Wait until the most recent index is available"	2017-01-30 14:46:05 +00:00
Matt Riedemann	88cdb49304	Bump console log timeout to 20 minutes 13.3 minutes doesn't seem to be adequate anymore for getting the console logs from elasticsearch, so this change increases the timeout to 20 minutes. Change-Id: I77f9d79833e23f2b9cda3622832d4315ea574f4a	2017-01-14 09:46:06 -05:00
Jenkins	ec0b1986f4	Merge "Interpolate strings using logging own methods"	2016-12-01 23:53:16 +00:00
Matt Riedemann	3ac609c43c	Pass config from Stream to FailEvent Every time we create a FailEvent for a failed job gerrit comment event we're reconstructing a Config object unnecessarily, we can just pass the config in from the Stream object to the FailEvent object. Change-Id: Ibd85a4f0e813bc9bfff69de8f4f42951face88e4	2016-11-15 17:12:08 -05:00
Ihar Hrachyshka	69807fc6eb	Capture all dsvm jobs with the default jobs regex This should make elastic recheck capture queries in projects like neutron where the previous regex was not working for quite some time. (In neutron gate, full job is called gate-tempest-dsvm-neutron-full-ubuntu-xenial; there are some jobs that don't even have 'tempest' in their names that should still participate in the elastic recheck, like grenade jobs, or rally; all of them have 'dsvm' part though). Speaking of the regex, it should probably also have been applied to separate job names before classifying them. But I'll leave it for a follow-up. Change-Id: If98951d13ba82833444ef4ffbb7c6be179126f2b	2016-11-10 11:09:49 +00:00
zhangyanxian	0348b7e001	Interpolate strings using logging own methods String interpolation should be delayed to be handled by the logging code, rather than being done at the point of the logging call. Ref:http://docs.openstack.org/developer/oslo.i18n/guidelines.html#log-translation For example: # WRONG LOG.info(_LI('some message: variable=%s') % variable) # RIGHT LOG.info(_LI('some message: variable=%s'), variable) Change-Id: I44b85cbf9f4b27d1fee2c1465029fca8cde4f87e	2016-11-08 01:49:57 +00:00
Ramy Asselin	de437439ad	Wait until the most recent index is available When elastic search indexing is behind, and the day has progressed forward to a new day, the latest index is not yet available for use. Exclude it from searches until it is ready in order to avoid the ElasticHttpNotFoundError. Add Unit tests for this case as well as for when multiple days are specified for the search. Change-Id: Ifd27d1ab21bebcb63b48ea164f425c4a2ac8759c	2016-10-20 10:48:55 -07:00
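
The TypeError described in commit 01b6b8299b can be reproduced with a few of the rows quoted in its message. The sketch below is only an illustration, not the fix that was merged: `sort_key` and `reproduce_failure` are hypothetical helpers, and joining a list of executors into one string is an assumed way to make the rows comparable.

```python
# Rows copied from the message of commit 01b6b8299b: (percentage, zuul_executor).
# The last row carries a list of executors instead of a single string, and it
# ties on percentage with the row before it, so the sort falls back to
# comparing a list with a str.
rows = [
    (20.930232558139537, "ze12.opendev.org"),
    (1.1627906976744187, "ze11.opendev.org"),
    (1.1627906976744187, ["ze12.opendev.org", "ze11.opendev.org"]),
]


def sort_key(row):
    """Hypothetical key: flatten a list of executors into one string."""
    percentage, executor = row
    if isinstance(executor, list):
        executor = ",".join(executor)
    return (percentage, executor)


def reproduce_failure(data):
    """Return the TypeError raised by a plain sort, or None if it succeeds."""
    try:
        sorted(data)
    except TypeError as exc:
        return exc
    return None


if __name__ == "__main__":
    print(reproduce_failure(rows))      # the mixed list/str comparison error
    print(sorted(rows, key=sort_key))   # sorts once the key is uniform
```

A key function leaves the original rows untouched, so the list of executors is still available to whatever consumes the sorted result.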

330 Commits