Commit Graph

330 Commits

Author SHA1 Message Date
frenzyfriday 01b6b8299b Fixing: TypeError: '<' not supported between instances of 'list' and 'str'
The above exception [1] occurs, for example [2], when Elasticsearch returns
data in which more than one zuul_executor is given as a list.

This is what l#58 is able to sort
[(12.5, '5'), (12.5, '4'), (12.5, '3'), (25.0, '6'), (18.75, '2'), (6.25, '13'), (12.5, '1')]

This is when it throws the exception:
[(8.13953488372093, 'ze06.opendev.org'),
(12.790697674418604, 'ze10.opendev.org'),
(5.813953488372093, 'ze05.opendev.org'),
(8.13953488372093, 'ze01.opendev.org'),
(16.27906976744186, 'ze04.opendev.org'),
(4.651162790697675, 'ze03.opendev.org'),
(3.488372093023256, 'ze02.opendev.org'),
(4.651162790697675, 'ze08.opendev.org'),
(12.790697674418604, 'ze09.opendev.org'),
(20.930232558139537, 'ze12.opendev.org'),
(1.1627906976744187, 'ze11.opendev.org'),
(1.1627906976744187, ['ze12.opendev.org', 'ze11.opendev.org'])]

[1] https://0050cb9fd8118437e3e0-3c2a18acb5109e625907972e3aa6a592.ssl.cf5.rackcdn.com/790065/7/check/openstack-tox-py38/4968a73/tox/test_results/1449136.yaml.log
[2] https://review.opendev.org/c/openstack/tripleo-ci-health-queries/+/787569/6/output/elastic-recheck/1449136.yaml

Change-Id: Ie559d5764d9f68420119a7f9608389f0745a9c02
2021-05-19 22:10:24 +02:00
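The failure described in 01b6b8299b can be reproduced in a few lines. The sketch below is only an illustration, not the change that was merged: the `sort_key` helper and the shortened sample data are assumptions. It shows why a tie on the percentage makes Python compare the second tuple elements, and one possible way to normalise a list of executors into a string before sorting.

```python
def sort_key(entry):
    """Order by percentage, breaking ties on a string form of the name.

    Hypothetical helper: a list of executors is joined into one string so
    that every second element is comparable.
    """
    percentage, name = entry
    if isinstance(name, list):
        name = ",".join(name)
    return (percentage, name)


data = [
    (4.65, "ze08.opendev.org"),
    (1.16, "ze11.opendev.org"),
    (1.16, ["ze12.opendev.org", "ze11.opendev.org"]),
]

# Plain tuple comparison falls through to the second element on a tie,
# and comparing a list with a str raises TypeError.
try:
    sorted(data)
    raised = False
except TypeError:
    raised = True

# Sorting with the normalising key works for mixed entries.
ordered = sorted(data, key=sort_key)
```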
Sorin Sbarnea 5104f277b8 Make file writing atomic
This should make wrappers unnecessary, as all file writing operations
are now atomic, ensuring that we either write the entire file or fail.

That is important as we do not want to end up serving partial files
with the web server.

Change-Id: I696e2474b557e6b5fea707a198f32cea721cc150
2020-11-09 10:02:18 +00:00
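A common way to get the all-or-nothing behaviour described in 5104f277b8 is to write to a temporary file in the same directory and then rename it over the target. The sketch below assumes that approach; the `write_atomic` name is hypothetical and the commit itself may use a different mechanism.

```python
import os
import tempfile


def write_atomic(path, data):
    """Write text to path so readers never observe a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the target directory so the final
    # rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())
        # os.replace swaps the new file into place in a single step.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A web server reading `path` then sees either the old content or the new content, never a half-written file.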
Zuul 262ec2c995 Merge "Enable configuration via environment variables" 2020-09-18 07:14:43 +00:00
Sorin Sbarnea e6b1354d91 Enforce use of safe yaml loader
Resolves the runtime warning about unsafe loading of YAML files.

Change-Id: I11f1332f34fe341b13100d8ae4c263cfbd5b90e0
2020-09-17 15:18:36 +01:00
Sorin Sbarnea 3d1411f3a1 Enable configuration via environment variables
Refactors configuration loading in order to simplify it and to
allow overriding defaults using environment variables.

This behavior is similar to other tools like pip or ansible, which
can load any configurable option from env.

This step eases migration towards containerized use, where we do not
want to keep any secrets inside containers and we may want to
avoid volume mounting, especially when testing.

Change-Id: I0d3a9f19b0ba8d1604d0ca63db01296a3219fb47
2020-09-17 10:59:38 +01:00
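The override behaviour described in 3d1411f3a1 can be sketched as follows. The option names, the default values, and the `EXAMPLE_` prefix are all assumptions made for illustration; the real option names and prefix are not stated in the commit message.

```python
import os

# Hypothetical defaults and prefix, chosen only for this illustration.
DEFAULTS = {"es_url": "http://localhost:9200", "gerrit_user": "anonymous"}
ENV_PREFIX = "EXAMPLE_"


def load_config(environ=None):
    """Return the defaults, overridden by matching environment variables."""
    environ = os.environ if environ is None else environ
    config = dict(DEFAULTS)
    for key in DEFAULTS:
        value = environ.get(ENV_PREFIX + key.upper())
        if value is not None:
            config[key] = value
    return config
```

With this shape, a container can be configured with something like `EXAMPLE_ES_URL=...` and no mounted config file.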
Sorin Sbarnea 3901d2fd93 Made parse_jenkins_failure non-static
Replaces the static implementation, which received the password, with a
member function that can make use of the config object.

Change-Id: If9617b6db73eb49c5193f098d45e357a267529dd
2020-09-15 10:04:15 +01:00
Sorin Sbarnea 9d37c88c8f pylint: 4 more
Change-Id: I4cc928d8212a5192927a994b4248f32fe05ca723
2020-09-11 11:19:35 +01:00
Sorin Sbarnea 360c57118c pylint: 6 more
Change-Id: Ic16db7972fe6f9da86592d56f4983572d7c68989
2020-09-10 15:28:52 +01:00
Sorin Sbarnea c41b9c6fa0 pylint: fixed logging-not-lazy
Change-Id: Ic25366a9afdfc67ab2beddbe2b8d02544c51e480
2020-09-10 14:56:52 +01:00
Sorin Sbarnea ed5296999e pylint: fixed imports
Fixed pylint violations around imports. Implements
standard import ordering (isort).

Change-Id: Ib89108925487e49109d18ae315cd4892b8b48837
2020-09-10 13:46:38 +01:00
Zuul bbd3a2f2d5 Merge "pylint fixes" 2020-09-10 12:14:05 +00:00
Sorin Sbarnea 78a8098354 pylint fixes
Resolves several code style violations.

Change-Id: Id03dad8f8ce141eb1e630a77d0c9ae497de9f2ed
2020-09-10 12:42:12 +01:00
Sorin Sbarnea c6f07d7f93 Use pytest for queries
Switches query testing to pytest, which provides the following:
- test generator for each query (parametrize)
- ability to run the test for a single query
- generation of an html report with test results, making it easier to
  investigate failures
- parallel execution
- minor bugfix for an issue that prevented queries from running with py38,
  as the config parser accepts only strings (None being invalid).

Change-Id: I982c694a5160a9ecfd117d177d30b911cfe53425
2020-09-10 10:24:37 +00:00
Sorin Sbarnea 97ca1c24c3 Drop py27 and add py38 jobs
- Drops py27, as it is out of support
- Enables py38 testing, already the default python on several distros
- Removes six as a dependency, as it is no longer needed for pure py3

Change-Id: I1e825073abc6cd55aa2fdc363358f2701152c57b
2020-09-08 10:21:02 +01:00
Sorin Sbarnea 8f709c1d67 Resolve unsafe yaml.load use
Fixed deprecation warning about unsafe call

Change-Id: I474454f438d6345dea76daf788be14c93fee6fb6
2020-08-19 09:52:17 +01:00
Sorin Sbarnea f68a8719af Bumped flake8
- Upgraded hacking(flake8)
- Added more modern tox linters environment (pep8 alias)
- Temporarily added skips for broken newer rules
- Fixed a few basic rule violations
- Moved flake8 config to setup.cfg (tox.ini is not recommended)

Change-Id: I75b3ce5d2ce965a9dc5bdfaa49b2aacd8f0195ad
2020-05-23 08:54:14 +01:00
Clark Boylan 94ab7eb16b Don't pretty print json files
The json file outputs of e-r are loaded by web browsers in order to
render our graphs. These json files are actually quite large and part of
the reason why is we pretty print them with 4 space indents and they
have large nesting. Stop pretty printing (humans can pass the files
through a filter if necessary) in order to reduce the size of these
files and make browsers happier (less time spent downloading).

Change-Id: I19dedc2994169932eb0e90b6cdea3856637f5ef0
2020-01-29 10:05:38 -08:00
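The size difference behind 94ab7eb16b is easy to see with the standard `json` module. The sample structure below is made up for illustration; only the comparison between indented and compact output matters.

```python
import json

# Made-up nested data standing in for a graph file.
data = {"buckets": [{"bug": 1, "hits": [[0, 1], [1, 3], [2, 0]]}]}

# Pretty printed with 4 space indents, as before the change.
pretty = json.dumps(data, indent=4)

# Compact output: no indentation and no spaces after separators.
compact = json.dumps(data, separators=(",", ":"))

# Both forms decode to the same data; only the byte count differs.
same_content = json.loads(pretty) == json.loads(compact)
saved = len(pretty) - len(compact)
```

A human who needs the readable form can still pipe the compact file through `python -m json.tool`.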
Matt Riedemann 62e42f4322 Handle ElasticHttpError in graph generation
Getting elasticsearch data for bug 1708704 is failing
in the check queue with:

  pyelasticsearch.exceptions.ElasticHttpError: \
  (500, 'ArrayIndexOutOfBoundsException[null]')

This might have to do with the size of the resulting
messages from the hits on the tripleo and kolla jobs,
I'm not sure.

What's clear though is the graph generation is blowing
up in the check queue on that bug but not the gate queue,
maybe due to a smaller result set, so this adds some
error handling in the graph generation for when a specific
bug query fails so it does not halt the entire build of the
graph.

Change-Id: Ibe18c9cccc421a6549a18148f1a2ce3c1e4339d4
2019-12-18 15:46:53 -05:00
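The error handling described in 62e42f4322 amounts to catching the failure per bug instead of letting it abort the whole loop. In the sketch below, `QueryError` is a stand-in for the search client's HTTP error, and `build_graph` and `run_query` are hypothetical names, not the project's functions.

```python
import logging

LOG = logging.getLogger(__name__)


class QueryError(Exception):
    """Stand-in for the search client's HTTP error (hypothetical)."""


def build_graph(bugs, run_query):
    """Collect results per bug, skipping bugs whose query fails."""
    results = {}
    for bug in bugs:
        try:
            results[bug] = run_query(bug)
        except QueryError:
            # Log the failure and continue so one bad query does not
            # halt the build of the entire graph.
            LOG.exception("Query for bug %s failed; skipping it", bug)
    return results
```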
Matt Riedemann 73a1e85c67 Hard-code os-brick into TestQueries.openstack_projects
The elastic-recheck-tox-queries job is failing because
there is a query on an os-brick bug and the os-brick
project in launchpad is not part of the openstack project
group. This change simply hard-codes it since we know os-brick
is part of openstack.

Change-Id: Ia05c009226f88da427ec6ad9724410cd6ebed859
Story: 2006736
Task: 37197
2019-10-16 16:45:40 -04:00
Matt Riedemann d753cf0190 Include "Invalid" bugs in cleanup CLI
If a bug is invalid in a project then we should probably
consider its query for removal in the cleanup command.
For example, bug 1663529 and bug 1828244 were both marked
Invalid and had no hits but weren't processed by the
cleanup command.

Change-Id: I7bac9fc169601c86a26565e9fa5b3d72c362a8fc
2019-08-29 16:15:17 -04:00
Matt Riedemann dbeeceeb8e Add script to remove queries for fixed bugs
This automates the process to remove old queries
for fixed bugs. It's a bit conservative to start
so it doesn't check for open reviews nor does it
filter out affected projects with non-Fix* status
on the bug. It can be made more robust once we're
confident in how it works and play with it on the
open queries.

Change-Id: Iaaf17892804453b99a846be27457c88e5a8f8a55
2019-08-19 17:54:47 -04:00
Zuul af7cce55d7 Merge "Use pyyaml safe_load to avoid deprecation warning" 2019-05-15 18:46:53 +00:00
Matt Riedemann fbdbbc0f40 Switch gerrit URL to review.opendev.org
As of the great renaming of 2019 we need to update the
openstack gerrit URL default to review.opendev.org.

Change-Id: I2e3f7e7fb03be0deba0c95995265376dbce3c5b6
Story: #2005498
Task: #30599
2019-04-22 11:55:55 -04:00
Matt Riedemann 562187cf85 Use pyyaml safe_load to avoid deprecation warning
yaml.load without a Loader is deprecated in pyyaml 5.1 [1]
so this change switches to use safe_load.

[1] https://github.com/yaml/pyyaml/wiki/PyYAML-yaml.load(input)-Deprecation

Change-Id: I7fdda49d27732ae35c468e7f52adf1032d64a441
2019-04-22 09:26:42 -04:00
Jens Harbott e83d0dd58b Make the functional queries test work with python3
This seems to have been missed in the py3 conversion and seems to have
been broken for a while.

Change-Id: I9314ead6b0d14ed79ffd43c0f880daf57a014871
2019-03-05 11:25:28 +00:00
Matt Riedemann 2abdc593e8 Fix ALL_FAILS_QUERY
The playbook location changed and is no longer in project-config
so just make the query generic on the sub-directory rather than
the git repo.

Change-Id: I8532c193992adef0e996a3f42e9e84f491000c32
2019-03-04 15:53:24 -05:00
Matt Riedemann a8694f804a Log a warning if uncategorized_fails finds nothing
Chances are probably 0 that we won't have failures
or that we'd have 100% categorization rates, which
probably means that if we don't get any failures the
default ALL_FAILS_QUERY is broken, which can easily
happen:

  I208675c2258b6c635925c7b9ea9fae5afd000565

This logs a warning if a group yields no failures
based on the default ALL_FAILS_QUERY.

Change-Id: Ib2c12b1fc276389297cf4ac15775e6b2da828fdd
2019-01-17 17:18:05 -05:00
Clark Boylan cdf6ee031e Fix all fails query not matching any jobs
Monty updated the post-ssh.yaml playbook in project config to do other
post tasks and renamed it to post.yaml as a result. Change
If01bdd7b7656b1a9ebaa5d5d7d021f82093db8ac has all the details.

We need to accommodate that in the all fails query of e-r by updating the
all fails query to look for post.yaml instead of post-ssh.yaml. Note
that there are no query matches for post-ssh.yaml so we don't need an
interim period of matching both.

Change-Id: I208675c2258b6c635925c7b9ea9fae5afd000565
2019-01-17 13:40:34 -08:00
Clark Boylan 02d0651f29 Better event checking timeouts
Previously we gave every event a 20 minute timeout. This meant that we
could eventually rollover on the day and start querying against current
indexes for data in older indexes. If this happens every query would
fail because we are looking in the wrong index. Every query failing
means we run the 20 minute timeout every time.

All this snowballs until we are never able to check whether events are
indexed.

Address this by using the gerrit eventCreatedOn timestamp to determine
when our timeout is hit. We will time out 20 minutes from that timestamp
regardless of how long interim processing has taken us. Over longer
periods of time, this should ensure that we query the current index for
current events.

Change-Id: Ic9ed7fefae37d2668de5d89e0d06b8326eadfbb9
2018-11-30 19:34:54 +00:00
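The timeout rule from 02d0651f29 can be sketched as a deadline computed from the event's creation time rather than from the moment checking starts. The function names are hypothetical, and the sketch assumes eventCreatedOn is given in seconds since the UNIX epoch.

```python
import time

TIMEOUT_SECONDS = 20 * 60  # the 20 minute timeout from the commit message


def deadline_for(event_created_on):
    """Return the absolute time (epoch seconds) at which to give up."""
    return event_created_on + TIMEOUT_SECONDS


def timed_out(event_created_on, now=None):
    """True once 20 minutes have passed since the event was created."""
    now = time.time() if now is None else now
    return now >= deadline_for(event_created_on)
```

Because the deadline is anchored to the event, time spent processing earlier events counts against it, so a backlog cannot push every later check a further 20 minutes behind.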
Sorin Sbarnea 6c4f466282 Made elastic-recheck py3 compatible
- Adds py36/py37 jobs.
- Fixed invalid syntax errors
- Bumps dependencies to versions that are py3 compatible

Change-Id: I0cebc35993b259cc86470c5ceb1016462a1d649b
Related-Bug: #1803402
2018-11-29 20:15:07 +00:00
Chuck Short 3d0ea8cd30 Drop mox usage
Mox is not being used anywhere, so remove it.

Change-Id: I66bee0fa0d22e99554eae51c74b149d212ede7ed
Signed-off-by: Chuck Short <chucks@redhat.com>
2018-08-21 21:40:15 -04:00
Matt Riedemann 320932bbc1 Fix query_builder.result_ready() query for zuulv3 results
The first condition does not hit because the message with
the path to post-ssh.yaml was not hitting without the .yaml
suffix on the path. Assuming the unescaped . was messing
with the query.

Change-Id: I293e9c6fa215cb3f8638763895fccb4bfcf3c235
2018-05-08 17:57:42 -04:00
Clark Boylan 610841f9a3 Fix parens matching in result ready query
There was a missing closing paren (we were short one).

Change-Id: Icb87aaa7509fd1fc3b595d20ef2ba29d4456317a
2018-05-08 14:25:16 -07:00
Zuul 521d6442df Merge "grenade: wait for service logs before matching against patterns" 2018-03-22 16:00:43 +00:00
Ihar Hrachyshka f3e41b2d72 grenade: wait for service logs before matching against patterns
Before the patch, the daemon was not waiting for all service logs to
upload to logstash, just console and grenade logs. It means that queries
that targeted service log files sometimes missed a legit match.

Change-Id: I96ae09c1be8f1b12117bcfc635589e7c149d5df2
2018-01-05 21:39:27 +00:00
Zuul ef7e3bdb84 Merge "SearchEngine search only supports ints for days" 2017-12-20 22:29:52 +00:00
Clark Boylan 53f45539c0 Fix all fails query
There were two problems with the all fails query as sorted out by
manually running the query in kibana. First the query didn't properly
group the two sides of the job log ending query. They were separated by
an OR and were meant to be grouped together as one clause in the query.

Second zuul now requires the .yaml suffix on playbook names so the query
looking for the post ssh playbook needs to end with .yaml.

Change-Id: I951b2824fe6934eca667d1b14f8caf63428da89a
2017-11-30 15:42:57 -08:00
Clark Boylan 69b588cb2b Balance parens in all fails query
Updating uncategorized failures is currently failing on a query parse
error in elasticsearch. This appears to be due to unbalanced parens in
the new all fails query. Rebalance the parens by removing the extra
leading paren.

Change-Id: I05626c563a9a053e396782c54dae4c6fa7d6e269
2017-10-27 10:10:19 -07:00
Zuul c96ced5953 Merge "Add support for Zuulv3-specific parameters in elastic-recheck" 2017-10-27 10:22:54 +00:00
David Moreau-Simard 97f6408b54 Add support for Zuulv3-specific parameters in elastic-recheck
This commit ensures elastic-recheck is able to support zuul v2 and v3
simultaneously:

- Add message queries based on v3 completion messages
- Include job-output.txt where console.html was expected

Change-Id: If3d7990d892a9698a112d9ff1dd5160998a4efe6
Depends-On: I7e34206d7968bf128e140468b9a222ecbce3a8f1
2017-10-26 19:18:16 -04:00
Matt Riedemann 57aab15c11 Retry change queries on a 502 response
When gerrit is running slow we get 502 responses
back which kills the graph builder. We can retry
these requests from the client to keep going. Generally
a single retry fixes it.

Change-Id: I745d7c9b80ab8861972193d82c037df76af69e06
2017-10-09 20:18:58 -04:00
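The retry described in 57aab15c11 can be sketched independently of any HTTP library. Here `fetch` is a hypothetical callable that returns a `(status, body)` pair; the real client code and its retry count are not shown in the commit message, so the default of one retry is an assumption based on "a single retry fixes it".

```python
import time


def get_with_retry(fetch, url, retries=1, delay=0.0):
    """Call fetch(url), retrying while the status code is 502."""
    status, body = fetch(url)
    attempts = 0
    while status == 502 and attempts < retries:
        # Give the server a moment before asking again.
        time.sleep(delay)
        status, body = fetch(url)
        attempts += 1
    return status, body
```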
Paul Belanger 1af432d0a5 Switch to jobs_re for config setting
To avoid confusion, switch everything to use jobs_re for recheckwatch
config.

Change-Id: I1a84db6ec346a32f38e00560c1b322e7d377d434
Needed-By: I1e2369225c9bd83296684af0dd9ea0514d9098c4
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2017-08-07 13:53:53 -04:00
Jenkins 91dbae986c Merge "Pass config from Stream to FailEvent" 2017-08-01 19:03:39 +00:00
Jenkins a3ee8216ff Merge "Wait until the most recent index is available" 2017-01-30 14:46:05 +00:00
Matt Riedemann 88cdb49304 Bump console log timeout to 20 minutes
13.3 minutes doesn't seem to be adequate anymore for getting the
console logs from elasticsearch, so this change increases the timeout
to 20 minutes.

Change-Id: I77f9d79833e23f2b9cda3622832d4315ea574f4a
2017-01-14 09:46:06 -05:00
Jenkins ec0b1986f4 Merge "Interpolate strings using logging own methods" 2016-12-01 23:53:16 +00:00
Matt Riedemann 3ac609c43c Pass config from Stream to FailEvent
Every time we create a FailEvent for a failed job
gerrit comment event we're reconstructing a Config
object unnecessarily; we can just pass the config in
from the Stream object to the FailEvent object.

Change-Id: Ibd85a4f0e813bc9bfff69de8f4f42951face88e4
2016-11-15 17:12:08 -05:00
Ihar Hrachyshka 69807fc6eb Capture all dsvm jobs with the default jobs regex
This should make elastic-recheck capture queries in projects like
neutron where the previous regex was not working for quite some time.

(In neutron gate, full job is called
gate-tempest-dsvm-neutron-full-ubuntu-xenial; there are some jobs that
don't even have 'tempest' in their names that should still participate
in the elastic recheck, like grenade jobs, or rally; all of them have
'dsvm' part though).

Speaking of the regex, it should probably also be applied to
separate job names before classifying them. But I'll leave it for a
follow-up.

Change-Id: If98951d13ba82833444ef4ffbb7c6be179126f2b
2016-11-10 11:09:49 +00:00
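The idea in 69807fc6eb is that matching on the 'dsvm' part of a job name catches jobs that a tempest-specific pattern would miss. The pattern below is a deliberately simple assumption, not the project's actual default regex, and the second and third job names are invented examples.

```python
import re

# Hypothetical pattern for illustration; the real default may differ.
JOBS_RE = re.compile(r"dsvm")

jobs = [
    "gate-tempest-dsvm-neutron-full-ubuntu-xenial",  # from the commit message
    "gate-grenade-dsvm-neutron-ubuntu-xenial",       # invented example
    "gate-nova-python27",                            # invented example
]

# search() matches anywhere in the name, so 'tempest' is not required.
matched = [job for job in jobs if JOBS_RE.search(job)]
```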
zhangyanxian 0348b7e001 Interpolate strings using logging own methods
String interpolation should be delayed to be handled by the logging code,
rather than being done at the point of the logging call.
Ref: http://docs.openstack.org/developer/oslo.i18n/guidelines.html#log-translation
For example:
# WRONG
LOG.info(_LI('some message: variable=%s') % variable)
# RIGHT
LOG.info(_LI('some message: variable=%s'), variable)

Change-Id: I44b85cbf9f4b27d1fee2c1465029fca8cde4f87e
2016-11-08 01:49:57 +00:00
Ramy Asselin de437439ad Wait until the most recent index is available
When Elasticsearch indexing is behind and the date has rolled over
to a new day, the latest index is not yet available for use.
Exclude it from searches until it is ready in order to avoid the
ElasticHttpNotFoundError.

Add unit tests for this case as well as for when multiple days
are specified for the search.

Change-Id: Ifd27d1ab21bebcb63b48ea164f425c4a2ac8759c
2016-10-20 10:48:55 -07:00
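The index selection described in de437439ad can be sketched as building one index name per day and dropping the newest one while it does not exist yet. The `index_names` helper, the `available` set, and the `logstash-YYYY.MM.DD` naming scheme are assumptions for illustration.

```python
import datetime


def index_names(today, days, available):
    """Return index names for the last `days` days, newest first.

    The newest index is skipped when it is not yet in `available`.
    """
    names = [
        (today - datetime.timedelta(days=offset)).strftime("logstash-%Y.%m.%d")
        for offset in range(days)
    ]
    # Just after the date rolls over, today's index may not exist yet.
    if names and names[0] not in available:
        names = names[1:]
    return names
```

In this sketch a one-day search made right after the rollover yields an empty list, so a caller would need to decide how to handle that case.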