Commit Graph

98 Commits

Author SHA1 Message Date
Sorin Sbarnea 3901d2fd93 Made parse_jenkins_failure a non static
Replaces the static implementation that received password with a member
function that can make use of the config object.

Change-Id: If9617b6db73eb49c5193f098d45e357a267529dd
2020-09-15 10:04:15 +01:00
Sorin Sbarnea 9d37c88c8f pylint: 4 more
Change-Id: I4cc928d8212a5192927a994b4248f32fe05ca723
2020-09-11 11:19:35 +01:00
Sorin Sbarnea 360c57118c pylint: 6 more
Change-Id: Ic16db7972fe6f9da86592d56f4983572d7c68989
2020-09-10 15:28:52 +01:00
Sorin Sbarnea c41b9c6fa0 pylint: fixed logging-not-lazy
Change-Id: Ic25366a9afdfc67ab2beddbe2b8d02544c51e480
2020-09-10 14:56:52 +01:00
Sorin Sbarnea ed5296999e pylint: fixed imports
Fixed pylint violations around imports. Implements
standard import ordering (isort).

Change-Id: Ib89108925487e49109d18ae315cd4892b8b48837
2020-09-10 13:46:38 +01:00
Sorin Sbarnea 78a8098354 pylint fixes
Resolves several code style violations.

Change-Id: Id03dad8f8ce141eb1e630a77d0c9ae497de9f2ed
2020-09-10 12:42:12 +01:00
Sorin Sbarnea f68a8719af Bumped flake8
- Upgraded hacking(flake8)
- Added more modern tox linters environment (pep8 alias)
- Temporarily added skips for broken newer rules
- Fixed a few basic rule violations
- Moved flake8 config to setup.cfg (tox.ini is not recommended)

Change-Id: I75b3ce5d2ce965a9dc5bdfaa49b2aacd8f0195ad
2020-05-23 08:54:14 +01:00
Clark Boylan 02d0651f29 Better event checking timeouts
Previously we gave every event a 20 minute timeout. This meant that we
could eventually rollover on the day and start querying against current
indexes for data in older indexes. If this happens every query would
fail because we are looking in the wrong index. Every query failing
means we run the 20 minute timeout every time.

All this snowballs until we are never able to check if events are
indexed.

Address this by using the gerrit eventCreatedOn timestamp to determine
when our timeout is hit. We will timeout 20 minutes from that timestamp
regardless of how long interim processing has taken us. This should over
longer periods of time ensure we query the current index for current
events.

Change-Id: Ic9ed7fefae37d2668de5d89e0d06b8326eadfbb9
2018-11-30 19:34:54 +00:00
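
A minimal sketch of the deadline logic described in the commit above, assuming eventCreatedOn is a Unix timestamp in seconds. The names TIMEOUT_SECONDS, seconds_remaining and has_timed_out are hypothetical and are not taken from the elastic-recheck source.

```python
import time

# 20 minutes, as stated in the commit message.
TIMEOUT_SECONDS = 20 * 60


def seconds_remaining(event_created_on, now=None):
    """Return the seconds left before an event's deadline.

    The deadline is anchored to the event's creation timestamp, so any
    time spent on interim processing counts against the 20 minutes.
    """
    if now is None:
        now = time.time()
    deadline = event_created_on + TIMEOUT_SECONDS
    return max(0, deadline - now)


def has_timed_out(event_created_on, now=None):
    """True once 20 minutes have passed since the event was created."""
    return seconds_remaining(event_created_on, now) == 0
```

Because the deadline depends only on the event timestamp, a backlog of slow events cannot push later checks past the point where the current index still holds their data.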
Sorin Sbarnea 6c4f466282 Made elastic-recheck py3 compatible
- Adds py36/py37 jobs.
- Fixed invalid syntax errors
- Bumps dependencies to versions that are py3 compatible

Change-Id: I0cebc35993b259cc86470c5ceb1016462a1d649b
Related-Bug: #1803402
2018-11-29 20:15:07 +00:00
Ihar Hrachyshka f3e41b2d72 grenade: wait for service logs before matching against patterns
Before the patch, the daemon was not waiting for all service logs to
upload to logstash, just console and grenade logs. It means that queries
that targeted service log files sometimes missed a legit match.

Change-Id: I96ae09c1be8f1b12117bcfc635589e7c149d5df2
2018-01-05 21:39:27 +00:00
David Moreau-Simard 97f6408b54 Add support for Zuulv3-specific parameters in elastic-recheck
This commit ensures elastic-recheck is able to support zuul v2 and v3
simultaneously:

- Add message queries based on v3 completion messages
- Include job-output.txt where console.html was expected

Change-Id: If3d7990d892a9698a112d9ff1dd5160998a4efe6
Depends-On: I7e34206d7968bf128e140468b9a222ecbce3a8f1
2017-10-26 19:18:16 -04:00
Jenkins 91dbae986c Merge "Pass config from Stream to FailEvent"
2017-08-01 19:03:39 +00:00
Matt Riedemann 88cdb49304 Bump console log timeout to 20 minutes
13.3 minutes doesn't seem to be adequate anymore for getting the
console logs from elasticsearch, so this change increases the timeout
to 20 minutes.

Change-Id: I77f9d79833e23f2b9cda3622832d4315ea574f4a
2017-01-14 09:46:06 -05:00
Matt Riedemann 3ac609c43c Pass config from Stream to FailEvent
Every time we create a FailEvent for a failed job
gerrit comment event we're reconstructing a Config
object unnecessarily. We can just pass the config in
from the Stream object to the FailEvent object.

Change-Id: Ibd85a4f0e813bc9bfff69de8f4f42951face88e4
2016-11-15 17:12:08 -05:00
Ramy Asselin 49999256f4 Make Elastic Recheck Watch more reusable
Refactor to use a config class to hold all the
params needed so that they can be more easily
overridden and reused across all the
elastic-recheck tools.

In addition, use the new class to make the
jobs_regex and ci_username configurable.

Change-Id: Ic6f115a6882494bf4c087ded4d7cafa557765c28
2016-09-20 18:11:30 -07:00
Jenkins 9c468aedc0 Merge "10 day count is too high"
2016-07-20 20:36:25 +00:00
James E. Blair b32e422008 Support Zuul as a Gerrit user equivalent to Jenkins
Change-Id: I72c76fc1891b56aeb827a86e33f594324dce26bc
2016-06-16 09:29:41 -07:00
Ramy Asselin af21482811 10 day count is too high
Graph counts were looking at all history instead of just 10 days
as intended. Update the search to only look at the most recent 10
days.

Change-Id: I9495888a818986b3ac187bac7fd65fbcad6135a3
2016-03-03 12:10:39 -08:00
Sean Dague f9b2619fe4 add a string repr for FailEvent
This makes debugging code gone wrong a bit simpler.

Also fix other __str__ function to use __repr__ as well, to make it
consistent that objects which want representations implement __repr__
and not __str__.

Change-Id: I6913da8f3ef6a4632d5f1c9d6ed26a38cdcd5e73
2015-12-02 14:44:56 -05:00
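
A small sketch of the __repr__ convention the commit above describes. This FailEvent is a simplified stand-in with hypothetical attributes, not the class from the elastic-recheck source.

```python
class FailEvent(object):
    """Simplified stand-in; the real class carries more state."""

    def __init__(self, change, rev):
        self.change = change
        self.rev = rev

    def __repr__(self):
        # Defining only __repr__ is enough: str() falls back to it
        # when a class does not define __str__.
        return "<FailEvent change:%s, rev:%s>" % (self.change, self.rev)
```

With this in place, both str(event) and the representation shown inside lists or log messages give the same readable text.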
Sean Dague 314d578653 only query voting changes
Elastic recheck is about failures, all queries should only include
voting changes. We do this by explicitly adding voting:1 to all
queries that load in the query builder.

Change-Id: I4bd4827f72d85bf69bf501be2f5744e71de35a3c
2015-12-02 12:41:03 -05:00
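
One way the voting:1 restriction could be applied, sketched under the assumption that queries are Lucene-style query strings. The helper name restrict_to_voting is hypothetical.

```python
def restrict_to_voting(query):
    """Limit a Lucene-style query string to voting jobs.

    The original query is wrapped in parentheses so that any OR
    clauses inside it cannot escape the added AND condition.
    """
    return "(%s) AND voting:1" % query
```

Applying the restriction in the query builder, rather than in each query file, keeps the stored queries unchanged.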
Matt Riedemann 429d4aca85 Fix default port in elastic search URL
pyelasticsearch>1.0 defaults the port to 9200 but logstash.o.o/es
is on port 80, so update the defaults in code and config samples.

Change-Id: Ibb85cd29e1cbc3ff448aa8470854fe0f8bede260
2015-11-10 09:00:51 -08:00
Ramy Asselin 96dca00b19 Enable configurable uris in uncategorized_fails.py
Currently it is not possible to point to a different database or
elastic search engine. Make these configurable by using the
same configuration file used by bot.py.

Also add a logstash url so that it can be configured separately
from elastic search url.

Change-Id: I77e4215765e32c34b67c38e37e5764c6c0e45c84
2015-10-20 20:23:18 +00:00
Matthew Treinish 48ebc14283 Add config flags for data source configuration
This commit adds data source options to the elastic recheck bot
configuration file. This enables users to specify how to connect
to an elastic search server and a subunit2sql database, but things
will still default to using the openstack-infra servers to prevent
breaking the running service.

Change-Id: I10db1a568cc01e137e5f4d8a8814b17201c4c438
2015-08-18 17:34:10 -04:00
Matthew Treinish d83ef2e5ea Add support to filter results by failure test_ids
This commit adds a new field to the query yaml, test_ids, which is a
list of test_ids that will be used to query the subunit2sql db to verify
that at least one of them failed on the failed uuid.

Change-Id: If3668709e3294b5d6bf9e1f082396fbc39c08512
2015-08-14 17:11:39 -04:00
Joe Gordon 2b97c0d156 Upgrade to hacking 0.10.x
Fix up issues detected by new version of hacking

Change-Id: Ie12b9f5ccaa1ce5f49ee6bf35d3275bc9dbcbc15
2015-04-30 17:00:22 -07:00
Joe Gordon 612d43f971 Add support to suppress bot notifications
* Similar to suppress-graph

There are some gate failures that are expected and are real errors (such
as global-requirements mismatches in requirements jobs).
suppress-notifications allows us to classify these failures and remove
them from the unclassified page while not telling developers to recheck.

This can be used along with suppress-graph.

Change-Id: I6d905ba65e66e799a65598f8a5d5c3dd684feb8c
2015-01-23 09:42:54 -08:00
Joe Gordon 2a1767767b Make build_short_uuid work with URLs with trailing slashes
jenkins now includes a trailing slash
  http://logs.openstack.org/89/141489/6/check/gate-horizon-pep8/48238d7/

so update code and unit test to support an optional trailing slash

Change-Id: I2b180ffb5c15436ac40a70b5e746c2d719a8152f
2015-01-13 14:09:03 +13:00
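
A minimal sketch of extracting the short uuid from a log URL with an optional trailing slash. The function name short_uuid_from_log_url is hypothetical, and the approach (strip, split, slice) is an assumption rather than the project's actual implementation.

```python
def short_uuid_from_log_url(url):
    """Return the first 7 characters of the last path component.

    Trailing slashes are stripped first, so both URL forms yield the
    same result.
    """
    last_component = url.rstrip("/").split("/")[-1]
    return last_component[:7]
```

Using the URL from the commit message, both the slash-terminated and the bare form resolve to 48238d7.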
Joe Gordon 845fc73057 Update jenkins string to detect failed jobs
I252ae31e7a4cb919e3c98c35591147cc96cfc3cc added the pipeline name to the
zuul gerrit comments. Update the string matching here to work with new
comment format.

Change-Id: I7c09b8f40d594733309660ed76647886653e53ec
2014-10-24 09:09:15 -07:00
Sean Dague cf0e9d6ef2 add support for computing relevant dates
This records the current time when the data is constructed, the date
of the last valid looking piece of data in elastic search, and how far
behind we seem to be on indexing. The json payload is adjusted to be
able to take additional metadata to support displaying this on the ER
page.

Change-Id: I0068ca0bbe72943d5d92dea704659ed865fea198
2014-09-29 11:08:37 -04:00
Matt Riedemann 30b7c43f24 Sort the gerrit failed event bug urls map for predictive tests
The bug_urls_map method is actually returning a list so just sort the
list and fix the tests that are racing due to random hashseed issues
with the dict.

This also updates the docstring which was incorrect before.

Related-Bug: #1348818

Change-Id: I13ca69b3e685083d4ced2b054e0d42a440259854
2014-08-21 23:39:24 -07:00
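
A sketch of why sorting removes the test race mentioned above: iteration order over a set or dict of strings can change with the hash seed, while a sorted list is stable. The function name bug_urls and the Launchpad URL pattern are assumptions for illustration.

```python
def bug_urls(bug_numbers):
    """Return a deterministic, de-duplicated list of bug URLs."""
    urls = ["https://bugs.launchpad.net/bugs/%s" % bug
            for bug in set(bug_numbers)]
    # Sorting makes the result independent of hash ordering.
    return sorted(urls)
```

Tests can then compare against a fixed expected list regardless of PYTHONHASHSEED.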
Yuriy Taraday b95e3b5f8d Force newline after each part of comment message
This forces whitespace between message parts so that, for example,
a URL at the end of the 'unrecognized' part won't get joined with the
first word of the 'footer'.

This change also fixes a hidden (I guess) bug that would have produced
an UnboundLocalError if FailedEvent.get_all_bugs() returned None.

Change-Id: I3a44db0b7018c49f87702d900961ea7119081b12
2014-08-15 01:21:46 +04:00
Sean Dague ea7590acd5 add support for an external message catalog
Instead of having the messages inline, we should do them in the
yaml file so that changing the UX for the bot reporting isn't a
code change.

Depends-On: I9208123a4cb3be02c272cd8a6eba460f4130a960

Change-Id: I8fdb07f9964f616addba6e8f25e5bd9de27d077a
2014-07-24 21:54:17 +00:00
Sean Dague 7beb69b933 ensure grenade logs exist
It turns out that we broke grenade logs being indexed at all. This
will at least give us some warning on looking for them in jobs.

Change-Id: Ic6023b9c2cf64ac57eb023a7c6d60c2d1d731550
2014-07-07 08:24:53 -04:00
Jenkins 99a592fa50 Merge "have realtime engine only search recent indexes"
2014-06-13 16:17:58 +00:00
Sean Dague b4591df9e9 have realtime engine only search recent indexes
Elastic Recheck is really 2 things, real time searching, and bulk
offline categorization. While the bulk categorization needs to look
over the entire dataset, the real time portion is really deadline
oriented, so it only cares about the last hour's worth of data. As such
we really don't need to search *all* the indexes in ES, but only
the most recent one (and possibly the one before that if we are near
rotation).

Implement this via a recent= parameter for our search feature. If set
to true then we specify the most recent logstash index. If it turns
out that we're within an hour of rotation, also search the one before
that.

Adjust all the queries the bot uses to be recent=True. This will
hopefully reduce the load generated by the bot on the ES cluster.

Change-Id: I0dfc295dd9b381acb67f192174edd6fdde06f24c
2014-06-12 17:53:26 -04:00
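
A sketch of the index selection described above, assuming daily indexes named logstash-YYYY.MM.DD that rotate at midnight UTC, and reading "within an hour of rotation" as the first hour after midnight. The function name recent_indexes is hypothetical.

```python
import datetime


def recent_indexes(now):
    """Return the index names a recent-only search should target.

    `now` is a datetime in UTC. The current day's index is always
    included; the previous day's index is added during the first hour
    after rotation, when the last hour of data spans both.
    """
    fmt = "logstash-%Y.%m.%d"
    indexes = [now.strftime(fmt)]
    if now.hour < 1:
        yesterday = now - datetime.timedelta(days=1)
        indexes.append(yesterday.strftime(fmt))
    return indexes
```

Outside that first hour only one index is searched, which is where the reduction in load on the ES cluster would come from.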
Sean Dague 0d9bb900a6 gate-tempest-dsvm-virtual-ironic is in our gate
Because of the oslotest join, ironic is now in our main gate, and
causing actual main gate failures. Treat it as such for triage
purposes.

Change-Id: Ib43130c3a0eb970dfda79ec422439340ac36bd5d
2014-06-12 07:45:13 -04:00
Joe Gordon a9a2694439 Ignore non-voting jobs in gerrit
We shouldn't be reporting back to users why a non-voting job is failing.
Non-voting jobs are non-voting because they are unstable, so we don't want
folks running recheck on a bug for a non-voting job.

Update the unit tests to cover this case.

Change-Id: I61f4e7bb28235d2974f3dcf70187437c80f918d3
2014-06-03 06:49:12 +00:00
Joe Gordon 14bfee5646 Don't include recheck instructions when unclassified failures
If there is an unclassified failure in the check queue, we want to make
it clear to the user so they will investigate the error, as it's most
likely a valid failure. Also don't include recheck instructions when
there is an unclassified failure, as they shouldn't be running a recheck
in that case.

With us now classifying many failures from non-voting jobs, it is common
to see classified failures and no mention of the job that legitimately
failed.

Partial revert of I52044afb4f3a1bf3f22ba4c0e8d38d76271ffc00

Change-Id: I6b471b9ab9c7f36eeed93993ea086bbc9daa56b0
2014-04-01 21:21:44 -07:00
Clark Boylan fea4d1f7ee Take advantage of the new build_short_uuid field.
Recently the elasticsearch schema was updated to include a
build_short_uuid field which has indexed the first 7 chars of the
build_uuid. This field is useful because it allows e-r to filter on that
field instead of searching on build_uuid.

Update e-r to filter on build_short_uuid which should make queries much
more performant. As part of this change replace variables named
short_build_uuid with build_short_uuid for consistency with the
elasticsearch schema.

Change-Id: Iae5323f3f5d2fd01f2c69f78b9403baf5ebafe85
2014-03-26 12:14:32 -07:00
Sergey Lukjanov f5c7bd47fa Don't separate bug links with ','
It breaks links in some browsers (they open the link with ',' at the end
of it).

Change-Id: Ic4b98a8d9ce9567e2a06a38a3d49f7f4dd124259
2014-03-18 21:14:32 +04:00
Sean Dague a04553ee89 only care about unclassifieds in gate queue
We really don't care about check failures for classification,
because those might just be terrible code, which we get a lot of.
So only report unclassified tests to the user on gate failures.

Now with extra tests for this behavior!

Change-Id: I52044afb4f3a1bf3f22ba4c0e8d38d76271ffc00
2014-03-12 17:07:43 -04:00
Clark Boylan 34cdbeb3ca Use correct time.sleep argument.
Use the correct time.sleep argument when sleeping. Also replace an if
check after the for loop with an else clause to make the code more readable.

Change-Id: Icdfb41d1436abe930e4f45243ff6fe378ba3f91b
2014-03-10 10:19:27 -07:00
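
A sketch of the for/else pattern the commit above refers to, with time.sleep taking a delay in seconds. The function wait_until and its parameters are hypothetical; the sleep argument is injectable only so the sketch can be exercised without real waiting.

```python
import time


def wait_until(check, attempts, delay_seconds, sleep=time.sleep):
    """Poll check() up to `attempts` times, sleeping between tries.

    Returns True as soon as check() succeeds, False if every attempt
    fails.
    """
    for _ in range(attempts):
        if check():
            break
        sleep(delay_seconds)
    else:
        # Runs only when the loop finished without hitting break.
        return False
    return True
```

The else clause replaces a separate flag and a post-loop if test, so the "ran out of attempts" path is stated where the loop ends.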
Joe Gordon b5be07cbb1 Add link to status.o.o/e-r to gerrit comment
Point users to status.openstack.org/elastic-recheck to find further
information and links on the bugs they hit.

Change-Id: I9e6a70151d4f94c574b2eae55ff8ba0172189d7a
2014-03-06 16:48:48 -08:00
Joe Gordon 831781954e Fix nesting for required files
Only need to look for neutron or nova network if running tempest

Change-Id: I3ade31e5e2b1fe777a2bcd3c3a6c403dea1de8f8
2014-03-06 13:19:41 -08:00
Joe Gordon 6c47bad772 Unbreak elastic-recheck
A few bugs have crept into elastic-recheck causing it to fail. This
patch fixes them.

* An update to gerritlib broke FailedEvent.rev and change; since both of
  these should always be numbers, cast them to ints
* We appear to be missing files occasionally, so add better logging for
  this (also simplify Exception classes)
* Remove last usage of skip_resolved (removed in a previous patch)

Change-Id: Ifc180989832be152e08a4873e62857a899835484
2014-03-06 11:43:21 -08:00
Clark Boylan 8b2e067b8c Revert "move to static LOG"
This reverts commit e75b996e60.

Change is being reverted because we can't actually use a static LOG
object if we expect setup_logging to do the right thing at runtime.
Python logging will load logging objects at import time using the static
LOG object before setup_logging can run otherwise.

Conflicts:
	elastic_recheck/bot.py
	elastic_recheck/elasticRecheck.py

Change-Id: I582c7e9c9b3c2ccab6a695bfba00a61f7c0a04a9
2014-03-06 11:12:43 -08:00
Jenkins 0d63741e11 Merge "ensure all the required files are there"
2014-02-08 01:09:44 +00:00
Sean Dague 2224bfbe98 ensure all the required files are there
add in neutron, glance, and n-net logs as required files when
appropriate. This will help ensure that we don't miss a pattern
because we searched before the log was in the system.

Change-Id: Ia8f2cdedfc9964f1d9589fda253174e972fcc770
2014-02-07 09:13:41 -05:00
Joe Gordon 61190329f7 Map failed jobs to bugs in gerrit comment
Instead of just listing which bugs were seen in an entire gerrit event
(multiple jenkins/zuul jobs), list which bugs were seen in which job.
If one of the jobs has an unrecognized error don't display the comment
about running recheck, just list which bugs were seen on which jobs (and
which has an unrecognized error)

Change-Id: I55b2eb8f0efe43ab22540294150d4bc9f5885510
2014-02-06 15:58:27 -08:00
Joe Gordon 2308b4a947 Convert failed_job into an object
We are starting to track a decent amount of data per zuul/jenkins job,
so track data in an object instead of assorted variables and
dictionaries. For example bugs are now tracked by job and not
gerrit event. Now, we can support reporting which bug caused which
specific job to fail. This also does some assorted object related
cleanups. This consists of internal changes only; a future patch will
make the gerrit and irc comments take advantage of this.

Change-Id: I2116cd0e10b45617a8d572b27f1672f695fa91d0
2014-02-06 15:56:27 -08:00