Commit Graph

20 Commits

Author SHA1 Message Date
Sorin Sbarnea 9d37c88c8f pylint: 4 more
Change-Id: I4cc928d8212a5192927a994b4248f32fe05ca723
2020-09-11 11:19:35 +01:00
Sorin Sbarnea 360c57118c pylint: 6 more
Change-Id: Ic16db7972fe6f9da86592d56f4983572d7c68989
2020-09-10 15:28:52 +01:00
Sorin Sbarnea 78a8098354 pylint fixes
Resolves several code style violations.

Change-Id: Id03dad8f8ce141eb1e630a77d0c9ae497de9f2ed
2020-09-10 12:42:12 +01:00
Ramy Asselin de437439ad Wait until the most recent index is available
When Elasticsearch indexing is behind and the day has
rolled over, the latest index is not yet available for use.
Exclude it from searches until it is ready, in order to
avoid the ElasticHttpNotFoundError.

Add unit tests for this case as well as for when multiple days
are specified for the search.

Change-Id: Ifd27d1ab21bebcb63b48ea164f425c4a2ac8759c
2016-10-20 10:48:55 -07:00
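The index-exclusion approach this commit describes can be sketched roughly as follows; the function name and the `available` predicate are illustrative assumptions, not the project's actual code:

```python
from datetime import datetime, timedelta

def indexes_for_range(end, days, available):
    """Build the daily logstash index names for a search window,
    skipping any index (typically today's, when indexing is behind)
    that is not yet available."""
    names = []
    for offset in range(days):
        day = end - timedelta(days=offset)
        name = day.strftime("logstash-%Y.%m.%d")
        if available(name):  # stand-in for a real ES existence check
            names.append(name)
    return names
```

With today's index not yet indexed, the search simply falls back to the days that are ready instead of raising.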
Matt Riedemann 44c967d3e3 Skip hits that don't have data
The e-r graph and uncategorized fails jobs are both
currently failing because we're getting back hits
with empty data for the 'timestamp' attribute because
the value is too large, e.g.:

"[FIELDDATA] Data too large, data for [@timestamp] would "
"be larger than limit of [22469214208/20.9gb]]"

To work around this for now we check whether the hit
item list has anything in it before returning the
data for the facet results.

We should follow this up by logging errors on hits that
have bad data and we should remove those indexes.

Change-Id: Icf19af6580632ef52a55d3fb4bed3bced140024a
Closes-Bug: #1630355
2016-10-04 15:21:13 -04:00
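A minimal sketch of that guard, assuming a hit shaped like the dicts Elasticsearch returns (the field layout here is an assumption for illustration):

```python
def first_timestamp(hit):
    """Return the first 'timestamp' value for a hit, or None when the
    item list came back empty (e.g. the fielddata circuit breaker
    tripped and ES returned no data for the attribute)."""
    items = hit.get("fields", {}).get("timestamp", [])
    if not items:
        return None  # skip hits that don't have data
    return items[0]
```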
Ramy Asselin 1e2984def0 Only query indexes that exist
Use a dictionary to save which indexes exist and
avoid using non-existing indexes in searches.

Change-Id: Ic5871cb0f7fd5741d86c1ed5569d2fc1a89c9ad1
2016-09-27 09:59:09 -07:00
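The dictionary-caching idea can be sketched like this; the class name and the `exists_fn` wrapper are assumptions standing in for the real index-existence API call:

```python
class IndexCache:
    """Remember which indexes exist so repeated searches do not keep
    probing Elasticsearch for indexes known to be missing."""

    def __init__(self, exists_fn):
        self._exists_fn = exists_fn  # hypothetical ES existence check
        self._known = {}             # index name -> bool

    def exists(self, name):
        if name not in self._known:
            self._known[name] = self._exists_fn(name)
        return self._known[name]
```

Each index is probed at most once; later searches consult the dictionary instead.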
Nate Marsella 9308535283 Make the index format string configurable in the conf file.
This allows queries to work against indices not named with the
typical "logstash-%Y.%m.%d" pattern.

Change-Id: I89a5b293b8e23cc81a8ed33d86c173adca927d28
2016-04-25 09:54:30 -04:00
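A sketch of reading such a format string from a config file with stdlib `configparser`; the section and option names here are assumptions, and the fallback is the typical logstash pattern:

```python
import configparser
from datetime import date

def index_for_day(cfg_text, day):
    """Render the index name for a day using a configurable strftime
    format, defaulting to the usual logstash naming."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(cfg_text)
    fmt = cfg.get("ls", "index_format", fallback="logstash-%Y.%m.%d")
    return day.strftime(fmt)
```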
Ramy Asselin 19712edc2d Fix elastic-recheck query command
This code doesn't work at all. Bring it back to life.
Also accept inputs from a config file.
Closes-Bug: #1526921

Change-Id: I8f45dc9d42f7547f9d849686739b9a641c176814
2015-12-18 12:14:34 -08:00
Matt Riedemann 9f44f79a2d Fix the module index link in the docs
Since we weren't autodoc'ing the modules during the docs build, the
module index link was broken.

This generates the module docs (but hides them from the main top-level
table of contents) so they can be accessed via the 'Module Index' link.

Also cleans up some docs issues so that warnerrors=True works during
sphinx-build.

Closes-Bug: #1472642

Change-Id: I5a3a16d1e81b12237452d5a3a3f7f0cc42618e88
2015-07-08 07:54:23 -07:00
Sean Dague b4591df9e9 have realtime engine only search recent indexes
Elastic Recheck is really 2 things: real-time searching, and bulk
offline categorization. While the bulk categorization needs to look
over the entire dataset, the real-time portion is deadline
oriented, so it only cares about the last hour's worth of data. As
such we really don't need to search *all* the indexes in ES, but
only the most recent one (and possibly the one before that if we
are near rotation).

Implement this via a recent= parameter for our search feature. If set
to true we search only the most recent logstash index. If it turns
out that we're within an hour of rotation, we also search the one
before that.

Adjust all the queries the bot uses to be recent=True. This will
hopefully reduce the load generated by the bot on the ES cluster.

Change-Id: I0dfc295dd9b381acb67f192174edd6fdde06f24c
2014-06-12 17:53:26 -04:00
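The recent-index selection can be sketched as below; the exact "within an hour of rotation" boundary (the first hour after midnight, for daily indexes) is an assumption about how the check might work:

```python
from datetime import datetime, timedelta

def recent_indexes(now):
    """Index names a recent=True search should hit: today's logstash
    index, plus yesterday's when we are close enough to the daily
    rotation that last hour's data may span both."""
    names = [now.strftime("logstash-%Y.%m.%d")]
    if now.hour == 0:  # assumed rotation window: first hour of the day
        prev = now - timedelta(days=1)
        names.append(prev.strftime("logstash-%Y.%m.%d"))
    return names
```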
Sean Dague 4ea5d02a70 Improved timestamp parsing
Use dateutil to be more flexible in parsing timestamps. A recent
upgrade to ElasticSearch changed the timestamp format to use '+00:00'
to note the timezone instead of 'Z'.

Co-Authored-By: Joe Gordon <joe.gordon0@gmail.com>
Change-Id: I11f441ba3bf7ba46c55921352fcc87eb5d1ce3ae
2014-02-20 20:39:41 -05:00
Joe Gordon 2cc815ca20 Parse both new and old ES timestamp formats
The new ElasticSearch uses the +00:00 notation instead of 'Z' to signify
the timezone. Since we have both old and new data this change is
backward compatible.

Change-Id: Iaccb6a21b6929826e08f3adfc0b601e4a90fa4d5
Note: this patch assumes the timezone is always +00:00
2014-02-14 15:55:49 -08:00
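Accepting both the old and new notations can be sketched with the stdlib alone (the project itself leans on dateutil for this flexibility, per the previous commit; this is just an illustrative equivalent):

```python
from datetime import datetime

def parse_timestamp(ts):
    """Parse an ES timestamp, accepting both the old 'Z' suffix and
    the new '+00:00' offset notation for UTC."""
    if ts.endswith("Z"):
        ts = ts[:-1] + "+00:00"  # normalize old format to the new one
    return datetime.fromisoformat(ts)
```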
Joe Gordon f4992d0421 Remove remaining cases of '@message'
Our elasticSearch cluster previously used '@message', but we have since
moved over to using just 'message'. The rest of the uses of '@message'
were removed in I6fb0aa87a291660df879282e9a7851bbb27e9ac2

Change-Id: I2b5d0f176deddb1b1ab9e831395c3216e927d8bf
2014-01-19 19:10:33 -08:00
Sean Dague 51a2100ffe correct issue of dangling µs in buckets
We are parsing at microsecond resolution, but the previous
floor methodology only zeroed out seconds, not also
microseconds. This caused bucket alignment issues, and broke the
graphs page.

Change-Id: I688bb4bc9ef9fee2167dd2e94a25f060d4025afd
2013-12-18 18:19:37 -05:00
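The fix amounts to flooring both fields at once, which `datetime.replace` does directly (function name here is illustrative):

```python
from datetime import datetime

def floor_to_minute(ts):
    """Zero out seconds *and* microseconds so timestamps parsed at
    microsecond resolution land in aligned buckets; the bug was
    zeroing seconds only, leaving dangling microseconds."""
    return ts.replace(second=0, microsecond=0)
```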
Sean Dague 95962a0e2c allow facets to work at different resolutions
We need to support different histogram resolutions; this adds a
new parameter giving the number of seconds to bucket on.

Change-Id: If839c238f93a07b17240c8774e826f3217d447ef
2013-12-17 09:09:17 -05:00
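Bucketing on an arbitrary number of seconds is a one-line floor, sketched here for illustration:

```python
def bucket_start(epoch_seconds, resolution):
    """Floor a timestamp (seconds since the epoch) to the start of its
    histogram bucket; `resolution` is the bucket width in seconds."""
    return epoch_seconds - (epoch_seconds % resolution)
```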
Sean Dague 398425bc3d move the graph generating code over to facets
Bring histogram facets (currently hard coded to 1h buckets) into
the FacetSet module for "timestamp" keys from elastic search. This
then enables us to gut a bunch of code from graph.py and do all
the calculations with facet counts instead.

At the same time, make all the graphs run for a full 2 weeks of
data, so they are comparable to each other visually (the sliding
window start time was less useful in seeing how the graphs
compared).

Change-Id: I971d52b5de514d0607bd8217837aed3895472d05
2013-12-06 17:40:31 -05:00
Sean Dague 32d98ae233 implementation of FacetSet for client side nested facets
This is a client-side implementation of facets over ElasticSearch
results. It will let us get rid of a bunch of the uniquify code in
the graph and check_success scripts, and make it simpler to analyze
other dimensions in web console additions.

Also make Hit implement __getitem__ for easier dynamic access of
contents. Useful for programmatically accessing tags.

Change-Id: Ib63ff887eb82cff0ba00109471ee48d210fda571
2013-12-05 17:48:57 -05:00
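A minimal sketch of both ideas together, with simplified class and function names that are assumptions, not the project's actual implementation:

```python
from collections import defaultdict

class Hit:
    """Minimal Hit with __getitem__, so tag values can be read
    dynamically (hit['build_status']) instead of via explicit
    attributes for every field."""

    def __init__(self, source):
        self._source = source

    def __getitem__(self, key):
        return self._source[key]

def facet(hits, key):
    """Client-side faceting: bucket hits by one dimension, replacing
    hand-rolled uniquify code in the calling scripts."""
    buckets = defaultdict(list)
    for hit in hits:
        buckets[hit[key]].append(hit)
    return buckets
```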
Sean Dague 8852f0d979 make ResultSet actually inherit from list
It turns out I was spending way too much time making ResultSet act
like a list, when I could have just made it inherit from list and
been done with it. It manages to remove code and work just the same.

In addition, make the __repr__ for Hit more meaningful by using
pprint. This makes print debugging of all the data structures
actually work like you expect.

Change-Id: Ie104d4bfc06a0875f8da85121742c053b642f8f9
2013-12-03 13:49:50 -05:00
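Both points can be sketched in a few lines; this is a simplified stand-in, not the project's real classes:

```python
import pprint

class Hit:
    def __init__(self, source):
        self._source = source

    def __repr__(self):
        # pprint makes print-debugging nested ES structures readable
        return pprint.pformat(self._source)

class ResultSet(list):
    """Inheriting from list gives iteration, indexing, and len() for
    free, instead of reimplementing list behaviour by hand."""

    def __init__(self, hits):
        super().__init__(Hit(h) for h in hits)
```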
Jeremy Stanley 552edc0f8a Correct conditional branching in field search
* elastic_recheck/results.py(Hit): Simple but subtle typographical
error in branching conditionals caused at_attr to be searched for in
_source even if attr was already present there. Brown bag fix.

Change-Id: I730f7b7c74a9d772edd0bf483f0089523cb5f6e8
2013-10-24 19:00:50 +00:00
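Judging from the description, the corrected lookup order looks something like this sketch (the helper name and exact fallback behaviour are assumptions):

```python
def hit_field(source, attr):
    """Return source[attr] when present, and only fall back to the
    '@'-prefixed name otherwise -- the typo had the branches inverted,
    searching for at_attr even when attr was already there."""
    if attr in source:
        return source[attr]
    at_attr = "@" + attr
    if at_attr in source:
        return source[at_attr]
    raise KeyError(attr)
```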
Sean Dague 4915ebb1a7 add SearchResultSet and Hit objects
In an attempt at long-term simplification of the source tree, this
is the beginning of the ResultSet and Hit object types. The ResultSet
is constructed from the ElasticSearch returned JSON structure, and
it builds hits internally.

ResultSet is an iterator and indexable, so you can easily loop
through it. Both ResultSet and Hit objects have dynamic attributes
to make accessing the deep data structures easier (without having
to make everything explicit), and they also handle the multiline
collapse correctly.

A basic set of tests is included, as well as sample json dumps for all
the current bugs in the system for additional unit testing. Fortunately
this includes bugs which have hits, and those that don't.

In order to use ResultSet we need to pass everything through
our own SearchEngine object, so we get results back as expected.

We also need to teach ResultSet about facets, as those get used
when attempting to find specific files.

Lastly, we need __len__ implementation for ResultSet to support
the wait loop correctly.

ResultSet lets us simplify a bit of the code in elasticRecheck,
so port it over.

There is a short term fix in the test_classifier test to get us
working here until real stub data can be applied.

Change-Id: I7b0d47a8802dcf6e6c052f137b5f9494b1b99501
2013-10-21 13:45:55 -04:00
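The pieces this commit describes — building from the raw ES JSON, keeping facets around, and `__len__` for the wait loop — can be sketched together like this (shapes and names are illustrative assumptions):

```python
class ResultSet(list):
    """Built from the raw ElasticSearch response JSON; keeps facets
    around and, being a list, supports len() for wait loops."""

    def __init__(self, results):
        super().__init__(results.get("hits", {}).get("hits", []))
        self.facets = results.get("facets", {})

def wait_for_hits(fetch, attempts):
    """Poll a (hypothetical) fetch callable until a search returns
    results, giving up after `attempts` tries."""
    for _ in range(attempts):
        rs = fetch()
        if len(rs) > 0:  # the __len__ the wait loop relies on
            return rs
    return None
```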