elastic-recheck

Commit Graph

Author	SHA1	Message	Date
Sorin Sbarnea	9d37c88c8f	pylint: 4 more Change-Id: I4cc928d8212a5192927a994b4248f32fe05ca723	2020-09-11 11:19:35 +01:00
Sorin Sbarnea	360c57118c	pylint: 6 more Change-Id: Ic16db7972fe6f9da86592d56f4983572d7c68989	2020-09-10 15:28:52 +01:00
Sorin Sbarnea	78a8098354	pylint fixes Resolves several code style violations. Change-Id: Id03dad8f8ce141eb1e630a77d0c9ae497de9f2ed	2020-09-10 12:42:12 +01:00
Ramy Asselin	de437439ad	Wait until the most recent index is available When elastic search indexing is behind, and the day has progressed forward to a new day, the latest index is not yet available for use. Exclude it from searches until it is ready in order to avoid the ElasticHttpNotFoundError. Add Unit tests for this case as well as for when multiple days are specified for the search. Change-Id: Ifd27d1ab21bebcb63b48ea164f425c4a2ac8759c	2016-10-20 10:48:55 -07:00
Matt Riedemann	44c967d3e3	Skip hits that don't have data The e-r graph and uncategorized fails jobs are both currently failing because we're getting back hits with empty data for the 'timestamp' attribute because the value is too large, e.g.: "[FIELDDATA] Data too large, data for [@timestamp] would be larger than limit of [22469214208/20.9gb]]" To work around this for now we check to see if the hit item list has anything in it before returning the data for the facet results. We should follow this up by logging errors on hits that have bad data and we should remove those indexes. Change-Id: Icf19af6580632ef52a55d3fb4bed3bced140024a Closes-Bug: #1630355	2016-10-04 15:21:13 -04:00
Ramy Asselin	1e2984def0	Only query indexes that exist Use a dictionary to save which indexes exist and avoid using non-existing indexes in searches. Change-Id: Ic5871cb0f7fd5741d86c1ed5569d2fc1a89c9ad1	2016-09-27 09:59:09 -07:00
Nate Marsella	9308535283	Make the index format string configurable in the conf file. This allows query to work against indices not named the typical "logstash-%Y.%m.%d". Change-Id: I89a5b293b8e23cc81a8ed33d86c173adca927d28	2016-04-25 09:54:30 -04:00
Ramy Asselin	19712edc2d	Fix elastic-recheck query command This code doesn't work at all. Bring it back to life. Also accept inputs from a config file. Closes-Bug: #1526921 Change-Id: I8f45dc9d42f7547f9d849686739b9a641c176814	2015-12-18 12:14:34 -08:00
Matt Riedemann	9f44f79a2d	Fix the module index link in the docs Since we weren't autodoc'ing the modules during the docs build, the module index link was broken. This generates the module docs (but hides them from the main top-level table of contents) so they can be accessed via the 'Module Index' link. Also cleans up some docs issues so that warnerrors=True works during sphinx-build. Closes-Bug: #1472642 Change-Id: I5a3a16d1e81b12237452d5a3a3f7f0cc42618e88	2015-07-08 07:54:23 -07:00
Sean Dague	b4591df9e9	have realtime engine only search recent indexes Elastic Recheck is really 2 things, real time searching, and bulk offline categorization. While the bulk categorization needs to look over the entire dataset, the real time portion is really deadline oriented, so it only cares about the last hour's worth of data. As such we really don't need to search all the indexes in ES, but only the most recent one (and possibly the one before that if we are near rotation). Implement this via a recent= parameter for our search feature. If set to true then we specify the most recent logstash index. If it turns out that we're within an hour of rotation, also search the one before that. Adjust all the queries the bot uses to be recent=True. This will hopefully reduce the load generated by the bot on the ES cluster. Change-Id: I0dfc295dd9b381acb67f192174edd6fdde06f24c	2014-06-12 17:53:26 -04:00
Sean Dague	4ea5d02a70	Improved timestamp parsing Use dateutil to be more flexible in parsing timestamps. A recent upgrade to ElasticSearch changed the timestamp format to use '+00:00' to note the timezone instead of 'Z'. Co-Authored-By: Joe Gordon <joe.gordon0@gmail.com> Change-Id: I11f441ba3bf7ba46c55921352fcc87eb5d1ce3ae	2014-02-20 20:39:41 -05:00
Joe Gordon	2cc815ca20	Parse both new and old ES timestamp formats The new ElasticSearch uses the +00:00 notation instead of 'Z' to signify the timezone. Since we have both old and new data this change is backward compatible. Change-Id: Iaccb6a21b6929826e08f3adfc0b601e4a90fa4d5 Note: this patch assumes the timezone is always +00:00	2014-02-14 15:55:49 -08:00
Joe Gordon	f4992d0421	Remove remaining cases of '@message' Our elasticSearch cluster previously used '@message', but we have since moved over to using just 'message'. The rest of the uses of '@message' were removed in I6fb0aa87a291660df879282e9a7851bbb27e9ac2 Change-Id: I2b5d0f176deddb1b1ab9e831395c3216e927d8bf	2014-01-19 19:10:33 -08:00
Sean Dague	51a2100ffe	correct issue of dangling µs in buckets we are parsing at microsecond resolution, however the previous floor methodology was only zeroing out seconds, not also microseconds. This causes bucket alignment issues, and broke the graphs page. Change-Id: I688bb4bc9ef9fee2167dd2e94a25f060d4025afd	2013-12-18 18:19:37 -05:00
Sean Dague	95962a0e2c	allow facets to work at different resolutions we need to support different histogram resolutions, this adds a new parameter which is the number of seconds to bucket on. Change-Id: If839c238f93a07b17240c8774e826f3217d447ef	2013-12-17 09:09:17 -05:00
Sean Dague	398425bc3d	move the graph generating code over to facets Bring histogram facets (currently hard coded to 1h buckets) into the FacetSet module for "timestamp" keys from elastic search. This then enables us to gut a bunch of code from graph.py and do all the calculations with facet counts instead. At the same time, make all the graphs run for a full 2 weeks of data, so they are comparable to each other visually (the sliding window start time was less useful in seeing how the graphs compared). Change-Id: I971d52b5de514d0607bd8217837aed3895472d05	2013-12-06 17:40:31 -05:00
Sean Dague	32d98ae233	implementation of FacetSet for client side nested facets this is an implementation of facets, client side, with elastic search results. This will let us get rid of a bunch of the uniquify code in the graph and check_success scripts, and make it simpler to analyze by other dimensions in web console additions. Also make Hit implement __getitem__ for easier dynamic access of contents. Useful for programmatically accessing tags. Change-Id: Ib63ff887eb82cff0ba00109471ee48d210fda571	2013-12-05 17:48:57 -05:00
Sean Dague	8852f0d979	make ResultSet actually inherit from list it turns out, I was spending way too much time to make ResultSet act like a list, when I could have just made it inherit from list and be done with it. It manages to remove code and work just the same. In addition, make the __repr__ for Hit be more meaningful by using pprint. Makes print debugging of all the data structures actually work like you expect. Change-Id: Ie104d4bfc06a0875f8da85121742c053b642f8f9	2013-12-03 13:49:50 -05:00
Jeremy Stanley	552edc0f8a	Correct conditional branching in field search * elastic_recheck/results.py(Hit): Simple but subtle typographical error in branching conditionals caused at_attr to be searched for in _source even if attr was already present there. Brown bag fix. Change-Id: I730f7b7c74a9d772edd0bf483f0089523cb5f6e8	2013-10-24 19:00:50 +00:00
Sean Dague	4915ebb1a7	add SearchResultSet and Hit objects in an attempt for long term simplification of the source tree, this is the beginning of a ResultSet and Hit object type. The ResultSet is constructed from the ElasticSearch returned json structure, and it builds hits internally. ResultSet is an iterator, and indexable, so that you can easily loop through them. Both ResultSet and Hit objects have dynamic attributes to make accessing the deep data structures easier (and without having to make everything explicit), and also handling the multiline collapse correctly. A basic set of tests is included, as well as sample json dumps for all the current bugs in the system for additional unit testing. Fortunately this includes bugs which have hits, and those that don't. In order to use ResultSet we need to pass everything through our own SearchEngine object, so we get results back as expected. We also need to teach ResultSet about facets, as those get used when attempting to find specific files. Lastly, we need __len__ implementation for ResultSet to support the wait loop correctly. ResultSet lets us simplify a bit of the code in elasticRecheck, port it over. There is a short term fix in the test_classifier test to get us working here until real stub data can be applied. Change-Id: I7b0d47a8802dcf6e6c052f137b5f9494b1b99501	2013-10-21 13:45:55 -04:00

20 Commits
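
Two of the commits above describe small date calculations: b4591df9e9 (search only the most recent index, plus the previous one when close to rotation) and 51a2100ffe (floor timestamps to a bucket boundary, zeroing microseconds as well as seconds). The sketch below illustrates both ideas and is not the project's actual code. The function names `recent_indexes` and `floor_to_bucket` are hypothetical, and it assumes that indexes rotate daily at 00:00 UTC, that timestamps are already in UTC, and that the bucket resolution divides a day evenly. The default format string "logstash-%Y.%m.%d" is the one quoted in commit 9308535283.

```python
from datetime import datetime, timedelta


def recent_indexes(now, index_format="logstash-%Y.%m.%d"):
    """Return the index names a 'recent' search would cover, newest first.

    Hypothetical helper. Assumes `now` is UTC and indexes rotate at 00:00 UTC.
    """
    indexes = [now.strftime(index_format)]
    # Within the first hour after rotation, the last hour of data
    # still lives partly in the previous day's index.
    if now.hour < 1:
        indexes.append((now - timedelta(days=1)).strftime(index_format))
    return indexes


def floor_to_bucket(ts, resolution=3600):
    """Floor `ts` to the start of its bucket of `resolution` seconds.

    Hypothetical helper. Microseconds are dropped along with seconds,
    so bucket keys line up exactly.
    """
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = int((ts - midnight).total_seconds())
    return midnight + timedelta(seconds=elapsed - elapsed % resolution)


if __name__ == "__main__":
    print(recent_indexes(datetime(2014, 6, 13, 0, 30)))
    print(floor_to_bucket(datetime(2013, 12, 18, 18, 19, 37, 123456)))
```

Computing the offset from midnight keeps the arithmetic independent of the local timezone; a resolution that does not divide 86400 would produce a short final bucket each day.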