fuel-plugin-lma-collector

Commit Graph

Author	SHA1	Message	Date
Andreas Jaeger	c929899400	Retire repository Fuel repositories are all retired in openstack namespace, retire remaining fuel repos in x namespace since they are unused now. This change removes all content from the repository and adds the usual README file to point out that the repository is retired following the process from https://docs.openstack.org/infra/manual/drivers.html#retiring-a-project See also http://lists.openstack.org/pipermail/openstack-discuss/2019-December/011675.html A related change is: https://review.opendev.org/699752 . Change-Id: I8aded54f1b9f3b79f3a4bf8f607d3695b92f528b	2019-12-18 19:39:39 +01:00
Swann Croiset	4d025aa0ab	Fix Logger setting in sandboxes A filter sandbox cannot modify the Logger field Change-Id: Ie50bf2acb3d764504be398685c8e4e61c4e1c61b Closes-bug: #1662879	2017-02-09 13:16:24 +01:00
Simon Pasquier	72fe1f64fe	Send log_messages metric as bulk Using bulk metrics for the log counters greatly reduces the likelihood of blocking the Heka pipeline. Instead of injecting (x services * y levels) metric messages, the filter injects only one big message. This change also updates the configuration of the metric_collector service to deserialize the bulk metric to support alarms on log counters. Change-Id: Icb71fd6faa4191795c0470ecc24aeafd25794f42 Closes-Bug: #1643280	2017-01-06 15:24:03 +01:00
Simon Pasquier	dce9c7e23f	Fix more metrics without hostname This change fixes metrics that aren't associated to a particular hostname. Change-Id: I2acafb801add178d90b76a17b32922a5825c3820	2016-09-29 18:02:59 +02:00
Simon Pasquier	382408db6f	Isolate InfluxDB encoder into a Lua module This allows it to be reused in other projects. Change-Id: I1ea651b25ae2974716eae612d2b8d0ac25dce395	2016-08-30 10:08:26 +02:00
Swann Croiset	a9ae5301e9	Factorize the metrics value extraction from message Change-Id: Ie61337251e53c0dff76479fd18234a8480df7bbd	2016-08-08 16:12:19 +02:00
Swann Croiset	8bc362d745	Add Sandboxes to handle Ceilometer samples/resources The Lua sandboxes aren't used for now. To give them a try, one could use the configuration stored in contrib/ceilometer.toml. blueprint: ceilometer-stacklight-integration Co-Authored-By: Igor Degtiarov <idegtiarov@mirantis.com> Co-Authored-By: Ilya Tyaptin <ityaptin@mirantis.com> Change-Id: I7634dd0ee4f3200d1a82ab26feafa54a8ac74e51	2016-05-12 16:34:26 +02:00
Guillaume Thouvenin	560cbd46b9	Fix a bug in heka monitoring collector filter This patch declares type as local to avoid the termination of the heka monitoring collector filter due to an attempt to call global 'type' (a nil value) Change-Id: I8358e8a38f6db70e2058fd276263b4825746ed17 Closes-Bug: #1579796	2016-05-10 06:56:00 +00:00
Swann Croiset	391ca132b3	Emit aggregated HTTP metrics HTTP metrics are now statistics aggregated every 10 seconds. A new metric is emitted openstack_<service>_response_times with these values: - min - max - sum - count - percentile Hence, the previous metric disappears (openstack_<service>_responses). Implements-blueprint: aggregated-http-metrics Change-Id: I48e92df6f4baa7be942ad138b7f23c3d15f5a24e	2016-05-04 14:34:39 +02:00
Guillaume Thouvenin	f0520cd46c	Fix the OOM of heka monitoring filter This change resets the table that holds data for the heka monitoring filter. Otherwise the table may grow infinitely and the sandbox will eventually be killed by Heka. Change-Id: If8c07944e42700d913831b500466b33831a41482 Partial-Bug: #1545743	2016-02-18 10:33:01 +00:00
Simon Pasquier	ca031a41aa	Truncate Nagios plugin output to 1024 bytes max It appears that Nagios cannot ingest output which is larger than 1024 bytes so this change makes sure that the Nagios encoder complies with this requirement. Change-Id: I22c7186f0dc6edabe8c3372a8c06197b276a9d4d Closes-Bug: #1517917	2015-11-20 09:19:46 +01:00
Simon Pasquier	1bec6d457d	Create a Lua module for table functions This change moves some functions related to table manipulation from the lma_utils module to a dedicated module named table_utils. Change-Id: I2263088d70ef7e9bc617e982a32f2bd26f714af0	2015-10-20 14:47:04 +02:00
Simon Pasquier	9292d25982	Improve error handling in the InfluxDB accumulator Change-Id: I3f63f6783783ec9101404e955b058974c83ec7fc	2015-10-19 14:54:28 +02:00
Simon Pasquier	5667ea6d87	Catch buffer exceptions in the Lua sandboxes A Lua sandbox raises an exception when it tries to inject a message larger than the configured output_limit value (default: 63KiB). The same applies to the cjson library when trying to encode a Lua structure resulting in a string larger than the same limit. This change adds safe_* versions of the inject_message(), inject_payload() and cjson.encode() functions. It also modifies the existing Lua plugins to use the safe versions instead. Change-Id: I7351783e51efa046d483921cb79e14279178a13a Closes-Bug: #1504141	2015-10-19 14:54:28 +02:00
Simon Pasquier	d49b5fb1c8	Rework the GSE filters This change modifies the implementation of the GSE filters. The main differences are: - level-1 dependencies now define the members of a cluster and the status of a cluster is defined by the highest severity among all members. - level-2 dependencies are now known as 'hints', they define relationships between clusters (e.g., Nova depends on Keystone) but have no influence on the status of a cluster. Change-Id: I58bd79463de78b04b9bad92d02e3fb0da4bacdf4	2015-10-09 11:23:09 +02:00
Swann Croiset	933311a72b	Add the AFD framework for threshold alarms This patch provides Lua libraries to evaluate metrics against thresholds. The AFD evaluates a list of alarms, with an alarm defined like the following: name: 'fs-warning' description: 'Filesystem usage' severity: 'warning' trigger: logical_operator: 'or' rules: - metric: fs_space_percent_free fields: fs: '' relational_operator: '<' threshold: 5 window: 60 period: 1 function: avg where: - name* is required and must be unique, - description is required, - severity is one of 'okay', 'warning', 'critical', 'down', 'unknown' - logical_operator is optional (can be 'or' or 'and', default 'or') - metric, relational_operator, threshold, window and function are required, - fields is optional The AFD evaluates alarms in the specified order and stops evaluation at the first triggered alarm. This implementation doesn't support the full specification; the current limitations are: - the supported aggregation functions are max, min, avg, sum, sd and variance; last, median, mww and mww_nonparametric are not supported. - the periods rule parameter is supported for these functions in the sense that thresholds are compared over the entire interval "window * periods" but not between each period. In other words, writing a rule with 'window=300/periods=1|0' is equivalent to writing one with 'window=100/periods=3'. Change-Id: Ia739ceb080971e3b7bb5a2212275d2a15d65d3e9	2015-10-08 11:32:52 +02:00
Simon Pasquier	792577e69d	Fix crash in lma_utils Lua module Change-Id: I22737cff7fee74c197e7808713efadab6ba0f216	2015-10-07 10:21:00 +02:00
Swann Croiset	5027315b45	Remove inferences on status of level 2 in GSE cluster Level 2 dependencies are only some hints about current status but don't modify the status of the cluster. Change-Id: I2f41bc5b26af93c9083bf92ccd4c866841826224	2015-09-30 17:08:23 +02:00
Simon Pasquier	fc391d641f	Remove legacy status filters This change removes the Heka filters that computed the services statuses. It also cleans up the Puppet code that referred to it. Change-Id: Ib6c1c9054333b9e71f5a8a2f08600eae5d287816	2015-09-17 17:15:05 +02:00
Simon Pasquier	a0213d4454	Update annotation filter for InfluxDB 0.9 Change-Id: I801cf8dd07f4c7ee19baf2034a74f898f5b4d692 Implements: blueprint upgrade-influxdb-grafana	2015-08-07 14:30:07 +02:00
Simon Pasquier	8f268f04ff	Add support for bulk metric message This change introduces a new type of Heka message called 'bulk_metric'. A bulk metric message can be emitted by any filter plugin using the add_to_metric() and inject_bulk_metric() functions from the lma_utils module: local ts = read_message('Timestamp') utils.add_to_metric('foo', 1, {tag1 = value1}) utils.add_to_metric('bar', 2, {}) utils.inject_bulk_metric(ts, 'node-1', 'custom_filter') The structure of the message injected in the Heka pipeline will be: Timestamp: <ts> Severity: INFO Hostname: node-1 Payload: > [{"name":"foo","value":1,"tags":{"tag1":"value1"}}, {"name":"bar","value":2,"tags":[]}] Fields: - source: custom_filter - hostname: node-1 Eventually the bulk metric message is caught by the InfluxDB accumulator filter and encoded using the InfluxDB line protocol. Change-Id: I96986fd8287d65ae018c7636f9dd745dba2fc761 Implements: blueprint upgrade-influxdb-grafana	2015-08-06 09:34:55 +02:00
Swann Croiset	6e914f0d1c	Replace the event message logic with status message This patch introduces the following changes: * Message type is now 'status' (instead of 'event'). * Send one 'status' message per service in place of a list of 'events'. * Annotation titles are now formatted in the influxdb-annotations filter. The status message structure is: { Timestamp = timestamp in nanoseconds, Payload = a list of events that occurred during the last period (JSON encoded), Type = 'status', -- prepended with 'heka.sandbox', Severity = INFO by default or mapped from the 'status code' below, Fields = { service = the service name (i.e. 'nova'), status = the general status code of the service, previous_status = the general previous status code, updated = a boolean indicating whether the status has been updated, } } The mapping from 'status code' to severity is the following: * OKAY -> INFO * WARN -> WARNING * FAIL -> CRITICAL * UNKNOWN -> NOTICE implements blueprint alerting-lma-collector Change-Id: Id92a5cb905fb477adb3d0455c89bf50cf51afb1a	2015-07-21 17:41:06 +00:00
Swann Croiset	1631d18891	Support several APIs per service for status determination Change-Id: Idd45613194db6a08644d8a387e24945bbdd99993	2015-06-02 20:12:37 +00:00
Swann Croiset	b5053aed3f	Rename service status labels for annotations Distinguish "global status" and "service status": Global status is one of: * OKAY * WARN * FAIL * UNKNOWN Service status is one of: * UP * DEGRADED * DOWN * UNKNOWN Change-Id: Id3d8b2237788d8710b309197575aa1a82a90400a	2015-06-01 14:31:39 +02:00
Swann Croiset	5f3db05622	Add service status metrics A first Heka filter catches all service related metrics to consolidate service states and periodically emit a message containing the whole information. A second Heka filter consumes the previous message to compute the detailed status of OpenStack services and emit two kinds of messages: - metrics status: the general status of the service, HAProxy backend server status, and per service/agent status when available (nova, cinder, neutron). - events with status transition description, these events will be handled by a future filter to populate InfluxDB and enable annotations on Grafana graphs. Note that events are only emitted by the node where the "vip__public" pacemaker resource is active to avoid duplicated events. The general status of a service depends on underlying metrics: - API checks: openstack.<service>.check_api - HAProxy backend states: haproxy.backend.<backend>.server.(up|down) - service/agent states for nova, cinder, neutron: openstack.<service>.(services|agents).(up|down) Status is one of OK, DEGRADED, DOWN, UNKNOWN. Change-Id: Ifa34edadce87e1ecdd131315462f80b49c7edd6d	2015-04-23 15:35:56 +02:00
Simon Pasquier	11bfda8cd9	Parse Syslog messages without priority Some log files may contain messages generated before RSYSLOG is fully configured (for instance, /var/log/kern.log). This change adds a fallback Syslog grammar that will handle this kind of message. Change-Id: I9184abda924fbbb7d19884ead6775331d41f9468	2015-04-14 12:13:26 +02:00
Simon Pasquier	c9ee4d30d9	Initial import of the LMA collector plugin This is an import of the initial LMA PoC code. For now, it only covers the collection of logs (notifications will be added in a subsequent commit). There's been a bit of rewrite to: - decouple the Heka configuration from the LMA collector. - run the Heka service as non-root when possible (Ubuntu only for now due to file permission issues on CentOS [1]). - adapt to version 0.9 of Heka. [1] https://bugs.launchpad.net/fuel/+bug/1425954 Change-Id: I4472b49a25e18e06984b5b29bdce18f917137bc8	2015-02-27 14:16:49 +01:00

27 Commits
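
The alarm evaluation described in commit 933311a72b (ordered alarms, 'or'/'and' triggers, aggregation over "window * periods", stop at the first triggered alarm) can be illustrated with a rough sketch. This is not the plugin's code, which is written in Lua; it is a Python approximation under stated assumptions. The names evaluate_alarms, rule_matches and the samples layout are hypothetical, the operators other than '<' are assumed, 'fields' matching is omitted, and returning 'okay' when nothing triggers is a guess.

```python
import operator
import statistics

# Aggregation functions the commit message lists as supported.
# Assumption: "sd" and "variance" use the population forms.
FUNCTIONS = {
    "max": max,
    "min": min,
    "avg": statistics.mean,
    "sum": sum,
    "sd": statistics.pstdev,
    "variance": statistics.pvariance,
}

# Only '<' appears in the commit message; the other operators are assumed.
OPERATORS = {
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
    "==": operator.eq, "!=": operator.ne,
}


def rule_matches(rule, samples, now):
    """Aggregate one metric over the interval window * periods, then compare."""
    # periods=0 and periods=1 are treated alike, as the commit message suggests.
    periods = max(rule.get("periods", 1), 1)
    span = rule["window"] * periods
    values = [v for ts, v in samples.get(rule["metric"], [])
              if now - span < ts <= now]
    if not values:
        return False  # assumption: a rule without data does not match
    aggregated = FUNCTIONS[rule["function"]](values)
    return OPERATORS[rule["relational_operator"]](aggregated, rule["threshold"])


def evaluate_alarms(alarms, samples, now):
    """Return (name, severity) of the first triggered alarm, in list order."""
    for alarm in alarms:
        trigger = alarm["trigger"]
        # logical_operator defaults to 'or', as stated in the commit message.
        combine = all if trigger.get("logical_operator", "or") == "and" else any
        if combine(rule_matches(r, samples, now) for r in trigger["rules"]):
            return alarm["name"], alarm["severity"]
    return None, "okay"  # assumption: nothing triggered means 'okay'


# The example alarm from the commit message, as a Python structure.
ALARMS = [{
    "name": "fs-warning",
    "description": "Filesystem usage",
    "severity": "warning",
    "trigger": {
        "logical_operator": "or",
        "rules": [{
            "metric": "fs_space_percent_free",
            "relational_operator": "<",
            "threshold": 5,
            "window": 60,
            "periods": 1,
            "function": "avg",
        }],
    },
}]
```

Here samples is assumed to map a metric name to a list of (timestamp, value) pairs, with timestamps and window in seconds. Calling evaluate_alarms(ALARMS, samples, now) with free-space readings averaging below 5 over the last 60 seconds would report the 'fs-warning' alarm.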