Commit Graph

27 Commits

Author SHA1 Message Date
Andreas Jaeger c929899400 Retire repository
Fuel repositories are all retired in the openstack namespace; retire the
remaining Fuel repos in the x namespace since they are now unused.

This change removes all content from the repository and adds the usual
README file to point out that the repository is retired following the
process from
https://docs.openstack.org/infra/manual/drivers.html#retiring-a-project

See also
http://lists.openstack.org/pipermail/openstack-discuss/2019-December/011675.html

A related change is: https://review.opendev.org/699752 .

Change-Id: I8aded54f1b9f3b79f3a4bf8f607d3695b92f528b
2019-12-18 19:39:39 +01:00
Swann Croiset 4d025aa0ab Fix Logger setting in sandboxes
A filter sandbox cannot modify the Logger field

Change-Id: Ie50bf2acb3d764504be398685c8e4e61c4e1c61b
Closes-bug: #1662879
2017-02-09 13:16:24 +01:00
Simon Pasquier 72fe1f64fe Send log_messages metric as bulk
Using bulk metrics for the log counters greatly reduces the likelihood
of blocking the Heka pipeline. Instead of injecting (x services
* y levels) metric messages, the filter injects only one big message.

This change also updates the configuration of the metric_collector
service to deserialize the bulk metric to support alarms on log
counters.

Change-Id: Icb71fd6faa4191795c0470ecc24aeafd25794f42
Closes-Bug: #1643280
2017-01-06 15:24:03 +01:00
Simon Pasquier dce9c7e23f Fix more metrics without hostname
This change fixes metrics that aren't associated with a particular
hostname.

Change-Id: I2acafb801add178d90b76a17b32922a5825c3820
2016-09-29 18:02:59 +02:00
Simon Pasquier 382408db6f Isolate InfluxDB encoder into a Lua module
This allows it to be reused in other projects.

Change-Id: I1ea651b25ae2974716eae612d2b8d0ac25dce395
2016-08-30 10:08:26 +02:00
Swann Croiset a9ae5301e9 Factorize the metrics value extraction from message
Change-Id: Ie61337251e53c0dff76479fd18234a8480df7bbd
2016-08-08 16:12:19 +02:00
Swann Croiset 8bc362d745 Add Sandboxes to handle Ceilometer samples/resources
The Lua sandboxes aren't used for now. To give them a try, one could use the
configuration stored in contrib/ceilometer.toml.

blueprint: ceilometer-stacklight-integration

Co-Authored-By: Igor Degtiarov <idegtiarov@mirantis.com>
Co-Authored-By: Ilya Tyaptin <ityaptin@mirantis.com>

Change-Id: I7634dd0ee4f3200d1a82ab26feafa54a8ac74e51
2016-05-12 16:34:26 +02:00
Guillaume Thouvenin 560cbd46b9 Fix a bug in heka monitoring collector filter
This patch declares type as local to avoid the termination of the heka
monitoring collector filter due to an attempt to call global 'type' (a
nil value)

Change-Id: I8358e8a38f6db70e2058fd276263b4825746ed17
Closes-Bug: #1579796
2016-05-10 06:56:00 +00:00
Swann Croiset 391ca132b3 Emit aggregated HTTP metrics
HTTP metrics are now statistics aggregated every 10 seconds.
A new metric, openstack_<service>_response_times, is emitted with these
values:
- min
- max
- sum
- count
- percentile

As a result, the previous metric (openstack_<service>_responses) is removed.
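
As an illustration only, the aggregated statistics listed above could be computed per flush interval as in the following Python sketch. The function name, the nearest-rank percentile method and the default of 90 are assumptions; the commit does not say which percentile is emitted or how it is computed.

```python
import math


def aggregate_response_times(samples, percentile=90):
    """Summarize the response times collected during one interval.

    The percentile (hypothetical default: 90) uses the nearest-rank method.
    Returns None when no sample was collected.
    """
    if not samples:
        return None
    ordered = sorted(samples)
    # Nearest-rank: smallest value with at least `percentile`% of samples <= it.
    rank = max(1, math.ceil(percentile / 100 * len(ordered)))
    return {
        "min": ordered[0],
        "max": ordered[-1],
        "sum": sum(ordered),
        "count": len(ordered),
        "percentile": ordered[rank - 1],
    }
```

A filter would call this once per 10-second tick and then clear its sample buffer.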

Implements-blueprint: aggregated-http-metrics

Change-Id: I48e92df6f4baa7be942ad138b7f23c3d15f5a24e
2016-05-04 14:34:39 +02:00
Guillaume Thouvenin f0520cd46c Fix the OOM of heka monitoring filter
This change resets the table that holds data for the heka monitoring
filter. Otherwise the table may grow without bound and the sandbox will
eventually be killed by Heka.
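
The pattern described above, resetting the accumulated state on every flush, can be sketched in Python as follows. The class and method names are hypothetical stand-ins, not the filter's actual Lua code.

```python
class MonitoringAccumulator:
    """Collects per-plugin values between two timer ticks (illustrative names)."""

    def __init__(self):
        self._stats = {}

    def process_message(self, plugin, value):
        # State grows with every message received.
        self._stats.setdefault(plugin, []).append(value)

    def timer_event(self):
        # Hand over what was collected and start again from an empty table,
        # so memory use is bounded by one interval's worth of data.
        snapshot = self._stats
        self._stats = {}
        return snapshot
```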

Change-Id: If8c07944e42700d913831b500466b33831a41482
Partial-Bug: #1545743
2016-02-18 10:33:01 +00:00
Simon Pasquier ca031a41aa Truncate Nagios plugin output to 1024 bytes max
It appears that Nagios cannot ingest output which is larger than 1024
bytes so this change makes sure that the Nagios encoder complies with
this requirement.
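
A minimal Python sketch of such a truncation, assuming the limit applies to the UTF-8 encoded output; the function name is hypothetical and the real encoder is written in Lua.

```python
MAX_NAGIOS_OUTPUT = 1024  # bytes, per the limit stated in this commit


def truncate_plugin_output(output, limit=MAX_NAGIOS_OUTPUT):
    """Return `output` cut down to at most `limit` bytes of UTF-8."""
    encoded = output.encode("utf-8")
    if len(encoded) <= limit:
        return output
    # Drop a trailing partial multi-byte character, if the cut produced one.
    return encoded[:limit].decode("utf-8", errors="ignore")
```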

Change-Id: I22c7186f0dc6edabe8c3372a8c06197b276a9d4d
Closes-Bug: #1517917
2015-11-20 09:19:46 +01:00
Simon Pasquier 1bec6d457d Create a Lua module for table functions
This change moves some functions related to table manipulation from the
lma_utils module to a dedicated module named table_utils.

Change-Id: I2263088d70ef7e9bc617e982a32f2bd26f714af0
2015-10-20 14:47:04 +02:00
Simon Pasquier 9292d25982 Improve error handling in the InfluxDB accumulator
Change-Id: I3f63f6783783ec9101404e955b058974c83ec7fc
2015-10-19 14:54:28 +02:00
Simon Pasquier 5667ea6d87 Catch buffer exceptions in the Lua sandboxes
A Lua sandbox raises an exception when it tries to inject a message
larger than the configured output_limit value (default: 63KiB). The
same applies to the cjson library when trying to encode a Lua structure
resulting in a string larger than the same limit.

This change adds safe_* versions of the inject_message(),
inject_payload() and cjson.encode() functions. It also modifies the
existing Lua plugins to use the safe versions instead.
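
The idea behind the safe_* wrappers can be sketched in Python as follows. Here inject_payload() is a stand-in that imitates the sandbox behaviour described above, and the (ok, error) return convention is an assumption rather than the module's actual signature.

```python
OUTPUT_LIMIT = 63 * 1024  # default output_limit mentioned above, in bytes


class OutputLimitExceeded(Exception):
    """Raised by the stand-in when a payload is larger than the limit."""


def inject_payload(payload, limit=OUTPUT_LIMIT):
    # Stand-in for the sandbox function: it raises instead of truncating.
    size = len(payload.encode("utf-8"))
    if size > limit:
        raise OutputLimitExceeded("payload of %d bytes exceeds %d" % (size, limit))


def safe_inject_payload(payload, limit=OUTPUT_LIMIT):
    """Call inject_payload() and report failure instead of raising."""
    try:
        inject_payload(payload, limit)
    except OutputLimitExceeded as exc:
        return False, str(exc)
    return True, None
```

With this shape, a plugin can log the error and keep running instead of being terminated.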

Change-Id: I7351783e51efa046d483921cb79e14279178a13a
Closes-Bug: #1504141
2015-10-19 14:54:28 +02:00
Simon Pasquier d49b5fb1c8 Rework the GSE filters
This change modifies the implementation of the GSE filters. The main
differences are:

- level-1 dependencies now define the members of a cluster and the
  status of a cluster is defined by the highest severity among all
  members.
- level-2 dependencies are now known as 'hints', they define
  relationships between clusters (e.g., Nova depends on Keystone) but
  have no influence on the status of a cluster.
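
The "highest severity among all members" rule can be sketched in Python as follows. The severity names and their ranking are assumptions made for illustration; this commit does not spell out the ordering.

```python
# Assumed ranking, lowest to highest severity.
SEVERITY_RANK = {"okay": 0, "unknown": 1, "warning": 2, "critical": 3, "down": 4}


def cluster_status(member_statuses):
    """Return the highest-severity status among the cluster members.

    Hints (level-2 dependencies) are deliberately not an input: they have
    no influence on the result.
    """
    if not member_statuses:
        return "unknown"
    return max(member_statuses, key=SEVERITY_RANK.__getitem__)
```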

Change-Id: I58bd79463de78b04b9bad92d02e3fb0da4bacdf4
2015-10-09 11:23:09 +02:00
Swann Croiset 933311a72b Add the AFD framework for threshold alarms
This patch provides Lua libraries to evaluate metrics against thresholds.
The AFD evaluates a list of alarms, with an alarm defined like the
following:

name: 'fs-warning'
description: 'Filesystem usage'
severity: 'warning'
trigger:
  logical_operator: 'or'
  rules:
    - metric: fs_space_percent_free
      fields:
        fs: '*'
      relational_operator: '<'
      threshold: 5
      window: 60
      period: 1
      function: avg

where:
- *name* is required and must be unique,
- *description* is required,
- *severity* is one of 'okay', 'warning', 'critical', 'down', 'unknown',
- *logical_operator* is optional (either 'or' or 'and', default 'or'),
- *metric*, *relational_operator*, *threshold*, *window* and *function*
  are required,
- *fields* is optional.

The AFD evaluates alarms in the specified order and stops evaluation at
the first triggered alarm.

This implementation doesn't fully support the specification. The
current limitations are:

 - the supported aggregation functions are max, min, avg, sum, sd and
   variance; last, median, mww and mww_nonparametric are not supported.
 - the *periods* rule parameter is supported for these functions only in the
   sense that thresholds are compared over the entire interval
   "window * periods", not between each period. In other words, a rule with
   'window=300/periods=1|0' is equivalent to one with 'window=100/periods=3'.
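
The evaluation described above can be sketched in Python as follows. This is not the Lua library itself: the function names and the (timestamp, value) datapoint layout are assumptions, *fields* matching is left out, and only avg, min, max and sum are implemented.

```python
import operator

AGGREGATES = {
    "avg": lambda values: sum(values) / len(values),
    "min": min,
    "max": max,
    "sum": sum,
}
COMPARATORS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
               ">=": operator.ge, "==": operator.eq}


def rule_matches(rule, datapoints, now):
    """Aggregate over the whole "window * periods" interval, then compare."""
    periods = rule.get("periods") or 1  # 0 or missing behaves like 1
    span = rule["window"] * periods
    values = [value for ts, value in datapoints if now - span < ts <= now]
    if not values:
        return False
    aggregate = AGGREGATES[rule["function"]](values)
    return COMPARATORS[rule["relational_operator"]](aggregate, rule["threshold"])


def alarm_triggered(alarm, datapoints_by_metric, now):
    trigger = alarm["trigger"]
    results = [rule_matches(rule, datapoints_by_metric.get(rule["metric"], []), now)
               for rule in trigger["rules"]]
    combine = all if trigger.get("logical_operator", "or") == "and" else any
    return combine(results)


def first_triggered(alarms, datapoints_by_metric, now):
    """Evaluate alarms in order and stop at the first one that triggers."""
    for alarm in alarms:
        if alarm_triggered(alarm, datapoints_by_metric, now):
            return alarm["name"]
    return None
```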

Change-Id: Ia739ceb080971e3b7bb5a2212275d2a15d65d3e9
2015-10-08 11:32:52 +02:00
Simon Pasquier 792577e69d Fix crash in lma_utils Lua module
Change-Id: I22737cff7fee74c197e7808713efadab6ba0f216
2015-10-07 10:21:00 +02:00
Swann Croiset 5027315b45 Remove inferences on status of level 2 in GSE cluster
Level 2 dependencies are only hints about the current status and don't
modify the status of the cluster.

Change-Id: I2f41bc5b26af93c9083bf92ccd4c866841826224
2015-09-30 17:08:23 +02:00
Simon Pasquier fc391d641f Remove legacy status filters
This change removes the Heka filters that computed the service
statuses. It also cleans up the Puppet code that referred to them.

Change-Id: Ib6c1c9054333b9e71f5a8a2f08600eae5d287816
2015-09-17 17:15:05 +02:00
Simon Pasquier a0213d4454 Update annotation filter for InfluxDB 0.9
Change-Id: I801cf8dd07f4c7ee19baf2034a74f898f5b4d692
Implements: blueprint upgrade-influxdb-grafana
2015-08-07 14:30:07 +02:00
Simon Pasquier 8f268f04ff Add support for bulk metric message
This change introduces a new type of Heka message called 'bulk_metric'.

A bulk metric message can be emitted by any filter plugin using the
add_to_metric() and inject_bulk_metric() functions from the lma_utils
module:

  local ts = read_message('Timestamp')
  utils.add_to_metric('foo', 1, {tag1 = value1})
  utils.add_to_metric('bar', 2, {})
  utils.inject_bulk_metric(ts, 'node-1', 'custom_filter')

The structure of the message injected in the Heka pipeline will be:

  Timestamp: <ts>
  Severity: INFO
  Hostname: node-1
  Payload: >
    [{"name":"foo","value":1,"tags":{"tag1":"value1"}},
    {"name":"bar","value":2,"tags":[]}]
  Fields:
    - source: custom_filter
    - hostname: node-1

Eventually the bulk metric message is caught by the InfluxDB
accumulator filter and encoded using the InfluxDB line protocol.
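
To make the flow concrete, here is a Python sketch of the same idea: buffer metrics, emit them as one message, then encode each entry as an InfluxDB line. The class and function names are hypothetical, the encoder does no escaping of special characters, and an empty tag set is serialized as {} here rather than the [] shown above.

```python
import json


class BulkMetricBuffer:
    """Accumulates metrics and emits them as a single bulk message."""

    def __init__(self):
        self._metrics = []

    def add_to_metric(self, name, value, tags=None):
        self._metrics.append({"name": name, "value": value, "tags": dict(tags or {})})

    def build_message(self, timestamp, hostname, source):
        message = {
            "Timestamp": timestamp,
            "Severity": "INFO",
            "Hostname": hostname,
            "Payload": json.dumps(self._metrics),
            "Fields": {"source": source, "hostname": hostname},
        }
        self._metrics = []  # start a fresh batch
        return message


def to_line_protocol(metric, timestamp):
    """Encode one decoded metric as '<name>[,tag=value...] value=<v> <ts>'."""
    tags = "".join(",{}={}".format(k, v) for k, v in sorted(metric["tags"].items()))
    return "{}{} value={} {}".format(metric["name"], tags, metric["value"], timestamp)
```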

Change-Id: I96986fd8287d65ae018c7636f9dd745dba2fc761
Implements: blueprint upgrade-influxdb-grafana
2015-08-06 09:34:55 +02:00
Swann Croiset 6e914f0d1c Replace the event message logic with status message
This patch introduces the following changes:
 * Message type is now 'status' (instead of 'event').
 * Send one 'status' message per service instead of a list of 'events'.
 * Annotation titles are now formatted in the influxdb-annotations filter.

The status message structure is:
{
    Timestamp = timestamp in nanoseconds,
    Payload = a list of events that occurred during the last period (JSON encoded),
    Type = 'status', -- prepended with 'heka.sandbox',
    Severity = INFO by default or mapped from the 'status code' below,
    Fields = {
      service = the service name (e.g. 'nova'),
      status = the general status code of the service,
      previous_status = the general previous status code,
      updated = a boolean indicating whether the status has been updated,
    }
}

The mapping from 'status code' to severity is the following:
* OKAY    -> INFO
* WARN    -> WARNING
* FAIL    -> CRITICAL
* UNKNOWN -> NOTICE
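
A Python sketch of how such a message could be assembled using the mapping above. The function name is hypothetical, and deriving `updated` from a comparison of the two status codes is an assumption, not something the commit states.

```python
import json

SEVERITY_BY_STATUS = {
    "OKAY": "INFO",
    "WARN": "WARNING",
    "FAIL": "CRITICAL",
    "UNKNOWN": "NOTICE",
}


def build_status_message(service, status, previous_status, events, timestamp_ns):
    """Build a 'status' message for one service."""
    return {
        "Timestamp": timestamp_ns,
        "Payload": json.dumps(events),
        "Type": "status",
        # INFO is the default when the status code has no mapping.
        "Severity": SEVERITY_BY_STATUS.get(status, "INFO"),
        "Fields": {
            "service": service,
            "status": status,
            "previous_status": previous_status,
            "updated": status != previous_status,  # assumption, see above
        },
    }
```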

implements blueprint alerting-lma-collector

Change-Id: Id92a5cb905fb477adb3d0455c89bf50cf51afb1a
2015-07-21 17:41:06 +00:00
Swann Croiset 1631d18891 Support several APIs per service for status determination
Change-Id: Idd45613194db6a08644d8a387e24945bbdd99993
2015-06-02 20:12:37 +00:00
Swann Croiset b5053aed3f Rename service status labels for annotations
Distinguish "global status" and "service status":

Global status is one of:
* OKAY
* WARN
* FAIL
* UNKNOWN

Service status is one of:
* UP
* DEGRADED
* DOWN
* UNKNOWN

Change-Id: Id3d8b2237788d8710b309197575aa1a82a90400a
2015-06-01 14:31:39 +02:00
Swann Croiset 5f3db05622 Add service status metrics
A first Heka filter catches all service-related metrics to consolidate service states
and periodically emits a message containing all of the information.

A second Heka filter consumes the previous message to compute the detailed status
of OpenStack services and emits two kinds of messages:
- status metrics: the general status of the service, HAProxy backend
  server status, and per service/agent status when available (nova, cinder, neutron).
- events describing status transitions. These events will be handled by a
  future filter to fill InfluxDB and enable annotations on Grafana graphs.
  Note that events are only emitted by the node where the
  "vip__public" pacemaker resource is active, to avoid duplicated events.

The general status of a service depends on underlying metrics:
- API checks: openstack.<service>.check_api
- HAProxy backend states: haproxy.backend.<backend>.server.(up|down)
- service/agent states for nova, cinder, neutron:
  openstack.<service>.(services|agents).(up|down)

Status is one of OK, DEGRADED, DOWN, UNKNOWN.

Change-Id: Ifa34edadce87e1ecdd131315462f80b49c7edd6d
2015-04-23 15:35:56 +02:00
Simon Pasquier 11bfda8cd9 Parse Syslog messages without priority
Some log files may contain messages generated before RSYSLOG is fully
configured (for instance, /var/log/kern.log). This change adds a
fallback Syslog grammar that will handle this kind of message.
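
The fallback idea can be sketched in Python with a regular expression instead of the plugin's own grammar: try the form with a leading <PRI> first, then fall back to treating the whole line as the message. The function name and the returned keys are hypothetical. Splitting the priority into facility (PRI // 8) and severity (PRI % 8) follows the standard Syslog encoding.

```python
import re

_WITH_PRIORITY = re.compile(r"^<(\d{1,3})>(.*)$")


def parse_syslog_line(line):
    """Parse a Syslog line whose <PRI> header may be missing."""
    match = _WITH_PRIORITY.match(line)
    if match:
        priority = int(match.group(1))
        return {
            "facility": priority // 8,
            "severity": priority % 8,
            "message": match.group(2),
        }
    # Fallback: no priority available, keep the raw line as the message.
    return {"facility": None, "severity": None, "message": line}
```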

Change-Id: I9184abda924fbbb7d19884ead6775331d41f9468
2015-04-14 12:13:26 +02:00
Simon Pasquier c9ee4d30d9 Initial import of the LMA collector plugin
This is an import of the initial LMA PoC code. For now, it only covers
the collection of logs (notifications will be added in a subsequent
commit).

The code has been partly rewritten to:
- decouple the Heka configuration from the LMA collector.
- run the Heka service as non-root when possible (Ubuntu only for now
  due to file permission issues on CentOS [1]).
- adapt to version 0.9 of Heka.

[1] https://bugs.launchpad.net/fuel/+bug/1425954

Change-Id: I4472b49a25e18e06984b5b29bdce18f917137bc8
2015-02-27 14:16:49 +01:00