Commit Graph

32 Commits

Author SHA1 Message Date
Andreas Jaeger c929899400 Retire repository
Fuel repositories are all retired in openstack namespace, retire
remaining fuel repos in x namespace since they are unused now.

This change removes all content from the repository and adds the usual
README file to point out that the repository is retired following the
process from
https://docs.openstack.org/infra/manual/drivers.html#retiring-a-project

See also
http://lists.openstack.org/pipermail/openstack-discuss/2019-December/011675.html

A related change is: https://review.opendev.org/699752 .

Change-Id: I8aded54f1b9f3b79f3a4bf8f607d3695b92f528b
2019-12-18 19:39:39 +01:00
Swann Croiset debe1883d7 Allow deployment without InfluxDB and Elasticsearch
This supports deployment scenarios where the backends are not deployed
initially, for instance when using the 'virt' nodes to deploy the LMA
backends.

The patch refactors the manifests by moving all the configuration data of
InfluxDB and Elasticsearch into Hiera.

DocImpact

Fixes-bug: #1570386
Change-Id: I8688bbd10d88bc8ef68b5d31e9edd62a764dc23d
2016-05-23 13:29:50 +02:00
Swann Croiset ebac150f8a Separate the (L)og of the LMA collector
This change separates the processing of logs/notifications and of
metrics/alerting into 2 dedicated hekad processes. These services are
named 'log_collector' and 'metric_collector'.

Both services are managed by Pacemaker on controller nodes and by Upstart on
other nodes.

All metrics computed by log_collector (HTTP response times and creation time
for instances and volumes) are sent directly to the metric_collector via TCP.
The Elasticsearch output (log_collector) uses full_action='block' and the
TCP output uses full_action='drop'.

All outputs of metric_collector (InfluxDB, HTTP and TCP) use
full_action='drop'.

The buffer size configurations are:
* metric_collector:
  - influxdb-output buffer size is increased to 1GB.
  - aggregator-output (tcp) buffer size is decreased to 256MB (vs 1GB).
  - nagios outputs (x3) buffer sizes are decreased to 1MB.
* log_collector:
  - elasticsearch-output buffer size is decreased to 256MB (vs 1GB).
  - tcp-output buffer size is set to 256MB.
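
The difference between full_action='block' and full_action='drop' can be
illustrated with a toy bounded buffer. This is only a sketch of the two
policies, not Heka code: the BoundedBuffer class is hypothetical, and it
counts messages whereas the buffers above are sized in bytes.

```python
import queue


class BoundedBuffer:
    """Toy stand-in for an output buffer with a 'full_action' policy.

    Hypothetical helper for illustration only; it is not part of Heka.
    """

    def __init__(self, max_items, full_action):
        if full_action not in ("block", "drop"):
            raise ValueError("full_action must be 'block' or 'drop'")
        self._queue = queue.Queue(maxsize=max_items)
        self.full_action = full_action
        self.dropped = 0

    def put(self, message, timeout=None):
        """Return True if the message was queued, False if it was dropped."""
        if self.full_action == "block":
            # Back-pressure: the producer waits until space is available
            # (queue.Full is raised if the optional timeout expires).
            self._queue.put(message, block=True, timeout=timeout)
            return True
        try:
            self._queue.put_nowait(message)
            return True
        except queue.Full:
            # The message is discarded so that the producer never stalls.
            self.dropped += 1
            return False

    def get(self):
        """Remove and return the oldest queued message."""
        return self._queue.get_nowait()
```

With 'drop', a slow or unreachable backend costs data but never stalls the
pipeline; with 'block', nothing is lost but the producer waits.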

Implements: blueprint separate-lma-collector-pipelines
Fixes-bug: #1566748

Change-Id: Ieadb93b89f81e944e21cf8e5a65f4d683fd0ffb8
2016-05-04 14:34:14 +02:00
Simon Pasquier 9babc1e1a3 Use a dedicated directory for Lua libraries
Change-Id: I1c81a55b0f7b4a83e9ea55131a36c261575c5fd9
Closes-Bug: #1553218
2016-04-28 21:24:07 +02:00
Swann Croiset 96df47af73 Increase the Heka poolsize on controllers
On controller nodes, the Heka poolsize must be increased to handle the load
generated by the metrics derived from logs; otherwise a deadlock
can happen in the filter plugins and block Heka.

Fixes-bug: #1557388

Change-Id: I74362011d32d413f244c6cdb6e4625ed96759df0
2016-04-05 18:34:17 +02:00
Éric Lemoine ccdba23158 Move Pacemaker/Corosync code out of lma_collector
This commit moves the Pacemaker/Corosync Puppet code from the
lma_collector module to the Fuel-specific base.pp manifest.

This involves the following changes:

* Fuel's "pacemaker_wrappers::service" define is now used in base.pp
  to configure the LMA service resource to use the "pacemaker"
  provider.

* To configure "pacemaker_wrappers::service" we need to know the Heka
  user. To avoid hacks that rely on private variables from the
  lma_collector and heka modules to determine that user, both modules
  now make the Heka user configurable. To this end, the "run_as_root"
  parameter of the "heka" class is removed in favor of a "user"
  parameter.

* In other manifests we use a resource collector to make sure that
  the LMA service resource is not re-configured with the default
  provider. This part is a bit hackish, but we haven't been able to
  come up with a better way to address the issue.

Change-Id: I0ed0bddb245dc3a65b034e5caec14a65cfa908cb
Implements: blueprint lma-without-fuel
2016-01-29 12:50:57 +01:00
Éric Lemoine 3f6eab7b2f Remove unused params to lma_collector class
Change-Id: I04e46d82bf2c1d5580348a05f886c1b767d48cfb
2015-12-31 09:03:05 +01:00
Simon Pasquier 8e987e9bf9 Remove watchdog check for Pacemaker
The implementation of this feature isn't robust enough for the moment.
We need to think more about it and revisit later.

Change-Id: I76534d071f98a7eb99e7977d1df3dff1977ad338
Closes-Bug: #1514893
2015-11-10 16:54:18 +01:00
Simon Pasquier 9a5f221140 Don't dump the Heka statistics in the logs
Now that we collect metrics from Heka, it is less necessary to dump the
Heka statistics in the logs, especially since it makes the log files
less readable.

Change-Id: I60abc09a2fc12efaa072e348618973381d03ed73
2015-10-07 15:03:10 +02:00
Jenkins 02826badb8 Merge "Tune Pacemaker parameters for the LMA collector" 2015-09-17 15:33:27 +00:00
Simon Pasquier 2d1d6e6936 Add AFD plugins for OpenStack services
This change introduces the first Anomaly and Fault Detection (AFD)
filter plugins. These plugins return AFD events on the availability of
the API endpoints, the API backends (as reported by HAProxy), and the
service workers (e.g. nova-scheduler, nova-conductor, ...).

Change-Id: I75bfb433e4e174659900f885040a1c2032efd470
Implements: blueprint alerting-lma-collector
2015-09-17 16:27:49 +02:00
Simon Pasquier 6e2e0b5853 Tune Pacemaker parameters for the LMA collector
This change modifies the Pacemaker parameters to match the
configuration of the other services (DNS, NTP, ...) managed by
Pacemaker.

Change-Id: Ib9f8f549e34f578df07a674a81f6b90cfb9cbe33
2015-09-17 10:40:26 +02:00
Simon Pasquier 5bf3e061e1 Use watchdog file for Pacemaker monitoring
This change configures Pacemaker to use the watchdog file to monitor the
LMA collector on the controller nodes. Every second, the watchdog file
should be updated by hekad. If it hasn't been modified for 5 seconds,
Pacemaker will restart the hekad daemon.
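
The staleness test described here can be sketched as follows. This is an
illustration only, not the resource agent's actual code: the function name
and the "now" parameter are hypothetical, and the 5-second threshold is the
one quoted above.

```python
import os
import time


def watchdog_is_stale(path, max_age_seconds=5.0, now=None):
    """Return True when the watchdog file is missing or too old.

    'now' can be injected for testing (hypothetical parameter); it
    defaults to the current time.
    """
    now = time.time() if now is None else now
    try:
        mtime = os.stat(path).st_mtime
    except FileNotFoundError:
        # No file at all means the daemon never reported itself alive.
        return True
    return (now - mtime) > max_age_seconds
```

A monitor would call this periodically and trigger a restart of hekad when
it returns True.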

Change-Id: I26641fca9a2cdbe2f4321c6630831baffc3abe50
Implements: blueprint lma-aggregator-in-ha-mode
2015-09-09 07:29:38 +00:00
Simon Pasquier d5c5103ec0 Add watchdog filter for monitoring Heka
This change adds filter + output plugins that will allow Pacemaker to
check that the Heka process is alive.

At periodic intervals, the watchdog filter emits a message containing
the current timestamp. The output plugin catches the message and writes
the timestamp value to a file. If Heka is wedged (e.g. its channels are
full), the file is no longer updated. Pacemaker should be able
to detect it and respawn the process.
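
The writing side of this mechanism might look like the sketch below. It is
written in Python for illustration (the real plugins run inside Heka), the
function name is hypothetical, and the atomic rename is an added design
choice rather than something the commit describes.

```python
import os
import tempfile
import time


def write_watchdog_timestamp(path, timestamp=None):
    """Write the timestamp (whole seconds) to 'path' and return it.

    The value is written to a temporary file in the same directory and
    then renamed, so a reader never sees a half-written file.
    """
    value = int(time.time() if timestamp is None else timestamp)
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".watchdog-")
    with os.fdopen(fd, "w") as handle:
        handle.write(f"{value}\n")
    # os.replace is atomic when source and target share a filesystem.
    os.replace(tmp_path, path)
    return value
```

If the process calling this function is wedged, the file simply stops
changing, which is the signal an external monitor looks for.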

Change-Id: If2a71c9084e3c8da0d92fea5c295b36e56e0c86f
Implements: blueprint lma-aggregator-in-ha-mode
2015-09-08 17:24:26 +02:00
Swann Croiset 16bc34148d Fix issue on CentOS with Heka user
The deployment on CentOS has been broken since 254eda4.

This patch always creates the 'heka' user defined in heka::params::user,
even if the hekad process runs as 'root'.

This should work for both MOS 6.1 and 7.

Change-Id: I9ec690735b10f149d4477f0b8a7ca3a7d0cc54c1
2015-08-27 15:24:32 +02:00
Simon Pasquier 746b6f0f78 Configure HAProxy to forward to the aggregator
This change configures the HAProxy service to send the traffic received
on port 5565 to the local LMA collector service. It also configures a
dummy HttpListen input that is used by HAProxy to check the
availability of the LMA collector service.

Change-Id: Ifd92148b6be4e248fe15bdeafebb9356f6f989be
Implements: blueprint lma-aggregator-in-ha-mode
2015-08-24 14:16:52 +02:00
Guillaume Thouvenin bbc45eb081 Set max_file_size to be greater than max_message_size
Version 0.10.0b1 of Heka checks that max_file_size is greater than
max_message_size. As we set max_message_size to 196608 and the
default for max_file_size is 128KB, we need to increase it.
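
The arithmetic behind this constraint can be checked with a short sketch.
The helper name is hypothetical and the check only mirrors the rule stated
above; the two values are the ones quoted in this message.

```python
def check_buffer_sizes(max_file_size, max_message_size):
    """Return True when a buffer file can hold at least one full message."""
    return max_file_size > max_message_size


MAX_MESSAGE_SIZE = 196608           # value quoted above (192 * 1024 bytes)
DEFAULT_MAX_FILE_SIZE = 128 * 1024  # the 128KB default quoted above

# 131072 is not greater than 196608, so the default is rejected.
print(check_buffer_sizes(DEFAULT_MAX_FILE_SIZE, MAX_MESSAGE_SIZE))
```

Any max_file_size strictly above 196608 bytes satisfies the check.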

Change-Id: I788d7d41b5000cc48aea9956c2f0b781134da4c3
2015-08-20 13:53:08 +02:00
Simon Pasquier 60b2ffac25 Configure Pacemaker to manage the LMA collector
This change configures Pacemaker to manage the LMA collector service
with proper ordering regarding the local RabbitMQ service.

This also means that I removed the wrapper script that took care of
checking the RabbitMQ availability before launching the hekad process
on the controllers.

Change-Id: I4e747083fb9876f06fde9914b626970e37d0b429
Implements: blueprint lma-aggregator-in-ha-mode
2015-08-14 20:44:17 +02:00
Simon Pasquier b0da062cb1 Add TCP output plugin
This change configures a TCP output plugin for all the nodes. The plugin
doesn't match any message because HAProxy isn't configured yet to
forward Heka messages received on the management VIP address to the
local Heka instance.

Change-Id: I675acbdca9c81a8bf017917d4c9bc0c525b19c27
Implements: blueprint lma-aggregator-in-ha-mode
2015-08-10 16:25:37 +02:00
Swann Croiset 929e15c324 Add Nagios support for OpenStack service status
implements blueprint alerting-lma-collector

Change-Id: I722b7a83c5dd391a86423d6af526355bc2ed8bbc
2015-07-22 13:20:43 +02:00
Swann Croiset 965cdaec6a Injects status messages per service
This patch injects messages separately for each service so that they can
be processed one by one at the end of the pipeline:

* injects state messages per service from the service_accumulator_states filter
  instead of injecting them in bulk.
* injects status messages promptly after the message processing instead of using
  the timer_event.
* increases the maximum number of messages that a Heka filter can inject from the
  process_message and timer_event functions.

This changes the implementation of the message schemas exchanged between Heka
filters; the behavior is unaffected and remains identical from an external
point of view.

implements blueprint alerting-lma-collector

Change-Id: I02f441a73ce8c03bd83ca40a212a58f9494bc23c
2015-07-06 18:11:39 +02:00
Swann Croiset 3f52ddeac6 Allow larger Hekad messages
The maximum size observed during a load test with 50 nodes is 158KB,
while the default limit is 64KB.
This is required by the Elasticsearch buffered output, which can hit the
size limit and end up losing messages.

The Heka log:
Plugin 'elasticsearch_output' error: Message too big, requires 161024
(MAX_MESSAGE_SIZE = 65536)

Change-Id: I8970435e2f710889e4b5d2c55a53572c042ef647
2015-05-29 15:34:51 +00:00
Simon Pasquier 067834e466 Dump Heka statistics periodically
This change adds a cron job that sends SIGUSR1 to the hekad process
every hour. Heka then dumps an internal report, which ends up in
/var/log/lma_collector.log.

Change-Id: I7e164a85a8222f60e7a625d1277528b819a17661
2015-05-12 09:47:33 +02:00
Guillaume Thouvenin fb953f8af3 Wait for rabbitmq before starting lma_collector
If we start lma_collector before the rabbitmq cluster is available, it
fails to connect to the lma queues and therefore fails to start.
It may then take several minutes before pacemaker starts the service.
So we need to be sure that the rabbitmq cluster is up and running before
starting lma_collector.
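
A wait loop of this kind could be sketched as below. It is an
approximation under stated assumptions, not the wrapper added by this
commit: it only tests that a TCP port accepts connections, which is weaker
than checking the health of the cluster, and the function name, parameters,
and default timings are all hypothetical.

```python
import socket
import time


def wait_for_port(host, port, timeout=300.0, interval=5.0,
                  connect_timeout=2.0,
                  connect=socket.create_connection, sleep=time.sleep):
    """Poll until host:port accepts a TCP connection.

    Returns True as soon as a connection succeeds, or False once
    'timeout' seconds have elapsed. 'connect' and 'sleep' are injectable
    so that the loop can be exercised without a network.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            connection = connect((host, port), timeout=connect_timeout)
        except OSError:
            # Refused, unreachable, or timed out: retry until the deadline.
            pass
        else:
            connection.close()
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

A start script would call wait_for_port() with the broker's address and
launch the collector only when it returns True.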

Change-Id: Ia254b744f4173f64ee3ab8200b2896ecc412d06f
2015-04-22 14:36:51 +00:00
Simon Pasquier 3d71e776b4 Add Apache license headers to Puppet manifests
This change fixes the text of the LICENSE file too.

Change-Id: Iaebc5a8fc174b4bfe12fa0fb917c6de79ebba334
2015-04-20 15:21:17 +02:00
Swann Croiset 3097e83d35 Add VIPs cluster location metrics
The pacemaker command 'crm_resource --locate --resource <rsrc>' is used to
collect the locations of 'vip__public' and 'vip__management'.

The command is executed by a ProcessInput Heka plugin.
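
Extracting the location from the command output could be sketched as
follows. The line format "resource <name> is running on: <node>" is an
assumption about the locate output, and the function name and sample node
name are hypothetical; the actual decoding in the plugin may differ.

```python
import re

# Assumed output format: "resource <name> is running on: <node>"
_RUNNING = re.compile(
    r"^resource (?P<resource>\S+) is running on: (?P<node>\S+)"
)


def parse_locate_output(text):
    """Map each resource name found in 'text' to the node hosting it."""
    locations = {}
    for line in text.splitlines():
        match = _RUNNING.match(line.strip())
        if match:
            locations[match.group("resource")] = match.group("node")
    return locations
```

Lines that do not match the assumed format, such as a report that the
resource is not running, are ignored.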

Change-Id: I3a667931a58809b84d667592a7e618b906eddc56
2015-04-10 15:51:11 +02:00
Jenkins 23df41c234 Merge "Collect logs from Open vSwitch" 2015-03-25 10:48:09 +00:00
Simon Pasquier eacc5623b4 Do not deploy the Heka dashboard plugin
Someone reported on the Heka mailing list [1] that the Heka dashboard
plugin leaks memory. Thus it is safer to disable it for now.

[1] https://mail.mozilla.org/pipermail/heka/2015-March/000384.html

Change-Id: I56ba03d4338681fb40145abf2fb70fa11c9e537c
2015-03-25 08:37:04 +00:00
Swann Croiset 6c7a3559d3 Collect logs from Open vSwitch
Note: Open vSwitch is configured to send ERROR logs to syslog.
Here we collect the WARNING and INFO messages.

Change-Id: I918daeca17fa276ae8d0f7d8ada90204d209f219
2015-03-24 18:14:31 +01:00
Simon Pasquier 77a1e6eb6d Integrate collectd into the LMA collector
The collectd service collects metrics from many sources:
- System (like CPU, RAM, disk, network and so on)
- MySQL
- RabbitMQ
- OpenStack services

It sends the data to the LMA collector using its HTTP JSON output. The
LMA collector then decodes this input and injects it into the Heka
pipeline. Eventually the metrics will be sent to InfluxDB.
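
The decoding step can be illustrated with a small sketch. It assumes the
JSON layout produced by collectd's write_http plugin (a list of samples
with "values", "dsnames", "time", "host", "plugin", "type", and optional
instance fields). The function and the dotted naming scheme are
hypothetical and are not the names used by the LMA collector.

```python
import json


def decode_collectd(payload, separator="."):
    """Flatten a collectd JSON payload into one dict per data source."""
    metrics = []
    for sample in json.loads(payload):
        candidates = [
            sample["plugin"],
            sample.get("plugin_instance"),
            sample["type"],
            sample.get("type_instance"),
        ]
        # Skip instance fields that are absent or empty.
        base = [part for part in candidates if part]
        for ds_name, value in zip(sample["dsnames"], sample["values"]):
            # Single-valued types use the generic data source name "value".
            parts = base if ds_name == "value" else base + [ds_name]
            metrics.append({
                "name": separator.join(parts),
                "value": value,
                "timestamp": sample["time"],
                "hostname": sample["host"],
            })
    return metrics
```

A sample with several data sources, such as interface counters with "rx"
and "tx", yields one metric per data source.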

Note: until we have the InfluxDB-Grafana plugin ready, the InfluxDB parameters
are hidden in the Fuel UI.

Change-Id: I59577fcdc014be8d0f1d4824ef416afda3604506
2015-03-13 12:01:47 +01:00
Simon Pasquier 8517b26293 Split into smaller tasks
This change moves away from the big monolithic Puppet manifest. Instead
we introduce separate tasks for each role that the plugin supports.

Change-Id: I370c9e8267f86da742f5cca48f1fec8bc3d9c4a9
2015-03-05 15:20:04 +01:00
Simon Pasquier c9ee4d30d9 Initial import of the LMA collector plugin
This is an import of the initial LMA PoC code. For now, it only covers
the collection of logs (notifications will be added in a subsequent
commit).

The code has been partly rewritten to:
- decouple the Heka configuration from the LMA collector.
- run the Heka service as non-root when possible (Ubuntu only for now
  due to file permission issues on CentOS [1]).
- adapt to version 0.9 of Heka.

[1] https://bugs.launchpad.net/fuel/+bug/1425954

Change-Id: I4472b49a25e18e06984b5b29bdce18f917137bc8
2015-02-27 14:16:49 +01:00