This change splits the processing of logs/notifications and of
metrics/alerting into two dedicated hekad processes, named
'log_collector' and 'metric_collector'.
Both services are managed by Pacemaker on controller nodes and by Upstart on
other nodes.
All metrics computed by log_collector (HTTP response times and creation
times for instances and volumes) are sent directly to metric_collector
via TCP.
In log_collector, the Elasticsearch output uses full_action='block' and
the TCP output uses full_action='drop'.
All outputs of metric_collector (InfluxDB, HTTP and TCP) use
full_action='drop'.
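The two full_action values can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not Heka's implementation. The `OutputBuffer` class and its methods are hypothetical, and Heka's real output buffer is a disk-backed queue bounded by a maximum buffer size. The model reproduces only the two behaviours used here. With 'drop', a message that does not fit is discarded. With 'block', the producer waits until space has been freed.

```python
import threading
from collections import deque


class OutputBuffer:
    """Toy, hypothetical model of a size-bounded output queue with a
    Heka-style full_action ('block' or 'drop')."""

    def __init__(self, max_buffer_size, full_action):
        if full_action not in ("block", "drop"):
            raise ValueError("full_action must be 'block' or 'drop'")
        self.max_buffer_size = max_buffer_size
        self.full_action = full_action
        self.size = 0
        self.dropped = 0
        self._queue = deque()
        self._cond = threading.Condition()

    def push(self, msg, timeout=None):
        """Enqueue msg (bytes); return True if it was accepted."""
        with self._cond:
            if self.size + len(msg) > self.max_buffer_size:
                if self.full_action == "drop":
                    # Discard the incoming message and keep processing.
                    self.dropped += 1
                    return False
                # 'block': wait until the consumer frees enough space,
                # which applies back pressure to the producer.
                fits = self._cond.wait_for(
                    lambda: self.size + len(msg) <= self.max_buffer_size,
                    timeout=timeout,
                )
                if not fits:
                    return False
            self._queue.append(msg)
            self.size += len(msg)
            return True

    def pop(self):
        """Dequeue the oldest message (as an output would) and wake producers."""
        with self._cond:
            msg = self._queue.popleft()
            self.size -= len(msg)
            self._cond.notify_all()
            return msg
```

With 'drop' on the TCP output, a slow or unavailable metric_collector cannot stall log_collector. With 'block' on the Elasticsearch output, log messages are held back rather than lost while Elasticsearch catches up.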
The buffer sizes are configured as follows:
* metric_collector:
  - influxdb-output buffer size is increased to 1GB.
  - aggregator-output (TCP) buffer size is decreased to 256MB (vs 1GB).
  - the buffer size of the three Nagios outputs is decreased to 1MB.
* log_collector:
  - elasticsearch-output buffer size is decreased to 256MB (vs 1GB).
  - tcp-output buffer size is set to 256MB.
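As a rough sanity check, the worst-case disk footprint of these buffers can be summed per service. This sketch makes three assumptions. It uses binary units (1GB = 1024**3 bytes). It assumes each buffer can fill up to its configured size. The three Nagios output names are placeholders, not the real plugin names.

```python
MB = 1024 ** 2
GB = 1024 ** 3

# Buffer sizes from this change (binary units assumed).
BUFFER_SIZES = {
    "metric_collector": {
        "influxdb-output": 1 * GB,
        "aggregator-output": 256 * MB,
        # Placeholder names for the three Nagios outputs.
        "nagios-output-a": 1 * MB,
        "nagios-output-b": 1 * MB,
        "nagios-output-c": 1 * MB,
    },
    "log_collector": {
        "elasticsearch-output": 256 * MB,
        "tcp-output": 256 * MB,
    },
}


def worst_case_mb(buffer_sizes):
    """Return, per service, the disk space in MB used if every buffer is full."""
    return {
        service: sum(sizes.values()) // MB
        for service, sizes in buffer_sizes.items()
    }


print(worst_case_mb(BUFFER_SIZES))
# {'metric_collector': 1283, 'log_collector': 512}
```

Under these assumptions, the two processes together can hold up to 1795MB of buffered data on disk.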
Implements: blueprint separate-lma-collector-pipelines
Fixes-bug: #1566748
Change-Id: Ieadb93b89f81e944e21cf8e5a65f4d683fd0ffb8
We must avoid forking a new process when starting the Heka daemon. If
the wrapper is killed, the Heka daemon is detached and re-parented to
the init process, which can leave several Heka daemons running. To
avoid this, the wrapper must use exec and must not use sudo.
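Below is a minimal Python sketch of the exec approach, assuming a wrapper written in Python. The actual wrapper may well be a shell script using the `exec` builtin, which behaves the same way. `exec_hekad` and its optional `user` parameter are hypothetical. Privileges are dropped in-process rather than through sudo, because sudo typically runs the command as a child process. That would reintroduce the extra process this change removes.

```python
import os
import pwd


def exec_hekad(argv, user=None):
    """Replace the current (wrapper) process with the daemon (hypothetical helper).

    os.execvp() does not fork, so the daemon keeps the wrapper's PID. The
    service manager therefore tracks the daemon itself, and killing that PID
    stops it instead of orphaning it to init.
    """
    if user is not None and os.geteuid() == 0:
        # Drop privileges in-process instead of going through sudo.
        pw = pwd.getpwnam(user)
        os.initgroups(user, pw.pw_gid)
        os.setgid(pw.pw_gid)
        os.setuid(pw.pw_uid)
    # Never returns on success; raises OSError if argv[0] cannot be executed.
    os.execvp(argv[0], argv)
```

A wrapper would end with a call such as `exec_hekad(["hekad", "-config", "/path/to/config"], user="heka")`. The path and user name are placeholders.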
Fixes-bug: #1561109
Change-Id: Idbfab2de92b993d1e5124de5bff44c4b09a88bb4
This change raises the maximum number of open file descriptors for the
Heka process to 102400. By default, the process inherits the limit of
the init process, which is 1024. We have already hit this limit, which
shows that it is too low.
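The limit is a per-process resource limit (RLIMIT_NOFILE) that survives fork and exec, which is why the value inherited from init matters. The service manager normally sets it; Upstart job files, for instance, have a `limit nofile` stanza. The Python sketch below shows the same operation at the process level on Linux. `raise_nofile_limit` is a hypothetical helper. An unprivileged process can raise its soft limit only up to its hard limit, so the target is capped there.

```python
import resource

TARGET_NOFILE = 102400  # value chosen by this change


def raise_nofile_limit(target=TARGET_NOFILE):
    """Raise the soft RLIMIT_NOFILE toward `target`; never lower it.

    Returns the soft limit in effect afterwards. Processes started later
    by fork/exec inherit it.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    wanted = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if soft == resource.RLIM_INFINITY or soft >= wanted:
        return soft  # already high enough
    resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
    return wanted
```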
Change-Id: Ib5adcfe8a8c90f21c3aed28db3b9544a3d8edb9a
Closes-Bug: #1543289
If lma_collector is started before the RabbitMQ cluster is available,
it fails to connect to the LMA queues and therefore fails to start. It
can then take several minutes before Pacemaker successfully starts the
service. We therefore need to make sure that the RabbitMQ cluster is up
and running before starting lma_collector.
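One way to express "wait for RabbitMQ" in a start-up script is to poll the AMQP port until every node accepts connections. This is a hypothetical sketch. `wait_for_rabbitmq`, the host list and the timeouts are assumptions, and 5672 is only the default AMQP port. An open port also does not prove that the cluster has formed or that the LMA queues exist. On controller nodes, a Pacemaker ordering constraint between the two resources is another way to obtain the same start-up order.

```python
import socket
import time


def _port_open(host, port, timeout):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_rabbitmq(hosts, port=5672, timeout=300.0, interval=2.0):
    """Poll until every host accepts TCP connections on `port`.

    Returns True as soon as all hosts answer, or False once `timeout`
    seconds have elapsed without that happening.
    """
    deadline = time.monotonic() + timeout
    while True:
        if all(_port_open(host, port, interval) for host in hosts):
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(interval, remaining))
```

For example, a start-up script could exit with an error when `wait_for_rabbitmq(["node-1", "node-2", "node-3"])` returns False, letting the service manager retry later. The host names are placeholders.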
Change-Id: Ia254b744f4173f64ee3ab8200b2896ecc412d06f
This is an import of the initial LMA PoC code. For now, it only covers
the collection of logs (notifications will be added in a subsequent
commit).
The code has been partly rewritten to:
- decouple the Heka configuration from the LMA collector.
- run the Heka service as non-root when possible (Ubuntu only for now
due to file permission issues on CentOS [1]).
- adapt to version 0.9 of Heka.
[1] https://bugs.launchpad.net/fuel/+bug/1425954
Change-Id: I4472b49a25e18e06984b5b29bdce18f917137bc8