This change separates the processing of logs/notifications and
metrics/alerting into two dedicated hekad processes, named
'log_collector' and 'metric_collector'.
Both services are managed by Pacemaker on controller nodes and by Upstart on
other nodes.
All metrics computed by log_collector (HTTP response times and creation times
of instances and volumes) are sent directly to metric_collector over TCP.
The Elasticsearch output of log_collector uses full_action='block' and its
TCP output uses full_action='drop'.
All outputs of metric_collector (InfluxDB, HTTP and TCP) use
full_action='drop'.
The buffer sizes are configured as follows:
* metric_collector:
  - influxdb-output: buffer size increased to 1 GB.
  - aggregator-output (TCP): buffer size decreased to 256 MB (from 1 GB).
  - nagios outputs (x3): buffer size decreased to 1 MB.
* log_collector:
  - elasticsearch-output: buffer size decreased to 256 MB (from 1 GB).
  - tcp-output: buffer size set to 256 MB.
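To make the full_action and buffer size matrix above concrete, here is a minimal Python sketch that renders it as Heka TOML buffering sections, one set per hekad process. It assumes Heka's per-output `[<output>.buffering]` table with the `max_buffer_size` (bytes) and `full_action` keys, and assumes buffering is already enabled on each output. The names `nagios-output-1..3`, `BUFFERING` and `render_buffering` are hypothetical placeholders, not the plugin's actual code.

```python
# Hypothetical sketch: render the buffering settings listed above as Heka TOML.
# Assumes each output already has buffering enabled; only sizes/actions are shown.
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

# (collector, output name) -> (max_buffer_size in bytes, full_action)
# The three Nagios output names are placeholders.
BUFFERING = {
    ("metric_collector", "influxdb-output"): (1 * GIB, "drop"),
    ("metric_collector", "aggregator-output"): (256 * MIB, "drop"),
    ("metric_collector", "nagios-output-1"): (1 * MIB, "drop"),
    ("metric_collector", "nagios-output-2"): (1 * MIB, "drop"),
    ("metric_collector", "nagios-output-3"): (1 * MIB, "drop"),
    ("log_collector", "elasticsearch-output"): (256 * MIB, "block"),
    ("log_collector", "tcp-output"): (256 * MIB, "drop"),
}


def render_buffering(collector):
    """Return the TOML buffering sections for one hekad process."""
    sections = []
    for (owner, output), (size, action) in BUFFERING.items():
        if owner != collector:
            continue  # setting belongs to the other hekad process
        sections.append(
            f"[{output}.buffering]\n"
            f"max_buffer_size = {size}\n"
            f'full_action = "{action}"\n'
        )
    return "\n".join(sections)


print(render_buffering("log_collector"))
```

Keeping 'block' only on the Elasticsearch output means a full log buffer applies back-pressure instead of discarding logs, while every metric path drops data when its buffer is full rather than stalling the pipeline.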
Implements: blueprint separate-lma-collector-pipelines
Fixes-bug: #1566748
Change-Id: Ieadb93b89f81e944e21cf8e5a65f4d683fd0ffb8
On controller nodes, the Heka poolsize must be increased to handle the load
generated by metrics derived from logs; otherwise a deadlock can occur in
the filter plugins and block Heka.
Fixes-bug: #1557388
Change-Id: I74362011d32d413f244c6cdb6e4625ed96759df0
This adds the max_process_inject and max_timer_inject Heka configuration
options to global.toml.
If they are not provided, the Heka default values are used, which are currently:
* max_process_inject = 1
* max_timer_inject = 10
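The "only write the option if it was provided" behavior can be sketched in Python as follows. This is an illustration, not the plugin's actual template code: `render_hekad_section` and `effective_settings` are hypothetical helpers, and the defaults are the ones quoted above. Omitting a key from the `[hekad]` section is what lets Heka fall back to its own default.

```python
from typing import Optional

# Heka defaults quoted above; Heka applies them itself when a key is absent.
HEKA_DEFAULTS = {"max_process_inject": 1, "max_timer_inject": 10}


def render_hekad_section(max_process_inject: Optional[int] = None,
                         max_timer_inject: Optional[int] = None) -> str:
    """Render the [hekad] section, writing only the options that were provided."""
    lines = ["[hekad]"]
    options = {
        "max_process_inject": max_process_inject,
        "max_timer_inject": max_timer_inject,
    }
    for key, value in options.items():
        if value is not None:  # unset: leave it out so Heka uses its default
            lines.append(f"{key} = {value}")
    return "\n".join(lines) + "\n"


def effective_settings(**provided: Optional[int]) -> dict:
    """Values Heka ends up using: provided ones override the defaults."""
    settings = dict(HEKA_DEFAULTS)
    settings.update({k: v for k, v in provided.items() if v is not None})
    return settings


print(render_hekad_section(max_timer_inject=50))
```

Not writing the defaults explicitly keeps global.toml tracking whatever Heka's defaults are in future releases.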
Change-Id: If1995fa505aec6ff3000af33c548730dd06d1046
The maximum message size observed during a load test with 50 nodes is 158 KB,
whereas the default maximum message size is 64 KB.
Increasing the limit is required by the Elasticsearch buffered output, which
can otherwise hit the limit and eventually lose messages.
The corresponding Heka log entry:
Plugin 'elasticsearch_output' error: Message too big, requires 161024
(MAX_MESSAGE_SIZE = 65536)
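The numbers in the log line can be checked with a little arithmetic: the rejected message needed 161024 bytes (157.25 KiB), about 2.5 times the 65536-byte default. The Python sketch below uses only those two values. `fits` assumes that a message exactly at the limit is accepted, and `next_pow2_limit` is a hypothetical sizing rule for illustration, not necessarily the value chosen by this change.

```python
KIB = 1024
DEFAULT_MAX_MESSAGE_SIZE = 64 * KIB  # 65536 bytes, as reported in the log above
REQUIRED = 161024                    # bytes required by the rejected message


def fits(size: int, limit: int = DEFAULT_MAX_MESSAGE_SIZE) -> bool:
    """Assumption: a message whose size equals the limit is still accepted."""
    return size <= limit


def next_pow2_limit(size: int) -> int:
    """Hypothetical sizing rule: smallest power of two >= size (size >= 1)."""
    return 1 << (size - 1).bit_length()


print(REQUIRED / KIB)                        # size in KiB
print(REQUIRED / DEFAULT_MAX_MESSAGE_SIZE)   # ratio to the default limit
print(fits(REQUIRED), next_pow2_limit(REQUIRED))
```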
Change-Id: I8970435e2f710889e4b5d2c55a53572c042ef647
This is an import of the initial LMA PoC code. For now, it only covers
the collection of logs (notifications will be added in a subsequent
commit).
Some of the code has been rewritten to:
- decouple the Heka configuration from the LMA collector.
- run the Heka service as non-root when possible (Ubuntu only for now
due to file permission issues on CentOS [1]).
- adapt to version 0.9 of Heka.
[1] https://bugs.launchpad.net/fuel/+bug/1425954
Change-Id: I4472b49a25e18e06984b5b29bdce18f917137bc8