This feature was broken and not stable enough for production deployment.
Related-bug: #1606831
Related-bug: #1643542
Change-Id: I0ce52ec01838d891c43d6e797617d3044a02d10f
This patch uses the generic AFD filter with new alarms to replace
the custom AFD for workers.
Blueprint: allow-all-alarms-to-be-specified-in-alarming-file
Change-Id: I6c432e60a16da5bb3c8d0ecd0bd22a1246fe6f82
This patch uses the generic AFD filter with new alarms to replace the
custom AFD for API backends.
Blueprint: allow-all-alarms-to-be-specified-in-alarming-file
Change-Id: Id139e45a9942a9c86a2d35d1966b083d9c75af89
This change improves InfluxDB write performance by increasing the maximum
number of points sent per InfluxDB request to 500.
InfluxDB recommends a batch size of 5,000, but that cannot be the default
configuration value because of the fixed size of Heka messages (currently
256K), which would lead to metrics being silently discarded.
Note that the InfluxDB accumulator flushes the data either when it holds
500 points or when it has received no data for at least 5 seconds.
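The flush behavior described above can be sketched as follows. This is a minimal Python illustration, not the actual Heka Lua filter; the class and method names are hypothetical, and only the size-or-idle batching logic is modeled:

```python
import time


class PointAccumulator:
    """Batch points and flush either when the batch reaches flush_count
    or when no data has arrived for flush_interval seconds (sketch of
    the accumulator behavior described in the commit, not real code)."""

    def __init__(self, flush_count=500, flush_interval=5.0, now=time.time):
        self.flush_count = flush_count
        self.flush_interval = flush_interval
        self.now = now  # injectable clock, useful for testing
        self.points = []
        self.last_activity = self.now()
        self.flushed_batches = []

    def add(self, point):
        self.points.append(point)
        self.last_activity = self.now()
        if len(self.points) >= self.flush_count:
            self.flush()

    def tick(self):
        # Called periodically (e.g. from a timer event): flush pending
        # points if nothing has been received for flush_interval seconds.
        if self.points and self.now() - self.last_activity >= self.flush_interval:
            self.flush()

    def flush(self):
        # In the real collector this would encode the points and send one
        # HTTP request to InfluxDB; here we only record the batch.
        self.flushed_batches.append(list(self.points))
        self.points = []
```

With flush_count=500 this bounds each InfluxDB request to at most 500 points while the idle timer guarantees data is never held for more than a few seconds.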
Co-Authored-By: Swann Croiset <scroiset@mirantis.com>
Change-Id: I7d238375dc0c231782983fc4901c9a32936fb08a
Partial-Bug: #1581369
In some environments (especially those using slow HDD drives), the
Elasticsearch backends may fail to ingest logs fast enough. As a result,
the log_collector service running on the controller nodes gets blocked.
To alleviate this issue, this change increases the bulk size for nodes
that generate lots of logs:
- controllers, which run the OpenStack API services in addition to Pacemaker.
- all nodes when the environment's log level is set to debug.
In such cases, the flush_count parameter is increased to 100 (instead of
the default of 10).
Change-Id: Ifdfbcb8ff0292f695dee4deab45560f126bde242
Closes-Bug: #1617211
This change adds a collectd plugin that gets metrics from the Pacemaker
cluster:
- cluster metrics
- node metrics
- resource metrics
Most of the metrics are collected only from the node that is the
designated controller, except for pacemaker_resource_local_active and
pacemaker_dc_local_active.
This change also removes the old 'pacemaker_resource' plugin, since the
new plugin provides the exact same metrics and notifications for the
other collectd plugins.
Finally, the plugin is also installed on the standalone-rabbitmq and
standalone-database nodes if they are present.
Change-Id: I8b5b987704f69c6a60b13e8ea982f27924f488d1
This patch removes the default parameters for the InfluxDB/Elasticsearch
HTTP port and address. These parameters are always provided by the
callers, and that is the preferred approach.
Change-Id: I5e346b71a7d639475f2fba92126f8d191f8cd5fd
This change increases the maximum number of points that are sent in a
single request. InfluxDB recommends a batch size of 5,000, so this is now
the default configuration value. Note that the InfluxDB accumulator
flushes the data either when it holds 5,000 points or when it has
received no data for at least 5 seconds.
Change-Id: If07b7d285d216855997254952ca6d7511cff65ec
Partial-Bug: #1581369
HTTP metrics are now statistics aggregated every 10 seconds.
A new metric, openstack_<service>_response_times, is emitted with these
values:
- min
- max
- sum
- count
- percentile
As a consequence, the previous metric (openstack_<service>_responses) is
removed.
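The aggregation can be illustrated with a short Python sketch. The function name is hypothetical, the real aggregation happens inside a Heka filter, and the percentile rank is an assumption for illustration (a nearest-rank 90th percentile is used here):

```python
def aggregate_response_times(samples, percentile=90):
    """Aggregate raw response times into the statistics carried by an
    openstack_<service>_response_times metric: min, max, sum, count
    and a percentile. Illustrative sketch only; the percentile rank
    and method are assumptions, not taken from the collector code."""
    if not samples:
        return None
    ordered = sorted(samples)
    # Nearest-rank percentile: index of the value at the given rank.
    rank = max(0, int(len(ordered) * percentile / 100.0 + 0.5) - 1)
    return {
        "min": ordered[0],
        "max": ordered[-1],
        "sum": sum(ordered),
        "count": len(ordered),
        "percentile": ordered[rank],
    }
```

Emitting one such aggregate every 10 seconds replaces the per-request openstack_<service>_responses metric with a fixed, small number of values per window.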
Implements-blueprint: aggregated-http-metrics
Change-Id: I48e92df6f4baa7be942ad138b7f23c3d15f5a24e
This change separates the processing of logs/notifications and of
metrics/alerting into two dedicated hekad processes, named
'log_collector' and 'metric_collector'.
Both services are managed by Pacemaker on the controller nodes and by
Upstart on the other nodes.
All metrics computed by log_collector (HTTP response times and creation time
for instances and volumes) are sent directly to the metric_collector via TCP.
The Elasticsearch output of the log_collector uses full_action='block'
while its TCP output uses full_action='drop'.
All outputs of metric_collector (InfluxDB, HTTP and TCP) use
full_action='drop'.
The buffer size configurations are:
* metric_collector:
- influxdb-output buffer size is increased to 1Gb.
- aggregator-output (tcp) buffer size is decreased to 256Mb (vs 1Gb).
- nagios outputs (x3) buffer sizes are decreased to 1Mb.
* log_collector:
- elasticsearch-output buffer size is decreased to 256Mb (vs 1Gb).
- tcp-output buffer size is set to 256Mb.
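The full_action semantics above can be sketched as follows. This is a simplified, hypothetical model: real Heka buffers are sized in bytes (as listed above), and the actual behavior lives in the output plugins:

```python
from collections import deque


class OutputBuffer:
    """Bounded output buffer with Heka-like full_action semantics.
    Simplified sketch: items instead of bytes, names hypothetical."""

    def __init__(self, max_items, full_action="drop"):
        assert full_action in ("drop", "block")
        self.queue = deque()
        self.max_items = max_items
        self.full_action = full_action
        self.dropped = 0

    def push(self, msg):
        if len(self.queue) < self.max_items:
            self.queue.append(msg)
            return True
        if self.full_action == "drop":
            self.dropped += 1   # lose the message, keep the pipeline moving
            return True
        return False            # 'block': caller must retry, upstream stalls
```

This illustrates the trade-off behind the configuration: 'block' preserves logs at the cost of stalling upstream readers (acceptable for Elasticsearch output), while 'drop' sacrifices data to keep the metric pipeline live.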
Implements: blueprint separate-lma-collector-pipelines
Fixes-bug: #1566748
Change-Id: Ieadb93b89f81e944e21cf8e5a65f4d683fd0ffb8
On controller nodes, the Heka poolsize must be increased to handle the
load generated by the metrics derived from logs; otherwise a deadlock can
happen in the filter plugins and block Heka.
Fixes-bug: #1557388
Change-Id: I74362011d32d413f244c6cdb6e4625ed96759df0
This change also decreases max_retries from 3 to 2 to stay within the
50-second window. It makes it possible to retrieve a large number of
objects and avoids overloading the system, which previously performed 3
'zombie' requests every 50 seconds without collecting any metrics.
Partial-bug: #1554502
Change-Id: I60a7611bc82598831538da01245b87fb29a15c44
The new parameter 'queue' configures the 'Queue' option of the Python collectd
plugin.
Change-Id: I5f5b1a21dd777469c7ab56688946d169ae3d917b
Related-bug: #1549721
This commit is related to the usage and documentation of the
lma_collector::collectd::mysql class.
The following changes are made:
1. Make the "username" and "password" parameters required. Today
they default to the empty string, which doesn't make much sense.
2. Change the internal resource name from "nova" to "config". The
name "nova" was confusing as the collection of MySQL statistics
is unrelated to Nova. With this change the generated collectd
configuration file is named "mysql-config.conf", which makes
more sense than "mysql-nova.conf" and is consistent with other
collectd config file names we have (e.g. "python-config.conf").
3. Add a unit test for the class.
4. Adjust the documentation.
Change-Id: I281c28d9f4da7ae728615041e175845ad5829b34
This change refactors the lma_collector Puppet module with regard to the
processing of the OpenStack notifications, to get rid of the coupling
with Fuel. In particular, the configuration of the OpenStack services is
now done in the external manifests, since there was no point in having it
in lma_collector.
The change also removes workarounds that were necessary with older
versions of the plugin:
- heat-engine is now managed as a regular service.
- the can_exit flag is reverted to false for the AMQP plugins.
Finally, it properly restarts the Keystone service when necessary:
Keystone is executed as a WSGI application in Apache, so Apache needs to
be restarted if the Keystone configuration changes.
Change-Id: I39a2d25695449271b946ddcbca00cd8911dbdbb4
Implements: blueprint lma-without-fuel