This feature was broken and not stable enough for production deployment.
Related-bug: #1606831
Related-bug: #1643542
Change-Id: I0ce52ec01838d891c43d6e797617d3044a02d10f
This patch uses the generic AFD filter with new alarms to replace
the custom AFD for workers.
Blueprint: allow-all-alarms-to-be-specified-in-alarming-file
Change-Id: I6c432e60a16da5bb3c8d0ecd0bd22a1246fe6f82
This patch uses the generic AFD filter with new alarms to replace the
custom AFD for API backends.
Blueprint: allow-all-alarms-to-be-specified-in-alarming-file
Change-Id: Id139e45a9942a9c86a2d35d1966b083d9c75af89
This removes duplication of code and limitations we had to deal with
because the collectd Puppet resources don't play well when they are
created at different times from several manifests.
Change-Id: I52fabb1fb5795a33f552168553a148b1520fc496
This change adds a collectd plugin that gets metrics from the Pacemaker
cluster:
- cluster metrics
- node metrics
- resource metrics
Most metrics are collected only from the node that is the designated
controller; the exceptions are pacemaker_resource_local_active and
pacemaker_dc_local_active.
The plugin also replaces the 'pacemaker_resource' plugin: it provides the
exact same metrics and notifications to the other collectd plugins.
Finally, the plugin is also installed on the standalone-rabbitmq and
standalone-database nodes if they are present.
Change-Id: I8b5b987704f69c6a60b13e8ea982f27924f488d1
This change uses the information that is already available in the
collector's Hiera data to decide whether the RabbitMQ collectd
plugin should be deployed or not.
Change-Id: Ib1df231d6bf99ee6f34ee199fd5241d6b264fc00
The patch uses the management API to retrieve metrics instead of
executing the rabbitmqctl command.
A side effect is that per-queue metrics are no longer collected.
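As a rough illustration of the approach (not the plugin's actual code),
cluster-wide totals can be read from the management API's /api/overview
response. The sketch below assumes the payload has already been fetched and
decoded from JSON; the function name overview_to_metrics and the emitted
metric names are hypothetical.

```python
import json


def overview_to_metrics(overview):
    """Flatten a decoded /api/overview payload into {metric_name: value}.

    Metric names are illustrative only, not the ones emitted by the plugin.
    """
    metrics = {}
    # 'object_totals' holds counts of queues, connections, consumers, etc.
    for name, value in overview.get('object_totals', {}).items():
        metrics['rabbitmq_%s' % name] = value
    # 'queue_totals' holds message counts summed over every queue
    queue_totals = overview.get('queue_totals', {})
    for name in ('messages', 'messages_ready', 'messages_unacknowledged'):
        if name in queue_totals:
            metrics['rabbitmq_%s' % name] = queue_totals[name]
    return metrics


# Hand-written sample payload (not captured from a real broker)
sample = json.loads('''{
    "object_totals": {"queues": 12, "connections": 40, "consumers": 30},
    "queue_totals": {"messages": 5, "messages_ready": 3,
                     "messages_unacknowledged": 2}
}''')
print(sorted(overview_to_metrics(sample).items()))
```

Only aggregated totals appear in this payload, which is consistent with the
loss of per-queue metrics noted above.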
Change-Id: I5dab785321e369ec0e1a69a79e0700b276810925
Closes-bug: #1594337
When the RabbitMQ cluster is hosted on dedicated nodes, the notifications
must not be collected from the controller nodes.
Change-Id: I28b2d3d0c35d16815812af447b2ab8a716276645
This patch avoids collecting logs and notifications when neither
Elasticsearch nor InfluxDB is deployed (yet).
Collecting them in that case is useless and leads to losing all logs and
notifications produced before the backends are deployed.
Change-Id: I30a39d65f7a732251def32ccfb8202c34d6408c5
The mod_status class shouldn't live in the lma_collector class because
the (re)configuration of Apache isn't the responsibility of the LMA
collector module.
Change-Id: If80c9d100263436922e06aea02d2050236ff05cf
Closes-Bug: #1547424
This use case is not really supported because Nagios configuration is
too dependent on the LMA Collector plugin and in practice, no one
chooses this option.
DocImpact
Change-Id: Ia09efb40f476c1daec51530e2c0fb16bc6f99393
This makes it possible to support deployment scenarios where the backends
are not deployed initially, for instance when using the 'virt' nodes to
deploy LMA backends.
The patch refactors the manifests by moving all the configuration data of
InfluxDB and Elasticsearch into Hiera.
DocImpact
Fixes-bug: #1570386
Change-Id: I8688bbd10d88bc8ef68b5d31e9edd62a764dc23d
HTTP metrics are now statistics aggregated every 10 seconds.
A new metric, openstack_<service>_response_times, is emitted with these
values:
- min
- max
- sum
- count
- percentile
As a result, the previous metric (openstack_<service>_responses) is removed.
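The aggregation can be pictured with the minimal sketch below. It is not the
filter shipped by this change: the function name aggregate_response_times,
the default percentile of 90 and the nearest-rank percentile method are all
assumptions made for illustration.

```python
import math


def aggregate_response_times(samples, percentile=90):
    """Summarize the response times observed during one interval.

    Returns None when no request was seen, so that nothing is emitted.
    """
    if not samples:
        return None
    ordered = sorted(samples)
    # Nearest-rank percentile (assumed method): the smallest value such
    # that at least `percentile` percent of the samples are <= it.
    rank = max(1, int(math.ceil(percentile / 100.0 * len(ordered))))
    return {
        'min': ordered[0],
        'max': ordered[-1],
        'sum': sum(ordered),
        'count': len(ordered),
        'percentile': ordered[rank - 1],
    }


# Response times in milliseconds seen during one 10-second window
window = [10, 20, 30, 40, 100]
print(aggregate_response_times(window))
```

Emitting sum and count rather than an average lets a consumer recompute the
mean over any longer period without losing precision.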
Implements-blueprint: aggregated-http-metrics
Change-Id: I48e92df6f4baa7be942ad138b7f23c3d15f5a24e
This change separates the processing of logs/notifications and
metrics/alerting into two dedicated hekad processes. These services are
named 'log_collector' and 'metric_collector'.
Both services are managed by Pacemaker on controller nodes and by Upstart on
other nodes.
All metrics computed by log_collector (HTTP response times and creation time
for instances and volumes) are sent directly to the metric_collector via TCP.
Elasticsearch output (log_collector) uses full_action='block' and the
TCP output uses full_action='drop'.
All outputs of metric_collector (InfluxDB, HTTP and TCP) use
full_action='drop'.
The buffer sizes are configured as follows:
* metric_collector:
  - influxdb-output buffer size is increased to 1GB.
  - aggregator-output (tcp) buffer size is decreased to 256MB (vs 1GB).
  - the buffer size of each of the 3 nagios outputs is decreased to 1MB.
* log_collector:
  - elasticsearch-output buffer size is decreased to 256MB (vs 1GB).
  - tcp-output buffer size is set to 256MB.
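The difference between the two full_action policies can be illustrated with
the toy model below. It is a simplified sketch, not Heka's implementation:
the BoundedBuffer class and its methods are hypothetical, and 'block' is
modelled by telling the producer to retry instead of actually suspending it.

```python
from collections import deque


class BoundedBuffer(object):
    """Toy model of an output buffer capped at max_bytes."""

    def __init__(self, max_bytes, full_action):
        if full_action not in ('drop', 'block'):
            raise ValueError('unsupported full_action: %r' % full_action)
        self.max_bytes = max_bytes
        self.full_action = full_action
        self.queue = deque()
        self.used = 0
        self.dropped = 0

    def offer(self, message):
        """Return True if queued, False if dropped or if the caller must wait."""
        if self.used + len(message) > self.max_bytes:
            if self.full_action == 'drop':
                # The message is lost but the pipeline keeps moving
                self.dropped += 1
            # With 'block' nothing is lost: the producer has to retry later
            return False
        self.queue.append(message)
        self.used += len(message)
        return True

    def drain(self):
        """Simulate the backend accepting the oldest buffered message."""
        message = self.queue.popleft()
        self.used -= len(message)
        return message


metrics = BoundedBuffer(max_bytes=8, full_action='drop')
logs = BoundedBuffer(max_bytes=8, full_action='block')
for buf in (metrics, logs):
    buf.offer(b'12345')
    buf.offer(b'67890')  # does not fit in the remaining 3 bytes
print(metrics.dropped, logs.dropped)
```

In this model, 'drop' trades completeness for liveness, while 'block' keeps
every message at the cost of stalling the producer until the backend catches
up.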
Implements: blueprint separate-lma-collector-pipelines
Fixes-bug: #1566748
Change-Id: Ieadb93b89f81e944e21cf8e5a65f4d683fd0ffb8
This change uses the Neutron API to get the status of the Neutron
agents instead of querying the MySQL database.
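As an illustration only (the plugin's real code may differ), the agent
states can be derived from the 'agents' list returned by the Neutron API.
The sketch assumes the response has already been fetched and decoded; the
function name count_agents_by_state and the choice to ignore
administratively disabled agents are assumptions.

```python
def count_agents_by_state(agents):
    """Return {binary: {'up': n, 'down': m}} from an agents listing."""
    counts = {}
    for agent in agents:
        # Assumption of this sketch: administratively disabled agents
        # are not counted at all.
        if not agent.get('admin_state_up', True):
            continue
        state = 'up' if agent.get('alive') else 'down'
        per_binary = counts.setdefault(agent['binary'], {'up': 0, 'down': 0})
        per_binary[state] += 1
    return counts


# Hand-written sample shaped like the "agents" list of the API response
sample_agents = [
    {'binary': 'neutron-l3-agent', 'host': 'node-1',
     'alive': True, 'admin_state_up': True},
    {'binary': 'neutron-l3-agent', 'host': 'node-2',
     'alive': False, 'admin_state_up': True},
    {'binary': 'neutron-dhcp-agent', 'host': 'node-1',
     'alive': True, 'admin_state_up': True},
    {'binary': 'neutron-dhcp-agent', 'host': 'node-3',
     'alive': False, 'admin_state_up': False},
]
print(count_agents_by_state(sample_agents))
```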
Change-Id: I60fa2386a887e9dac2fe4f1234d225ad6402bf2d
Partial-Bug: #1546188
This change uses the Cinder API to get the status of the Cinder workers
instead of querying the MySQL database.
Change-Id: If92596b3cee8a4c9f0dcf84454fdff2a2532160f
Partial-Bug: #1546188
This change uses the Nova API to get the status of the Nova workers
instead of querying the MySQL database.
Change-Id: I24e84b21f988e4c748d0ead134d60df4bf9dd8b1
Partial-Bug: #1546188
This change makes sure that the Puppet manifests can be executed with
the 0.8 versions of the InfluxDB/Grafana, Elasticsearch-Kibana and
Nagios plugins.
Change-Id: Ib8bb0aff3497ff7b9e7a307ddb04d15798fbd070
This change removes the Neutron agents AFD when the Contrail plugin is used
to avoid reporting a DOWN status for Neutron. It also stops collecting the
metrics of the Neutron agents.
Change-Id: I02ecb67489d244aca85bc4b1e3d4a5cd79df1b5b
Closes-Bug: #1546017
When Ceph is used as a Cinder backend, the cinder-volume process runs on each
controller.
Fixes-bug: #1546555
Change-Id: I077bcebe0b637d001cf66803a24102db9c507c15
This commit is related to the usage and documentation of the
lma_collector::collectd::mysql class.
The following changes are made:
1. Make the "username" and "password" parameters required. Today
they default to the empty string, which doesn't make much sense.
2. Change the internal resource name from "nova" to "config". The
name "nova" was confusing as the collection of MySQL statistics
is unrelated to Nova. With this change the generated collectd
configuration file is named "mysql-config.conf", which makes
more sense than "mysql-nova.conf" and is consistent with other
collectd config file names we have (e.g. "python-config.conf").
3. Add a unit test for the class.
4. Adjust the documentation.
Change-Id: I281c28d9f4da7ae728615041e175845ad5829b34