* Split out object/workers stats collection for Nova, Cinder and Neutron plugins
* Use the common interface exposed by collectd_base.Base
Change-Id: I59f698b8f09fd0d3ce375327d9e4d81d767d961c
This feature was broken and not stable enough for production deployment.
Related-bug: #1606831
Related-bug: #1643542
Change-Id: I0ce52ec01838d891c43d6e797617d3044a02d10f
Using bulk metrics for the log counters greatly reduces the likelihood
of blocking the Heka pipeline. Instead of injecting (x services
* y levels) metric messages, the filter injects only one big message.
This change also updates the configuration of the metric_collector
service to deserialize the bulk metric to support alarms on log
counters.
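The bulk approach can be illustrated with a minimal Python sketch. This is
not the actual Heka filter; the function name build_bulk_message and the
payload field names are hypothetical and only show how (x services * y levels)
counters collapse into a single message.

```python
def build_bulk_message(counters):
    """Pack per-(service, level) log counters into one payload.

    `counters` maps (service, level) -> count. The field names used
    below are illustrative assumptions, not the real message schema.
    """
    datapoints = [
        {
            "name": "log_messages",
            "value": count,
            "tags": {"service": service, "level": level},
        }
        for (service, level), count in sorted(counters.items())
    ]
    # One message carries every datapoint instead of one message each.
    return {"type": "bulk_metric", "datapoints": datapoints}


counters = {
    ("nova", "error"): 3,
    ("nova", "info"): 120,
    ("cinder", "error"): 0,
    ("cinder", "info"): 42,
}
message = build_bulk_message(counters)
```

With 2 services and 2 levels, the pipeline receives 1 message holding 4
datapoints rather than 4 separate messages.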
Change-Id: Icb71fd6faa4191795c0470ecc24aeafd25794f42
Closes-Bug: #1643280
This patch modifies the message matcher of the aggregator output to also
send metrics with no 'hostname' field. This is to evaluate the alarms
based on these metrics at the aggregator level.
See also the Change-Id I61529d6ca2d8a9a26e5fa70a776ad03c212c7982
Change-Id: Ia2597df00315cb624f1f49cd215fb6c213fb4ff5
This patch uses the generic AFD filter with new alarms to replace
the custom AFD for workers.
Blueprint: allow-all-alarms-to-be-specified-in-alarming-file
Change-Id: I6c432e60a16da5bb3c8d0ecd0bd22a1246fe6f82
This change improves InfluxDB write performance by increasing the
maximum number of points sent per InfluxDB request to 500.
InfluxDB recommends a batch size of 5,000, but that cannot be the default
configuration value because of the fixed size of Heka messages (currently
256K): larger batches would cause metrics to be silently discarded.
Note that the InfluxDB accumulator will flush the data either when
it holds 500 points or when it has received no data for at least 5 seconds.
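The two flush conditions can be sketched in Python as follows. This is an
illustration under stated assumptions, not the accumulator's real code: the
class PointAccumulator, its parameter names and the tick() method are
hypothetical, and the time-based trigger is read here as "no new point
received for 5 seconds".

```python
import time


class PointAccumulator:
    """Buffers points and flushes on size or on inactivity (assumed semantics)."""

    def __init__(self, flush_count=500, idle_timeout=5.0, clock=time.monotonic):
        self.flush_count = flush_count
        self.idle_timeout = idle_timeout
        self._clock = clock
        self._points = []
        self._last_add = clock()
        self.batches = []  # stands in for the requests sent to InfluxDB

    def add(self, point):
        self._points.append(point)
        self._last_add = self._clock()
        # Size trigger: flush as soon as the buffer holds flush_count points.
        if len(self._points) >= self.flush_count:
            self.flush()

    def tick(self):
        # Time trigger: flush pending points once no data arrived for idle_timeout.
        if self._points and self._clock() - self._last_add >= self.idle_timeout:
            self.flush()

    def flush(self):
        if self._points:
            self.batches.append(self._points)
            self._points = []


now = [0.0]  # fake clock so the example is deterministic
acc = PointAccumulator(clock=lambda: now[0])
for i in range(1200):
    acc.add(i)
now[0] = 6.0  # 6 seconds pass without new points
acc.tick()
batch_sizes = [len(batch) for batch in acc.batches]
```

Here 1200 points produce two full batches of 500, and the remaining 200
points are flushed by the timer.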
Co-Authored-By: Swann Croiset <scroiset@mirantis.com>
Change-Id: I7d238375dc0c231782983fc4901c9a32936fb08a
Partial-Bug: #1581369
This patch uses the generic AFD filter to replace the custom API endpoint
AFD filter.
Blueprint: allow-all-alarms-to-be-specified-in-alarming-file
Change-Id: Ic172fb716c128827930bc51cede1dcf0bffa36d2
This patch adds a new plugin, check_local_endpoint.py, that checks OpenStack
services locally and emits a new metric, openstack_check_local_api.
Change-Id: I58290dd685b97354137ad5c0b91aece79fd91695
This patch makes hostname an optional field. Currently, the following
metrics have no hostname:
- some metrics provided by hypervisor_stats:
  - total_free_disk_GB
  - total_free_ram_MB
  - total_free_vcpus
  - total_used_disk_GB
  - total_used_ram_MB
  - total_used_vcpus
  - total_running_instances
  - total_running_tasks
- all metrics collected by check_openstack_api
- all metrics collected by http_check
Change-Id: I4b1078ddf6ef510ae2c95ae6937b28f007d88bea
This patch makes hostname an optional field. Currently, the following
metrics have no hostname:
- some metrics provided by hypervisor_stats:
  - total_free_disk_GB
  - total_free_ram_MB
  - total_free_vcpus
  - total_used_disk_GB
  - total_used_ram_MB
  - total_used_vcpus
  - total_running_instances
  - total_running_tasks
- all metrics collected by check_openstack_api
- all metrics collected by http_check
Change-Id: Ic503b48e995170efd2b87c9385750fe920e2e25a
This change adds a filter plugin that monitors the kernel log messages
for hard drive errors and reports the number of errors per second
as 'hdd_errors_rate'. The filter is configured for all nodes,
irrespective of their roles. An alarm is also added that triggers
a CRITICAL alert when the metric value is greater than 0.
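The rate computation and the alarm rule can be sketched in Python as below.
This is only an illustration: the regular expression, the function names and
the "OKAY" status label are assumptions, since the patterns matched by the
real filter are not listed in this message.

```python
import re

# Illustrative patterns only; the real filter's patterns are an unknown here.
HDD_ERROR_RE = re.compile(r"I/O error|critical medium error", re.IGNORECASE)


def hdd_errors_rate(kernel_lines, interval_seconds):
    """Return the number of matching kernel log lines per second."""
    errors = sum(1 for line in kernel_lines if HDD_ERROR_RE.search(line))
    return errors / interval_seconds


def alarm_status(rate):
    """CRITICAL as soon as the rate is greater than 0 (status names assumed)."""
    return "CRITICAL" if rate > 0 else "OKAY"


lines = [
    "end_request: I/O error, dev sda, sector 12345",
    "usb 1-1: new high-speed USB device number 2",
    "sd 0:0:0:0: [sda] critical medium error, dev sda, sector 67890",
]
rate = hdd_errors_rate(lines, interval_seconds=10)
status = alarm_status(rate)
```

Two matching lines over a 10-second window give a rate of 0.2 errors per
second, which is enough to raise the CRITICAL alarm.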
DocImpact
Change-Id: I485f5692a3e5facf0f7ea019ccdbd70683a7dd4e
In some environments (especially those using slow HDD drives), the
Elasticsearch backends may fail to ingest logs fast enough. As a result,
the log_collector service running on the controller nodes is blocked.
To alleviate this issue, this change increases the bulk size for nodes
that generate lots of logs:
- controllers which run OpenStack API services in addition to Pacemaker.
- all nodes when the environment's log level is set to debug.
In such cases, the flush_count parameter is increased to 100 (instead of
10 by default).
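The selection rule can be summarized with a short Python sketch. The
function and constant names are hypothetical and the real logic lives in
the deployment manifests; only the values 10 and 100 and the two conditions
come from this change.

```python
DEFAULT_FLUSH_COUNT = 10
LARGE_FLUSH_COUNT = 100


def es_flush_count(is_controller, debug_enabled):
    """Pick the Elasticsearch bulk size for a node (names are assumptions)."""
    # Controllers and debug-level environments generate many more logs.
    if is_controller or debug_enabled:
        return LARGE_FLUSH_COUNT
    return DEFAULT_FLUSH_COUNT


controller_size = es_flush_count(is_controller=True, debug_enabled=False)
compute_size = es_flush_count(is_controller=False, debug_enabled=False)
compute_debug_size = es_flush_count(is_controller=False, debug_enabled=True)
```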
Change-Id: Ifdfbcb8ff0292f695dee4deab45560f126bde242
Closes-Bug: #1617211
This removes duplication of code and limitations we had to deal with
because the collectd Puppet resources don't play well when they are
created at different times from several manifests.
Change-Id: I52fabb1fb5795a33f552168553a148b1520fc496