fuel-plugin-lma-collector

Commit Graph

Author	SHA1	Message	Date
Andreas Jaeger	c929899400	Retire repository Fuel repositories are all retired in openstack namespace, retire remaining fuel repos in x namespace since they are unused now. This change removes all content from the repository and adds the usual README file to point out that the repository is retired following the process from https://docs.openstack.org/infra/manual/drivers.html#retiring-a-project See also http://lists.openstack.org/pipermail/openstack-discuss/2019-December/011675.html A related change is: https://review.opendev.org/699752 . Change-Id: I8aded54f1b9f3b79f3a4bf8f607d3695b92f528b	2019-12-18 19:39:39 +01:00
Simon Pasquier	6dbab5edb7	Support CADF notifications Change-Id: Iba89fc145b1c4d304bd843dcde9aba1c25774c45	2017-03-07 09:08:18 +01:00
Simon Pasquier	f745102732	Get rid of openstack_nova_instance_state metric This metric isn't used anywhere and has no value on its own. Change-Id: I4b25517ace9a5721f71bd797fe073e66238f1891	2017-02-22 11:45:33 +01:00
Swann Croiset	798254de2b	Force the Puppet Package provider to apt_fuel for collectd package This allows installing unauthenticated packages. Change-Id: I23014138f5ce29a17dec0819c1d422676190e522 Closes-bug: #1663498	2017-02-13 15:58:32 +01:00
Swann Croiset	a5c154d6ef	Configure pagination for OpenStack collectd plugins Change-Id: I32e83368b7d0d2e8b68d7f7a2df0d1b61653fa72	2017-02-09 12:54:36 +00:00
Swann Croiset	4d025aa0ab	Fix Logger setting in sandboxes A filter sandbox cannot modify the Logger field Change-Id: Ie50bf2acb3d764504be398685c8e4e61c4e1c61b Closes-bug: #1662879	2017-02-09 13:16:24 +01:00
Swann Croiset	36224e6963	Hot fix regarding Logger deserialization Change-Id: Ic522766618badfc5d328d39702c2fc28bea04167 Fixes-bug: #1662879	2017-02-09 13:16:07 +01:00
Swann Croiset	7c248af9fa	Rework collectd plugins for OpenStack * Split out object/workers stats collection for Nova, Cinder and Neutron plugins * Use the common interface exposed by collectd_base.Base Change-Id: I59f698b8f09fd0d3ce375327d9e4d81d767d961c	2017-01-31 14:54:53 +01:00
Swann Croiset	b19fd832da	Correctly cleanup self-monitoring sandboxes Change-Id: I88e794d7bbd4b056d86fcb6ca9a4cbf610370037	2017-01-25 14:46:28 +01:00
Swann Croiset	73817fba86	Fix Puppet specs Change-Id: I66049f6aa166be4681109d3d8f204c24c891a70e	2017-01-18 11:47:50 +01:00
Swann Croiset	64279d1c4b	Reduce influxdb accumulator flush_count to 400 Because 500 items leads to dropped datapoints. Change-Id: Ib99fd19e76ad071981f366d43a0f96a10ddc9a96	2017-01-13 09:32:17 +00:00
Jenkins	bd78f34d52	Merge "Remove the SMTP standalone alerting_mode"	2017-01-10 12:52:17 +00:00
Jenkins	ff282d73ce	Merge "Disable Heka "self-monitoring""	2017-01-09 13:54:16 +00:00
Swann Croiset	b7c7e7bdc2	Remove the SMTP standalone alerting_mode This feature was broken and not stable enough for production deployment. Related-bug: #1606831 Related-bug: #1643542 Change-Id: I0ce52ec01838d891c43d6e797617d3044a02d10f	2017-01-09 13:23:14 +01:00
Simon Pasquier	72fe1f64fe	Send log_messages metric as bulk Using bulk metrics for the log counters greatly reduces the likelihood of blocking the Heka pipeline. Instead of injecting (x services * y levels) metric messages, the filter injects only one big message. This change also updates the configuration of the metric_collector service to deserialize the bulk metric to support alarms on log counters. Change-Id: Icb71fd6faa4191795c0470ecc24aeafd25794f42 Closes-Bug: #1643280	2017-01-06 15:24:03 +01:00
Swann Croiset	5b65f279ce	Disable Heka "self-monitoring" Change-Id: If548c132d5847b8223284a2bb0ad288c695d9ec3 Related-bug: #1643280	2017-01-03 16:33:36 +00:00
Simon Pasquier	2bec604175	Fix AFD message matcher for multivalue metrics Change-Id: Id0bafe4219aec06228e540c913e167a4c4bf9350 Closes-Bug: #1649575	2016-12-14 11:21:36 +01:00
Simon Pasquier	737336a09c	Enforce timezone setting in log processing Change-Id: I1fc5ecf8471c2effa1dadd72cf369c64bb11ec41 Closes-Bug: #1633074	2016-11-08 09:42:33 +01:00
Swann Croiset	bc62f5eeae	Add new cluster policy for local API checks Change-Id: I18a505f90950385fcb8c51359adc4255d2837425 Closes-Bug: #1634503	2016-10-25 18:29:44 +02:00
Jenkins	b064db32b5	Merge "Do not send cluster AFDs to Nagios"	2016-10-13 15:17:57 +00:00
Swann Croiset	a88bea8558	Do not send cluster AFDs to Nagios Change-Id: Ic74a79452f79cdd9774246b1d2c39cc4a0a0b30c	2016-10-13 16:11:50 +02:00
Guillaume Thouvenin	847cdd5367	Send metrics without 'hostname' to the aggregator This patch modifies the message matcher of the aggregator output to also send metrics with no 'hostname' field. This is to evaluate the alarms based on these metrics at the aggregator level. See also the Change-Id I61529d6ca2d8a9a26e5fa70a776ad03c212c7982 Change-Id: Ia2597df00315cb624f1f49cd215fb6c213fb4ff5	2016-10-13 09:42:47 +02:00
Swann Croiset	8bc378c486	Rename GSE alerting attribute Change-Id: I30b16d7ef159242f9984b54a8ae344fbf6560314	2016-10-10 16:24:06 +02:00
Swann Croiset	3dd804d2cc	Monitor FSType tmpfs Change-Id: Ib03418755f0a090599e6eb1985df79625f0b2851	2016-10-06 19:05:36 +00:00
Guillaume Thouvenin	9dbf48dbfe	Replace the workers AFD filter This patch uses the generic AFD filter with new alarms to replace the custom AFD for workers. Blueprint: allow-all-alarms-to-be-specified-in-alarming-file Change-Id: I6c432e60a16da5bb3c8d0ecd0bd22a1246fe6f82	2016-10-06 09:05:30 +02:00
Simon Pasquier	2cc44ddba0	Increase the number of points per InfluxDB batch This change improves the InfluxDB write performance by increasing to 500 the maximum number of points that are sent per InfluxDB request. InfluxDB recommends a batch size of 5,000 but it cannot be the default configuration value due to the fixed size of Heka messages (256K currently), which leads to metrics being silently discarded. Note that the InfluxDB accumulator will flush the data either when it holds 500 points or when it hasn't received data for at least 5 seconds. Co-Authored-By: Swann Croiset <scroiset@mirantis.com> Change-Id: I7d238375dc0c231782983fc4901c9a32936fb08a Partial-Bug: #1581369	2016-09-30 14:37:40 +02:00
Guillaume Thouvenin	d61b9e9e2c	Replace the API endpoint AFD filter This patch uses the generic AFD filter to replace the custom API endpoint AFD filter. Blueprint: allow-all-alarms-to-be-specified-in-alarming-file Change-Id: Ic172fb716c128827930bc51cede1dcf0bffa36d2	2016-09-26 09:56:25 +02:00
Guillaume Thouvenin	c5eebea265	Add local API check This patch creates new plugin check_local_endpoint.py to check openstack service locally and emits a new metric openstack_check_local_api. Change-Id: I58290dd685b97354137ad5c0b91aece79fd91695	2016-09-21 14:05:55 +02:00
Guillaume Thouvenin	7cf60c3c33	Make hostname an optional field This patch makes hostname an optional field. Currently, the following metrics have no hostname: - Some metrics provided by hypervisor_stats: - total_free_disk_GB - total_free_ram_MB - total_free_vcpus - total_used_disk_GB - total_used_ram_MB - total_used_vcpus - total_running_instances - total_running_tasks - all metrics collected by check_openstack_api - all metrics collected by http_check Change-Id: I4b1078ddf6ef510ae2c95ae6937b28f007d88bea	2016-09-21 09:13:18 +00:00
Swann Croiset	0c050cb8eb	Revert "Make hostname an optional field" This reverts commit `bb67a13062`. Change-Id: I64efa48d22c15c3893d4da0783143470db75c5e8	2016-09-20 10:49:43 +02:00
Swann Croiset	553d2040cc	Send GSE service clusters status to alerting Change-Id: Iad33e1f4bffd81066a82a0d73a46e7b489eb23d7 blueprint: alarming-refactoring	2016-09-20 09:41:54 +02:00
Swann Croiset	cc5aadb474	Do not send GSE to Nagios when activate_alerting=false DocImpact blueprint: alarming-refactoring Change-Id: Ie343672afd4a222a3d7f920182a0b2e90e1fd6de	2016-09-20 09:41:54 +02:00
Swann Croiset	692cb46fbe	Do not send AFD to Nagios when activate_alerting=false blueprint: alarming-refactoring Change-Id: Ifb82ec16dcece731528c1ec7d84c96d83d452212	2016-09-20 09:41:54 +02:00
Swann Croiset	7deace8726	Alarm definition refactoring DocImpact blueprint: alarming-refactoring Change-Id: I8c053f2fbc4b4b85958be8413919f9bf1b168027	2016-09-20 09:41:54 +02:00
Guillaume Thouvenin	bb67a13062	Make hostname an optional field This patch makes hostname an optional field. Currently, the following metrics have no hostname: - Some metrics provided by hypervisor_stats: - total_free_disk_GB - total_free_ram_MB - total_free_vcpus - total_used_disk_GB - total_used_ram_MB - total_used_vcpus - total_running_instances - total_running_tasks - all metrics collected by check_openstack_api - all metrics collected by http_check Change-Id: Ic503b48e995170efd2b87c9385750fe920e2e25a	2016-09-16 09:42:59 +02:00
Jenkins	ea9338ab8a	Merge "Add monitoring of HDD errors"	2016-09-07 13:51:36 +00:00
Ildar Svetlov	99e2863c14	Add monitoring of HDD errors This change adds a filter plugin that monitors the kernel log messages for hard drive errors and reports the number of errors per second as 'hdd_errors_rate'. The filter is configured for all nodes, irrespective of their roles. An alarm is also added that triggers a CRITICAL alert when the metric value is greater than 0. DocImpact Change-Id: I485f5692a3e5facf0f7ea019ccdbd70683a7dd4e	2016-09-06 11:47:59 +03:00
Jenkins	694079600e	Merge "Increase the Elasticsearch queue to 1Gb"	2016-09-02 18:04:31 +00:00
Guillaume Thouvenin	20e6fbaab2	Add support to check Apache This patch adds the collectd plugin to check Apache and it also adds a new alarm. Change-Id: I70dc85dae2de7e7afa1d2a046c96071d242a60b1	2016-09-02 06:28:04 +00:00
Jenkins	c02cb15a5b	Merge "Increase the Elasticsearch bulk size when required"	2016-08-29 15:35:30 +00:00
Swann Croiset	83db24f549	Increase the Elasticsearch bulk size when required In some environments (especially using slow HDD drives), the Elasticsearch backends may fail to ingest logs fast enough. As a result the log_collector service running on the controller nodes is blocked. To alleviate this issue, this change increases the bulk size for nodes that generate lots of logs: - controllers which run OpenStack API services in addition to Pacemaker. - all nodes when the environment's log level is set to debug. In such cases, the flush_count parameter is increased to 100 (instead of 10 by default). Change-Id: Ifdfbcb8ff0292f695dee4deab45560f126bde242 Closes-Bug: #1617211	2016-08-29 15:17:44 +00:00
Jenkins	b835f66af1	Merge "Add a dedicated manifest to configure collectd"	2016-08-29 13:05:03 +00:00
Guillaume Thouvenin	38ed9a1b82	Add metric about the volume attachment time This patch adds a new metric that is the time it takes to attach a volume to an instance. Change-Id: I5aedb4a60cddbff34b9fead8e465429058575f33	2016-08-26 14:36:07 +00:00
Simon Pasquier	38ec02fe46	Add a dedicated manifest to configure collectd This removes duplication of code and limitations we had to deal with because the collectd Puppet resources don't play well when they are created at different times from several manifests. Change-Id: I52fabb1fb5795a33f552168553a148b1520fc496	2016-08-26 15:59:04 +02:00
Jenkins	16b288b57a	Merge "Configure AFD alarms against 'mysql_check' metric"	2016-08-26 13:39:38 +00:00
Jenkins	3e27113788	Merge "Add swap_percent_used metric"	2016-08-26 13:32:44 +00:00
Swann Croiset	7f1f3bd59f	Configure AFD alarms against 'mysql_check' metric Change-Id: Ib15fea4ab041243e44a61c9d54d1f154b02d34af	2016-08-26 15:23:07 +02:00
Igor Degtiarov	a0bd5a76d8	Add swap_percent_used metric Change-Id: I1ac8dc82ecfd9c52ceaa58fbe06edfcea9576a05	2016-08-26 10:59:24 +02:00
Swann Croiset	26c5788684	Check memcached service on controller nodes The patch replaces the service_heartbeat mechanism. Change-Id: I060e10320cf6f8b874a39037b1f9257ed1996342	2016-08-26 10:56:06 +02:00
Swann Croiset	5c4b3eb2e6	Add Python collectd plugin to check memcached availability This plugin emits check metrics for memcached. Change-Id: I5b0fba60d076080503e34f751fccaae801ca327a	2016-08-26 10:54:57 +02:00

346 Commits