Commit Graph

346 Commits

Author SHA1 Message Date
Andreas Jaeger c929899400 Retire repository
All Fuel repositories in the openstack namespace have been retired; retire
the remaining Fuel repositories in the x namespace since they are now unused.

This change removes all content from the repository and adds the usual
README file to point out that the repository is retired following the
process from
https://docs.openstack.org/infra/manual/drivers.html#retiring-a-project

See also
http://lists.openstack.org/pipermail/openstack-discuss/2019-December/011675.html

A related change is: https://review.opendev.org/699752 .

Change-Id: I8aded54f1b9f3b79f3a4bf8f607d3695b92f528b
2019-12-18 19:39:39 +01:00
Simon Pasquier 6dbab5edb7 Support CADF notifications
Change-Id: Iba89fc145b1c4d304bd843dcde9aba1c25774c45
2017-03-07 09:08:18 +01:00
Simon Pasquier f745102732 Get rid of openstack_nova_instance_state metric
This metric isn't used anywhere and has no value on its own.

Change-Id: I4b25517ace9a5721f71bd797fe073e66238f1891
2017-02-22 11:45:33 +01:00
Swann Croiset 798254de2b Force the Puppet Package provider to apt_fuel for collectd package
This allows unauthenticated packages to be installed.

Change-Id: I23014138f5ce29a17dec0819c1d422676190e522
Closes-bug: #1663498
2017-02-13 15:58:32 +01:00
Swann Croiset a5c154d6ef Configure pagination for OpenStack collectd plugins
Change-Id: I32e83368b7d0d2e8b68d7f7a2df0d1b61653fa72
2017-02-09 12:54:36 +00:00
Swann Croiset 4d025aa0ab Fix Logger setting in sandboxes
A filter sandbox cannot modify the Logger field.

Change-Id: Ie50bf2acb3d764504be398685c8e4e61c4e1c61b
Closes-bug: #1662879
2017-02-09 13:16:24 +01:00
Swann Croiset 36224e6963 Hot fix regarding Logger deserialization
Change-Id: Ic522766618badfc5d328d39702c2fc28bea04167
Fixes-bug: #1662879
2017-02-09 13:16:07 +01:00
Swann Croiset 7c248af9fa Rework collectd plugins for OpenStack
* Split out object/workers stats collection for Nova, Cinder and Neutron plugins
* Use the common interface exposed by collectd_base.Base

Change-Id: I59f698b8f09fd0d3ce375327d9e4d81d767d961c
2017-01-31 14:54:53 +01:00
Swann Croiset b19fd832da Correctly clean up self-monitoring sandboxes
Change-Id: I88e794d7bbd4b056d86fcb6ca9a4cbf610370037
2017-01-25 14:46:28 +01:00
Swann Croiset 73817fba86 Fix Puppet specs
Change-Id: I66049f6aa166be4681109d3d8f204c24c891a70e
2017-01-18 11:47:50 +01:00
Swann Croiset 64279d1c4b Reduce InfluxDB accumulator flush_count to 400
A batch of 500 items leads to dropped datapoints.

Change-Id: Ib99fd19e76ad071981f366d43a0f96a10ddc9a96
2017-01-13 09:32:17 +00:00
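The commit does not say how large a batch may safely be, but one way to reason about it is as a size budget. Commit 2cc44ddba0 further down this log notes that Heka messages have a fixed size (256K). If a whole batch must fit in one message, the batch size caps the average size of a serialized point. The sketch below assumes 256K means 256 KiB (262144 bytes) and ignores per-message overhead. The function name is hypothetical.

```python
# Assumption: the Heka message size limit is 256 KiB, as quoted in commit
# 2cc44ddba0; the effective limit depends on the Heka configuration.
MAX_MESSAGE_BYTES = 256 * 1024


def max_avg_point_size(flush_count, limit=MAX_MESSAGE_BYTES):
    """Largest average serialized point size (bytes) that still fits in one message."""
    return limit / flush_count


for count in (5000, 500, 400):
    # Smaller batches leave more room per point before the message overflows.
    print(f"flush_count={count}: {max_avg_point_size(count):.1f} bytes/point")
```

Lowering flush_count from 500 to 400 raises the per-point budget by 25%. This matters when points carry many tags.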
Jenkins bd78f34d52 Merge "Remove the SMTP standalone alerting_mode" 2017-01-10 12:52:17 +00:00
Jenkins ff282d73ce Merge "Disable Heka "self-monitoring"" 2017-01-09 13:54:16 +00:00
Swann Croiset b7c7e7bdc2 Remove the SMTP standalone alerting_mode
This feature was broken and not stable enough for production deployment.

Related-bug: #1606831
Related-bug: #1643542

Change-Id: I0ce52ec01838d891c43d6e797617d3044a02d10f
2017-01-09 13:23:14 +01:00
Simon Pasquier 72fe1f64fe Send log_messages metric as bulk
Using bulk metrics for the log counters greatly reduces the likelihood
of blocking the Heka pipeline. Instead of injecting (x services
* y levels) metric messages, the filter injects only one big message.

This change also updates the configuration of the metric_collector
service to deserialize the bulk metric so that alarms on log
counters keep working.

Change-Id: Icb71fd6faa4191795c0470ecc24aeafd25794f42
Closes-Bug: #1643280
2017-01-06 15:24:03 +01:00
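The message-count reduction described in commit 72fe1f64fe can be illustrated with a small Python sketch. It only compares the old and new shapes. The real filter is a Heka sandbox, and every field name below except the metric name log_messages (for example "bulk_metric" and "payload") is a hypothetical stand-in.

```python
import json
from collections import Counter


def count_log_messages(records):
    """Tally (service, level) pairs from parsed log records (dicts)."""
    return Counter((r["service"], r["level"]) for r in records)


def per_counter_messages(counts):
    # Old approach: one message per (service, level) pair,
    # i.e. up to x services * y levels messages per interval.
    return [
        {"name": "log_messages", "value": n,
         "tags": {"service": svc, "level": lvl}}
        for (svc, lvl), n in sorted(counts.items())
    ]


def bulk_metric_message(counts, timestamp):
    # Bulk approach: a single message whose payload carries every counter.
    return {"type": "bulk_metric", "timestamp": timestamp,
            "payload": json.dumps(per_counter_messages(counts))}
```

On the receiving side (metric_collector in the commit), deserializing amounts to decoding the payload back into individual datapoints, for example with `json.loads(msg["payload"])`, before the alarms are evaluated.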
Swann Croiset 5b65f279ce Disable Heka "self-monitoring"
Change-Id: If548c132d5847b8223284a2bb0ad288c695d9ec3
Related-bug: #1643280
2017-01-03 16:33:36 +00:00
Simon Pasquier 2bec604175 Fix AFD message matcher for multivalue metrics
Change-Id: Id0bafe4219aec06228e540c913e167a4c4bf9350
Closes-Bug: #1649575
2016-12-14 11:21:36 +01:00
Simon Pasquier 737336a09c Enforce timezone setting in log processing
Change-Id: I1fc5ecf8471c2effa1dadd72cf369c64bb11ec41
Closes-Bug: #1633074
2016-11-08 09:42:33 +01:00
Swann Croiset bc62f5eeae Add new cluster policy for local API checks
Change-Id: I18a505f90950385fcb8c51359adc4255d2837425
Closes-Bug: #1634503
2016-10-25 18:29:44 +02:00
Jenkins b064db32b5 Merge "Do not send cluster AFDs to Nagios" 2016-10-13 15:17:57 +00:00
Swann Croiset a88bea8558 Do not send cluster AFDs to Nagios
Change-Id: Ic74a79452f79cdd9774246b1d2c39cc4a0a0b30c
2016-10-13 16:11:50 +02:00
Guillaume Thouvenin 847cdd5367 Send metrics without 'hostname' to the aggregator
This patch modifies the message matcher of the aggregator output to also
send metrics with no 'hostname' field, so that alarms based on these
metrics can be evaluated at the aggregator level.

See also the Change-Id I61529d6ca2d8a9a26e5fa70a776ad03c212c7982

Change-Id: Ia2597df00315cb624f1f49cd215fb6c213fb4ff5
2016-10-13 09:42:47 +02:00
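The routing change in commit 847cdd5367 amounts to widening a predicate. The real matcher is a Heka message_matcher expression whose previous content is not shown here. The following Python sketch therefore treats the old rule as an opaque callable and models only the added condition. The "Type" and "Fields" keys and the "metric" suffix test are assumptions.

```python
def forward_to_aggregator(msg, previously_matched):
    """Hypothetical routing rule for the aggregator output.

    Keep everything the old matcher accepted, and additionally accept
    metric messages that carry no 'hostname' field.
    """
    if previously_matched(msg):
        return True
    fields = msg.get("Fields", {})
    return msg.get("Type", "").endswith("metric") and "hostname" not in fields
```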
Swann Croiset 8bc378c486 Rename GSE alerting attribute
Change-Id: I30b16d7ef159242f9984b54a8ae344fbf6560314
2016-10-10 16:24:06 +02:00
Swann Croiset 3dd804d2cc Monitor FSType tmpfs
Change-Id: Ib03418755f0a090599e6eb1985df79625f0b2851
2016-10-06 19:05:36 +00:00
Guillaume Thouvenin 9dbf48dbfe Replace the workers AFD filter
This patch uses the generic AFD filter with new alarms to replace
the custom AFD for workers.

Blueprint: allow-all-alarms-to-be-specified-in-alarming-file
Change-Id: I6c432e60a16da5bb3c8d0ecd0bd22a1246fe6f82
2016-10-06 09:05:30 +02:00
Simon Pasquier 2cc44ddba0 Increase the number of points per InfluxDB batch
This change improves InfluxDB write performance by increasing the
maximum number of points sent per InfluxDB request to 500.
InfluxDB recommends a batch size of 5,000 points, but that cannot be the
default configuration value because Heka messages have a fixed maximum size
(currently 256K) and larger batches lead to metrics being silently discarded.
Note that the InfluxDB accumulator flushes the data either when
it holds 500 points or when it hasn't sent data for at least 5 seconds.

Co-Authored-By: Swann Croiset <scroiset@mirantis.com>

Change-Id: I7d238375dc0c231782983fc4901c9a32936fb08a
Partial-Bug: #1581369
2016-09-30 14:37:40 +02:00
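The flush rule in commit 2cc44ddba0 (flush at 500 points, or after 5 seconds) is a count-or-time batching accumulator. The Python sketch below is not the plugin's Heka sandbox code. It assumes the 5-second condition means "time since the last flush", and it assumes tick() is called periodically, like a timer event. The class name and the injectable clock are illustrative choices.

```python
import time


class Accumulator:
    """Buffer points and flush them in batches (hypothetical sketch)."""

    def __init__(self, flush_fn, flush_count=500, flush_interval=5.0,
                 clock=time.monotonic):
        self.flush_fn = flush_fn              # receives a list of points
        self.flush_count = flush_count        # size-based flush threshold
        self.flush_interval = flush_interval  # seconds between forced flushes
        self.clock = clock
        self.points = []
        self.last_flush = clock()

    def add(self, point):
        self.points.append(point)
        if len(self.points) >= self.flush_count:
            self.flush()

    def tick(self):
        # Periodic check: flush pending points once the interval has elapsed.
        if self.points and self.clock() - self.last_flush >= self.flush_interval:
            self.flush()

    def flush(self):
        if self.points:
            self.flush_fn(self.points)
            self.points = []  # new list, so the flushed batch stays intact
        self.last_flush = self.clock()
```

Injecting the clock keeps the time-based path testable without sleeping.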
Guillaume Thouvenin d61b9e9e2c Replace the API endpoint AFD filter
This patch uses the generic AFD filter to replace the custom API endpoint
AFD filter.

Blueprint: allow-all-alarms-to-be-specified-in-alarming-file
Change-Id: Ic172fb716c128827930bc51cede1dcf0bffa36d2
2016-09-26 09:56:25 +02:00
Guillaume Thouvenin c5eebea265 Add local API check
This patch creates a new plugin, check_local_endpoint.py, that checks
OpenStack services locally and emits a new metric, openstack_check_local_api.

Change-Id: I58290dd685b97354137ad5c0b91aece79fd91695
2016-09-21 14:05:55 +02:00
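The commit does not show how check_local_endpoint.py decides whether a service is up. Below is a minimal standard-library sketch of such a check, under stated assumptions. The metric value follows a 1 = up / 0 = down convention. Any HTTP answer below 500 counts as up, because many OpenStack endpoints reply 401 without a token. Connection failures count as down. The function signature and the "meta" field are hypothetical.

```python
import socket
import urllib.error
import urllib.request


def check_local_endpoint(service, url, timeout=5, opener=urllib.request.urlopen):
    """Return a hypothetical openstack_check_local_api datapoint (1 = up, 0 = down)."""
    try:
        with opener(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:
        # HTTPError subclasses URLError, so it must be caught first:
        # the service answered, just not with a 2xx code.
        status = exc.code
    except (urllib.error.URLError, socket.timeout, ConnectionError):
        status = None  # nothing answered on the local endpoint
    value = 1 if status is not None and status < 500 else 0
    return {"name": "openstack_check_local_api", "value": value,
            "meta": {"service": service, "status": status}}
```

The opener parameter lets the check be exercised without a live service.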
Guillaume Thouvenin 7cf60c3c33 Make hostname an optional field
This patch makes hostname an optional field. Currently, the following
metrics have no hostname:
    - Some metrics provided by hypervisor_stats:
        - total_free_disk_GB
        - total_free_ram_MB
        - total_free_vcpus
        - total_used_disk_GB
        - total_used_ram_MB
        - total_used_vcpus
        - total_running_instances
        - total_running_tasks
    - all metrics collected by check_openstack_api
    - all metrics collected by http_check

Change-Id: I4b1078ddf6ef510ae2c95ae6937b28f007d88bea
2016-09-21 09:13:18 +00:00
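Making hostname optional means every consumer must cope with its absence. As an illustration only, the sketch below encodes a metric in the InfluxDB line protocol (`measurement,tag=value field=value timestamp`) and emits a hostname tag only when one is present. The function name and the tag name "hostname" are assumptions. Escaping of special characters is omitted for brevity.

```python
def to_line_protocol(name, value, timestamp_ns, hostname=None, tags=None):
    """Encode one metric as an InfluxDB line (no escaping; sketch only)."""
    all_tags = dict(tags or {})
    if hostname:
        # Cluster-wide metrics such as total_free_vcpus carry no hostname.
        all_tags["hostname"] = hostname
    tag_str = "".join(f",{k}={v}" for k, v in sorted(all_tags.items()))
    return f"{name}{tag_str} value={value} {timestamp_ns}"
```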
Swann Croiset 0c050cb8eb Revert "Make hostname an optional field"
This reverts commit bb67a13062.

Change-Id: I64efa48d22c15c3893d4da0783143470db75c5e8
2016-09-20 10:49:43 +02:00
Swann Croiset 553d2040cc Send GSE service clusters status to alerting
Change-Id: Iad33e1f4bffd81066a82a0d73a46e7b489eb23d7
blueprint: alarming-refactoring
2016-09-20 09:41:54 +02:00
Swann Croiset cc5aadb474 Do not send GSE to Nagios when activate_alerting=false
DocImpact
blueprint: alarming-refactoring

Change-Id: Ie343672afd4a222a3d7f920182a0b2e90e1fd6de
2016-09-20 09:41:54 +02:00
Swann Croiset 692cb46fbe Do not send AFD to Nagios when activate_alerting=false
blueprint: alarming-refactoring

Change-Id: Ifb82ec16dcece731528c1ec7d84c96d83d452212
2016-09-20 09:41:54 +02:00
Swann Croiset 7deace8726 Alarm definition refactoring
DocImpact
blueprint: alarming-refactoring

Change-Id: I8c053f2fbc4b4b85958be8413919f9bf1b168027
2016-09-20 09:41:54 +02:00
Guillaume Thouvenin bb67a13062 Make hostname an optional field
This patch makes hostname an optional field. Currently, the following
metrics have no hostname:
    - Some metrics provided by hypervisor_stats:
        - total_free_disk_GB
        - total_free_ram_MB
        - total_free_vcpus
        - total_used_disk_GB
        - total_used_ram_MB
        - total_used_vcpus
        - total_running_instances
        - total_running_tasks
    - all metrics collected by check_openstack_api
    - all metrics collected by http_check

Change-Id: Ic503b48e995170efd2b87c9385750fe920e2e25a
2016-09-16 09:42:59 +02:00
Jenkins ea9338ab8a Merge "Add monitoring of HDD errors" 2016-09-07 13:51:36 +00:00
Ildar Svetlov 99e2863c14 Add monitoring of HDD errors
This change adds a filter plugin that monitors the kernel log messages
for hard drive errors and reports the number of errors per second
as 'hdd_errors_rate'. The filter is configured for all nodes,
irrespective of their roles. An alarm is also added that triggers
a CRITICAL alert when the metric value is greater than 0.

DocImpact

Change-Id: I485f5692a3e5facf0f7ea019ccdbd70683a7dd4e
2016-09-06 11:47:59 +03:00
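The HDD error monitoring in commit 99e2863c14 is a pattern count turned into a rate, followed by a threshold alarm. The Python sketch below shows that shape. The regular expression is a hypothetical example of kernel disk-error wording, not the filter's actual pattern. The "OK" label is likewise an assumption; only CRITICAL for values above 0 comes from the commit message.

```python
import re

# Hypothetical patterns for kernel messages reporting disk errors.
HDD_ERROR_PATTERN = re.compile(r"I/O error|medium error|failed command", re.IGNORECASE)


def hdd_errors_rate(kernel_lines, window_seconds):
    """Number of matching kernel log lines per second over the window."""
    errors = sum(1 for line in kernel_lines if HDD_ERROR_PATTERN.search(line))
    return errors / window_seconds


def alarm_severity(rate):
    # Per the commit: CRITICAL as soon as the metric is greater than 0.
    return "CRITICAL" if rate > 0 else "OK"
```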
Jenkins 694079600e Merge "Increase the Elasticsearch queue to 1Gb" 2016-09-02 18:04:31 +00:00
Guillaume Thouvenin 20e6fbaab2 Add support to check Apache
This patch adds a collectd plugin to check Apache, along with
a new alarm.

Change-Id: I70dc85dae2de7e7afa1d2a046c96071d242a60b1
2016-09-02 06:28:04 +00:00
Jenkins c02cb15a5b Merge "Increase the Elasticsearch bulk size when required" 2016-08-29 15:35:30 +00:00
Swann Croiset 83db24f549 Increase the Elasticsearch bulk size when required
In some environments (especially those using slow hard drives), the
Elasticsearch backends may fail to ingest logs fast enough. As a result,
the log_collector service running on the controller nodes is blocked.

To alleviate this issue, this change increases the bulk size for nodes
that generate lots of logs:
- controllers which run OpenStack API services in addition to Pacemaker.
- all nodes when the environment's log level is set to debug.

In such cases, the flush_count parameter is increased to 100 (instead of
10 by default).

Change-Id: Ifdfbcb8ff0292f695dee4deab45560f126bde242
Closes-Bug: #1617211
2016-08-29 15:17:44 +00:00
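The rule in commit 83db24f549 reduces to a small decision function. The sketch below restates it in Python. The role names "controller" and "primary-controller" and the "debug" log level string are assumptions about how the deployment exposes this information. The values 100 and 10 come from the commit.

```python
CONTROLLER_ROLES = {"controller", "primary-controller"}  # assumed role names


def elasticsearch_flush_count(node_roles, log_level):
    """Bulk size (flush_count) for the Elasticsearch output on a node."""
    if CONTROLLER_ROLES & set(node_roles) or log_level == "debug":
        return 100  # nodes that generate lots of logs
    return 10       # default
```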
Jenkins b835f66af1 Merge "Add a dedicated manifest to configure collectd" 2016-08-29 13:05:03 +00:00
Guillaume Thouvenin 38ed9a1b82 Add metric about the volume attachment time
This patch adds a new metric that measures the time it takes to attach a
volume to an instance.

Change-Id: I5aedb4a60cddbff34b9fead8e465429058575f33
2016-08-26 14:36:07 +00:00
Simon Pasquier 38ec02fe46 Add a dedicated manifest to configure collectd
This removes code duplication, as well as limitations we had to work around
because the collectd Puppet resources don't play well when they are
created at different times from several manifests.

Change-Id: I52fabb1fb5795a33f552168553a148b1520fc496
2016-08-26 15:59:04 +02:00
Jenkins 16b288b57a Merge "Configure AFD alarms against 'mysql_check' metric" 2016-08-26 13:39:38 +00:00
Jenkins 3e27113788 Merge "Add swap_percent_used metric" 2016-08-26 13:32:44 +00:00
Swann Croiset 7f1f3bd59f Configure AFD alarms against 'mysql_check' metric
Change-Id: Ib15fea4ab041243e44a61c9d54d1f154b02d34af
2016-08-26 15:23:07 +02:00
Igor Degtiarov a0bd5a76d8 Add swap_percent_used metric
Change-Id: I1ac8dc82ecfd9c52ceaa58fbe06edfcea9576a05
2016-08-26 10:59:24 +02:00
Swann Croiset 26c5788684 Check memcached service on controller nodes
The patch replaces the service_heartbeat mechanism.

Change-Id: I060e10320cf6f8b874a39037b1f9257ed1996342
2016-08-26 10:56:06 +02:00
Swann Croiset 5c4b3eb2e6 Add Python collectd plugin to check memcached availability
This plugin emits check metrics for memcached.

Change-Id: I5b0fba60d076080503e34f751fccaae801ca327a
2016-08-26 10:54:57 +02:00
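The commit does not describe how the memcached plugin probes the service. A minimal Python sketch of one plausible approach uses the memcached text protocol, where the `version` command is answered with a line starting with `VERSION `. It returns 1 when memcached answers and 0 otherwise. The function name, default address, and timeout are hypothetical.

```python
import socket


def check_memcached(host="127.0.0.1", port=11211, timeout=2.0):
    """Return 1 if memcached answers the 'version' command, else 0."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"version\r\n")
            reply = sock.recv(64)
    except OSError:
        # Connection refused, timeout, reset, ...: memcached is unavailable.
        return 0
    return 1 if reply.startswith(b"VERSION ") else 0
```

A real collectd plugin would dispatch this value as a check metric on each read cycle.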