Commit Graph

68 Commits

Author SHA1 Message Date
Andreas Jaeger c929899400 Retire repository
Fuel repositories are all retired in openstack namespace, retire
remaining fuel repos in x namespace since they are unused now.

This change removes all content from the repository and adds the usual
README file to point out that the repository is retired following the
process from
https://docs.openstack.org/infra/manual/drivers.html#retiring-a-project

See also
http://lists.openstack.org/pipermail/openstack-discuss/2019-December/011675.html

A related change is: https://review.opendev.org/699752 .

Change-Id: I8aded54f1b9f3b79f3a4bf8f607d3695b92f528b
2019-12-18 19:39:39 +01:00
Swann Croiset 0460a45377 Combine the global and per aggregate Nova memory alarms
Change-Id: I96b8e446e6776ffa28cc1537f8a1bd023c2447fa
Closes-bug: #1659275
2017-01-25 13:34:10 +01:00
Olivier Bourdon dd5d89f7c9 Add alarms for Nova aggregates
Change-Id: Ia82d5baf754d2d61c2bfa6d882ace3c8d094eafc
Depends-On: I6647600d73991bfbfc7b7c199a7f9b90b9294f68
2017-01-19 13:32:54 +01:00
Swann Croiset 8db734a584 Add no_data_policy=skip for all workers alarms
Related metrics are collected only from one node at a time.

Change-Id: I0751bb14eaf6e2fe5f2df2b2f7593cf1cc20b23b
2016-10-12 12:40:26 +00:00
Swann Croiset 8794ee5b3b Revert "Remove the no_data_policy=skip for AFD"
This reverts commit 1612638e62.

Change-Id: I9ed3f4c48835e799a08442b5ba8470ca6f676922
2016-10-12 12:40:22 +00:00
Swann Croiset 347d3ce451 Enable notifications for HDD errors
Change-Id: I8c377e12de00323279a8d1d0f2378c7aaafea379
2016-10-12 12:40:16 +00:00
Swann Croiset 731265cdc8 Support alerting attribute per AFD
Change-Id: I29aba65d35a12cc56a91c10f893e38a35ea3abf9
2016-10-12 12:39:55 +00:00
Swann Croiset 38d968a41e Fix workers alarms
These alarms are not tied to nodes

Change-Id: I2a59aa27dce34af1ab741a5bc73dfb297f8812d4
2016-10-10 16:24:06 +02:00
Swann Croiset 58f737e712 Rename AFD filter alerting attribute
Change-Id: I2e32ea0eca4f581ddc467b792ca49b002ba10d76
2016-10-10 16:24:06 +02:00
Swann Croiset 864f90190c Fix other-fs alarm critical severity
Change-Id: Idb920ae31bacf1a21cb612bc4860e3c25fca639f
2016-10-06 18:02:59 +02:00
Jenkins c5e2b1f0bd Merge "Remove the no_data_policy=skip for AFD" 2016-10-06 14:05:24 +00:00
Guillaume Thouvenin 9dbf48dbfe Replace the workers AFD filter
This patch uses the generic AFD filter with new alarms to replace
the custom AFD for workers.

Blueprint: allow-all-alarms-to-be-specified-in-alarming-file
Change-Id: I6c432e60a16da5bb3c8d0ecd0bd22a1246fe6f82
2016-10-06 09:05:30 +02:00
Guillaume Thouvenin 215f693307 Replace the API backends AFD filter
This patch uses the generic AFD filter with new alarms to replace the
custom AFD for API backends.

Blueprint: allow-all-alarms-to-be-specified-in-alarming-file
Change-Id: Id139e45a9942a9c86a2d35d1966b083d9c75af89
2016-10-05 15:41:55 +00:00
Swann Croiset a99cb11ccb Add missing heat-api-endpoint cluster aggregation rule
Change-Id: I614c8374b737b0529711cb25ba21301ddf47622f
2016-10-05 16:50:30 +02:00
Swann Croiset 1612638e62 Remove the no_data_policy=skip for AFD
Meanwhile, the patch renames no_data_policy to no_data_severity.

Change-Id: I415540c122b2a07c408bcb30e16212b4a2abab3c
2016-09-30 09:19:04 +02:00
Swann Croiset 213bedf712 Split top-level clusters health by (control|data)-plane
blueprint: alarm-refactoring
Change-Id: Ifbafb2deb8547e830d6ee22a7b00600180f2c4a5
2016-09-26 14:15:16 +02:00
Guillaume Thouvenin a3bcfd6102 Add alarms on OpenStack services local API endpoint
This patch adds alarms on the nova, cinder, neutron, heat, glance and
keystone public APIs.

Change-Id: Ia1d0d85cdf2742b6f5a529a70f7b295147662170
2016-09-26 09:56:33 +02:00
Guillaume Thouvenin d61b9e9e2c Replace the API endpoint AFD filter
This patch uses the generic AFD filter to replace the custom API endpoint
AFD filter.

Blueprint: allow-all-alarms-to-be-specified-in-alarming-file
Change-Id: Ic172fb716c128827930bc51cede1dcf0bffa36d2
2016-09-26 09:56:25 +02:00
Swann Croiset 12cf9471dd Add alarms on Nova free VCPU and free memory
Change-Id: Id827630810b9a8fbf37be5bc833acf23e3b0ee7d
2016-09-22 14:47:08 +00:00
Swann Croiset 5502a158c7 Include Ceph OSD node to the storage cluster
This restores the previous logical representation, which was lost in the
last refactoring.

Change-Id: I07e86225c8b1591fa7cfb020a4eee31fea9d9509
2016-09-20 09:41:54 +02:00
Swann Croiset d4d3661fb7 Configure alarms for OSD disk(s)
Change-Id: Id169250d635bd4d731eca2a292c78fd690c2ba94
2016-09-20 09:41:54 +02:00
Swann Croiset e318104fc0 Monitor all partitions
Change-Id: Ibafd5cabd4d5f86e8050fb278a114a69d00476d8
Closes-bug: #1587004
2016-09-20 09:41:54 +02:00
Swann Croiset 1eb2628689 Avoid alarm flapping for Ceph OSD checks
Change-Id: Iea110acd362da16245c51caf4d7048c113792c1f
2016-09-20 09:41:54 +02:00
Swann Croiset 015ad15ec4 Add alarm for Horizon HTTP 5xx errors
The patch also fixes the incorrect GSE horizon-(web|ui) definition.

Change-Id: I4a7a64c87ac8c9fe3929ec98ebc8de51e9292a26
2016-09-20 09:41:54 +02:00
Swann Croiset 56dbeae0be Add default AFD for unknown fuel roles
This patch configures default alarms for all nodes with roles not
defined in node_profiles.yaml.

Change-Id: Iff0aca6f09f8d3c721c3cac64010d5cde2c9225e
blueprint: alarming-refactoring
2016-09-20 09:41:54 +02:00
Swann Croiset 7deace8726 Alarm definition refactoring
DocImpact
blueprint: alarming-refactoring

Change-Id: I8c053f2fbc4b4b85958be8413919f9bf1b168027
2016-09-20 09:41:54 +02:00
Swann Croiset 6bebf2d91b Fix the Swift api http-errors alarm
The HAProxy backend has been renamed to object-storage (previously named
swift-api).
To stay compatible with previous Fuel versions, the alarm is configured to
match both names.

Change-Id: Ie82981d6b4422cb15c3090e2387bcd37fcfb98e8
Closes-bug: 1623843
2016-09-15 14:50:27 +02:00
Swann Croiset db0d32bcb7 Add alarm for Swift error logs
This alarm definition was missing even though the AFD was present.

Change-Id: I04a530193ba1c0e2611e29131af1610dc955238b
2016-09-15 14:49:09 +02:00
Jenkins 6ab568a40d Merge "Revert no_data_policy=okay for alarm based on HAProxy HTTP 5xx metrics." 2016-09-09 15:02:43 +00:00
Jenkins ea9338ab8a Merge "Add monitoring of HDD errors" 2016-09-07 13:51:36 +00:00
Swann Croiset 753ea07320 Revert no_data_policy=okay for alarm based on HAProxy HTTP 5xx metrics.
These metrics are periodically collected by collectd, hence it is better
to have an UNKNOWN status when no datapoint is received.

Change-Id: Iad673a04bf5d99319beca5fb8d0c29f42fb253a4
2016-09-06 12:23:44 +00:00
Ildar Svetlov 99e2863c14 Add monitoring of HDD errors
This change adds a filter plugin that monitors the kernel log messages
for hard drive errors and reports the number of errors per second
as 'hdd_errors_rate'. The filter is configured for all nodes,
irrespective of their roles. An alarm is also added that triggers
a CRITICAL alert when the metric value is greater than 0.

DocImpact

Change-Id: I485f5692a3e5facf0f7ea019ccdbd70683a7dd4e
2016-09-06 11:47:59 +03:00
Swann Croiset f6a1d8b611 Add missing function for network dropped-tx alarm
Closes-bug: #1620285

Change-Id: Iaa42ff650ce3d4dfa4d1f8f274221cbf3d533d06
2016-09-05 12:20:38 +00:00
Jenkins fd2cde286a Merge "Add alarm on instance creation time" 2016-09-02 18:01:06 +00:00
Jenkins 5323af2513 Merge "Expand RabbitMQ alarms on Pacemaker metrics" 2016-09-02 14:04:38 +00:00
Guillaume Thouvenin 20e6fbaab2 Add support to check Apache
This patch adds the collectd plugin to check Apache and it also adds
a new alarm.

Change-Id: I70dc85dae2de7e7afa1d2a046c96071d242a60b1
2016-09-02 06:28:04 +00:00
Simon Pasquier 02be6a39ba Expand RabbitMQ alarms on Pacemaker metrics
This change also slightly reworks the other RabbitMQ alarms to produce more
meaningful alerts.

Change-Id: I9e1d7ecbcff00e772ba1812e79dfde6856ea2f14
2016-08-31 17:16:25 +02:00
Simon Pasquier 0f84cef39e Add alarm on instance creation time
Change-Id: If39190fbedff033b768c6e67be7b06ade7bf7d77
2016-08-30 18:07:26 +02:00
Simon Pasquier 80412577f8 Set no_data_policy to 'okay' for sporadic metrics
For metrics that aren't emitted at periodic intervals (e.g. metrics
derived from logs), it is better to consider the status to be 'okay'
when no data is received.

Change-Id: I95b20a91d67f7eb9c92b16a8e3957b104fd0baa8
2016-08-30 14:35:25 +02:00
Simon Pasquier eb9f36fa63 Fix the GSE filter wrt Pacemaker metrics
With the recent refactoring [1] of the Pacemaker collectd plugin, the
GSE filter may receive Pacemaker metrics from the other nodes of the
cluster. The Heka filter needs to be updated to discard these messages;
otherwise, the GSE filter flaps between active and inactive states.

[1] I8b5b987704f69c6a60b13e8ea982f27924f488d1

Change-Id: I6047da6ec5d28f22d309f1858bfbf5d3558cfcb4
Closes-Bug: #1616860
2016-08-30 07:59:33 +00:00
Jenkins 16b288b57a Merge "Configure AFD alarms against 'mysql_check' metric" 2016-08-26 13:39:38 +00:00
Guillaume Thouvenin 604541b8e7 Alarm when Keystone is too slow
This patch adds a warning-level alarm when the Keystone service doesn't
respond fast enough.

Change-Id: Ib523e9e61204daf4b7ad271623ece88637505965
2016-08-26 13:33:36 +00:00
Jenkins da865a4300 Merge "Add alarm based on swap_percent_used metric" 2016-08-26 13:32:50 +00:00
Jenkins 3f1dc694cf Merge "Add alarm based on swap activity metrics" 2016-08-26 13:32:36 +00:00
Swann Croiset 7f1f3bd59f Configure AFD alarms against 'mysql_check' metric
Change-Id: Ib15fea4ab041243e44a61c9d54d1f154b02d34af
2016-08-26 15:23:07 +02:00
Jenkins d46ed8070c Merge "Check memcached service on controller nodes" 2016-08-26 13:21:44 +00:00
Igor Degtiarov 9873ce11a1 Add alarm based on swap_percent_used metric
Change-Id: Id858a3e000f42e518963ae3d47ac3e07e53e043a
2016-08-26 10:59:24 +02:00
Igor Degtiarov 55f62d4f9c Add alarm based on swap activity metrics
Change-Id: I44a90f73a5659dcc314f35e97f8aeb4e78da2c58
2016-08-26 10:59:23 +02:00
Swann Croiset 26c5788684 Check memcached service on controller nodes
The patch replaces the service_heartbeat mechanism.

Change-Id: I060e10320cf6f8b874a39037b1f9257ed1996342
2016-08-26 10:56:06 +02:00
Simon Pasquier e250c7c245 Fix alarm configuration for RabbitMQ
Ide9cfd264cdafe3ad4e85b50b680f873695ac5be introduced a regression
whereby the RabbitMQ alarms weren't configured on the RabbitMQ nodes.

Change-Id: Id4c6eda07f196eba6bc1b4f77532716464e36cd1
2016-08-26 08:32:56 +00:00