Andreas Jaeger
c929899400
Retire repository
...
Fuel repositories are all retired in openstack namespace, retire
remaining fuel repos in x namespace since they are unused now.
This change removes all content from the repository and adds the usual
README file to point out that the repository is retired following the
process from
https://docs.openstack.org/infra/manual/drivers.html#retiring-a-project
See also
http://lists.openstack.org/pipermail/openstack-discuss/2019-December/011675.html
A related change is: https://review.opendev.org/699752 .
Change-Id: I8aded54f1b9f3b79f3a4bf8f607d3695b92f528b
2019-12-18 19:39:39 +01:00
Swann Croiset
0460a45377
Combine the global and per aggregate Nova memory alarms
...
Change-Id: I96b8e446e6776ffa28cc1537f8a1bd023c2447fa
Closes-bug: #1659275
2017-01-25 13:34:10 +01:00
Olivier Bourdon
dd5d89f7c9
Add alarms for Nova aggregates
...
Change-Id: Ia82d5baf754d2d61c2bfa6d882ace3c8d094eafc
Depends-On: I6647600d73991bfbfc7b7c199a7f9b90b9294f68
2017-01-19 13:32:54 +01:00
Swann Croiset
8db734a584
Add no_data_policy=skip for all workers alarms
...
Related metrics are collected only from one node at a time.
Change-Id: I0751bb14eaf6e2fe5f2df2b2f7593cf1cc20b23b
2016-10-12 12:40:26 +00:00
Swann Croiset
8794ee5b3b
Revert "Remove the no_data_policy=skip for AFD"
...
This reverts commit 1612638e62.
Change-Id: I9ed3f4c48835e799a08442b5ba8470ca6f676922
2016-10-12 12:40:22 +00:00
Swann Croiset
347d3ce451
Enable notifications for HDD errors
...
Change-Id: I8c377e12de00323279a8d1d0f2378c7aaafea379
2016-10-12 12:40:16 +00:00
Swann Croiset
731265cdc8
Support alerting attribute per AFD
...
Change-Id: I29aba65d35a12cc56a91c10f893e38a35ea3abf9
2016-10-12 12:39:55 +00:00
Swann Croiset
38d968a41e
Fix workers alarms
...
These alarms are not tied to nodes
Change-Id: I2a59aa27dce34af1ab741a5bc73dfb297f8812d4
2016-10-10 16:24:06 +02:00
Swann Croiset
58f737e712
Rename AFD filter alerting attribute
...
Change-Id: I2e32ea0eca4f581ddc467b792ca49b002ba10d76
2016-10-10 16:24:06 +02:00
Swann Croiset
864f90190c
Fix other-fs alarm critical severity
...
Change-Id: Idb920ae31bacf1a21cb612bc4860e3c25fca639f
2016-10-06 18:02:59 +02:00
Jenkins
c5e2b1f0bd
Merge "Remove the no_data_policy=skip for AFD"
2016-10-06 14:05:24 +00:00
Guillaume Thouvenin
9dbf48dbfe
Replace the workers AFD filter
...
This patch uses the generic AFD filter with new alarms to replace
the custom AFD for workers.
Blueprint: allow-all-alarms-to-be-specified-in-alarming-file
Change-Id: I6c432e60a16da5bb3c8d0ecd0bd22a1246fe6f82
2016-10-06 09:05:30 +02:00
Guillaume Thouvenin
215f693307
Replace the API backends AFD filter
...
This patch uses the generic AFD filter with new alarms to replace the
custom AFD for API backends.
Blueprint: allow-all-alarms-to-be-specified-in-alarming-file
Change-Id: Id139e45a9942a9c86a2d35d1966b083d9c75af89
2016-10-05 15:41:55 +00:00
Swann Croiset
a99cb11ccb
Add missing heat-api-endpoint cluster aggregation rule
...
Change-Id: I614c8374b737b0529711cb25ba21301ddf47622f
2016-10-05 16:50:30 +02:00
Swann Croiset
1612638e62
Remove the no_data_policy=skip for AFD
...
The patch also renames no_data_policy to no_data_severity.
Change-Id: I415540c122b2a07c408bcb30e16212b4a2abab3c
2016-09-30 09:19:04 +02:00
Swann Croiset
213bedf712
Split top-level clusters health by (control|data)-plane
...
blueprint: alarm-refactoring
Change-Id: Ifbafb2deb8547e830d6ee22a7b00600180f2c4a5
2016-09-26 14:15:16 +02:00
Guillaume Thouvenin
a3bcfd6102
Add alarms on OpenStack services local API endpoint
...
This patch adds alarms on nova, cinder, neutron, heat, glance and
keystone public API.
Change-Id: Ia1d0d85cdf2742b6f5a529a70f7b295147662170
2016-09-26 09:56:33 +02:00
Guillaume Thouvenin
d61b9e9e2c
Replace the API endpoint AFD filter
...
This patch uses the generic AFD filter to replace the custom API endpoint
AFD filter.
Blueprint: allow-all-alarms-to-be-specified-in-alarming-file
Change-Id: Ic172fb716c128827930bc51cede1dcf0bffa36d2
2016-09-26 09:56:25 +02:00
Swann Croiset
12cf9471dd
Add alarms on Nova free VCPU and free memory
...
Change-Id: Id827630810b9a8fbf37be5bc833acf23e3b0ee7d
2016-09-22 14:47:08 +00:00
Swann Croiset
5502a158c7
Include Ceph OSD node to the storage cluster
...
This restores the previous logical representation, which was lost in the
last refactoring.
Change-Id: I07e86225c8b1591fa7cfb020a4eee31fea9d9509
2016-09-20 09:41:54 +02:00
Swann Croiset
d4d3661fb7
Configure alarms for OSD disk(s)
...
Change-Id: Id169250d635bd4d731eca2a292c78fd690c2ba94
2016-09-20 09:41:54 +02:00
Swann Croiset
e318104fc0
Monitor all partitions
...
Change-Id: Ibafd5cabd4d5f86e8050fb278a114a69d00476d8
Closes-bug: #1587004
2016-09-20 09:41:54 +02:00
Swann Croiset
1eb2628689
Avoid alarm flapping for Ceph OSD checks
...
Change-Id: Iea110acd362da16245c51caf4d7048c113792c1f
2016-09-20 09:41:54 +02:00
Swann Croiset
015ad15ec4
Add alarm for Horizon HTTP 5xx errors
...
The patch also fixes the incorrect GSE horizon-(web|ui) definition.
Change-Id: I4a7a64c87ac8c9fe3929ec98ebc8de51e9292a26
2016-09-20 09:41:54 +02:00
Swann Croiset
56dbeae0be
Add default AFD for unknown fuel roles
...
This patch configures default alarms for all nodes with roles not
defined in node_profiles.yaml
Change-Id: Iff0aca6f09f8d3c721c3cac64010d5cde2c9225e
blueprint: alarming-refactoring
2016-09-20 09:41:54 +02:00
Swann Croiset
7deace8726
Alarm definition refactoring
...
DocImpact
blueprint: alarming-refactoring
Change-Id: I8c053f2fbc4b4b85958be8413919f9bf1b168027
2016-09-20 09:41:54 +02:00
Swann Croiset
6bebf2d91b
Fix the Swift api http-errors alarm
...
The HAProxy backend has been renamed to object-storage (previously named
swift-api).
To stay compatible with previous Fuel versions, the alarm is configured to
match both names.
Change-Id: Ie82981d6b4422cb15c3090e2387bcd37fcfb98e8
Closes-bug: 1623843
2016-09-15 14:50:27 +02:00
Swann Croiset
db0d32bcb7
Add alarm for Swift error logs
...
This alarm definition was missing although the AFD was present.
Change-Id: I04a530193ba1c0e2611e29131af1610dc955238b
2016-09-15 14:49:09 +02:00
Jenkins
6ab568a40d
Merge "Revert no_data_policy=okay for alarm based on HAProxy HTTP 5xx metrics."
2016-09-09 15:02:43 +00:00
Jenkins
ea9338ab8a
Merge "Add monitoring of HDD errors"
2016-09-07 13:51:36 +00:00
Swann Croiset
753ea07320
Revert no_data_policy=okay for alarm based on HAProxy HTTP 5xx metrics.
...
These metrics are periodically collected by collectd, hence it is better
to have UNKNOWN status when no datapoint is received.
Change-Id: Iad673a04bf5d99319beca5fb8d0c29f42fb253a4
2016-09-06 12:23:44 +00:00
Ildar Svetlov
99e2863c14
Add monitoring of HDD errors
...
This change adds a filter plugin that monitors the kernel log messages
for hard drive errors and reports the number of errors per second
as 'hdd_errors_rate'. The filter is configured for all nodes,
irrespective of their roles. An alarm is also added that triggers
a CRITICAL alert when the metric value is greater than 0.
DocImpact
Change-Id: I485f5692a3e5facf0f7ea019ccdbd70683a7dd4e
2016-09-06 11:47:59 +03:00
Swann Croiset
f6a1d8b611
Add missing function for network dropped-tx alarm
...
Closes-bug: #1620285
Change-Id: Iaa42ff650ce3d4dfa4d1f8f274221cbf3d533d06
2016-09-05 12:20:38 +00:00
Jenkins
fd2cde286a
Merge "Add alarm on instance creation time"
2016-09-02 18:01:06 +00:00
Jenkins
5323af2513
Merge "Expand RabbitMQ alarms on Pacemaker metrics"
2016-09-02 14:04:38 +00:00
Guillaume Thouvenin
20e6fbaab2
Add support to check Apache
...
This patch adds the collectd plugin to check Apache and it also adds
a new alarm.
Change-Id: I70dc85dae2de7e7afa1d2a046c96071d242a60b1
2016-09-02 06:28:04 +00:00
Simon Pasquier
02be6a39ba
Expand RabbitMQ alarms on Pacemaker metrics
...
This change also slightly reworks the other RabbitMQ alarms to have more
meaningful alerts.
Change-Id: I9e1d7ecbcff00e772ba1812e79dfde6856ea2f14
2016-08-31 17:16:25 +02:00
Simon Pasquier
0f84cef39e
Add alarm on instance creation time
...
Change-Id: If39190fbedff033b768c6e67be7b06ade7bf7d77
2016-08-30 18:07:26 +02:00
Simon Pasquier
80412577f8
Set no_data_policy to 'okay' for sporadic metrics
...
For metrics that aren't emitted at periodic intervals (e.g. metrics
derived from logs), it is better to consider the status to be 'okay'
when no data is received.
Change-Id: I95b20a91d67f7eb9c92b16a8e3957b104fd0baa8
2016-08-30 14:35:25 +02:00
Simon Pasquier
eb9f36fa63
Fix the GSE filter wrt Pacemaker metrics
...
With the recent refactoring [1] of the Pacemaker collectd plugin, the
GSE filter may receive Pacemaker metrics from the other nodes of the
cluster. The Heka filter needs to be updated to discard these messages
otherwise the GSE filter flaps between active and inactive state.
[1] I8b5b987704f69c6a60b13e8ea982f27924f488d1
Change-Id: I6047da6ec5d28f22d309f1858bfbf5d3558cfcb4
Closes-Bug: #1616860
2016-08-30 07:59:33 +00:00
Jenkins
16b288b57a
Merge "Configure AFD alarms against 'mysql_check' metric"
2016-08-26 13:39:38 +00:00
Guillaume Thouvenin
604541b8e7
Alarm when Keystone is too slow
...
This patch adds a warning-level alarm when the Keystone service doesn't
respond fast enough.
Change-Id: Ib523e9e61204daf4b7ad271623ece88637505965
2016-08-26 13:33:36 +00:00
Jenkins
da865a4300
Merge "Add alarm based on swap_percent_used metric"
2016-08-26 13:32:50 +00:00
Jenkins
3f1dc694cf
Merge "Add alarm based on swap activity metrics"
2016-08-26 13:32:36 +00:00
Swann Croiset
7f1f3bd59f
Configure AFD alarms against 'mysql_check' metric
...
Change-Id: Ib15fea4ab041243e44a61c9d54d1f154b02d34af
2016-08-26 15:23:07 +02:00
Jenkins
d46ed8070c
Merge "Check memcached service on controller nodes"
2016-08-26 13:21:44 +00:00
Igor Degtiarov
9873ce11a1
Add alarm based on swap_percent_used metric
...
Change-Id: Id858a3e000f42e518963ae3d47ac3e07e53e043a
2016-08-26 10:59:24 +02:00
Igor Degtiarov
55f62d4f9c
Add alarm based on swap activity metrics
...
Change-Id: I44a90f73a5659dcc314f35e97f8aeb4e78da2c58
2016-08-26 10:59:23 +02:00
Swann Croiset
26c5788684
Check memcached service on controller nodes
...
The patch replaces the service_heartbeat mechanism.
Change-Id: I060e10320cf6f8b874a39037b1f9257ed1996342
2016-08-26 10:56:06 +02:00
Simon Pasquier
e250c7c245
Fix alarm configuration for RabbitMQ
...
Ide9cfd264cdafe3ad4e85b50b680f873695ac5be introduced a regression
whereby the RabbitMQ alarms weren't configured on the RabbitMQ nodes.
Change-Id: Id4c6eda07f196eba6bc1b4f77532716464e36cd1
2016-08-26 08:32:56 +00:00