288 Commits

Author SHA1 Message Date
Andreas Jaeger c929899400 Retire repository
Fuel repositories are all retired in openstack namespace, retire
remaining fuel repos in x namespace since they are unused now.

This change removes all content from the repository and adds the usual
README file to point out that the repository is retired following the
process from
https://docs.openstack.org/infra/manual/drivers.html#retiring-a-project

See also
http://lists.openstack.org/pipermail/openstack-discuss/2019-December/011675.html

A related change is: https://review.opendev.org/699752 .

Change-Id: I8aded54f1b9f3b79f3a4bf8f607d3695b92f528b
2019-12-18 19:39:39 +01:00
Simon Pasquier a90c6cc9b7 Fix Puppet tests again
Change-Id: I749ed33b57efc2455ca56959267329048cc43973
2017-03-29 10:12:46 +02:00
Swann Croiset d2fc3a9fd8 Enable pagination for Neutron
Change-Id: Ib1be3884a26d7bec9d0cf9dbc4ade9dfd6fab31d
2017-02-09 12:55:08 +00:00
Swann Croiset a5c154d6ef Configure pagination for OpenStack collectd plugins
Change-Id: I32e83368b7d0d2e8b68d7f7a2df0d1b61653fa72
2017-02-09 12:54:36 +00:00
Swann Croiset 7c248af9fa Rework collectd plugins for OpenStack
* Split out object/workers stats collection for Nova, Cinder and Neutron plugins
* Use the common interface exposed by collectd_base.Base

Change-Id: I59f698b8f09fd0d3ce375327d9e4d81d767d961c
2017-01-31 14:54:53 +01:00
Swann Croiset 88323c91d3 Purge metric_collector toml files from previous version
Change-Id: I17ee4dfa70c242daf16da91637f54ef1edbc4801
2017-01-25 14:49:07 +01:00
Swann Croiset 973c62b04b Fix Cinder local endpoint check
Since Mitaka, Cinder returns 300 instead of the 200 returned in previous releases.

Change-Id: Ia1e35da330754d5dc21573d3b469cb708ca28d28
2017-01-23 10:19:58 +01:00
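
The status-code change described in the Cinder fix above can be illustrated with a minimal sketch. This is not the plugin's actual code: the helper name `is_endpoint_healthy`, the `EXPECTED_STATUS` table and the default of accepting only 200 are assumptions made for illustration.

```python
# Hypothetical sketch, not the plugin's actual implementation.
# Assumption: a local endpoint counts as healthy when it answers with one
# of the HTTP status codes expected for that service.
EXPECTED_STATUS = {
    # 300 since Mitaka, 200 in earlier releases (per the commit message).
    "cinder": {200, 300},
}

# Assumed default for services without a specific entry.
DEFAULT_EXPECTED = {200}


def is_endpoint_healthy(service, status_code):
    """Return True if status_code is acceptable for the given service."""
    return status_code in EXPECTED_STATUS.get(service, DEFAULT_EXPECTED)
```

Accepting both codes keeps a single check usable across releases before and after Mitaka.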
Swann Croiset e0eb164cf3 Fix MySQL resource name in Pacemaker with MOS 8.0
Change-Id: I571d178d973a81d73be71eae87e7fac5db893119
2017-01-23 10:19:58 +01:00
Jenkins 851cec5847 Merge "Implicitly configure Nagios outputs if available" 2017-01-10 12:52:23 +00:00
Jenkins bd78f34d52 Merge "Remove the SMTP standalone alerting_mode" 2017-01-10 12:52:17 +00:00
Swann Croiset 3bd7d7d76a Implicitly configure Nagios outputs if available
Change-Id: Ibf5b4c2239004a4fa8e99bcaaf2949a5155543ed
2017-01-09 13:23:14 +01:00
Swann Croiset b7c7e7bdc2 Remove the SMTP standalone alerting_mode
This feature was broken and not stable enough for production deployment.

Related-bug: #1606831
Related-bug: #1643542

Change-Id: I0ce52ec01838d891c43d6e797617d3044a02d10f
2017-01-09 13:23:14 +01:00
Swann Croiset 5b65f279ce Disable Heka "self-monitoring"
Change-Id: If548c132d5847b8223284a2bb0ad288c695d9ec3
Related-bug: #1643280
2017-01-03 16:33:36 +00:00
Simon Pasquier 93129efdd0 Fix tests in the CI
puppet-lint installed from the master branch breaks the CI. This change
uses the official gem instead because the latest version now includes
the bug fix that wasn't released before.

Change-Id: I646176e30494cf1e8fac97c6ecebb3899ade8107
2016-10-07 10:34:02 +02:00
Jenkins 357fc8c521 Merge "Increase Heka poolsize for the metric_collector" 2016-10-06 14:05:31 +00:00
Guillaume Thouvenin 9dbf48dbfe Replace the workers AFD filter
This patch uses the generic AFD filter with new alarms to replace
the custom AFD for workers.

Blueprint: allow-all-alarms-to-be-specified-in-alarming-file
Change-Id: I6c432e60a16da5bb3c8d0ecd0bd22a1246fe6f82
2016-10-06 09:05:30 +02:00
Guillaume Thouvenin 215f693307 Replace the API backends AFD filter
This patch uses the generic AFD filter with new alarms to replace the
custom AFD for API backends.

Blueprint: allow-all-alarms-to-be-specified-in-alarming-file
Change-Id: Id139e45a9942a9c86a2d35d1966b083d9c75af89
2016-10-05 15:41:55 +00:00
Swann Croiset 022b8b4b00 Increase Heka poolsize for the metric_collector
On controller nodes the increasing number of the AFD filters puts too
much load on the Heka pipeline and can generate "idle packs" errors.
It was observed that a poolsize value of 200 solves the issue.

Change-Id: I1d5f9fea352e16e15b37828bc525906a06fadd0e
2016-10-03 07:48:45 +00:00
Simon Pasquier 276e331202 Fix deployment for detach plugins
The collector services are managed by Pacemaker for the controller,
detached RabbitMQ and detached MySQL nodes. This change ensures that for
all these roles, the OCF script is created before the collector services
are configured.

Change-Id: I555b13f0433cccaa1297cd286dbb41d88de1d369
Closes-Bug: #1627968
2016-09-27 17:52:14 +02:00
Guillaume Thouvenin 8bc835c74a Fix issue when installing the OCF script
This patch moves the installation of the OCF script to the beginning of
deploy_start to ensure that it is available when Pacemaker starts
the collector resources. Because this requires a configured Hiera, the
Hiera task is moved as well.

Change-Id: I90b4fa2a9038eaed0f1dcadb0f00713a1b2487b0
Closes-bug: #1575039
2016-09-23 06:28:57 +00:00
Swann Croiset 12cf9471dd Add alarms on Nova free VCPU and free memory
Change-Id: Id827630810b9a8fbf37be5bc833acf23e3b0ee7d
2016-09-22 14:47:08 +00:00
Jenkins 0461971b93 Merge "Support alarm evaluation with collected_on metric attribute" 2016-09-22 13:48:36 +00:00
Swann Croiset be4f8d36df Support alarm evaluation with collected_on metric attribute
Change-Id: I61529d6ca2d8a9a26e5fa70a776ad03c212c7982
2016-09-22 13:42:45 +00:00
Guillaume Thouvenin c5eebea265 Add local API check
This patch creates a new plugin, check_local_endpoint.py, that checks OpenStack
services locally and emits a new metric, openstack_check_local_api.

Change-Id: I58290dd685b97354137ad5c0b91aece79fd91695
2016-09-21 14:05:55 +02:00
Swann Croiset 70286decfc Fix the InfluxDB VIP check to map the GSE configuration
Change-Id: I9ffce693b7df05a00cfd21211745c3a02975dc76
2016-09-20 09:41:54 +02:00
Swann Croiset 015ad15ec4 Add alarm for Horizon HTTP 5xx errors
The patch also fixes the wrong GSE definition for horizon-(web|ui).

Change-Id: I4a7a64c87ac8c9fe3929ec98ebc8de51e9292a26
2016-09-20 09:41:54 +02:00
Swann Croiset 553d2040cc Send GSE service clusters status to alerting
Change-Id: Iad33e1f4bffd81066a82a0d73a46e7b489eb23d7
blueprint: alarming-refactoring
2016-09-20 09:41:54 +02:00
Swann Croiset 8b7398fa86 Removed old hiera data
This avoids polluting the Hiera namespaces after a rolling upgrade of the
plugin.

blueprint: alarming-refactoring

Change-Id: I28039f1688583af39d089d96a5ecd7683f55642d
2016-09-20 09:41:54 +02:00
Swann Croiset 692cb46fbe Do not send AFD to Nagios when activate_alerting=false
blueprint: alarming-refactoring

Change-Id: Ifb82ec16dcece731528c1ec7d84c96d83d452212
2016-09-20 09:41:54 +02:00
Swann Croiset 7deace8726 Alarm definition refactoring
DocImpact
blueprint: alarming-refactoring

Change-Id: I8c053f2fbc4b4b85958be8413919f9bf1b168027
2016-09-20 09:41:54 +02:00
Jenkins ea9338ab8a Merge "Add monitoring of HDD errors" 2016-09-07 13:51:36 +00:00
Ildar Svetlov 99e2863c14 Add monitoring of HDD errors
This change adds a filter plugin that monitors the kernel log messages
for hard drive errors and reports the number of errors per second
as 'hdd_errors_rate'. The filter is configured for all nodes,
irrespective of their roles. An alarm is also added that triggers
a CRITICAL alert when the metric value is greater than 0.

DocImpact

Change-Id: I485f5692a3e5facf0f7ea019ccdbd70683a7dd4e
2016-09-06 11:47:59 +03:00
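
The HDD monitoring described above (count error messages, report a per-second rate, alert when the rate is above 0) can be sketched roughly as follows. This is an illustration only: the regular expression, the function names and the "OKAY" status label are assumptions, and the actual filter in the repository may work differently.

```python
import re

# Assumed pattern for hard drive errors in kernel log messages; the real
# filter's matching rules are not given in the commit message.
HDD_ERROR_RE = re.compile(r"(I/O error|medium error)", re.IGNORECASE)


def hdd_errors_rate(kernel_log_lines, interval_seconds):
    """Return the number of HDD error messages per second over the interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    errors = sum(1 for line in kernel_log_lines if HDD_ERROR_RE.search(line))
    return errors / interval_seconds


def alarm_status(rate):
    """CRITICAL when the rate is greater than 0, as the commit describes.

    The "OKAY" label for the healthy case is an assumption.
    """
    return "CRITICAL" if rate > 0 else "OKAY"
```

For example, one matching line seen during a 10-second window gives a rate of 0.1, which is enough to raise the alert.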
Jenkins c02cb15a5b Merge "Increase the Elasticsearch bulk size when required" 2016-08-29 15:35:30 +00:00
Swann Croiset 83db24f549 Increase the Elasticsearch bulk size when required
In some environments (especially using slow HDD drives), the
Elasticsearch backends may fail to ingest logs fast enough. As a result
the log_collector service running on the controller nodes is blocked.

To alleviate this issue, this change increases the bulk size for nodes
that generate lots of logs:
- controllers which run OpenStack API services in addition to Pacemaker.
- all nodes when the environment's log level is set to debug.

In such cases, the flush_count parameter is increased to 100 (instead of
10 by default).

Change-Id: Ifdfbcb8ff0292f695dee4deab45560f126bde242
Closes-Bug: #1617211
2016-08-29 15:17:44 +00:00
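
The selection rule described in the Elasticsearch change above can be summarized with a small sketch. The function name and its boolean parameters are hypothetical; only the two values (10 and 100) and the two conditions come from the commit message, and the real change is presumably expressed in the plugin's Puppet manifests rather than in Python.

```python
DEFAULT_FLUSH_COUNT = 10     # default value, per the commit message
INCREASED_FLUSH_COUNT = 100  # value used for nodes that generate many logs


def select_flush_count(is_controller, debug_logging):
    """Pick the flush_count for a node (hypothetical helper).

    The bulk size is increased for controllers and for every node when the
    environment's log level is set to debug.
    """
    if is_controller or debug_logging:
        return INCREASED_FLUSH_COUNT
    return DEFAULT_FLUSH_COUNT
```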
Simon Pasquier 38ec02fe46 Add a dedicated manifest to configure collectd
This removes duplication of code and limitations we had to deal with
because the collectd Puppet resources don't play well when they are
created at different times from several manifests.

Change-Id: I52fabb1fb5795a33f552168553a148b1520fc496
2016-08-26 15:59:04 +02:00
Swann Croiset 7f1f3bd59f Configure AFD alarms against 'mysql_check' metric
Change-Id: Ib15fea4ab041243e44a61c9d54d1f154b02d34af
2016-08-26 15:23:07 +02:00
Jenkins d46ed8070c Merge "Check memcached service on controller nodes" 2016-08-26 13:21:44 +00:00
Jenkins 4ac62137cf Merge "Fix notification_driver for Cinder and Heat" 2016-08-26 13:18:23 +00:00
Swann Croiset 26c5788684 Check memcached service on controller nodes
The patch replaces the service_heartbeat mechanism.

Change-Id: I060e10320cf6f8b874a39037b1f9257ed1996342
2016-08-26 10:56:06 +02:00
Simon Pasquier 2a8f061fe6 Fix the Elasticsearch address in collectd
Change-Id: I534b0767eda30916c620922665b9f3dcf2678d14
Closes-Bug: #1614944
2016-08-25 09:34:10 +02:00
Simon Pasquier ea6c8d3ae5 Fix notification_driver for Cinder and Heat
Change-Id: Ic9fd9f7d71ba9dbd9f4979612aefb114176a96ad
Closes-Bug: #1616456
2016-08-24 15:50:24 +02:00
Simon Pasquier 381e2b2b3a Pin the puppetlabs_spec_helper version
The latest version of puppetlabs_spec_helper (1.2.0) depends on
rubocop-rspec which itself requires at least Ruby 2.2.

Change-Id: Ica4b71296912a66a98b223c002d1e8bdd04111d6
2016-08-24 11:05:05 +02:00
Jenkins 696911fb9a Merge "Revert "Fix Elasticsearch address in collectd"" 2016-08-24 07:33:24 +00:00
Simon Pasquier 333707b454 Revert "Fix Elasticsearch address in collectd"
This reverts commit cfadbcfe9d.

Change-Id: Iacbd07dfd39120195abad195a7c5f439b1a44024
2016-08-24 07:31:46 +00:00
Simon Pasquier 8d105b021b Fix typo in hiera_override.pp
Change-Id: Ie2fc9250998f99ee24bd37053670f6dd25015091
Closes-Bug: #1614945
2016-08-23 13:41:39 +02:00
Simon Pasquier e63f55e829 Fix InfluxDB address in collectd
Change-Id: I64c1e0c2167eee98bf62c12776b16ef1fca794ad
Closes-Bug: #1614945
2016-08-19 14:43:35 +02:00
Simon Pasquier cfadbcfe9d Fix Elasticsearch address in collectd
Change-Id: I5d458da51cb121f14db5e3d8283cef43727bd580
Closes-Bug: #1614944
2016-08-19 14:39:05 +02:00
Simon Pasquier 3a3ef6f2e3 Add Pacemaker collectd plugin
This change adds a collectd plugin that gets metrics from the Pacemaker
cluster:

  - cluster's metrics
  - node's metrics
  - resource's metrics

Most of the metrics are only collected from the node that is the
designated controller except pacemaker_resource_local_active and
pacemaker_dc_local_active.

The change also removes the 'pacemaker_resource' plugin, since the new plugin
provides the exact same metrics and notifications for the other collectd plugins.

Finally the plugin is also installed on the standalone-rabbitmq and
standalone-database nodes if they are present.

Change-Id: I8b5b987704f69c6a60b13e8ea982f27924f488d1
2016-08-11 14:53:43 +02:00
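
The collection rule described in the Pacemaker plugin commit above (most metrics come only from the designated controller, two are collected on every node) can be sketched like this. The helper name and the `node_is_dc` parameter are assumptions made for illustration; only the two metric names come from the commit message.

```python
# Metrics that the commit message says are collected on every node.
LOCAL_METRICS = {
    "pacemaker_resource_local_active",
    "pacemaker_dc_local_active",
}


def should_collect(metric_name, node_is_dc):
    """Decide whether this node reports the metric (hypothetical helper).

    All other metrics are reported only by the designated controller (DC).
    """
    return node_is_dc or metric_name in LOCAL_METRICS
```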
Simon Pasquier 79a906d619 Use Hiera data to configure the RabbitMQ plugin
This change uses the information that is already available in the
collector's Hiera data to decide whether the RabbitMQ collectd
plugin should be deployed or not.

Change-Id: Ib1df231d6bf99ee6f34ee199fd5241d6b264fc00
2016-08-09 16:17:17 +02:00
Swann Croiset 313fc00819 Check libvirt status on compute nodes
This patch adds a new collectd plugin to test the availability of libvirt
and configures AFD for all compute nodes.
These AFDs are part of the nova global cluster.

Change-Id: I0944f7da69caf32ed6ac9c908d4241bc8c396994
2016-08-05 10:48:30 +02:00