113 Commits

Author SHA1 Message Date
Zuul 9278718a2b Merge "Add service user password rotation feature" 2023-05-09 14:00:49 +00:00
Alex Kavanagh 42714adfde Add service user password rotation feature
This patch adds the service user rotation feature, which provides two
actions:

 - list-service-usernames
 - rotate-service-user-password

The first lists the usernames whose passwords can be rotated.  The
second rotates the password of a service user, and is tested via the
func-test-pr.

Change-Id: Ia94ab3d54cd8a59e9ba5005b88d3ec1ff87019b1
func-test-pr: https://github.com/openstack-charmers/zaza-openstack-tests/pull/1029
2023-05-05 10:17:01 +02:00
Olivier Dufour-Cuvillier c9efea67c8 Allow NRPE to collect stats in CIS hardened env
This removes the need to run the cron task as the root user and
ensures that the content created in /var/lib/rabbitmq belongs solely
to the rabbitmq user and group.

The nrpe user is then given access by adding it to the rabbitmq group.
This is also implemented in the upgrade-charm hook for existing
deployments.

Closes-Bug: #1879524
Change-Id: I19e3d675ace7c669451ca40a20d21cef1aec6a95
2023-04-13 00:36:40 +00:00
Zuul 415fcc4054 Merge "Enforce a maximum of 1024 async threads." 2023-04-12 17:36:45 +00:00
Vern Hart 3c1c05ee59 Enforce a maximum of 1024 async threads.
The beam.smp process won't start if more than 1024 threads are
configured, which the charm could do by default on large systems (e.g.
more than 42 CPUs). This change makes
RabbitMQEnvContext.calculate_threads() never return more than 1024
(MAX_NUM_THREADS).
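
A minimal sketch of the clamp described above. The per-CPU multiplier
of 24 is an assumption chosen to match the "more than 42 CPUs" remark
(42 * 24 = 1008, 43 * 24 = 1032); it is not taken from the charm. The
free function stands in for RabbitMQEnvContext.calculate_threads().

```python
import os

MAX_NUM_THREADS = 1024  # limit named in the commit message
THREADS_PER_CPU = 24    # assumed multiplier, for illustration only


def calculate_threads(num_cpus=None):
    """Return the async thread count, never exceeding MAX_NUM_THREADS."""
    if num_cpus is None:
        num_cpus = os.cpu_count() or 1
    return min(num_cpus * THREADS_PER_CPU, MAX_NUM_THREADS)
```

With these assumed values, 42 CPUs give 1008 threads and anything
larger is capped at 1024.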

Change-Id: I92879445210bac6ee7d96a704cdf428ca738e3b6
Closes-Bug: #1768986
2023-04-11 16:55:44 -04:00
Edward Hope-Morley 8c9f68e8aa Fix typo in configure ttl code
Also fixes tox.ini

Change-Id: Ic4c2d34ff248d5429eb604824e42dbaba6ca2678
Closes-Bug: #1939681
2023-04-11 12:14:53 +01:00
Alex Kavanagh 55b985f55c Fix focal to jammy series upgrade
This is a fix/workaround to the package upgrade bug that affects the
charm.  The post-inst package script updates the .erlang.cookie if it is
insecure during the upgrade of rabbit from 3.8 to 3.9.  This breaks the
series-upgrade resulting in a charm erroring on the post-series-upgrade
hook.

This fix works by checking if the .erlang.cookie has changed during the
post-series-upgrade hook and either updating the cookie in peer storage
(if it is insecure) or ensuring that the cookie from peer storage is
written to the .erlang.cookie if it isn't the leader. This ensures that
the cluster continues to work and that the series-upgrade can be
completed across the cluster.

Change-Id: I540ea8da85b3b4326ccb8194f1d8b1050b04eae9
Closes-Bug: #2006485
2023-02-22 14:56:27 +00:00
Alex Kavanagh 81f08ab769 Fix issue where charms aren't clustered but RMQ is
Due to the @cache decorator in the code, it was possible to get the
charm into a state where RMQ is clustered, but the charm doesn't record
it.  The charm 'thinks' it is clustered when it has set the 'clustered'
key on the 'cluster' relation.  Unfortunately, due to the @cached
decorator it's possible in the 'cluster-relation-changed' hook to have a
situation where the RMQ instance clusters during the hook execution and
then, later, when it's supposed to write the 'clustered' key, it reads
the previous cached value where it wasn't clustered and therefore
doesn't set the 'clustered' key.  This is just about the only
opportunity to do it, and so the charm ends up being locked.

The fix was to clear the @cache values so that the nodes would be
re-read, and this allows the charm to then write the 'clustered' key.
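
The stale read can be reproduced with a toy memoising decorator. This
is only a sketch: `cached`, `clear_cache`, `running_nodes` and
`cluster_changed_hook` are hypothetical stand-ins written for this
example, not the charm's or charmhelpers' code.

```python
import functools

_cache = {}


def cached(func):
    """Toy memoising decorator: keeps the first result per function/args."""
    @functools.wraps(func)
    def wrapper(*args):
        key = (func.__name__, args)
        if key not in _cache:
            _cache[key] = func(*args)
        return _cache[key]
    return wrapper


def clear_cache():
    """Drop all memoised values so the next call re-reads the state."""
    _cache.clear()


# Stand-in for the real broker state.
cluster_state = {"nodes": ["rabbit@unit-0"]}


@cached
def running_nodes():
    return list(cluster_state["nodes"])


def is_clustered():
    return len(running_nodes()) > 1


def cluster_changed_hook(clear_before_write):
    """Simulate the hook: read, join, then decide on the 'clustered' key."""
    is_clustered()                                  # early read fills the cache
    cluster_state["nodes"].append("rabbit@unit-1")  # the join happens mid-hook
    if clear_before_write:
        clear_cache()                               # the fix: force a re-read
    return is_clustered()
```

Without the clear, the final check still sees the single node cached at
the start of the hook; with it, the check sees both nodes.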

Change-Id: I12be41a83323d150ba1cbaeef64041f0bb5e32ce
Closes-Bug: #1975605
2023-01-06 20:39:50 +00:00
Felipe Reyes b35247364f Set cluster-partition-handling on upgrade-charm.
Units deployed before the cluster-partition-handling strategy was
implemented won't have that key set in the leader, making the charm
believe there are pending tasks. This change seeds the key, when it is
not set, with the value present in the charm's configuration.

Change-Id: Ifdae35ffee1ad7a8f4e5248c817cca14b69d9566
Closes-Bug: #1979092
2022-06-17 16:36:51 -04:00
Zuul eac35c1d99 Merge "Remove legacy checks" 2022-04-25 12:29:33 +00:00
Tianqi 9376aeb8e6 Handle non-uniform queue stats output
RabbitMQ server sometimes creates non-uniform output that nrpe
can't parse. Instead of breaking the check, this commit outputs
the error messages and continues the check.

This problem is most likely caused by queue state being
"down" [1]. However, because the current charm doesn't show such
information and the bug is hard to manually reproduce, this
commit adds the state attribute when creating queue_state file
for future debugging.

[1] https://www.rabbitmq.com/rabbitmqctl.8.html#state_2

Closes-Bug: #1850948
Change-Id: Iaa493c8270f344cde8ad7c89bd2bb548f0ad71bd
2022-04-13 21:53:33 +00:00
Billy Olsen 26b9434648 Remove legacy checks
Remove legacy checks from set_ha_mode in rabbit_utils.py, as they check
for versions of rabbitmq older than 3.0.0, which are not available in
the archives for any supported release.

Change-Id: Ib21f6ae3f30eabaaa8d677c20a555ded4e6851d6
2022-04-13 12:51:23 -07:00
Liam Young 12de0d964c Coordinate cluster join events
Use the coordination module when cluster join events are called.
The `cluster_wait` method has been removed as it is no longer used
and `cluster_with` has been broken up into three new methods (
`clustered_with_leader`, `update_peer_cluster_status` and
`join_leader`) which can be called separately. The `modulo-nodes`
and `known-wait` charm options have been removed as they are no
longer needed.

Closes-Bug: #1902793
Change-Id: I136f5dcc855da329071e119b67df25d9045e86cc
2022-02-18 15:18:06 +00:00
Liam Young 70cbe1eef9 Coordinate package upgrades across cluster
Use the coordination module to manage package upgrades across the
cluster. To achieve this, some of the setup was moved into a new
configure_rabbit_install method, which handles setup that is normally
run after an upgrade.

Change-Id: I8d244d96c83a5da164322faff873a72530ec9def
2022-02-17 11:06:04 +00:00
Liam Young 3d5e1e22d8 Coordination module for rabbit restarts
Use the coordination module to manage restarting the rabbitmq
services. This is to ensure that restarts are only
performed on one unit at a time. This helps prevent
situations which can cause the cluster to become split
brained (e.g. if two or more nodes are restarted at the same
time).

* Manually run the _run_atstart & _run_atexit methods when actions
  are run as this does not happen automatically and is needed by
  the coordination layer.
* Replace restart_on_change decorator with
  coordinated_restart_on_change. coordinated_restart_on_change
  includes logic for requesting restart locks from the coordination
  module.
* The coordination module works via the leader and cluster events so
  the hooks now include calls to check_coordinated_functions
  which will run any function that is waiting for a lock.
* Logic has been added to check for the situation where a hook is
  being run via the run_deferred_hooks actions. If this is the
  case then restarts are immediate as the action should only be run
  on one unit at a time.

Change-Id: Ia133c90a610793d4da96d3400a3906b801b52b73
2022-02-17 11:06:00 +00:00
Zuul f44cccc505 Merge "Check before applying plugin and perms changes" 2022-02-08 21:53:14 +00:00
Zuul d79095f6b5 Merge "Rabbitmq metrics and splitbrain detection" 2022-01-17 23:24:18 +00:00
Linda Guo 0653c186ce Rabbitmq metrics and splitbrain detection
Enabled rabbitmq_prometheus plugin for prometheus to scrape
the metrics of rabbitmq and alert if rabbitmq splitbrain is
detected.

Integrated rabbitmq dashboards in grafana via dashboards
relations

Added new unit test cases

Closes-Bug: 1899183
Change-Id: I88942dd0b246c498d0ab40b00d586d4349b0f100
2022-01-17 18:32:38 +11:00
Liam Young ccd11fdf9e Check before applying plugin and perms changes
Check that a setting update is needed before applying a config
update to the cluster. This is mainly applicable to
rabbitmq-server > 3.8.2 which supports json output. If a
parser is not available to extract the existing settings
then the old behaviour of blindly applying the change
is used.
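
The check-then-apply behaviour can be sketched as below. The helper
name `apply_if_changed` and its arguments are hypothetical; `current`
is None when no parser is available to read the existing settings.

```python
def apply_if_changed(current, desired, apply):
    """Apply `desired` only if it differs from `current`.

    `current` is None when the existing settings could not be parsed,
    in which case the change is applied unconditionally (old behaviour).
    Returns True if `apply` was called.
    """
    if current is not None and current == desired:
        return False
    apply(desired)
    return True
```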

Closes-Bug: #1909031
Change-Id: I9599f69cc11ea8d1a4e9d618aecdab4afe488d96
2022-01-07 13:57:43 +00:00
Zuul 2212383158 Merge "Switch to enabling the managment plugin by default" 2022-01-04 14:45:41 +00:00
Zuul bc0e50b673 Merge "Use cluster strategy 'ignore' for install" 2022-01-04 14:38:30 +00:00
Hervé Beraud ff45f3ae4b Use unittest.mock instead of mock
The mock third party library was needed for mock support in py2
runtimes. Since we now only support py36 and later, we can use the
standard lib unittest.mock module instead.

Note that https://github.com/openstack/charms.openstack is used during tests
and it needs `mock`. Unfortunately it doesn't declare `mock` in its
requirements, so it retrieves mock from other charm projects (cross dependency).
So we depend on charms.openstack first, and once
Ib1ed5b598a52375e29e247db9ab4786df5b6d142 is merged CI
will pass without errors.

Depends-On: Ib1ed5b598a52375e29e247db9ab4786df5b6d142
Change-Id: I98f432a771b5f6c966328d30629410a0a180dbee
2021-12-15 11:54:45 +00:00
Zuul 485b0d3dcd Merge "Modify the output to action "cluster-status" to make it user friendly (rabbitmq)" 2021-12-13 12:52:56 +00:00
Anna Savchenko 223ec26617 Modify the output to action "cluster-status"
to make it user friendly (rabbitmq)

"rabbitmqctl cluster_status" uses escape codes to color/highlight the
output, and it does not have a way to suppress this. This makes the
output of the command "juju run-action rabbitmq-server/leader
cluster-status" not user friendly and difficult to read.

Add the json formatting option to the rabbitmqctl command and use
the json.dumps method to get a user friendly output.
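
A minimal sketch of the formatting step, assuming the raw text comes
from something like `rabbitmqctl cluster_status --formatter json`. The
function name `format_cluster_status` is hypothetical.

```python
import json


def format_cluster_status(raw_json):
    """Re-serialise rabbitmqctl's JSON output in an indented, readable form."""
    status = json.loads(raw_json)
    return json.dumps(status, indent=4, sort_keys=True)
```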

Add unit test.

Closes-Bug: #1943198
Change-Id: I24380e24ff1edbede9c2db1671a4fc05d5a7cc63
2021-12-10 21:22:30 +02:00
Liam Young df711c6717 Switch to enabling the managment plugin by default
Over time the management plugin has become a core part of managing
a rabbit deployment. This includes allowing tools such as nrpe to
be able to query the api and alert for situations such as orphaned
queues.

Change-Id: Icbf760610ce83b9d95f48e99f6607ddf23963c97
Partial-Bug: 1930547
2021-11-29 11:06:18 +00:00
Liam Young 32bce11f0f Use json module to dump json in set_ha_mode
Use json module to dump json in set_ha_mode rather than trying
to generate json using string interpolation. This fixes a bug
when using the 'nodes' mode which was generating invalid json.
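
A sketch of building the policy definition with the json module
instead of string interpolation. The function name
`ha_policy_definition` is hypothetical and the real set_ha_mode may set
further keys; only `ha-mode` and `ha-params` are shown.

```python
import json


def ha_policy_definition(mode, params=None):
    """Return the JSON policy definition for an HA mode.

    mode is 'all', 'exactly' (params is a count) or 'nodes' (params is
    a list of node names).
    """
    definition = {"ha-mode": mode}
    if mode == "exactly":
        definition["ha-params"] = int(params)
    elif mode == "nodes":
        definition["ha-params"] = list(params)
    return json.dumps(definition)
```

Letting json.dumps quote the node names avoids the hand-built list
syntax that is easy to get wrong in 'nodes' mode.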

The new function test is taken from Id7ef45b7001d26ede3fd61f97626b5e9e8b81196

Change-Id: Ieb49036389221f6fbf2db93fbe4aebe6e986ea21
Co-Authored-By: Trent Lloyd <trent.lloyd@canonical.com>
2021-11-24 13:56:50 +00:00
Liam Young ab813a982d Use cluster strategy 'ignore' for install
Use cluster-partition-handling strategy 'ignore' during charm
installation regardless of the charm config setting. Once the
leader has checked it is clustered with peers then it sets the
cluster-partition-handling strategy to be whatever the user set
in charm config.

Partial-Bug: 1802315
Change-Id: Ic03bbe55ea8aab8b285977a5c0f9410b5bbf35c8
2021-11-24 13:17:03 +00:00
Zuul c72d401192 Merge "Restrict TLS versions" 2021-11-23 14:43:29 +00:00
James Page ece87ba8ca Restrict TLS versions
TLS < 1.2 is considered insecure; where possible limit the versions
of TLS to 1.2 or higher, enabling support for TLS 1.3 when the
required erlang and rabbitmq versions are installed.

Change-Id: Iec5ab60488986f8e332ff0e9a11895822a61c1ee
Closes-Bug: 1892450
Func-Test-PR: https://github.com/openstack-charmers/zaza-openstack-tests/pull/668
2021-11-23 14:20:10 +00:00
Zuul c0ea9ed191 Merge "Switch to new configuration file format" 2021-11-23 14:13:39 +00:00
Liam Young dbab66b0c5 Refactor methods which query rabbit
Refactor methods which query rabbit to remove the duplication
around checking if json output is supported.

Change-Id: Id4e3dbd85748e41bb4b1c8db282495cfffaa823d
2021-11-22 13:05:43 +00:00
James Page 9ed0e2d85c Switch to new configuration file format
For newer RabbitMQ versions, switch to using the new ini style
configuration file format (rabbitmq.conf vs rabbitmq.config).

This allows the charm to configure a wider set of options and
is needed to support limiting the TLS versions used for on-the-wire
encryption.

Upgrades at RabbitMQ 3.7.0 should switch from old to new format
and file name.

Change-Id: I6deda5ecf5990d527e22373540074d2a4b7bad38
Func-Test-PR: https://github.com/openstack-charmers/zaza-openstack-tests/pull/668
2021-11-16 09:35:31 +00:00
Julien Thieffry 242167b6ba Display busiest queues in check_queues NRPE plugin
When invoking the check_rabbitmq_queues script with wildcards for vhost
and/or queue parameters, script output does not reflect precisely which
queues have a high number of outstanding messages, as information is
consolidated under the wildcard.

This change fixes this behaviour by adding a new charm configuration
parameter which allows the user to specify the number of busiest queues,
n, to display should the check_rabbitmq_queues script report any
warnings or errors.  The default, n=0, keeps the current script output.
This option is applicable regardless of the vhost:queue combination but
is specifically relevant when wildcards are passed as arguments.

Implementation displays the first n items in the stats list re-organized
in decreasing message count order.
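
The selection described above can be sketched as follows. The tuple
layout (vhost, queue, message_count) and the name `busiest_queues` are
assumptions made for this example.

```python
def busiest_queues(stats, n):
    """Return the n entries with the highest message count.

    stats is a list of (vhost, queue, message_count) tuples. With n=0
    (the default in the charm config) nothing extra is reported.
    """
    if n <= 0:
        return []
    ordered = sorted(stats, key=lambda entry: entry[2], reverse=True)
    return ordered[:n]
```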

Closes-Bug: #1939084
Change-Id: I5a32cb6bf37bd2a0f30861eace3c0e6cb5c2559d
2021-08-23 06:21:58 +00:00
Billy Olsen fd8d018bab Move cron max file age calculation to rabbit_utils
The check_rabbitmq_queues nrpe check accesses the cron file created
for running collect stats job. This is done in order to determine if
the stats are too old and an alert should be raised. The nagios user
does not have access to read the cron job when running in a hardened
environment where /etc/cron.d is not readable.

This change refactors this logic to move the calculation of maximum
age for a stats file from the check_rabbitmq_queues script and into
the rabbit_utils code where it is generating the nrpe configuration.
A new (optional) parameter is added to the check_rabbitmq_queues
script to accept the maximum age, in seconds, allowed since the stats
file was last modified.

This change also removes the trusty support in hooks/install and
hooks/upgrade-charm as the rabbit_utils.py file needs to import a
dependency which is installed by the scripts. It is cleaned up to make
sure the croniter package is always installed on install or upgrade.

Change-Id: If948fc921ee0b63682946c7cc879ac50e971e588
Closes-Bug: #1940495
Co-authored-by: Aurelien Lourot <aurelien.lourot@canonical.com>
2021-08-19 15:12:17 +02:00
Zuul 8948fe7a49 Merge "Add config parameters to tune mnesia settings" 2021-08-19 09:14:09 +00:00
Billy Olsen 45ded8b0f9 Improve parsing of cron schedule
Improve the parsing of the cron schedule for /etc/cron/rabbitmq-stats.
The code makes assumptions that the user in the cron entry will be the
root user, which is generally safe as that's what the charm applied.
However, the parsing is brittle in that it depends on the 'root' string
in the entry. This changes the code so that the cron timer spec is
stripped out based on the column entries in the file.
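
A sketch of column-based parsing, assuming a system cron entry of the
form "<five time fields> <user> <command>". The function name
`cron_schedule` and the command path in the usage note are hypothetical,
and "@hourly"-style shortcuts are not handled here.

```python
def cron_schedule(line):
    """Return the five-field timer spec from a system cron entry.

    The user column is skipped by position, so the result does not
    depend on the job running as 'root'.
    """
    fields = line.split()
    if len(fields) < 7:
        raise ValueError("expected 5 time fields, a user and a command")
    return " ".join(fields[:5])
```

For example, cron_schedule("*/5 * * * * rabbitmq /path/to/collect.sh")
yields "*/5 * * * *" regardless of the user named in the sixth column.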

Change-Id: I2d573e8942e840e0e5376f1537a2a3373fea3db8
Fixes-Bug: #1939702
2021-08-17 11:27:04 -07:00
Nicolas Bock 8015d9a365 Add config parameters to tune mnesia settings
When a RabbitMQ cluster is restarted, the mnesia settings determine
how long and how often each broker will try to connect to the cluster
before giving up. It might be useful for an operator to be able to
tune these parameters. This change adds two settings,
`mnesia-table-loading-retry-timeout` and
`mnesia-table-loading-retry-limit`, which set these parameters in the
rabbitmq.config file [1].

[1] https://www.rabbitmq.com/configure.html#config-items

Change-Id: I96aa8c4061aed47eb2e844d1bec44fafd379ac25
Partial-Bug: #1828988
Related-Bug: #1874075
Co-authored-by: Nicolas Bock <nicolas.bock@canonical.com>
Co-authored-by: Aurelien Lourot <aurelien.lourot@canonical.com>
2021-08-16 15:43:19 +02:00
Aurelien Lourot 54460a568a Fix 'rabbitmqctl wait' timeout
`rabbitmqctl wait`'s default behavior changed recently
and a short timeout was introduced upstream. This
patch adapts our code in order to stay on the old,
intended behavior.

Change-Id: I020e3e9e4976e21da08316ac58642b2058564b02
2021-08-16 13:50:29 +02:00
Zhang Hua 707fa0e093 Number of heat queues will keep growing forever after heat-engine restarts
Set TTL as a solution for topic queue engine_worker and heat-engine-listener
to avoid them growing all the time after heat-engine restarts.

This is rabbitmq-server part. eg: we can set heat ttl by:

juju config heat ttl=3600000

Closes-Bug: 1925436
Change-Id: I7b826fe965a200da29020a8f2c6148f76d10a2b0
2021-06-23 17:33:13 +08:00
Liam Young fbf3bda59a If rabbit cluster is partioned show that in status
If rabbit cluster is partitioned show that in status. This check
only works on focal+, prior to that the check is ignored.

Change-Id: Id45c969d37f8cb1c26d0f9834f4a79e7555dd03c
Closes-Bug: 1930417
2021-06-21 08:47:08 +00:00
Liam Young 81c33953f9 Implementation of deferred restarts
Add deferred event actions and config.

Change-Id: Ifbb15c0c04117a5a98672b2af4fd7203dae9a18e
2021-04-09 21:11:30 +00:00
Zuul de11e7f0dd Merge "Fix: do not use charmhelpers in non-charm context" 2021-01-15 15:46:40 +00:00
Martin Kalcok 7acad5fdaa NRPE: Allow excluding queues from queue-size checks
Option '-e <vhost>  <queue>' was added to the 'check_rabbitmq_queues.py'
nrpe script to allow excluding selected queues when checking queue
sizes. Corresponding option 'exclude_queues' was added to the
charm config.
By default, the following queues are excluded:
 * event.sample
 * notifications_designate.info
 * notifications_designate.error
 * versioned_notifications.info
 * versioned_notifications.error
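
A sketch of how such exclusions could be applied to the collected
stats. Treating '*' as a wildcard in either position of a
(vhost, queue) pair is an assumption of this example, as are the
function names.

```python
def is_excluded(vhost, queue, excluded):
    """Return True if (vhost, queue) matches any exclusion pair.

    A '*' in either position of a pair matches anything (assumed here).
    """
    for ex_vhost, ex_queue in excluded:
        if ex_vhost in ("*", vhost) and ex_queue in ("*", queue):
            return True
    return False


def filter_stats(stats, excluded):
    """Drop (vhost, queue, size) entries whose queue is excluded."""
    return [s for s in stats if not is_excluded(s[0], s[1], excluded)]
```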

Closes-Bug: #1811433
Change-Id: I57e297bb4323a3ab98da020bfcb1630889aac6d7
2021-01-14 11:35:31 +01:00
Peter Sabaini ab79c3ee6c Fix: do not use charmhelpers in non-charm context
In change I60141397f39e3b1b0274230db8d984934c98a08d charmhelper
library is being used in the rabbitmq queue nrpe check. This is
problematic as the check does not actually run in a charm context and
therefore does not have access to the charm environment such as the
current config. Additionally an issue in collating check results had
been introduced.

This change aims to fix these issues. Instead of using the charmhelper
library, the cronspec is read out from the cron job definition
itself, and the series is probed from /etc/lsb-release

Change-Id: I952aeda31e997ccadb6cff62e3b0d46349650979
2020-12-23 11:33:46 +01:00
Felipe Reyes 07ec03b5d7 Add queue-master-locator config option
queue-master-locator is a configuration option supported by
rabbitmq-server since 3.6. It allows control over where the
queue master will be created.

Change-Id: I38cc019b73d062572e19bd532b6bccdaf88638ba
Func-Test-PR: https://github.com/openstack-charmers/zaza-openstack-tests/pull/382
Closes-Bug: #1890759
Signed-off-by: Nicolas Bock <nicolas.bock@canonical.com>
2020-12-13 15:43:31 -03:00
Zuul 39780d9e09 Merge "Update NRPE logic to add/remove checks and files" 2020-11-27 15:10:26 +00:00
Robert Gildein 60f2f486d0 Update NRPE logic to add/remove checks and files
The function `update_nrpe_checks` has been changed to remove redundant
checks and scripts based on rabbitmq configuration, but the main logic was
unchanged.

The function logic is based on these three steps:
1) copy all the custom NRPE scripts and create cron file
2) add NRPE checks and remove redundant ones
2.a) update the NRPE vhost check for TLS and non-TLS
2.b) update the NRPE queues check
2.c) update the NRPE cluster check
3) remove redundant scripts - this must be done after removing
                              the relevant check

Closes-Bug: #1779171
Change-Id: Ice83133c2c73532720f33298713267f69e8b4c3a
2020-11-25 11:57:19 +01:00
Zuul b80f032d1a Merge "Display queue sizes along with queues" 2020-11-10 20:40:05 +00:00
Peter Sabaini b3710a0085 Display queue sizes along with queues
When checking queues, display not only queue names but also their
size (number of messages). Return sizes as integers.

Also update parsing to account for a rabbitmqctl output change in
focal.

Closes-Bug: #1838964

Change-Id: I2014f065393a1ad4b594363ade6c01ccec4fb71a
2020-11-06 18:22:56 +01:00
Peter Sabaini 943f4f63ab Fix: nrpe queue check should check for freshness
Make the rabbitmq queue check also check if its input data file was
recently updated. This input data is created via cronjob; if that gets
stuck we might not actually be getting meaningful data.
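
A minimal sketch of such a freshness test, based on the file's
modification time. The name `stats_are_fresh` and the `now` parameter
(there to make the check testable) are hypothetical.

```python
import os
import time


def stats_are_fresh(path, max_age_seconds, now=None):
    """Return True if `path` was modified within the last max_age_seconds."""
    if now is None:
        now = time.time()
    age = now - os.path.getmtime(path)
    return age <= max_age_seconds
```

A check built on this would raise an alert when the function returns
False, since the queue sizes in the file may then be out of date.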

The charm supports configuring the check interval via a full cron time
specification, so technically one could have that updated only once a
year even if this doesn't make much sense in a monitoring scenario.

Also fix a buglet in the nrpe update hook function: only deploy a
queue check if the cron job hasn't been deconfigured by setting it to
the empty string

Change-Id: I60141397f39e3b1b0274230db8d984934c98a08d
Closes-Bug: #1898523
2020-11-06 09:36:49 +01:00