This patch adds the service user rotation feature, which provides two
actions:
- list-service-usernames
- rotate-service-user-password
The first lists the service usernames whose passwords can be rotated.
The second rotates the given service user's password, and is tested via
the func-test-pr.
Change-Id: Ia94ab3d54cd8a59e9ba5005b88d3ec1ff87019b1
func-test-pr: https://github.com/openstack-charmers/zaza-openstack-tests/pull/1029
It removes the necessity to run the cron task as the root user
and ensures that content created in /var/lib/rabbitmq belongs
solely to the rabbitmq user and group.
Access for the nrpe user is then granted by adding it to the
rabbitmq group.
This is also implemented in the upgrade-charm hook for existing
deployments.
Closes-Bug: #1879524
Change-Id: I19e3d675ace7c669451ca40a20d21cef1aec6a95
The beam.smp process won't start if more than 1024 threads are
configured, and the charm could hit this by default on large systems
(e.g. more than 42 CPUs). This change makes
RabbitMQEnvContext.calculate_threads() never return more than 1024
(MAX_NUM_THREADS).
Change-Id: I92879445210bac6ee7d96a704cdf428ca738e3b6
Closes-Bug: #1768986
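A minimal sketch of the capping behaviour described above. The multiplier and function shape are assumptions for illustration; the real charm derives the value from its context class.

```python
import multiprocessing

# Upper bound from the commit: beam.smp fails to start with > 1024 threads.
MAX_NUM_THREADS = 1024

def calculate_threads(multiplier=24):
    """Derive a thread count from CPU cores, never exceeding 1024.

    The multiplier is illustrative; the point is the min() cap.
    """
    return min(multiprocessing.cpu_count() * multiplier, MAX_NUM_THREADS)
```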
This is a fix/workaround to the package upgrade bug that affects the
charm. The post-inst package script updates the .erlang.cookie if it is
insecure during the upgrade of rabbit from 3.8 to 3.9. This breaks the
series-upgrade, resulting in the charm erroring in the
post-series-upgrade hook.
This fix works by checking if the .erlang.cookie has changed during the
post-series-upgrade hook and either updating the cookie in peer storage
(if it is insecure) or ensuring that the cookie from peer storage is
written to the .erlang.cookie if it isn't the leader. This ensures that
the cluster continues to work and that the series-upgrade can be
completed across the cluster.
Change-Id: I540ea8da85b3b4326ccb8194f1d8b1050b04eae9
Closes-Bug: #2006485
Due to the @cache decorator in the code, it was possible to get the
charm into a state where RMQ is clustered, but the charm doesn't record
it. The charm 'thinks' it is clustered when it has set the 'clustered'
key on the 'cluster' relation. Unfortunately, due to the @cached
decorator, it's possible in the 'cluster-relation-changed' hook for the
RMQ instance to cluster during the hook execution and then, later, when
the charm is supposed to write the 'clustered' key, to read the
previous cached value from when it wasn't clustered and therefore not
set the 'clustered' key. This is just about the only opportunity to set
it, and so the charm ends up being locked.
The fix was to clear the @cache values so that the nodes would be
re-read, and this allows the charm to then write the 'clustered' key.
Change-Id: I12be41a83323d150ba1cbaeef64041f0bb5e32ce
Closes-Bug: #1975605
Units deployed before the implementation of the
cluster-partition-handling strategy won't have that key set in the
leader, making the charm believe there are pending tasks. This change
seeds the key, when it is not set, with the value present in the
charm's configuration.
Change-Id: Ifdae35ffee1ad7a8f4e5248c817cca14b69d9566
Closes-Bug: #1979092
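A minimal sketch of the seeding pattern. leader_get/leader_set are stubbed with a dict here; the real charm uses Juju leader-storage helpers, and the function name is an assumption for illustration.

```python
# Stub for Juju leader storage (illustrative only).
_leader_storage = {}

def leader_get(key):
    return _leader_storage.get(key)

def leader_set(**kwargs):
    _leader_storage.update(kwargs)

def seed_partition_handling(config_value):
    """Seed the leader key from charm config only if it was never set."""
    if leader_get("cluster-partition-handling") is None:
        leader_set(**{"cluster-partition-handling": config_value})

seed_partition_handling("ignore")
assert leader_get("cluster-partition-handling") == "ignore"
seed_partition_handling("autoheal")  # already seeded: not overwritten
assert leader_get("cluster-partition-handling") == "ignore"
```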
RabbitMQ server sometimes creates non-uniform output that nrpe
can't parse. Instead of breaking the check, this commit outputs
the error messages and continues the check.
This problem is most likely caused by queue state being
"down" [1]. However, because the current charm doesn't show such
information and the bug is hard to manually reproduce, this
commit adds the state attribute when creating queue_state file
for future debugging.
[1] https://www.rabbitmq.com/rabbitmqctl.8.html#state_2
Closes-Bug: #1850948
Change-Id: Iaa493c8270f344cde8ad7c89bd2bb548f0ad71bd
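A sketch of the tolerant-parsing approach described above, assuming tab-separated "vhost name messages state" lines; the column layout and function name are illustrative, not the charm's actual code.

```python
def parse_queue_stats(raw):
    """Parse 'vhost<TAB>name<TAB>messages<TAB>state' lines.

    Malformed lines are collected as error messages (reported, not
    raised) so the check can continue, mirroring the commit's approach.
    """
    rows, errors = [], []
    for line in raw.splitlines():
        parts = line.split("\t")
        if len(parts) != 4:
            errors.append("malformed line: %r" % line)
            continue
        vhost, name, messages, state = parts
        try:
            rows.append((vhost, name, int(messages), state))
        except ValueError:
            errors.append("bad message count: %r" % line)
    return rows, errors

rows, errors = parse_queue_stats("/\tmyqueue\t3\trunning\ngarbage line")
assert rows == [("/", "myqueue", 3, "running")]
assert len(errors) == 1
```

Recording the state column ("running", "down", ...) in the queue_state file is what gives future debugging a foothold when the output is non-uniform.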
Remove legacy checks from set_ha_mode in rabbit_utils.py, as they check
for versions of rabbitmq older than 3.0.0, which is not available in
the archives for any supported release.
Change-Id: Ib21f6ae3f30eabaaa8d677c20a555ded4e6851d6
Use the coordination module when cluster join events are called.
The `cluster_wait` method has been removed as it is no longer used
and `cluster_with` has been broken up into three new methods (
`clustered_with_leader`, `update_peer_cluster_status` and
`join_leader`) which can be called separately. The `modulo-nodes`
and `known-wait` charm options have been removed as they are no
longer needed.
Closes-Bug: #1902793
Change-Id: I136f5dcc855da329071e119b67df25d9045e86cc
Use the coordination module to manage package upgrades across the
cluster. To achieve this, some of the setup was moved into a new
configure_rabbit_install method which handles setup that is normally
run after an upgrade.
Change-Id: I8d244d96c83a5da164322faff873a72530ec9def
Use the coordination module to manage restarting the rabbitmq
services. This is to ensure that restarts are only
performed on one unit at a time, which helps prevent
situations that can cause the cluster to become
split-brained (e.g. if two or more nodes are restarted at
the same time).
* Manually run the _run_atstart & _run_atexit methods when actions
are run, as this does not happen automatically and is needed by
the coordination layer.
* Replace restart_on_change decorator with
coordinated_restart_on_change. coordinated_restart_on_change
includes logic for requesting restart locks from the coordination
module.
* The coordination module works via the leader and cluster events so
the hooks now include calls to check_coordinated_functions
which will run any function that is waiting for a lock.
* Logic has been added to check for the situation where a hook is
being run via the run_deferred_hooks actions. If this is the
case then restarts are immediate as the action should only be run
on one unit at a time.
Change-Id: Ia133c90a610793d4da96d3400a3906b801b52b73
Enabled the rabbitmq_prometheus plugin so prometheus can scrape
rabbitmq metrics and alert if a rabbitmq split-brain is
detected.
Integrated rabbitmq dashboards in grafana via the dashboards
relation.
Added new unit test cases.
Closes-Bug: 1899183
Change-Id: I88942dd0b246c498d0ab40b00d586d4349b0f100
Check that setting update is needed before applying a config
update to the cluster. This is mainly applicable to
rabbitmq-server > 3.8.2 which supports json output. If a
parser is not available to extract the existing settings
then the old behaviour of blindly applying the change
is used.
Closes-Bug: #1909031
Change-Id: I9599f69cc11ea8d1a4e9d618aecdab4afe488d96
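A sketch of the "check before applying" guard, assuming the current settings arrive as JSON (as with rabbitmq-server > 3.8.2); the function name and fallback behaviour are illustrative.

```python
import json

def needs_update(current_json, desired):
    """Return True when desired settings differ from what is already set.

    current_json mimics `rabbitmqctl ... --formatter json` output; when
    it cannot be parsed (older releases, no JSON support) we fall back
    to True, i.e. the old behaviour of blindly applying the change.
    """
    try:
        current = json.loads(current_json)
    except (ValueError, TypeError):
        return True
    return current != desired

desired = {"ha-mode": "all"}
assert needs_update('{"ha-mode": "all"}', desired) is False
assert needs_update('{"ha-mode": "exactly"}', desired) is True
assert needs_update("not json", desired) is True
```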
The mock third party library was needed for mock support in py2
runtimes. Since we now only support py36 and later, we can use the
standard lib unittest.mock module instead.
Note that https://github.com/openstack/charms.openstack is used during tests
and it needs `mock`; unfortunately it doesn't declare `mock` in its
requirements, so it retrieves mock from another charm project (a cross
dependency). So we depend on charms.openstack first, and when
Ib1ed5b598a52375e29e247db9ab4786df5b6d142 is merged, CI will pass
without errors.
Depends-On: Ib1ed5b598a52375e29e247db9ab4786df5b6d142
Change-Id: I98f432a771b5f6c966328d30629410a0a180dbee
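The switch described above is mechanical; a minimal example of the standard-library module replacing the third-party package:

```python
# unittest.mock (py3.3+) provides the same API as the third-party
# `mock` package, so `import mock` becomes `from unittest import mock`.
from unittest import mock
import os

with mock.patch("os.path.exists", return_value=True) as patched:
    assert os.path.exists("/no/such/path") is True
patched.assert_called_once_with("/no/such/path")
```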
Make the cluster-status action output user friendly (rabbitmq).
"rabbitmqctl cluster_status" uses escape codes to color/highlight the
output, and it does not have a way to suppress this. This makes the
output of the command "juju run-action rabbitmq-server/leader
cluster-status" not user friendly and difficult to read.
Add the json formatting option to the rabbitmqctl command and use
the json.dumps method to get a user friendly output.
Add unit test.
Closes-Bug: #1943198
Change-Id: I24380e24ff1edbede9c2db1671a4fc05d5a7cc63
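A sketch of the formatting step: parse the JSON that rabbitmqctl emits and re-dump it for readability. The sample payload is illustrative, not real rabbitmqctl output, and the function name is an assumption.

```python
import json

def format_cluster_status(raw_json):
    """Pretty-print `rabbitmqctl cluster_status --formatter json` output."""
    return json.dumps(json.loads(raw_json), indent=4, sort_keys=True)

# Illustrative sample payload; the real output carries many more keys.
sample = '{"running_nodes": ["rabbit@host-0", "rabbit@host-1"]}'
print(format_cluster_status(sample))
```

Because json.dumps emits plain text, the escape-code colouring problem disappears entirely rather than needing to be suppressed.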
Over time the management plugin has become a core part of managing
a rabbit deployment. This includes allowing tools such as nrpe to
be able to query the api and alert for situations such as orphaned
queues.
Change-Id: Icbf760610ce83b9d95f48e99f6607ddf23963c97
Partial-Bug: 1930547
Use json module to dump json in set_ha_mode rather than trying
to generate json using string interpolation. This fixes a bug
when using the 'nodes' mode which was generating invalid json.
The new function test is taken from Id7ef45b7001d26ede3fd61f97626b5e9e8b81196
Change-Id: Ieb49036389221f6fbf2db93fbe4aebe6e986ea21
Co-Authored-By: Trent Lloyd <trent.lloyd@canonical.com>
Use cluster-partition-handling strategy 'ignore' during charm
installation regardless of the charm config setting. Once the
leader has checked it is clustered with peers then it sets the
cluster-partition-handling strategy to be whatever the user set
in charm config.
Partial-Bug: 1802315
Change-Id: Ic03bbe55ea8aab8b285977a5c0f9410b5bbf35c8
TLS < 1.2 is considered insecure; where possible limit the versions
of TLS to 1.2 or higher, enabling support for TLS 1.3 when the
required erlang and rabbitmq versions are installed.
Change-Id: Iec5ab60488986f8e332ff0e9a11895822a61c1ee
Closes-Bug: 1892450
Func-Test-PR: https://github.com/openstack-charmers/zaza-openstack-tests/pull/668
Refactor methods which query rabbit to remove the duplication
around checking if json output is supported.
Change-Id: Id4e3dbd85748e41bb4b1c8db282495cfffaa823d
For newer RabbitMQ versions, switch to using the new ini style
configuration file format (rabbitmq.conf vs rabbitmq.config).
This allows the charm to configure a wider set of options and
is needed to support limiting the TLS versions used for
on-the-wire encryption.
Upgrades at RabbitMQ 3.7.0 should switch from old to new format
and file name.
Change-Id: I6deda5ecf5990d527e22373540074d2a4b7bad38
Func-Test-PR: https://github.com/openstack-charmers/zaza-openstack-tests/pull/668
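For illustration, limiting the on-the-wire TLS versions in the new ini-style rabbitmq.conf might look like the fragment below (key names are from the upstream configuration docs; the listener port and chosen versions are illustrative):

```ini
# new-style rabbitmq.conf (3.7.0+), sysctl/ini format
listeners.ssl.default = 5671
ssl_options.versions.1 = tlsv1.3
ssl_options.versions.2 = tlsv1.2
```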
When invoking the check_rabbitmq_queues script with wildcards for vhost
and/or queue parameters, the script output does not reflect precisely
which queues have a high number of outstanding messages, as the
information is consolidated under the wildcard.
This change fixes this behaviour by adding a new charm configuration
parameter which allows the user to specify the number of busiest queues,
n, to display should the check_rabbitmq_queues script report any
warnings or errors. The default, n=0, keeps the current script output.
This option is applicable regardless of the vhost:queue combination but
is specifically relevant when wildcards are passed as arguments.
Implementation displays the first n items in the stats list re-organized
in decreasing message count order.
Closes-Bug: #1939084
Change-Id: I5a32cb6bf37bd2a0f30861eace3c0e6cb5c2559d
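The re-ordering described above can be sketched as follows; the stats tuple shape and function name are assumptions for illustration.

```python
def busiest_queues(stats, n):
    """Return the n queues with the highest message counts (n=0: none).

    stats is assumed to be a list of (vhost, queue, message_count)
    tuples; sorting is by decreasing message count, per the commit.
    """
    if n <= 0:
        return []
    return sorted(stats, key=lambda s: s[2], reverse=True)[:n]

stats = [("/", "a", 5), ("/", "b", 50), ("/", "c", 20)]
assert busiest_queues(stats, 2) == [("/", "b", 50), ("/", "c", 20)]
assert busiest_queues(stats, 0) == []
```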
The check_rabbitmq_queues nrpe check accesses the cron file created
for running collect stats job. This is done in order to determine if
the stats are too old and an alert should be raised. The nagios user
does not have access to read the cron job when running in a hardened
environment where /etc/cron.d is not readable.
This change refactors this logic to move the calculation of maximum
age for a stats file from the check_rabbitmq_queues script and into
the rabbit_utils code where it is generating the nrpe configuration.
A new (optional) parameter is added to the check_rabbitmq_queues
script to accept the maximum age in seconds since the stats file
was last modified.
This change also removes the trusty support in hooks/install and
hooks/upgrade-charm as the rabbit_utils.py file needs to import a
dependency which is installed by the scripts. It is cleaned up to make
sure the croniter package is always installed on install or upgrade.
Change-Id: If948fc921ee0b63682946c7cc879ac50e971e588
Closes-Bug: #1940495
Co-authored-by: Aurelien Lourot <aurelien.lourot@canonical.com>
Improve the parsing of the cron schedule for /etc/cron.d/rabbitmq-stats.
The code makes assumptions that the user in the cron entry will be the
root user, which is generally safe as that's what the charm applied.
However, the parsing is brittle in that it depends on the 'root' string
in the entry. This changes the code so that the cron timer spec is
stripped out based on the column entries in the file.
Change-Id: I2d573e8942e840e0e5376f1537a2a3373fea3db8
Fixes-Bug: #1939702
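A sketch of the column-based extraction: /etc/cron.d entries have five timer fields, then the user, then the command, so splitting on whitespace avoids depending on the literal 'root' string. The function name is illustrative.

```python
def cron_timer_spec(entry):
    """Extract the 5-column timer spec from an /etc/cron.d entry.

    Works regardless of which user the entry runs as, since it keys
    off column positions rather than the 'root' string.
    """
    return " ".join(entry.split()[:5])

entry = "*/5 * * * * root /usr/local/bin/collect_rabbitmq_stats.sh"
assert cron_timer_spec(entry) == "*/5 * * * *"
```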
When a RabbitMQ cluster is restarted, the mnesia settings determine
how long and how often each broker will try to connect to the cluster
before giving up. It might be useful for an operator to be able to
tune these parameters. This change adds two settings,
`mnesia-table-loading-retry-timeout` and
`mnesia-table-loading-retry-limit`, which set these parameters in the
rabbitmq.config file [1].
[1] https://www.rabbitmq.com/configure.html#config-items
Change-Id: I96aa8c4061aed47eb2e844d1bec44fafd379ac25
Partial-Bug: #1828988
Related-Bug: #1874075
Co-authored-by: Nicolas Bock <nicolas.bock@canonical.com>
Co-authored-by: Aurelien Lourot <aurelien.lourot@canonical.com>
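A hedged sketch of how these settings might appear in the classic Erlang-term rabbitmq.config; the key names match the upstream configuration docs, while the values shown are illustrative, not the charm's defaults.

```erlang
[
 {rabbit, [
   %% how long each table-load attempt waits (ms), and how many
   %% attempts are made before the broker gives up on startup
   {mnesia_table_loading_retry_timeout, 30000},
   {mnesia_table_loading_retry_limit, 10}
 ]}
].
```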
`rabbitmqctl wait`'s default behavior changed recently
and a short timeout was introduced upstream. This
patch adapts our code in order to stay on the old,
intended behavior.
Change-Id: I020e3e9e4976e21da08316ac58642b2058564b02
Set a TTL on the topic queues engine_worker and heat-engine-listener
to stop them growing indefinitely after heat-engine restarts.
This is the rabbitmq-server part; e.g. we can set the heat TTL with:
juju config heat ttl=3600000
Closes-Bug: 1925436
Change-Id: I7b826fe965a200da29020a8f2c6148f76d10a2b0
If the rabbit cluster is partitioned, show that in status. This check
only works on focal+; prior to that the check is ignored.
Change-Id: Id45c969d37f8cb1c26d0f9834f4a79e7555dd03c
Closes-Bug: 1930417
Option '-e <vhost> <queue>' was added to the 'check_rabbitmq_queues.py'
nrpe script to allow excluding selected queues when checking queue
sizes. Corresponding option 'exclude_queues' was added to the
charm config.
By default, the following queues are excluded:
* event.sample
* notifications_designate.info
* notifications_designate.error
* versioned_notifications.info
* versioned_notifications.error
Closes-Bug: #1811433
Change-Id: I57e297bb4323a3ab98da020bfcb1630889aac6d7
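A sketch of the exclusion filter, assuming queues are carried as (vhost, queue, size) tuples and exclusions as (vhost, queue) pairs mirroring '-e <vhost> <queue>'; names and shapes are illustrative.

```python
def filter_excluded(queues, excluded):
    """Drop queues listed in excluded before size checking.

    queues:   list of (vhost, queue, message_count) tuples (assumed)
    excluded: set of (vhost, queue) pairs from '-e <vhost> <queue>'
    """
    return [q for q in queues if (q[0], q[1]) not in excluded]

queues = [("/", "event.sample", 10), ("/", "important", 2)]
excluded = {("/", "event.sample")}
assert filter_excluded(queues, excluded) == [("/", "important", 2)]
```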
In change I60141397f39e3b1b0274230db8d984934c98a08d the charmhelpers
library started being used in the rabbitmq queue nrpe check. This is
problematic as the check does not actually run in a charm context and
therefore does not have access to the charm environment, such as the
current config. Additionally, an issue in collating check results had
been introduced.
This change aims to fix these issues. Instead of using the charmhelpers
library, the cronspec is read out from the cron job definition
itself, and the series is probed from /etc/lsb-release.
Change-Id: I952aeda31e997ccadb6cff62e3b0d46349650979
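Probing the series outside a charm context can be sketched as a small /etc/lsb-release parser; the sample content and function name are illustrative.

```python
def parse_lsb_release(text):
    """Parse /etc/lsb-release KEY=value content into a dict."""
    info = {}
    for line in text.splitlines():
        if "=" in line:
            key, _, value = line.partition("=")
            info[key.strip()] = value.strip().strip('"')
    return info

# Illustrative file content, as found on an Ubuntu focal host.
sample = "DISTRIB_ID=Ubuntu\nDISTRIB_CODENAME=focal\n"
assert parse_lsb_release(sample)["DISTRIB_CODENAME"] == "focal"
```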
queue-master-locator is a configuration option supported by
rabbitmq-server since 3.6; it allows control over where the
master queue will be created.
Change-Id: I38cc019b73d062572e19bd532b6bccdaf88638ba
Func-Test-PR: https://github.com/openstack-charmers/zaza-openstack-tests/pull/382
Closes-Bug: #1890759
Signed-off-by: Nicolas Bock <nicolas.bock@canonical.com>
The function `update_nrpe_checks` has been changed to remove redundant
checks and scripts based on the rabbitmq configuration, but the main
logic is unchanged.
The function logic is based on these three functions:
1) copy all the custom NRPE scripts and create cron file
2) add NRPE checks and remove redundant ones
2.a) update the NRPE vhost check for TLS and non-TLS
2.b) update the NRPE queues check
2.c) update the NRPE cluster check
3) remove redundant scripts - this must be done after removing
the relevant check
Closes-Bug: #1779171
Change-Id: Ice83133c2c73532720f33298713267f69e8b4c3a
When checking queues, display not only queue names but also their
size (number of messages). Return sizes as integers.
Also update parsing to account for a rabbitmqctl output change in
focal.
Closes-Bug: #1838964
Change-Id: I2014f065393a1ad4b594363ade6c01ccec4fb71a
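A sketch of parsing names plus integer sizes while tolerating the header lines newer rabbitmqctl releases print; the sample output and function name are illustrative.

```python
def parse_queue_sizes(output):
    """Parse `rabbitmqctl list_queues` output into {queue: int_size}.

    Non-data lines (e.g. informational headers that changed in focal)
    are skipped rather than breaking the parse.
    """
    sizes = {}
    for line in output.splitlines():
        parts = line.split("\t")
        if len(parts) == 2 and parts[1].isdigit():
            sizes[parts[0]] = int(parts[1])
    return sizes

sample = "Listing queues ...\nnotifications.info\t12\nheat-engine-listener\t0\n"
assert parse_queue_sizes(sample) == {"notifications.info": 12,
                                     "heat-engine-listener": 0}
```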
Make the rabbitmq queue check also check if its input data file was
recently updated. This input data is created via cronjob; if that gets
stuck we might not actually be getting meaningful data.
The charm supports configuring the check interval via a full cron time
specification, so technically one could have that updated only once a
year, even if this doesn't make much sense in a monitoring scenario.
Also fix a buglet in the nrpe update hook function: only deploy a
queue check if the cron job hasn't been deconfigured by setting it to
the empty string.
Change-Id: I60141397f39e3b1b0274230db8d984934c98a08d
Closes-Bug: #1898523
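The freshness check can be sketched as an mtime comparison; the function name is an assumption, and a stale file is taken to mean the collection cron job is stuck.

```python
import os
import tempfile
import time

def stats_file_is_fresh(path, max_age_seconds):
    """True if the stats file was modified within max_age_seconds."""
    return (time.time() - os.path.getmtime(path)) <= max_age_seconds

with tempfile.NamedTemporaryFile() as f:
    assert stats_file_is_fresh(f.name, 3600) is True
    old = time.time() - 7200
    os.utime(f.name, (old, old))  # simulate a cron job stuck for 2 hours
    assert stats_file_is_fresh(f.name, 3600) is False
```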