hacluster uses the command "crm configure get-property <PROPERTY>" to
obtain a property of the cluster. "get-property" has been deprecated in
favor of "get_property", and since crmsh-4.2.1 a deprecation warning is
printed to stdout[0], which breaks the parsing of the output:
# crm configure get-property maintenance-mode 2>/dev/null
WARNING: This command 'get-property' is deprecated, please use 'get_property'
INFO: "get-property" is accepted as "get_property"
true
[0] 86282af8e5
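Using 'get_property' directly avoids this particular warning. As an extra safeguard, the parsing can ignore crmsh message lines. A minimal Python sketch, assuming the value is the last non-empty line and that crmsh messages are prefixed with 'WARNING:' or 'INFO:' as in the transcript above; parse_property_output() is a hypothetical helper, not the charm's code:

```python
# Hypothetical helper: crmsh message prefixes to skip (assumed from the
# transcript above, where both appear on stdout).
NOISE_PREFIXES = ("WARNING:", "INFO:")


def parse_property_output(output):
    """Return the property value from 'crm configure get_property' output.

    Skips crmsh WARNING/INFO lines and returns the last remaining
    non-empty line.
    """
    lines = [
        line.strip()
        for line in output.splitlines()
        if line.strip() and not line.strip().startswith(NOISE_PREFIXES)
    ]
    if not lines:
        raise ValueError("no property value found in crm output")
    return lines[-1]


sample = (
    "WARNING: This command 'get-property' is deprecated, please use 'get_property'\n"
    'INFO: "get-property" is accepted as "get_property"\n'
    "true\n"
)
print(parse_property_output(sample))  # prints: true
```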
Change-Id: Id0ee9ab1873d14dcd1c960001cdeb8318f599ef5
Closes-Bug: #2008704
list_nodes() was recently changed to run 'crm node show' on jammy+
instead of 'crm node status'. The difference is that 'crm node show'
returns the pacemaker-remote nodes in addition to the hacluster nodes.
This change limits the nodes returned by list_nodes() to the hacluster
nodes (i.e. the nodes that have a node ID).
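A Python sketch of the filtering, assuming 'crm node show' prints cluster nodes as "<name>(<id>): <type>" and pacemaker-remote nodes without the "(<id>)" part. That line format, the sample output and cluster_nodes() are assumptions for illustration:

```python
import re

# Assumed format for cluster nodes: "<name>(<id>): <type>" at the start of
# a line. Remote nodes are assumed to lack "(<id>)", and indented property
# lines fail the leading \S match.
NODE_WITH_ID = re.compile(r"^(?P<name>\S+?)\((?P<id>\d+)\):")


def cluster_nodes(crm_node_show_output):
    """Return names of nodes that carry a node ID (hypothetical helper)."""
    nodes = []
    for line in crm_node_show_output.splitlines():
        match = NODE_WITH_ID.match(line)
        if match:
            nodes.append(match.group("name"))
    return nodes


sample = """juju-3f6cb6-1(1000): member
    standby=off
juju-3f6cb6-2(1001): member
compute-remote-1: remote
"""
print(cluster_nodes(sample))  # ['juju-3f6cb6-1', 'juju-3f6cb6-2']
```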
Closes-Bug: #1995295
Change-Id: Ia405d4270f56c949f79167f8b75c1304b598b918
The command 'crm node show' is used on jammy to retrieve the list of
nodes defined in a cluster. The output for each node includes a
': member' suffix, which breaks subsequent commands that consume the
output of list_nodes(). For example:
juju-3f6cb6-zaza-4135aa8b2509-8.project.serverstack: member
This change trims the ':' and everything after it from each line of the
output.
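The trimming amounts to keeping the text before the first ':'. A minimal Python sketch; strip_node_type() is a hypothetical name, and the input is the example line above:

```python
def strip_node_type(line):
    """Drop the ':' and everything after it (e.g. a ': member' suffix)."""
    return line.partition(":")[0].strip()


example = "juju-3f6cb6-zaza-4135aa8b2509-8.project.serverstack: member"
print(strip_node_type(example))
# juju-3f6cb6-zaza-4135aa8b2509-8.project.serverstack
```

str.partition() leaves lines without a ':' unchanged, so it is safe to apply to every line.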
Closes-Bug: #1994160
Change-Id: I54a4f854f3e293503ec97d99a49b6dc51ee50c87
The command 'crm node show' is used on jammy to retrieve the list of
nodes defined in a cluster, but its output also includes the properties
set on each node (e.g. standby=off), which breaks the current parsing
logic.
This change uses a regular expression to keep only the lines of the
output that start with a non-whitespace character (^\S+), filtering out
the indented node property lines.
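A Python sketch of the ^\S+ filter. The sample output, with properties indented under each node, is an assumption for illustration, and node_lines() is a hypothetical helper:

```python
import re

# Lines describing a node start with a non-whitespace character;
# node properties (e.g. "standby=off") are assumed to be indented.
NODE_LINE = re.compile(r"^\S+")


def node_lines(output):
    """Keep only the lines that start with a non-whitespace character."""
    return [line for line in output.splitlines() if NODE_LINE.match(line)]


sample = """node1: member
    standby=off
node2: member
"""
print(node_lines(sample))  # ['node1: member', 'node2: member']
```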
Change-Id: I3e00daa1b877a7faae1370f08b2d9c5bd7795c5f
Closes-Bug: #1987685
Related-Bug: #1972022
The version of crmsh available on jammy no longer provides the 'crm node
status' subcommand, since it was removed[0]. When running on
Ubuntu >= jammy, this change uses the command 'crm node attribute' to
determine whether the node is in standby mode, and 'crm node show' to
get the list of nodes.
[0] https://github.com/ClusterLabs/crmsh/pull/753
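A Python sketch of the standby check, assuming a command along the lines of 'crm node attribute <node> show standby' prints something like "scope=nodes  name=standby value=on". That output shape and is_standby() are assumptions; the set of true values mirrors pacemaker's usual boolean spellings:

```python
import re

# Assumed output shape: "scope=nodes  name=standby value=on"
VALUE_RE = re.compile(r"\bvalue=(\S+)")

# Spellings pacemaker accepts as boolean true.
TRUE_VALUES = {"true", "on", "yes", "y", "1"}


def is_standby(attribute_output):
    """Return True if the parsed 'standby' value is a true boolean.

    A missing value (attribute unset) is treated as not in standby.
    """
    match = VALUE_RE.search(attribute_output)
    return bool(match) and match.group(1).lower() in TRUE_VALUES


print(is_standby("scope=nodes  name=standby value=on"))   # True
print(is_standby("scope=nodes  name=standby value=off"))  # False
```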
Change-Id: Iafb711be220573cb701527ec84de285edd8942cf
Closes-Bug: #1972022
The 'crm node delete' handling already covers some expected failure
modes. Add "Transport endpoint is not connected" to them so that the
node delete is retried when this error occurs.
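A Python sketch of retrying on known transient errors. delete_node(), the attempt count and the delay are hypothetical; the runner is injectable so the logic can be exercised without a cluster:

```python
import subprocess
import time

# Error substrings treated as transient; illustrative, not the charm's list.
RETRYABLE_ERRORS = (
    "Transport endpoint is not connected",
)


def delete_node(node, run=subprocess.check_output, attempts=3, delay=5.0):
    """Run 'crm node delete <node>', retrying on known transient errors."""
    cmd = ["crm", "node", "delete", node]
    for attempt in range(1, attempts + 1):
        try:
            return run(cmd, stderr=subprocess.STDOUT)
        except subprocess.CalledProcessError as exc:
            output = (exc.output or b"").decode("utf-8", "replace")
            retryable = any(err in output for err in RETRYABLE_ERRORS)
            if not retryable or attempt == attempts:
                raise  # unexpected error, or out of attempts
            time.sleep(delay)
```

Errors outside the list are raised immediately, so only the failure modes known to be transient cost extra time.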
Change-Id: I9727e7b5babcfed1444f6d4821498fbc16e69297
Closes-Bug: #1931588
Co-authored-by: Aurelien Lourot <aurelien.lourot@canonical.com>
The `state` action provides details about the health of the cluster.
It takes one parameter, false by default, which displays the history of
the cluster status.
Closes-Bug: #1717831
Change-Id: Iaf6e4a75a36491eab8e6802a6f437e5f410ed29e
Add an `update-ring` action for that purpose.
Also print more details on various pacemaker failures.
Also remove some dead code.
Func-Test-PR: https://github.com/openstack-charmers/zaza-openstack-tests/pull/369
Change-Id: I35c0c9ce67fd459b9c3099346705d43d76bbdfe4
Closes-Bug: #1400481
Related-Bug: #1874719
Co-Authored-By: Aurelien Lourot <aurelien.lourot@canonical.com>
Co-Authored-By: Felipe Reyes <felipe.reyes@canonical.com>
This has already been deprecated since June 2020 via a 'blocked'
message in assess_status_helper(), but this commit:
1. makes it clear in config.yaml, and
2. removes the corresponding already dead code.
Change-Id: Ia6315273030e31b10125f2dd7a7fb7507d8a10b7
Use location directives to spread pacemaker remote resources across the
cluster. This prevents multiple resources from being taken down in the
event of a single node failure. This would usually not be a problem,
but if the node is being queried by masakari host monitors at the time
it goes down, the query can hang.
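A Python sketch of round-robin spreading that generates crmsh 'location <id> <rsc> <score>: <node>' directives. The resource and node names, the 'loc-' id prefix and the score of 200 are illustrative assumptions, not the charm's values. A positive score only expresses a preference, so pacemaker can still move a resource if its preferred node fails:

```python
from itertools import cycle


def spread_locations(resources, nodes, score=200):
    """Pin each resource to a preferred node, assigning nodes round-robin.

    Sorting both inputs keeps the output stable across hook runs.
    """
    if not nodes:
        raise ValueError("need at least one cluster node")
    directives = []
    for resource, node in zip(sorted(resources), cycle(sorted(nodes))):
        directives.append(
            "location loc-{res}-{node} {res} {score}: {node}".format(
                res=resource, node=node, score=score))
    return directives


for line in spread_locations(["r2", "r1", "r3"], ["n1", "n2"]):
    print(line)
# location loc-r1-n1 r1 200: n1
# location loc-r2-n2 r2 200: n2
# location loc-r3-n1 r3 200: n1
```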
Change-Id: Ib8a667d0d82ef3dcd4da27e62460b4f0ce32ee43
Partial-Bug: #1889094
Depends-On: Ic45dbdd9d8581f25549580c7e98a8d6e0bf8c3e7
The recent maas stonith resources, introduced to support stonith with
pacemaker-remotes, included a hash of the combined url and api key in
the resource name. But the charm only supports one stonith resource
(single maas_url and api key config options). Having the hash makes
managing the resources more complicated, especially when the url or api
key changes. So remove any existing resource (it is very unlikely that
one exists, as the feature has only just come out of preview) and
replace it with a single resource called 'st-maas'.
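The naming problem and the cleanup can be sketched in Python. The 'st-maas-<hash>' naming, the sha256 digest and both helpers are assumptions for illustration. The commit does not say how the hash was built:

```python
import hashlib

CURRENT_NAME = "st-maas"


def hashed_name(url, api_key):
    """Hypothetical old-style name: changes whenever url or api key change."""
    digest = hashlib.sha256((url + api_key).encode("utf-8")).hexdigest()[:8]
    return "{}-{}".format(CURRENT_NAME, digest)


def legacy_resources(existing):
    """Old hashed stonith resources to delete before creating 'st-maas'."""
    prefix = CURRENT_NAME + "-"
    return sorted(name for name in existing if name.startswith(prefix))


old = hashed_name("http://maas.example/MAAS", "key-1")
rotated = hashed_name("http://maas.example/MAAS", "key-2")
print(old != rotated)  # True: a key rotation would orphan the old resource
print(legacy_resources([CURRENT_NAME, old, "res_vip"]) == [old])  # True
```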
Change-Id: I053f1dc882eebfdef384cbbbfa7cabc82bce5f8b
The current charm does not indicate to the end user when a specific
resource is not running, nor when a node is offline or stopped.
Validate that configured resources are actually running and let the end
user know if they are not.
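This validation could be sketched in Python by parsing crm_mon's XML output. The element and attribute names used below (resource/@active, nodes/node/@online) are assumptions about that output, and inactive_resources() and offline_nodes() are hypothetical helpers:

```python
import xml.etree.ElementTree as ET


def inactive_resources(crm_mon_xml, expected):
    """Return the expected resource ids with no active instance."""
    root = ET.fromstring(crm_mon_xml)
    active = {
        res.get("id")
        for res in root.iter("resource")
        if res.get("active") == "true"
    }
    return sorted(set(expected) - active)


def offline_nodes(crm_mon_xml):
    """Return names of cluster nodes not reported as online."""
    root = ET.fromstring(crm_mon_xml)
    names = []
    # Only look at <node> entries under <nodes>; <resource> elements also
    # contain <node> children describing where they run.
    for nodes in root.iter("nodes"):
        for node in nodes.findall("node"):
            if node.get("online") != "true":
                names.append(node.get("name"))
    return sorted(names)


sample = """<crm_mon>
  <nodes>
    <node name="node1" id="1000" online="true" standby="false"/>
    <node name="node2" id="1001" online="false" standby="false"/>
  </nodes>
  <resources>
    <resource id="res_vip" active="true" role="Started">
      <node name="node1" id="1000"/>
    </resource>
    <resource id="res_haproxy" active="false" role="Stopped"/>
  </resources>
</crm_mon>"""

print(inactive_resources(sample, ["res_vip", "res_haproxy", "res_missing"]))
# ['res_haproxy', 'res_missing']
print(offline_nodes(sample))  # ['node2']
```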
Closes-Bug: #1834263
Change-Id: I1171e71ae3b015b4b838b7ecf0de18eb10d7c8f2
This change adds a stonith plugin for maas and a method for creating
stonith resources that use the plugin.
Change-Id: I825d211d68facce94bee9c6b4b34debaa359e836
The current list_nodes() function tries to parse the output of
"crm node list" to get a list of nodes, but the output is messy when
remote nodes are present. So use "crm node status" instead, which
returns properly formatted XML that can be reliably parsed.
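A Python sketch of the XML parsing, assuming 'crm node status' prints the <nodes> section of the CIB with one <node> element per node carrying id and uname attributes. That shape, the sample and node_names() are assumptions:

```python
import xml.etree.ElementTree as ET


def node_names(crm_node_status_xml):
    """Return the uname of every <node> element, in document order."""
    root = ET.fromstring(crm_node_status_xml)
    return [node.get("uname") for node in root.iter("node") if node.get("uname")]


sample = """<nodes>
  <node id="1000" uname="juju-machine-0"/>
  <node id="1001" uname="juju-machine-1">
    <instance_attributes id="nodes-1001">
      <nvpair id="nodes-1001-standby" name="standby" value="off"/>
    </instance_attributes>
  </node>
</nodes>"""

print(node_names(sample))  # ['juju-machine-0', 'juju-machine-1']
```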
Change-Id: Iea7ef3dca194e7440dc2cde2624d07e990006685
This patch implements support to update parameters of an already
existing resource using "crm configure load update FILE".
The parameters of a resource are hashed using md5 and the checksum is
stored in the kv store; when the checksum doesn't match, the resource
is updated, otherwise the update is discarded.
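A Python sketch of the checksum comparison, using a plain dict as a stand-in for the charm's kv store. The key naming, the JSON serialization and maybe_update() are assumptions; the actual update is delegated to a caller-supplied callable:

```python
import hashlib
import json


def params_checksum(params):
    """md5 of the resource parameters, serialized in a stable key order."""
    canonical = json.dumps(params, sort_keys=True)
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def maybe_update(kv, resource, params, apply_update):
    """Apply the update only when the stored checksum differs.

    kv: dict-like stand-in for the charm's kv store.
    apply_update: callable doing the real work, e.g. writing a file and
    running 'crm configure load update FILE'.
    """
    key = "resource-params-{}".format(resource)  # hypothetical key name
    checksum = params_checksum(params)
    if kv.get(key) == checksum:
        return False  # unchanged parameters: discard the update
    apply_update(resource, params)
    kv[key] = checksum  # record only once the update succeeded
    return True
```

Storing the checksum after apply_update() returns means a failed update is retried on the next run instead of being silently skipped.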
Change-Id: I5735eaa1309c57e3620b0a6f68ffe13ec8165592
Closes-Bug: 1753432
Use the XML output provided by "crm configure" and parse it to look for
nodes that match the XPath ".//*[@id='$NAME']". The added test case
uses the XML generated when ceph-radosgw has dns-ha enabled, which
creates a group of hostnames that cross-references resources and made
the previous approach give false positives.
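A Python sketch of the id lookup with ElementTree's limited XPath support, contrasted with a naive substring search (assumed here as an example of how false positives arise). has_element_with_id() and the CIB fragment are illustrative:

```python
import xml.etree.ElementTree as ET


def has_element_with_id(cib_xml, name):
    """True if some element below the root has id == name.

    Hypothetical helper; 'name' is assumed not to contain a single quote,
    which would break the XPath literal.
    """
    root = ET.fromstring(cib_xml)
    return root.find(".//*[@id='{}']".format(name)) is not None


# Illustrative fragment: a member id contains another name as a substring.
sample = """<cib>
  <configuration>
    <resources>
      <group id="grp_dns">
        <primitive id="res_dns_host1"/>
      </group>
    </resources>
  </configuration>
</cib>"""

print("res_dns" in sample)                           # True: false positive
print(has_element_with_id(sample, "res_dns"))        # False
print(has_element_with_id(sample, "res_dns_host1"))  # True
```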
Change-Id: If1c3584c889e7e101f15ed5ba6de89c687667754
Closes-Bug: 1789915
This patch implements support to update parameters of an already
existing resource using "crm configure load update FILE".
Change-Id: I22730091d674145db4a1187b0904d9f88d9d8c6d
Partial-Bug: #1753432
This config option allows sysadmins to put pacemaker in maintenance
mode, which stops monitoring of the configured resources, so services
can be stopped/restarted without pacemaker starting them again or
migrating resources (e.g. virtual IPs).
Change-Id: I232a043e6d9d45f2cf833d4f7c4d89b079f258bb
Partial-Bug: 1698926
On restart, corosync may take longer than a minute to come up, and the
systemd start script times out too soon. Pacemaker, which depends on
corosync, is then started immediately and fails because corosync is
still starting.
Subsequently the charm would run 'crm node list' to validate pacemaker,
which would turn into an infinite loop.
This change sets longer timeout values for the systemd scripts and adds
better error handling and communication to the end user.
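On the charm side, a bounded wait is one way to avoid retrying forever. A hypothetical Python sketch: wait_for() polls a caller-supplied check until it succeeds or a timeout expires, so the caller can report a clear error to the end user instead of looping. The 120 s and 5 s defaults are placeholders, not the charm's values:

```python
import time


def wait_for(check, timeout=120.0, interval=5.0,
             clock=time.monotonic, sleep=time.sleep):
    """Poll check() until it returns True or timeout seconds elapse.

    Returns True on success, False on timeout. clock and sleep are
    injectable so the logic can be tested without real waiting.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A caller would, for example, check that pacemaker responds and, on False, set a status message explaining that corosync/pacemaker did not come up in time.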
Change-Id: I7c3d018a03fddfb1f6bfd91fd7aeed4b13879e45
Partial-Bug: #1654403
All contributions to this charm were made under Canonical
copyright; switch to the Apache-2.0 license as agreed so we
can move forward with official project status.
This charm does include files from a few other projects
which we can't re-license - leave those as is for now.
Change-Id: I4d0ec0cceed05ef6b6153148c8b9fc9333189b77