Pacemaker changed the output format of crm_mon, which broke the regex
that detects nodes in standby mode. This change updates the regex so
that paused units no longer raise alerts.
Change-Id: I137acad076bff58506fea6e1618a00765adacd9b
Closes-Bug: #1971182
Related-Bug: #1880576
Previously, paused hacluster units showed up as CRITICAL errors
in nagios even though they were only in 'standby' mode
in corosync.
The hacluster charm now passes the '-s' option to the check_crm
nrpe script to suppress alerts for standby units.
Change-Id: I976d5ff01d0156fbaa91f9028ac81b44c96881af
Closes-Bug: #1880576
The old check_crm script had separate checks for failcounts and failed
actions, but since failed actions cause failcounts, the two will always be
present together, and expire together.
Furthermore, the previous defaults effectively caused the failed actions
check to shadow the failcount one, because the former used to cause
CRITICALs, while the latter was only causing WARNINGs.
This version of check_crm deprecates failed actions detection in favor of
only failcount alerting, but adds support for separate warn/crit
thresholds.
Default thresholds are set at 3 and 10 for warn and crit, respectively.
Although sending criticals for high fail counter entries may seem
redundant when we already do that for stopped resources, some resources
are configured with infinite migration thresholds and will therefore
never show up as failed in crm_mon. Having separate fail counter
thresholds can therefore still be valuable, even if for most resources
migration-threshold will be set lower than the critical fail-counter threshold.
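The warn/crit scheme described above can be sketched as follows. This is
an illustrative Python sketch, not the check_crm code itself: the function
name, the status constants, and the choice of an inclusive (>=) comparison
are assumptions; only the default thresholds of 3 and 10 come from this
change.

```python
# Nagios-style exit codes (conventional values, used here for illustration).
OK, WARNING, CRITICAL = 0, 1, 2


def failcount_status(failcount, warn=3, crit=10):
    """Map a resource fail count to a status code.

    Hypothetical helper: the inclusive comparison is an assumption,
    the defaults (3 and 10) are the ones stated in this change.
    """
    if failcount >= crit:
        return CRITICAL
    if failcount >= warn:
        return WARNING
    return OK


def worst_status(failcounts, warn=3, crit=10):
    """Return the most severe status across all resources' fail counts."""
    return max((failcount_status(c, warn, crit) for c in failcounts), default=OK)
```

Because the worst status wins, a resource with an infinite
migration-threshold still escalates to critical once its fail count
reaches the critical threshold, even though it never shows up as failed.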
Closes-Bug: #1864040
Change-Id: I417416e20593160ddc7eb2e7f8460ab5f9465c00
This commit adds a new option to check_crm named --failedactions.
Possible values are 'warning', 'critical', or anything else (which is
treated as equivalent to 'ignore').
The default is 'critical' to be backward compatible.
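The value handling described above can be sketched as follows. This is an
illustrative Python sketch rather than the script's own code: the function
name and status constants are assumptions, and exact, case-sensitive
matching of the value is also an assumption.

```python
# Nagios-style exit codes (conventional values, used here for illustration).
OK, WARNING, CRITICAL = 0, 1, 2


def failed_actions_status(failed_actions, option="critical"):
    """Return the status that failed actions should produce.

    Hypothetical helper: 'warning' and 'critical' map to their status
    codes, and any other value is treated as 'ignore'.
    """
    if failed_actions == 0:
        return OK
    severities = {"warning": WARNING, "critical": CRITICAL}
    # Unknown values fall through to OK, i.e. failed actions are ignored.
    return severities.get(option, OK)
```

Defaulting `option` to 'critical' mirrors the backward-compatible default:
callers that do not pass the option keep the old behavior.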
Change-Id: I5908f5f4b7d77219280dfe896ea938459c6b23bd
Partial-Bug: #1796400
From Bionic onwards, libnagios is replaced by libmonitoring. Trusty only
contains the former, Bionic only the latter. Xenial contains both, but
we prefer libmonitoring where it exists, so that we can drop libnagios
support entirely once Trusty goes EOL.
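The preference order described above can be sketched as follows. This is
an illustrative Python sketch, not the charm's code: the function name is
hypothetical, and the full package names libmonitoring-plugin-perl and
libnagios-plugin-perl are assumptions based on the short names used in
this message.

```python
# Candidate packages in order of preference (names are assumptions).
PREFERRED_LIBRARIES = ("libmonitoring-plugin-perl", "libnagios-plugin-perl")


def pick_plugin_library(available_packages):
    """Return the first preferred package that the series provides.

    Hypothetical helper: `available_packages` is any collection of
    package names known to exist in the target series' archive.
    Returns None when neither library is available.
    """
    for name in PREFERRED_LIBRARIES:
        if name in available_packages:
            return name
    return None
```

With this ordering, a series that ships both libraries resolves to
libmonitoring, so dropping libnagios later only means removing the second
entry from the tuple.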
Change-Id: I613fd0b29b797e8900581f939eda72a1ab72868b
Closes-Bug: #1796143