10 Commits

Author SHA1 Message Date
Gabriel Cocenza a0b419519c Fix standby node regex for check_crm
Pacemaker has changed the output format of crm_mon, which broke
the regex that catches nodes in standby mode. This change
updates the regex so that paused units no longer trigger alerts.

Change-Id: I137acad076bff58506fea6e1618a00765adacd9b
Closes-Bug: #1971182
Related-Bug: #1880576
2022-05-02 19:17:36 -03:00
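
The regex fix described in the commit above can be illustrated with a short Python sketch. This is not the pattern from the actual change (check_crm is a separate script and its regex is not shown here); the pattern, the function name `standby_nodes`, and both sample outputs are assumptions chosen only to show how a match anchored on a bare `Node <name>: standby` line can be loosened to tolerate a bulleted, indented variant.

```python
import re

# Hypothetical pattern: allow optional leading whitespace and an optional
# "* " bullet before "Node <name>: standby". Not the regex from the commit.
STANDBY_RE = re.compile(r"^\s*(?:\*\s+)?Node\s+(\S+?):\s+standby\b", re.IGNORECASE)


def standby_nodes(crm_mon_output):
    """Return the names of nodes reported as standby, in order of appearance."""
    nodes = []
    for line in crm_mon_output.splitlines():
        match = STANDBY_RE.match(line)
        if match:
            nodes.append(match.group(1))
    return nodes


# Illustrative samples only; real crm_mon output varies by Pacemaker version.
OLD_STYLE = "Node node1: standby\nOnline: [ node2 node3 ]"
NEW_STYLE = "Node List:\n  * Node node1: standby\n  * Online: [ node2 node3 ]"

print(standby_nodes(OLD_STYLE))
print(standby_nodes(NEW_STYLE))
```

A check that wants to stay quiet for paused units could drop every name returned by `standby_nodes` from the set of nodes it alerts on.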
Martin Kalcok c385fef7b0 NRPE: Don't report paused hacluster nodes as CRITICAL error
Previously, paused hacluster units showed up as CRITICAL errors
in Nagios even though they were only in 'standby' mode
in corosync.
The hacluster charm now uses the '-s' option of the check_crm
nrpe script to suppress alerts for standby units.

Change-Id: I976d5ff01d0156fbaa91f9028ac81b44c96881af
Closes-Bug: #1880576
2020-11-06 14:19:42 +01:00
Andrea Ieri 0ce34b17be Improve resource failcount detection
The old check_crm script had separate checks for failcounts and failed
actions, but since failed actions cause failcounts, the two will always be
present together, and expire together.
Furthermore, the previous defaults effectively caused the failed actions
check to shadow the failcount one, because the former used to cause
CRITICALs, while the latter was only causing WARNINGs.

This version of check_crm deprecates failed actions detection in favor of
only failcount alerting, but adds support for separate warn/crit
thresholds.
Default thresholds are set at 3 and 10 for warn and crit, respectively.

Although sending criticals for high fail counter entries may seem
redundant when we already do that for stopped resources, some resources
are configured with infinite migration thresholds and will therefore
never show up as failed in crm_mon. Having separate fail counter
thresholds can therefore still be valuable, even if for most resources
migration-threshold will be set lower than the critical fail-counter threshold.

Closes-Bug: #1864040
Change-Id: I417416e20593160ddc7eb2e7f8460ab5f9465c00
2020-11-02 14:07:18 +00:00
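
The separate warn/crit thresholds described in the commit above can be sketched in Python as follows. The defaults of 3 and 10 come from the commit message; everything else is an assumption made for illustration, including the function names, the inclusive (`>=`) comparison, and the idea of reporting the worst status across resources.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def classify_failcount(failcount, warn=3, crit=10):
    """Map one resource's fail count to a Nagios status.

    Assumption: thresholds are inclusive, so a count equal to the
    threshold already triggers that level.
    """
    if failcount >= crit:
        return CRITICAL
    if failcount >= warn:
        return WARNING
    return OK


def worst_status(failcounts, warn=3, crit=10):
    """Return the most severe status over a {resource: failcount} mapping."""
    return max(
        (classify_failcount(count, warn, crit) for count in failcounts.values()),
        default=OK,
    )


# Hypothetical resource names, for illustration only.
print(worst_status({"res_vip": 1, "res_haproxy": 4}))
```

Because the numeric codes are ordered by severity, taking the maximum is enough to pick the status the check should exit with.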
Andrea Ieri 50b83149b3 Cosmetic fix for long lines
This commit is a no-op in terms of functionality.

Change-Id: I28e2f4219aeb8769c496ba1b38ac3d59be0bef5f
2020-02-19 16:53:21 +01:00
Ryan Beisner 6ed2bb0943 Standardize auxiliary file location across os-charms
Change-Id: Ifaa5453bc0703c77184184e05c53d21649f6b92e
Closes-Bug: #1843826
2019-09-12 15:51:49 -05:00
Andrea Ieri 9483383555 Choose whether to ignore/warn/crit on failed actions
This commit adds a new option to check_crm named --failedactions.
Possible values are 'warning', 'critical', or anything else (which is
treated as equivalent to 'ignore').
The default is 'critical', for backward compatibility.

Change-Id: I5908f5f4b7d77219280dfe896ea938459c6b23bd
Partial-Bug: #1796400
2019-05-09 12:02:17 +02:00
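
The behaviour of --failedactions described in the commit above amounts to a small mapping from the option value to a Nagios status. The Python sketch below shows that mapping under stated assumptions: the function name and its `has_failed_actions` parameter are hypothetical, and the comparison is assumed to be case-sensitive, which the commit message does not specify.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def failed_actions_status(option="critical", has_failed_actions=False):
    """Return the status to report for failed actions.

    'warning' and 'critical' select that level; any other value is
    treated as 'ignore'. The default mirrors the commit message.
    """
    if not has_failed_actions:
        return OK
    if option == "warning":
        return WARNING
    if option == "critical":
        return CRITICAL
    return OK  # anything else means: ignore failed actions


print(failed_actions_status("warning", has_failed_actions=True))
```

Treating unknown values as 'ignore' keeps the option permissive, at the cost that a mistyped value silently disables the alert.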
Barry Price 600ba322fa Add Bionic compatibility for the NRPE scripts via libmonitoring.
From Bionic onwards, libnagios is replaced by libmonitoring. Trusty only
contains the former, Bionic only the latter. Xenial contains both, but
we prefer libmonitoring where it exists, so that we can drop libnagios
support entirely once Trusty goes EOL.

Change-Id: I613fd0b29b797e8900581f939eda72a1ab72868b
Closes-Bug: 1796143
2019-01-21 15:44:57 +07:00
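
The preference order described in the commit above (libmonitoring where it exists, libnagios otherwise) can be sketched in Python as a first-match lookup. The per-release availability table restates the commit message; the function name and the use of plain strings to stand for the libraries are assumptions for illustration, not how the NRPE scripts actually load them.

```python
# Preferred library first, legacy fallback second.
PREFERENCE = ("libmonitoring", "libnagios")

# Availability per release, as stated in the commit message.
RELEASES = {
    "trusty": {"libnagios"},
    "xenial": {"libnagios", "libmonitoring"},
    "bionic": {"libmonitoring"},
}


def pick_plugin_library(available):
    """Return the first preferred library present in `available`, else None."""
    for name in PREFERENCE:
        if name in available:
            return name
    return None


for release, available in RELEASES.items():
    print(release, pick_plugin_library(available))
```

With this ordering, dropping libnagios support later only requires removing the second entry from `PREFERENCE`.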
Brad Marshall cb6563c1cc [bradm] Removed haproxy nrpe checks 2015-02-17 16:30:22 +10:00
Brad Marshall a946a4a002 [bradm] Add sudoers files for nagios checks 2015-02-12 12:06:05 +10:00
Brad Marshall 9c3ca6e743 [bradm] Sync charmhelpers nrpe support, and add nrpe checks 2015-02-12 09:49:44 +10:00