When the host has not been added to a failover segment, the API
returns 400 (host with name *** could not be found). The monitor
should not retry sending the notification in this case.
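
A minimal sketch of the intended behaviour, not the actual patch. The
exception class, the send callable and the retry parameters are all
hypothetical names used only for illustration:

```python
import time


class HypotheticalHttpError(Exception):
    """Stand-in for an API client error that carries an HTTP status code."""

    def __init__(self, status_code, message=""):
        super().__init__(message)
        self.status_code = status_code


def send_with_retry(send, max_retries=3, interval=0.0):
    """Call send() until it succeeds, but give up at once on HTTP 400.

    Returns the number of attempts that were needed.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            send()
            return attempts
        except HypotheticalHttpError as exc:
            # A 400 means the host is not in any failover segment;
            # retrying cannot change that, so stop immediately.
            if exc.status_code == 400 or attempts > max_retries:
                raise
            time.sleep(interval)
```

Other errors (for example a temporary 503) are still retried up to
max_retries times.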
Change-Id: I24a6aba97b834ae92dabe85196f01d27bb518b3c
The setDaemon method of the threading.Thread was deprecated
in Python 3.10 (*).
Replace the setDaemon method with the daemon property.
*: https://docs.python.org/3.10/library/threading.html#threading.Thread.setDaemon
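
For illustration, the replacement looks roughly like this (the worker
function is a placeholder, not code from this repository):

```python
import threading


def worker():
    """Placeholder thread body."""


# Deprecated since Python 3.10:
#     thread.setDaemon(True)
# Preferred: set the attribute (or pass daemon=True to the constructor).
thread = threading.Thread(target=worker)
thread.daemon = True
thread.start()
thread.join()
```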
Change-Id: I643251c0394b8e8ede8198f580549ef6f260a9de
Signed-off-by: Takashi Natsume <takanattie@gmail.com>
During a large-scale failure, many host or instance failure
notifications are generated in a very short time. Previously, a new
client was created for every notification sent to Masakari, which
put great pressure on Keystone.
This patch keeps the client for reuse once it is created. It is
recreated only after an exception occurs.
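
The caching idea can be sketched as follows. This is an assumption
about the general pattern, not the patch itself; the class name, the
make_client factory and the notify() method are hypothetical:

```python
class CachedClientSender:
    """Reuse one client across notifications; rebuild it after a failure."""

    def __init__(self, make_client):
        self._make_client = make_client  # hypothetical factory callable
        self._client = None

    def send(self, payload):
        # Create the client lazily and keep it for later notifications.
        if self._client is None:
            self._client = self._make_client()
        try:
            return self._client.notify(payload)  # hypothetical method
        except Exception:
            # Drop the cached client so the next call builds a fresh one.
            self._client = None
            raise
```

With this shape, a burst of notifications authenticates once instead
of once per notification.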
Change-Id: I39795bc796d3e2402881b8116cdc241aa2d60a9f
This adds a new host monitor based on Consul. It can monitor host
connectivity via the management, tenant and storage interfaces.
Implements: bp host-monitor-by-consul
Change-Id: I384ad70dfd9116c6e253e0562b762593a3379d0c
Both cibadmin-based and crm_mon-based host status queries were
affected, allowing a partitioned cluster to tell Masakari to
evacuate hosts from the other partition (which, nota bene, includes
all remotes if applicable).
Closes-Bug: #1878548
Change-Id: I0b1ca8a011ee4da162a2c3a986c1dab9a3d38190
The hostmonitor, like other Masakari monitors, starts as an
Oslo service (based on eventlet). The main thread is supposed
to run a loop that has an internal wait mechanism (instead of
reusing periodic_tasks from oslo_service). However, the loop
could be broken, if an unexpected exception appeared, and it
never ran again but the process was still alive (due to
oslo_service not stopping). The example mentioned in the bug
report is about unavailability of the Masakari API (and/or
Keystone API) before notification sending. This exception is
not caught early because SendNotification._make_client is
called outside of the try block (unlike the actual notification
sending). The exception bubbles up and stops the main loop,
leaving a useless hostmonitor process. The user is unaware
unless they notice the logs are no longer growing.
While the general design begs for a revamp (we might get away
with that by using Consul in the first place), the easy fix is
to prevent exceptions breaking the loop completely so that the
hostmonitor can continue to work and try to regain health.
At the very least it will keep posting ERROR messages in the log
which is more likely to be spotted in comparison to lack of logs
(which is, unfortunately, less commonly considered an alerting
situation).
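
A minimal sketch of the "do not let exceptions break the loop" idea.
The function and parameter names are hypothetical and do not mirror
the real hostmonitor code:

```python
import logging

LOG = logging.getLogger(__name__)


def run_monitor_loop(check_once, wait, should_stop):
    """Run check_once() repeatedly; log failures instead of leaving the loop."""
    while not should_stop():
        try:
            check_once()
        except Exception:
            # Keep the loop alive and leave an ERROR with a traceback in the log.
            LOG.exception("Host monitoring iteration failed")
        wait()
```

Everything that can fail, including client creation, belongs inside
the try block so that one bad iteration cannot end monitoring.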
This change also fixes, adapts and robustifies the two relevant
unit tests.
Closes-Bug: #1930361
Co-Authored-By: Radosław Piliszek <radoslaw.piliszek@gmail.com>
Change-Id: I7e3447dcddc7998e3e3c30f4f0019d91a99c79ce
By default, if split's sep is None, runs of consecutive whitespace
are regarded as a single separator. So there is no need to use
split(' ').
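
A short illustration of the difference (the sample line is made up):

```python
line = "node1   online  \n"

# sep=None (the default): runs of whitespace act as one separator and
# leading/trailing whitespace is dropped.
fields_default = line.split()
# -> ['node1', 'online']

# sep=' ': every single space is a separator, so empty strings appear
# and the trailing newline survives.
fields_space = line.split(' ')
# -> ['node1', '', '', 'online', '', '\n']
```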
Change-Id: Idcda8dfcaf5fd5abfab106238f91acdd3166883f
The original test is adapted because the code now no longer
overwrites the same status.
Change-Id: Ic77f932f56974a66a092b15b0d211efd73b9fc9c
Implements: bp retry-check-when-host-failure
Co-Authored-By: Radosław Piliszek <radoslaw.piliszek@gmail.com>
When running in a container, it might not be possible to use systemd
to verify the status of Corosync and Pacemaker.
In such a case, allow the user to choose the stack being used.
Change-Id: I44ce3be6b6fda3834f6df63861b0dcf546da46a1
Co-Authored-By: Radosław Piliszek <radoslaw.piliszek@gmail.com>
If a custom CA file is configured via [api] cafile, currently
communication with Keystone will fail, since the session is not created
using this CA file. The [api] insecure option is also ignored.
This change fixes the issue by using keystoneauth loading for the auth
and session, to ensure all standard configuration options are supported.
Change-Id: Idd58b72f7f5242e8135fec71b42adf5dd1852417
Closes-Bug: #1873736
This replaces the unittest.TestCase method assertRaisesRegexp()
(deprecated in Python 3.2) with assertRaisesRegex().
Also add the associated hacking check.
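
For illustration only, a test written with the non-deprecated name
(the test case itself is a made-up example):

```python
import unittest


class ExampleTest(unittest.TestCase):
    def test_error_message(self):
        # assertRaisesRegexp() is the deprecated alias;
        # assertRaisesRegex() is the supported spelling.
        with self.assertRaisesRegex(ValueError, "invalid literal"):
            int("not-a-number")
```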
Change-Id: I62d5b4c0259c6e2e0fee361542d4b1234ab0ea57
Signed-off-by: Chuck Short <chucks@redhat.com>
The code repeatedly calls node_state_tag.get('uname') inside the
loop to parse the hostname, but the hostname doesn't change within
the loop, so look it up only once.
Change-Id: Icac10015698378a1901c664f37f11b4529bf03e1
masakari-engine enables/disables the nova-compute service by
process name, so the process monitor needs to reset the nova
process name to "nova-compute" when it triggers a process
notification.
Change-Id: Ia6f22bd1d183093bb0345f323a80268fb62df388
Closes-Bug: #1858757
Now that we no longer support py27, we can use the standard library
unittest.mock module instead of the third party mock lib.
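
In practice the change is mostly an import swap. A small sketch (the
mocked object and its method name are hypothetical):

```python
from unittest import mock  # previously: import mock

# A hypothetical collaborator replaced by a mock in a test.
sender = mock.Mock()
sender.send_notification("compute1")

# The mock records the call, so the test can verify it afterwards.
sender.send_notification.assert_called_once_with("compute1")
```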
Change-Id: Ie5ee60235bafc1e7b3461dee29b83fb62125178e
Signed-off-by: Sean McGinnis <sean.mcginnis@gmail.com>
When sending an alert from the instancemonitor, check the monitor's
config file for the hostname before sending the alert.
Change-Id: If11aa1abb1142941d6dcd00c46063d9015644978
Closes-Bug: #1866638
The repo is Python 3 now, so update hacking to version 3.0 which
supports Python 3.
Fix problems found by updated hacking version.
Update local hacking checks to work with newer flake8.
Remove hacking and friends from lower-constraints, they're
not needed for installation.
Change-Id: Ic7903c61bde999685ca26b5a10d070c8d8d206a3
As described in bug #1728527, cibadmin does not expose the state of
the pacemaker-remote nodes, which means hostmonitor cannot track
them. This change switches to crm_mon to check the status of
remote nodes if the new config option host.restrict_to_remotes
is set to True. In that case the host monitor uses crm_mon
to monitor nodes and only monitors nodes that are marked
as remotes (not members).
Change-Id: I3f2026805413504c875ea5f39eb036d44b26dd43
Depends-On: Iaa2251708616e9c69817bf5b346d795ea7a4d21b
Closes-Bug: #1728527