Commit Graph

134 Commits

Author SHA1 Message Date
Vasyl Saienko f4769d3177 Honor libvirt.connection_uri in introspectivemonitor
Do not use the default URI; pick it up from parameters.

Change-Id: I8620aeab224ab37096656d20c4bcc3fe7e7f3f18
2024-02-13 16:36:14 +00:00
Vasyl Saienko b89aeeea88 [introspectivemonitor] Fix syntax for python3
Change-Id: I4810e852d7fbf6c0140ab64414eed63f93d3f956
2024-02-13 16:36:06 +00:00
Vasyl Saienko 185003f7dc Fix pep8 for new hacking
Change-Id: I0bf55e6289fdbfcb6c564976f9040278fc8f9d71
2023-11-29 14:19:17 +02:00
suzhengwei cf646dfce2 not retry to send notification for specific http exception
When the host has not been added to a failover segment, the request
fails with 400 (host with name *** could not be found). The notification
should not be resent in this case.

Change-Id: I24a6aba97b834ae92dabe85196f01d27bb518b3c
2023-01-04 11:41:33 +08:00
Takashi Natsume afcf34decc Use daemon property instead of setDaemon method
The setDaemon method of the threading.Thread was deprecated
in Python 3.10 (*).
Replace the setDaemon method with the daemon property.

*: https://docs.python.org/3.10/library/threading.html#threading.Thread.setDaemon

Change-Id: I643251c0394b8e8ede8198f580549ef6f260a9de
Signed-off-by: Takashi Natsume <takanattie@gmail.com>
2022-08-24 23:30:42 +09:00
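
The replacement described in this commit can be sketched as follows. The worker function `poll` is hypothetical and stands in for whatever the thread actually runs; it is not taken from the masakari-monitors code.

```python
import threading


def poll():
    """Hypothetical worker; stands in for a monitor loop body."""
    pass


worker = threading.Thread(target=poll)
# Deprecated since Python 3.10: worker.setDaemon(True)
worker.daemon = True  # the property must be set before start()
worker.start()
worker.join()
```
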
Maksim Malchuk 7a44244f25 Libvirt auth support
Related-Bug: #1965754
Change-Id: I46f63de4b8ca8e5acd5db9cb8b0d2e13393d666c
Signed-off-by: Maksim Malchuk <maksim.malchuk@gmail.com>
2022-05-13 15:15:20 +00:00
Zuul a89511e088 Merge "host monitor by consul" 2022-03-03 11:26:36 +00:00
Zuul 89c067a4d8 Merge "connection too much when large scale failure" 2022-02-28 08:39:59 +00:00
Takashi Kajinami 741cffdfe7 Use LOG.warning instead of deprecated LOG.warn
The LOG.warn method is deprecated[1] and the LOG.warning method should
be used instead.

[1] https://docs.python.org/3/library/logging.html#logging.warning

Change-Id: I7e3dc5d1897cd10b94e0a5a5a06db667cba7d443
2022-01-28 09:05:39 +09:00
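
A minimal sketch of the change, using the standard library `logging` module that the commit links to. The logger name, the `ListHandler` helper and the message are assumptions made for illustration only.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical handler that keeps records in memory for inspection."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


LOG = logging.getLogger("masakarimonitors.example")  # hypothetical name
handler = ListHandler()
LOG.addHandler(handler)
LOG.setLevel(logging.WARNING)

# Deprecated alias: LOG.warn("host %s is unreachable", "compute-1")
LOG.warning("host %s is unreachable", "compute-1")
```
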
sue 4ecfb34a09 connection too much when large scale failure
During a large-scale failure, many host or instance failure
notifications are generated in a very short time. Each notification
sent to masakari requires making a new client, which puts great
pressure on keystone.

This patch keeps the client for reuse once it is made. It is made
again only after an exception.

Change-Id: I39795bc796d3e2402881b8116cdc241aa2d60a9f
2022-01-26 15:19:25 +08:00
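
The reuse pattern described in this commit could look roughly like the sketch below. `NotificationSender`, `make_client` and the callable client are hypothetical stand-ins, not the actual masakari-monitors classes; the point is only that the client is built once and rebuilt after a failure.

```python
class NotificationSender:
    """Hypothetical sender that caches its client between calls."""

    def __init__(self, make_client):
        self._make_client = make_client  # assumed to be expensive (authenticates)
        self._client = None

    def send(self, event):
        if self._client is None:
            self._client = self._make_client()
        try:
            return self._client(event)
        except Exception:
            # Drop the cached client so the next call builds a fresh one.
            self._client = None
            raise


created = []


def make_client():
    """Hypothetical factory; records how often a client is built."""
    created.append(1)
    return lambda event: "sent " + event


sender = NotificationSender(make_client)
first = sender.send("host-a")
second = sender.send("host-b")  # reuses the client built for the first call
```
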
sue 7c476d07aa host monitor by consul
This adds a new host monitor based on Consul. It can monitor host
connectivity via the management, tenant and storage interfaces.

Implements: bp host-monitor-by-consul
Change-Id: I384ad70dfd9116c6e253e0562b762593a3379d0c
2021-12-23 14:39:09 +08:00
zhaoleilc 50677c2ca8 Fix a typo
This patch fixes a typo.

Change-Id: Ib72b08823e6a086ef0efbb956a3c2cdb4366c217
2021-12-10 17:28:43 +08:00
zhaoyixin 5198a3c910 Fix some typos
This patch fixes some typos.

Closes-Bug: #1949540
Change-Id: Iddb519719f3aca80286e9ca1666040ac202e037e
2021-11-03 10:47:30 +08:00
Radosław Piliszek c2d9a4f9cb Fix hostmonitor to respect quorum
Both cibadmin-based and crm_mon-based host status queries were
affected, allowing a partitioned cluster to tell Masakari to
evacuate hosts from the other partition (which, nota bene, include
all remotes if applicable).

Closes-Bug: #1878548
Change-Id: I0b1ca8a011ee4da162a2c3a986c1dab9a3d38190
2021-09-13 19:27:52 +00:00
Zuul 981cda7e83 Merge "Remove conditionals for an ancient openstacksdk" 2021-08-17 06:41:46 +00:00
sue 339218d0af move DriverBase to the base dir
Move driver.py one level up, so that new monitor drivers can reuse
it.

Change-Id: I3ec98682056d07c777d592e01d290af9d06d7ff1
2021-08-16 15:03:08 +08:00
Radosław Piliszek fe4e09408c Remove conditionals for an ancient openstacksdk
Change-Id: I4b035925b212236e5a31cd3752d08d281501c26d
2021-07-28 19:41:38 +00:00
Zuul 6e84289a53 Merge "Fix hostmonitor hanging forever after certain exceptions" 2021-07-27 06:23:37 +00:00
Zuul 1244861394 Merge "Fix typos" 2021-07-23 18:44:42 +00:00
sue e7154f3d77 Fix hostmonitor hanging forever after certain exceptions
The hostmonitor, like other Masakari monitors, starts as an
Oslo service (based on eventlet). The main thread is supposed
to run a loop that has an internal wait mechanism (instead of
reusing periodic_tasks from oslo_service). However, the loop
could be broken, if an unexpected exception appeared, and it
never ran again but the process was still alive (due to
oslo_service not stopping). The example mentioned in the bug
report is about unavailability of the Masakari API (and/or
Keystone API) before notification sending. This exception is
not caught early because SendNotification._make_client is
called outside of the try block (unlike the actual notification
sending). The exception bubbles up and stops the main loop,
leaving a useless hostmonitor process. The user is unaware
unless they notice the logs are no longer growing.

While the general design begs for a revamp (we might get away
with that by using Consul in the first place), the easy fix is
to prevent exceptions breaking the loop completely so that the
hostmonitor can continue to work and try to regain health.
At the very least it will keep posting ERROR messages in the log
which is more likely to be spotted in comparison to lack of logs
(which is, unfortunately, less commonly considered an alerting
situation).

This change also fixes, adapts and robustifies the two relevant
unit tests.

Closes-Bug: #1930361
Co-Authored-By: Radosław Piliszek <radoslaw.piliszek@gmail.com>
Change-Id: I7e3447dcddc7998e3e3c30f4f0019d91a99c79ce
2021-07-23 14:30:23 +00:00
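
The "easy fix" described above, keeping the loop alive when an iteration fails, can be sketched as follows. This is an illustration under assumptions, not the hostmonitor code: `run_loop`, `check_once`, `wait` and the bounded `iterations` argument are hypothetical, and the real service would loop indefinitely.

```python
import logging

LOG = logging.getLogger(__name__)


def run_loop(check_once, iterations, wait=lambda: None):
    """Call check_once repeatedly; log failures instead of leaving the loop."""
    failures = 0
    for _ in range(iterations):  # bounded here only to keep the sketch finite
        try:
            check_once()
        except Exception:
            # Keep going so the monitor can recover once the API is back.
            LOG.exception("Host check failed; retrying on the next iteration")
            failures += 1
        wait()
    return failures


calls = []


def flaky_check():
    """Hypothetical check that fails only on its first call."""
    calls.append(len(calls))
    if len(calls) == 1:
        raise ConnectionError("API unavailable")


failures = run_loop(flaky_check, iterations=3)
```

Because the exception is logged with `LOG.exception`, each failed iteration leaves an ERROR entry with a traceback, which matches the commit's aim of making the problem visible in the logs.
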
ericxiett 31f5b47d38 Fix typos
This patch fixes some typos.

Closes-Bug: #1934845
Change-Id: I93e48dfa896f7b7112d42f7681469f3d3e55c50a
2021-07-07 14:08:49 +08:00
YeHaiyang 76d9e894c7 Fix several code comment errors in process monitor.
Change-Id: I21617a0952c2d6704e7adb8397d032a21ba38d8e
2021-07-05 19:40:55 +08:00
YeHaiyang 6a496e7871 Replace "split(' ')" with "split()" in masakari-monitors
By default, if split's sep is None, runs of consecutive whitespace
are regarded as a single separator. So there is no need to use
split(' ').

Change-Id: Idcda8dfcaf5fd5abfab106238f91acdd3166883f
2021-06-30 01:23:19 +00:00
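
A small example of the difference, using a made-up input string:

```python
line = "a  b c"  # two spaces between "a" and "b"

by_space = line.split(' ')   # every single space is a separator
by_whitespace = line.split() # runs of whitespace count as one separator

print(by_space)       # ['a', '', 'b', 'c']
print(by_whitespace)  # ['a', 'b', 'c']
```
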
Zuul bc624feecd Merge "Replaces yaml.load() with yaml.safe_load()" 2021-04-27 01:01:58 +00:00
Zuul 12d67aede4 Merge "Drop CAP_NET_ADMIN" 2021-03-25 17:04:47 +00:00
Zuul 22059afc26 Merge "Repeated check to determine host status" 2021-03-24 01:33:26 +00:00
suzhengwei 987584a1c6 Repeated check to determine host status
The original test is adapted because the code now no longer
overwrites the same status.

Change-Id: Ic77f932f56974a66a092b15b0d211efd73b9fc9c
Implements: bp retry-check-when-host-failure
Co-Authored-By: Radosław Piliszek <radoslaw.piliszek@gmail.com>
2021-03-22 20:36:40 +00:00
Mark Goddard 0cdbb23587 [hostmonitor] Add pacemaker_node_type option
When running in a container, it might not be possible to use systemd
to verify the status of Corosync and Pacemaker.
In such a case, allow the user to choose the stack being used.

Change-Id: I44ce3be6b6fda3834f6df63861b0dcf546da46a1
Co-Authored-By: Radosław Piliszek <radoslaw.piliszek@gmail.com>
2021-03-22 16:43:25 +00:00
Radosław Piliszek 07bd41f0b4 Drop CAP_NET_ADMIN
It is not required for performing monitors' duties.

Change-Id: Ib1297ce6e4fca0bfcb82d32b3669475d2011fbe1
2021-03-21 19:10:19 +00:00
suzhengwei 2932977f02 remove unused configration
Change-Id: I0cd8dc2fa96001622b3d89534c43b84e6103e369
2021-01-07 17:54:56 +08:00
suzhengwei 9458a9f449 remove unused script
Change-Id: I53ed72291abe997bf78fe93938cda7721bbb9f8c
2021-01-05 10:16:07 +08:00
Zuul abba984abd Merge "Remove unnecessary continue statement" 2020-11-13 10:45:03 +00:00
Zuul a34ddd216e Merge "Use keystoneauth1 config option loading for masakari client" 2020-09-17 10:19:38 +00:00
Zuul 649c583e71 Merge "Remove six" 2020-08-29 08:46:12 +00:00
Zuul 97da0fe7ec Merge "Replace assertRaisesRegexp with assertRaisesRegex" 2020-08-29 08:42:35 +00:00
Zuul 3a3a04576d Merge "repeated parsing" 2020-08-26 18:02:58 +00:00
Mark Goddard e704045880 Use keystoneauth1 config option loading for masakari client
If a custom CA file is configured via [api] cafile, currently
communication with Keystone will fail, since the session is not created
using this CA file. The [api] insecure option is also ignored.

This change fixes the issue by using keystoneauth loading for the auth
and session, to ensure all standard configuration options are supported.

Change-Id: Idd58b72f7f5242e8135fec71b42adf5dd1852417
Closes-Bug: #1873736
2020-07-23 12:19:00 +01:00
Chuck Short 886543d9e7 Replace assertRaisesRegexp with assertRaisesRegex
This replaces the deprecated (in python 3.2) unittest.TestCase method
assertRaisesRegexp() with assertRaisesRegex().

Also add associated hacking check.

Change-Id: I62d5b4c0259c6e2e0fee361542d4b1234ab0ea57
Signed-off-by: Chuck Short <chucks@redhat.com>
2020-06-24 01:12:05 +00:00
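
A minimal sketch of the renamed assertion in use. The test case `ParseTest` is hypothetical and unrelated to the masakari-monitors test suite.

```python
import unittest


class ParseTest(unittest.TestCase):
    """Hypothetical test case showing the non-deprecated method name."""

    def test_invalid_int(self):
        # Old spelling: self.assertRaisesRegexp(ValueError, "invalid literal")
        with self.assertRaisesRegex(ValueError, "invalid literal"):
            int("not-a-number")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ParseTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```
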
Hervé Beraud 8c27817b50 Remove elementtree deprecated methods
All our supported runtimes [1] are compatible with the recommended
alternatives.

`Element.getchildren` [2] has been deprecated since Python 3.2 and will be
removed in Python 3.9; these changes switch usages to `list(elem)`.

[1] https://governance.openstack.org/tc/reference/runtimes/victoria.html#python-runtimes-for-train
[2] https://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.Element.getchildren

Change-Id: Ib34b57e2a660b09bde7c158b4aa7d68e39e5cf36
2020-06-23 11:34:55 +02:00
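
For illustration, the recommended alternative looks like this on a made-up XML fragment (the element and attribute names are assumptions):

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<nodes><node uname='a'/><node uname='b'/></nodes>")

# Old spelling: children = root.getchildren()
children = list(root)  # direct child elements, in document order
names = [child.get("uname") for child in children]
```
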
Tushar Patil 5e05de3255 Remove unnecessary continue statement
Removed unnecessary continue statement.

Change-Id: Idd953793488fc1407084988905251dbaa6336392
2020-06-23 01:31:02 +00:00
suzhengwei 425bb1d663 repeated parsing
It repeatedly uses 'node_state_tag.get('uname')' to parse the hostname.
But the hostname doesn't change in the loop.

Change-Id: Icac10015698378a1901c664f37f11b4529bf03e1
2020-06-04 11:27:06 +08:00
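
The idea is to read the attribute once before the loop rather than on every iteration. The sketch below assumes a hypothetical XML fragment and loop body; only the `node_state_tag.get('uname')` call comes from the commit message.

```python
import xml.etree.ElementTree as ET

# Hypothetical input; the real document structure is not shown in the commit.
node_state_tag = ET.fromstring(
    "<node_state uname='compute-1'><a/><b/></node_state>"
)

hostname = node_state_tag.get('uname')  # parsed once, outside the loop
seen = []
for child in node_state_tag:
    # The hostname does not change here, so the cached value is reused.
    seen.append((hostname, child.tag))
```
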
jacky06 33d25e3cc3 Remove six
We don't need this in a Python 3-only world.

Change-Id: I56dca98d06458174af0f3d4dc585a754e8692f89
2020-05-13 23:12:26 +08:00
suzhengwei f317a24a25 reset nova-compute process name
masakari-engine enables/disables the nova-compute service
by process name, so process-monitor needs to reset
the nova process name to "nova-compute" when triggering a
process notification.

Change-Id: Ia6f22bd1d183093bb0345f323a80268fb62df388
Close-Bug: #1858757
2020-04-21 09:27:16 +08:00
Sean McGinnis 92c934a8e3 Use unittest.mock instead of third party mock
Now that we no longer support py27, we can use the standard library
unittest.mock module instead of the third party mock lib.

Change-Id: Ie5ee60235bafc1e7b3461dee29b83fb62125178e
Signed-off-by: Sean McGinnis <sean.mcginnis@gmail.com>
2020-04-18 11:54:18 -05:00
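
In practice the change amounts to swapping the import. A minimal sketch, with a hypothetical `notifier` object:

```python
# Previously: import mock  (third party package)
from unittest import mock

notifier = mock.Mock()  # hypothetical collaborator under test
notifier.send("host-down")

# Raises AssertionError if send was not called exactly once with this argument.
notifier.send.assert_called_once_with("host-down")
```
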
Zuul e225e6d1bd Merge "Check config file for hostname" 2020-04-02 02:29:28 +00:00
Liam Young 85dda1b0eb Check config file for hostname
When sending an alert from the instancemonitor check the monitors
config file for the hostname before sending the alert.

Change-Id: If11aa1abb1142941d6dcd00c46063d9015644978
Closes-Bug: #1866638
2020-04-02 00:23:46 +00:00
Andreas Jaeger 64e2b887db Update hacking for Python3
The repo is Python 3 now, so update hacking to version 3.0 which
supports Python 3.

Fix problems found by updated hacking version.

Update local hacking checks to work with newer flake8.

Remove hacking and friends from lower-constraints, they're
not needed for installation.

Change-Id: Ic7903c61bde999685ca26b5a10d070c8d8d206a3
2020-04-01 21:07:40 +02:00
Liam Young 8cb4de9e65 Use hostname to avoid clash with section
Switch to using 'hostname' rather than 'host' to specify the
Hostname, FQDN or IP address of this host. This is to avoid a
clash with a section of the same name *1

*1 https://github.com/openstack/masakari-monitors/blob/master/masakarimonitors/conf/host.py#L87

Change-Id: I7d95b063c2eabbd8893857b5e1e7d342db0aebec
Closes-Bug: #1866660
2020-03-30 09:21:03 +01:00
Liam Young dc9b777724 Use crm_mon for pacemaker-remote deployments
As described in bug #1728527 cibadmin does not expose the state of
the pacemaker-remote nodes which means hostmonitor cannot track
them. This change switches to use crm_mon to check the status of
remote nodes if the new config option host.restrict_to_remotes
is set to True. This will trigger host monitor to use crm_mon
to monitor nodes and will only monitor nodes that are marked
as remotes (not members).

Change-Id: I3f2026805413504c875ea5f39eb036d44b26dd43
Depends-On: Iaa2251708616e9c69817bf5b346d795ea7a4d21b
Closes-Bug: #1728527
2019-08-27 17:00:22 +00:00
jayashri bidwe ae3ab24f9a Remove deprecated shell scripts
As per deprecation notes [1], removed all unused shell
scripts from hostmonitor and processmonitor.

[1]: https://docs.openstack.org/releasenotes/masakari-monitors/ocata.html

Change-Id: I93761ce4685c258058cb2d6b2ccb2323636f33ff
2019-06-19 10:27:22 +05:30