metal/mtce-common/src/common
Eric Macdonald 50dc29f6c0 Improve maintenance power/reset control command retry handling
This update improves the maintenance power on/off and reset
handling and makes it consistent in its retries and in its use
of graceful and immediate commands.

This update maintains the 10 retries for both power-on
and power-off commands and increases the number of retries
for the reset command from 5 to 10 to line up with the
power operation commands.

This update also ensures that the first 5 retries use the
graceful action command while the last 5 use the immediate
command.
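
The retry policy described above can be sketched as follows. This is an
illustrative sketch only, not the mtce implementation: the names
BmcCommand, kMaxRetries, kGracefulRetries and command_for_attempt are
hypothetical, and it assumes retry 0 denotes the initial attempt, which
the test plan says also uses the graceful command.

```cpp
// Hypothetical sketch of the graceful-then-immediate retry policy.
// None of these names come from the mtce source.
enum class BmcCommand { Graceful, Immediate, GiveUp };

constexpr int kMaxRetries      = 10; // retries for power-on, power-off and reset
constexpr int kGracefulRetries = 5;  // the first 5 retries stay graceful

// retry == 0 is the initial attempt; 1..kMaxRetries are the retries.
// Anything outside that range means the retry budget is exhausted.
constexpr BmcCommand command_for_attempt(int retry)
{
    return (retry < 0 || retry > kMaxRetries) ? BmcCommand::GiveUp
         : (retry <= kGracefulRetries)        ? BmcCommand::Graceful
                                              : BmcCommand::Immediate;
}
```

A caller would increment its retry counter after each failed command and
pass it to command_for_attempt() to pick the next command, stopping when
GiveUp is returned.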

This update also removes a power-on handling case that could
have led to a stuck state. This case was virtually impossible
to hit, given the sequence of intermittent command failures it
required, but the handling was fixed anyway.

Issues have been seen with the power-off handling on some servers.
It is suspected that those servers need more time to power off, so
this update introduces a 30 second delay between a power-off command
and the power status query that follows it. This gives the server
time to power off before the power-off command is retried.
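
The power-off sequence described above (send the command, wait 30
seconds, query power status, retry if still on) can be sketched as a
small state machine. This is a hypothetical illustration, not the mtce
handler: Stage, PowerOffFsm, step and run are invented names, one call to
step() is assumed to represent one second, and the power status is passed
in as a plain bool rather than coming from a real IPMI or Redfish query.

```cpp
// Hypothetical sketch of the delayed power status query after power-off.
// None of these names come from the mtce source.
constexpr int kPowerOffQueryDelaySecs = 30; // delay stated in the commit message
constexpr int kMaxPowerOffRetries     = 10; // retry count stated in the commit message

enum class Stage { SendPowerOff, WaitBeforeQuery, QueryStatus, Done, Failed };

struct PowerOffFsm {
    Stage stage     = Stage::SendPowerOff;
    int   retries   = 0; // power-off retries issued so far
    int   wait_secs = 0; // seconds waited since the last power-off command
};

// Advance the state machine by one tick (assumed to be one second).
// power_is_off is only consulted in the QueryStatus stage.
constexpr PowerOffFsm step(PowerOffFsm f, bool power_is_off)
{
    switch (f.stage) {
    case Stage::SendPowerOff:            // command sent; start the delay
        f.wait_secs = 0;
        f.stage = Stage::WaitBeforeQuery;
        break;
    case Stage::WaitBeforeQuery:         // hold off the status query
        if (++f.wait_secs >= kPowerOffQueryDelaySecs)
            f.stage = Stage::QueryStatus;
        break;
    case Stage::QueryStatus:             // check result, retry if needed
        if (power_is_off) {
            f.stage = Stage::Done;
        } else if (f.retries < kMaxPowerOffRetries) {
            ++f.retries;
            f.stage = Stage::SendPowerOff;
        } else {
            f.stage = Stage::Failed;
        }
        break;
    case Stage::Done:
    case Stage::Failed:
        break;                           // terminal stages
    }
    return f;
}

// Helper: run a fixed number of ticks with a constant power status.
constexpr PowerOffFsm run(PowerOffFsm f, int ticks, bool power_is_off)
{
    for (int i = 0; i < ticks; ++i)
        f = step(f, power_is_off);
    return f;
}
```

The point of the delay stage is that a slow server is not declared
"still on" and sent another power-off command before it has had a
reasonable chance to complete the first one.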

Test Plan: Both IPMI and Redfish

PASS: Verify power on/off and reset handling support up to 10 retries
PASS: Verify graceful command is used for the first power on/off
      or reset try and the first 5 retries
PASS: Verify immediate command is used for the final 5 retries
PASS: Verify reset handling with/without retries (none/mid/max)
PASS: Verify power-on handling with/without retries (none/mid/max)
PASS: Verify power-off handling with/without retries (none/mid/max)
PASS: Verify power status command failure handling for power on/off
NOTE: FIT (fault insertion testing) was used to create retry scenarios

PASS: Verify power-off inter-retry delay feature
PASS: Verify 30 second power-off to power query delay
PASS: Verify redfish power/reset commands used are logged by default
PASS: Verify power-off/on and reset logging

Regression:

PASS: Verify power-on/off and reset handling without retries
PASS: Verify power-off handling when power is already off
PASS: Verify power-on handling when power is already on

Closes-Bug: 2031945
Signed-off-by: Eric Macdonald <eric.macdonald@windriver.com>
Change-Id: Ie39326bcb205702df48ff9dd090f461c7110dd36
2024-01-25 22:42:26 +00:00
Makefile Add redfish support detection to maintenance 2019-08-19 14:03:37 +00:00
alarmUtil.cpp Debian: Make Mtce offline handler more resilient to slow shutdowns 2022-10-24 15:57:43 +00:00
alarmUtil.h Failure case handling of LUKS service 2023-12-06 00:34:02 -05:00
bmcUtil.cpp Mtce: Add ActionInfo extension support for reset operations. 2022-10-13 17:40:05 +00:00
bmcUtil.h Mtce: Add ActionInfo extension support for reset operations. 2022-10-13 17:40:05 +00:00
fitCodes.h Add mtcAgent socket initialization failure retry handling. 2020-04-01 19:24:22 +00:00
hostClass.cpp Refactor BMC provisioning in Maintenance 2019-12-09 09:39:49 -05:00
hostClass.h Refactor BMC provisioning in Maintenance 2019-12-09 09:39:49 -05:00
hostUtil.cpp Add support for peer controller reset via mtcClient 2021-01-14 16:44:14 -05:00
hostUtil.h Add support for peer controller reset via mtcClient 2021-01-14 16:44:14 -05:00
httpUtil.cpp Mtce: Fix bmc password fetch error handling 2022-06-01 15:21:05 +00:00
httpUtil.h Remove all nova and libvirt files from mtce-common 2019-03-19 15:23:36 -05:00
ipmiUtil.cpp Add support for peer controller reset via mtcClient 2021-01-14 16:44:14 -05:00
ipmiUtil.h Add support for peer controller reset via mtcClient 2021-01-14 16:44:14 -05:00
jsonUtil.cpp Mtce: Add ActionInfo extension support for reset operations. 2022-10-13 17:40:05 +00:00
jsonUtil.h Remove all nova and libvirt files from mtce-common 2019-03-19 15:23:36 -05:00
keyClass.cpp Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
keyClass.h Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
logMacros.h Add bmc reset delay in the reset progression command handler 2023-11-02 20:58:00 +00:00
msgClass.cpp Debian: Redfishtool requests fail when IPV4 address has square brackets 2022-10-06 22:21:38 +00:00
msgClass.h Debian: Redfishtool requests fail when IPV4 address has square brackets 2022-10-06 22:21:38 +00:00
nlEvent.cpp Fix heartbeat messaging when interface is set to 'lo' 2020-06-26 14:16:41 +00:00
nlEvent.h Refactor infrastructure network in mtce code 2019-04-18 09:32:41 -04:00
nodeBase.cpp Add bmc reset delay in the reset progression command handler 2023-11-02 20:58:00 +00:00
nodeBase.h Improve maintenance power/reset control command retry handling 2024-01-25 22:42:26 +00:00
nodeEvent.cpp Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
nodeEvent.h Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
nodeMacro.h Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
nodeTimers.cpp Refactor BMC provisioning in Maintenance 2019-12-09 09:39:49 -05:00
nodeTimers.h Improve maintenance power/reset control command retry handling 2024-01-25 22:42:26 +00:00
nodeUtil.cpp Avoid logging in fork_sysreq_reboot failsafe thread 2023-01-10 11:38:12 -05:00
nodeUtil.h Prevent pmond process recovery when system is not running 2020-06-15 11:09:47 -04:00
pingUtil.cpp Fix BMC access loss handling 2020-01-03 09:34:37 -05:00
pingUtil.h Fix BMC access loss handling 2020-01-03 09:34:37 -05:00
redfishUtil.cpp Mtce: Add ActionInfo extension support for reset operations. 2022-10-13 17:40:05 +00:00
redfishUtil.h Mtce: Add ActionInfo extension support for reset operations. 2022-10-13 17:40:05 +00:00
regexUtil.cpp Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
regexUtil.h Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
returnCodes.h Refactor infrastructure network in mtce code 2019-04-18 09:32:41 -04:00
secretUtil.cpp Mtce: Fix bmc password fetch error handling 2022-06-01 15:21:05 +00:00
secretUtil.h Mtce: Fix bmc password fetch error handling 2022-06-01 15:21:05 +00:00
threadUtil.cpp Improve mtcAgent interrupted thread cleanup 2021-03-15 10:51:16 -04:00
threadUtil.h Debian: Redfishtool requests fail when IPV4 address has square brackets 2022-10-06 22:21:38 +00:00
timeUtil.cpp Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
timeUtil.h Decouple Guest-server/agent from stx-metal 2018-09-18 17:15:08 -04:00
tokenUtil.cpp Remove references to ceilometer in maintenance 2019-04-30 14:28:12 -04:00
tokenUtil.h MTCE: reading BMC passwords from Barbican secret storage. 2019-02-14 09:04:46 -05:00