Enhance Driver Interface for Soft Power Off and Inject NMI
This specification proposes the work required to enhance the driver interface to support soft power off and diagnostic interrupt (NMI [1]), and the work required to provide an ipmitool reference implementation.

[1] http://en.wikipedia.org/wiki/Non-maskable_interrupt

Change-Id: I3dc6561ea7cecf8b8d998717fefa9cf8001d0f4c
Partial-Bug: #1526226
parent
dd42ff0341
commit
8bb750c0cf
|
@ -0,0 +1,536 @@
|
|||
..
|
||||
This work is licensed under a Creative Commons Attribution 3.0 Unported
|
||||
License.
|
||||
|
||||
http://creativecommons.org/licenses/by/3.0/legalcode
|
||||
|
||||
==================================================
|
||||
Enhance Power Interface for Soft Power Off and NMI
|
||||
==================================================
|
||||
|
||||
https://bugs.launchpad.net/ironic/+bug/1526226
|
||||
|
||||
The proposal presents the work required to enhance the power
|
||||
interface to support soft reboot and soft power off, and the
|
||||
management interface to support diagnostic interrupt (NMI [1]).
|
||||
|
||||
|
||||
Problem description
|
||||
===================
|
||||
The current driver interface does not provide soft power off or
diagnostic interrupt (NMI [1]) capabilities, even though ipmitool [2]
and most BMCs support them.
|
||||
|
||||
The following excerpt from the ipmitool man page describes soft power
off and diagnostic interrupt (NMI [1]).
|
||||
|
||||
$ man ipmitool::

  ...
  power

    Performs a chassis control command to view and change the
    power state.

    ...

    diag

      Pulse a diagnostic interrupt (NMI) directly to the
      processor(s).

    soft

      Initiate a soft-shutdown of OS via ACPI. This can be
      done in a number of ways, commonly by simulating an
      overtemperature or by simulating a power button press.
      It is necessary for there to be Operating System
      support for ACPI and some sort of daemon watching for
      events for this soft power to work.
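As a rough illustration of the two man page entries above, the following Python sketch builds the ipmitool command line for ``power soft`` and ``power diag``. The helper name ``ipmitool_power_command`` and the choice of the ``lanplus`` interface are assumptions made for this example only; the sketch does not claim to mirror how Ironic's ipmitool driver is structured.

```python
from typing import List

# Chassis power sub-commands listed in the ipmitool man page.
POWER_ACTIONS = ("status", "on", "off", "cycle", "reset", "diag", "soft")


def ipmitool_power_command(host: str, user: str, action: str) -> List[str]:
    """Build an ipmitool argv for a chassis power action (hypothetical helper)."""
    if action not in POWER_ACTIONS:
        raise ValueError("unsupported power action: %s" % action)
    # -I lanplus selects IPMI v2.0 over LAN; -E makes ipmitool read the
    # password from the IPMI_PASSWORD environment variable.
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-E",
            "power", action]


# Soft power off and NMI injection differ only in the last argument.
soft_off_cmd = ipmitool_power_command("192.0.2.10", "admin", "soft")
inject_nmi_cmd = ipmitool_power_command("192.0.2.10", "admin", "diag")
```

The resulting list can be handed to ``subprocess.run`` on a host that has ipmitool installed and a reachable BMC.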
|
||||
|
||||
From the customer's point of view (both tenant admin and tenant user),
the lack of soft power off and diagnostic interrupt (NMI [1]) leads to
the following inconveniences.

1. Customers cannot safely shut down or soft power off their instance
   without logging on.

2. Customers cannot take an NMI dump to investigate OS-related
   problems by themselves.
|
||||
|
||||
From the deployer's point of view (that is, the cloud provider), the
lack of the two capabilities leads to the following inconveniences.

1. Cloud provider support staff cannot safely shut down a customer's
   instance without logging on, for example for hardware maintenance.

2. Cloud provider support staff cannot ask a customer to take an NMI
   dump as investigation material.
|
||||
|
||||
|
||||
Proposed change
|
||||
===============
|
||||
To solve the problems described in the previous section, this spec
proposes to enhance the power states, the PowerInterface base class
and the ManagementInterface base class so that each driver can
implement soft reboot, soft power off and NMI injection.

This enhancement makes soft reboot, soft power off and NMI injection
available through the Ironic CLI and REST API for tenant admins and
cloud providers. It also makes them available through the Nova CLI and
REST API for tenant users once Nova's blueprint [3] is implemented.

As a reference implementation, this spec also proposes to implement
the enhanced PowerInterface base class in the IPMIPower concrete
class and the enhanced ManagementInterface base class in the
IPMIManagement concrete class.
|
||||
|
||||
|
||||
1. add the following new power states to ironic.common.states::

     SOFT_REBOOT = 'soft rebooting'
     SOFT_POWER_OFF = 'soft power off'

2. add "get_supported_power_states" method and its default implementation
   to the base PowerInterface class in ironic/drivers/base.py::

     def get_supported_power_states(self, task):
         """Get a list of the supported power states.

         :param task: A TaskManager instance containing the node to act on.
         :returns: A list of the supported power states defined
                   in :mod:`ironic.common.states`.
         """
         return [states.POWER_ON, states.POWER_OFF, states.REBOOT]

   * Note: WakeOnLanPower driver supports only states.POWER_ON.
|
||||
|
||||
3. add an optional ``timeout`` parameter, defaulting to ``None``, to the
   "set_power_state" method of the base PowerInterface class in
   ironic/drivers/base.py::

     @abc.abstractmethod
     def set_power_state(self, task, power_state, timeout=None):
         """Set the power state of the task's node.

         :param task: a TaskManager instance containing the node to act on.
         :param power_state: Any power state from :mod:`ironic.common.states`.
         :param timeout: timeout, a positive integer (> 0), for any power
           state. ``None`` means the default timeout, which depends on
           ``power_state``[*]_ and the driver.
         :raises: MissingParameterValue if a required parameter is missing.
         """

   .. [*] The default timeout for ``SOFT_REBOOT`` and ``SOFT_POWER_OFF``
      can be configured in the Ironic configuration file,
      typically /etc/ironic/ironic.conf, as follows::

        [conductor]
        # This section defines generic default timeout values.
        #
        # timeout (in seconds) of soft reboot and soft power off operation.
        # This value always has to be positive(> 0).
        # (integer value)
        soft_power_off_timeout = 600
|
||||
|
||||
4. enhance "set_power_state" method in IPMIPower class so that the
   new states can be accepted as "power_state" parameter.

   IPMIPower reference implementation supports SOFT_REBOOT and
   SOFT_POWER_OFF.

   SOFT_REBOOT is implemented as SOFT_POWER_OFF followed by a plain
   POWER_ON, in the same way that Ironic implements REBOOT. This lets
   a generic BMC detect reboot completion as the power state change
   ON -> OFF -> ON, a transition called a ``power cycle``.

   The following table shows the value of each state variable.
   ``new_state`` is the value of the second parameter of the
   set_power_state() function.
   ``power_state`` is a node property.
   ``target_power_state`` is a node property.
|
||||
|
||||
   +-----------------+--------------+--------------------+--------------+
   |new_state        | power_state  | target_power_state | power_state  |
   |                 | (start state)| (assigned value)   | (end state)  |
   +-----------------+--------------+--------------------+--------------+
   |SOFT_REBOOT      | POWER_ON     | SOFT_POWER_OFF     | POWER_OFF[*]_|
   |                 | POWER_OFF[*]_| POWER_ON           | POWER_ON     |
   +-----------------+--------------+--------------------+--------------+
   |SOFT_REBOOT      | POWER_OFF    | POWER_ON           | POWER_ON     |
   +-----------------+--------------+--------------------+--------------+
   |SOFT_POWER_OFF   | POWER_ON     | SOFT_POWER_OFF     | POWER_OFF    |
   +-----------------+--------------+--------------------+--------------+
   |SOFT_POWER_OFF   | POWER_OFF    | NONE               | POWER_OFF    |
   +-----------------+--------------+--------------------+--------------+
|
||||
|
||||
   .. [*] intermediate state of ``power cycle``.
      SOFT_REBOOT is implemented as a power cycle, like REBOOT.
|
||||
|
||||
   If a timeout or error occurs while new_state is either SOFT_REBOOT
   or SOFT_POWER_OFF, the end state becomes ERROR for logging.
|
||||
|
||||
   +-----------------+--------------+--------------------+--------------+
   |new_state        | power_state  | target_power_state | power_state  |
   |                 | (start state)| (assigned value)   | (end state)  |
   +-----------------+--------------+--------------------+--------------+
   |SOFT_REBOOT      | POWER_ON     | SOFT_POWER_OFF     | ERROR[*]_    |
   +-----------------+--------------+--------------------+--------------+
   |SOFT_POWER_OFF   | POWER_ON     | SOFT_POWER_OFF     | ERROR[*]_    |
   +-----------------+--------------+--------------------+--------------+
|
||||
|
||||
   .. [*] The ERROR state will be overwritten by the periodic power
      status sync task.
|
||||
|
||||
|
||||
5. add "get_supported_power_states" method and implementation in
   IPMIPower::

     def get_supported_power_states(self, task):
         """Get a list of the supported power states.

         :param task: A TaskManager instance containing the node to act on.
                      currently not used.
         :returns: A list of the supported power states defined
                   in :mod:`ironic.common.states`.
         """

         return [states.POWER_ON, states.POWER_OFF, states.REBOOT,
                 states.SOFT_REBOOT, states.SOFT_POWER_OFF]
|
||||
|
||||
6. add "inject_nmi" abstract method to the base ManagementInterface
   class in ironic/drivers/base.py::

     @abc.abstractmethod
     def inject_nmi(self, task):
         """Inject NMI, Non Maskable Interrupt.

         :param task: A TaskManager instance containing the node to act on.
         :returns: None
         """

7. add "inject_nmi" concrete method implementation in IPMIManagement
   class.
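The state transitions described in step 4 can be sketched as follows. This is a simplified illustration under stated assumptions, not the IPMIPower implementation: ``FakeBMC``, ``soft_power_off`` and ``soft_reboot`` are hypothetical names, and the timeout is modelled as a maximum number of status polls rather than seconds so that the example runs without real hardware.

```python
POWER_ON = 'power on'
POWER_OFF = 'power off'
ERROR = 'error'


class FakeBMC:
    """Hypothetical BMC stand-in: turns off N polls after a soft-off request."""

    def __init__(self, state=POWER_ON, polls_until_off=2):
        self.state = state
        self._polls_until_off = polls_until_off
        self._polls_left = None  # None means no shutdown in progress

    def soft_off(self):
        if self.state == POWER_ON:
            self._polls_left = self._polls_until_off

    def power_on(self):
        self.state = POWER_ON

    def get_state(self):
        if self._polls_left is not None:
            self._polls_left -= 1
            if self._polls_left <= 0:
                self.state = POWER_OFF
                self._polls_left = None
        return self.state


def soft_power_off(bmc, max_polls):
    """Return the end state: POWER_OFF on success, ERROR on timeout."""
    if bmc.get_state() == POWER_OFF:
        return POWER_OFF  # already off, nothing to request
    bmc.soft_off()
    for _ in range(max_polls):
        if bmc.get_state() == POWER_OFF:
            return POWER_OFF
    return ERROR


def soft_reboot(bmc, max_polls):
    """Soft power off followed by a plain power on (a power cycle)."""
    if soft_power_off(bmc, max_polls) == ERROR:
        return ERROR
    bmc.power_on()
    return bmc.get_state()
```

A node that starts in POWER_OFF skips the shutdown wait and is simply powered on, which matches the second SOFT_REBOOT row of the first table.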
|
||||
|
||||
|
||||
Alternatives
|
||||
------------
|
||||
* Both soft power off and diagnostic interrupt (NMI [1]) could be
  implemented through vendor passthru. However, the proposed change is
  preferable because users of the Ironic API or Ironic CLI can write
  scripts or programs uniformly.
|
||||
|
||||
|
||||
Data model impact
|
||||
-----------------
|
||||
None
|
||||
|
||||
|
||||
State Machine Impact
|
||||
--------------------
|
||||
None
|
||||
|
||||
|
||||
REST API impact
|
||||
---------------
|
||||
* Add support of SOFT_REBOOT and SOFT_POWER_OFF to the target
  parameter of the following API::

    PUT /v1/nodes/(node_ident)/states/power

  The request body accepts the following JSON data.
  ``timeout`` is an optional parameter for any ``target`` parameter.
  In case of "soft reboot" and "soft power off", ``timeout`` overrides
  ``soft_power_off_timeout`` in the Ironic configuration file,
  typically /etc/ironic/ironic.conf.

  Examples::

    {"target": "soft reboot",
     "timeout": 900}

    {"target": "soft power off",
     "timeout": 600}
|
||||
|
||||
* Add a new "supported_power_states" member to the return type Node
  and NodeStates, and enhance the following APIs::

    GET /v1/nodes/(node_ident)

    GET /v1/nodes/(node_ident)/states

  JSON example of the returned type NodeStates::

    {
      "console_enabled": false,
      "last_error": null,
      "power_state": "power on",
      "provision_state": null,
      "provision_updated_at": null,
      "target_power_state": "soft power off",
      "target_provision_state": "active",
      "supported_power_states": [
        "power on",
        "power off",
        "rebooting",
        "soft rebooting",
        "soft power off"
      ]
    }
|
||||
|
||||
  Consequently Ironic CLI "ironic node-show" and "ironic node-show-states"
  return "supported_power_states" member in the table format.

  example of "ironic node-show-states"::

    +------------------------+----------------------------------------+
    | Property               | Value                                  |
    +------------------------+----------------------------------------+
    | target_power_state     | soft power off                         |
    | target_provision_state | None                                   |
    | last_error             | None                                   |
    | console_enabled        | False                                  |
    | provision_updated_at   | 2015-08-01T00:00:00+00:00              |
    | power_state            | power on                               |
    | provision_state        | active                                 |
    | supported_power_states | ["power on", "power off", "rebooting", |
    |                        |  "soft rebooting", "soft power off"]   |
    +------------------------+----------------------------------------+
|
||||
|
||||
* Add a new management API to support inject NMI::
|
||||
|
||||
PUT /v1/nodes/(node_ident)/management/inject_nmi
|
||||
|
||||
Request doesn't take any parameter.
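To make the request shapes above concrete, here is a minimal Python sketch that assembles the method, path and JSON body for the power and inject NMI calls. The function names ``power_request`` and ``inject_nmi_request`` are hypothetical, and the client-side ``timeout`` check is an assumption for illustration; the spec itself only states that the value must be a positive integer.

```python
import json


def power_request(node_ident, target, timeout=None):
    """Return (method, path, body) for a power state change (hypothetical)."""
    body = {"target": target}
    if timeout is not None:
        # The spec requires a positive integer; bool is rejected explicitly
        # because it is a subclass of int in Python.
        if isinstance(timeout, bool) or not isinstance(timeout, int) \
                or timeout <= 0:
            raise ValueError("timeout must be a positive integer")
        body["timeout"] = timeout
    path = "/v1/nodes/%s/states/power" % node_ident
    return "PUT", path, json.dumps(body, sort_keys=True)


def inject_nmi_request(node_ident):
    """Return (method, path, body); the request takes no parameters."""
    return "PUT", "/v1/nodes/%s/management/inject_nmi" % node_ident, None


method, path, body = power_request("node-1", "soft power off", 600)
```

Omitting ``timeout`` leaves the key out of the body, so the server side would fall back to its configured default.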
|
||||
|
||||
|
||||
Client (CLI) impact
|
||||
-------------------
|
||||
* Enhance Ironic CLI "ironic node-set-power-state" to support graceful
  power off and reboot by adding optional arguments.
  This CLI is async. In order to get the latest status,
  call "ironic node-show-states" and check the returned value::

    usage: ironic node-set-power-state <node> <power-state>
                                       [--soft] [--timeout <timeout>]

    Power a node on/off/reboot, or gracefully power off/reboot a node.

    Positional arguments

      <node>

        Name or UUID of the node.

      <power-state>

        'on', 'off', 'reboot'

    Optional arguments:

      --soft

        power graceful off/reboot.

      --timeout <timeout>

        timeout positive integer value(> 0) for any ``power-state``.
        If ``--soft`` option is also specified, it overrides
        ``soft_power_off_timeout`` in the Ironic configuration
        file, typically /etc/ironic/ironic.conf.
|
||||
|
||||
|
||||
* Add a new Ironic CLI "ironic node-inject-nmi" to support NMI injection.
  This CLI is async. In order to get the latest status, serial console
  access is required::

    usage: ironic node-inject-nmi <node>

    Inject NMI, Non Maskable Interrupt.

    Positional arguments

      <node>

        Name or UUID of the node.
|
||||
|
||||
* Enhance OSC plugin "openstack baremetal node" so that the parameter
  can accept 'reboot [--soft] [--timeout <timeout>]', 'power [on|off
  [--soft] [--timeout <timeout>]]' and 'inject_nmi'.
  This CLI is async. In order to get the latest status,
  call "openstack baremetal node show" and check the returned value::

    usage: openstack baremetal node reboot [--soft] [--timeout <timeout>] <uuid>

    usage: openstack baremetal node power off [--soft] [--timeout <timeout>] <uuid>

    usage: openstack baremetal node inject_nmi <uuid>
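The ``--soft`` and ``--timeout`` options of "ironic node-set-power-state" described above could be parsed and mapped to a REST request body roughly as in the following sketch. It uses only the standard ``argparse`` module; ``build_parser``, ``to_request`` and the two mapping tables are hypothetical, and the target strings are taken from the examples in this spec rather than from the actual client code.

```python
import argparse

# REST target strings as they appear in this spec's examples (assumed mapping).
HARD_TARGETS = {"on": "power on", "off": "power off", "reboot": "rebooting"}
SOFT_TARGETS = {"off": "soft power off", "reboot": "soft reboot"}


def build_parser():
    parser = argparse.ArgumentParser(prog="ironic node-set-power-state")
    parser.add_argument("node", metavar="<node>")
    parser.add_argument("power_state", metavar="<power-state>",
                        choices=["on", "off", "reboot"])
    parser.add_argument("--soft", action="store_true")
    parser.add_argument("--timeout", type=int, metavar="<timeout>")
    return parser


def to_request(args):
    """Translate parsed CLI arguments into a request body dict."""
    if args.timeout is not None and args.timeout <= 0:
        raise ValueError("--timeout must be a positive integer")
    if args.soft:
        if args.power_state not in SOFT_TARGETS:
            raise ValueError("--soft is only valid with 'off' or 'reboot'")
        target = SOFT_TARGETS[args.power_state]
    else:
        target = HARD_TARGETS[args.power_state]
    body = {"target": target}
    if args.timeout is not None:
        body["timeout"] = args.timeout
    return body


args = build_parser().parse_args(["node-1", "off", "--soft", "--timeout", "600"])
request_body = to_request(args)
```

Rejecting ``--soft`` together with ``on`` is a design choice of this sketch; the spec does not say how that combination should be handled.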
|
||||
|
||||
RPC API impact
|
||||
--------------
|
||||
None
|
||||
|
||||
|
||||
Driver API impact
|
||||
-----------------
|
||||
The PowerInterface and ManagementInterface base classes are each
enhanced by adding a new method, as described in the section "Proposed
change".
These enhancements keep the API backward compatible.
Therefore, there is no risk of breaking out-of-tree drivers.
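A minimal sketch of how the default ``get_supported_power_states`` keeps an existing power driver working is shown below. The classes here are simplified stand-ins, not the real ``ironic.drivers.base`` code: ``LegacyPower``, ``SoftCapablePower`` and ``validate_target`` are hypothetical names, and the state strings are copied from this spec.

```python
POWER_ON, POWER_OFF, REBOOT = 'power on', 'power off', 'rebooting'
SOFT_REBOOT, SOFT_POWER_OFF = 'soft rebooting', 'soft power off'


class PowerInterface:
    """Simplified stand-in for the base class with the proposed default."""

    def get_supported_power_states(self, task):
        return [POWER_ON, POWER_OFF, REBOOT]


class LegacyPower(PowerInterface):
    """A driver written before this change; it inherits the default."""


class SoftCapablePower(PowerInterface):
    """A driver that opts in to the new soft states."""

    def get_supported_power_states(self, task):
        return (super().get_supported_power_states(task)
                + [SOFT_REBOOT, SOFT_POWER_OFF])


def validate_target(driver, task, target):
    """Reject targets the driver does not advertise (hypothetical check)."""
    if target not in driver.get_supported_power_states(task):
        raise ValueError("unsupported power state: %s" % target)
    return target
```

With a check like this, a soft power off request against ``LegacyPower`` fails early instead of reaching a driver that cannot handle it.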
|
||||
|
||||
|
||||
Nova driver impact
|
||||
------------------
|
||||
The default behavior of the "nova reboot" command for a virtual machine
instance, such as KVM, is soft reboot.
The "nova reboot" command has an option '--hard' to request a hard reboot.

However, the default behavior of "nova reboot" for an Ironic instance
is hard reboot, and the --hard option is meaningless for an Ironic instance.

Therefore, the Ironic Nova driver needs to be updated to unify the behavior
between virtual machine instances and bare-metal instances.

This problem is reported as a bug [6]. How to fix it is
specified in the nova blueprint [10] and spec [11].

The default behavior of the "nova reboot" command will be changed by
following the standard deprecation policy [12]. How to deprecate the nova
command is also specified in the nova blueprint [10] and spec [11].
|
||||
|
||||
|
||||
Ramdisk impact
|
||||
--------------
|
||||
None
|
||||
|
||||
|
||||
Security impact
|
||||
---------------
|
||||
None
|
||||
|
||||
|
||||
Other end user impact
|
||||
---------------------
|
||||
None
|
||||
|
||||
|
||||
Scalability impact
|
||||
------------------
|
||||
None
|
||||
|
||||
|
||||
Performance Impact
|
||||
------------------
|
||||
None
|
||||
|
||||
|
||||
Other deployer impact
|
||||
---------------------
|
||||
* The deployer (cloud provider) needs to set up ACPI [7] and NMI [1]
  capable bare metal servers in the cloud environment.

* If necessary, change the default timeout value (in seconds) in the
  Ironic configuration file, typically /etc/ironic/ironic.conf.
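How the configured default and a per-request ``timeout`` might interact can be sketched as follows. This example uses the standard ``configparser`` module purely as a stand-in (Ironic itself reads its configuration through oslo.config); ``load_soft_timeout`` and ``effective_timeout`` are hypothetical helpers, and the fallback of 600 seconds simply mirrors the sample value shown in this spec.

```python
import configparser

SOFT_STATES = ('soft rebooting', 'soft power off')

SAMPLE_CONF = """
[conductor]
# timeout (in seconds) of soft reboot and soft power off operation.
soft_power_off_timeout = 600
"""


def load_soft_timeout(conf_text):
    """Read soft_power_off_timeout from ini-style text (assumed layout)."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    value = parser.getint("conductor", "soft_power_off_timeout", fallback=600)
    if value <= 0:
        raise ValueError("soft_power_off_timeout must be positive")
    return value


def effective_timeout(power_state, requested, conf_text):
    """An explicit timeout wins; soft states fall back to the config value."""
    if requested is not None:
        if requested <= 0:
            raise ValueError("timeout must be a positive integer")
        return requested
    if power_state in SOFT_STATES:
        return load_soft_timeout(conf_text)
    return None  # left to the driver's own default
```

Returning ``None`` for the non-soft states reflects the statement that the default then depends on the driver.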
|
||||
|
||||
|
||||
Developer impact
|
||||
----------------
|
||||
* Each driver developer needs to follow this interface to implement
|
||||
this proposed feature.
|
||||
|
||||
|
||||
Implementation
|
||||
==============
|
||||
|
||||
Assignee(s)
|
||||
-----------
|
||||
|
||||
Primary assignee:
|
||||
Naohiro Tamura (naohirot)
|
||||
|
||||
Other contributors:
|
||||
None
|
||||
|
||||
|
||||
Work Items
|
||||
----------
|
||||
* Enhance PowerInterface class and ManagementInterface class to
  support soft power off and inject nmi [1] as described in "Proposed
  change".
|
||||
|
||||
* Enhance Ironic API as described in "REST API impact".
|
||||
|
||||
* Enhance Ironic CLI as described in "Client (CLI) impact".
|
||||
|
||||
* Implement the enhanced PowerInterface class into the concrete class
|
||||
IPMIPower, and the enhanced ManagementInterface class into the
|
||||
concrete class IPMIManagement.
|
||||
Implementing vendor's concrete class is up to each vendor.
|
||||
|
||||
* Coordinate the work with Nova NMI support "Inject NMI to an
|
||||
instance" [3] if necessary.
|
||||
|
||||
* Update the deployer documentation from the ironic perspective.
|
||||
|
||||
|
||||
Dependencies
|
||||
============
|
||||
* Soft power off control depends on ACPI [7]. On a Linux system,
  acpid [8] has to be installed. On a Windows system, the local
  security policy has to be set as described in "Shutdown: Allow
  system to be shut down without having to log on" [9].

* The reaction to an NMI [1] depends on the kernel crash dump
  configuration. How to set up the kernel dump is described for Linux
  in [13] and [14], and for Windows in [15].
|
||||
|
||||
Testing
|
||||
=======
|
||||
* Unit Tests.
|
||||
|
||||
* Tempest Tests, at least soft reboot/soft power off.
|
||||
|
||||
* Each vendor plans Third Party CI Tests if implemented.
|
||||
|
||||
|
||||
Upgrades and Backwards Compatibility
|
||||
====================================
|
||||
None (Forwards Compatibility is out of scope)
|
||||
|
||||
* Note
|
||||
The backwards compatibility issue of the default behavior change of
|
||||
"nova reboot" command is solved by following the standard deprecation
|
||||
policy [12].
|
||||
|
||||
|
||||
Documentation Impact
|
||||
====================
|
||||
* The deployer doc and REST API reference manual need to be updated.
|
||||
(CLI manual is generated automatically from source code)
|
||||
|
||||
|
||||
References
|
||||
==========
|
||||
[1] http://en.wikipedia.org/wiki/Non-maskable_interrupt
|
||||
|
||||
[2] http://linux.die.net/man/1/ipmitool
|
||||
|
||||
[3] https://review.openstack.org/#/c/187176/
|
||||
|
||||
[4] https://en.wikipedia.org/wiki/Communicating_sequential_processes
|
||||
|
||||
[5] http://linux.die.net/man/1/virsh
|
||||
|
||||
[6] https://bugs.launchpad.net/nova/+bug/1485416
|
||||
|
||||
[7] http://en.wikipedia.org/wiki/Advanced_Configuration_and_Power_Interface
|
||||
|
||||
[8] http://linux.die.net/man/8/acpid
|
||||
|
||||
[9] https://technet.microsoft.com/en-us/library/jj852274%28v=ws.10%29.aspx
|
||||
|
||||
[10] https://blueprints.launchpad.net/nova/+spec/soft-reboot-poweroff
|
||||
|
||||
[11] https://review.openstack.org/#/c/229282/
|
||||
|
||||
[12] http://governance.openstack.org/reference/tags/assert_follows-standard-deprecation.html
|
||||
|
||||
[13] https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Kernel_Crash_Dump_Guide/
|
||||
|
||||
[14] https://help.ubuntu.com/lts/serverguide/kernel-crash-dump.html
|
||||
|
||||
[15] https://support.microsoft.com/en-us/kb/927069
|
|
@ -0,0 +1 @@
|
|||
../approved/enhance-power-interface-for-soft-reboot-and-nmi.rst
|