Add manual cleaning to documentation

This updates the documentation to include manual cleaning.

Change-Id: I8f91214911e8916c329c20a140e1d0957b1cc137
Partial-Bug: #1526290
Ruby Loo 2016-01-06 16:45:13 +00:00 committed by Ruby Loo
parent a70b5365d3
commit f2d9886f99
6 changed files with 219 additions and 61 deletions


@ -6,33 +6,181 @@ Node cleaning
Overview
========
Ironic provides two modes for node cleaning: ``automated`` and ``manual``.
``Automated cleaning`` is automatically performed before the first
workload has been assigned to a node and when hardware is recycled from
one workload to another.
``Manual cleaning`` must be invoked by the operator.
.. _automated_cleaning:
Automated cleaning
==================
When hardware is recycled from one workload to another, ironic performs
cleaning on the node to ensure it's ready for another workload. This ensures
the tenant will get a consistent bare metal node deployed every time.
automated cleaning on the node to ensure it's ready for another workload. This
ensures the tenant will get a consistent bare metal node deployed every time.
Ironic implements cleaning by collecting a list of steps to perform on a node
from each Power, Deploy, and Management driver assigned to the node. These
steps are then arranged by priority and executed on the node when it is moved
to cleaning state, if cleaning is enabled.
Ironic implements automated cleaning by collecting a list of cleaning steps
to perform on a node from the Power, Deploy, Management, and RAID interfaces
of the driver assigned to the node. These steps are then ordered by priority
and executed on the node when the node is moved
to ``cleaning`` state, if automated cleaning is enabled.
Typically, nodes move to cleaning state when moving from active -> available.
Nodes also traverse cleaning when going from manageable -> available. For a
full understanding of all state transitions into cleaning, please see
:ref:`states`.
With automated cleaning, nodes move to ``cleaning`` state when moving from
``active`` -> ``available`` state (when the hardware is recycled from one
workload to another). Nodes also traverse cleaning when going from
``manageable`` -> ``available`` state (before the first workload is
assigned to the nodes). For a full understanding of all state transitions
into cleaning, please see :ref:`states`.
Ironic added support for cleaning nodes in the Kilo release.
Ironic added support for automated cleaning in the Kilo release.
.. _enabling-cleaning:
Enabling cleaning
=================
To enable cleaning, ensure your ironic.conf is set as follows: ::
Enabling automated cleaning
---------------------------
To enable automated cleaning, ensure that your ironic.conf is set as follows.
(Prior to Mitaka, this option was named 'clean_nodes'.)::
[conductor]
automated_clean=true
This will enable the default set of steps, based on your hardware and ironic
drivers. If you're using an agent_* driver, this includes, by default, erasing
all of the previous tenant's data.
This will enable the default set of cleaning steps, based on your hardware and
ironic drivers. If you're using an agent_* driver, this includes, by default,
erasing all of the previous tenant's data.
You may also need to configure a `Cleaning Network`_.
Cleaning steps
--------------
Cleaning steps used for automated cleaning are ordered from higher to lower
priority, where a larger integer is a higher priority. In case of a conflict
between priorities across drivers, the following resolution order is used:
Power, Management, Deploy, and RAID interfaces.
You can skip a cleaning step by setting the priority for that cleaning step
to zero or 'None'.
You can reorder the cleaning steps by modifying the integer priorities of the
cleaning steps.
See `How do I change the priority of a cleaning step?`_ for more information.
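The ordering rules above can be sketched in a few lines of Python. This is only an illustration of the described behaviour, not ironic's actual implementation: the function name, the ``INTERFACE_ORDER`` table, the priorities, and the two ``hypothetical_*`` step names are all assumptions made for the example.

```python
# Hypothetical sketch of the ordering rules described above; this is not
# ironic's real code. Tie-break order between interfaces, as documented.
INTERFACE_ORDER = {"power": 0, "management": 1, "deploy": 2, "raid": 3}


def order_clean_steps(steps):
    """Return the enabled steps in the order automated cleaning would run them."""
    # Steps with a priority of 0 or None are skipped.
    enabled = [s for s in steps if s.get("priority")]
    # Larger priority first; on a tie, fall back to the interface order.
    return sorted(
        enabled,
        key=lambda s: (-s["priority"], INTERFACE_ORDER[s["interface"]]),
    )


# Made-up priorities, chosen to show a tie and a skipped step.
steps = [
    {"interface": "deploy", "step": "erase_devices", "priority": 10},
    {"interface": "raid", "step": "hypothetical_raid_step", "priority": 0},
    {"interface": "management", "step": "hypothetical_mgmt_step", "priority": 10},
    {"interface": "power", "step": "hypothetical_power_step", "priority": 20},
]
ordered = [s["step"] for s in order_clean_steps(steps)]
print(ordered)
```

Here the RAID step is dropped because its priority is 0, and the management step runs before ``erase_devices`` because the two share a priority and Management precedes Deploy in the resolution order.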
Manual cleaning
===============
``Manual cleaning`` is typically used to handle long running, manual, or
destructive tasks that an operator wishes to perform either before the first
workload has been assigned to a node or between workloads. When initiating a
manual clean, the operator specifies the cleaning steps to be performed.
Manual cleaning can only be performed when a node is in the ``manageable``
state. Once the manual cleaning is finished, the node will be put in the
``manageable`` state again.
Ironic added support for manual cleaning in the 4.4 (Mitaka series)
release.
Setup
-----
In order for manual cleaning to work, you may need to configure a
`Cleaning Network`_.
Starting manual cleaning via API
--------------------------------
Manual cleaning can only be performed when a node is in the ``manageable``
state. The REST API request to initiate it is available in API version 1.15 and
higher::
PUT /v1/nodes/<node_ident>/states/provision
(Additional information is available `here <http://docs.openstack.org/developer/ironic/webapi/v1.html#nodes>`_.)
This API will allow operators to put a node directly into ``cleaning``
provision state from ``manageable`` state via 'target': 'clean'.
The PUT will also require the argument 'clean_steps' to be specified. This
is an ordered list of cleaning steps. A cleaning step is represented by a
dictionary (JSON), in the form::
{
'interface': <interface>,
'step': <name of cleaning step>,
'args': {<arg1>: <value1>, ..., <argn>: <valuen>}
}
The 'interface' and 'step' keys are required for all steps. If a cleaning step
method takes keyword arguments, the 'args' key may be specified. It
is a dictionary of keyword variable arguments, with each keyword-argument entry
being <name>: <value>.
If any step is missing a required keyword argument, manual cleaning will not be
performed and the node will be put in ``clean failed`` provision state with an
appropriate error message.
If, during the cleaning process, a cleaning step determines that it has
incorrect keyword arguments, all earlier steps will be performed and then the
node will be put in ``clean failed`` provision state with an appropriate error
message.
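An operator may want to check the shape of the clean steps before sending them. The following is a minimal client-side sketch, assuming a hypothetical helper named ``find_clean_step_problems``; it only checks the keys described above and cannot know which keyword arguments a particular cleaning step requires, so ironic's own validation still applies.

```python
# Hypothetical pre-flight check; ironic performs its own validation.
REQUIRED_KEYS = ("interface", "step")


def find_clean_step_problems(clean_steps):
    """Return human-readable problems found in a list of clean steps."""
    problems = []
    for index, step in enumerate(clean_steps):
        # 'interface' and 'step' are required for every step.
        for key in REQUIRED_KEYS:
            if key not in step:
                problems.append("step %d: missing required key %r" % (index, key))
        # 'args' is optional, but when present it maps argument names to values.
        if "args" in step and not isinstance(step["args"], dict):
            problems.append("step %d: 'args' must be a dictionary" % index)
    return problems


good = [{"interface": "deploy", "step": "erase_devices"}]
bad = [{"step": "erase_devices", "args": ["not", "a", "dict"]}]
print(find_clean_step_problems(good))
print(find_clean_step_problems(bad))
```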
An example of the request body for this API::
{
"target":"clean",
"clean_steps": [{
"interface": "raid",
"step": "create_configuration",
"args": {"create_nonroot_volumes": false}
},
{
"interface": "deploy",
"step": "erase_devices"
}]
}
In the above example, the driver's RAID interface would configure hardware
RAID without non-root volumes, and then all devices would be erased
(in that order).
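As a rough illustration, the request could be assembled with Python's standard library as shown below. The endpoint URL, the ``NODE_UUID`` placeholder, and the helper name are assumptions for the example; authentication is omitted, ``create_nonroot_volumes`` is passed as a boolean, and the request is only built, not sent.

```python
import json
import urllib.request


def build_manual_clean_request(base_url, node_ident, clean_steps):
    """Build (but do not send) the PUT request that starts manual cleaning."""
    body = {"target": "clean", "clean_steps": clean_steps}
    return urllib.request.Request(
        url="%s/v1/nodes/%s/states/provision" % (base_url.rstrip("/"), node_ident),
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "application/json",
            # Manual cleaning needs API version 1.15 or higher.
            "X-OpenStack-Ironic-API-Version": "1.15",
        },
    )


request = build_manual_clean_request(
    "http://ironic.example.com:6385",  # assumed endpoint, replace with your own
    "NODE_UUID",  # placeholder node identifier
    [
        {
            "interface": "raid",
            "step": "create_configuration",
            "args": {"create_nonroot_volumes": False},
        },
        {"interface": "deploy", "step": "erase_devices"},
    ],
)
print(request.get_method(), request.full_url)
```

Sending it would be a matter of adding an authentication header and passing ``request`` to ``urllib.request.urlopen``; the steps run in the order they appear in the list.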
Starting manual cleaning via ``ironic`` CLI
-------------------------------------------
Manual cleaning is supported in the ``ironic node-set-provision-state``
command, starting with python-ironicclient 1.2.
The target/verb is 'clean' and the argument 'clean-steps' must be specified.
Its value is one of:
- a JSON string
- path to a JSON file whose contents are passed to the API
- '-', to read from stdin. This allows piping in the clean steps.
Using '-' to signify stdin is common in Unix utilities.
Keep in mind that manual cleaning is only supported in API version 1.15 and
higher.
An example of doing this with a JSON string::
ironic --ironic-api-version 1.15 node-set-provision-state \
$node_ident clean --clean-steps '{"clean_steps": [...]}'
Or with a file::
ironic --ironic-api-version 1.15 node-set-provision-state \
$node_ident clean --clean-steps my-clean-steps.txt
Or with stdin::
cat my-clean-steps.txt | ironic --ironic-api-version 1.15 \
node-set-provision-state $node_ident clean --clean-steps -
Cleaning Network
================
If you are using the Neutron DHCP provider (the default) you will also need to
ensure you have configured a cleaning network. This network will be used to
@ -73,38 +221,48 @@ FAQ
How are cleaning steps ordered?
-------------------------------
Cleaning steps are ordered by integer priority, where a larger integer is a
higher priority. In case of a conflict between priorities across drivers,
the following resolution order is used: Power, Management, Deploy.
For automated cleaning, cleaning steps are ordered by integer priority, where
a larger integer is a higher priority. In case of a conflict between priorities
across drivers, the following resolution order is used: Power, Management,
Deploy, and RAID interfaces.
For manual cleaning, the cleaning steps should be specified in the desired
order.
How do I skip a cleaning step?
------------------------------
Cleaning steps with a priority of 0 or None are skipped.
For automated cleaning, cleaning steps with a priority of 0 or None are skipped.
How do I change the priority of a cleaning step?
------------------------------------------------
For manual cleaning, specify the cleaning steps in the desired order.
For automated cleaning, it depends on whether the cleaning steps are
out-of-band or in-band.
Most out-of-band cleaning steps have an explicit configuration option for
priority.
Changing the priority of an in-band (ironic-python-agent) cleaning step
currently requires use of a custom HardwareManager. The only exception is
erase_devices, which can have its priority set in ironic.conf. For instance,
to disable erase_devices, you'd use the following config::
requires use of a custom HardwareManager. The only exception is
``erase_devices``, which can have its priority set in ironic.conf. For instance,
to disable erase_devices, you'd set the following configuration option::
[deploy]
erase_devices_priority=0
To enable/disable the in-band disk erase using ``agent_ilo`` driver, use the
following config::
following configuration option::
[ilo]
clean_priority_erase_devices=0
Generic hardware manager first tries to perform ATA disk erase by using
The generic hardware manager first tries to perform ATA disk erase by using
the ``hdparm`` utility. If ATA disk erase is not supported, it performs
software-based disk erase using the ``shred`` utility. By default, the number of
iterations performed by ``shred`` for software-based disk erase is 1. To configure
the number of iterations, use the following config::
the number of iterations, use the following configuration option::
[deploy]
erase_devices_iterations=1
@ -115,14 +273,14 @@ What cleaning step is running?
To check what cleaning step the node is performing or attempted to perform and
failed, either query the node endpoint for the node or run ``ironic node-show
$node_ident`` and look in the `driver_internal_info` field. The `clean_steps`
field will contain a list of all remaining steps with their priority, and the
field will contain a list of all remaining steps with their priorities, and the
first one listed is the step currently in progress or the step the node failed on
before going into cleanfail state.
before going into ``clean failed`` state.
Should I disable cleaning?
--------------------------
Cleaning is recommended for ironic deployments, however, there are some
tradeoffs to having it enabled. For instance, ironic cannot deploy a new
Should I disable automated cleaning?
------------------------------------
Automated cleaning is recommended for ironic deployments; however, there are
some tradeoffs to having it enabled. For instance, ironic cannot deploy a new
instance to a node that is currently cleaning, and cleaning can be a
time-consuming process. To mitigate this, we suggest using disks with support for
cryptographic ATA Security Erase, as typically the erase_devices step in the
@ -138,17 +296,18 @@ cleaning.
Troubleshooting
===============
If cleaning fails on a node, the node will be put into cleanfail state and
placed in maintenance mode, to prevent ironic from taking actions on the
If cleaning fails on a node, the node will be put into ``clean failed`` state
and placed in maintenance mode, to prevent ironic from taking actions on the
node.
Nodes in cleanfail will not be powered off, as the node might be in a state
such that powering it off could damage the node or remove useful information
about the nature of the cleaning failure.
Nodes in ``clean failed`` will not be powered off, as the node might be in a
state such that powering it off could damage the node or remove useful
information about the nature of the cleaning failure.
A cleanfail node can be moved to manageable state, where they cannot be
scheduled by nova and you can safely attempt to fix the node. To move a node
from cleanfail to manageable: ``ironic node-set-provision-state manage``.
A ``clean failed`` node can be moved to ``manageable`` state, where it cannot
be scheduled by nova and you can safely attempt to fix the node. To move a node
from ``clean failed`` to ``manageable``:
``ironic node-set-provision-state $node_ident manage``.
You can now take actions on the node, such as replacing a bad disk drive.
Strategies for determining why a cleaning step failed include checking the
@ -156,8 +315,8 @@ ironic conductor logs, viewing logs on the still-running ironic-python-agent
(if an in-band step failed), or performing general hardware troubleshooting on
the node.
When the node is repaired, you can move the node back to available state, to
allow it to be scheduled by nova.
When the node is repaired, you can move the node back to ``available`` state,
to allow it to be scheduled by nova.
::
@ -167,5 +326,5 @@ allow it to be scheduled by nova.
# Now, make the node available for scheduling by nova
ironic node-set-provision-state $node_ident provide
The node will begin cleaning from the start, and move to available state
when complete.
The node will begin automated cleaning from the start, and move to
``available`` state when complete.


@ -658,9 +658,8 @@ Configure the Bare Metal service for cleaning
[neutron]
...
# UUID of the network to create Neutron ports on when booting
# to a ramdisk for cleaning/zapping using Neutron DHCP (string
# value)
# UUID of the network to create Neutron ports on, when booting
# to a ramdisk for cleaning using Neutron DHCP. (string value)
#cleaning_network_uuid=<None>
cleaning_network_uuid = NETWORK_UUID
@ -1731,7 +1730,7 @@ To move a node from ``enroll`` to ``manageable`` provision state::
+------------------------+--------------------------------------------------------------------+
When a node is moved from the ``manageable`` to ``available`` provision
state, the node will be cleaned if configured to do so (see
state, the node will go through automated cleaning if configured to do so (see
:ref:`CleaningNetworkSetup`).
To move a node from ``manageable`` to ``available`` provision state::


@ -84,13 +84,13 @@ upgrade has completed.
Cleaning
--------
A new feature in Kilo is support for the cleaning of nodes between workloads to
ensure the node is ready for another workload. This can include erasing the
hard drives, updating firmware, and other steps. For more information, see
:ref:`cleaning`.
A new feature in Kilo is support for the automated cleaning of nodes between
workloads to ensure the node is ready for another workload. This can include
erasing the hard drives, updating firmware, and other steps. For more
information, see :ref:`automated_cleaning`.
If Ironic is configured with cleaning enabled (defaults to True) and to use
Neutron as the DHCP provider (also the default), you will need to set the
If Ironic is configured with automated cleaning enabled (defaults to True) and
to use Neutron as the DHCP provider (also the default), you will need to set the
`cleaning_network_uuid` option in the Ironic configuration file before starting
the Kilo Ironic service. See :ref:`CleaningNetworkSetup` for information on
how to set up the cleaning network for Ironic.


@ -114,7 +114,7 @@ Additional requirements
* BIOS must try next boot device if PXE boot failed
* Cleaning should be disabled, see :ref:`cleaning`
* Automated cleaning should be disabled, see :ref:`automated_cleaning`
* Node should be powered off before start of deploy


@ -61,10 +61,10 @@ API Versions History
Newly registered nodes begin in the ``enroll`` provision state by default,
instead of ``available``. To get them to the ``available`` state,
the ``manage`` action must first be ran, to verify basic hardware control.
On success the node moves to ``manageable`` provision state, then the
``provide`` action must be run, which will clean the node and
make it available.
the ``manage`` action must first be run to verify basic hardware control.
On success the node moves to ``manageable`` provision state. Then the
``provide`` action must be run. Automated cleaning of the node is done and
the node is made ``available``.
**1.10**


@ -2,4 +2,4 @@
features:
- Adds support for manual cleaning. This is available with API
version 1.15. For more information, see
http://specs.openstack.org/openstack/ironic-specs/specs/approved/manual-cleaning.html
http://docs.openstack.org/developer/ironic/deploy/cleaning.html#manual-cleaning