Support for Software RAID

This spec proposes to add support for software RAID to Ironic.

Change-Id: I64fa27eac172016da5588156be3c0f2e3c5c6c31
This commit is contained in:
Arne Wiebalck 2018-12-11 15:26:34 +01:00
parent e5c3ea7b9e
commit f4ec59ae86
2 changed files with 237 additions and 0 deletions

View File

@ -0,0 +1,236 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
=========================
Support for Software RAID
=========================
https://storyboard.openstack.org/#!/story/2004581
This spec proposes to add support for the configuration of software RAIDs.
In analogy to the way hardware RAIDs are currently set up, the RAID setup
shall be done as part of the cleaning ("clean-time software RAID"). Admin
Users define the target RAID config which will be applied whenever the
node is cleaned, i.e. before it becomes available for instance creation.
In order to allow the End User to provide details on how the software RAID
shall be configured, the RAID setup should eventually become part of the
deployment steps. Integrating this into the deployment steps framework,
however, is beyond the scope of this spec.
Problem description
===================
As it is hardware agnostic, flexible, reliable, and easy to use, software RAID
has become a popular choice to protect against disk device failures - also in
production setups. Large deployments, such as the ones at Oath or CERN, rely
on software RAID for their various services.
Ironic's current lack of support for such setups requires Deployers and Admins
to withdraw to workarounds in order to provide their End Users with physical
instances based on a software RAID configuration. These workarounds may require
to maintain an additional installation infrastructure which is then either
integrated into the installation process or requires the End User to re-install
a machine a second time after it has been already provisioned by Ironic to
eventually end up with the desired configuration of the disk devices. This
increases the complexity for Deployers and Admins, and can also lead to a
decrease of the End Users' satisfaction with the overall provisioning and
installation process.
Proposed change
===============
The proposal is to extend Ironic to support software RAID by:
* using a node's ``target_raid_config`` to specify the desired s/w RAID layout
(with some restrictions, see below);
* adding support in the ``ironic-python-agent`` to understand a software
RAID config as specified in a node's ``target_raid_config`` and be able to
create and delete such configurations;
* allow the ``ironic-python-agent`` to consider s/w RAID devices for
deployment, e.g. via root device hints (considering them at all is
already addressed in [1]);
* adding support in Ironic and the ``ironic-python-agent`` to take the
necessary steps to boot from a s/w RAID, e.g. installing the boot loader
on the correct device(s).
Initially, only the following configurations will be supported for the
``target_raid_config`` as to be set by the Admin:
* a single RAID-1 spanning the available devices and serving as the deploy
target device, or
* a RAID-1 serving as the deploy target device plus a RAID-N where the RAID
level N is configurable by the Admin. N can be 0, 1, 5, 6, or 10.
The supported configurations have been limited to these two options in order
to avoid issues when booting from RAID devices. Having a (small) RAID-1 device
to boot from is a common approach when setting up more advanced RAID
configurations: a RAID-1 holder device can look like a standalone disk and does
not require the bootloader to have any knowledge or capabilities to understand
more complex RAID configurations.
Support for more than one RAID-N, support for the selection of a subset of
drives to act as holder devices, as well as support to partition the created
RAID-N device are left for follow-up enhancements and beyond the scope of
this specification.
A first prototype very close to the proposal is available from [2][3][4].
Alternatives
------------
As mentioned above, the alternative is to use other methods to create s/w RAID
setups on physical nodes and integrate these out-of-band approaches into the
provisioning workflow of individual deployments. This increases complexity on
the Deployer/Admin side and can have a negative impact on the user experience
when creating physical instances which need to have a software RAID setup..
Data model impact
-----------------
None.
State Machine Impact
--------------------
None.
REST API impact
---------------
None.
Client (CLI) impact
-------------------
None.
"ironic" CLI
~~~~~~~~~~~~
None.
"openstack baremetal" CLI
~~~~~~~~~~~~~~~~~~~~~~~~~
None.
RPC API impact
--------------
None.
Driver API impact
-----------------
The proposed functionality could be consolidated into a new RAID interface.
Nova driver impact
------------------
None.
Ramdisk impact
--------------
The ``ironic-python-agent`` will need to be able to:
* setup and clean software RAID devices
* consider software RAID devices for deployment
* configure the holder devices of the RAID-1 device in a way they are bootable
This functionality could be consolidated in an additional RAID interface.
Security impact
---------------
None.
Other end user impact
---------------------
While the predefined RAID-1 ensures that a system should be able to boot,
End Users need to be aware that the kernel of the started image needs to
be able to understand software RAID devices.
Scalability impact
------------------
None.
Performance Impact
------------------
None.
Other deployer impact
---------------------
Deployers will need to be aware that the configuration and clean up of
the RAID-N devices is only done during cleaning, so any changes require
the node to be cleaned. Also, the config is not configurable by the End
User, but limited to admins (as the target_raid_config) is a node
property. All of this, however, already holds true for hardware RAID
configurations.
Developer impact
----------------
None.
Implementation
==============
An inital proof-of-concept is available from [2][3][4].
Assignee(s)
-----------
Primary assignee:
None.
Other contributors:
Arne.Wiebalck@cern.ch (arne_wiebalck)
Work Items
----------
This is to be defined once the overall idea is accepted and there's agreement
on a design.
Dependencies
============
None.
Testing
=======
TBD
Upgrades and Backwards Compatibility
====================================
None.
Documentation Impact
====================
Documentation on how to configure a software RAID along with the limitations
outlined in 'Deployer's Impact' need to be documented.
References
==========
[1] https://review.openstack.org/#/c/592639
[2] CERN Hardware Manager: https://github.com/cernops/cern-ironic-hardware-manager/commit/7f6d892ec4848a09000ed1f28f3137bf8ba917f0
[3] Patched Ironic Python Agent: https://github.com/cernops/ironic-python-agent/commit/bddac76c4d100af0103a6bc08b81dd71681a9c02
[4] Patched Ironic: https://github.com/cernops/ironic/commit/581e65f1d8986ac3e859678cb9aadd5a5b06ba60

View File

@ -0,0 +1 @@
../approved/software-raid.rst