The driver composition reform

This spec suggests revamping how we name and compose drivers from interfaces.
It will allow having one vendor driver with options configurable per node
instead of many drivers for every vendor.

Partial-Bug: 1524745
Change-Id: I2754f9bb810f5e05a9c1e4625afb8f2589eeef74
Dmitry Tantsur 2015-06-04 14:19:44 +02:00 committed by Dmitry Tantsur
parent bae354c20f
commit 98886ccd37
2 changed files with 665 additions and 0 deletions


@ -0,0 +1,664 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
=========================
Driver composition reform
=========================
https://bugs.launchpad.net/ironic/+bug/1524745
This spec suggests revamping how we name and compose drivers from interfaces.
It will allow having one vendor driver with options configurable per node
instead of many drivers for every vendor.
Problem description
===================
Our driver interface matrix has become increasingly complex. On top of that,
we now have many interfaces that can be used with every driver. To name a
few:
* ``boot``: most drivers support PXE and iPXE, while some also support
virtual media; support for *petitboot* bootloader is proposed.
* ``deploy``: two deploy approaches are supported: write image via iSCSI or
write it directly from within the agent.
* ``inspect``: there is generic inspection using ironic-inspector, but some
drivers also allow out-of-band inspection. This feature is optional, so we
should provide a way to disable it.
Currently we've ended up with a complex and really confusing naming scheme.
For example:
* ``pxe_ipmitool`` uses PXE or iPXE for boot and iSCSI for deploy.
* ``agent_ipmitool`` actually also uses PXE or iPXE, but it does not use iSCSI.
* On top of that, ``pxe_ipmitool`` actually uses the agent as well!
* To reflect all the possibilities, the names would have to be more like
``pxe_iscsi_ipmitool``, ``ipxe_iscsi_ipmitool``, ``pxe_direct_ipmitool``,
``ipxe_direct_ipmitool``, etc.
* Now repeat the same with every power driver we have.
Proposed change
===============
Introduction
------------
The following concepts are used in this spec:
**vendor**
The driving force behind a specific driver. It can be hardware vendors
or the ironic team itself in case of generic drivers, such as IPMI.
This also includes out-of-tree drivers.
**hardware interface** (or just **interface**)
A notion replacing the term "driver interface" - a set of functionality
dealing with some aspect of bare metal provisioning in a vendor-specific
way. For example, right now we have ``power``, ``deploy``, ``inspect``,
``boot`` and a few more interfaces.
**hardware type**
A family of hardware supporting the same set of interfaces from the ironic
standpoint. This can be as wide as all hardware supporting the IPMI
protocol or as narrow as several hardware models supporting some specific
interfaces.
**driver**
A thin object containing links to hardware interfaces. Before this spec
*driver* meant roughly the same as *hardware type* means in this spec.
**classic driver**
An ironic driver from before this spec: a class with links to interfaces
hardcoded in the Python code.
**dynamic driver**
A *driver* created at run time with links to interfaces generated based on
information in the node record (including hardware type and interfaces).
With this spec we are going to achieve the following goals:
* Make *vendors* in charge of defining a set of supported interface
implementations and the default implementation to be used.
* Allow *vendors* to guarantee that unsupported interface implementations
will not be used with hardware types they define. This is done by having
a hardware type list all interfaces it supports.
* Allow 3rd parties to create out-of-tree *hardware types* that allow them to
maximize their reuse of the in-tree interface implementations.
* Make *hardware type* definition as declarative as possible.
* Allow a user to switch *hardware type* for a node, just as it was possible
to change a driver before this spec.
* Allow a user to switch between *interface* implementations supported by
a *hardware type* for a node via the bare metal API.
Configuration
-------------
* A *hardware type* is defined as a Python class - see `Hardware Types`_ for
details. An entry point is created to provide a simple name for each
*hardware type*, for example::
ironic.hardware.types =
generic-ipmi = ironic.hardware.ipmi:GenericIpmiHardware
ilo-gen8 = ironic.hardware.ilo:iLOGen8Hardware
ilo-gen9 = ironic.hardware.ilo:iLOGen9Hardware
* The list of *hardware interfaces* is still hardcoded in the Python code
and cannot be extended by plugins. The interfaces are implemented
in the same way as before this spec: by subclassing an appropriate
abstract class from `ironic.drivers.base
<http://docs.openstack.org/developer/ironic/api/ironic.drivers.base.html#module-ironic.drivers.base>`_.
* For each *hardware interface*, all implementations get their own entrypoint
and a unique name, for example::
ironic.hardware.interfaces.power =
ipmitool = ironic.drivers.modules.ipmitool:IpmitoolPower
* Compatibility between *hardware types* and *hardware interface*
implementations is expressed in the Python code - see `Hardware Types`_
for details.
* Create a new configuration option ``enabled_hardware_types`` with a list of
enabled *hardware types*. This will not include *classic drivers* which
are enabled by the existing ``enabled_drivers`` option.
* Create a family of configuration options ``enabled_<INTERFACE>_interfaces``
with a list of enabled implementations of each *hardware interface*.
Note that the default implementation is implicitly enabled for each
interface of each enabled *hardware type* and does not have to be explicitly
listed in the configuration. There are several reasons for that,
the most important being backward compatibility with the
``driver`` field as described below.
* Change how we load drivers: instead of one singleton instance of a driver,
we'll have an instance of *dynamic driver* per node, containing links
to hardware interface implementations (just like today).
However, interface implementations themselves will stay singletons, and will
be preloaded during the start up and stored in the conductor.
Conductor will fail to start if any **enabled** *hardware types* or
*interface* implementations cannot be loaded (e.g. due to missing
dependencies).
.. note::
While it's technically possible to enable interfaces that are not used in
any of enabled *hardware types*, they will not get loaded in this case.
The *classic drivers* will be loaded exactly as before.
* Modify the periodic tasks collection code to also collect periodic tasks
for enabled interfaces of every enabled *hardware type*.
* Conductor will fail to start if there is a name clash between a *classic
driver* and a *hardware type*.
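The loading rules above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the proposed implementation: the ``HardwareType`` named tuple, the ``interfaces_to_load`` and ``check_name_clash`` helpers and the sample data are hypothetical, and only the power interface is modelled.

```python
import collections

# Hypothetical stand-in for a hardware type, holding entry point names only.
HardwareType = collections.namedtuple(
    'HardwareType', ['name', 'supported_power', 'default_power'])


def interfaces_to_load(enabled_types, enabled_power_interfaces):
    """Return the power interface names a conductor would preload."""
    to_load = set()
    for hw_type in enabled_types:
        # The vendor default is implicitly enabled for each enabled type.
        if hw_type.default_power is not None:
            to_load.add(hw_type.default_power)
        # Explicitly enabled interfaces are loaded only if some enabled
        # hardware type actually supports them.
        to_load.update(
            set(hw_type.supported_power) & set(enabled_power_interfaces))
    return to_load


def check_name_clash(classic_drivers, hardware_types):
    """Fail if a classic driver and a hardware type share a name."""
    clash = set(classic_drivers) & {t.name for t in hardware_types}
    if clash:
        raise RuntimeError('Name clash: %s' % ', '.join(sorted(clash)))


ipmi = HardwareType('generic-ipmi', ['ipmitool', 'ipminative'], 'ipmitool')
loaded = interfaces_to_load([ipmi], ['ipminative', 'redfish'])
print(sorted(loaded))  # prints ['ipminative', 'ipmitool']
```

In this sketch ``redfish`` is enabled in the configuration but is not loaded, because no enabled hardware type supports it, while ``ipmitool`` is loaded without being listed because it is the default.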
Database and REST API
---------------------
* Allow the node ``driver`` field to accept the *hardware types* as well.
This will work in all API versions.
.. note::
There are two reasons for that:
* Consistency: we never prevented new drivers from being used with old API
versions, and *dynamic drivers* will look mostly like new drivers to
users.
* Usability: we plan on eventually deprecating the classic drivers.
When we remove them, all clients will need to specify the *hardware
types* when enrolling nodes. To allow older clients to continue
interacting with the API service, even as they use new driver
names (hardware types), we must continue to use the same field name and
API semantics.
* For each interface create a new field on the ``node`` table named
``<interface_name>_interface``. A migration will be needed each time
we add a new interface (which hopefully won't happen too often).
For *hardware types* setting ``<interface_name>_interface`` field to ``None``
means using the *vendor* default defined in the *hardware type*.
If the *vendor* default is ``None``, the interface will be disabled.
Trying to set any of these fields to a value other than ``None`` will result
in an error if the ``driver`` field is set to a *classic driver*. Similarly,
all these fields are reset to ``None`` if the ``driver`` field is set to
a *classic driver*.
* Every time ``driver`` and/or any of the interface fields is updated,
the conductor checks that the *hardware type* supports all the resulting
interfaces (except when ``driver`` is set to a *classic driver*).
To change between two incompatible sets of interfaces, all changes should
come in one API call. E.g. for a node with the ``ilo-gen8`` *hardware type*
and ``vmedia_ilo`` boot interface the following JSON patch will be allowed::
[
{"op": "replace", "path": "/boot_interface", "value": "ipxe"},
{"op": "replace", "path": "/driver", "value": "generic-ipmi"}
]
but the following patch will fail because of incompatible boot interface::
[
{"op": "replace", "path": "/driver", "value": "generic-ipmi"}
]
.. note::
`RFC 6902 <https://tools.ietf.org/html/rfc6902#section-5>`_ requires
a JSON patch to be atomic, because an HTTP PATCH operation must be atomic.
This means that intermediate operations may leave the object in an
inconsistent state, as long as the end result is consistent.
The validation will be conducted on the API service side by checking the new
``conductor_hardware_interfaces`` database table.
* If for some reason the existing *interface* becomes invalid for a node (e.g.
it was disabled after the node was enrolled), this will be reported via the
usual node validation API: the validation for this interface will fail with
an appropriate error message. On the programming level, the driver attribute
for this interface (e.g. ``task.driver.deploy``) will be set to ``None``.
* Update ``GET /v1/drivers`` to also list enabled *hardware types*.
This change is **not** affected by API versioning, because we allow old API
versions to use *hardware types* with the ``driver`` field.
* Allow ``GET /v1/drivers`` to filter only *hardware types* or only *classic
drivers*.
Update ``GET /v1/drivers/<HW TYPE>`` to report the *hardware type*
information, including the list of enabled *hardware interfaces*.
This feature is guarded by an API version bump (as usual).
* Allow filtering nodes by ``<interface_name>_interface`` fields in the node
list API.
This feature is guarded by an API version bump (as usual).
* Create a new table ``conductor_hardware_interfaces`` to hold the relationship
between conductors, hardware types and available interfaces. A warning will
be issued on conductor start up, if it detects that other conductors have
a different set of interfaces for the same enabled *hardware type*.
This situation is inevitable during live upgrades, so it must not result in
an error. However, we will document that all conductors should have the same
set of interfaces for the same enabled *hardware types*.
This table will not be exposed in the HTTP API for now.
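The rule that a patch is validated against its end result, not after each operation, can be sketched as below. This is a simplified illustration: ``SUPPORTED_BOOT`` is a hypothetical in-memory stand-in for the ``conductor_hardware_interfaces`` table, the supported sets are invented for the example, and only the ``replace`` operation and the boot interface are handled.

```python
# Hypothetical compatibility data: hardware type -> supported boot interfaces.
SUPPORTED_BOOT = {
    'ilo-gen8': {'pxe', 'ipxe', 'vmedia_ilo'},
    'generic-ipmi': {'pxe', 'ipxe'},
}


def apply_and_validate(node, patch):
    """Apply all "replace" operations, then validate the final state."""
    updated = dict(node)
    for op in patch:
        if op['op'] != 'replace':
            raise ValueError('only "replace" is handled in this sketch')
        updated[op['path'].lstrip('/')] = op['value']
    boot = updated['boot_interface']
    # None means "use the vendor default", which is always compatible.
    if boot is not None and boot not in SUPPORTED_BOOT[updated['driver']]:
        raise ValueError('%s does not support boot interface %s'
                         % (updated['driver'], boot))
    return updated


node = {'driver': 'ilo-gen8', 'boot_interface': 'vmedia_ilo'}
# Both fields change in one call, so the end result is consistent.
ok = apply_and_validate(node, [
    {'op': 'replace', 'path': '/boot_interface', 'value': 'ipxe'},
    {'op': 'replace', 'path': '/driver', 'value': 'generic-ipmi'},
])
```

Passing only the ``/driver`` replacement to ``apply_and_validate`` raises ``ValueError`` in this sketch, since ``vmedia_ilo`` is not in the set assumed for ``generic-ipmi``.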
Deprecations
------------
We are **not** planning to deprecate and remove the support for *classic
drivers* in the V1 API.
We are planning to deprecate and remove the *classic drivers* which exist
in-tree. The deprecation procedure may be tricky and will be covered by a
follow-up spec.
Alternatives
------------
* We could put interfaces under a new JSON key on a node. However, we're
trying to move away from informally defined JSON keys. It would also prevent
us from being able to implement the filtering of nodes based on a particular
interface.
* We could create a new API endpoint for updating the interfaces. This will be
inconsistent with how we update the ``driver`` field though.
We could then create a new API version, preventing updating ``driver`` via
the regular node update API, but that would be a breaking change.
* We could create a new field ``hardware_type`` instead of having the existing
``driver`` field accept a *hardware type*. This was a part of the
proposal previously, but we found that it complicates things substantially
without clear benefits.
* We could create a whole new family of API endpoints instead of reusing
``/v1/drivers``, e.g. ``/v1/hardware-types``. However, it would require us
to replicate all driver-related functionality nearly intact, for example
driver vendor passthru. So users would have to somehow figure out which
vendor passthru endpoint to use based on what kind of a driver is in the
``driver`` field.
Data model impact
-----------------
* For each interface, create a new node field ``<interface_name>_interface``
initially set to ``NULL``.
* Create a new internal table ``conductor_hardware_interfaces``:
``conductor_id`` - conductor ID (foreign key to conductors table),
``hardware_type VARCHAR(255)`` - *hardware type* entrypoint name,
``interface_type VARCHAR(16)`` - interface type name (e.g. ``deploy``),
``interface_name VARCHAR(255)`` - interface implementation entry point name.
This table will get populated on conductor start up and purged on deleting
the conductor record. On conductor start up, during ``init_host()``, the conductor
will fetch the list of hardware interfaces supported by all registered
conductors and compare it to its own configuration. If the same *hardware type*
is enabled on two conductors with different sets of enabled interfaces, this
will result in a WARNING log message. The enabled *hardware types* themselves
do not have to match (just like today, different conductors can have
different sets of drivers).
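The start up comparison described above might look roughly like the following sketch. The ``Row`` named tuple mirrors the proposed table columns, while ``find_mismatches`` and the sample rows are hypothetical; a real implementation would read the rows from the database.

```python
import collections
import logging

LOG = logging.getLogger(__name__)

# Hypothetical row shape mirroring the proposed table columns.
Row = collections.namedtuple(
    'Row',
    ['conductor_id', 'hardware_type', 'interface_type', 'interface_name'])


def find_mismatches(own_id, rows):
    """Return (conductor_id, hardware_type) pairs that differ from ours."""
    # conductor_id -> hardware_type -> {(interface_type, interface_name)}
    table = collections.defaultdict(lambda: collections.defaultdict(set))
    for row in rows:
        table[row.conductor_id][row.hardware_type].add(
            (row.interface_type, row.interface_name))

    mismatches = []
    own = table.get(own_id, {})
    for other_id in sorted(table):
        if other_id == own_id:
            continue
        # Only hardware types enabled on both conductors are compared.
        for hw_type in sorted(set(own) & set(table[other_id])):
            if own[hw_type] != table[other_id][hw_type]:
                # A warning, not an error: this is expected during
                # live upgrades.
                LOG.warning('Conductor %s has a different set of interfaces '
                            'for hardware type %s', other_id, hw_type)
                mismatches.append((other_id, hw_type))
    return mismatches


rows = [
    Row(1, 'ilo-gen8', 'power', 'ilo'),
    Row(1, 'ilo-gen8', 'power', 'ipmitool'),
    Row(2, 'ilo-gen8', 'power', 'ilo'),
    Row(2, 'generic-ipmi', 'power', 'ipmitool'),
]
print(find_mismatches(1, rows))  # prints [(2, 'ilo-gen8')]
```

Here ``generic-ipmi`` is enabled only on conductor 2, so it is not compared at all.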
State Machine Impact
--------------------
None
REST API impact
---------------
* Update ``GET /v1/drivers``:
Return both *classic drivers* and *hardware types* no matter which API
version is used.
New URL parameters:
* ``type`` (string, one of ``classic``, ``dynamic``, optional) - if provided,
limit the resulting driver list to only *classic drivers* or *hardware
types* accordingly.
New response field:
``type`` whether the driver is *dynamic* or *classic*.
This change is guarded by a new API version.
* Update ``GET /v1/drivers/<NAME>``:
New response field:
``type`` whether the driver is *dynamic* or *classic*.
New response fields that are not ``None`` only for *hardware types*:
``default_<interface_name>_interface``
the entrypoint name of the default implementation for a given interface.
``enabled_<interface_name>_interfaces``
the list of entrypoint names of enabled implementations for a given
interface.
* Update ``GET /v1/drivers/<NAME>/properties`` and ``GET
/v1/drivers/<NAME>/vendor_passthru/methods`` and the actual driver vendor
passthru call implementation:
When requested for a *dynamic driver*, assume the default ``vendor``
interface implementation. We will need to support non-default implementations
as well, but it goes somewhat beyond the scope of this already big spec.
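As an illustration of the ``type`` URL parameter and response field, here is a minimal sketch of the filtering logic. The ``list_drivers`` function and its arguments are hypothetical; the sketch does not model API version checks.

```python
def list_drivers(classic_drivers, hardware_types, type_filter=None):
    """Return driver records, optionally limited to one driver type."""
    if type_filter not in (None, 'classic', 'dynamic'):
        raise ValueError('type must be "classic" or "dynamic"')
    result = []
    if type_filter in (None, 'classic'):
        result.extend({'name': name, 'type': 'classic'}
                      for name in sorted(classic_drivers))
    if type_filter in (None, 'dynamic'):
        # Hardware types are reported with the "dynamic" type.
        result.extend({'name': name, 'type': 'dynamic'}
                      for name in sorted(hardware_types))
    return result


drivers = list_drivers(['pxe_ipmitool'], ['generic-ipmi', 'ilo-gen8'],
                       type_filter='dynamic')
print([d['name'] for d in drivers])  # prints ['generic-ipmi', 'ilo-gen8']
```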
Client (CLI) impact
-------------------
"ironic" CLI
~~~~~~~~~~~~
* Update the node creation command to accept one argument per interface.
Example::
ironic node-create --driver=ilo-gen9 --power-interface=redfish
The same change is applied to the OSC plugin.
* Extend the output of the ``driver-list`` command with the ``Type`` column.
* Extend the ``driver-list`` command with ``--type`` argument, which, if
supplied, limits the driver list to only *classic drivers* (``classic``
value) or *hardware types* (``dynamic`` value).
* Extend the output of the ``driver-show`` command with the newly introduced
fields.
"openstack baremetal" CLI
~~~~~~~~~~~~~~~~~~~~~~~~~
Similar changes to what's in `"ironic" CLI`_ are applied here.
RPC API impact
--------------
* No impact on the hash ring, as both *hardware types* and *classic drivers*
are used in the same field.
Driver API impact
-----------------
Hardware Types
~~~~~~~~~~~~~~
* Create a new ``AbstractHardwareType`` class as an abstract base class for
all hardware types. Here is a simplified example implementation, using only
power, deploy and inspect interfaces::
    import abc

    import six


    @six.add_metaclass(abc.ABCMeta)
    class AbstractHardwareType(object):
        # Mandatory interfaces: every hardware type must define these.
        @abc.abstractproperty
        def supported_power_interfaces(self):
            pass

        @abc.abstractproperty
        def supported_deploy_interfaces(self):
            pass

        # Optional interface: disabled unless overridden.
        @property
        def supported_inspect_interfaces(self):
            return []

        @abc.abstractproperty
        def default_power_interface(self):
            pass

        @abc.abstractproperty
        def default_deploy_interface(self):
            pass

        @property
        def default_inspect_interface(self):
            return None
Note that some interfaces (power, deploy) are mandatory, while others
(inspect) are not. Optional interfaces will be disabled if not overridden in
a hardware type. An error indicating an unsupported operation will be returned
to a user trying to access related features (e.g. inspection).
* Create a new ``GenericHardwareType`` class which most of the actual hardware
type classes will want to subclass. This class will insert generic
implementations for some interfaces and provide meaningful defaults::
    class GenericHardwareType(AbstractHardwareType):
        supported_deploy_interfaces = [AgentDeploy]
        supported_inspect_interfaces = [InspectorInspect]
        default_deploy_interface = AgentDeploy
Note that all properties contain classes, not instances.
* Here is an example of how hardware types could be created::
    class GenericIpmiHardware(GenericHardwareType):
        supported_power_interfaces = [IpmitoolPower, IpminativePower]
        default_power_interface = IpmitoolPower


    class iLOGen8Hardware(GenericHardwareType):
        supported_power_interfaces = (
            GenericIpmiHardware.supported_power_interfaces
            + [IloPower]
        )
        supported_inspect_interfaces = (
            GenericHardwareType.supported_inspect_interfaces
            + [IloInspect]
        )
        default_inspect_interface = IloInspect


    class iLOGen9Hardware(iLOGen8Hardware):
        supported_power_interfaces = (
            iLOGen8Hardware.supported_power_interfaces
            + [RedfishPower]
        )
.. note::
These definitions use classes, not entry point names. These examples assume
the required classes are imported.
.. note::
The following entrypoints will have to be defined for these examples to
work::
ironic.hardware.types =
generic-ipmi = ironic.hardware.ipmi:GenericIpmiHardware
ilo-gen8 = ironic.hardware.ilo:iLOGen8Hardware
ilo-gen9 = ironic.hardware.ilo:iLOGen9Hardware
ironic.hardware.interfaces.power =
ipmitool = ironic.drivers.modules.ipmitool:IpmitoolPower
ipminative = ironic.drivers.modules.ipmitool:IpminativePower
ilo = ironic.drivers.modules.ilo:IloPower
redfish = ironic.drivers.modules.redfish:RedfishPower
ironic.hardware.interfaces.inspect =
inspector = ironic.drivers.modules.inspector:InspectorInspect
ilo = ironic.drivers.modules.ilo:IloInspect
The following configuration will be required to enable everything in these
examples::
[DEFAULT]
enabled_hardware_types = generic-ipmi,ilo-gen8,ilo-gen9
enabled_power_interfaces = ipmitool,ipminative,ilo,redfish
enabled_inspect_interfaces = inspector,ilo
Driver Creation
~~~~~~~~~~~~~~~
* At start up time the conductor instantiates all enabled hardware types,
as well as all enabled interface implementations for enabled hardware types.
* Each time the node is created or loaded from the database, a thin BareDriver_
object is created with all interfaces set on it. This is similar to how
network drivers already work. It gets assigned to ``task.driver``, and after
that everything works as before this spec.
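The per-node driver creation described above can be sketched as follows. This is a toy model under stated assumptions: ``ThinDriver`` is a hypothetical stand-in for the thin driver object, ``INTERFACES`` and ``DEFAULTS`` stand in for what the conductor would preload at start up, and the ``direct`` deploy interface name is invented for the example.

```python
class IpmitoolPower(object):
    """Hypothetical placeholder for a power interface implementation."""


class AgentDeploy(object):
    """Hypothetical placeholder for a deploy interface implementation."""


class ThinDriver(object):
    """Hypothetical stand-in for the thin per-node driver object."""

    def __init__(self, power, deploy, inspect):
        self.power = power
        self.deploy = deploy
        self.inspect = inspect


# Interface singletons, created once at conductor start up.
INTERFACES = {
    ('power', 'ipmitool'): IpmitoolPower(),
    ('deploy', 'direct'): AgentDeploy(),
}
# Vendor defaults per hardware type; None disables the interface.
DEFAULTS = {
    'generic-ipmi': {'power': 'ipmitool', 'deploy': 'direct',
                     'inspect': None},
}


def build_driver(node):
    """Create a new thin driver that links to shared interface instances."""
    defaults = DEFAULTS[node['driver']]
    chosen = {}
    for iface in ('power', 'deploy', 'inspect'):
        # A node field of None falls back to the vendor default.
        name = node.get(iface + '_interface') or defaults[iface]
        chosen[iface] = None if name is None else INTERFACES[iface, name]
    return ThinDriver(**chosen)


first = build_driver({'driver': 'generic-ipmi'})
second = build_driver({'driver': 'generic-ipmi', 'power_interface': 'ipmitool'})
print(first is second, first.power is second.power)  # prints False True
```

Each node gets its own driver object, but both drivers share the same interface instances.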
Nova driver impact
------------------
None
Ramdisk impact
--------------
None
Security impact
---------------
None
Other end user impact
---------------------
* End users should switch to *hardware types* over time.
Scalability impact
------------------
None
Performance Impact
------------------
* A driver instance will now be created per node, as opposed to one per
conductor today. This will somewhat increase the memory usage per node.
We can probably define ``__slots__`` on the driver class to reduce this effect.
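A minimal sketch of the ``__slots__`` idea, assuming a hypothetical ``SlottedDriver`` class with three interface attributes. The actual memory savings are not measured here and would depend on the real driver class.

```python
class SlottedDriver(object):
    """Hypothetical per-node driver without a per-instance __dict__."""

    # Only these attributes can be set; no instance dictionary is created.
    __slots__ = ('power', 'deploy', 'inspect')

    def __init__(self, power, deploy, inspect=None):
        self.power = power
        self.deploy = deploy
        self.inspect = inspect


driver = SlottedDriver('power-impl', 'deploy-impl')
print(hasattr(driver, '__dict__'))  # prints False
```

A side effect of this choice is that assigning an attribute not listed in ``__slots__`` raises ``AttributeError``, so every interface name would have to be declared up front.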
Other deployer impact
---------------------
* A deployer can set the new ``enabled_hardware_types`` option to enable more
*hardware types*. Otherwise only the default *hardware types* and already
enabled classic drivers will be available.
* A deployer can also set any of new ``enabled_<INTERFACE>_interfaces`` options
to enable more *interfaces* for the enabled *hardware types*.
Developer impact
----------------
This spec changes the way we expect the developers to write their drivers.
* No more new *classic drivers* will be accepted in-tree as soon as this change
lands.
* Developers should implement *hardware types* and *interfaces* to provide
new hardware support for Ironic. Built-in *interface* implementations will
be available for reuse both in-tree and out-of-tree.
Implementation
==============
Assignee(s)
-----------
* Dmitry Tantsur (lp: divius, irc: dtantsur)
* Jim Rollenhagen (irc: jroll)
Work Items
----------
* Create base classes supporting *hardware types*.
* Create tables for tracking enabled *hardware interfaces*.
* Load *hardware types* on conductor start up and record them in the internal
table.
* Create node fields for *interfaces* and expose them in the API.
* Update the drivers API to support *hardware types*.
* Create the *hardware types* for hardware supported directly by the team,
i.e. the generic IPMI-compatible hardware. The SSH driver might be removed
soon; it won't get updated in this case.
Dependencies
============
* For the vendor interface to be really pluggable, we need to `promote agent
passthru to the core API
<http://specs.openstack.org/openstack/ironic-specs/specs/approved/agent-api.html>`_.
Testing
=======
* Unit test coverage will obviously be provided.
* A new gate job will be created, using a dynamic version of the IPMI driver.
We will aim to make it the primary approach in the gate over time.
* Grenade testing for upgrades / migration of existing workloads to new
drivers.
Upgrades and Backwards Compatibility
====================================
This reform is designed to be backward compatible. The *classic drivers* will
be supported for at least some time. A separate spec will cover the
deprecation of the *classic drivers*.
We will recommend switching to using appropriate *dynamic drivers* as soon as
it's possible.
Upgrade flow
------------
#. Ironic is updated to a version supporting *dynamic drivers*.
The API version used by clients is not updated yet.
#. All nodes are still using *classic drivers*, e.g. a node has ``driver=x_y``.
#. Users with an old API version:
* can set ``driver`` to a *classic driver*.
* can set ``driver`` to a *hardware type*, which will result in using a
*dynamic driver* with the default set of interfaces.
#. Users with a new API version:
* can set ``driver`` to a *hardware type* or a *classic driver*
* can set non-default interface implementations when ``driver``
is set to a real *hardware type*
Documentation Impact
====================
* Document switching to *dynamic drivers*
* Document creating new *hardware types*
References
==========
Initial etherpad: https://etherpad.openstack.org/p/liberty-ironic-driver-composition
Newton etherpad: https://etherpad.openstack.org/p/ironic-newton-summit-driver-composition
.. _BareDriver: http://docs.openstack.org/developer/ironic/api/ironic.drivers.base.html#ironic.drivers.base.BareDriver


@ -0,0 +1 @@
../approved/driver-composition-reform.rst