From 98886ccd37bdb4824e6d62ea065ed01f915aad16 Mon Sep 17 00:00:00 2001 From: Dmitry Tantsur Date: Thu, 4 Jun 2015 14:19:44 +0200 Subject: [PATCH] The driver composition reform This spec suggests revamping how we name and compose drivers from interfaces. It will allow having one vendor driver with options configurable per node instead of many drivers for every vendor. Partial-Bug: 1524745 Change-Id: I2754f9bb810f5e05a9c1e4625afb8f2589eeef74 --- specs/approved/driver-composition-reform.rst | 664 ++++++++++++++++++ .../driver-composition-reform.rst | 1 + 2 files changed, 665 insertions(+) create mode 100644 specs/approved/driver-composition-reform.rst create mode 120000 specs/not-implemented/driver-composition-reform.rst diff --git a/specs/approved/driver-composition-reform.rst b/specs/approved/driver-composition-reform.rst new file mode 100644 index 00000000..21528285 --- /dev/null +++ b/specs/approved/driver-composition-reform.rst @@ -0,0 +1,664 @@ +.. + This work is licensed under a Creative Commons Attribution 3.0 Unported + License. + + http://creativecommons.org/licenses/by/3.0/legalcode + +========================= +Driver composition reform +========================= + +https://bugs.launchpad.net/ironic/+bug/1524745 + +This spec suggests revamping how we name and compose drivers from interfaces. +It will allow having one vendor driver with options configurable per node +instead of many drivers for every vendor. + +Problem description +=================== + +Our driver interface matrix has become increasingly complex. To top it all off +nowadays we have many interfaces that can be used for every driver. To name a +few: + +* ``boot``: most drivers support PXE and iPXE, while some also support + virtual media; support for *petitboot* bootloader is proposed. + +* ``deploy``: two deploy approaches are supported: write image via iSCSI or + write it directly from within the agent. + +* ``inspect``: there is generic inspection using ironic-inspector, but some + drivers also allow out-of-band inspection. This feature is optional, so we + should provide a way to disable it. + +Currently we've ended up with a complex and really confusing naming scheme. +For example: + +* ``pxe_ipmitool`` uses PXE or iPXE for boot and iSCSI for deploy. + +* ``agent_ipmitool`` actually also uses PXE or iPXE, but it does not use iSCSI. + +* To top it all, ``pxe_ipmitool`` is actually using agent! + +* To reflect all the possibilities, the names would have to be more like + ``pxe_iscsi_ipmitool``, ``ipxe_iscsi_ipmitool``, ``pxe_direct_ipmitool``, + ``ipxe_direct_ipmitool``, etc. + +* Now repeat the same with every power driver we have. + +Proposed change +=============== + +Introduction +------------ + +The following concepts are used in this spec: + +**vendor** + The driving force behind a specific driver. It can be hardware vendors + or the ironic team itself in case of generic drivers, such as IPMI. + This also includes out-of-tree drivers. +**hardware interface** (or just **interface**) + A notion replacing the term "driver interface" - a set of functionality + dealing with some aspect of bare metal provisioning in a vendor-specific + way. For example, right now we have ``power``, ``deploy``, ``inspect``, + ``boot`` and a few more interfaces. +**hardware type** + A family of hardware supporting the same set of interfaces from the ironic + standpoint. This can be as wide as all hardware supporting the IPMI + protocol or as narrow as several hardware models supporting some specific + interfaces. +**driver** + A thin object containing links to hardware interfaces. Before this spec + *driver* meant roughly the same as *hardware type* means in this spec. +**classic driver** + An ironic driver from before this spec: a class with links to interfaces + hardcoded in the Python code. +**dynamic driver** + A *driver* created at run time with links to interfaces generated based on + information in the node record (including hardware type and interfaces). + +With this spec we are going to achieve the following goals: + +* Make *vendors* in charge of defining a set of supported interface + implementations and the default implementation to be used. + +* Allow *vendors* to guarantee that unsupported interface implementations + will not be used with hardware types they define. This is done by having + a hardware type list all interfaces it supports. + +* Allow 3rd parties to create out-of-tree *hardware types* that allow them to + maximize their reuse of the in-tree interface implementations. + +* Make *hardware type* definition as declarative as possible. + +* Allow a user to switch *hardware type* for a node, just as it was possible + to change a driver before this spec. + +* Allow a user to switch between *interface* implementations supported by + a *hardware type* for a node via the bare metal API. + +Configuration +------------- + +* A *hardware type* is defined as a Python class - see `Hardware Types`_ for + details. An entry point is created to provide a simple name for each + *hardware type*, for example:: + + ironic.hardware.types = + generic-ipmi = ironic.hardware.ipmi:GenericIpmiHardware + ilo-gen8 = ironic.hardware.ilo:iLOGen8Hardware + ilo-gen9 = ironic.hardware.ilo:iLOGen9Hardware + +* The list of *hardware interfaces* is still hardcoded in the Python code + and cannot be extended by plugins. The interfaces are implemented + in the same way as before this spec: by subclassing an appropriate + abstract class from `ironic.drivers.base + `_. + +* For each *hardware interface*, all implementations get their own entrypoint + and a unique name, for example:: + + ironic.hardware.interfaces.power = + ipmitool = ironic.drivers.modules.ipmitool:IpmitoolPower + +* Compatibility between *hardware types* and *hardware interface* + implementations is expressed in the Python code - see `Hardware Types`_ + for details. + +* Create a new configuration option ``enabled_hardware_types`` with a list of + enabled *hardware types*. This will not include *classic drivers* which + are enabled by the existing ``enabled_drivers`` option. + +* Create a family of configuration options ``enabled__interfaces`` + with a list of enabled implementations of each *hardware interface*. + + Note that the default implementation is implicitly enabled for each + interface of each enabled *hardware type* and does not have to be explicitly + listed in the configuration. There are several reasons for that, + the most important is to allow backward compatibility with the + ``driver`` fields as described below. + +* Change how we load drivers: instead of one singleton instance of a driver, + we'll have an instance of *dynamic driver* per node, containing links + to hardware interface implementations (just like today). + + However, interface implementations themselves will stay singletons, and will + be preloaded during the start up and stored in the conductor. + + Conductor will fail to start if any **enabled** *hardware types* or + *interface* implementations cannot be loaded (e.g. due to missing + dependencies). + + .. note:: + While it's technically possible to enable interfaces that are not used in + any of enabled *hardware types*, they will not get loaded in this case. + + The *classic drivers* will be loaded exactly as before. + +* Modify the periodic tasks collection code to also collect periodic tasks + for enabled interfaces of every enabled *hardware type*. + +* Conductor will fail to start if there is a name clash between a *classic + driver* and a *hardware type*. + +Database and Rest API +--------------------- + +* Allow the node ``driver`` field to accept the *hardware types* as well. + This will work in all API versions. + + .. note:: + There are two reasons for that: + + * Consistency: we never prevented new drivers to be used with old API + versions, and *dynamic drivers* will look mostly like new drivers to + users. + + * Usability: we plan on eventually deprecating the classic drivers. + When we remove them, all clients will need to specify the *hardware + types* when enrolling nodes. To allow older clients to continue + interacting with the API service, even as they use new driver + names (hardware types), we must continue to use the same field name and + API semantics. + +* For each interface create a new field on the ``node`` table named + ``_interface``. A migration will be needed each time + we add a new interface (which hopefully won't happen too often). + + For *hardware types* setting ``_interface`` field to ``None`` + means using the *vendor* default defined in the *hardware type*. + If the *vendor* default is ``None``, the interface will be disabled. + + Trying to set any of these fields to a value other than ``None`` will result + in an error if the ``driver`` field is set to a *classic driver*. Similarly, + all these fields are reset to ``None`` if the ``driver`` field is set to + a *classic driver*. + +* Every time ``driver`` and/or any of the interface fields is updated, + the conductor checks that the *hardware type* supports all the resulting + interfaces (except when ``driver`` is set to a *classic driver*). + + To change between two incompatible sets of interfaces, all changes should + come in one API call. E.g. for a node with the ``ilo-gen8`` *hardware type* + and ``vmedia_ilo`` boot interface the following JSON patch will be allowed:: + + [ + {"op": "replace", "path": "/boot_interface", "value": "ipxe"}, + {"op": "replace", "path": "/driver", "value": "generic-ipmi"} + ] + + but the following patch will fail because of incompatible boot interface:: + + [ + {"op": "replace", "path": "/driver", "value": "generic-ipmi"}, + ] + + .. note:: + `RFC 6902 `_ requires + a JSON patch to be atomic, because an HTTP PATCH operation must be atomic. + Meaning, it's possible for some operations to end up with an inconsistent + object as long as the end result is consistent. + + The validation will be conducted on the API service side by checking the new + ``conductor_hardware_interfaces`` database table. + +* If for some reason the existing *inteface* becomes invalid for a node (e.g. + it was disabled after the node was enrolled), it will be signalized via the + usual node validation API. The validation for this interface won't pass with + an appropriate error message. On the programming level, the driver attribute + for this interface (e.g. ``task.driver.deploy``) will be set to ``None``. + +* Update ``GET /v1/drivers`` to also list enabled *hardware types*. + This change is **not** affected by API versioning, because we allow old API + versions to use *hardware types* with the ``driver`` field. + +* Allow ``GET /v1/drivers`` to filter only *hardware types* or only *classic + drivers*. + + Update ``GET /v1/drivers/`` to report the *hardware type* + information, including the list of enabled *hardware interfaces*. + + This feature is guarded by an API version bump (as usual). + +* Allow filtering nodes by ``_interface`` fields in the node + list API. + + This feature is guarded by an API version bump (as usual). + +* Create a new table ``conductor_hardware_interfaces`` to hold the relationship + between conductors, hardware types and available interfaces. A warning will + be issued on conductor start up, if it detects that other conductors have + a different set of interfaces for the same enabled *hardware type*. + + This situation is inevitable during live upgrades, so it must not result in + an error. However, we will document that all conductors should have the same + set of interfaces for the same enabled *hardware types*. + + This table will not be exposed in the HTTP API for now. + +Deprecations +------------ + +We are **not** planning to deprecate and remove the support for *classic +drivers* in the V1 API. + +We are planning to deprecate and remove the *classic drivers* which exist +in-tree. The deprecation procedure may be tricky and will be covered by a +follow-up spec. + +Alternatives +------------ + +* We could put interfaces under a new JSON key on a node. However, we're + trying to move away from informally defined JSON keys. It would also prevent + us from being able to implement the filtering of nodes based on a particular + interface. + +* We could create a new API endpoint for updating the interfaces. This will be + inconsistent with how we update the ``driver`` field though. + + We could then create a new API version, preventing updating ``driver`` via + the regular node update API, but that would be a breaking change. + +* We could create a new field ``hardware_type`` instead of having the existing + ``driver`` field accept a *hardware type*. This was a part of the + proposal previously, but we found that it complicates things substantially + without clear benefits. + +* We could create a whole new family of API endpoints instead of reusing + ``/v1/drivers``, e.g. ``/v1/hardware-types``. However, it would require us + to replicate all driver-related functionality nearly intact, for example + driver vendor passthru. So users would have to somehow figure out which + vendor passthru endpoint to use based on what kind of a driver is in the + ``driver`` field. + +Data model impact +----------------- + +* For each interface, create a new node field ``_interface`` + initially set to ``NULL``. + +* Create a new internal table ``conductor_hardware_interfaces``: + + ``conductor_id`` - conductor ID (foreign key to conductors table), + + ``hardware_type VARCHAR(255)`` - *hardware type* entrypoint name, + + ``interface_type VARCHAR(16)`` - interface type name (e.g. ``deploy``), + + ``interface_name VARCHAR(255)`` - interface implementation entry point name. + + This table will get populated on conductor start up and purged on deleting + the conductor record. On conductor startup, during init_host(), the conductor + will fetch the list of hardware interfaces supported by all registered + conductors and compare to its own configuration. If the same *hardware type* + is enabled on two conductors with a different set of enabled_interfaces, this + will result in a WARNING log message. The enabled *hardware types* themselves + do not have to match (just like today, different conductors can have + different set of drivers). + +State Machine Impact +-------------------- + +None + +REST API impact +--------------- + +* Update ``GET /v1/drivers``: + + Return both *classic drivers* and *hardware types* no matter which API + version is used. + + New URL parameters: + + * ``type`` (string, one of ``classic``, ``dynamic``, optional) - if provided, + limit the resulting driver list to only *classic drivers* or *hardware + types* accordingly. + + New response field: + + ``type`` whether the driver is *dynamic* or *classic*. + + This change is guarded by a new API version. + +* Update ``GET /v1/drivers/``: + + New response field: + + ``type`` whether the driver is *dynamic* or *classic*. + + New response fields that are not ``None`` only for *hardware types*: + + ``default__interface`` + the entrypoint name of the default implementation for a given interface. + + ``enabled__interfaces`` + the list of entrypoint names of enabled implementations for a given + interface. + +* Update ``GET /v1/drivers//properties`` and ``GET + /v1/drivers//vendor_passthru/methods`` and the actual driver vendor + passthru call implementation: + + When requested for a *dynamic driver*, assume the default ``vendor`` + interface implementation. We will need to support non-default implementations + as well, but it goes somewhat beyond the scope of this already big spec. + +Client (CLI) impact +------------------- + +"ironic" CLI +~~~~~~~~~~~~ + +* Update the node creation command to accept one argument per interface. + Example:: + + ironic node-create --driver=ilo-gen9 --power-interface=redfish + + The same change is applied to the OSC plugin. + +* Extend the output of the ``driver-list`` command with the ``Type`` column. + +* Extend the ``driver-list`` command with ``--type`` argument, which, if + supplied, limits the driver list to only *classic drivers* (``classic`` + value) or *hardware types* (``dynamic`` value). + +* Extend the output of the ``driver-show`` command with the newly introduced + fields. + +"openstack baremetal" CLI +~~~~~~~~~~~~~~~~~~~~~~~~~ + +Similar changes to what's in `"ironic" CLI`_ are applied here. + +RPC API impact +-------------- + +* No impact on the hash ring, as both *hardware types* and *classic drivers* + are used in the same field. + +Driver API impact +----------------- + +Hardware Types +~~~~~~~~~~~~~~ + +* Create a new ``AbstractHardwareType`` class as an abstract base class for + all hardware types. Here is a simplified example implementation, using only + power, deploy and inspect interfaces:: + + import abc, six + + @six.add_metaclass(abc.ABCMeta) + class AbstractHardwareType(object): + @abc.abstractproperty + def supported_power_interfaces(self): + pass + + @abc.abstractproperty + def supported_deploy_interfaces(self): + pass + + @property + def supported_inspect_interfaces(self): + return [] + + @abc.abstractproperty + def default_power_interface(self): + pass + + @abc.abstractproperty + def default_deploy_interface(self): + pass + + @property + def default_inspect_interface(self): + return None + + Note that some interfaces (power, deploy) are mandatory, while the other + (inspect) are not. Optional interfaces will be disabled if not overridden in + a hardware type. An error indicating unsupported operation will be returned + to a user trying to access related features (e.g. inspection). + +* Create a new ``GenericHardwareType`` class which most of the actual hardware + type classes will want to subclass. This class will insert generic + implementations for some interfaces and provide meaningful defaults:: + + class GenericHardwareType(AbstractHardwareType): + supported_deploy_interfaces = [AgentDeploy] + supported_inspect_interfaces = [InspectorInspect] + default_deploy_interface = AgentDeploy + + Note that all properties contain classes, not instances. + +* Here is an example of how hardware types could be created:: + + class GenericIpmiHardware(GenericHardwareType): + supported_power_interfaces = [IpmitoolPower, IpminativePower] + default_power_interface = IpmitoolPower + + class iLOGen8Hardware(GenericHardwareType): + supported_power_interfaces = ( + GenericIpmiHardware.supported_power_interfaces + + [IloPower] + ) + supported_inspect_interfaces = ( + GenericHardwareType.supported_inspect_interfaces + + [IloInspect] + ) + default_inspect_interface = IloInspect + + class iLOGen9Hardware(iLOGen8Hardware): + supported_power_interfaces = ( + iLOGen8Hardware.supported_power_interfaces + + [RedfishPower] + ) + +.. note:: + These definitions use classes, not entrypoints names. These examples assume + the required classes are imported. + +.. note:: + The following entrypoints will have to be defined for these examples to + work:: + + ironic.hardware.types = + generic-ipmi = ironic.hardware.ipmi:GenericIpmiHardware + ilo-gen8 = ironic.hardware.ilo:iLOGen8Hardware + ilo-gen9 = ironic.hardware.ilo:iLOGen9Hardware + + ironic.hardware.interfaces.power = + ipmitool = ironic.drivers.modules.ipmitool:IpmitoolPower + ipminative = ironic.drivers.modules.ipmitool:IpminativePower + ilo = ironic.drivers.modules.ilo:IloPower + redfish = ironic.drivers.modules.redfish:RedfishPower + + ironic.hardware.interfaces.inspect = + inspector = ironic.drivers.modules.inspector:InspectorInspect + ilo = ironic.drivers.modules.ilo:IloInspect + + The following configuration will be required to enable everything in these + examples:: + + [DEFAULT] + enabled_hardware_types = generic-ipmi,ilo-gen8,ilo-gen9 + enabled_power_interfaces = ipmitool,ipminative,ilo,redfish + enabled_inspect_interfaces = inspector,ilo + +Driver Creation +~~~~~~~~~~~~~~~ + +* At start up time the conductor instantiates all enabled hardware types, + as well as all enabled interface implementations for enabled hardware types. + +* Each time the node is created or loaded from the database, a thin BareDriver_ + object is created with all interfaces set on it. This is similar to how + network drivers already work. It gets assigned to ``task.driver``, and after + that everything works as before this spec. + +Nova driver impact +------------------ + +None + +Ramdisk impact +-------------- + +None + +Security impact +--------------- + +None + +Other end user impact +--------------------- + +* End users should switch to *hardware types* over time. + +Scalability impact +------------------ + +None + +Performance Impact +------------------ + +* A driver instance will be now created per node as opposed to creating one per + conductor right now. This will somewhat increase the memory usage per node. + We can probably define __slots__ on the driver class to reduce this effect. + +Other deployer impact +--------------------- + +* A deployer can set the new ``enabled_hardware_types`` option to enable more + *hardware types*. Otherwise only the default *hardware types* and already + enabled classic drivers will be available. + +* A deployer can also set any of new ``enabled__interfaces`` options + to enable more *interfaces* for the enabled *hardware types*. + +Developer impact +---------------- + +This spec changes the way we expect the developers to write their drivers. + +* No more new *classic drivers* will be accepted in-tree as soon as this change + lands. + +* Developers should implement *hardware types* and *interfaces* to provide + new hardware support for Ironic. Built-in *interfaces* implementations will + be available for reuse both in-tree and out-of-tree. + +Implementation +============== + +Assignee(s) +----------- + +* Dmitry Tantsur (lp: divius, irc: dtantsur) +* Jim Rollenhagen (irc: jroll) + +Work Items +---------- + +* Create base classes supporting *hardware types*. + +* Create tables for tracking enabled *hardware interfaces*. + +* Load *hardware types* on conductor start up and record them in the internal + table. + +* Create node fields for *interfaces* and expose them in the API. + +* Update the drivers API to support *hardware types*. + +* Create the *hardware types* for hardware supported directly by the team, + i.e. the generic IPMI-compatible hardware. The SSH driver might be removed + soon; it won't get updated in this case. + +Dependencies +============ + +* For the vendor interface to be really pluggable, we need to `promote agent + passthru to the core API + `_. + +Testing +======= + +* Unit test coverage will obviously be provided. + +* A new gate job will be created, using a dynamic version of the IPMI driver. + We will aim to make it the primary approach in the gate over time. + +* Grenade testing for upgrades / migration of existing workloads to new + drivers. + +Upgrades and Backwards Compatibility +==================================== + +This reform is designed to be backward compatible. The *classic drivers* will +be supported for at least some time. A separate spec will cover the +deprecation of the *classic drivers*. + +We will recommend switching to using appropriate *dynamic drivers* as soon as +it's possible. + +Upgrade flow +------------ + +#. Ironic is updated to a version supporting *dynamic drivers*. + The API version used by clients is not updated yet. + +#. All nodes are still using *classic drivers*. On a node ``driver=x_y``. + +#. Users with an old API version: + + * can set ``driver`` to a *classic driver*. + * can set ``driver`` to a *hardware type*, which will result in using a + *dynamic driver* with the default set of interfaces. + +#. Users with a new API version: + + * can set ``driver`` to a *hardware type* or a *classic driver* + * can set non-default interface implementations when ``driver`` + is set to a real *hardware type* + +Documentation Impact +==================== + +* Document switching to *dynamic drivers* + +* Document creating new *hardware types* + +References +========== + +Initial etherpad: https://etherpad.openstack.org/p/liberty-ironic-driver-composition + +Newton etherpad: https://etherpad.openstack.org/p/ironic-newton-summit-driver-composition + +.. _BareDriver: http://docs.openstack.org/developer/ironic/api/ironic.drivers.base.html#ironic.drivers.base.BareDriver diff --git a/specs/not-implemented/driver-composition-reform.rst b/specs/not-implemented/driver-composition-reform.rst new file mode 120000 index 00000000..6f381592 --- /dev/null +++ b/specs/not-implemented/driver-composition-reform.rst @@ -0,0 +1 @@ +../approved/driver-composition-reform.rst \ No newline at end of file