Merge "Deployment steps framework"
commit
a352700cd1
|
@ -0,0 +1,395 @@
|
|||
..
|
||||
This work is licensed under a Creative Commons Attribution 3.0 Unported
|
||||
License.
|
||||
|
||||
http://creativecommons.org/licenses/by/3.0/legalcode
|
||||
|
||||
==========================
|
||||
Deployment Steps Framework
|
||||
==========================
|
||||
|
||||
https://storyboard.openstack.org/#!/story/1753128
|
||||
|
||||
There is a desire for ironic to support customizable and extendable deployment
|
||||
steps, which would provide the ability to prepare bare metal nodes (servers)
|
||||
that better match the needs of the users who will be using the nodes.
|
||||
|
||||
In order to support that, we propose refactoring the existing deployment
|
||||
code in ironic into a deployment steps framework, similar to the cleaning
|
||||
steps framework.
|
||||
|
||||
Problem description
|
||||
===================
|
||||
|
||||
Presently, ironic provides a way to prepare nodes prior to them being made
|
||||
available for deployment (see `state diagram`_). This is done via `cleaning`_.
|
||||
However, it is not always possible, efficient, or effective to perform some of
|
||||
these preparations without knowing the requirements of the users of the
|
||||
nodes. In addition, there may be operations that should only be done once the
|
||||
users' requirements are known.
|
||||
|
||||
For example, during `cleaning`_, a node could be configured for RAID.
|
||||
However, this might not be the desired RAID configuration that the user of the
|
||||
node wants. Since the user's desires are only known at deployment time, a
|
||||
mechanism that allows for custom RAID configuration during deployment is
|
||||
preferred.
|
||||
|
||||
Features like custom RAID configuration, BIOS configuration, and custom
|
||||
kernel boot parameters are a few use cases that would benefit from a way
|
||||
of defining deployment steps at deploy time, in ironic.
|
||||
|
||||
It makes sense to provide support for this via deployment steps. This would
|
||||
be conceptually similar to the cleaning steps supported by ironic already.
|
||||
|
||||
Proposed change
|
||||
===============
|
||||
|
||||
This proposal is the first step in providing support for performing different
|
||||
deployment operations based on the user's desires. (The `RFE to reconfigure
|
||||
nodes on deploy using traits`_ is an example of a feature that depends on
|
||||
this work.)
|
||||
|
||||
The proposed change is to implement a deployment steps (or ``deploy steps``)
|
||||
framework that is very similar to the existing framework for automated and
|
||||
manual `cleaning`_. (This was discussed and agreed upon in principle, at the
|
||||
`OpenStack Dublin PTG`_.)
|
||||
|
||||
This change is internal to ironic. Users will not be able to affect the
|
||||
deployment process any more than they can do today.
|
||||
|
||||
Conceptually, the clean steps model is a simple idea and operators are familiar
|
||||
with it. Having similar deploy steps provides consistency and it will be easier
|
||||
for operators to adopt, due to their familiarity with clean steps. It is also
|
||||
powerful in that, at the end of the day (or year or two), a particular step
|
||||
could be a clean step, a deploy step, or both.
|
||||
|
||||
This includes refactoring code so that it can be used by both clean and deploy steps.
|
||||
|
||||
The existing deployment process will be implemented as a list of one (or more)
|
||||
deploy steps.
|
||||
|
||||
What is a deploy step?
|
||||
----------------------
|
||||
Similar to clean steps, functions that are deploy steps will be decorated
|
||||
with ``@deploy_step``, defined in ironic/drivers/base.py as follows::
|
||||
|
||||
    def deploy_step(priority, argsinfo=None):
        """Decorator for deployment steps.

        :param priority: an integer priority; used for determining the order in
            which the step is run in the deployment process. (See below,
            "When are deploy steps executed" for more details.)
        :param argsinfo: a dictionary of keyword arguments where key is the name of
            the argument and value is a dictionary as follows:

            'description': <description>. Required. This should include
                           possible values.
            'required': Boolean. Optional; default is False. True if this
                        argument is required.
        """
|
||||
|
||||
An alternative is to have one decorator that allows specifying a function
|
||||
to be a clean step and/or a deploy step, e.g.::
|
||||
|
||||
    @step(clean_priority=0, deploy_priority=0, argsinfo=None)
|
||||
|
||||
However, clean steps are abortable and deploy steps aren't (yet, see below),
|
||||
and it is unclear whether other arguments might be added for the deploy step
|
||||
decorator. Thus, it seems safer and simpler to have a separate decorator for
|
||||
deploy steps. (Having one decorator for both types of steps is left as a
|
||||
future exercise.)
|
||||
|
||||
Although ironic allows cleaning to be aborted, ironic doesn't allow the
|
||||
deployment to be aborted (although there is an `RFE to support abort in
|
||||
deploy_wait`_). Aborting a deployment is therefore outside the scope of this specification.
|
||||
|
||||
A deploy step can be implemented by any Interface, not just DeployInterface.
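As a rough illustration of how such a decorator could record its metadata, here is a minimal sketch. It is not the ironic implementation: the attribute names (``_is_deploy_step``, ``_deploy_step_priority``, ``_deploy_step_argsinfo``), the validation rules, and the ``ExampleRAIDInterface`` class are assumptions made for this example only.

```python
def deploy_step(priority, argsinfo=None):
    """Mark a method as a deploy step (illustrative sketch only)."""
    # bool is a subclass of int, so reject it explicitly.
    if (isinstance(priority, bool) or not isinstance(priority, int)
            or priority < 0):
        raise ValueError("priority must be a non-negative integer")
    if argsinfo is not None and not isinstance(argsinfo, dict):
        raise ValueError("argsinfo must be a dictionary or None")

    def decorator(func):
        # Hypothetical attribute names; the framework only needs some way
        # to find decorated methods and read their metadata later.
        func._is_deploy_step = True
        func._deploy_step_priority = priority
        func._deploy_step_argsinfo = argsinfo
        return func

    return decorator


class ExampleRAIDInterface:
    """Hypothetical interface, not part of ironic."""

    @deploy_step(priority=50, argsinfo={
        "raid_level": {"description": "RAID level, e.g. '1' or '5'.",
                       "required": True}})
    def apply_raid(self, task, raid_level):
        # Returning None signals that the step completed synchronously.
        return None
```

Because the decorator only attaches attributes and returns the original function, the decorated method stays callable as usual, and the same function could in principle carry clean-step metadata as well.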
|
||||
|
||||
When are deploy steps executed?
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Each deploy step has a priority, which is a non-negative integer. In this first phase,
|
||||
the priorities will be hard-coded. There will be no way to turn off or change
|
||||
these priorities.
|
||||
|
||||
The steps are executed from highest priority to lowest priority. Steps with
|
||||
priorities of zero (0) are not executed. A step has to be finished, before the
|
||||
next one is started.
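The ordering rule above can be sketched in a few lines. The step dictionary keys (``interface``, ``step``, ``priority``) are modelled on clean steps and are an assumption here; this specification does not define them. The tie-breaking behaviour (equal priorities keep their input order) is likewise just a property of this sketch.

```python
def order_deploy_steps(steps):
    """Return the steps that will run, in execution order."""
    # A priority of 0 means "do not execute".
    enabled = [step for step in steps if step["priority"] > 0]
    # Highest priority first. sorted() is stable, so steps with equal
    # priorities keep their relative input order.
    return sorted(enabled, key=lambda step: step["priority"], reverse=True)


steps = [
    {"interface": "deploy", "step": "deploy", "priority": 100},
    {"interface": "raid", "step": "apply_configuration", "priority": 0},
    {"interface": "bios", "step": "apply_configuration", "priority": 150},
]
ordered = order_deploy_steps(steps)
```

Here the ``bios`` step would run first, then the ``deploy`` step, and the ``raid`` step would be skipped because its priority is zero.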
|
||||
|
||||
Alternatives
|
||||
------------
|
||||
|
||||
There may be other ways to provide support for customizable deployment
|
||||
steps per user/instance, but there don't seem to be good reasons for
|
||||
having a different design from that used for clean steps.
|
||||
|
||||
We could choose not to provide support for customized deploy steps on a per
|
||||
user/instance basis. In that case, some of the current workarounds to overcome
|
||||
this problem include:
|
||||
|
||||
* have groups of nodes configured in advance (using clean steps) for each
|
||||
required combination of configurations. This could lead to strange capacity
|
||||
planning issues.
|
||||
|
||||
* executing the desired configuration steps after each node is deployed.
|
||||
As these configuration steps are executed post-deploy, most of them need a
|
||||
reboot of the node, orchestration is needed to do these reboots properly,
|
||||
and this causes performance issues that are not acceptable in a production
|
||||
environment. This approach won't work for pre-deploy steps though, such as
|
||||
RAID for the boot disk.
|
||||
|
||||
* users can create their own images for each use case. But the limitation
|
||||
is that the number of images can grow exponentially, and that there is no
|
||||
ability to match a specific type of hardware with a specific image.
|
||||
|
||||
* use a customizable DeployInterface like the `ansible`_ deploy interface
|
||||
(although the `ansible`_ deploy interface is not recommended for production
|
||||
use). This may not be able to achieve the same level of access to the
|
||||
hardware or settings, to have the same effect.
|
||||
|
||||
Data model impact
|
||||
-----------------
|
||||
|
||||
Similar to clean steps, a Node object will be updated with:
|
||||
|
||||
* a new ``deploy_step`` field: this is the current deploy step that is being
|
||||
executed or None if no steps have been executed yet. This will require an
|
||||
update to the DB.
|
||||
* ``driver_internal_info['deploy_steps']``: the list of deploy steps to be
|
||||
executed.
|
||||
* ``driver_internal_info['deploy_step_index']``: the index into the list of
|
||||
deploy steps (or None if no steps have been executed yet); this corresponds
|
||||
to ``node.deploy_step``.
|
||||
|
||||
State Machine Impact
|
||||
--------------------
|
||||
|
||||
No new state or transition will be added.
|
||||
|
||||
The state of the node will alternate from states.DEPLOYING (``deploying``) to
|
||||
states.DEPLOYWAIT (``wait call-back``) for each asynchronous deploy step.
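To show how the new node fields and the two states might interact, here is a simplified sketch of a step runner. The ``Node`` class, the ``run_deploy_steps`` function, and the choice to clear the tracking fields after the last step are assumptions for illustration; the conductor's real logic (locking, error handling, the transition out of ``deploying``) is not shown.

```python
DEPLOYING = "deploying"
DEPLOYWAIT = "wait call-back"


class Node:
    """Minimal stand-in for a node; only the fields discussed here."""

    def __init__(self, deploy_steps):
        self.provision_state = DEPLOYING
        self.deploy_step = None
        self.driver_internal_info = {
            "deploy_steps": deploy_steps,
            "deploy_step_index": None,
        }


def run_deploy_steps(node, execute):
    """Run steps until one goes asynchronous; return True when all are done."""
    info = node.driver_internal_info
    steps = info["deploy_steps"]
    index = info["deploy_step_index"]
    # Resume after the last recorded step, or start from the beginning.
    start = 0 if index is None else index + 1
    node.provision_state = DEPLOYING
    for i in range(start, len(steps)):
        info["deploy_step_index"] = i
        node.deploy_step = steps[i]
        if execute(steps[i]) == DEPLOYWAIT:
            # Asynchronous step: wait for a callback before continuing.
            node.provision_state = DEPLOYWAIT
            return False
    # Assumption: tracking fields are cleared once every step has finished.
    node.deploy_step = None
    info["deploy_step_index"] = None
    return True


node = Node([{"step": "a"}, {"step": "b"}, {"step": "c"}])
executed = []


def fake_execute(step):
    executed.append(step["step"])
    return DEPLOYWAIT if step["step"] == "b" else None


first = run_deploy_steps(node, fake_execute)   # stops after async step "b"
state_after_first = node.provision_state
second = run_deploy_steps(node, fake_execute)  # resumes with step "c"
```

If ``execute`` raised an exception instead, ``node.deploy_step`` would still point at the step that was running, which matches the intent that a failed deployment shows the failing step.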
|
||||
|
||||
REST API impact
|
||||
---------------
|
||||
|
||||
There will not be any new API methods.
|
||||
|
||||
GET /v1/nodes/*
|
||||
~~~~~~~~~~~~~~~
|
||||
The GET /v1/nodes/* requests that return information about nodes will
|
||||
be modified to also return the node's ``deploy_step`` field and the
|
||||
deploy-related information in the node's ``driver_internal_info`` field.
|
||||
|
||||
Similar to the ``clean_step`` field, the ``deploy_step`` field will be the
|
||||
current deploy step being executed, or None if there is no deployment in
|
||||
progress (or if it hasn't started yet).
|
||||
|
||||
If the deployment fails, the ``deploy_step`` field will show which step caused
|
||||
the deployment to fail.
|
||||
|
||||
This change requires a new API version. For nodes that have not yet been
|
||||
deployed using the deploy steps, the ``deploy_step`` field will be None, and
|
||||
there won't be any deploy-related entries in the ``driver_internal_info``
|
||||
field.
|
||||
|
||||
For older API versions, this ``deploy_step`` field will not be available,
|
||||
although any deploy-related entries in the ``driver_internal_info`` field will
|
||||
be shown.
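The version gating described above could look roughly like the following sketch. The function name and the version constant are placeholders: this specification does not fix the API microversion number, so ``DEPLOY_STEP_MIN_VERSION`` below is an assumed value used only to make the example runnable.

```python
# Placeholder value; the actual microversion is not stated in this spec.
DEPLOY_STEP_MIN_VERSION = (1, 44)


def node_for_api(node, requested_version):
    """Return the node representation for the requested API version."""
    body = dict(node)
    if requested_version < DEPLOY_STEP_MIN_VERSION:
        # Older API versions do not know about the deploy_step field.
        body.pop("deploy_step", None)
    # driver_internal_info is returned unchanged for every version, so any
    # deploy-related entries in it remain visible to older clients.
    return body


node = {
    "uuid": "example-node",
    "deploy_step": {"step": "deploy", "priority": 100},
    "driver_internal_info": {"deploy_step_index": 0},
}
old_view = node_for_api(node, (1, 0))
new_view = node_for_api(node, DEPLOY_STEP_MIN_VERSION)
```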
|
||||
|
||||
Client (CLI) impact
|
||||
-------------------
|
||||
The only change (when the new API version is specified) is that the response
|
||||
for a Node will include the new ``deploy_step`` field and during deployment,
|
||||
the new deploy-step-related entries in the node's ``driver_internal_info``
|
||||
field.
|
||||
|
||||
"ironic" CLI
|
||||
~~~~~~~~~~~~
|
||||
Even though this has been deprecated, responses will include the change
|
||||
described above.
|
||||
|
||||
"openstack baremetal" CLI
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
Responses will include the change described above.
|
||||
|
||||
RPC API impact
|
||||
--------------
|
||||
|
||||
None.
|
||||
|
||||
Driver API impact
|
||||
-----------------
|
||||
|
||||
Similar to cleaning, these methods will be added to the
|
||||
drivers.base.BaseInterface class::
|
||||
|
||||
    def get_deploy_steps(self, task):
        """Get a list of deploy steps this interface can perform on a node.

        :param task: a TaskManager object, useful for interfaces overriding
            this method
        :returns: a list of deploy step dictionaries
        """

    def execute_deploy_step(self, task, step):
        """Execute the deploy step on task.node.

        :param task: a TaskManager object
        :param step: The dictionary representing the step to execute
        :raises DeployStepFailed: if the step fails
        :returns: None if this method has completed synchronously, or
            states.DEPLOYWAIT if the step will continue to execute
            asynchronously.
        """
|
||||
|
||||
The actual deploy steps will be determined in the coding phase; we will start
|
||||
with one big deploy step (to get the framework in) and then break that step up
|
||||
into more steps -- determined by what makes sense given the existing code, and
|
||||
the constraints (e.g. support for out-of-tree drivers, backwards compatibility
|
||||
when a deploy step in release N is split into several steps in release N+1).
|
||||
|
||||
(This specification will be updated with the actual deploy steps, once that
|
||||
is determined.)
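One possible shape for these two methods is sketched below, assuming deploy steps are marked by a decorator that sets attributes on the method. Everything here is hypothetical: the attribute names, the ``interface_type`` attribute, the step dictionary keys, and the ``ExampleDeploy`` class are not taken from ironic, and error handling such as raising ``DeployStepFailed`` is omitted.

```python
import inspect


def deploy_step(priority, argsinfo=None):
    """Simplified stand-in for the decorator described earlier."""
    def decorator(func):
        func._is_deploy_step = True
        func._deploy_step_priority = priority
        func._deploy_step_argsinfo = argsinfo
        return func
    return decorator


class BaseInterface:
    interface_type = "base"  # hypothetical attribute naming the interface

    def get_deploy_steps(self, task):
        """Collect every method on this interface marked as a deploy step."""
        steps = []
        for name, method in inspect.getmembers(self, inspect.ismethod):
            if getattr(method, "_is_deploy_step", False):
                steps.append({
                    "interface": self.interface_type,
                    "step": name,
                    "priority": method._deploy_step_priority,
                    "argsinfo": method._deploy_step_argsinfo,
                })
        return steps

    def execute_deploy_step(self, task, step):
        """Look the step up by name and call it with its arguments."""
        method = getattr(self, step["step"])
        return method(task, **step.get("args", {}))


class ExampleDeploy(BaseInterface):
    interface_type = "deploy"

    @deploy_step(priority=100)
    def deploy(self, task):
        return None  # completed synchronously

    def validate(self, task):
        return True  # not a deploy step, so it is never collected


iface = ExampleDeploy()
found = iface.get_deploy_steps(task=None)
result = iface.execute_deploy_step(None, found[0])
```

Putting both methods on the base class means that any interface, not only a DeployInterface, can contribute steps simply by decorating its methods.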
|
||||
|
||||
Out-of-tree Interfaces
|
||||
~~~~~~~~~~~~~~~~~~~~~~
|
||||
Although the conductor will still support deployment the old way (without
|
||||
deploy steps), this support will be deprecated and removed based on the
|
||||
`standard deprecation policy
|
||||
<https://governance.openstack.org/tc/reference/tags/assert_follows-standard-deprecation.html>`_.
|
||||
(The deprecation period may be extended if there is a strong desire to do so
|
||||
by the vendors; we're flexible.)
|
||||
|
||||
For out-of-tree interfaces that don't have deploy steps, the conductor will
log a deprecation warning. The warning will state that the interface should be
updated to use deploy steps, and that any nodes being deployed the old way
must finish deploying before an upgrade to a release that no longer supports
the old way.
|
||||
|
||||
Nova driver impact
|
||||
------------------
|
||||
|
||||
None
|
||||
|
||||
Ramdisk impact
|
||||
--------------
|
||||
|
||||
There should be no impact to the ramdisk (IPA).
|
||||
|
||||
In the future, when we allow configuration and specification of deploy steps
|
||||
per node, we might provide support for collecting deploy steps from the
|
||||
ramdisk, but that is out of scope for this first phase.
|
||||
|
||||
Security impact
|
||||
---------------
|
||||
|
||||
None
|
||||
|
||||
Other end user impact
|
||||
---------------------
|
||||
|
||||
None.
|
||||
|
||||
Scalability impact
|
||||
------------------
|
||||
|
||||
None.
|
||||
|
||||
Performance Impact
|
||||
------------------
|
||||
|
||||
None.
|
||||
|
||||
Other deployer impact
|
||||
---------------------
|
||||
|
||||
None.
|
||||
|
||||
Developer impact
|
||||
----------------
|
||||
|
||||
DeployInterfaces (and any other interfaces involved in the deployment process)
|
||||
will need to be written with deploy steps in mind.
|
||||
|
||||
|
||||
Implementation
|
||||
==============
|
||||
|
||||
Assignee(s)
|
||||
-----------
|
||||
|
||||
Primary assignee:
|
||||
* rloo (Ruby Loo)
|
||||
|
||||
Work Items
|
||||
----------
|
||||
|
||||
Ironic:
|
||||
* Add support for deploy steps to base driver
|
||||
* Rework the existing code into one or more deploy steps
|
||||
* Update the conductor to get the deploy steps and execute them
|
||||
|
||||
``python-ironicclient``:
|
||||
* Add support for node.deploy_step
|
||||
|
||||
Dependencies
|
||||
============
|
||||
None.
|
||||
|
||||
Testing
|
||||
=======
|
||||
|
||||
* unit tests for all new code and changed behaviour
|
||||
* CI jobs already test the deployment process; they should continue to work
|
||||
with these changes
|
||||
|
||||
Upgrades and Backwards Compatibility
|
||||
====================================
|
||||
|
||||
* Old Interfaces will work with the new BaseInterface class because
|
||||
the code will cleanly fall back when an Interface does not support
|
||||
``get_deploy_steps()``. A deprecation warning will be logged, and we will
|
||||
remove support for the old way according to the OpenStack policy for
|
||||
deprecations & removals.
|
||||
|
||||
* Likewise, an Interface implementation with ``get_deploy_steps()`` will work
|
||||
in an older version of Ironic.
|
||||
|
||||
* In a cold upgrade:
|
||||
|
||||
* if the agent heartbeats and ``driver_internal_info['deploy_steps']`` is empty,
|
||||
proceed the old way.
|
||||
* if a deployment is started by a conductor using deploy steps (new code),
|
||||
it means all the conductors are using the new code, so the deployment
|
||||
can continue on any conductor that supports the node
|
||||
|
||||
* In a rolling upgrade:
|
||||
|
||||
* if the agent heartbeats and ``driver_internal_info['deploy_steps']`` is empty,
|
||||
proceed the old way (similar to cold upgrade)
|
||||
* a new conductor will not use the deploy steps mechanism if it is pinned to
the old release (via the ``pin_release_version`` configuration option).
If a deployment is started by a conductor using deploy steps (new code),
it means that the conductor is unpinned, and all the conductors are using the
new code, so the deployment can continue on any conductor that supports the
node.
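The upgrade rules above reduce to two small decisions, sketched here. The function names and parameters are hypothetical; in particular, how a conductor detects that it is pinned or that an interface provides deploy steps is not specified here and is represented by plain booleans.

```python
def start_with_deploy_steps(interface_has_deploy_steps, pinned_to_old_release):
    """Decide how a conductor starts a new deployment."""
    # A pinned conductor must behave like the old release, and an old
    # interface without deploy steps falls back to the old way.
    return interface_has_deploy_steps and not pinned_to_old_release


def continue_with_deploy_steps(driver_internal_info):
    """Decide how to proceed when the agent heartbeats mid-deployment."""
    # An empty or missing list means the deployment was started the old way.
    return bool(driver_internal_info.get("deploy_steps"))


started_old_way = continue_with_deploy_steps({})
started_new_way = continue_with_deploy_steps(
    {"deploy_steps": [{"step": "deploy", "priority": 100}]})
```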
|
||||
|
||||
Documentation Impact
|
||||
====================
|
||||
|
||||
* api-ref: https://developer.openstack.org/api-ref/baremetal/ will be updated
|
||||
to include the new node.deploy_step field
|
||||
|
||||
References
|
||||
==========
|
||||
|
||||
* `cleaning`_
|
||||
* `OpenStack Dublin PTG`_ etherpad
|
||||
* `RFE to reconfigure nodes on deploy using traits`_
|
||||
* `RFE to support abort in deploy_wait`_
|
||||
* `state diagram`_
|
||||
|
||||
.. _`cleaning`: https://docs.openstack.org/ironic/latest/admin/cleaning.html
|
||||
.. _`OpenStack Dublin PTG`: https://etherpad.openstack.org/p/ironic-rocky-ptg-deploy-steps
|
||||
.. _`RFE to reconfigure nodes on deploy using traits`: https://bugs.launchpad.net/ironic/+bug/1722275
|
||||
.. _`RFE to support abort in deploy_wait`: https://bugs.launchpad.net/ironic/+bug/1498251
|
||||
.. _`state diagram`: https://docs.openstack.org/ironic/latest/contributor/states.html
|
||||
.. _`ansible`: https://docs.openstack.org/ironic/latest/admin/drivers/ansible.html
|
|
@ -0,0 +1 @@
|
|||
../approved/deployment-steps-framework.rst
|