Promote agent vendor passthru to core API

Change-Id: Ifa4a1936845198a61d7901b9580829fd1ccde65b
Related-Bug: #1570841
This commit is contained in:
Dmitry Tantsur 2016-04-15 15:18:37 +02:00
parent d5db9ec171
commit b528dc9189
2 changed files with 313 additions and 0 deletions

View File

@ -0,0 +1,312 @@
..
This work is licensed under a Creative Commons Attribution 3.0 Unported
License.
http://creativecommons.org/licenses/by/3.0/legalcode
=========================================
Promote agent vendor passthru to core API
=========================================
https://bugs.launchpad.net/ironic/+bug/1570841
This spec suggests making the current agent vendor passthru API (lookup and
heartbeat) first class API endpoints and deprecate the agent vendor passthru
interface.
Problem description
===================
The vendor passthru was designed for vendors to place their specific features
before they get wider adoption in Ironic and get promoted to the core API.
However, when IPA is used (which is the only ramdisk available in-tree), these
two API endpoints play the critical role in both deployment and cleaning
processes. Thus every IPA-based driver must mix agent vendor passthru into
its vendor passthru.
There was a bug in the drac driver when it was not done, and the driver
did not work with IPA.
This proposal also tries to reduce the amount of data sent to and from
unauthenticated endpoints. The current vendor passthru API accepts the whole
inventory and returns the whole node record, including IPMI credentials.
Proposed change
===============
* Create new API endpoints for lookup and heartbeat - see `REST API impact`_
for details.
* Extend the deploy interface with a heartbeat method - see
`Driver API impact`_ for details.
Alternatives
------------
* Continue doing what we do now.
* Make the lookup call driver-dependent (as the passthru used to be).
This looks like unnecessary complication (e.g. we have to pass a driver to
IPA from the conductor nowadays).
* We could use this change to move the unauthenticated endpoints away from the
main API completely. This could be done by introducing a new API service,
say ``ironic-agent-api``, serving only these two endpoints. Then we will
recommend operators to make this service only listen on the provisioning
network, but not on the network accessible to users.
Arguably the same result can be achieved by configuring a WSGI container
(Apache mod_wsgi or similar), so it might not be worth complication.
Data model impact
-----------------
None
State Machine Impact
--------------------
None
REST API impact
---------------
Two new endpoints are added. Both endpoints are **NOT** authenticated.
* ``GET /v1/lookup?addresses=MAC1,MAC2&node_uuid=UUID``
Look up node details for further use in the ramdisk.
No body; at least one of the following URL parameters must be present:
* ``addresses`` comma-separated list of hardware addresses (e.g. MAC) from
the node for lookup;
* ``node_uuid`` node UUID, if known (e.g. by inspection).
If ``node_uuid`` is present, ``addresses`` are ignored.
By default only return a node if it's in one of transient states:
``deploying``, ``deploy wait``, ``cleaning``, ``clean wait``,
``inspecting``, ``inspect wait``. Deployers who need lookup to always work
will be able to set a new option ``[api]restrict_lookup`` to ``False``.
.. note::
In theory, we don't need ``-ing`` states here either. But when we reboot
during cleaning, we don't currently reset the state to ``clean wait``.
The other ``-ing`` states are supported in case 3rd party drivers have
similar restrictions.
Response: HTTP 200 with JSON body containing keys:
* ``config`` dictionary for passing configuration options from conductor
to the ramdisk. For the IPA ramdisk only one is currently used:
* ``heartbeat_timeout`` timeout (in seconds) between heart beats from
the ramdisk, expected by Ironic.
* ``node`` partial node representation as a JSON object, with the following
fields sent:
* ``properties`` for root device hints,
* ``instance_info`` for disk sizing details,
* ``uuid`` node UUID,
* ``driver_internal_info`` for passing other runtime information.
More fields can be exposed with time with an appropriate API version bump.
Error codes:
* 400 - bad request,
* 404 - a node was not found.
* ``POST /v1/heartbeat/<UUID>``
Record a heartbeat message from the ramdisk.
Body is a JSON with fields:
* ``callback_url`` - the IPA URL to call back. Note that for potential
non-IPA-based drivers it might have a different meaning (e.g. if we agree
on the ansible driver, this can be an SSH "URL" for it).
Response: HTTP 202 with no body.
Error codes:
* 400 - bad request,
* 404 - a node was not found,
* 409 - node is locked (should be retried by the ramdisk).
A new API version will be introduced to cover both endpoints.
Client (CLI) impact
-------------------
Both endpoints will be exposed in the Python API for the ironic client as::
ironic.node.lookup(addresses, node_uuid=None)
ironic.node.heartbeat(node_uuid, callback_url)
However, as they are not intended for end users, they will not be exposed in
both CLI.
"ironic" CLI
~~~~~~~~~~~~
None
"openstack baremetal" CLI
~~~~~~~~~~~~~~~~~~~~~~~~~
None
RPC API impact
--------------
A new RPC call is created to connect the heart beat API endpoint and
the new deploy driver method: ``heartbeat`` (async).
Driver API impact
-----------------
A new method is added to the deploy driver interface:
::
def heartbeat(self, task, callback_url):
"""Record a heart beat for the node.
:param task: a task manager task
:param callback_url: a URL to use to call to the ramdisk
:return: None
"""
LOG.warning('Got hearbeat message from node %(node)s, but the driver '
'%(driver)s does not support heartbeating',
{'node': task.node.uuid, 'driver': task.node.driver})
The heartbeat method from ``BaseAgentVendor`` will be refactored to a separate
mix-in class for reusing in both ``AgentDeploy`` and ``BaseAgentVendor``.
The new method will not be abstract to allow drivers that use a different
approach (e.g. which do not have a ramdisk at all). The default implementation
will do nothing to account for deploy drivers which do not need heart beats.
The new method will receive a shared node lock. It is up to the implementation
to upgrade the lock to exclusive, if required.
Nova driver impact
------------------
None
Security impact
---------------
* This change will expose unauthenticated API to lookup a node by its MAC
addresses. It does not have any impact on most deployments, as both in-tree
deploy methods (iscsi and http) already expose such API.
* After the complete switch to the new API endpoints is finished, it will no
longer be possible to fetch the whole node knowing its MAC address without
authentication. Only limited fields will be available. Notably, the power
credentials are not sent in the new API endpoints.
* We should clearly note that any deploy implementation should treat the
incoming data in the new ``heartbeat`` call with care. Particularly, no
sensitive information should be ever sent to the endpoint designated by the
``callback_url`` parameter.
Other end user impact
---------------------
None
Scalability impact
------------------
None
Performance Impact
------------------
* Unlike the old lookup passthru, the new lookup endpoint will not use RPC,
lowering load on the message queue and the conductor.
Other deployer impact
---------------------
* An update of the IPA image will be recommended to make it use the new API.
* New option ``restrict_lookup`` in the ``api`` section (boolean, defaults to
``True``) - whether to restrict the new lookup API to only certain states in
which lookup is expected.
Developer impact
----------------
3rd party driver developers should stop using the ``BaseAgentVendor`` class
in their drivers and just use the ``AgentDeploy`` class.
3rd party drivers should document whether they require the ``restrict_lookup``
option to be ``False`` for correct functioning.
Implementation
==============
Assignee(s)
-----------
* Dmitry Tantsur (lp: divius, irc: dtanstur)
* Jim Rollenhagen (irc: jroll)
Work Items
----------
* Create new deploy interface methods
* Implement them in the AgentDeploy
* Create new RPC calls and API endpoints
* Switch IPA to use the new endpoints, and fall back to old ones on failure
Dependencies
============
None
Testing
=======
Testing will be conducted as part of the current gate tests.
Upgrades and Backwards Compatibility
====================================
Full backward compatibility will be guaranteed independent of upgrade order
between IPA and ironic itself.
The ``BaseAgentVendor`` class will be deprecated, but stay for some time,
following the usual deprecation policy. Old IPA images will be able to run by
using the old passthru API.
The new IPA image will try to hit the new endpoints first, and will fall back
to the old ones on getting HTTP 406 Not Acceptable (meaning, the API version
is not supported).
Documentation Impact
====================
* Document how to implement new deploy drivers with the new heartbeat method.
* Document the potential security issues with both endpoints.
References
==========

View File

@ -0,0 +1 @@
../approved/agent-api.rst