Stein priorities

Change-Id: I337a60639c7884a827b96fe76dd803d1441f288b
Julia Kreger 2018-09-17 08:00:50 -07:00
parent f1f559c313
commit 6e5783a722
2 changed files with 278 additions and 0 deletions


@ -15,6 +15,7 @@ those discussions:
:glob:
:maxdepth: 1
priorities/stein-priorities
priorities/rocky-priorities
priorities/queens-priorities
priorities/pike-priorities


@ -0,0 +1,277 @@
.. _stein-priorities:
========================
Stein Project Priorities
========================
This is a list of development priorities for the Ironic team during the Stein
cycle, ordered by relative size and by dependencies between items.
Note that this is not our complete backlog for the cycle; we still hope
to review and land non-priority items.
The primary contact(s) listed is/are responsible for tracking the status of
that work and herding cats to help get that work done. They are not the only
contributor(s) to this work, and not necessarily doing most of the coding!
They are expected to be available on IRC and the ML for questions, and report
status on the whiteboard_ for the weekly IRC sync-up. The number of primary
contacts is typically limited to 2-3 individuals to simplify communication.
We expect at least one of them to have core privileges to simplify getting
changes in.
As the time remaining in the Stein cycle is approximately 30 weeks from the
Project Teams Gathering, the list of priorities has been split into two
major pieces based upon an estimate of relative size. The overall goal
is for the `Smaller Goals`_ items to be focused on within the first few
months of the cycle, while the larger `Epic Goals`_ may receive some work
early on, but will be targeted for later in the cycle.
.. _whiteboard: https://etherpad.openstack.org/p/IronicWhiteBoard
Smaller Goals
~~~~~~~~~~~~~
+---------------------------------------+-------------------------------------+
| Priority | Primary Contacts |
+=======================================+=====================================+
| `Upgrade Checker`_ | TheJulia |
+---------------------------------------+-------------------------------------+
| `Python3 First`_ | ? |
+---------------------------------------+-------------------------------------+
| `iPXE/PXE interface split`_ | TheJulia |
+---------------------------------------+-------------------------------------+
| `UEFI First`_ | ? |
+---------------------------------------+-------------------------------------+
| `HTTPClient booting`_ | ?TheJulia? |
+---------------------------------------+-------------------------------------+
| `Nova conductor_group awareness`_ | jroll, TheJulia |
+---------------------------------------+-------------------------------------+
| `Enhanced Checksum Support`_ | jroll, kaifeng |
+---------------------------------------+-------------------------------------+
| `DHCP-less/L3 virtual media boot`_ | shekar |
+---------------------------------------+-------------------------------------+
Epic Goals
~~~~~~~~~~
+---------------------------------------+-------------------------------------+
| Goal | Primary Contacts |
+=======================================+=====================================+
| `Deploy Templates`_ | mgoddard, dtantsur, rloo |
+---------------------------------------+-------------------------------------+
| `Graphical Console`_ | mkrai, etingof |
+---------------------------------------+-------------------------------------+
| `Federation Capabilities`_ | TheJulia, dtantsur |
+---------------------------------------+-------------------------------------+
| `Task execution improvements`_ | etingof, TheJulia, mgoddard |
+---------------------------------------+-------------------------------------+
| `No IPA to conductor communication`_ | jroll, rloo |
+---------------------------------------+-------------------------------------+
| `Getting steps`_ | TheJulia, dtantsur |
+---------------------------------------+-------------------------------------+
| `Conductor role splitting`_ | jroll, dtantsur |
+---------------------------------------+-------------------------------------+
| `Neutron Event Processing`_ | vdrok, mgoddard, hjensas |
+---------------------------------------+-------------------------------------+
Inter-Project Goals
-------------------
+---------------------------------------+-------------------------------------+
| `Deployment state callbacks to nova`_ | TheJulia, ?jroll? |
+---------------------------------------+-------------------------------------+
| `Smartnic Support`_ | TheJulia, mkrai |
+---------------------------------------+-------------------------------------+
Details
~~~~~~~
Upgrade Checker
---------------
This is an OpenStack Community goal for the Stein Cycle. For ironic this will
mean a new command called ``ironic-status upgrade check``. This command is
intended to return an error for things that would be fatal for an upgrade
such as new required configuration missing, or schema/data upgrades not
yet performed.
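
The behaviour described above can be illustrated with a minimal sketch. It
uses only the Python standard library and is not the actual
``ironic-status`` implementation; the ``Code`` values, the check functions,
and the option and schema names are all hypothetical and exist only for
illustration.

```python
import enum


class Code(enum.IntEnum):
    """Result codes ordered by severity (hypothetical values)."""
    SUCCESS = 0
    WARNING = 1
    FAILURE = 2


def check_required_options(conf, required):
    """Fail if any required configuration option is unset."""
    missing = sorted(name for name in required if not conf.get(name))
    if missing:
        return Code.FAILURE, "missing required options: " + ", ".join(missing)
    return Code.SUCCESS, "all required options are set"


def check_schema_version(current, expected):
    """Fail if the schema has not been upgraded to the expected version."""
    if current != expected:
        return Code.FAILURE, "schema is at %s, expected %s" % (current, expected)
    return Code.SUCCESS, "schema is up to date"


def run_checks(checks):
    """Run every (name, callable) check and return the worst code seen."""
    worst = Code.SUCCESS
    for name, check in checks:
        code, detail = check()
        print("%s: %s (%s)" % (name, code.name, detail))
        worst = max(worst, code)
    return worst
```

A command line wrapper could pass the returned code to ``sys.exit`` so that
a fatal finding produces a non-zero exit status.
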
Python3 First
-------------
This is an OpenStack Community goal for the Stein Cycle. Most of this work has
already been completed in ironic. Largely we need to change our tests so we
are explicitly testing on Python3. We can't do this for every test at the
moment, but we should be able to change most and still ensure the bulk of
the code paths are covered by tests labeled with ``python2``.
iPXE/PXE interface split
------------------------
This is an older effort that has been restarted in the interest of supporting
multiple architectures (such as AArch64, Power, and x86_64) in the same
deployment.
As it turns out, Power's architecture expects the older PXELinux style
templates that are written by our PXE boot interface. Additionally, while
AArch64 can be booted using iPXE, no pre-built binaries are available.
As such, this choice can no longer be global for the conductor; it needs to
be specific to the node, and splitting the interfaces apart begins to make
much more sense. The original specification can be found
`here <http://specs.openstack.org/openstack/ironic-specs/specs/not-implemented/ipxe-boot-interface.html>`_.
UEFI First
----------
2020 is an important year for Baremetal Operators, as Legacy boot mode support
is anticipated to be removed from newer processors being shipped.
To ensure our success, we need to improve our testing and prepare for the time
when UEFI is the only boot mode available for newer hardware. As a result,
this will become a multi-cycle focus to enable the default boot mode to be
changed to ``uefi`` in a future cycle.
HTTPClient Booting
------------------
While the community is interested in supporting HTTPClient based booting,
we currently have a few steps to complete first, namely the iPXE/PXE interface
split and improved UEFI testing.
The nature of this work is to enable an explicit HTTP booting scenario where
the booting node does not leverage PXE.
Nova conductor_group awareness
------------------------------
This work is exclusively in the ironic virt driver in the ``openstack/nova``
repository. This would enable us to define a ``conductor_group`` that
the nova-compute process uses to determine the set of baremetal nodes it is
responsible for.
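
As a rough illustration of the idea, the sketch below filters a list of
nodes down to those in a configured group. It is not the nova virt driver
code; the ``nodes_for_group`` helper and the dictionary representation of a
node are hypothetical, and treating an empty group name as "all nodes" is an
assumption made for this example.

```python
def nodes_for_group(nodes, conductor_group):
    """Return only the nodes whose conductor_group matches.

    An empty group name selects every node (an assumption of this sketch),
    which keeps behaviour unchanged when no group is configured.
    """
    if not conductor_group:
        return list(nodes)
    return [n for n in nodes if n.get("conductor_group") == conductor_group]
```
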
Enhanced Checksum Support
-------------------------
Ironic presently defaults to MD5 checksums for the ``image_checksum``,
which is far from ideal. During the Rocky cycle, Glance enhanced its
support for checksum storage, which means we should enhance ours as well.
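
One way to picture this is a checksum routine where the algorithm is a
parameter rather than a fixed MD5. The sketch below uses Python's standard
``hashlib``; the function names and the ``sha512`` default are assumptions
for illustration, not ironic's actual interface.

```python
import hashlib


def compute_checksum(chunks, algorithm="sha512"):
    """Hash an iterable of byte chunks with the named hashlib algorithm."""
    digest = hashlib.new(algorithm)  # raises ValueError if unsupported
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(chunks, expected, algorithm="sha512"):
    """Return True if the computed hex digest matches the expected one."""
    return compute_checksum(chunks, algorithm) == expected.lower()
```

Accepting chunks instead of a whole buffer lets the same routine be used
while an image is being streamed.
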
DHCP-less/L3 virtual media boot
-------------------------------
Some operators and vendors wish to enable ironic to manage deployments where
DHCP is not something that is leveraged or utilized in the deployment process.
In order to do this, we need to enable some additional capabilities in terms of
enabling information to be attached to a deployment ramdisk. The
specification can be found
`here <http://specs.openstack.org/openstack/ironic-specs/specs/not-implemented/L3-based-deployment.html>`_.
Deploy Templates
----------------
In the future, we want to take specific action based upon traits submitted to
ironic from Nova describing the instance's expected state or behavior.
This will allow us to take actions and influence the deployment steps, and
as such is a continuation of the Deploy Steps work from the Rocky cycle.
Graphical Console
-----------------
We need a way to expose graphical (e.g. VNC) consoles to users from drivers
that support it. We reached agreement on the specification in the Rocky cycle
and have started to work through the patches to enable this. Our goal is
to have a framework and preferably at least one vendor driver that supports
graphical console connectivity. The specification can be found
`here <http://specs.openstack.org/openstack/ironic-specs/specs/not-implemented/vnc-graphical-console.html>`_.
Federation Capabilities
-----------------------
Edge computing is bringing a variety of cases where support for federation
of ironic deployments can be useful and extremely powerful.
In order to better support this emerging use case, we want to try and agree
on a viable path forward that meets several different use cases and
requirements. The objective for this effort is an agreed upon specification.
Task execution improvements
---------------------------
We realize that our task execution and locking model is problematic, and while
it does scale in some ways, it does not scale in other ways. This work will
consist of worker execution improvements, an evaluation and possible
implementation of different worker thread execution models, and careful
improvement of locking.
No IPA to conductor communication
---------------------------------
Larger operators need much more strict security in their deployments,
where they wish to prevent all outbound network connectivity to the
control plane. Presently the design model requires that nodes be able to
reach ironic's API in order to perform heartbeat and lookup operations.
The concept is to optionally enable the conductor to drive the
deployment by polling IPA using the already known IP address. That being
said, this is realistically going to require `Task execution improvements`_
to be complete to help ensure that operators are able to have performant
deployments. The specification can be found
`here <https://review.openstack.org/#/c/212206/>`_.
Getting steps
-------------
One of the biggest frustrations that people have with our cleaning model
is the lack of visibility into what steps they can execute. This is further
compounded by ``deploy steps``. We have ideas on this and we need to begin
providing the mechanisms to raise that visibility.
This may also involve state machine states to enable the agent to sit in a
holding pattern pending operator action.
The ultimate goal is to enable a CLI user to understand the
available steps that can be utilized.
Neutron Event Processing
------------------------
Currently ironic has no way to determine when certain asynchronous events
actually finish in neutron, and with what result. Nova, on the contrary, uses
a special neutron driver, which filters out notifications and posts some of
them to a special nova API endpoint. We should do the same.
Conductor role splitting
------------------------
The conductor presently does all of the work... But does it need to?
As we evolve, we should be asking ourselves whether we
can optionally break the conductor into many pieces to enable edge
conductors or edge local boot management. The goal here is to try and
obtain a matrix of distinct actions taken, which will hopefully further
guide us as time moves on.
Smartnic Support
----------------
Smartnics complicate ironic, as the NIC needs to be programmed while the
power is in a state such that the configuration on the NIC can be changed.
While the effort to support this may ultimately result in enhancements
to neutron in the form of Super-Agents to apply the configuration, we
still need to understand the impact to our workflows and ensure that
sufficient security is still present. The primary objective is to have
a joint specification written in advance of the Berlin summit to reach
consensus with the Neutron team as to the mechanics, information passing,
and setting storage.
Deployment state callbacks to nova
----------------------------------
One of the issues in ironic's nova virt driver is that no concept of
callbacks exists. Due to this, the virt driver polls the ironic API
endpoint repeatedly, which increases overall system load. In an ideal
world, ironic would utilize a mechanism to indicate deployment state
similar to how neutron informs nova that networking has been configured.
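
The difference between polling and a callback can be sketched with a small
waiter object: the caller blocks until a notification arrives instead of
repeatedly querying for state. This is only an illustration using Python's
standard ``threading`` module; the ``DeploymentWaiter`` class and its
methods are hypothetical and do not correspond to any existing nova or
ironic API.

```python
import threading


class DeploymentWaiter:
    """Lets a caller block until a deployment state notification arrives."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = {}  # node UUID -> (Event, list holding the state)

    def expect(self, node_uuid):
        """Register interest in a node before starting its deployment."""
        with self._lock:
            self._pending[node_uuid] = (threading.Event(), [])

    def notify(self, node_uuid, state):
        """Deliver a state notification; return False if nobody is waiting."""
        with self._lock:
            entry = self._pending.get(node_uuid)
        if entry is None:
            return False
        event, holder = entry
        holder.append(state)
        event.set()
        return True

    def wait(self, node_uuid, timeout=None):
        """Block until notified; return the state, or None on timeout."""
        with self._lock:
            event, holder = self._pending[node_uuid]
        if not event.wait(timeout):
            return None
        with self._lock:
            self._pending.pop(node_uuid, None)
        return holder[0]
```

In this model the waiting side makes no requests while the deployment is in
progress, which is the load reduction the section above is after.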