From 6e5783a722671473d5dea215492537844d32146a Mon Sep 17 00:00:00 2001
From: Julia Kreger
Date: Mon, 17 Sep 2018 08:00:50 -0700
Subject: [PATCH] Stein priorites

Change-Id: I337a60639c7884a827b96fe76dd803d1441f288b
---
 doc/source/index.rst            |   1 +
 priorities/stein-priorities.rst | 277 ++++++++++++++++++++++++++++++++
 2 files changed, 278 insertions(+)
 create mode 100644 priorities/stein-priorities.rst

diff --git a/doc/source/index.rst b/doc/source/index.rst
index 5456d8d2..eb516003 100644
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -15,6 +15,7 @@ those discussions:
    :glob:
    :maxdepth: 1
 
+   priorities/stein-priorities
    priorities/rocky-priorities
    priorities/queens-priorities
    priorities/pike-priorities
diff --git a/priorities/stein-priorities.rst b/priorities/stein-priorities.rst
new file mode 100644
index 00000000..de0ea492
--- /dev/null
+++ b/priorities/stein-priorities.rst
@@ -0,0 +1,277 @@
+.. _stein-priorities:
+
+========================
+Stein Project Priorities
+========================
+
+This is the list of development work the Ironic team is prioritizing for
+Stein development, ordered by relative size and dependencies.
+Note that this is not our complete backlog for the cycle; we still hope
+to review and land non-priority items.
+
+The primary contact(s) listed is/are responsible for tracking the status of
+that work and herding cats to help get that work done. They are not the only
+contributor(s) to this work, and not necessarily doing most of the coding!
+They are expected to be available on IRC and the ML for questions, and report
+status on the whiteboard_ for the weekly IRC sync-up. The number of primary
+contacts is typically limited to 2-3 individuals to simplify communication.
+We expect at least one of them to have core privileges to simplify getting
+changes in.
+
+As the time remaining in the Stein cycle is approximately 30 weeks from the
+Project Teams Gathering, the list of priorities has been split into two
+major pieces based upon an estimate of relative size. The overall goal
+is for the `Smaller Goals`_ items to be focused on within the first few
+months of the cycle, while the larger `Epic Goals`_ may receive some work
+early on, but will be targeted for later in the cycle.
+
+.. _whiteboard: https://etherpad.openstack.org/p/IronicWhiteBoard
+
+Smaller Goals
+~~~~~~~~~~~~~
+
++---------------------------------------+-------------------------------------+
+| Priority                              | Primary Contacts                    |
++=======================================+=====================================+
+| `Upgrade Checker`_                    | TheJulia                            |
++---------------------------------------+-------------------------------------+
+| `Python3 First`_                      | ?                                   |
++---------------------------------------+-------------------------------------+
+| `iPXE/PXE interface split`_           | TheJulia                            |
++---------------------------------------+-------------------------------------+
+| `UEFI First`_                         | ?                                   |
++---------------------------------------+-------------------------------------+
+| `HTTPClient booting`_                 | ?TheJulia?                          |
++---------------------------------------+-------------------------------------+
+| `Nova conductor_group awareness`_     | jroll, TheJulia                     |
++---------------------------------------+-------------------------------------+
+| `Enhanced Checksum Support`_          | jroll, kaifeng                      |
++---------------------------------------+-------------------------------------+
+| `DHCP-less/L3 virtual media boot`_    | shekar                              |
++---------------------------------------+-------------------------------------+
+
+
+Epic Goals
+~~~~~~~~~~
+
++---------------------------------------+-------------------------------------+
+| Goal                                  | Primary Contacts                    |
++=======================================+=====================================+
+| `Deploy Templates`_                   | mgoddard, dtantsur, rloo            |
++---------------------------------------+-------------------------------------+
+| `Graphical Console`_                  | mkrai, etingof                      |
++---------------------------------------+-------------------------------------+
+| `Federation Capabilities`_            | TheJulia, dtantsur                  |
++---------------------------------------+-------------------------------------+
+| `Task execution improvements`_        | etingof, TheJulia, mgoddard         |
++---------------------------------------+-------------------------------------+
+| `No IPA to conductor communication`_  | jroll, rloo                         |
++---------------------------------------+-------------------------------------+
+| `Getting steps`_                      | TheJulia, dtantsur                  |
++---------------------------------------+-------------------------------------+
+| `Conductor role splitting`_           | jroll, dtantsur                     |
++---------------------------------------+-------------------------------------+
+| `Neutron Event Processing`_           | vdrok, mgoddard, hjensas            |
++---------------------------------------+-------------------------------------+
+
+Inter-Project Goals
+-------------------
+
++---------------------------------------+-------------------------------------+
+| `Deployment state callbacks to nova`_ | TheJulia, ?jroll?                   |
++---------------------------------------+-------------------------------------+
+| `Smartnic Support`_                   | TheJulia, mkrai                     |
++---------------------------------------+-------------------------------------+
+
+
+Details
+~~~~~~~
+
+Upgrade Checker
+---------------
+
+This is an OpenStack Community goal for the Stein Cycle. For ironic this will
+mean a new command called ``ironic-status upgrade check``. This command is
+intended to return an error for things that would be fatal for an upgrade,
+such as new required configuration missing, or schema/data upgrades not
+yet performed.
+
+Python3 First
+-------------
+
+This is an OpenStack Community goal for the Stein Cycle. Most of this work has
+already been completed in ironic. Largely, we need to change our tests so we
+are explicitly testing on Python3. We can't do this for every test at the
+moment, but we should be able to change most and still ensure the bulk of
+the code paths are covered by tests labeled with ``python2``.
+
+iPXE/PXE interface split
+------------------------
+
+This is an older effort that has been restarted in the interest of supporting
+multiple architectures (such as AArch64, Power, and x86_64) in the same
+deployment.
+
+As it turns out, Power's architecture expects the older PXELinux-style
+templates that are written by our PXE boot interface. Additionally, while
+AArch64 can be booted using iPXE, no pre-built binaries are available.
+
+As such, this can no longer be global for the conductor and must instead be
+specific to the node, so splitting the interfaces apart begins to make
+much more sense. The original specification can be found
+`here `.
+
+UEFI First
+----------
+
+2020 is an important year for Baremetal Operators, as Legacy boot mode support
+is anticipated to be removed from newer processors being shipped.
+
+To ensure our success, we need to improve our testing and prepare for the time
+when UEFI is the only boot mode available for newer hardware.
+As a result, this will become a multi-cycle focus to enable the default boot
+mode to be changed to ``uefi`` in a future cycle.
+
+HTTPClient Booting
+------------------
+
+While the community is interested in supporting HTTPClient-based booting,
+we have a few steps to complete first, namely the iPXE/PXE interface
+split and improved UEFI testing.
+
+The nature of this work is to enable an explicit HTTP booting scenario where
+the booting node does not leverage PXE.
+
+Nova conductor_group awareness
+------------------------------
+
+This work is exclusively in the ironic virt driver in the `openstack/nova`
+repository. This would enable us to define a ``conductor_group`` that
+the nova-compute process uses to limit its view to the baremetal nodes it is
+responsible for.
+
+Enhanced Checksum Support
+-------------------------
+
+Ironic presently defaults to MD5 checksums for the ``image_checksum``,
+which is far from ideal. During the Rocky cycle, Glance enhanced its
+support for checksum storage, which means we should enhance ours as well.
+
+DHCP-less/L3 virtual media boot
+-------------------------------
+
+Some operators and vendors wish to enable ironic to manage deployments where
+DHCP is not used in the deployment process.
+To do this, we need additional capabilities that allow information to be
+attached to a deployment ramdisk. The
+specification can be found
+`here `.
+
+Deploy Templates
+----------------
+
+In the future, we want to take specific action based upon traits submitted to
+ironic from Nova describing the instance's expected state or behavior.
+
+This will allow us to take actions and influence the deployment steps, and
+as such is a continuation of the Deploy Steps work from the Rocky cycle.
+
+Graphical Console
+-----------------
+
+We need a way to expose graphical (e.g. VNC) consoles to users from drivers
+that support them.
+We reached agreement on the specification in the Rocky cycle and have started
+to work through the patches to enable this. Our goal is to have a framework
+and preferably at least one vendor driver supporting graphical console
+connectivity. The specification can be found `here `_.
+
+Federation Capabilities
+-----------------------
+
+Edge computing is bringing a variety of cases where support for federation
+of ironic deployments can be useful and extremely powerful.
+
+In order to better support this emerging use case, we want to try and agree
+on a viable path forward that meets several different use cases and
+requirements. The objective for this effort is an agreed-upon specification.
+
+Task execution improvements
+---------------------------
+
+We realize that our task execution and locking model is problematic, and while
+it does scale in some ways, it does not scale in other ways. This work will
+consist of worker execution improvements, an evaluation and possible
+implementation of different worker thread execution models, and careful
+improvement of locking.
+
+No IPA to conductor communication
+---------------------------------
+
+Larger operators need much stricter security in their deployments,
+where they wish to prevent all outbound network connectivity to the
+control plane. Presently the design model requires that nodes be able to
+reach ironic's API in order to perform heartbeat and lookup operations.
+
+The concept here is to optionally enable the conductor to drive the
+deployment by polling IPA using the already known IP address. That being
+said, this is realistically going to require `Task execution improvements`_
+to be complete to help ensure that operators are able to have performant
+deployments. The specification can be found
+`here `.
+
+Getting steps
+-------------
+
+One of the biggest frustrations that people have with our cleaning model
+is the lack of visibility into what steps they can execute.
+This is further compounded by ``deploy steps``. We have ideas on this and
+we need to begin providing the mechanisms to raise that visibility.
+
+This may also involve state machine states to enable the agent to sit in a
+holding pattern pending operator action.
+
+The goal is ultimately to enable a CLI user to understand the
+available steps that can be utilized.
+
+Neutron Event Processing
+------------------------
+
+Currently ironic has no way to determine when certain asynchronous events
+actually finish in neutron, and with what result. Nova, by contrast, uses
+a special neutron driver, which filters out notifications and posts some of
+them to a special nova API endpoint. We should do the same.
+
+Conductor role splitting
+------------------------
+
+The conductor presently does all of the work... But does it need to?
+
+This is a question we should be asking ourselves as we evolve: can we
+optionally break the conductor into many pieces to enable edge
+conductors or edge-local boot management? The goal here is to try and
+obtain a matrix of distinct actions taken, which will hopefully further
+guide us as time moves on.
+
+Smartnic Support
+----------------
+
+Smartnics complicate ironic, as the NIC needs to be programmed with the
+power in a state such that the configuration on the NIC can be changed.
+
+While the effort to support this may ultimately result in enhancements
+to neutron in the form of Super-Agents to apply the configuration, we
+still need to understand the impact on our workflows and ensure that
+sufficient security is still present. The primary objective is to have
+a joint specification written in advance of the Berlin summit to reach
+consensus with the Neutron team as to the mechanics, information passing,
+and setting storage.
+
+Deployment state callbacks to nova
+----------------------------------
+
+One of the issues in ironic's nova virt driver is that no concept of
+callbacks exists.
+Due to this, the virt driver repeatedly polls the ironic API endpoint, which
+increases overall system load. Ideally, ironic would signal deployment state
+much as neutron informs nova that networking has been configured.