Special cases boot/uefi record setup to focus on UEFI
nvram updates instead of attempting nvram updates *and*
setting the boot device to disk.
Closes-Bug: 2053064
Change-Id: Ic6584479a47146577052d17fa3f697eef64ac73c
This change adds two network boot interfaces, ``http`` and
``http-ipxe``. These interfaces are based upon the underlying PXE
boot interface code in ironic. They differ in that they signal
to Ironic that DHCP must advertise the boot loader as a URL, instead
of the filename and IP address that PXE uses as a starting point.
The naming of the interfaces focuses more on the transport mechanism
and then specific style. Very similar to existing ``pxe`` and ``ipxe``
interface modeling, except in the ``ipxe`` case, it is more a specific
loader and mechanism to be utilized.
Related-Bug: #2032380
Change-Id: Ie7ace88b62b9179f640ef2a732dd228e12bd320d
The kickstart unit tests were written in such a way that their
behavior depends on whether the kickstart validator is present on
the system: with the validator present, the tests behave differently
and fail. Specifically, when it is present, an error is generated:
TypeError: write() argument must be str, not MagicMock
This is because we pass in a mock value for unit testing.
Removes the alternative unit test path taken when the validator is
present, and locks the test into the False (validator absent) case,
which simplifies the validation path for the kickstart interface.
Change-Id: Idfb6b4f3b49901aa1a222c6fedc4367ef3bfd2a2
The PXE Anaconda dhcp cleanup test triggers the dhcp_factory cleanup
code by default. Which is good! Problem is, if you don't have
dnsmasq installed, things blow up.
Specifically because it was called in such a way that it was
trying to clean up dhcp records for nodes. Example:
ironic.common.exception.InstanceDeployFailure: An error occurred
after deployment, while preparing to reboot the node
1be26c0b-03f2-4d2e-ae87-c02d7f33c123: [Errno 2] No such file
or directory:
'/etc/dnsmasq.d/hostsdir.d/ironic-52:54:00:cf:2d:31.conf'
Instead of executing that far, we now just check that we did, indeed,
call for dhcp cleanup.
This was discovered while trying to fix unit test race conditions
and random failures in CI.
Change-Id: Id7b1e2e9ca97aeff786e9df06f35eca67dd36b58
Instance network boot (not to be confused with ramdisk, iSCSI or
anaconda deploy methods) is insecure, underused and difficult to
maintain. This change removes a lot of related code from Ironic.
The so called "netboot fallback" is still supported for legacy boot when
boot device management is not available or is unreliable.
Change-Id: Ia8510e4acac6dec0a1e4f5cb0e07008548a00c52
It requires network booting and legacy boot. While the latter will be
supported for a long time, the former is being removed.
Change-Id: Ie48e51fa95ba2059bd3cca6b8968f475934a75e5
A race condition can be observed in CI under heavy load where the
conductor triggers a reboot of the agent before it is fully online
based upon state, but not considering the existence of an agent
token. As a result, the agent is never able to check in with Ironic
and the overall operation fails.
We now consider agent token's existence before retrying PXE as
it is the very earliest indicator of a starting agent.
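The check described above can be sketched roughly as follows. This is
an illustration only, not the conductor code: the helper name
``should_retry_boot`` and its arguments are hypothetical, and the
``agent_secret_token`` key is assumed to be where the token is stored.

```python
def should_retry_boot(driver_internal_info, state_timed_out):
    """Decide whether a PXE boot retry looks safe (hypothetical helper).

    driver_internal_info: dict of internal node data (assumed layout).
    state_timed_out: True if the state-based timeout has expired.
    """
    # A token means the agent has already started, so rebooting now
    # would interrupt it before it can check in.
    if driver_internal_info.get('agent_secret_token'):
        return False
    # No token yet: fall back to the state-based decision.
    return bool(state_timed_out)
```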
Change-Id: Ice764866a08647031d16570860ec384204269501
Story: 2010107
Task: 45674
The anaconda deploy interface has a few issues that are
addressed here:
- fixes logic in get_instance_image_info() for anaconda. If the
ironic node's instance_info doesn't have both 'stage2' and
'ks_template' specified, we weren't using any values from the
instance_info. This has been fixed to use values from
instance_info if specified. Otherwise, they are set as follows:
The 'stage2' value is taken from the image properties.
We use the value for 'ks_template' if it is specified in the
image properties. If not (since it is optional), we use the
config option's '[anaconda]default_ks_template' value.
- For anaconda's stage2 directory, we were incorrectly creating a
directory using the full path of the stage2 file. It now
correctly creates the right directory.
- The anaconda deploy interface expects the node's instance_info
to be populated with the 'image_url'; added code to do that in
PXEAnacondaDeploy's prepare() method.
- When the deploy is finished and the bm node is being rebooted,
we incorrectly set the node's provision state to 'active'
instead of doing it via the provisioning state machine mechanism.
- The code that was doing the validation of the kickstart file was
incorrect and resulted in errors; this has been addressed.
- The '%traceback' section in the packaged 'ks.cfg.template' file
is deprecated and fails validation, so it has been removed.
Change-Id: I953e948bcfa108d4c8e7b145da2f52b652e52a10
This change ensures all files written for pxe boot have
permissions determined by the [pxe]file_permission config option.
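A minimal sketch of the idea, assuming the configured permission is
available as an integer mode. The helper name ``write_with_permission``
and the ``0o644`` default used here are illustrative assumptions, not
the actual Ironic code.

```python
import os


def write_with_permission(path, contents, file_permission=0o644):
    """Write a file and apply one configured mode (hypothetical helper).

    file_permission stands in for the [pxe]file_permission value; the
    default shown here is an assumption for the example.
    """
    with open(path, 'w') as f:
        f.write(contents)
    # Set the mode explicitly so the result does not depend on umask.
    os.chmod(path, file_permission)
```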
Change-Id: I1bc24e3871bae3ce070e7abe85fc4c48e844c317
The kickstart template expects a dictionary with 'ks_options'
as the key. Instead build_kickstart_config_options function
returns a dict with keys 'liveimg_url', 'agent_token' and
'heartbeat_url'.
This change fixes this problem by returning a dictionary with
'ks_options' as the key and the dictionary with keys
'liveimg_url', 'agent_token' and 'heartbeat_url' as the
value.
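The shape of the returned value can be illustrated as below. The
signature is simplified for the example and is an assumption; only the
key names come from the description above.

```python
def build_kickstart_config_options(liveimg_url, agent_token, heartbeat_url):
    """Return template parameters nested under 'ks_options'.

    The three-argument signature is hypothetical and used only to keep
    the example self-contained.
    """
    options = {
        'liveimg_url': liveimg_url,
        'agent_token': agent_token,
        'heartbeat_url': heartbeat_url,
    }
    # The template looks everything up through the 'ks_options' key.
    return {'ks_options': options}
```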
Fix a bug where the deploy() method of the anaconda deploy
interface did not return states.DEPLOYWAIT. Instead it
returned 'None', which caused the instance to go straight to
'active' instead of 'wait call-back'.
Fix issues in the default kickstart template where heartbeat was
missing 'callback_url' parameter and the HTTP method should be
'POST' not 'PUT'.
Fix issues with automated cleaning when anaconda deploy interface
is used.
The anaconda deploy interface could not deploy tarballs, as
the disk image sent to the anaconda interface via the liveimg --url
kickstart command does not include any file extension. When
no file extension is present, the kickstart command liveimg --url
assumes the disk is a mountable partition image. We fix this
problem by enabling the user to specify file extensions using
a glance image property named 'disk_file_extension' on the OS
image.
Co-Authored-By: Ruby Loo <opensrloo@gmail.com>
Change-Id: I556f8c9efbc5ab0941513c3ecaa2aa3ca7f346ae
Change the default boot mode to UEFI, as discussed during the end
of the Wallaby release cycle and previously agreed a very long time
ago by the Ironic community.
Change-Id: I6d735604d56d1687f42d0573a2eed765cbb08aec
We have implemented the cleaning prepare/tear_down, but haven't
implemented fetching/running in-band clean steps. This change moves
the cleaning logic from AgentDeployMixin to AgentBaseMixin, where it
arguably belongs.
In a follow-up patch I'm planning to reduce the number of mix-ins we
currently have, but that won't be backportable.
Change-Id: Ibc5610b14cea487d26191249e5c0333fdcd4b914
Because of the way validation works, the ramdisk deploy interface
currently requires an image_source in addition to kernel/ramdisk.
After 1d6441cc34 it is no longer
necessary, and this change removes this requirement.
Change-Id: I59996fac059dade0ef186598be1e8971e073eb04
This function has a confusing public interface and is always preceded
by roughly the same logic, copy-pasted across boot interfaces. Move
this logic inside of the function and streamline its interface.
Change-Id: I4fc63be4e3cd4656d0ca7e893d4f3a98c07a8b4c
The current ramdisk deploy code expects a user to set the boot_option
capability to "ramdisk". Not only is it redundant, it's also not
documented, so chances are high nobody has ever done that.
As a side effect of e6bb99cd8f boot
interfaces no longer validate kernel/ramdisk/image if boot option
is "local". Unless a user explicitly sets boot option to "ramdisk",
the validation will be skipped for the ramdisk deploy.
This patch follows the pattern of the anaconda deploy and makes
get_boot_option always return "ramdisk" for the ramdisk deploy.
In the future we need to refactor this code so that the deploy interface
provides the boot option it works with, but that is a lot of changes.
Change-Id: I25c28df048c706f0c5b013b4d252f09d5a7e57bd
Interfaces should only validate values they're going to use. Boot
interfaces do not care about image properties when local boot is used
(which is the default), so they shouldn't validate them. The deploy
interface has to provide validation for images.
This change fixes PXE, iPXE and redfish-virtual-media, although other
boot interfaces may need a similar change. We also need to refactor
handling instance_info in deploy_utils, but that can wait until the
iSCSI deploy removal.
Also refactor unit tests for redfish-virtual-media.
Story: #2008874
Task: #42418
Change-Id: Ida21f21d6435c0d7fa46cb5b1161f034ad8956ee
The iSCSI deploy was very easy to start with, but it has since become
apparent that it suffers from scalability and maintenance issues.
It was deprecated in the Victoria cycle and can now be removed.
Hide the guide to upgrade to hardware types since it's very outdated.
I had to remove the iBMC diagram since my SVG-fu is not enough to fix it.
Change-Id: I2cd6bf7b27fe0be2c08104b0cc37654b506b2e62
agent_status is used by the anaconda ramdisk to inform the
conductor about the state of the deployment. Valid agent
states are 'start', 'end' and 'error'. The agent_status_message
is used to describe why the agent_status is set to a
particular state. Use of these parameters requires API
version 1.72 or greater.
When anaconda finishes deployment the agent_status is
set to 'end'. When anaconda ramdisk is unable to deploy
the OS for some reason the agent_status is set to 'error'.
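The constraints above could be checked along these lines. This is a
sketch under stated assumptions: the function name, the tuple form of
the API version and the use of ValueError are all hypothetical and do
not reflect the real API layer.

```python
VALID_AGENT_STATUSES = ('start', 'end', 'error')
MIN_AGENT_STATUS_API_VERSION = (1, 72)


def validate_agent_status(agent_status, api_version):
    """Validate a reported agent_status (hypothetical helper).

    api_version is a (major, minor) tuple, an assumption made for this
    example so that versions compare naturally.
    """
    if api_version < MIN_AGENT_STATUS_API_VERSION:
        raise ValueError('agent_status requires API version 1.72 or greater')
    if agent_status not in VALID_AGENT_STATUSES:
        raise ValueError('invalid agent_status: %r' % (agent_status,))
    return agent_status
```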
PXEAnacondaDeploy is implemented to handle the 'anaconda'
deploy interface. PXEAnacondaDeploy ties together the pieces
needed to deploy a node using the anaconda ramdisk.
Co-Authored-By: Jay Faulkner <jay@jvf.cc>
Change-Id: Ieb452149730510b001c4712bbb2e0f28acfc3c2e
The kickstart template is supplied by the user and it needs
to be validated to make sure it includes all the expected
variables and nothing else.
We validate the template by rendering it using expected
variables. If any of the expected variables are not present
in the template or unexpected variables are defined in the
template, we raise an InvalidKickstartTemplate exception.
Once we render the template into kickstart file we
pass the file to 'ksvalidator' tool if it is present
on the system to validate the rendered kickstart file
for correctness.
The 'ksvalidator' tool comes from the pykickstart library and
it is GPLv2 licensed. The GPLv2 license is incompatible with
OpenStack. So we do not explicitly include the library in
requirements.txt and instead rely on it being pre-existing on
the conductor. If the 'ksvalidator' binary is not present
on the system, kickstart validation will be skipped.
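The two validation stages can be sketched as follows. Note the swap:
this example uses ``string.Template`` placeholders from the standard
library to stay self-contained, not the template engine Ironic uses.
The helper names and the locally defined exception are illustrative
assumptions.

```python
import shutil
import string
import subprocess


class InvalidKickstartTemplate(Exception):
    """Stand-in for the exception named above (defined locally)."""


def template_variables(template_text):
    """Collect $name and ${name} placeholders from the template text."""
    names = set()
    for match in string.Template.pattern.finditer(template_text):
        name = match.group('named') or match.group('braced')
        if name:
            names.add(name)
    return names


def validate_template(template_text, expected):
    """Fail if expected variables are missing or extra ones appear."""
    found = template_variables(template_text)
    missing = set(expected) - found
    unexpected = found - set(expected)
    if missing or unexpected:
        raise InvalidKickstartTemplate(
            'missing: %s, unexpected: %s'
            % (sorted(missing), sorted(unexpected)))


def validate_rendered_file(path):
    """Run ksvalidator if installed; return False when it is skipped."""
    validator = shutil.which('ksvalidator')
    if validator is None:
        return False
    subprocess.run([validator, path], check=True)
    return True
```

The binary is looked up at run time so that pykickstart never has to
appear in the requirements.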
Change-Id: I3e040bbdbcefb8764c93355d0ba7179e2110b9c6
To prepare for booting anaconda we need to generate a kickstart file
from the kickstart template and pass it to the installer as a kernel
command line argument (inst.ks). Similarly, the second stage of the
installer (stage2) needs to be fetched and its location needs to be
passed as a kernel command line argument (inst.stage2).
This change also adds 'boot_anaconda' target to pxe_config.template
and ipxe_config.template and renders that target correctly. The pxe
configuration will automatically switch to boot_anaconda target when
the boot_option is 'kickstart'.
Change-Id: I3ffe5a60684cdefe51c7a0a47acc1acedbb49145
This change adds 'anaconda' group and 'default_ks_template'
configuration option under that group to ironic configuration file.
Along with this change a new boot_option named 'kickstart' is added
to identify anaconda kickstart deploy in the boot interface.
deploy_utils.get_boot_option method is modified to check if
node.deploy_interface is set to 'anaconda' and return boot_option
'kickstart'.
This change also validates whether required parameters are set when
the boot_option on the node is set to 'kickstart'.
When boot_option is 'kickstart' we also validate if the glance image
source has 'squashfs_id' property associated with it.
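The boot option selection described above amounts to something like the
sketch below. The argument list is simplified and hypothetical; the
real deploy_utils.get_boot_option works on a node object.

```python
def get_boot_option(deploy_interface, capabilities, default_boot_option):
    """Pick the boot option for a node (simplified, hypothetical form).

    deploy_interface: name of the node's deploy interface.
    capabilities: dict of parsed instance capabilities (assumed shape).
    default_boot_option: fallback value, e.g. from configuration.
    """
    # The anaconda deploy interface always implies kickstart.
    if deploy_interface == 'anaconda':
        return 'kickstart'
    return capabilities.get('boot_option', default_boot_option)
```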
Change-Id: I2ef7c33e2e63e6d08c084b4c5dbd77a44ddd2d14
Story: 2007839
Task: 41675
For some (likely historical) reasons we only use it for PXE and iPXE,
but the same logic applies to any boot interface (since it depends
on how the management interface and the BMC work, not on the boot
method). This change moves its handling to conductor utils.
Change-Id: I948beb4053034d3c1b4c5b7c64100e41f6022739
Two drivers already support turning secure boot on and off,
Redfish will follow soon. This patch adds ManagementInterface
calls to get and set the secure boot state.
Story: #2008270
Task: #41561
Change-Id: I96b2697163def52618b4c051a5c85adf7d1818a5
First, use default_boot_mode in get_boot_mode instead of BIOS.
Second, call sync_boot_mode for all ramdisk types in the PXE boot,
not only during deployment.
Change-Id: I3f13bacbdcb319c191eeb8ae93aecf8fba68f9ec
Python 3 has a standard library for mock in the unittest module,
so let's drop the mock requirement and switch tests to unittest mock.
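For illustration, a test helper written against the standard library
module looks like this. The ``fetch_name`` function and the ``client``
object are made up for the example.

```python
from unittest import mock


def fetch_name(client):
    """Toy function under test (hypothetical)."""
    return client.get_name().upper()


# unittest.mock provides the same Mock API as the third-party package.
client = mock.Mock()
client.get_name.return_value = 'node-1'
```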
Change-Id: I4f1b3e25c8adbc24cdda51c73da3b66967f7ef23
The default value of "netboot" was introduced to this configuration
variable as part of commit 93f947c852
in Ocata release.
This patch changes the default value of configuration parameter
'[deploy]/default_boot_option' and devstack variable
'IRONIC_DEFAULT_BOOT_OPTION' to 'local'.
Change-Id: I9bf56a7088281bbe20b8b6c2e47c6ab6559bfea4
Story: #1619339
Task: #10505
Adds functionality for dual stack capabilities and automatic
population to neutron with the correct response based upon the
IP version of the provisioning/cleaning/rescue or tenant ports.
This was originally intended to be separated from removing the
need for [pxe]ip_version, however the resulting code changes
from doing both this and making ironic support dual stacks
touched the same tests and some of the same code, so combined
is simpler.
Change-Id: If7a296001e204ae0c9a49495731052ab33379628
Fixes W504 and E117, resulting in some indentation changes.
Also fixes code that exceeds the complexity requirement, that is bumped
to 20 (mostly to avoid refactoring the agent heartbeat call, resulting
in conflicts for the deploy steps work).
Change-Id: I8e49f2c039b0ddfca9138f8e148708b7e8b5df7e
Remove the dynamically registered ipxe_enabled option and say goodbye.
Further extracts common bits to the PXEBaseMixin, tuning tests here and
there.
Story: 2007003
Task: 37779
Change-Id: I7c1b2a984d45bd63b4e95b62ce02960924c2ce17
This patch starts removing code related to ipxe support from the pxe
interface, and extracts common logic to the PXEBaseMixin.
Story: 2007003
Task: 37779
Change-Id: Ia621f38d9b6c4570ba6a62ddb8c72244ff3fe33b
When the ipxe hardware interface is in use, the node should always be
booted with iPXE, regardless of the deprecated configuration option
[pxe]ipxe_enabled.
Story: 2007003
Task: 37779
Change-Id: Ia658ddc966e13a7ce973eccd9c42e40a3da406f4
This change implements in-band inspection support for PXE and iPXE
boot interfaces and all in-tree network interfaces.
Story: #1528920
Task: #23184
Change-Id: I470d55add73bae47a2755cde93d4b1e1f30e94a7
They were not handled correctly and ipxe-related configs were left
after node tear down.
Story: 2006907
Task: 37549
Change-Id: I1ee6727d2fc52619544e327a10a62ae8a7e6f7fe
Currently only ipa-api-url is recognized there, since it's hardcoded
in the templates. Anything else is silently ignored. This patch
fixes it by wiring all provided options in pxe_append_params and
dropping the hardcoded ipa-api-url (provided by agent deploy).
Add some logging to make debugging such issues easier.
Change-Id: I573cf99d52a6965d64c2ed7a87cf901c12ea3fec
Story: #1528920
Task: #37255
PXE is inherently unreliable and sometimes times out without an
obvious reason. It happens particularly often in resource constrained
environments, such as the CI. This change allows an operator to
set a timeout, after which the boot is retried.
The _add_node_filters call had to be refactored to avoid hitting
the complexity limit.
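The timeout decision can be sketched as below. The helper name and the
convention that a zero or unset timeout disables retries are
assumptions made for the example.

```python
import time


def boot_timed_out(boot_started_at, timeout, now=None):
    """Return True if a PXE boot has exceeded the operator's timeout.

    boot_started_at and now are UNIX timestamps; timeout is in seconds.
    A falsy timeout disables the check (assumed convention).
    """
    if not timeout:
        return False
    if now is None:
        now = time.time()
    return now - boot_started_at > timeout
```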
Change-Id: I34a11f52e8e98e5b64f2d21f7190468a9e4b030d
Story: #2005167
Task: #29901
The current structure is designed to support several major versions.
However, we only support V2 and in the future will use openstacksdk
to abstract away major version differences.
Change-Id: I99bcb0650ac609ae9f0a8bcff70429eb4a3b7274
This patch addresses some frequently failing tests with a few minor
changes to how the exceptions are tested and how the
data is passed to the exceptions that are failing.
I've been able to duplicate the failing tests in CI while streaming
video on my desktop of all things. With these changes, I've been
unable to reproduce the failures.
This also fixes entries for DeployTemplate generation which were
noticed as well, however the failures were not observed in CI.
A missing i18n tag was also added to the pxe code.
Change-Id: I719fa12340b51c55c0441df572ee7e3b33b1910c
By default, Ironic makes persistent changes to the system boot order
at the end of an instance deployment. This may not be desired in
all cases, e.g. when DC policies require leaving the persistent
system boot order unchanged.
While keeping the persistent approach as the default, this patch
proposes to extend the existing 'force_persistent_boot_device'
field in the node's driver_info for (i)PXE in a way that allows
all boot device changes to be non-persistent.
Change-Id: If3a19f74fb0dfbcff2cde4cd5a8cc1edf5de3058
Story: #2004846
Task: #29058