Commit Graph

271 Commits

Author SHA1 Message Date
Dmitry Tantsur 307c4572a6 Add node auto-discovery support for in-band inspection
This is an MVP of auto-discovery with no extra customization and no new
auto_discovered field from the spec.

Change-Id: I1528096aa08da6af4ac3c45b71d00e86947ed556
2024-02-02 09:24:52 +01:00
Julia Kreger 041a7d7064 Redfish UefiHttp boot support
Adds a redfish-https boot interface based upon the
redfish-virtual-media boot interface. It substantially copies
some base methods, however, because of the simplification offered to us by
putting the "attach/detach" logic into how the sushy library handles
the application and reset of a URL as a boot setting.

This feature also increases the requirement for the Sushy library
to version 4.7.0 which includes support to set the HttpBootUri
field in the BMC and automatically unset it as well.

Closes-Bug: #2032380
Change-Id: I991611cd67cb91aea21fc30bbae7cd24409dbbfa
2024-01-04 07:12:20 -08:00
Dmitry Tantsur cba10669f5 Fix the HTTP code for reaching max_concurrent_deploy: 503 instead of 500
Change-Id: I3d8c7724c1d44baa67a6364dde2f52abdb906526
2023-10-02 16:13:15 +02:00
Zuul 30881b3281 Merge "Very basic in-band inspection with the "agent" interface" 2023-07-31 18:51:56 +00:00
Dmitry Tantsur afada321d8 Very basic in-band inspection with the "agent" interface
Only port creation/updating/deletion logic has been replicated from
ironic-inspector, as well as the add_ports and keep_ports options.

In the future patches, the added code will become a part of processing
hooks.

Change-Id: I69d6a1a53c5bf9e0f41d1a5bce7215edeea54b22
2023-07-13 09:50:11 +02:00
Zuul ab9758e14f Merge "Fix the HTTP code of the BadRequest exception" 2023-07-11 12:21:42 +00:00
Dmitry Tantsur 6428116212 Fix the HTTP code of the BadRequest exception
We have both Invalid and BadRequest, which result in HTTP 400 and 500
respectively. The latter is clearly incorrect.

Then we have NotFound and HTTPNotFound which mean the same thing.

Alias exceptions in both cases with the intention to drop one copy.

Finally, NotAuthorized and Unauthorized result in HTTP 403 and 500
again. Fortunately, the latter is not used and can be removed.

Change-Id: If9d571792a8617dd6ecf17e163dea252cb0f7fae
2023-06-29 14:45:10 +02:00
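The aliasing described in the "Fix the HTTP code of the BadRequest exception" commit above can be illustrated with a minimal sketch. The class bodies and the base class name `SketchIronicError` are hypothetical and are not copied from the Ironic source; only the exception names and status codes come from the commit message.

```python
from http import client as http_client


class SketchIronicError(Exception):
    """Hypothetical base class; `code` is the HTTP status sent to clients."""
    code = http_client.INTERNAL_SERVER_ERROR  # 500 unless a subclass overrides it


class Invalid(SketchIronicError):
    code = http_client.BAD_REQUEST  # 400


# Alias instead of a second class, so both names share one HTTP code.
BadRequest = Invalid


class NotFound(SketchIronicError):
    code = http_client.NOT_FOUND  # 404


# Same idea for the duplicated "not found" exception.
HTTPNotFound = NotFound


class NotAuthorized(SketchIronicError):
    code = http_client.FORBIDDEN  # 403
```

Because the alias is the same class object, an `except Invalid` clause also catches code that raises `BadRequest`, which is what makes it possible to drop one of the names later.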
Iury Gregory Melo Ferreira dad6724292 Add DB API for Firmware and Object
Adds the following methods to DB API:

* create_firmware_component
* update_firmware_component
* get_firmware_component
* get_firmware_component_list

FirmwareComponent
* create | save | get

FirmwareComponentList
* get_by_node id | sync_firmware_components

Adds two exceptions:

* FirmwareComponentAlreadyExists
* FirmwareComponentNotFound

Tests for db and objects

Changes were required in models: the class name should match the
name of the object we will create.

Story: 2010659
Task: 47977

Change-Id: Ie1e2a4150d4ee4521290737612780c02506f4a9e
2023-06-28 14:05:21 -03:00
Jakub Jelinek 9d3d16b791 Fix Inventory DB
Follow-up to I6b830e5cc30f1fa1f1900e7c45e6f246fa1ec51c
The original change introduced some errors, such as mismatched
arguments for exceptions.

Story: 2010275
Task: 46204
Change-Id: I550e048ab22a6cd25502b41d1c579819df369249
2023-02-16 15:39:45 +00:00
Jakub Jelinek bc921118b1 Erase swift inventory entry on node deletion
Follow-up to Ie174904420691be64ce6ca10bca3231f45a5bc58
which enables storage of inventory in Swift, but does not delete
the Swift entry when the node whose inventory is stored is deleted

Story: 2010275
Task: 46204
Change-Id: I74b19f7a42c1326d7ec04e6320176e81639ebfb4
2023-02-14 10:58:05 +00:00
Jakub Jelinek 59b0dc4599 Implements node inventory: database
Prepare the ironic database to accommodate node inventory received from
the inspector once the API is implemented.

Story: 2010275
Task: 46204
Change-Id: I6b830e5cc30f1fa1f1900e7c45e6f246fa1ec51c
2022-11-15 16:55:36 +00:00
Julia Kreger 9a8b1d149c Concurrent Distructive/Intensive ops limits
Provide the ability to limit resource intensive or potentially
wide scale operations which could be a symptom of a highly
destructive and unplanned operation in progress.

The idea behind this change is to help guard the overall deployment
to prevent an overall resource exhaustion situation, or prevent an
attacker with valid credentials from putting an entire deployment
into a potentially disastrous cleaning situation, since ironic
otherwise only limits concurrency based upon running tasks by conductor.

Story: 2010007
Task: 45140

Change-Id: I642452cd480e7674ff720b65ca32bce59a4a834a
2022-09-20 06:47:38 -07:00
Zuul 5d2283137c Merge "Make anaconda non-image deploys sane" 2022-07-14 01:28:00 +00:00
Julia Kreger e78f123ff8 Make anaconda non-image deploys sane
Ironic has a lot of logic built up around use of images for filesystems,
however several recent additions, such as the ``ramdisk`` and ``anaconda``
deployment interfaces have started to break this mold.

In working with some operators attempting to utilize the anaconda
deployment interface outside the context of full OpenStack, we discovered
some issues which needed to be made simpler to help remove the need to
route around data validation checks for things that are not required.

Standalone users also have the ability to point to a URL with anaconda,
whereas operators using OpenStack can only do so with customized kickstart
files. While this is okay, the disparity in configuration checking
was also creating additional issues.

In this, we discovered we were not really graceful with redirects,
so we're now a little more graceful with them.

Story: 2009939
Story: 2009940
Task: 44834
Task: 44833
Change-Id: I8b0a50751014c6093faa26094d9f99e173dcdd38
2022-07-11 07:41:06 -07:00
Sam Zuk 94f9745f0c [Minor] Fix misspellings of "insufficient"
In a few places in the codebase, "insufficient" is misspelled as
"insufficent," which includes function names and exception class names.
This can be inconvenient for writing and debugging code, in which case
one would raise an exception/call a function and get an error that is
resolved by intentionally misspelling the function call.

The changes made here are mostly to the names of exceptions and
functions but also include some other instances of this misspelling
in docstrings, policy descriptions, etc. There were also some strings
describing policies in ironic/common/policy.py that were missing
spaces, which were also fixed.

Story: 2010089
Task: 45604
Change-Id: I7b65c449d5d30ca30f537a95a3ffd365492e0274
2022-06-14 17:06:14 +00:00
Julia Kreger fdc6424de3 Clarify driver load error message
The NoValidDefaultForInterface exception is a little misleading
in that if one doesn't have the base interface enabled, and they
attempt to enable a hardware type which requires or only supports
disabled interfaces, they will also get an exception. The reality
is we need to suggest for them to look at enabling the interfaces
before looking at the default interface overrides, because logically
the brain jumps to setting a default before checking the interface
settings.

Change-Id: I50d4381e11da96cb7ae0ee8cbda18534380bd471
2021-11-23 20:40:51 +00:00
Jacob Anders b385d9ae5b Add support for verify steps
This change adds support for verify steps in Ironic. Verify steps
allow executing actions on transition from "verifying" to "manageable"
state and can perform actions such as cleaning BMC job queue or
resetting the BMC on supported platforms. Verify steps are similar
to deploy and clean steps, just simpler.

Story: 2009025
Task: 42751
Change-Id: Iee27199a0315b8609e629bac272998c28274802b
2021-09-30 20:46:17 +10:00
Julia Kreger 85b6dc9356 Facilitate asset copy for bootloader ops
Adds capability to copy bootloader assets from the system OS
into the network boot folders on conductor startup.

Change-Id: Ica8f9472d0a2409cf78832166c57f2bb96677833
2021-09-15 16:24:13 -07:00
Kaifeng Wang fbaad948d8 Implements node history: database
This patch provides basic data model change to support node history.
Batch removal is not included in this patch.

Change-Id: I5c7cebd585ee84b5b57bd4690d4074baf0d05699
Story: 2002980
Task: 22989
2021-09-09 09:35:09 -07:00
Dmitry Tantsur 47398edd3c Inherit InvalidImageRef from InvalidParameterValue
InvalidImageRef is a kind of InvalidParameterValue and can happen during
validation, causing a traceback now.

Change-Id: I5f10fe7240e74d337f991bbd1a5220cc4e713de7
2021-05-04 17:32:54 +02:00
Arun S A G 9d3de26fb1 Validate the kickstart template and file before use
The kickstart template is supplied by the user and it needs
to be validated to make sure it includes all the expected
variables and nothing else.

We validate the template by rendering it using expected
variables. If any of the expected variables are not present
in the template or unexpected variables are defined in the
template, we raise the InvalidKickstartTemplate exception.

Once we render the template into a kickstart file, we
pass the file to the 'ksvalidator' tool, if it is present
on the system, to validate the rendered kickstart file
for correctness.

The 'ksvalidator' tool comes from the pykickstart library, which
is GPLv2 licensed. The GPLv2 license is incompatible with
OpenStack, so we do not explicitly include the library in
requirements.txt and instead rely on it being pre-existing on
the conductor. If the 'ksvalidator' binary is not present
on the system, kickstart validation will be skipped.

Change-Id: I3e040bbdbcefb8764c93355d0ba7179e2110b9c6
2021-03-23 21:53:35 -07:00
Julia Kreger d9913370de Guard conductor from consuming all of the ram
One of the biggest frustrations larger operators have is when they
trigger a massive number of concurrent deployments. As one would
expect, the memory utilization of the conductor goes up. Except,
even with the default number of worker threads, if we're requested
to convert 80 images at the same time, or to perform the write-out
to the remote node at the same time, we will consume a large amount
of system RAM. Or more specifically, qemu-img will consume a large
amount of memory.

If the amount of memory goes too low, the system can trigger
OOMKiller which will slay processes using ram. Ideally, we do not
want this to happen to our conductor process, much less the work
that is being performed, so we need to add some guard rails to help
keep us from entering into situations where we may compromise the
conductor by taking on too much work.

Adds a guard in the conductor to prevent multiple parallel
deployment operations from running the conductor out of memory.

With the defaults, the conductor will attempt to throttle back
automatically and hold worker threads which will slow down the
amount of work also proceeding through the conductor, as we are
in a memory condition where we should be careful about the work.

With the defaults, the conductor waits 15 seconds between
re-checks of available RAM, for a total of six retries.
The minimum default is 1024 (MB), as this is the amount of memory
qemu-img allocates when trying to write images. This quite literally
means no additional qemu-img process can spawn until the default
memory situation has resolved itself.

Change-Id: I69db0169c564c5b22abd0cb1b890f409c13b0ac2
2021-01-29 14:33:57 -08:00
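The throttling behaviour described in the "Guard conductor from consuming all of the ram" commit above can be sketched as a simple check-and-wait loop. This is an illustrative sketch only: the function name, its parameters, and the injected `get_available_mb` callable are hypothetical, and only the default values (1024 MB, 15 seconds, six retries) come from the commit message.

```python
import time


def wait_for_memory(get_available_mb, minimum_mb=1024, wait_seconds=15,
                    retries=6, sleep=time.sleep):
    """Return True once enough RAM is free, False if the retries run out.

    get_available_mb is a caller-supplied callable (hypothetical) that
    returns the currently available memory in megabytes.
    """
    if get_available_mb() >= minimum_mb:
        return True
    for _ in range(retries):
        # Hold the worker for a while, then look at free memory again.
        sleep(wait_seconds)
        if get_available_mb() >= minimum_mb:
            return True
    return False
```

Passing `sleep` and the memory probe as arguments keeps the sketch testable without real delays; a caller that gets `False` back would decline to start another memory-hungry operation.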
Derek Higgins 7d85b35c84 Register all hardware_interfaces together
Prevent each driver coming online one at a time, so that
/driver returns nothing until all interfaces are registered.

Story: #2008423
Task: #41368

Change-Id: I6ef3e6e36b96106faf4581509d9219e5c535a6d8
2021-01-08 15:16:53 +00:00
Zuul f11f330d00 Merge "Support port name" 2020-12-19 20:46:10 +00:00
Kaifeng Wang b7ddeb314d Support port name
A MAC address is not user friendly for port management, and having
a name field also provides feature parity with other resources.
This patch implements the db-related change.

Change-Id: Ibad9a1b6bbfddc0af1950def4e27db3757904cb1
Story: 2003091
Task: 23180
2020-11-29 13:37:55 +08:00
Steve Baker e41893c9d0 JSON conversion followup change
This change addresses nit-level review comments from this task.

Story: 1651346
Task: 10551
Change-Id: I01608004ce90facadb73e252203900a1e62cbea1
2020-11-26 11:05:48 +13:00
Julia Kreger 545dc2106b Handle agent still doing the prior command
The agent command exec model is based upon an incoming
heartbeat, however heartbeats are independent and
commands can take a long time. For example, software RAID
setup in CI can encounter this.

From an IPA log:

[-] Picked root device /dev/md0 for node c6ca0af2-baec-40d6-879d-cbb5c751aafb
    based on root device hints {'name': '/dev/md0'}
[-] Attempting to download image from http://199.204.45.248:3928/agent_images/
    c6ca0af2-baec-40d6-879d-cbb5c751aafb
[-] Executing command: standby.get_partition_uuids with args: {} execute_command
    /usr/local/lib/python3.6/site-packages/ironic_python_agent/extensions/base.py:255
[-] Tried to execute standby.get_partition_uuids, agent is still executing Command name:
    execute_deploy_step, params: {'step': {'interface': 'deploy', 'step': 'write_image',
    'args': {'image_info': {'id': 'cb9e199a-af1b-4a6f-b00e-f284008b8046',
    'urls': ['http://199.204.45.248:3928/agent_images/c6ca0af2-baec-40d6-879d-cbb5c751aafb'],
    'disk_format': 'raw', 'container_format': 'bare', 'stream_raw_images': True, 'os_hash_algo':
    'sha512', 'os_hash_value':<trimed>

This was with code built on master, using master images.
Inside the conductor log, it notes that it is likely an out
of date agent because only AgentAPIError is evaluated,
however any API error is evaluated this way. In reality, we need
to explicitly flag *when* we have an error that is because
we've tried too soon, as something is already being worked upon.

The result is to evaluate and return an exception indicating work
is already in flight.

Update - It looks like the original fix to prevent busy agent
recognition did not fully detect all cases, as getting steps is a
command which can get skipped by accident with a busy agent, under
certain circumstances.
Change I5d86878b5ed6142ed2630adee78c0867c49b663f in ironic-python-agent
also changed the string that was being checked for the previous
handling, where we really should have just made the string we were
checking lower case in ironic. Oh well! This should fix things
right up.

Story: 2008167
Task: 41175
Change-Id: Ia169640b7084d17d26f22e457c7af512db6d21d6
2020-10-29 14:58:34 -07:00
Kaifeng Wang 82a2fe4f7f Follow up to I44336423194eed99f026c44b6390030a94ed0522
Allow using IPv6 address in the provisioning network.

The IP address based pxe config may not actually be used; in that case we
can remove it and save a few neutron interactions.

Change-Id: Ideef57674550270a87513e039cd030f0bcc1c10e
2020-08-10 13:46:59 +08:00
Steve Baker 44cc6dd792 Add wsme core types, remove WSME
The header for the file types.py denotes its dual-licensed status as
MIT with copyright to the original WSME authors, plus Apache-licensed
as part of Ironic.

Story: 1651346
Task: 10551

Change-Id: I986cc4a936c8679e932463ff3c91d1876a713196
2020-07-14 10:34:13 +12:00
Steve Baker 8006c9dfd2 Add json and param parsing to args
Some unused HTTP param to arg parsing has not been implemented to
reduce code complexity. This includes the following types:
- DictType
- complex types

Asserts are added to confirm these param types are not used in ironic
currently, and to prevent them being used in future development.

Story: 1651346
Task: 10551

Change-Id: Idfcf99216f10e8928fe4ba6202a7d69bfa916459
2020-07-14 10:34:13 +12:00
Dmitry Tantsur 7828fe8b64 agent: poll long-running commands till completion
Currently for install_bootloader we use wait=True with a longer
timeout. As a more robust alternative, poll the agent until
the command completes. This avoids trying to guess how long
the command will actually take.

Change-Id: I62e9086441fa2b164aee42f7489d12aed4076f49
Story: #2006963
2020-06-19 16:46:44 +02:00
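The polling approach described in the "agent: poll long-running commands till completion" commit above can be sketched as follows. The helper name, the `get_status` callable, and the `'RUNNING'` status string are assumptions made for illustration; this is not the conductor's actual code.

```python
import time


def poll_until_done(get_status, interval=1.0, max_polls=None,
                    sleep=time.sleep):
    """Poll a command until it leaves the running state.

    get_status is a caller-supplied callable (hypothetical) returning a
    status string; 'RUNNING' is assumed to mean "not finished yet".
    """
    polls = 0
    while True:
        status = get_status()
        if status != 'RUNNING':
            return status
        polls += 1
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError('command still running after %d polls' % polls)
        sleep(interval)
```

Compared with a single blocking call and a guessed timeout, the loop returns as soon as the command finishes, and the optional `max_polls` bound is only a safety net.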
Aeva Black 9f75bbd938 Add my new address to .mailmap
This commit updates the mailmap file and changes my alias
in a few places within old comments.

Change-Id: Ica0e184109d794b8e129d567b5606d7fe84ff384
2020-04-13 07:29:37 -07:00
Kaifeng Wang b3721ce4ff Automatic port allocation for the serial console
Introduces [console]port_range configuration option and implements
the feature of automatic port allocation for IPMI based serial console.

The ipmi_terminal_port in driver_info takes precedence if specified,
otherwise ironic will allocate a free port from the configured port range
for the underlying serial proxy tools.

This implementation deviates from the original proposal in that this patch
doesn't validate whether a user-specified ipmi_terminal_port falls in the
range, based on the following considerations:
a. ipmi_terminal_port is considered a last resort for backwards compatibility,
and we will remove it eventually.
b. different conductors may have different port ranges configured (rare,
but it could happen).
c. forcing ipmi_terminal_port into the port range could raise the
possibility of conflicts with ports in the configured range; this is not
a desired result, so the choice is left to the end users.

Change-Id: If8722d09dc74878f4da2e4a7f059d9b079c3e472
Story: 2007099
Task: 38135
2020-02-10 16:09:12 +08:00
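The precedence rule described in the "Automatic port allocation for the serial console" commit above can be sketched as below. The function name, the `(start, stop)` tuple form of the range, and the `allocated` set are hypothetical; only `ipmi_terminal_port` and the lack of range validation come from the commit message.

```python
def pick_console_port(driver_info, port_range, allocated):
    """Choose a serial console port for a node.

    driver_info: dict that may contain 'ipmi_terminal_port'.
    port_range: (start, stop) tuple, stop exclusive (assumed format).
    allocated: set of ports already handed out; updated in place.
    """
    explicit = driver_info.get('ipmi_terminal_port')
    if explicit is not None:
        # An explicit port wins and is deliberately not checked
        # against the configured range.
        return int(explicit)
    start, stop = port_range
    for port in range(start, stop):
        if port not in allocated:
            allocated.add(port)
            return port
    raise RuntimeError('no free port in the configured range')
```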
Arne Wiebalck 3ecaadbb35 Support node retirement
This change adds support for node retirement: nodes can
have additional properties 'retired' and 'retired_reason'
which change the way the nodes (can) traverse the FSM
and which operations are allowed. In particular:
- retired nodes cannot move from manageable to available;
- upon instance deletion, retired nodes move to manageable
  (rather than available).

Story: #2005425
Task: #38142

Change-Id: I8113a44c28f62bf83f8e213aeb6704f96055d52b
2020-01-28 11:01:32 +01:00
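The two retirement rules listed in the "Support node retirement" commit above can be written as a pair of small functions. This is a sketch with hypothetical function names, not the state machine code from Ironic; the state names are the ones used in the commit message.

```python
def can_move_to_available(retired):
    """A retired node may not go from manageable to available."""
    return not retired


def state_after_instance_deletion(retired):
    """Retired nodes land in manageable rather than available."""
    return 'manageable' if retired else 'available'
```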
Steve Baker f192f2c45d Subclass wsme.exc.ClientSideError
This change avoids importing a wsgi namespace exception class, and
allows the future option of changing the parent class of
exception.ClientSideError when wsme is no longer processing API
requests.

Change-Id: I8165e094fafb91ff94eaa1dd96baba7671487448
Story: 1651346
2020-01-22 16:46:59 +13:00
Riccardo Pittau 78c121a5d7 Stop using six library
Since we've dropped support for Python 2.7, it's time to look at
the bright future that Python 3.x will bring and stop forcing
compatibility with older versions.
This patch removes the six library from requirements, not
looking back.

Change-Id: Ib546f16965475c32b2f8caabd560e2c7d382ac5a
2019-12-23 09:38:25 +01:00
Julia Kreger 5f18e52b64 Remove CIMC/UCS drivers
Cisco's Third-Party CI was taken down as a result of the
CTO's office being restructured. Numerous attempts to
re-engage with Cisco directly and address the various
known issues in their drivers have not proven to be
fruitful.

Additionally, the drivers are not Python3 compatible,
and some reports have indicated that the CIMC driver is
no longer compatible with newer versions.

As such, the ironic community has little choice but
to remove the Cisco UCS/CIMC hardware types and driver
interface code.

Story: 2005033
Task: 29522
Change-Id: Ie12eaf7572ce4d66f6a68025b7fe2d294185ce28
2019-06-25 23:44:19 -07:00
Riccardo Pittau f5dbf8ba0c Switch to use exception from ironic-lib
The exception modules in ironic and ironic-lib contain
almost identical IronicException classes.
With this patch we directly use the one in ironic-lib.

Updating requirements and lower-constraints to use compatible
version of ironic-lib.

Also deprecating duplicated fatal_exception_format_errors
option.

Change-Id: I1ce0d12d912020346425fd658d3b1807607455a4
Story: 1626578
Task: 10515
2019-06-11 12:03:44 +02:00
Zuul f944f60041 Merge "Add Huawei iBMC driver support" 2019-03-15 16:59:23 +00:00
Qianbiao NG f1f4f892fe Add Huawei iBMC driver support
This patch proposes adding an iBMC driver for deploying the
Huawei 2288H V5, CH121 V5 series servers.
The driver aims to add management and power interfaces using
Huawei iBMC RESTful APIs for those series servers.

Change-Id: Ic5e920e4e58811c6a6dfe927732595950aea64e7
Story: 2004635
Task: 28566
2019-03-14 11:04:29 +08:00
Zuul faf0bccbb0 Merge "Fix TypeError: __str__ returned non-string (type ImageRefValidationFailed)" 2019-03-07 04:41:20 +00:00
Dmitry Tantsur 245af384ff Fix TypeError: __str__ returned non-string (type ImageRefValidationFailed)
This change contains two fixes for the same issue:
* Do not pass an instance of ImageRefValidationFailed as a message
  to an exception constructor.
* Make sure that IronicException.__str__ always returns an str,
  even when a non-string is passed as the first argument to __init__.

Change-Id: I96edb28955e64915e9d6a481634857fd27690555
Story: #2003682
Task: #26206
2019-03-04 13:16:17 +01:00
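The second fix in the "Fix TypeError: __str__ returned non-string" commit above can be illustrated with a minimal sketch. The class name `SketchException` and its attributes are hypothetical and much simpler than the real IronicException; the point is only that `__str__` coerces whatever was passed to `__init__`.

```python
class SketchException(Exception):
    """Hypothetical stand-in for an exception with a custom __str__."""

    default_message = 'An unknown exception occurred.'

    def __init__(self, message=None):
        if message is None:
            message = self.default_message
        # The caller may pass a non-string here, e.g. another exception.
        self.message = message
        super().__init__(message)

    def __str__(self):
        # Returning self.message directly would raise
        # "TypeError: __str__ returned non-string" for non-string input.
        return str(self.message)
```

With this in place, wrapping one exception in another, as in `SketchException(ValueError('bad image ref'))`, still produces a printable message.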
Mark Goddard ec2f7f992e Deploy templates: API & notifications
Adds deploy_templates REST API endpoints for retrieving, creating,
updating and deleting deployment templates. Also adds notification
objects for deploy templates.

Bumps the minimum WSME requirement to 0.9.3, since the lower constraints
job was failing with a 500 error when sending data in an unexpected
format to the POST /deploy_templates API.

Change-Id: I0e8c97e600f9b1080c8bdec790e5710e7a92d016
Story: 1722275
Task: 28677
2019-03-04 10:30:16 +00:00
Mark Goddard b137af30b9 Deploy templates: data model, DB API & objects
Adds deploy_templates and deploy_template_steps tables to the database,
provides a DB API for these tables, and a DeployTemplate versioned
object.

Change-Id: I5b8b59bbea1594b1220438050b80f1c603dbc346
Story: 1722275
Task: 28674
2019-02-13 19:26:21 +00:00
Dmitry Tantsur 96b9d9de07 Allocation API: conductor API (without HA and take over)
This change introduces the two RPC calls required for the allocation
API: create_allocation and destroy_allocation.

The nodes RPC is updated to:
* Prevent instance_uuid deletion if a node has an allocation and is
  not in an updatable state.
* Delete allocation when instance_uuid is deleted and the node is
  in an updatable state.
* Delete allocation when a node is unprovisioned and instance_uuid
  is thus cleared.

Change-Id: I45815727f970c3d7fe51bb78d8e162a374d12e04
Story: #2004341
Task: #27987
2019-01-31 13:01:09 +01:00
Dmitry Tantsur a4717d9958 Allocation API: database and RPC
This change adds the database models and API, as well as RPC objects
for the allocation API. Also the node database API is extended with
query by power state and list of UUIDs.

There is one discrepancy from the initially approved spec: since we
do not have to separately update traits in an allocation, the planned
allocation_traits table was replaced by a simple field.

Change-Id: I6af132e2bfa6e4f7b93bd20f22a668790a22a30e
Story: #2004341
Task: #28367
2019-01-07 12:51:10 +01:00
Dmitry Tantsur 68d62f2bee Support for protecting nodes from undeploying and rebuilding
When handling the "pet" case, some nodes may be critical for the deployment.
For example, in an OpenStack installer like TripleO you may want to make
sure your controllers are not removed by an incorrect operation.

This change introduces a new field "protected" on nodes. When it is
set to True, the "deleted" and "rebuild" provisioning actions fail with
HTTP 403.  Deleting such nodes is also not possible.

Also adds "protected_reason" for the operators to specify the reason
a node is protected.

Story: #2003869
Task: #26706
Change-Id: I1950bf6dd65b6596cae69d431ef288e578a89d6e
2018-11-27 10:07:30 +01:00
Julia Kreger abb0865771 Remove oneview drivers
In accordance with the deprecation of oneview,
it is time to remove the oneview drivers.

This patch removes the oneview interfaces and documentation.

Change-Id: Ided79fa788411f839614813ff033c42a13b88c75
Story: #2001924
Task: #24943
2018-10-15 16:32:15 -07:00
Zuul 90cccaa520 Merge "Fix for failure in cleaning" 2018-07-25 14:19:00 +00:00
Shivanand Tendulker 7c5a04c114 Fix for failure in cleaning
The cleaning operation may fail if an in-band clean step executes
after the completion of an out-of-band clean step that
reboots the node. The failure is caused by a race
condition wherein cleaning is resumed before the Ironic Python
Agent (IPA) is ready to execute clean steps.

Story: #2002731
Task: #22580
Change-Id: Idaacb9fbb1ea3ac82cdb6769df05d8206660c8cb
2018-07-24 02:18:08 -04:00