Primarily remove the workaround added in
Ia6d512ff2ae417bab938cb095fbb0884d195010a which added
continued use of autocommit, which is incompatible with
SQLAlchemy 2.0.
Also set the environment for unit tests to report compatability
warnings, although it appears none are being reported at this time.
Also cuts out the db upgrade cruft to only use the online database
migration code through oslo_db's enginefacade, which has the smarts
to handle online or offline migrations.
And then, retools unit/functional test data storage to utlize sqlite,
and in that re-tooled the queries to prevent locking conditions
which could exist with queries, and some additional refactoring/cleanup.
Also, don't mock and test time.sleep().
Additionally, it looks like we have discovered the root cause of the
memory/connection leakage issue which has been observed, due to the
way lists of nodes are processed/returned.
This change was based upon the work in
I506da42a9891a245831f325e34bec92e0a3f33f0 which is included in
this commit as the entire database structure and interaction
has been modified for ironic-inspector.
Co-Authored-By: aarefiev <aarefiev@mirantis.com>
Story: 2009727
Task: 44132
Change-Id: Ic88eb9dec5fddc924a72d9a23c17a304954ebf46
This patches removes the ironic-client dependency from the
ironic module in favor of openstacksdk.
Increase minimum required version of openstacksdk to use
recent added features.
Change-Id: I31964179835ad454e8205a59f483b419de4fb62d
When manage_boot is False, we do not power on the machines on starting
introspection, nor do we power them off on aborting. We should not
power off on finishing introspection either, let's leave it up to ironic.
Change-Id: If8115f8d592e1b24b07ef52bcd703d10763c1f00
Story: #1528920
When an active node introspection is performed
where an entry already exists in ironic, but inspector
has never seen the node before, we enter a state where
we should allow introspection to occur, but that we don't
have a cache record.
As a result of no cache entry, we presently fail hard claiming
the node's port already exists. In the case of active nodes,
This will always be the case, so we need to add a cache
entry and allow the process to... somewhat continue as if
normal introspeciton is occuring.
Co-Authored-By: Dmitry Tantsur <dtantsur@redhat.com>
Change-Id: Ib20b4a4e4d46ba23b8a4d87791cf30a957f1f9f9
Story: 2006233
Task: 35834
NodeInfo accepts a lock instance when instantiated, it is only used
in the node clean up periodic task, such implementation introduces
complexity on the lock abstraction. This patch moves node locking
out of get_node.
_get_lock_ctx is removed for it only duplicates _get_lock according to
current usage.
Change-Id: I4e946942ce684b27062d1a42c68faf1d92569489
This patch adds support to provide unprocessed introspection data
to reapply a node. The provided introspection data will be save to
current introspection storage backend.
Change-Id: I969ae9c32f53f89c006a64a006388ddea9542aa5
Story: 1564863
Task: 11344
In https://review.openstack.org/#/c/637673, the get introspection
interface is narrowed down to only accept node uuid, which previously
accepts both uuid and name. But the name to uuid conversion is
missing in the reapply api, which causes feature regression, and
this is the fix :(
Story: 1726713
Task: 11373
Change-Id: I6912853deab77f1365f665ca1e52c13063d2cdf1
The patch revises driver interface for the introspection data
backends. Previously getting introspection data supports node
uuid/name, while saving takes a node_info object, which is not
consistent and makes migration tool looks weird if implemented
based upon it.
For the get() interface, actually only uuid will be passed in,
so it's narrowed down to accept only uuids, logic names will be
converted from api level if there is a need.
The save() interface is changed to accept node uuid instead of
node_info, which is consistent with the get() interface.
Change-Id: I4702ed7372d0e60ed6252879a7496649a0453b84
Story: 1726713
Task: 11373
An issue is spotted during implementing [1], we have a solely code
path from api, do_reapply, reapply, to _reapply. Inspector now reads
intrspection data ahead, so there is no need to do so in _reapply,
reading introspection data there is never reached.
The patch removes unecessary code and corresponding tests.
[1] https://storyboard.openstack.org/#!/story/1564863
Change-Id: I5558bce2bc49de3d1c5dba59e203de4824a3addd
Story: 1726713
Task: 11373
Configurable introspection data storage backend [1] is proposed
to provide flexible extension of introspection data storage
instead of the single support of Swift storage backend.
This patch adds plugin mechanism for loading introspection
storage, creates database backend and moves Swift storage
into a plugin.
[1] http://specs.openstack.org/openstack/ironic-inspector-specs/specs/configurable-introspection-data-backends.html
Story: 1726713
Task: 11373
Co-Authored-By: Kaifeng Wang <kaifeng.w@gmail.com>
Change-Id: Ie4d09dc0afc441b20a1e5e3bd8e742b1df918954
Multiple spots were not using DB transactions when processing the terminal
state transitions (error, abort, finish, timeout). The pattern looked like
this:
node_info.fsm_event(istate.Events.error)
# more code
node_info.finished(error='Oops!')
which led to brief periodes of state inconsistency of NodeInfo records in
the DB.
This patch refactors the NodeInfo.finished() method to require a terminal state
transition to perform as part of the NodeInfo state update:
NodeInfo().finished(istate.Events.finish)
NodeInfo().finished(istate.Events.abort, 'Canceled by operator')
This patch also introduces a new state: aborting to allow the inspector to
try call power-off the node before marking the introspection aborted.
There's a new DB migration since the new state implies a schema change too
(Enum).
Closes-Bug: #1721233
Closes-Bug: #1721230
Closes-Bug: #1723384
Change-Id: I0bb051d1956a996ed006d55a5ca2d670d9455047
We need to always use bmc_address from inventory, the ipmi_address field is
there only for compatibility with older processing hooks.
Change-Id: Ibf00ecd9278af4ee9318ada44c7c670d13ac22aa
Closes-Bug: #1714944
This experimental feature was deprecated in the Ocata release,
as it was found unstable, untested and dangerous.
API version is bumped to 1.12 to indicate this change to users.
Change-Id: I1aad6ddfd03946edc19ae510accd6c8daf5fc268
Closes-Bug: #1654318
Currently port logic is placed inconsistently: port creation is in core processing,
while port deletion is in validate_interfaces before_update. This changes moves
port creation there as well. This should only affect deployments that tamper with
validation_interfaces hook, as previously port creation was run just before running
before_update hooks.
This allows deployments to replace port creation logic by replacing the
validate_interfaces hook.
Change-Id: Idd8f748fdf31fc694bd7b554837e509024716c18
Partial-Bug: #1667472
The i18n team has decided not to translate the logs because it seems
like it not very useful.
Change-Id: I46c1b0c3efa28c3f887b1a29dc77d47fe749be87
Closes-Bug: #1674374
The reapply API/Action(openstack baremetal introspection reprocess UUID)
doesn't update the started_at time when Ironic Inspector begins processing
the node.
This adds the started_at time when the reapply API/Action is performed.
Change-Id: Ic79db4ba9305841fb662afcb56f556ad4a57a500
Closes-Bug: #1625180
Use the flake8 plugin flake8-import-order to check import ordering. It
can do it automatically and don't need reviewers to check it.
Change-Id: I9ced9c297273db0eec6ab3995b663b1e8dffe87d
InfiniBand is computer-networking communications standard
used in high-performance computing, features very high
throughput and very low latency.
This patch allow ironic-inspector to add the client_id
to ironic port extra. The client_id option allow pxe boot
from InfiniBand interface.
Closes-Bug: #1532534
Depends-On: Ifad453977e5d3be64b34e544f269835a72b4d73f
Change-Id: I479d54c29bcacb6bd5c1ab20033ae6e428b0e744
This patch replaces python standard base64 library call to
oslo_serialization.base64 to follow OpenStack Python3 porting
standard [1].
Use base64 encoding takes 8-bit binary byte data and encodes it. On
Python3, A string is a sequence of Unicode characters thus base64 has
no idea what to do with Unicode data, it's not 8-bit[2]. We use
oslo_serialization.base64 for python2 and python3.
[1] https://wiki.openstack.org/wiki/Python3
[2] http://stackoverflow.com/questions/8908287/base64-encoding-in-python-3
Change-Id: Ibf24df3a90ecbcdec400a0570f2818f89b78ea0a
Currently, state of a node introspection isn't kept in the database.
This change introduces:
* a new database column to keep the node introspection state
* an automaton to manage the node introspection state
* a decorator to declare a function performing an introspection state
transition
* a version_id column is added, to enhance database consistency, that
is consulted whenever node_info is committed
This change is part of the HA_Inspector effort[1]
[1] https://specs.openstack.org/openstack/ironic-inspector-specs/specs/HA_inspector.html
Closes-Bug: #1618835
Partial-Bug: #1525218
Change-Id: I18cb45f0d1194414715ccbe826d8a95610ec718d
Use hacking 0.12.0
Use the new checks that are available:
[H106] Don’t put vim configuration in source files.
[H203] Use assertIs(Not)None to check for None.
[H904] Delay string interpolations at logging calls.
Fix code so tests pass.
Change-Id: I7b1cd98aeb3bb8f4f01dd1d69d1afcd839749074
Setting node to the error status with error message
if reapply fails to get introspection data from swift.
Change-Id: Idccb68847d14d5050c735facf3da7b3ec108adbe
Closes-Bug: #1618833
This adds configuration option 'processing.power_off'
defaulting to True, which will prevent powering off the
node after introspection
Change-Id: I16eb6b73fd57e84175bbce81c79e432ed8d1d3fa
Closes-Bug: #1488534
The template for ramdisk logs file names can now be changed via
the configuration. The default now contains only node UUID and datetime.
Also a proper tar.gz extension is appended to avoid confusion.
Depends-On: Ie507e2e5c58cffa255bbfb2fa5ffb95cb98ed8c4
Change-Id: I738f9bd35705d0d11c95b0164186ed0b366b5252
From now on only rely on the IPA inventory and 2 additional fields:
boot_interface and root_device.
Also updated unit tests to use one inventory example.
Also removed duplicating unit tests and checks in test_process.
Also removed devstack support for the old ramdisk.
Change-Id: Ib382328295fc2c1b9143171b1047304febadcaca
This change drops check on UUID validness from our API.
It also has a subtle effect of doing Ironic node fetching in
the introspection status and data fetch calls, which might make them
slightly longer (but only when name is used).
A new helper common.ironic.get_node is created to unify how we fetch nodes
from Ironic. It also provides nicer exceptions.
Change-Id: I20cf65e57910568b70a62c3f9269a962e78a07e2
Closes-Bug: #1523902
Previously the ramdisk logs were only stored if the ramdisk reported an error.
However, we are moving from ramdisk-side validation to server-side, so we need
ramdisk logs to be available if processing fails too.
This change moves storing ramdisk logs from a ramdisk_error plugin to core
processing code. As before, it can be disabled by setting ramdisk_logs_dir to
an empty value.
Change-Id: Ib3742ee1c1d4f2f96d29466626e1121694610dc3
Closes-Bug: #1564448
As requested in the related bug, this pull request allows to run the
introspection again on previously stored data. This should make it simple
to correct mistakes in the introspection rules.
For this purpose, new API entry point was introduced:
/v1/introspection/<UUID>/data/unprocessed that supports an empty POST
method to trigger the introspection over stored data. New function
`reapply` was introduced that takes care about the entry point and carries
out the introspection. The `process` function was modified to allow
reusing common parts in the new reapply function. The storage access
methods were updated to allow saving the "raw" memdisk data besides the
processed introspection data.
Following preconditions are checked the reapply function having been
triggered:
* no data is being sent along with the request
* Swift store is configured and enabled and the stored data is present
for the node UUID
* node_info object is cached for the UUID and it is possible to lock the
object
Should the preconditions fail, an immediate response is given to the user:
* 400 if the request contained data or in case Swift store is not enabled
in configuration
* 409 if it wasn't possible to acquire lock for the node_info object
* 404 in case Ironic didn't keep track of related BM node
If the preconditions are met, a background task is executed to carry out
the processing and a 202 Accepted response is returned to the endpoint
user.
As requested, these steps are performed in the background task:
* preprocessing hooks
* post processing hooks, storing result in Swift
* introspection rules
These steps are avoided, based on the RFE:
* not_found_hook is skipped
* power operations
Limitations:
* IMPI credentials are not updated --- ramdisk not running
* there's no way to update the raw data atm.
* the raw data is never cleaned from the store
* check for stored data presence is performed in background;
missing data situation still results in a 202 response
Change-Id: Ic027c9d15f7f5475fcc3f599d081d1e8d5e244d4
Closes-Bug: #1525237
Currently we are using only the resulting MAC(s) when doing a node lookup.
In many cases it is the MAC of the PXE-booting NIC. However, it's not necessary
the MAC that people used for enrolling the Ironic node, which will lead to
lookup failures on the virtual environment. This change makes the lookup
procedure use all of the valid MAC's.
Similarly, the enroll node_not_found_hook now checks all MAC's before creating
a node.
Code in the validate_interfaces hook was reordered to ensure we only keep
interfaces with valid MAC's even in the "all_interfaces" list.
Change-Id: Ie7df05d9a7855716fb835c90cfb0ac7fc4cd66df
With enroll hook it's possible to process nodes in enroll state.
As nodes in this state are not expected to have valid power
credentials, we shouldn't fail if power off goes wrong.
Change-Id: I79e502052c4ae1b531b4f0a0bc314b4cf6d29aac
A green thread is now used instead of spawn_n for running asynchronous
operations during introspection, processing and aborting.
The existing periodic tasks are now run using Futurist PeriodicWorker.
Main shut down procedure was split into a separate function for convenience.
Also updated the example.conf to the latest versions (some pending updates from
3rdparty libraries included).
Change-Id: Id0efa31aee68a80ec55e4136c53189484b452559
Ramdisk logs are huge (and may potentially be VERY huge) and clutter the output
for 'introspection data save'. They might even potentially expose unwanted
information to a user.
Change-Id: I0e7f1f647e60711f84d7883caab5ce8b14d6184f
It is not currently possible to stop a running introspection. This
may be annoying for the operator, considering the amount of time it
takes for the bare metal node to call the continue API request,
especially knowing the introspection will fail/time-out eventually
(such as when debugging).
This patch introduces a REST API endpoint "POST
/v1/introspection/<node-UUID>/abort" in order to fill the gap. Upon the
abort method call, following preconditions are checked:
* there's a bare metal node matching the UUID
* introspection was not finished for the node
* introspection process is waiting for the node to give the continue call
Following Responses are returned to the caller in case the
preconditions were not met:
* 404 Response in case node wasn't found
* 409 Response (resource busy) in case the introspection process is not
waiting for the Continue call
Otherwise, a 202 Response is returned.
When the abort method is processed, the node is powered off and it is
black-listed in inspector's firewall to prevent it from booting the
introspection image. This happens asynchronously.
To prevent interference with the continue call processing, the
processing method was updated to give a 400 Response to the
introspection client in case continuing a finished or canceled
introspection.
Limitations:
* IMPI credentials are never updated in case introspection was canceled
* 202 response is returned even if the introspection was already finished
* the endpoint differs from requested "DELETE
/v1/introspection/<node-UUID>"
Links:
[1] https://bugs.launchpad.net/ironic-inspector/+bug/1525235
Change-Id: If043171f0d292ae2775dc1f26233dd4911599247
Closes-Bug: #1525235
Long time ago we had an idea of batching node updates. However,
it fails miserably when a plugin or an introspection rule has to inspect
existing node properties, because it receives possibly stale values.
We've deprecated support for batching patches back in Liberty, this
patch removes the associated bits from the hook interface.
Change-Id: Ia482ff50ca276ce1ffec631f016c6a6b54d5a4ab
Closes-Bug: #1506348
Currently our logging in processing is very inconsistent:
some log strings mention node UUID, some - node BMC IP, some nothing.
This change introduces a common prefix for all processing logs
based on as much information as possible.
Only code that actually have some context about the node (either
NodeInfo or introspection data) is updated.
Also logging BMC addresses can be disabled now.
Updates example.conf (a lot of updated comments from oslo).
Change-Id: Ib20f2acdc60bfaceed7a33467557b92857c32798
Currently we only return "Data pre-processing failed" and the only option
to figure out what went wrong is to dive into logs.
Also fixed error message for unexpected exceptions during pre-processing
to actually contain some exception information.
Change-Id: I998c0bbd48cc57b7b13fac220f4f7cfe2a5e0681
Closes-Bug: #1523907
Now a NodeInfo object has a lock instance attached, which is associated
with a node UUID. New calls acquire_lock and release_lock are added.
NodeInfo.find_node now always acquire a lock, get_node does it optionally.
NodeInfo.finished now always release a lock.
Note that this change does not make NodeInfo thread-safe. Rather it's
preventing simultaneous operations on the same node.
Change-Id: I0c0bd2f4e0622fd7086660b0874c5b8dca82b4d2
Implements: blueprint node-state