This commit adds code to raise an error for the following command
if the hardware does not have 4940 or 4942 QAT devices.
Command: "helm-chart-attribute-modify --enabled true
intel-device-plugins-operator intel-device-plugins-qat
intel-device-plugins-operator"
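The check described above can be sketched roughly as follows. This is only an illustration: the function name, the exception class and the way PCI device IDs are passed in are assumptions, not the actual sysinv code.

```python
# Hypothetical sketch; names and structure are illustrative only.
QAT_DEVICE_IDS = {"4940", "4942"}  # device IDs named in this commit message


class QatNotSupported(Exception):
    """Raised when the QAT chart is enabled on hardware without QAT devices."""


def check_qat_chart_enable(enabled, pci_device_ids):
    """Reject enabling the QAT chart when no supported QAT device is present."""
    if not enabled:
        # Disabling is always allowed, with or without QAT hardware.
        return
    if not QAT_DEVICE_IDS.intersection(pci_device_ids):
        raise QatNotSupported(
            "intel-device-plugins-qat cannot be enabled: "
            "no 4940 or 4942 QAT device found on this system")
```

Disabling the chart never raises, which matches the test cases listed for this change.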
TEST CASES:
PASSED: Build is successful.
PASSED: Bootstrap is successful.
PASSED: Upload the package using command system application-upload.
PASSED: Check chart enabled status using command
"system helm-override-list intel-device-plugins-operator --long"
PASSED: Enabling the QAT chart using command "system
helm-chart-attribute-modify" raises an error on a non-QAT system.
PASSED: Enabling the QAT chart using command "system
helm-chart-attribute-modify" succeeds on a QAT system.
PASSED: Disabling the QAT chart using command "system
helm-chart-attribute-modify" succeeds on both QAT and
non-QAT systems.
PASSED: Applying app intel-device-plugins-operator succeeds on both QAT
and non-QAT systems.
Story: 2010604
Task: 50027
Change-Id: Ied634cf35b53421bcaa2f8307e76a0fc87d3bb1f
Signed-off-by: Md Irshad Sheikh <mdirshad.sheikh@windriver.com>
This change adds the variable public_secondary_ip_address to
platform::haproxy::params, populated with the floating address of the
secondary OAM address pool, in the same way as is done for the primary
address pool. HAProxy will use it to bind the necessary L4
public ports to the secondary address.
Test plan
[PASS] Install and add a secondary pool via CLI and, then, after
lock/unlock, check that all public endpoints (openstack
endpoint list) are available in the primary and secondary
addresses, on the following setups:
- AIO-SX (prim:IPv4, sec:IPv6)
- AIO-SX (prim:IPv6, sec:IPv4)
- AIO-DX (prim:IPv4, sec:IPv6) with system-controller role
- AIO-DX (prim:IPv6, sec:IPv4) with system-controller role
[PASS] Access the public APIs on both protocols using curl.
Story: 2011027
Task: 49996
Change-Id: I1b79f4e462ab34ab2aa7187d92460202fa15ae7e
Signed-off-by: Andre Kantek <andrefernandozanella.kantek@windriver.com>
Skip recovery for applications that have update_failure_no_rollback
set to 'true' and that eventually fail to pass lifecycle semantic
checks during updates.
Triggering the recovery of an app that does not support rollbacks can
result in a broken state. This change standardizes the application
update process by handling lifecycle semantic check failures the same
way as other update errors, such as apply failures.
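A minimal sketch of the decision described above, assuming the app metadata is available as a dictionary. The helper name is hypothetical, and accepting both boolean and string forms of the flag is an assumption.

```python
def should_recover(app_metadata):
    """Return True if a failed update should trigger app recovery.

    Hypothetical helper; only the 'update_failure_no_rollback' key
    comes from the commit message.
    """
    flag = app_metadata.get("update_failure_no_rollback", False)
    # Accept both True and 'true' (assumption about how the flag is stored).
    no_rollback = str(flag).lower() == "true"
    return not no_rollback
```

With this shape, a lifecycle semantic check failure and an apply failure can share the same branch when deciding whether to recover.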
Test Plan:
PASS: build-pkgs && build-image
PASS: AIO-SX fresh install
PASS: Create a modified version of cert-manager setting the
'update_failure_no_rollback' option to 'true'.
Update cert-manager to the modified version.
Confirm that the update succeeded.
PASS: Create a modified version of cert-manager setting the
'update_failure_no_rollback' option to 'true'.
Force an exception when running lifecycle semantic checks.
Update cert-manager to the modified version.
Confirm that the update failed with a descriptive error message
informing that the skip recovery feature is enabled.
Fix the code and reapply the app.
Confirm that the app was successfully applied.
Closes-Bug: 2064737
Change-Id: Ie90c5c3c3a79d8502eb9cc1aa11222963ba13621
Signed-off-by: Igor Soares <Igor.PiresSoares@windriver.com>
This commit changes the IPsec client to use the hostname instead of the IP
address for the swanctl configuration parameter local addr on worker nodes.
Test Plan:
PASS: In a DX system with IPsec enabled and security associations
established on both controllers, add a worker node and observe
that IPsec is enabled and security associations are
established on the three nodes without manual intervention.
Story: 2010940
Task: 50039
Change-Id: Idba336e3870f33db840846578441984e11b0d574
Signed-off-by: Leonardo Mendes <Leonardo.MendesSantana@windriver.com>
When a user executes the "system health-query" commands, the full
certificate snapshot is logged to sysinv.log. This happens
because CertAlarmAudit is imported into the health.py module
to check for any expiry/expired alarms before upgrade activity.
This fix addresses the issue by changing the "info" log to
"debug".
Test Cases:
PASS: Run "system health-query", "system health-query-kube-upgrade"
and "system health-query-upgrade" and verify that the
cert snapshot is logged only in debug mode.
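The effect of moving a message from "info" to "debug" can be illustrated with the standard Python logging module. This is a standalone sketch; the logger name and the snapshot content are placeholders, not the sysinv code.

```python
import io
import logging


def emit_snapshot(logger, snapshot):
    # Logged at debug level, so it is dropped unless debug is enabled.
    logger.debug("certificate snapshot: %s", snapshot)


stream = io.StringIO()
logger = logging.getLogger("cert_alarm_audit_sketch")  # placeholder name
logger.addHandler(logging.StreamHandler(stream))
logger.propagate = False

logger.setLevel(logging.INFO)
emit_snapshot(logger, {"cert": "placeholder"})
info_output = stream.getvalue()   # nothing is written at INFO level

logger.setLevel(logging.DEBUG)
emit_snapshot(logger, {"cert": "placeholder"})
debug_output = stream.getvalue()  # the snapshot appears only in debug mode
```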
Closes-Bug: 2064925
Change-Id: Ia0482a557931afdef89a6fa88017ea488a6dca59
Signed-off-by: amantri <ayyappa.mantri@windriver.com>
Modify USM software to use the FQDN hostname for the management network.
Test Plan:
PASS: Install DC subcloud, ensure it is in managed state,
and execute software commands (Eg. software list)
Closes-Bug: 2063460
Change-Id: I1782d02d58dfe3c8a08048f6d807e3e62532b292
Signed-off-by: Joseph Vazhappilly <joseph.vazhappillypaily@windriver.com>
This commit adds and installs the ipsec-config script, which is executed
as part of the sm-service. The goal of the ipsec-config service is to
create a symbolic link from swanctl.conf to one of two .conf
files, swanctl_active.conf or swanctl_standby.conf, depending on which
personality the controller node is assuming.
This script implements 5 actions: start, stop, status, meta-data and
monitor.
1) The start action creates a symbolic link between swanctl.conf and
swanctl_active.conf file, as the active controller has ipsec-config
service on enabled-active status.
2) The stop action creates a symbolic link between swanctl.conf and
swanctl_standby.conf file, as the stand-by controller has
ipsec-config service on disabled status.
3) The status action reports the current service status based on the
symbolic link associated with swanctl.conf file.
4) The meta-data action reports ipsec-config's meta-data info.
5) The monitor action indicates ipsec-config service is working as
expected. This action is performed on a specific interval to check
in-service status.
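The start, stop and status actions above can be sketched as follows. This is an illustration under stated assumptions, not the ipsec-config script itself: the function names are hypothetical, the configuration directory is passed in as a parameter, and replacing the link through a temporary name is a design choice of this sketch.

```python
import os
import tempfile


def point_symlink(conf_dir, target_name):
    """Repoint swanctl.conf at the given config file."""
    link = os.path.join(conf_dir, "swanctl.conf")
    target = os.path.join(conf_dir, target_name)
    tmp = link + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(target, tmp)
    os.replace(tmp, link)  # rename the new link over the old one


def start(conf_dir):
    point_symlink(conf_dir, "swanctl_active.conf")


def stop(conf_dir):
    point_symlink(conf_dir, "swanctl_standby.conf")


def status(conf_dir):
    """Report the role implied by the current symlink target."""
    link = os.path.join(conf_dir, "swanctl.conf")
    if not os.path.islink(link):
        return "unknown"
    target = os.path.basename(os.readlink(link))
    return {"swanctl_active.conf": "active",
            "swanctl_standby.conf": "standby"}.get(target, "unknown")
```

Because status only reads the link target, it reports a role even when the target file does not exist yet.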
Test Plan:
PASS: Build a Debian ISO containing the changes.
PASS: Bootstrap, install and unlock a DX system w/ IPsec enabled. Wait
until system reboots and verify unlocked/enabled/available status.
On controller-0, manually execute ipsec-config's start action and
observe that a symbolic link is created between swanctl.conf and
swanctl_active.conf.
/etc/swanctl/swanctl.conf -> /etc/swanctl/swanctl_active.conf
PASS: Bootstrap, install and unlock a DX system w/ IPsec enabled. Wait
until system reboots and verify unlocked/enabled/available status.
On controller-1, manually execute ipsec-config's stop action and
observe that a symbolic link is created between swanctl.conf and
swanctl_standby.conf.
/etc/swanctl/swanctl.conf -> /etc/swanctl/swanctl_standby.conf
PASS: Manually execute ipsec-config's status action and observe status
report output. Observe that the output matches with the symbolic
link associated with /etc/swanctl/swanctl.conf.
PASS: Manually execute ipsec-config's monitor action. Observe that the
output matches with the symbolic link associated with
/etc/swanctl/swanctl.conf. The controller's floating IP is
expected in the system-local-nodes configuration on
an active controller. Conversely, the controller's floating IP is not
expected in the swanctl configuration on a stand-by controller.
Story: 2010940
Task: 49990
Change-Id: I45f06ad41f3240d4149a688cef130cd7c9ae7019
Signed-off-by: Manoel Benedito Neto <Manoel.BeneditoNeto@windriver.com>
This commit updates ipsec-client to generate two copies of the swanctl
configuration file for controller nodes, one for when the node is the active
controller (swanctl_active.conf), and one for when the node is the standby
controller (swanctl_standby.conf). A symlink (swanctl.conf) is created
pointing to one of the two config files based on the role of the node.
When the controllers swact, the symlink is updated by an SM service.
Test Plan (IPv4 and IPv6 DX system):
PASS: controller-0 bootstrap, verify swanctl configuration files and
symlink are created in /etc/swanctl directory:
/etc/swanctl/swanctl_standby.conf
/etc/swanctl/swanctl_active.conf
/etc/swanctl/swanctl.conf -> /etc/swanctl/swanctl_active.conf
PASS: controller-1 installation, after installed, verify swanctl
configuration files and symlink are created in /etc/swanctl
directory:
/etc/swanctl/swanctl_standby.conf
/etc/swanctl/swanctl_active.conf
/etc/swanctl/swanctl.conf -> /etc/swanctl/swanctl_standby.conf
PASS: controller-1 unlock, after controller-1 is unlocked, verify that
during drbd synchronization there is no uncontrolled swact, and
controller-1 comes up in "enabled" and "available" state after
drbd is fully synced.
Story: 2010940
Task: 49927
Change-Id: Ic4b3d8a8368e87b2c9f875d5f9cdf555be25a682
Signed-off-by: Andy Ning <andy.ning@windriver.com>
Locking a controller takes a finite amount of time, resulting in a
brief window between issuing a lock command toward the inactive
controller and the controller actually entering the locked state.
Typically, this window lasts only a few seconds. However, during
periods of high system activity or when VMs or other migrations are
occurring, it can extend to a minute or longer before the controller
enters the locked state.
In some cases, initiating a 'system host-swact' command while the
inactive controller is in this 'Locking but not yet Locked' state has
led to a switch of activity to a locked controller.
The current pre-swact semantic check is inadequate in preventing
this race condition, which could result in a locked active controller.
This update adds a precheck of a list of in-progress actions, any of
which will now reject a swact request.
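A rough sketch of such a precheck is shown below. Only "Locking" is taken from this commit message; the other entries in the list, the function name and the reject text are placeholders, since the actual list lives in the sysinv change.

```python
# Hypothetical list: only "Locking" comes from the commit message.
IN_PROGRESS_ACTIONS = ("Locking", "Placeholder Action A", "Placeholder Action B")


def swact_precheck(peer_task):
    """Return (allowed, reason) for a swact request.

    peer_task is the task string currently reported for the inactive
    controller; an empty or missing task means nothing is in progress.
    """
    task = (peer_task or "").strip()
    for action in IN_PROGRESS_ACTIONS:
        if task.startswith(action):
            reason = "swact rejected: '%s' in progress on peer controller" % action
            return False, reason
    return True, ""
```

A task that is non-empty but not in the list still allows the swact, which lines up with the regression cases in the test plan.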
Test Plan:
PASS: Verify sysinv package build.
PASS: Verify swact is rejected for any of the in-progress actions
listed in the precheck.
PASS: Verify swact reject handling and output text.
PASS: Verify pep8 of changed lines.
Regression:
PASS: Verify swact handling when task is empty
PASS: Verify swact handling when task is not empty and not Locking
PASS: Verify Swact soak (10x)
Closes-Bug: 2064347
Change-Id: I78238fa649c330d7b908dbcf50f654c004205ee6
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
Include code in the upgrade health check to prevent the use of
ICAs in the 'ca.crt' field of the 'system-local-ca' secret.
Also include code to search through the trusted bundle for the RCA
if 'ca.crt' is not filled in the secret or if it is incorrect.
Test plan:
PASS: Perform 'system health-query-upgrade', verify that an error is
shown if the 'ca.crt' data of system-local-ca secret is filled
with a copy of the ICA.
PASS: Updated platform certificates in stx 9. Verified that
system-local-ca secret does not possess the field 'ca.crt'.
Perform upgrade from stx 9.0 (AIO-SX).
Verified that the 'ca.crt' field was filled with the correct RCA
cert.
Story: 2009811
Task: 50017
Change-Id: Ia3603b25d6d730a3465026b6b5291761d068613a
Signed-off-by: Marcelo Loebens <Marcelo.DeCastroLoebens@windriver.com>
This commit removes flag "mgmt_ipsec" from sysinv DB during host
reinstall action to allow a node to be reinstalled properly.
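As an illustration, dropping the flag from a host's capabilities could look like the sketch below. It assumes the capabilities are held as a dictionary; the helper name and the example keys other than "mgmt_ipsec" are hypothetical.

```python
def clear_mgmt_ipsec(capabilities):
    """Return a copy of the host capabilities without the mgmt_ipsec flag."""
    updated = dict(capabilities)
    # pop() with a default keeps this safe when the flag was never set.
    updated.pop("mgmt_ipsec", None)
    return updated
```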
Test Plan:
PASS: In a DX system with IPsec enabled and security associations
established on both controllers, run
echo "Li69nux*" | sudo -S -u postgres psql -d sysinv -c "select hostname,capabilities from i_host;"
to see that the "mgmt_ipsec" flag is set to "enabled" on both
nodes. Then run "system host-lock controller-1" to lock
controller-1, followed by "system host-reinstall controller-1".
Run the same psql query again and observe that the
"mgmt_ipsec" flag was removed from the controller-1 tuple. Wait until
controller-1 is reinstalled, then run "system host-unlock
controller-1" to unlock the node and see IPsec enabled and
security associations on both controllers again.
Story: 2010940
Task: 50011
Change-Id: I0a74759b45cbb7bdb585b672fe8ffe8d6e2a7407
Signed-off-by: Leonardo Mendes <Leonardo.MendesSantana@windriver.com>
A long term solution to avoid apps not updating in time
would require significant changes to the upgrade
system, so a simple change is being made in the short
term to see if some additional time will help. This
script only fails intermittently, so testing is simple
in nature.
Test Plan:
PASS: Upgrade from stx8 to stx9 without errors.
Closes-Bug: 2064315
Change-Id: If2571837b005e604a1001412401e35b8ce711867
Signed-off-by: Reed, Joshua <Joshua.Reed@windriver.com>
Add a CLI tool which will allow a developer to learn which
apps are compatible between the current K8s on the platform
and the target version specified. There are also safeguards
to prevent the user from supplying an invalid target K8s
version.
Add an --include-path option to provide the full file path
instead of simply the app name in the output
Usage:
sysinv-app query <target-k8s-version> [Optional: --include-path]
Output:
app_name_1
app_name_2
Output with --include-path:
/some/path/app_name_1.ver.tgz
/another/path/app_name_2.ver.tgz
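The compatibility query can be pictured with the sketch below. It is not the sysinv-app implementation: the dictionary keys, the function names and the assumption that versions are always full three-part strings are illustrative.

```python
def parse_version(version):
    """Turn 'v1.29.2' or '1.29.2' into a comparable tuple of ints."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def compatible_apps(bundles, target_version):
    """List app names whose K8s version range includes the target version."""
    target = parse_version(target_version)
    names = []
    for bundle in bundles:
        minimum = parse_version(bundle["k8s_minimum_version"])
        maximum = bundle.get("k8s_maximum_version")
        if target < minimum:
            continue
        # A missing maximum is treated as "no upper bound" (assumption).
        if maximum is not None and target > parse_version(maximum):
            continue
        names.append(bundle["name"])
    return names
```

An invalid target string makes parse_version raise ValueError, which a caller could turn into a user-facing error.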
Depends-On: https://review.opendev.org/c/starlingx/config/+/909172
Test Plan:
PASS: Vary the current K8S installed on the platform
and manually modify the minimum/maximum K8s version
for an app in the KubeAppBundle table. Verify that
the correct app list prints to terminal.
Story: 2010929
Task: 49875
Change-Id: Ie8dfb8cff9a587f9b52f8d94f7dd089c46dd5d63
Signed-off-by: Joshua Reed <joshua.reed@windriver.com>
This change updates ipsec-server and ipsec-client to generate IPsec
configuration suitable for IPv6 (and IPv4) systems. A connection to
bypass local traffic (e.g., traffic from unit IP to floating IP in active
controller) is added for both IPv4 and IPv6 systems. A connection to
bypass the ICMPv6 protocol is added for IPv6 systems only.
Reference to why ICMPv6 protocol is bypassed:
https://wiki.strongswan.org/projects/strongswan/wiki/IPv6NDP/1
Test Plan (IPv4 and IPv6 DX system):
PASS: controller-0 bootstrap, verify bootstrap is successful, and
swanctl.conf is generated properly.
PASS: controller-1 installation, verify it is installed successfully,
and swanctl.conf is generated properly.
PASS: After controller-1 is installed, verify IPsec SAs are established
between controllers, and controller-1 is online.
PASS: controller-1 unlock, verify controller-1 is unlocked successfully,
and comes up in "enabled" "available" state.
PASS: Verify system commands (such as "system host-list") are working
properly.
PASS: Lock and unlock controller-1, verify they are successful, IPsec
SAs re-established after unlock, and controller-1 comes back in
"enabled" and "available" state.
Story: 2010940
Task: 49926
Depends-On: https://review.opendev.org/c/starlingx/stx-puppet/+/916839
Change-Id: I964d4d8fe10bbe8942f6effd8ca275218b8a4e92
Signed-off-by: Andy Ning <andy.ning@windriver.com>
After the "deploy host" command is executed to deploy a
software release, an alarm is created by [1] to indicate
success/failure and follow-up actions needed.
This commit clears the success alarm when the host
reboots running with the new software release.
[1] https://review.opendev.org/c/starlingx/update/+/916688
Test Case
PASS: run "deploy host", verify the alarm is created, unlock
the host and verify the alarm is cleared after host
reports inventory
PASS: lock/unlock host
PASS: install/bootstrap/unlock AIO-DX
Story: 2010676
Task: 49933
Depends-On: https://review.opendev.org/c/starlingx/update/+/916688
Signed-off-by: Heitor Matsui <heitorvieira.matsui@windriver.com>
Change-Id: Ia4494142b9f5487a0ddf472833d6c80719382f8a
StarlingX stopped supporting CentOS builds after release 7.0.
This update strips CentOS from our code base. It also removes
references to the failed OpenSUSE feature.
Story: 2011110
Task: 49944
Change-Id: I8cd4e23ab83f2fe064fa1f88553eb32a69a67265
Signed-off-by: Scott Little <scott.little@windriver.com>
This commit creates the sw_version field on the i_host table, changes the
pxe file creation to use this field and also creates a migration script
that will be executed in the 22.12 to 24.09 scenario.
There are other places where the version is read from or written to the
loads table; these should be addressed by another commit.
Test Plan:
The tests below for deploy host were done from a 24.03 to a 24.09 ISO.
PASS: AIO-DX install/bootstrap/unlock
PASS: AIO-SX install/bootstrap/unlock
PASS: System host-show <hostname> returning respective value at cli
PASS: Deploy host of controller-1
Story: 2010676
Task: 49865
Change-Id: I7be0c4e48a10d4296d1cda50c49d6d1992e89139
Signed-off-by: Luis Eduardo Bonatti <LuizEduardo.Bonatti@windriver.com>
Add the following enhancements to application updates during
Kubernetes upgrades:
1) Move the pre application update logic from kube-upgrade-start step
to a specific separated step called via a new command line option
named kube-pre-application-update, which can be triggered after the
download images step and before upgrading networking.
2) Move the post application update logic from kube-upgrade-complete
step to a specific separated step called via a new command line
option named kube-post-application-update, which can be triggered
after the kube-upgrade-complete stage and before the upgrade is
deleted.
3) Introduce validation logic to kube-upgrade-start step to check if
all applied apps have available versions compatible with intermediate
and target Kubernetes versions. Upgrades are blocked if apps marked
to be pre updated are incompatible with current and target Kubernetes
versions. Upgrades are also blocked if apps marked to be post updated
are incompatible with the target Kubernetes version.
4) Delete uploaded applications incompatible with the target Kubernetes
version and upload one that is compatible if available.
5) Restore kube-upgrade-start and kube-upgrade-complete to their
original logic, as it was before application updates during Kubernetes
upgrades were implemented in task 49416. The kube-upgrade-start step is
synchronous, as it used to be before that change.
6) Update sysinv and cgts-client unit tests to account for the new
Kubernetes upgrade steps.
7) Create a helper function called "patch_kube_upgrade" to improve code
reuse when creating patch requests for new shell commands related to
Kubernetes upgrades.
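The validation in item 3 can be sketched as below. This is one reading of the rules in this message, not the sysinv implementation: the data layout ("timing", "bundles", "min", "max") and all function names are hypothetical, and intermediate versions of a multi-version upgrade are left out for brevity.

```python
def _ver(version):
    return tuple(int(part) for part in version.lstrip("v").split("."))


def _supports(bundle, k8s_version):
    """True if the bundle's K8s version range includes k8s_version."""
    high = bundle.get("max")
    wanted = _ver(k8s_version)
    return _ver(bundle["min"]) <= wanted and (high is None or wanted <= _ver(high))


def blocking_apps(apps, current, target):
    """Return the names of applied apps that would block the upgrade."""
    blocked = []
    for app in apps:
        if app["timing"] == "pre":
            # Pre-updated apps need a version that works before and after.
            ok = any(_supports(b, current) and _supports(b, target)
                     for b in app["bundles"])
        else:
            # Post-updated apps only need a version for the target.
            ok = any(_supports(b, target) for b in app["bundles"])
        if not ok:
            blocked.append(app["name"])
    return blocked
```

An empty result means kube-upgrade-start may proceed under these assumptions.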
Test Plan:
AIO-SX Test Cases:
PASS: Fresh install.
PASS: Successful Kubernetes single version upgrade with no apps that
need to be updated.
PASS: Successful Kubernetes multi-version upgrade with no apps that need
to be updated.
PASS: Successful Kubernetes upgrade with apps that need to be updated
before and after the new version is deployed.
PASS: Check if the upgrade is blocked if an app is incompatible with a
Kubernetes intermediate version during a multi-version
upgrade.
PASS: Check if the upgrade is blocked if an app marked to be pre updated
is incompatible with the Kubernetes target version.
PASS: Check if the upgrade is blocked if an app marked to be post
updated is incompatible with the Kubernetes target version.
PASS: Check if uploaded apps have been replaced by compatible versions.
PASS: Check if uploaded apps that do not have compatible versions were
removed.
PASS: Failure to run kube-pre-application-update and successful
retry.
PASS: Failure to run kube-post-application-update and successful
retry.
PASS: Abort during kube-pre-application-update and start over.
PASS: Reject aborting Kubernetes upgrade after post-updated-apps state.
AIO-DX Test Cases:
PASS: Fresh install.
PASS: Successful Kubernetes upgrade with no apps that need to be
updated.
PASS: Successful Kubernetes upgrade with apps that need to be updated
before and after the new version is deployed.
PASS: Check if the upgrade is blocked if an app marked to be pre updated
is incompatible with the Kubernetes target version.
PASS: Check if the upgrade is blocked if an app marked to be post
updated is incompatible with the Kubernetes target version.
Story: 2010929
Task: 49595
Change-Id: I9b48567c39c9a12b7563d56ab90fbfe9dd7082aa
Signed-off-by: Igor Soares <Igor.PiresSoares@windriver.com>
In both exception cases, the function returns False and
does not throw an error. Therefore the previous logging
as an exception gave a misleading error. Instead,
log as a warning.
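The pattern can be illustrated with a generic example. The function below is hypothetical and unrelated to the code touched by this change; it only shows a handled failure that returns False being logged as a warning rather than with a traceback.

```python
import logging

LOG = logging.getLogger("warning_sketch")  # placeholder logger name


def is_positive_number(value):
    """Return True if value parses as a positive integer, else False."""
    try:
        return int(value) > 0
    except ValueError as err:
        # Handled case: the caller only sees False, so a warning is enough.
        LOG.warning("could not parse %r: %s", value, err)
        return False
    except TypeError as err:
        LOG.warning("unsupported type for %r: %s", value, err)
        return False
```

LOG.exception would attach a traceback at error level, which reads as a failure even though the function recovers.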
Story: 2010929
Task: 49968
Change-Id: Idfde4a18375ed1746ef139e2d4ae0f4c0342a1bd
Signed-off-by: Joshua Reed <joshua.reed@windriver.com>
When there is only one monitor available on a storage system,
for example when the standby controller and storage-0 are locked,
the controller must be unlocked first, and only then can
storage-0 be unlocked.
To fix this, the storage unlock check has been modified so
that if it is storage-0 and already provisioned, only one
monitor is required to perform the unlock.
This allows quorum to be reestablished as quickly as possible.
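A minimal sketch of the relaxed check is given below. The function names, and the default of two monitors for every other case, are assumptions made for illustration; the real semantic check in sysinv has more conditions.

```python
def required_monitors(hostname, provisioned, default_required=2):
    """Number of available monitors needed before a storage host unlock.

    default_required=2 is an assumption made for this sketch.
    """
    if hostname == "storage-0" and provisioned:
        return 1
    return default_required


def can_unlock_storage(hostname, provisioned, available_monitors):
    return available_monitors >= required_monitors(hostname, provisioned)
```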
Test Plan:
PASS: Lock controller (not active) and storage-0
PASS: Unlock storage-0 and controller
PASS: Lock storage-0 and controller (not active)
PASS: Unlock storage-0 and controller
PASS: Lock controller (not active) and storage-0
PASS: Unlock controller and storage-0
PASS: Lock storage-0 and controller (not active)
PASS: Unlock controller and storage-0
PASS: Lock controller (not active), storage-1
and reboot storage-0
PASS: Unlock storage-1
PASS: Lock controller (not active)
and reinstall storage-0
PASS: In fresh install, shutdown controller
(not active) before unlocking storage-0
Closes-Bug: 2062569
Change-Id: I335be06c9dd17d5e099a7914955d4f7bf5f3b32e
Signed-off-by: Erickson Silva de Oliveira <Erickson.SilvadeOliveira@windriver.com>
Check whether /etc/swanctl/x509/system-ipsec-certificate-<hostname>.crt
exists and, if so, show it in the output of "system certificate-list".
Also show the certificate details with "system certificate-show IPsec".
Test Cases:
PASS: Enable IPsec on controller-0, verify that the IPsec certificate
is listed in the output of "system certificate-list" and
"system certificate-show IPsec" shows details of the IPsec
certificate
PASS: Enable IPsec on controller-1, verify that the IPsec certificate
is listed in the output of "system certificate-list" and
"system certificate-show IPsec" shows details of the IPsec
certificate
PASS: Verify that the IPsec certificate is not shown in the output of
"system certificate-list" if
/etc/swanctl/x509/system-ipsec-certificate-<hostname>.crt doesn't exist
Story: 2010940
Task: 49891
Change-Id: I95be304d99feff83e69750b90de289c1dde18b0c
Signed-off-by: amantri <ayyappa.mantri@windriver.com>
mtcClient needs the pxeboot interface config filename to contain
the label "[network-id]" to find the pxeboot address by reading the
file in /etc/network/interfaces.d/.
During dual-stack development the format [net-id]-[addr-id] was
adopted to differentiate files for each protocol (IPv4 and IPv6) in
the same network.
Since this broke mtcClient operation, we are keeping the previous format
for the pxeboot network, since it does not support dual-stack.
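The label selection can be sketched as follows. Only the two label formats come from this message; the function name and parameters are hypothetical, and the rest of the filename is deliberately left out because its exact layout is not stated here.

```python
def interface_label(network_type, network_id, address_id=None):
    """Build the label part of an interface config filename.

    Hypothetical helper: pxeboot keeps the plain [network-id] label,
    other networks use [net-id]-[addr-id].
    """
    if network_type == "pxeboot" or address_id is None:
        return str(network_id)
    return "%s-%s" % (network_id, address_id)
```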
Test Plan
[PASS] Install AIO-DX with pxeboot, management and cluster-host on
the same ethernet port and verify:
- the system is up with no alarms.
- the pxeboot config file does not contain "-N" suffix
[PASS] Install AIO-DX with pxeboot, management and cluster-host on
the same bonding port and verify:
- the system is up with no alarms.
- the pxeboot config file does not contain "-N" suffix
[PASS] Install AIO-DX with pxeboot on the ethernet port, and
management and cluster-host on a vlan port on top of it and
verify:
- the system is up with no alarms.
- the pxeboot config file does not contain "-N" suffix
Story: 2011027
Task: 49919
Change-Id: I42ab55d15c0df7d6a14377278b2f7624e83cb836
Signed-off-by: Andre Kantek <andrefernandozanella.kantek@windriver.com>
A parent change has modified the default subject for the platform
certificates in 'starlingx/ansible-playbooks'. This change extends
the upgrade script '81-create-required-platform-certs.py' to
apply the same changes during upgrades.
The following fields of the leaf certificates will be modified if not
customized by the user:
- 'commonName' - default now is <cert_short_name>
- 'localities' - default now is <region>
- 'organization' - default now is 'starlingx'
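The "only if not customized" rule can be sketched as below. This is an assumption-laden illustration, not the upgrade script: the function name is hypothetical, and a field counts as not customized when it is missing or still equal to the previous default.

```python
def apply_new_subject_defaults(subject, old_defaults, new_defaults):
    """Return the subject with non-customized fields moved to new defaults."""
    updated = dict(subject)
    for field, new_value in new_defaults.items():
        current = subject.get(field)
        # Missing, or still at the previous default: safe to replace.
        if current is None or current == old_defaults.get(field):
            updated[field] = new_value
    return updated
```

Values the user changed from the old default are left untouched, which mirrors the second case in the test plan.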
Test plan:
PASS: Manually execute the upgrade script and check the subject
fields:
- With old default values included in commonName and
localities.
(should be replaced w/ the new default)
- With commonName or localities different from previous
defaults.
(should be kept the same)
Story: 2009811
Task: 49832
Change-Id: If32172419836a02625144a87934fe75802311712
Signed-off-by: Marcelo Loebens <Marcelo.DeCastroLoebens@windriver.com>
Create unit tests to cover the new auto_update logic.
The new logic includes a new method called
"_get_app_bundle_for_update" which takes into account different
aspects of application metadata to figure out whether an app
should be auto-updated.
In addition to checking application version numbers, the unit
tests cover different scenarios and code paths such as different
minimum and maximum Kubernetes versions, whether auto_update is
enabled and the update timing during k8s upgrades.
Unit tests were also created for some more functions involved
in the logic.
Test plan:
PASS: Run tox py39, pylint and verify that they are
all passing.
Story: 2010929
Task: 49892
Signed-off-by: David Bastos <david.barbosabastos@windriver.com>
Change-Id: I44d798fe1d0e9883103745c32763894a35e445a2