Commit Graph

5387 Commits

Author SHA1 Message Date
Zuul 9fd2da6266 Merge "Fix host add by DHCP" 2024-05-15 02:30:50 +00:00
Luis Eduardo Bonatti 982b1d89f1 Fix host add by DHCP
This commit fixes the issue when adding a new host by adding a
try/except to avoid the error.

Test Plan:
PASS: Add a new host with "system host-update 2 personality=controller"
 cmd.

Closes-Bug: 2065636

Change-Id: Ie658058ab6cdbab042d8428f77a81499eb8fbc82
Signed-off-by: Luis Eduardo Bonatti <LuizEduardo.Bonatti@windriver.com>
2024-05-14 13:52:59 -03:00
Zuul 8b8b761be1 Merge "Always generate network ifcfg files with label" 2024-05-13 20:53:37 +00:00
Andre Kantek 378fee63d3 Always generate network ifcfg files with label
During dual-stack (IPv4 and IPv6) network testing, it was observed
that traffic on the primary address family (e.g., IPv4) was
interrupted when the secondary address pool (e.g., IPv6) was
configured in Linux.

This issue stemmed from how StarlingX manipulated ifcfg files. When an
interface configuration file contained only one address family, the
final file used lacked a label. However, for dual-stack
configurations, a separate labeled file was generated for the same
interface.

This behavior caused problems when the apply_network_config.sh script
executed. It compares the contents of /etc/network/interface.d/ with
the configuration provided by Puppet. Since the files differed due to
the missing label in the single-address case, the script triggered an
unnecessary ifdown operation on the entire interface, not just the
labels, leading to traffic interruption.

PXE boot interfaces are an exception to the labeling requirement as
MTCE relies on the filename during boot to extract information.
Therefore, when a PXE boot interface is the only network configured
on an interface, no label is generated. This is acceptable because
PXE boot typically uses IPv4 (single-stack) and doesn't encounter
the dual-stack labeling issue.

Test Plan
[PASS] Install AIO-SX in single-stack and then add dual-stack config
       for OAM network in runtime and observe that there is no traffic
       interruption as the secondary address is added
[PASS] Install AIO-DX in single-stack with the following variants:
       - ethernet port with {mgmt, cluster-host, pxeboot} networks
       - ethernet port with pxeboot and vlan with {mgmt, cluster-host}
          networks
       - bonding port with {mgmt, cluster-host, pxeboot} networks
       - bonding port with pxeboot and vlan with {mgmt, cluster-host}
          networks

Story: 2011027
Task: 50054

Change-Id: I8df423a428c7a853b65f7b448f4c0740f7e72321
Signed-off-by: Andre Kantek <andrefernandozanella.kantek@windriver.com>
2024-05-13 14:39:32 -03:00
Zuul 180ac1df30 Merge "Fix 'sysinv-helm create-fluxcd-app-overrides' command" 2024-05-10 15:44:02 +00:00
Joshua Reed 83f9b48a47 Eliminate file check in Armada required function.
Previously the function decided whether or not Armada is
required solely on whether the folder /opt/platform/armada
exists.  There might be a condition where an Armada
application was installed on a lower version of STX, then
upgraded to FluxCD, and the old Armada manifests were left
over.  This change makes the determination about Armada
being required solely upon the helm2 CLI showing a release
and a pod existing in the armada namespace with the
"application=armada" label.

Test Plan:
PASS: Upgrade activation step between stx9.0 and future
      stx10.0 and 76 script passes.
PASS: Force install an old armada application to check
      if the 76 script leaves armada in place. In this
      case the Armada pod is detected and the script
      exit and activation fails as it should.

Closes-Bug: 2065320

Change-Id: I50dabe843549f7f84522c2a61056560c5c084da5
Signed-off-by: Joshua Reed <joshua.reed@windriver.com>
2024-05-10 06:37:23 -06:00
David Bastos 81ace1c7b5 Fix 'sysinv-helm create-fluxcd-app-overrides' command
The 'sysinv-helm create-fluxcd-app-overrides <app_name>
<namespace>' is a command that allows a user to generate helm
override values for a helm chart independently of an application
upload/apply. This command is useful for testing.

It was broken due to a cache cleaning done in apps_metadata_dict
that mistakenly added a different key to the one being used.

The command correction is done by changing the dictionary's key
set to the correct key.

Test Plan:
PASS: Run "sysinv-helm create-fluxcd-app-overrides
      /home/sysadmin oidc-auth-apps kube-system"
      command and generate files with success.
PASS: Run "sysinv-helm create-fluxcd-app-overrides
      /home/sysadmin/ cert-manager cert-manager" command and
      generate files with success.

Closes-Bug: 2060864

Change-Id: If5aa2bc96577811182ed0fd326c55b229410c4ff
Signed-off-by: David Bastos <david.barbosabastos@windriver.com>
2024-05-10 11:48:14 +00:00
Zuul 69e075e250 Merge "Fix IPSec client to use hostname in workers nodes" 2024-05-07 15:41:54 +00:00
Zuul 9013af7035 Merge "Don't allow user to enable QAT chart" 2024-05-07 15:04:41 +00:00
Zuul 8a475df399 Merge "add secondary address variable for public HAproxy config" 2024-05-07 13:38:50 +00:00
Zuul dba272121f Merge "Prevent swacting to a 'Locking' controller" 2024-05-06 22:58:06 +00:00
Zuul dc2446fc1a Merge "Skip app recovery if lifecycle fails during update" 2024-05-06 21:17:18 +00:00
Md Irshad Sheikh 31c5637584 Don't allow user to enable QAT chart
This commit adds the code to raise the error for the following command,
if the hardware does not have 4940 or 4942 QAT devices.
Command: "helm-chart-attribute-modify  --enabled true
intel-device-plugins-operator intel-device-plugins-qat
intel-device-plugins-operator"
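
The check described above can be sketched as follows. This is a minimal illustration, not the sysinv implementation: only the device IDs 4940 and 4942 and the chart name come from the commit message, while the function name, the exception class, and the way host device IDs are passed in are assumptions.

```python
# Device IDs and chart name are taken from the commit message.
# Everything else (names, signature, exception) is hypothetical.
QAT_DEVICE_IDS = {"4940", "4942"}
QAT_CHART = "intel-device-plugins-qat"


class ChartEnableRejected(Exception):
    """Raised when the QAT chart is enabled on hardware without QAT devices."""


def validate_chart_enable(chart_name, enabled, host_device_ids):
    """Reject enabling the QAT chart when no supported QAT device is present."""
    has_qat = bool(QAT_DEVICE_IDS & set(host_device_ids))
    if chart_name == QAT_CHART and enabled and not has_qat:
        raise ChartEnableRejected(
            "Cannot enable %s: no 4940 or 4942 QAT device found" % chart_name
        )
    # Disabling the chart, or enabling it on QAT hardware, is always allowed.
    return True
```

Disabling the chart never raises in this sketch, which mirrors the test case stating that disabling succeeds on both QAT and non-QAT systems.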

TEST CASES:

PASSED: Build is successful.
PASSED: Bootstrap is successful.
PASSED: Upload the package using command system application-upload.
PASSED: Check chart enabled status using command
        "system helm-override-list intel-device-plugins-operator --long"
PASSED: Enable the QAT chart using command "system
        helm-chart-attribute-modify"; it raises an error on a non-QAT
        system.
PASSED: Enable the QAT chart using command "system
        helm-chart-attribute-modify"; it should succeed on a QAT system.
PASSED: Disable the QAT chart using command "system
        helm-chart-attribute-modify"; it should succeed on both QAT and
        non-QAT systems.
PASSED: Apply app intel-device-plugins-operator on both QAT and non QAT
        system is successful.

Story: 2010604
Task: 50027

Change-Id: Ied634cf35b53421bcaa2f8307e76a0fc87d3bb1f
Signed-off-by: Md Irshad Sheikh <mdirshad.sheikh@windriver.com>
2024-05-06 16:16:21 -04:00
Andre Kantek 1ddfa95fec add secondary address variable for public HAproxy config
This change adds the variable public_secondary_ip_address to
platform::haproxy::params filled with the secondary OAM address pool
floating address value, in a similar way that is done for the primary
address pool. This will be used in HAproxy to bind the necessary L4
public ports to the secondary address.

Test plan
[PASS] Install and add a secondary pool via CLI and, then, after
        lock/unlock, check that all public endpoints (openstack
        endpoint list) are available in the primary and secondary
        addresses, on the following setups:
        - AIO-SX (prim:IPv4, sec:IPv6)
        - AIO-SX (prim:IPv6, sec:IPv4)
        - AIO-DX (prim:IPv4, sec:IPv6) with system-controller role
        - AIO-DX (prim:IPv6, sec:IPv4) with system-controller role
[PASS] Access the public APIs on both protocols using curl.

Story: 2011027
Task: 49996

Change-Id: I1b79f4e462ab34ab2aa7187d92460202fa15ae7e
Signed-off-by: Andre Kantek <andrefernandozanella.kantek@windriver.com>
2024-05-06 15:16:46 -03:00
Zuul f12ae5d7c1 Merge "Change certificate snapshot to debug logging" 2024-05-06 16:53:41 +00:00
Zuul 2e75b8b336 Merge "Add fqdn for management network in usm" 2024-05-06 15:15:01 +00:00
Igor Soares 9cc099e2b7 Skip app recovery if lifecycle fails during update
Skip recovery for applications that have update_failure_no_rollback
set to 'true' and that eventually fail to pass lifecycle semantic
checks during updates.

Triggering the recovery of an app that does not support rollbacks can
result in a broken state. This aims to standardize the behavior of the
application update process by equalizing how we handle lifecycle
semantic checks failures to other update errors such as apply failures.
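
A minimal sketch of the decision described above, assuming the application metadata is available as a dictionary. Only the option name update_failure_no_rollback comes from the commit message; the function names and the returned messages are hypothetical.

```python
def skip_recovery(app_metadata):
    """Return True when the app opts out of rollback on update failure.

    Accepts either a boolean or the string 'true', since the commit
    message describes the option as being set to 'true'.
    """
    value = app_metadata.get("update_failure_no_rollback", False)
    return str(value).strip().lower() == "true"


def on_update_failure(app_metadata, failure):
    """Handle any update failure the same way, lifecycle checks included."""
    if skip_recovery(app_metadata):
        return "update failed ({}); recovery skipped".format(failure)
    return "update failed ({}); recovering previous version".format(failure)
```

The point of the sketch is that the same branch is taken whether the failure comes from a lifecycle semantic check or from an apply error.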

Test Plan:
PASS: build-pkgs && build-image
PASS: AIO-SX fresh install
PASS: Create a modified version of cert-manager setting the
      'update_failure_no_rollback' option to 'true'.
      Update cert-manager to the modified version.
      Confirm that the update succeeded.
PASS: Create a modified version of cert-manager setting the
      'update_failure_no_rollback' option to 'true'.
      Force an exception when running lifecycle semantic checks.
      Update cert-manager to the modified version.
      Confirm that the update failed with a descriptive error message
      informing that the skip recovery feature is enabled.
      Fix the code and reapply the app.
      Confirm that the app was successfully applied.

Closes-Bug: 2064737

Change-Id: Ie90c5c3c3a79d8502eb9cc1aa11222963ba13621
Signed-off-by: Igor Soares <Igor.PiresSoares@windriver.com>
2024-05-06 11:37:40 -03:00
Leonardo Mendes bcac3d13f7 Fix IPSec client to use hostname in workers nodes
This commit changes the IPSec client to use the hostname instead of the
IP address in the swanctl configuration parameter local addr on worker
nodes.

Test Plan:
PASS: In a DX system with IPsec enabled and security association
      established in both controllers, add a worker node and observe
      that IPSec will be enabled and security association will be
      established in the three nodes without manual intervention.

Story: 2010940
Task: 50039

Change-Id: Idba336e3870f33db840846578441984e11b0d574
Signed-off-by: Leonardo Mendes <Leonardo.MendesSantana@windriver.com>
2024-05-06 11:14:35 -03:00
Zuul 7678476fa4 Merge "Add and Configure IPsec Config Service" 2024-05-06 13:59:46 +00:00
amantri f49374ecea Change certificate snapshot to debug logging
When a user executes "system health-query" commands, the full
certificate snapshot is logged to sysinv.log. This happens
because CertAlarmAudit is imported into the health.py module
to check for any expiry/expired alarms before upgrade activity.
This fix addresses the issue by changing the "info" log to
"debug".

Test Cases:
PASS: Run "system health-query", "system health-query-kube-upgrade",
      "system health-query-upgrade" and verify that the
      cert snapshot is logged only in debug mode.
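
The effect of moving the snapshot message from "info" to "debug" can be illustrated with the standard Python logging module. This is a sketch only: the logger name, helper function, and snapshot contents are made up for the example and are not taken from sysinv.

```python
import io
import logging


def log_snapshot(logger, snapshot):
    # Hypothetical helper: the full snapshot is emitted at DEBUG level,
    # so it is dropped when the logger runs at INFO.
    logger.debug("Certificate snapshot: %s", snapshot)


stream = io.StringIO()
LOG = logging.getLogger("cert_snapshot_demo")
LOG.addHandler(logging.StreamHandler(stream))
LOG.propagate = False

example = {"example-cert": "expires 2030-01-01"}  # made-up data

LOG.setLevel(logging.INFO)
log_snapshot(LOG, example)
info_output = stream.getvalue()   # nothing was written at INFO

LOG.setLevel(logging.DEBUG)
log_snapshot(LOG, example)
debug_output = stream.getvalue()  # the snapshot appears at DEBUG
```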

Closes-bug: 2064925

Change-Id: Ia0482a557931afdef89a6fa88017ea488a6dca59
Signed-off-by: amantri <ayyappa.mantri@windriver.com>
2024-05-06 09:49:42 -04:00
Joseph Vazhappilly 233714aef0 Add fqdn for management network in usm
Modify USM software to use fqdn host name for management network

Test Plan:
PASS: Install DC subcloud, ensure it is in managed state,
      and execute software commands (Eg. software list)

Closes-Bug: 2063460

Change-Id: I1782d02d58dfe3c8a08048f6d807e3e62532b292
Signed-off-by: Joseph Vazhappilly <joseph.vazhappillypaily@windriver.com>
2024-05-06 07:21:22 -04:00
Zuul ab5b79c106 Merge "Update ipsec-client to generate two swanctl.conf" 2024-05-03 16:31:13 +00:00
Zuul 2aed6e9eed Merge "Update IPsec config generation for IPv6" 2024-05-03 15:29:27 +00:00
Manoel Benedito Neto 68b06da7b8 Add and Configure IPsec Config Service
This commit adds and installs the ipsec-config script, which runs
during the execution of the sm-service. The ipsec-config service
creates a symbolic link between the swanctl.conf file and one of two
.conf files, swanctl_active.conf or swanctl_standby.conf, depending on
which personality the controller node is assuming.

This script implements 5 actions: start, stop, status, meta-data and
monitor.
1) The start action creates a symbolic link between swanctl.conf and
   swanctl_active.conf file, as the active controller has ipsec-config
   service on enabled-active status.
2) The stop action creates a symbolic link between swanctl.conf and
   swanctl_standby.conf file, as the stand-by controller has ipsec-
   config service on disabled status.
3) The status action reports the current service status based on the
   symbolic link associated with swanctl.conf file.
4) The meta-data action reports ipsec-config's meta-data info.
5) The monitor action indicates ipsec-config service is working as
   expected. This action is performed on a specific interval to check
   in-service status.
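
The start, stop and status actions listed above can be sketched in Python as follows. This is an illustration under stated assumptions, not the actual ipsec-config script: the file names come from the commit message, while the function names, the status strings, and the use of a temporary directory in place of /etc/swanctl are hypothetical. It assumes a POSIX system where symbolic links can be created.

```python
import os

ACTIVE = "swanctl_active.conf"
STANDBY = "swanctl_standby.conf"
LINK = "swanctl.conf"


def point_link(conf_dir, target_name):
    """Atomically repoint swanctl.conf at the given config file."""
    link = os.path.join(conf_dir, LINK)
    tmp = link + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(os.path.join(conf_dir, target_name), tmp)
    os.replace(tmp, link)  # renames the link itself, not its target


def start(conf_dir):
    """Active controller: link swanctl.conf to swanctl_active.conf."""
    point_link(conf_dir, ACTIVE)


def stop(conf_dir):
    """Standby controller: link swanctl.conf to swanctl_standby.conf."""
    point_link(conf_dir, STANDBY)


def status(conf_dir):
    """Report the state implied by the current symlink target."""
    link = os.path.join(conf_dir, LINK)
    if not os.path.islink(link):
        return "unknown"
    target = os.path.basename(os.readlink(link))
    return {ACTIVE: "active", STANDBY: "standby"}.get(target, "unknown")
```

Creating the new link under a temporary name and renaming it over the old one is a design choice of this sketch; it avoids a window in which swanctl.conf does not exist.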

Test Plan:
PASS: Build a debian iso containing the changes.
PASS: Bootstrap, install and unlock a DX system w/ IPsec enabled. Wait
      until system reboots and verify unlocked enable available status.
      On controller-0, manually execute ipsec-config's start action and
      observe that a symbolic link is created between swanctl.conf and
      swanctl_active.conf.
      /etc/swanctl/swanctl.conf -> /etc/swanctl/swanctl_active.conf
PASS: Bootstrap, install and unlock a DX system w/ IPsec enabled. Wait
      until system reboots and verify unlocked enable available status.
      On controller-1, manually execute ipsec-config's stop action and
      observe that a symbolic link is created between swanctl.conf and
      swanctl_standby.conf.
      /etc/swanctl/swanctl.conf -> /etc/swanctl/swanctl_standby.conf
PASS: Manually execute ipsec-config's status action and observe status
      report output. Observe that the output matches with the symbolic
      link associated with /etc/swanctl/swanctl.conf.
PASS: Manually execute ipsec-config's monitor action. Observe that the
      output matches with the symbolic link associated with
      /etc/swanctl/swanctl.conf. It is expected that controller's
      floating IP is addressed on system-local-nodes configuration for
      an active controller. In return, controller's floating IP is not
      expected on swanctl configuration for a stand-by controller.

Story: 2010940
Task: 49990

Change-Id: I45f06ad41f3240d4149a688cef130cd7c9ae7019
Signed-off-by: Manoel Benedito Neto <Manoel.BeneditoNeto@windriver.com>
2024-05-02 21:18:22 +00:00
Andy Ning 8e1ec99d09 Update ipsec-client to generate two swanctl.conf
This commit updates ipsec-client to generate two copies of swanctl
configuration files for controller nodes, one for when the node is the
active controller (swanctl_active.conf), and one for when the node is
the standby controller (swanctl_standby.conf). A symlink (swanctl.conf)
is created pointing to one of the two config files based on the role of
the node. When the controllers swact, the symlink will be updated by an
SM service.

Test Plan (IPv4 and IPv6 DX system):
PASS: controller-0 bootstrap, verify swanctl configuration files and
      symlink are created in /etc/swanctl directory:
      /etc/swanctl/swanctl_standby.conf
      /etc/swanctl/swanctl_active.conf
      /etc/swanctl/swanctl.conf -> /etc/swanctl/swanctl_active.conf
PASS: controller-1 installation, after installed, verify swanctl
      configuration files and symlink are created in /etc/swanctl
      directory:
      /etc/swanctl/swanctl_standby.conf
      /etc/swanctl/swanctl_active.conf
      /etc/swanctl/swanctl.conf -> /etc/swanctl/swanctl_standby.conf
PASS: controller-1 unlock, after controller-1 is unlocked, verify that
      during drbd synchronization there is no uncontrolled swact, and
      controller-1 comes up in "enabled" and "available" state after
      drbd is fully synced.

Story: 2010940
Task: 49927

Change-Id: Ic4b3d8a8368e87b2c9f875d5f9cdf555be25a682
Signed-off-by: Andy Ning <andy.ning@windriver.com>
2024-05-02 14:27:20 -04:00
Zuul 2dbd5f0b84 Merge "Fix system-local-ca ca.crt for upgrades" 2024-05-02 18:23:30 +00:00
Zuul 159039de4c Merge "Provide helper CLI utility to check app K8s compatibility." 2024-05-01 22:32:15 +00:00
Zuul 345502d38f Merge "Add IPSec support node reinstallation" 2024-05-01 17:18:56 +00:00
Zuul a1df0259a3 Merge "Increase timeout of 95-watch-apps-upgrade.sh by 50%" 2024-05-01 16:08:27 +00:00
Zuul 960fcede41 Merge "Changes the number of monitors required to unlock storage" 2024-05-01 14:31:05 +00:00
Eric MacDonald f29cc84ba3 Prevent swacting to a 'Locking' controller
Locking a controller takes a finite amount of time, resulting in a
brief window between issuing a lock command toward the inactive
controller and the controller actually entering the locked state.

Typically, this window lasts only a few seconds. However, during
periods of high system activity or when VMs or other migrations are
occurring, it can extend to a minute or longer before the controller
enters the locked state.

In some cases, initiating a 'system host-swact' command while the
inactive controller is in this 'Locking but not yet Locked' state has
led to a switch of activity to a locked controller.

The current pre-swact semantic check is inadequate in preventing
this race condition, which could result in a locked active controller.

This update adds a precheck of a list of in-progress actions, any of
which will now reject a swact request.
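
A minimal sketch of such a precheck, assuming the target controller's current task is available as a string. The commit message does not enumerate the in-progress actions, so the list below is a hypothetical placeholder, as are the function and exception names.

```python
# Hypothetical list: the real set of in-progress actions is not
# given in the commit message.
IN_PROGRESS_ACTIONS = ("Locking", "Force Locking")


class SwactRejected(Exception):
    """Raised when a swact is requested while an action is in progress."""


def precheck_swact(target_task):
    """Reject the swact if the target controller has an in-progress action."""
    task = (target_task or "").strip()
    for action in IN_PROGRESS_ACTIONS:
        if task.startswith(action):
            raise SwactRejected(
                "Swact rejected: target controller task is '%s'" % task
            )
    # Empty task, or a task that is not in the list, passes the precheck.
    return True
```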

Test Plan:

PASS: Verify sysinv package build.
PASS: Verify swact is rejected for any of the in-progress actions
      listed in the precheck.
PASS: Verify swact reject handling and output text.
PASS: Verify pep8 of changed lines.

Regression:

PASS: Verify swact handling when task is empty
PASS: Verify swact handling when task is not empty and not Locking
PASS: Verify Swact soak (10x)

Closes-Bug: 2064347
Change-Id: I78238fa649c330d7b908dbcf50f654c004205ee6
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2024-05-01 13:23:54 +00:00
Marcelo Loebens 0db4edfa42 Fix system-local-ca ca.crt for upgrades
Included code in the upgrade health check to prevent the usage of
ICAs in the 'ca.crt' field of the 'system-local-ca' secret.

Also, included code to search through the trusted bundle for the RCA
if 'ca.crt' is not filled in the secret or if it is incorrect.

Test plan:
PASS: Perform 'system health-query-upgrade', verify that an error is
      shown if the 'ca.crt' data of system-local-ca secret is filled
      with a copy of the ICA.
PASS: Updated platform certificates in stx 9. Verified that
      system-local-ca secret does not possess the field 'ca.crt'.
      Perform upgrade from stx 9.0 (AIO-SX).
      Verified that the 'ca.crt' field was filled with the correct RCA
      cert.

Story: 2009811
Task: 50017

Change-Id: Ia3603b25d6d730a3465026b6b5291761d068613a
Signed-off-by: Marcelo Loebens <Marcelo.DeCastroLoebens@windriver.com>
2024-05-01 00:28:28 +00:00
Leonardo Mendes de7a3ce5ca Add IPSec support node reinstallation
This commit removes flag "mgmt_ipsec" from sysinv DB during host
reinstall action to allow a node to be reinstalled properly.

Test Plan:
PASS: In a DX system with IPsec enabled and security association
      established in both controllers, run "echo "Li69nux*" | sudo -S
      -u postgres psql -d sysinv -c "select hostname,capabilities from
      i_host;" to see "mgmt_ipsec" flag is set to "enabled" in both
      nodes. Then run "system host-lock controller-1" to lock
      controller-1 and then run "system host-reinstall controller-1".
      Then run "echo "Li69nux*" | sudo -S -u postgres psql -d sysinv -c
      "select hostname,capabilities from i_host;" and observe that the
      "mgmt_ipsec" flag was removed from the controller-1 tuple. Wait
      until controller-1 is reinstalled and run "system host-unlock
      controller-1" to unlock the node and see IPsec enabled and
      security association in both controllers again.

Story: 2010940
Task: 50011

Change-Id: I0a74759b45cbb7bdb585b672fe8ffe8d6e2a7407
Signed-off-by: Leonardo Mendes <Leonardo.MendesSantana@windriver.com>
2024-04-30 16:32:13 -03:00
Zuul d26c26f649 Merge "Replace load sysinv data" 2024-04-30 18:35:59 +00:00
Zuul 5e85cd3920 Merge "Clear deploy host alarm when host report inventory" 2024-04-30 18:12:35 +00:00
Reed, Joshua 43ebe161d0 Increase timeout of 95-watch-apps-upgrade.sh by 50%
A long term solution to avoid apps not updating in time
would require significant changes to the upgrade
system, so a simple change is being made in the short
term to see if some additional time will help.  This
script only fails intermittently, so testing is simple
in nature.

Test Plan:
PASS: Upgrade from stx8 to stx9 without errors.

Closes-Bug: 2064315

Change-Id: If2571837b005e604a1001412401e35b8ce711867
Signed-off-by: Reed, Joshua <Joshua.Reed@windriver.com>
2024-04-30 11:21:53 -06:00
Zuul 79c94ed7b2 Merge "Enhance app updates during Kubernetes upgrades" 2024-04-30 15:58:21 +00:00
Joshua Reed cecc1bb421 Provide helper CLI utility to check app K8s compatibility.
Add a CLI tool which will allow a developer to learn which
apps are compatible between the current K8s on the platform
and the target version specified.  There are also safeguards
to prevent the user from supplying an invalid target K8s
version.

Add an --include-path option to provide the full file path
instead of simply the app name in the output

Usage:

sysinv-app query <target-k8s-version> [Optional: --include-path]

Output:

app_name_1
app_name_2

Output with --include-path:
/some/path/app_name_1.ver.tgz
/another/path/app_name_2.ver.tgz
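
The query described above can be sketched as a version-range check. This is an illustration under assumptions, not the sysinv-app implementation: the dictionary keys, function names, and the rule that an app must cover both the current and the target version are hypothetical readings of the commit message, and versions are assumed to be full major.minor.patch strings.

```python
def parse_k8s_version(text):
    """Parse 'v1.27.5' or '1.27.5' into a comparable tuple of integers."""
    parts = text.strip().lstrip("v").split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError("invalid Kubernetes version: %r" % text)
    return tuple(int(p) for p in parts)


def compatible_apps(bundles, current, target, include_path=False):
    """List apps whose supported range covers both current and target."""
    versions = [parse_k8s_version(current), parse_k8s_version(target)]
    result = []
    for bundle in bundles:
        low = parse_k8s_version(bundle["k8s_minimum_version"])
        high = parse_k8s_version(bundle["k8s_maximum_version"])
        if all(low <= v <= high for v in versions):
            # Mirror the --include-path option: full path or app name.
            result.append(bundle["file_path"] if include_path
                          else bundle["name"])
    return result
```

Rejecting malformed input in parse_k8s_version stands in for the safeguard against an invalid target version mentioned above.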

Depends-On: https://review.opendev.org/c/starlingx/config/+/909172

Test Plan:
PASS: Vary the current K8S installed on the platform
      and manually modify the minimum/maximum K8s version
      for an app in the KubAppBundle table. Verify that
      the correct app list prints to terminal.

Story: 2010929
Task: 49875

Change-Id: Ie8dfb8cff9a587f9b52f8d94f7dd089c46dd5d63
Signed-off-by: Joshua Reed <joshua.reed@windriver.com>
2024-04-30 07:28:35 -06:00
Andy Ning a9d790302c Update IPsec config generation for IPv6
This change updated ipsec-server and ipsec-client to generate IPsec
configuration suitable for IPv6 (and IPv4) system. A connection to
bypass local traffic (eg, traffic from unit IP to floating IP in active
controller) is added for both IPv4 and IPv6 system. And a connection to
bypass ICMPv6 protocol is added for IPv6 system only.

Reference to why ICMPv6 protocol is bypassed:
https://wiki.strongswan.org/projects/strongswan/wiki/IPv6NDP/1

Test Plan (IPv4 and IPv6 DX system):
PASS: controller-0 bootstrap, verify bootstrap is successful, and
      swanctl.conf is generated properly.
PASS: controller-1 installation, verify it is installed successfully,
      and swanctl.conf is generated properly.
PASS: After controller-1 is installed, verify IPsec SAs are established
      between controllers, and controller-1 is online.
PASS: controller-1 unlock, verify controller-1 is unlocked successfully,
      and comes up in "enabled" "available" state.
PASS: Verify system commands (such as "system host-list") are working
      properly.
PASS: Lock and unlock controller-1, verify they are successful, IPsec
      SAs re-established after unlock, and controller-1 comes back in
      "enabled" and "available" state.

Story: 2010940
Task: 49926

Depends-On: https://review.opendev.org/c/starlingx/stx-puppet/+/916839
Change-Id: I964d4d8fe10bbe8942f6effd8ca275218b8a4e92
Signed-off-by: Andy Ning <andy.ning@windriver.com>
2024-04-29 13:07:17 -04:00
Heitor Matsui cac8635e7a Clear deploy host alarm when host report inventory
After the "deploy host" command is executed to deploy a
software release, an alarm is created by [1] to indicate
success/failure and follow-up actions needed.

This commit clears the success alarm when the host
reboots running with the new software release.

[1] https://review.opendev.org/c/starlingx/update/+/916688

Test Case
PASS: run "deploy host", verify the alarm is created, unlock
      the host and verify the alarm is cleared after host
      reports inventory
PASS: lock/unlock host
PASS: install/bootstrap/unlock AIO-DX

Story: 2010676
Task: 49933

Depends-on: https://review.opendev.org/c/starlingx/update/+/916688

Signed-off-by: Heitor Matsui <heitorvieira.matsui@windriver.com>
Change-Id: Ia4494142b9f5487a0ddf472833d6c80719382f8a
2024-04-29 11:11:45 -03:00
Zuul cd5f286b14 Merge "Remove CentOS/OpenSUSE build support" 2024-04-29 13:17:11 +00:00
Scott Little 7c4d01df61 Remove CentOS/OpenSUSE build support
StarlingX stopped supporting CentOS builds after release 7.0.
This update will strip CentOS from our code base.  It will also remove
references to the failed OpenSUSE feature as well.

Story: 2011110
Task: 49944

Change-Id: I8cd4e23ab83f2fe064fa1f88553eb32a69a67265
Signed-off-by: Scott Little <scott.little@windriver.com>
2024-04-26 13:45:07 -04:00
Luis Eduardo Bonatti 6c6e8c192a Replace load sysinv data
This commit creates the sw_version field on i_host table, changes the
pxe file creation to use this field and also creates a migrate script
that will be executed on 22.12 to 24.09 scenario.

There are other places where the version is read from or written to the
loads table; these should be addressed by another commit.

Test Plan:
The tests below for deploy host were done from 24.03 to 24.09 iso.

PASS: AIO-DX install/bootstrap/unlock
PASS: AIO-SX install/bootstrap/unlock
PASS: System host-show <hostname> returning respective value at cli
PASS: Deploy host of controller-1

Story: 2010676
Task: 49865

Change-Id: I7be0c4e48a10d4296d1cda50c49d6d1992e89139
Signed-off-by: Luis Eduardo Bonatti <LuizEduardo.Bonatti@windriver.com>
2024-04-26 16:30:33 +00:00
Zuul 64021ed7da Merge "Change is_chart_enabled log output to warnings." 2024-04-26 12:33:24 +00:00
Igor Soares 7bd361b69b Enhance app updates during Kubernetes upgrades
Add the following enhancements to application updates during
Kubernetes upgrades:

1) Move the pre application update logic from kube-upgrade-start step
   to a specific separated step called via a new command line option
   named kube-pre-application-update, which can be triggered after the
   download images step and before upgrading networking.
2) Move the post application update logic from kube-upgrade-complete
   step to a specific separated step called via a new command line
   option named kube-post-application-update, which can be triggered
   after the kube-upgrade-complete stage and before the upgrade is
   deleted.
3) Introduce validation logic to kube-upgrade-start step to check if
   all applied apps have available versions compatible with intermediate
   and target Kubernetes versions. Upgrades are blocked if apps marked
   to be pre updated are incompatible with current and target Kubernetes
   versions. Upgrades are also blocked if apps marked to be post updated
   are incompatible with the target Kubernetes version.
4) Delete uploaded applications incompatible with the target Kubernetes
   version and upload one that is compatible if available.
5) Restore kube-upgrade-start and kube-upgrade-complete to their
   original logic before application updates during Kubernetes upgrades
   was implemented on task 49416. The kube-upgrade-start step is
   synchronous as it used to be before that change.
6) Update sysinv and cgts-client unit tests to account for the new
   Kubernetes upgrade steps.
7) Create a helper function called "patch_kube_upgrade" to improve code
   reuse when creating patch requests for new shell commands related to
   Kubernetes upgrades.

Test Plan:
AIO-SX Test Cases:
PASS: Fresh install.
PASS: Successful Kubernetes single version upgrade with no apps that
      need to be updated.
PASS: Successful Kubernetes multi-version upgrade with no apps that need
      to be updated.
PASS: Successful Kubernetes upgrade with apps that need to be updated
      before and after the new version is deployed.
PASS: Check if the upgrade is blocked if an app is incompatible with a
      Kubernetes intermediate version during a multi-version
      upgrade.
PASS: Check if the upgrade is blocked if an app marked to be pre updated
      is incompatible with the Kubernetes target version.
PASS: Check if the upgrade is blocked if an app marked to be post
      updated is incompatible with the Kubernetes target version.
PASS: Check if uploaded apps have been replaced by compatible versions.
PASS: Check if uploaded apps that do not have compatible versions were
      removed.
PASS: Failure to run kube-pre-application-update and successful
      retry.
PASS: Failure to run kube-post-application-update and successful
      retry.
PASS: Abort during kube-pre-application-update and start over.
PASS: Reject aborting Kubernetes upgrade after post-updated-apps state.
AIO-DX Test Cases:
PASS: Fresh install.
PASS: Successful Kubernetes upgrade with no apps that need to be
      updated.
PASS: Successful Kubernetes upgrade with apps that need to be updated
      before and after the new version is deployed.
PASS: Check if the upgrade is blocked if an app marked to be pre updated
      is incompatible with the Kubernetes target version.
PASS: Check if the upgrade is blocked if an app marked to be post
      updated is incompatible with the Kubernetes target version.

Story: 2010929
Task: 49595

Change-Id: I9b48567c39c9a12b7563d56ab90fbfe9dd7082aa
Signed-off-by: Igor Soares <Igor.PiresSoares@windriver.com>
2024-04-25 17:59:54 -03:00
Zuul 02c7893348 Merge "Add IPsec certificate to "system certificate-list"" 2024-04-25 14:09:46 +00:00
Zuul 16117a606f Merge "Create unit tests for the auto update logic" 2024-04-25 14:09:34 +00:00
Joshua Reed f173b28fbe Change is_chart_enabled log output to warnings.
In both exception cases, the function returns False and
does not throw an error.  Therefore the previous logging
as an exception gives a misleading error.  Instead,
log as a warning.
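
The pattern described above, catching an exception, logging a warning, and returning False, can be sketched as follows. The commit message does not say which exceptions are caught or how the function looks up the chart state, so the exception types, the dictionary layout, and the function name are assumptions for illustration.

```python
import logging

LOG = logging.getLogger("chart_enabled_sketch")


def is_chart_enabled_sketch(overrides, chart_name):
    """Return the chart's enabled flag, or False if it cannot be read."""
    try:
        return bool(overrides[chart_name]["enabled"])
    except (KeyError, TypeError) as err:
        # The failure is handled by returning False, so it is logged
        # as a warning rather than with LOG.exception(), which would
        # add a traceback and suggest an unhandled error.
        LOG.warning("Could not determine whether chart %s is enabled: %r",
                    chart_name, err)
        return False
```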

Story: 2010929
Task: 49968

Change-Id: Idfde4a18375ed1746ef139e2d4ae0f4c0342a1bd
Signed-off-by: Joshua Reed <joshua.reed@windriver.com>
2024-04-24 14:08:52 -06:00
Zuul 8b82a82372 Merge "Export apiserver cluster IP to puppet var" 2024-04-24 18:54:47 +00:00
Erickson Silva de Oliveira 30edc5737e Changes the number of monitors required to unlock storage
When there is only one monitor available on a storage system,
for example, the standby controller and storage-0 are locked,
we should first unlock the controller, and only then can we
unlock storage-0.

To fix this, the storage unlock check has been modified so
that if it is storage-0 and already provisioned, only one
monitor is required to perform the unlock.

This allows quorum to be reestablished as quickly as possible.
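
The modified unlock check can be sketched as below. Only the storage-0 rule (one monitor is enough when the host is already provisioned) comes from the commit message; the default requirement of two monitors, the function names, and the parameters are assumptions made for the example.

```python
def required_monitors_for_unlock(hostname, provisioned, default_required=2):
    """Number of available monitors needed before a storage host unlock.

    default_required=2 is an assumed value, not taken from the source.
    """
    if hostname == "storage-0" and provisioned:
        # A provisioned storage-0 may unlock with a single monitor so
        # that quorum can be reestablished as quickly as possible.
        return 1
    return default_required


def can_unlock_storage(hostname, provisioned, available_monitors,
                       default_required=2):
    """Return True when enough monitors are available for the unlock."""
    needed = required_monitors_for_unlock(hostname, provisioned,
                                          default_required)
    return available_monitors >= needed
```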

Test Plan:
  PASS: Lock controller (not active) and storage-0
  PASS: Unlock storage-0 and controller

  PASS: Lock storage-0 and controller (not active)
  PASS: Unlock storage-0 and controller

  PASS: Lock controller (not active) and storage-0
  PASS: Unlock controller and storage-0

  PASS: Lock storage-0 and controller (not active)
  PASS: Unlock controller and storage-0

  PASS: Lock controller (not active), storage-1
        and reboot storage-0
  PASS: Unlock storage-1

  PASS: Lock controller (not active)
        and reinstall storage-0

  PASS: In fresh install, shutdown controller
        (not active) before unlocking storage-0

Closes-Bug: 2062569

Change-Id: I335be06c9dd17d5e099a7914955d4f7bf5f3b32e
Signed-off-by: Erickson Silva de Oliveira <Erickson.SilvadeOliveira@windriver.com>
2024-04-24 13:51:28 -03:00