1792 Commits

Author SHA1 Message Date
Zuul 2211787bf3 Merge "Update cert-manager image tags for upgrade" 2024-04-26 15:51:13 +00:00
Zuul e5dba2566b Merge "Limit dcmanager related tasks to bootstrap mode" 2024-04-25 20:40:17 +00:00
Gustavo Pereira bdc2c5c89d Limit dcmanager related tasks to bootstrap mode
This commit fixes the solution introduced in
https://review.opendev.org/c/starlingx/ansible-playbooks/+/912317.

Test Plan:

PASS: Deploy a DC environment with one SX and one DX subcloud
and backup both subclouds. Restore the subclouds backup and
verify that both operations complete successfully.

Story: 2011035
Task: 49694

Signed-off-by: Gustavo Pereira <gustavo.lyrapereira@windriver.com>
Change-Id: I9f84328d15fba6acf867e6a322e97e4dd3b2a6df
2024-04-25 18:33:42 +00:00
amantri 6bcbd05fcf Update cert-manager image tags for upgrade
Add cert-manager images from v1.7.1 to v1.11.5 to support upgrade
from stx9.0 to stx10.0

Test Cases:
PASS: Perform an upgrade from stx9.0 to stx10.0 and after
      running upgrade playbook verify that cert-manager app
      is successfully running, perform upgrade activate
      and notice that app is upgraded.

Closes-Bug: 2063372

Change-Id: I30fc44bb3e76375c0590233708a8cc23b6e1141c
Signed-off-by: amantri <ayyappa.mantri@windriver.com>
2024-04-24 17:08:42 -04:00
Zuul 1aa1eb6905 Merge "Add L4 default ports during non-optimized restore" 2024-04-24 17:55:21 +00:00
Fabiano Correa Mercer 3a6a40c229 Add L4 default ports during non-optimized restore
Previously, L4 ports had default values defined in Puppet classes for
bootstrap and backup/restore scenarios.
These defaults were removed to ensure all ports are managed by the
firewall. The change is:
https://review.opendev.org/c/starlingx/stx-puppet/+/885586

While this functions well for fresh installations, it caused an issue
during DX subcloud backup and restore. Specifically, the Ansible
playbook wasn't configuring L4 ports during subcloud restore.

Test Plan:
IPv4 DC with subcloud AIO-DX fresh install
IPv4 AIO-DX fresh install
IPv4 AIO-SX fresh install
IPv4 Subcloud AIO-DX Backup and Restore
IPv4 AIO-DX Backup and Restore
IPv4 AIO-SX Backup and Restore

Closes-Bug: 2056054

Signed-off-by: Fabiano Correa Mercer <fabiano.correamercer@windriver.com>

Change-Id: I91b0d0e714aff1a2a0dbfbb1031975d010872c81
2024-04-23 15:20:19 -03:00
Zuul d8389aa2a1 Merge "Revert CNI images for K8s 1.24" 2024-04-22 17:31:01 +00:00
Zuul 766f111812 Merge "Ansible playbooks for vault backup and restore" 2024-04-22 15:10:06 +00:00
Tae Park 0f65fb3fb0 Ansible playbooks for vault backup and restore
Create new Ansible playbooks, vault_backup and vault_restore, which
create a vault snapshot for backup and use it to restore vault,
respectively. Each playbook invokes the vault backup/restore script to
access the vault REST API.

The vault_backup playbook has one required option and one optional option:
required:
--initial_backup_dir: the path to the directory, where the vault
subdirectory will be created. The vault_backup playbook will place the
resulting backup tarball in the subdirectory.
optional:
--encrypt_hc_vault_secret: a string that will be used as a secret key
for encrypting the backup tarball

The vault_restore playbook, in addition to the options for vault_backup,
has one additional required option:
--backup_filename: the filename of the backup tarball that will be used
to restore the vault application. This file must be in the vault
subdirectory of the initial_backup_dir directory

Test Plan:
PASS	vault backup then vault restore
PASS	vault backup/restore with custom encryption secret key
PASS	backup, rekey vault, lose the new key shards, restore from
backup
PASS	backup, delete the vault namespace and recreate the cluster,
restore

Story: 2011073
Task: 49841

Change-Id: I3824450ae8bb0c602c44cddd19dd10f5b307e8d6
Signed-off-by: Tae Park <tae.park@windriver.com>
2024-04-19 17:29:46 -04:00
Mohammad Issa 5ac4e11845 Revert CNI images for K8s 1.24
The CNI system images for the last version of the old release
and the first version of the new release should be the same.

Testing:
- Build successful
- All kube-system pods came up
- Manual K8s upgrade

Story: 2010639
Task: 49900

Change-Id: Id28ba013c3470c3656ca36745e09a53924ad6dcf
Signed-off-by: Mohammad Issa <mohammad.issa@windriver.com>
2024-04-19 18:34:14 +00:00
Zuul 7d69c5b3ef Merge "Remove conditional statement for enabling IPv6 firewall in Calico" 2024-04-19 16:22:50 +00:00
Zuul fc0993aa62 Merge "Change default subject for platform certificates" 2024-04-19 14:28:12 +00:00
Zuul ecefb4fb3d Merge "Do not allow backups when cert related errors present" 2024-04-19 14:08:41 +00:00
Marcelo Loebens 85712e2fb9 Change default subject for platform certificates
Included default entries for the fields:
- 'commonName' - default now is <cert_short_name>
- 'localities' - default now is <region>
- 'organization' - default now is 'starlingx'

Where:
<region> is the region name
<cert_short_name> is an internal proper name used for each of the
platform certs.

These fields can still be overridden by the user during bootstrap / CA
update. The override 'subject_prefix' is now removed.

Modified update_platform_certificates.yml playbook to delete/recreate
the leaf certificates instead of re-configuring them. In some cases,
just re-configuring would not change nested values in the Certificate
spec entries. Also, waited for the local OpenLDAP cert to be ready
before progressing, avoiding issues with remaining tasks caused by
delays in cert-manager.

Test plan:
PASS: Bootstrap system without overriding 'subject_L', 'subject_O'
      or 'subject_CN'.
      Verify that the default fields are included.

PASS: W/ default values, test Horizon access.

PASS: W/ default values, test access through remote CLI.

PASS: W/ default values, test pulling images from the local
      registry externally (outside the system).

PASS: Update platform certificates overriding all 'subject_*' fields.
      Verify that the overridden values are included in the
      respective fields.

Story: 2009811
Task: 49831

Change-Id: I208c30a6eb2c60397d50e6ea411ee5994fa27f9a
Signed-off-by: Marcelo Loebens <Marcelo.DeCastroLoebens@windriver.com>
2024-04-18 14:25:47 -04:00
Zuul 8ea2fa935f Merge "Update CA certificate install command in migration playbook" 2024-04-18 13:43:34 +00:00
Joshua Kraitberg 0c941aec1d Do not allow backups when cert related errors present
This is to update the health check in backup to match the new output
that includes alarm info related to certs.

Presently, it is possible to create backups when expired certs are
present.  After this change that will no longer be possible.

TEST PLAN
PASS: AIO-SX backup fails when expired certs present
PASS: AIO-SX backup fails when mgmt affecting alarms present
PASS: AIO-SX backup works when no alarms present
PASS: AIO-SX backup works when only minor alarms present

Closes-Bug: 2062087
Change-Id: I5a66fc4b59c619623b9da8c688d576e67f262d33
Signed-off-by: Joshua Kraitberg <joshua.kraitberg@windriver.com>
2024-04-17 20:28:11 -04:00
amantri 021d102096 Update CA certificate install command in migration playbook
Added new CA certificate install commands in the playbook.

Testcases:
PASS: Bootstrap the system with changes and verify that system is
      installed successfully
PASS: Run update_platform_certificates and verify it
      is successful
PASS: Bootstrap systemcontroller and verify that system is installed
      successfully, bootstrap subcloud from systemcontroller and
      verify subcloud is installed fine.

Story: 2010848
Task: 48473

Change-Id: I4151e1be84e2cc9d65f5740a9280408a202c1765
Signed-off-by: amantri <ayyappa.mantri@windriver.com>
Depends-on: https://review.opendev.org/c/starlingx/config/+/893799
2024-04-17 15:16:24 +00:00
Zuul f4e79031e6 Merge "Parallelize the deployment of user specified platform applications" 2024-04-16 18:15:50 +00:00
Gustavo Pereira 06debbfe0b Parallelize the deployment of user specified platform applications
This commit enables user-defined applications to be uploaded
and applied in parallel using ansible async module.

Test plan:
PASS: Deploy one controller with two applications defined in
localhost.yml file. Bootstrap the controller and verify that
the upload and apply tasks were executed in parallel

PASS: Deploy a system controller with all applications that
can be applied by the user. Verify that all applications were
deployed in parallel.

PASS: Deploy a system controller without user applications
added to localhost.yml. Verify that the bootstrap finishes
successfully.

PASS: Deploy a subcloud with two applications defined in
overrides file. Replay the subcloud bootstrap adding an
application in overrides. Verify that all applications
are applied.

PASS: Deploy a subcloud with two applications defined in
subcloud overrides file. Force an application failure case
and verify that the bootstrap fails as expected.

e.g.
Add portieris to overrides without a caCert.yaml file.

Story: 2011035
Task: 49584

Signed-off-by: Gustavo Pereira <gustavo.lyrapereira@windriver.com>
Change-Id: Ic9140aaf3c9b1a60c11c441f745d8b9206413d41
2024-04-16 17:38:58 +00:00
Zuul a2c8b3db0b Merge "Move sysinv bootstrap from Puppet to Ansible" 2024-04-15 18:32:28 +00:00
Raphael Lima a918f6e3b4 Move sysinv bootstrap from Puppet to Ansible
Add sysinv_bootstrap task file to apply_bootstrap_manifest role
to reduce bootstrap time. The corresponding sysinv bootstrap
implementation in puppet will be removed.

Changes include:
- Create a template for sysinv.conf and sysinv/api-paste.ini files
- Ensure the installation of sysinv packages
- Ensure the execution of sysinv-api, sysinv-conductor
and sysinv-agent services

Test plan:
1. PASS: Deploy a DC system with one system controller and two subclouds
and ensure the subclouds can be managed
2. PASS: Deploy an AIO-SX system and verify the host unlocks
3. PASS: Perform bootstrap replay and ensure the host unlocks after
re-execution
4. PASS: Verify the openstack user, role, service and endpoints
   configuration for sysinv after bootstrap for each deployment type
5. PASS: Verify the sysinv.conf and api-paste.ini file for each
deployment type
6. PASS: Validate the sql dump of the keystone database generated in
a subcloud deployment in relation to the one generated before the
changes

Depends-On: https://review.opendev.org/c/starlingx/config/+/915365

Story: 2011035
Task: 49764

Change-Id: I7cc9b7d45b770b454178da3f6c974bdbf7fc1e57
Signed-off-by: Raphael Lima <Raphael.Lima@windriver.com>
2024-04-12 18:32:56 -03:00
Zuul 5d49f9592d Merge "Update IPSec certs when system-local-ca is updated" 2024-04-12 14:39:37 +00:00
Lucas Ratusznei Fonseca a42d306b7c Remove conditional statement for enabling IPv6 firewall in Calico
This change removes the conditional statement around the section that
enables the firewall for IPv6 in Calico. The IPv6 firewall will be
permanently enabled regardless of the setup, so that even if IPv6 is
unused, traffic will be blocked.

Test plan
=========

The tests for https://review.opendev.org/c/starlingx/config/+/915508
also cover this change.

Story: 2011027
Task: 49816
Depends-On: https://review.opendev.org/c/starlingx/stx-puppet/+/915509
Change-Id: I986f361493b29596851e781632c782c92ec22546
Signed-off-by: Lucas Ratusznei Fonseca <lucas.ratuszneifonseca@windriver.com>
2024-04-11 20:37:01 -03:00
Leonardo Mendes 8903cd6e19 Update IPSec certs when system-local-ca is updated
This commit updates IPSec certificates, including trusted CA
certificates, when system-local-ca is updated in the system.

Test plan:
PASS: In a DX system with IPsec Initial Auth configured on each
      host and SAs established, follow the documentation to create
      the inventory file and run:
      ansible-playbook \
        /usr/share/ansible/stx-ansible/playbooks/update_platform_certificates.yml \
        -i inventory.yaml \
        --extra-vars "target_list=localhost mode=update ignore_alarms=yes"
      After execution, run "swanctl --list-certs" and verify that
      strongswan has all CA certificates, including the Root CA if
      it's an intermediate CA, that SAs are still established, and
      that it's possible to ping all nodes.
PASS: In a DC system with a central DX and a subcloud DX with IPsec
      Initial Auth configured on each host and SAs established,
      follow the documentation to create the inventory file and run:
      ansible-playbook \
        /usr/share/ansible/stx-ansible/playbooks/update_platform_certificates.yml \
        -i inventory.yaml \
        --extra-vars "target_list=localhost,all_online_subclouds mode=update ignore_alarms=yes"
      After execution, run "swanctl --list-certs" and verify that
      strongswan has all CA certificates, including the Root CA if
      it's an intermediate CA, that SAs are still established, and
      that it's possible to ping all nodes.

Story: 2010940
Task: 49823

Depends-On: https://review.opendev.org/c/starlingx/config/+/914969

Change-Id: Ie18990fde89b92c98a013782454919eddf3f8fdf
Signed-off-by: Leonardo Mendes <Leonardo.MendesSantana@windriver.com>
2024-04-11 09:52:48 -03:00
Zuul 176be6c9a3 Merge "Fix for snapshot-controller failure during restore" 2024-04-10 16:36:35 +00:00
Zuul 723b7f839e Merge "Upgrade trident templates for version 24.02.0" 2024-04-10 14:36:17 +00:00
Gabriel de Araújo Cabral 523ca7bcb2 Fix for snapshot-controller failure during restore
The change made in review [1] introduced the creation of the
snapshot resources in all installations during the bootstrap.

An issue was identified during the restore process (uses
bootstrap playbook) in a specific scenario: systems with
more than one host where the volume snapshot controller pod
was not running on controller-0 when the backup was performed.

During the restore process, within the snapshot-controller role,
it checks if the snapshot-controller pod is running, resulting
in a failure in the above scenario, since the assigned node
will not be ready, as only controller-0 is ready to run pods during
restore.

Therefore, the fix involves skipping the execution of the role
during restore. It's important to note that if the
snapshot-controller pod was created before the backup, it will
later be restored and will operate correctly after completing
the restore process regardless of the node that was attached.

[1]: https://review.opendev.org/c/starlingx/ansible-playbooks/+/904360

Test Plan:
 PASS: Successful backup and restore on an AIO-DX whose
       snapshot-controller pod was running on controller-0
       during backup
 PASS: Successful backup and restore on an AIO-DX whose
       snapshot-controller pod was running on controller-1
       during backup
 PASS: Successful backup and restore on a Standard (2+1) whose
       snapshot-controller pod was running on controller-1
       during backup
 PASS: AIO-SX | AIO-DX fresh install + Check if the CRDs
       and snapshot-controller were created during bootstrap

Closes-bug: 2060675

Change-Id: Ia2f69fafba4854236ea2d6c26932e99e63059ff8
Signed-off-by: Gabriel de Araújo Cabral <gabriel.cabral@windriver.com>
2024-04-09 13:14:49 -03:00
Michel Thebeau 16536b552b backup/restore scripts for vault
This code will be integrated with ansible playbook(s) for backup and
restore of the Vault application data. The integration will follow with
commits for ansible role for vault.

Depends-on: Id786105aa8ddba2e77085b3897c0c8efd7e98c9b

Test Plan:
PASS  unit test
PASS  bashate
PASS  backup and restore of vault using the scripts

Change-Id: I324b270ec738f864410068c4ac661301ca8176fd
Signed-off-by: Michel Thebeau <Michel.Thebeau@windriver.com>
2024-04-08 23:11:34 +00:00
Mohammad Issa 2839f12760 Revert Multus image version back to 3.9.3
Experiencing issues related to the initContainer
"delete-multus-conf", which was initially added as a workaround.

Avoid using the upstream "alpine" image inside the container.
For now, revert to multus v3.9.3.

Testing:
- All kube-system pods came up
- Multus conf file was generated properly
- Was able to deploy pods with multiple interfaces

Story: 2010639
Task: 49830

Change-Id: I4d4f420784cf49316ae9146f2b8bcc4f29f748f6
Signed-off-by: Mohammad Issa <mohammad.issa@windriver.com>
2024-04-08 20:39:28 +00:00
Erickson Silva de Oliveira 666657b9ba Upgrade trident templates for version 24.02.0
Upgrade trident yaml templates, generated by tridentctl
24.02.0 with the following command:
tridentctl -n trident install --generate-custom-yaml

Some of the templates have been changed to make them
compatible with StarlingX. The parts of the code that
have been changed are marked with an "STX_change" comment.

To support upgrading from k8s v1.24 to newer versions,
it was necessary to verify and remove trident PSPs
after installation.

Additionally, the snapshot-controller role has been
removed, as it already runs in bootstrap.

Test Plan:
- PASS: Fresh install on SX/DX/Standard with Trident 24.02.0

- PASS: Backup in AIO-SX with Trident 23.10.0 and restore
with Trident 24.02.0 and tested if the backend and PVCs
created using Trident 23.10.0 still worked.

- PASS: Upgraded Trident to version 24.02.0 rerunning
ansible-playbook and tested if the information stored
in Netapp persisted.

- PASS: Tested with k8s v1.24.4 to v1.29.2 (Fresh install
for each version of k8s and also upgrading k8s from v1.24)

Story: 2011080
Task: 49786

Depends-On: https://review.opendev.org/c/starlingx/integ/+/914094

Change-Id: I7d0ee191bf4db86f3fbc4d28f0b46e4c0f9f4c45
Signed-off-by: Erickson Silva de Oliveira <Erickson.SilvadeOliveira@windriver.com>
2024-04-03 15:59:31 -03:00
Marcelo Loebens 351aa195f9 Retrieve system-local-ca old values in legacy restore
Included code to retrieve the values during legacy restore,
avoiding changes in system-local-ca secret values.

Test plan:
PASS: Bootstrap AIO-DX w/ system-local-ca overrides.
      Run backup playbook.
      Reinstall system.
      Run restore playbook (legacy).
      Observe that system-local-ca maintained same values.

Story: 2009811
Task: 49797

Change-Id: Ifdb1458a95dbf96639a08d6ca06637d82c5d7784
Signed-off-by: Marcelo Loebens <Marcelo.DeCastroLoebens@windriver.com>
2024-04-02 13:57:12 +00:00
Ramesh Kumar Sivanandam 9c4a5ef225 Add support for Kubernetes 1.29.2
This adds support for Kubernetes 1.29.2 version. This creates
symlinks to previously defined Docker image versions, uses the
same version of container images used for k8s 1.28 and uses
the same volume-snapshot-controller of K8s 1.26 and CRDs of k8s 1.25.

Test Plan:
PASS: Install ISO with k8s 1.29 on AIO-SX
PASS: Perform multi k8s upgrade from 1.24 to 1.29 on AIO-SX.
PASS: Install ISO with k8s 1.29 on AIO-DX
PASS: Install ISO with k8s 1.29 on Standard

Story: 2011047
Task: 49773

Change-Id: I2b01353ffec48973601ce6cdf88beecde51683f3
Signed-off-by: Ramesh Kumar Sivanandam <rameshkumar.sivanandam@windriver.com>
2024-04-01 06:23:55 -04:00
Hugo Brito 4deeea626b Move Barbican bootstrap from puppet to ansible
This commit creates the Barbican-related tasks within the bootstrap
playbook, moving away from the Puppet-based bootstrap during the
apply_bootstrap_manifest phase, and moves the "Create Barbican secret
for * registry if credentials exist" tasks to execute after the
update_sysinv_database script. The script is responsible for creating
the Barbican users in keystone and updating barbican.conf.

Openstack related operations (user, service and endpoint configuration)
are now handled exclusively by sysinv config_endpoints.

The replaced puppet class will be removed in commit:
https://review.opendev.org/c/starlingx/stx-puppet/+/913590

Test Plan:
1. PASS: Verify full DC system deployment - System Controller + 3
         Subclouds install/bootstrap (virtual lab)
2. PASS: Verify Barbican database and user are setup correctly
3. PASS: Validate contents of /etc/barbican/barbican.conf
4. PASS: Verify Openstack user, service and endpoint configuration
5. PASS: Bootstrap re-execution test and deployment.

Depends-on: https://review.opendev.org/c/starlingx/config/+/913587

Story: 2011035
Task: 49740

Change-Id: I586048b11116b77b01a38a358244f5b115f7fb32
Signed-off-by: Hugo Brito <hugo.brito@windriver.com>
2024-03-27 15:02:24 +00:00
Zuul 26d62e2679 Merge "Move FM bootstrap from puppet to ansible" 2024-03-27 14:13:14 +00:00
Salman Rana f8fd86fb83 Move FM bootstrap from puppet to ansible
Introduced FM related tasks within the bootstrap playbook,
moving away from puppet based bootstrap during the
apply_bootstrap_manifest phase.

This includes:
- Generating the fm.conf and api-paste.ini configuration files.
- Setting up the FM database and user account.
- Managing the installation of FM packages.
- Initiating the fm-api service.

Openstack related operations (user, service and endpoint
configuration) are now handled exclusively by
sysinv config_endpoints. The related change is made in commit:
https://review.opendev.org/c/starlingx/config/+/913249

The replaced puppet class will be removed in commit:
https://review.opendev.org/c/starlingx/stx-puppet/+/913247

Overall, these changes reduce bootstrap time by ~30s.

Test Plan:
1. PASS: Verify full DC system deployment - System Controller + 3
         Subclouds install/bootstrap (virtual lab).
           - Ensure subclouds in deploy-complete online state
2. PASS: Verify FM database and user are setup correctly
3. PASS: Validate contents of /etc/fm/fm.conf and
         /etc/fm/api-paste.ini:
         - Generate the file using both the new ansible tasks and
           previous puppet bootstrap and ensure no diff in
           key-value pairs.
4. PASS: Verify Openstack user, service and endpoint configuration
         * See config commit test plan
5. PASS: Verify bootstrap re-execution: ensure successful host
         unlock after re-running bootstrap.

Story: 2011035
Task: 49721

Change-Id: I07bfcdbf9c30b11e286e4aa46cfbde5908e8f460
Signed-off-by: Salman Rana <salman.rana@windriver.com>
2024-03-27 09:34:04 -04:00
Zuul afeecb5b50 Merge "Uprev CNI images for k8s v1.24-v1.28" 2024-03-25 19:45:55 +00:00
Zuul 302e27d697 Merge "Enable docker registry and HTTPS cert by default" 2024-03-25 18:11:01 +00:00
Zuul 65eb75e27a Merge "Only wait for essential pods in cert recovery" 2024-03-25 15:07:23 +00:00
Zuul bca2e13d37 Merge "Hide log output for system-local-ca data" 2024-03-25 15:07:17 +00:00
Mohammad Issa 32f6cb1817 Uprev CNI images for k8s v1.24-v1.28
This commit uprevs the container networking images as follows:

calico: v3.25.0 -> v3.26.4
multus: v3.9.3 -> v4.0.2
sriov-cni: v2.7.0 -> unchanged
sriov-device-plugin: v3.5.1 -> v3.6.2

The following changes have been made:
- create a new directory for k8s 1.28.4
- symlink k8s 1.24-1.27 directories to k8s v1.28.4
- Apply StarlingX custom changes on top of base

Testing:

- Ensure uprev'd images work on a fresh install with k8s 1.28.4
- Successful system deployment (bootstrap and unlock)
- Perform several networking operations on k8s 1.28.4:
  - Calico:
    * pod -> pod connectivity
    * pod -> service connectivity
    * ingress connectivity
    * IPAM testing
  - Multus / SR-IOV verification:
    * Run the SR-IOV automated tests with a full pass
  - Test IPv4 and IPv6:
    * Ensure all pods come up under each environment
    * Test pod -> pod connectivity on both
  - Test manual upgrade from k8s 1.24 -> 1.28

Story: 2010639
Task: 49710

Change-Id: Ife456d63043825476c17e91e310d8283f829f7f4
Signed-off-by: Mohammad Issa <mohammad.issa@windriver.com>
2024-03-22 20:41:02 +00:00
Marcelo Loebens 524d6cac49 Hide log output for system-local-ca data
Included the no_log flag to hide sensitive content in tasks that handle
system-local-ca certs/keys in the bootstrap and update platform certs
playbooks.

Test plan:
PASS: Executed with verbosity -vv and analyzed the logs for
      sensitive data for the following playbooks/scenarios:
      - Bootstrap AIO-SX w/ system-local-ca overrides.
      - Bootstrap AIO-SX w/o system-local-ca overrides.
      - Bootstrap DC + SX subcloud w/ system-local-ca overrides.
      - Platform certificates update.

Story: 2009811
Task: 49770

Change-Id: Icbdc11b5af42797a82cb54d65c470f1070201109
Signed-off-by: Marcelo Loebens <Marcelo.DeCastroLoebens@windriver.com>
2024-03-22 16:36:20 -04:00
Rei Oliveira 5a304af6e1 Only wait for essential pods in cert recovery
The certificate recovery role will trigger a restart of every pod
in the k8s cluster so that they can be updated with the latest
certificate information.

After the pods restart, the procedure waits for every pod to recover and
become READY. This change modifies that behaviour to wait only for
essential pods to recover, namely those in the core namespaces armada,
cert-manager, flux-helm and kube-system.
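
The namespace-based wait described here can be pictured with a minimal Python sketch. This is not the role's actual implementation (which is an Ansible role); the function name and the shape of the pod records are assumptions made purely for illustration. Only the four namespace names come from the commit message.

```python
# Hypothetical sketch: only the namespace names come from the commit
# message; the function and the pod record layout are illustrative.
ESSENTIAL_NAMESPACES = {"armada", "cert-manager", "flux-helm", "kube-system"}


def pods_still_pending(pods):
    """Return the names of essential pods that are not READY yet.

    `pods` is a list of dicts with "namespace", "name" and "ready" keys
    (an assumed, simplified shape, not a real Kubernetes API object).
    """
    return [
        p["name"]
        for p in pods
        if p["namespace"] in ESSENTIAL_NAMESPACES and not p["ready"]
    ]


pods = [
    {"namespace": "kube-system", "name": "coredns-abc", "ready": True},
    {"namespace": "cert-manager", "name": "cm-webhook", "ready": False},
    {"namespace": "custom-ns", "name": "crashing-app", "ready": False},
]
# The crashing pod in the custom namespace is ignored.
print(pods_still_pending(pods))  # ['cm-webhook']
```

A waiting loop would poll until this list is empty, so a crashing pod outside the core namespaces no longer blocks recovery.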

Test case:

PASS: Run certificate recovery with crashing pods in a custom namespace

Closes-Bug: 2058751

Signed-off-by: Rei Oliveira <Reinildes.JoseMateusOliveira@windriver.com>
Change-Id: I3ea403a3e324ecbb5f2c1f56d6ce1c8bd80fabee
2024-03-22 18:04:47 +00:00
Zuul e96b497109 Merge "Move dcmanager and mtce bootstrap from puppet to ansible" 2024-03-22 18:00:09 +00:00
Gustavo Pereira a816427233 Move dcmanager and mtce bootstrap from puppet to ansible
Add dcmanager and mtce bootstrap tasks to ansible to improve execution
time.
The related puppet class and tasks will be removed in commit:
https://review.opendev.org/c/starlingx/stx-puppet/+/912319.

Test Plan:
PASS: Deploy a subcloud without the changes and record its bootstrap
execution time. Deploy another subcloud with the proposed changes.
Verify successful subcloud deployment and the bootstrap execution
time is 80s faster.

PASS: Verify a successful AIO-SX deployment.

PASS: Verify a successful AIO-DX controller deployment.

PASS: Verify a successful DC environment deployment.

PASS: Verify a successful subcloud bootstrap replay.

Story: 2011035
Task: 49694

Signed-off-by: Gustavo Pereira <gustavo.lyrapereira@windriver.com>
Change-Id: Iad670b276250cb36dcd2019bf42c78ea27e9adc4
2024-03-22 13:42:14 -03:00
Zuul 3d0f7815c7 Merge "Restore support of luks volume on different hardware" 2024-03-21 16:36:47 +00:00
Zuul 4db49840b8 Merge "Increase timeout to unlock host on cert recovery" 2024-03-20 20:39:33 +00:00
Zuul 0309a77a5d Merge "Add flock in wipe_osds.sh to avoid race condition" 2024-03-20 15:18:38 +00:00
Rei Oliveira d441c0afc7 Increase timeout to unlock host on cert recovery
The certificate recovery role may try to unlock a host while it's
in a reboot loop. For the host-unlock command to be accepted,
the host needs to be online.

For slower hardware the current 2 min timeout is not enough.
This commit changes it to 5 min.

Test case:

PASS: Run certificate recovery rehoming with multi node systems.

Closes-Bug: 2058426

Signed-off-by: Rei Oliveira <Reinildes.JoseMateusOliveira@windriver.com>
Change-Id: I7a7faf7707a19f477a968766cbe383e7dfdbd1cd
2024-03-19 20:12:34 -03:00
Marcelo Loebens 0d9051a469 Enable docker registry and HTTPS cert by default
Remove the feature flag that controlled the creation of the Docker
Registry and REST API/GUI (HTTPS) certificates. These certs are now
created by default during bootstrap and are used after the first
controller unlock.

Certs will be anchored using the system-local-ca issuer CA
certificates, which the user can provide through bootstrap overrides.
If not provided, they will be anchored using the k8s RCA.

Test plan:
PASS: Bootstrap AIO-SX w/ system-local-ca overrides.
      - Verify certificates w/ sudo show-certs.sh;
      - Login into registry.local;
      - Access horizon - Verify that the certificate provided to the
        browser is correct.
      Bootstrap DC + SX subcloud w/ system-local-ca overrides.
      - Verify certificates w/ sudo show-certs.sh;
      - Login into registry.local and registry.central;
      - Access horizon - Verify that the certificate provided to the
        browser is correct.

Story: 2009811
Task: 49704

Change-Id: Iccbf53ecd7ef5d8cc64092bbf0da77c13787008b
Signed-off-by: Marcelo Loebens <Marcelo.DeCastroLoebens@windriver.com>
2024-03-19 13:33:30 +00:00
Erickson Silva de Oliveira e63bfd683e Add flock in wipe_osds.sh to avoid race condition
When running the script to wipe the OSDs, it sometimes happened
that the second partition was not found, although it was there.

Analyzing the code, it was possible to replicate the problem,
which is caused by a race condition when running the parted
command on the first partition, which causes udev to be reloaded
while the second partition was being processed.

To solve this problem, the “flock” command was used to lock the
entire disk, not just the partition.
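
As a rough illustration of the locking idea, the Python sketch below takes an exclusive advisory lock on one path before running an action, so that concurrent workers serialize on the whole "disk" rather than on individual partitions. It is not the content of wipe_osds.sh, which is a shell script using the flock command; the helper name is hypothetical and a temporary file stands in for a real disk device.

```python
import fcntl
import os
import tempfile


def with_exclusive_lock(path, action):
    """Run action() while holding an exclusive flock on `path`.

    Hypothetical helper: `path` would be the whole-disk device in the
    scenario described above; here any readable file works.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until the lock is free
        return action()
    finally:
        os.close(fd)  # closing the descriptor releases the lock


# A temporary file stands in for the disk device (assumption).
with tempfile.NamedTemporaryFile() as disk:
    print(with_exclusive_lock(disk.name, lambda: "partition processed"))
```

Because every worker locks the same path, processing of the second partition cannot start while the first is still being handled.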

Additionally, the use of udevadm has also been removed.

Test Plan:
  - PASS: (AIO-SX) Replace wipe_osds.sh with changes,
          run and check script output

Closes-Bug: 2056765

Change-Id: Icbc351a868b413a51dc8273ca422737d39756b3b
Signed-off-by: Erickson Silva de Oliveira <Erickson.SilvadeOliveira@windriver.com>
2024-03-14 15:41:18 -03:00