Commit Graph

1983 Commits

Author SHA1 Message Date
Zuul cc5940090c Merge "Add puppet support for tuning sysinv_api_workers" 2024-05-13 21:24:55 +00:00
Zuul f00a516d92 Merge "Cleanup previous files for puppet network runtime execution" 2024-05-13 20:53:42 +00:00
Andre Kantek 074cc9cc3f Cleanup previous files for puppet network runtime execution
Some networks (OAM, for AIO-SX, and ADMIN) support network runtime
configuration, meaning that they do not require a lock/unlock cycle.

It was observed that the previously generated files were not removed:
in network_ifupdown.sh the wildcard "ifcfg-*" passed to do_rm() was
not expanded inside the function, which prevented the removal, and
the puppet-network plugin (responsible for generating the interfaces
file) was concatenating its content with the previous one.

This change corrects these errors by explicitly removing the files.
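
A minimal shell sketch of how a wildcard argument can fail to expand
inside a function. The helpers rm_quoted and rm_expanded are hypothetical
stand-ins; the actual do_rm() in network_ifupdown.sh may be written
differently, so treat this as one plausible mechanism only.

```shell
#!/bin/sh
# Hypothetical stand-ins for illustration; this is not the code in
# network_ifupdown.sh, and the real do_rm() may differ.

# Quotes its argument, so rm receives the literal pattern string.
rm_quoted() {
    rm -f "$1"
}

# Leaves the argument unquoted, so the shell expands the pattern first.
rm_expanded() {
    rm -f $1
}

# Counts regular files inside a directory.
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

workdir=$(mktemp -d)
touch "$workdir/ifcfg-eth0" "$workdir/ifcfg-eth1"

rm_quoted "$workdir/ifcfg-*"
left_after_quoted=$(count_files "$workdir")

rm_expanded "$workdir/ifcfg-*"
left_after_expanded=$(count_files "$workdir")

echo "left after quoted call: $left_after_quoted"
echo "left after expanded call: $left_after_expanded"
rm -rf "$workdir"
```

In the quoted call no file matches the literal name, so both files stay
in place; removing the files explicitly avoids relying on expansion.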

Test Plan
[PASS] Install AIO-SX in single-stack and then add dual-stack config
       for OAM network in runtime and observe that there is no traffic
       interruption as the secondary address is added
[PASS] Install AIO-DX in single-stack with the following variants:
       - ethernet port with {mgmt, cluster-host, pxeboot} networks
       - ethernet port with pxeboot and vlan with {mgmt, cluster-host}
          networks
       - bonding port with {mgmt, cluster-host, pxeboot} networks
       - bonding port with pxeboot and vlan with {mgmt, cluster-host}
          networks

Story: 2011027
Task: 50055

Change-Id: I85c218e230d392ee1aa4097d089acc18e8bbbc89
Signed-off-by: Andre Kantek <andrefernandozanella.kantek@windriver.com>
2024-05-13 14:40:53 -03:00
Kyale, Eliud d49b5d4596 Add puppet support for tuning sysinv_api_workers
Add a new system service parameter that allows a user to specify the
number of sysinv API workers for horizontal scaling.

The values:
- service -> platform
- section -> config
- name -> sysinv_api_workers
- value -> [1 .. n ]
- personality -> None
- resource -> ::platform::sysinv::params::sysinv_api_workers

Sample:

system service-parameter-add platform config sysinv_api_workers=5

This change adds logic to select the new parameter if defined or
else keep the existing default behaviour
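
The selection described above can be sketched in shell as follows. This
is only an illustration: select_api_workers is a hypothetical helper and
the default of 2 is a placeholder, not the value used by the manifests.

```shell
# Hypothetical helper, not the actual puppet logic.
# Usage: select_api_workers <configured_value> <default_value>
# Echoes the configured value when it is defined, otherwise the default.
select_api_workers() {
    if [ -n "$1" ]; then
        echo "$1"
    else
        echo "$2"
    fi
}

# Parameter defined via service-parameter-add: the configured value wins.
select_api_workers "5" "2"

# Parameter not defined: the existing default behaviour is kept.
select_api_workers "" "2"
```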

Test plan:

PASS - AIO-SX: iso install
       confirm in system.yaml content not present by default
       verify default sysinv_api_workers in /etc/sysinv/sysinv.conf
       verify number of sysinv_api worker process ( ps -ef )

PASS - AIO-DX: iso install
       confirm in system.yaml content not present by default
       verify default sysinv_api_workers in /etc/sysinv/sysinv.conf
       verify number of sysinv_api worker process ( ps -ef )

PASS - Test system service-parameter-add|modify|delete
       followed by host-unlock
       verify sysinv db content system service-parameter-list
       verify content of system.yaml
       verify /etc/sysinv/sysinv.conf
       verify number of sysinv_api worker process ( ps -ef )

Story: 2011106
Task: 50064

Change-Id: I8d45581274565e2b6b476a2ca7d26fc4e88dcc9b
Signed-off-by: Kyale, Eliud <Eliud.Kyale@windriver.com>
2024-05-10 16:21:21 -04:00
Rei Oliveira 29471b23fb Enable keystone logging on debian
This commit enables keystone logging to /var/log/keystone/keystone.log
and sets the default log level to INFO.

Test plan:

PASS: Full build, install, bootstrap and unlock
PASS: Run authenticated commands such as 'system host-list' and verify
      that it gets logged to /var/log/keystone/keystone.log

Story: 2011106
Task: 50067

Signed-off-by: Rei Oliveira <Reinildes.JoseMateusOliveira@windriver.com>
Change-Id: I8cb1dce87ff1a46253573c48ce340be902292008
2024-05-09 19:56:14 +00:00
Zuul 494fbef3bb Merge "add secondary address variable for public HAproxy config" 2024-05-07 13:38:56 +00:00
Andre Kantek 1d60e3b936 add secondary address variable for public HAproxy config
This change adds the variable public_secondary_ip_address to
platform::haproxy::params filled with the secondary OAM address pool
floating address value, in a similar way that is done for the primary
address pool. This will be used in HAproxy to bind the necessary L4
public ports to the secondary address.

Test plan
[PASS] Install and add a secondary pool via CLI and, then, after
        lock/unlock, check that all public endpoints (openstack
        endpoint list) are available in the primary and secondary
        addresses, on the following setups:
        - AIO-SX (prim:IPv4, sec:IPv6)
        - AIO-SX (prim:IPv6, sec:IPv4)
        - AIO-DX (prim:IPv4, sec:IPv6) with system-controller role
        - AIO-DX (prim:IPv6, sec:IPv4) with system-controller role
[PASS] Access the public APIs on both protocols using curl.

Story: 2011027
Task: 49997

Depends-On: https://review.opendev.org/c/starlingx/config/+/917250
Change-Id: I5a274565e2cd9435478beb2de3f9a1578a1679e5
2024-05-06 09:10:04 -03:00
Zuul c4666d214a Merge "use symlinks instead of bind mounts for K8s versioning" 2024-05-03 21:12:35 +00:00
Zuul 795d2cb64a Merge "Update IPsec puppet to generate two swanctl.conf" 2024-05-03 13:40:30 +00:00
Andy Ning e5566f082d Update IPsec puppet to generate two swanctl.conf
This commit updates the strongswan.pp puppet classes so that they work
with ipsec-client to generate two copies of the swanctl configuration
file for controller nodes: one for when the node is the active
controller (swanctl_active.conf) and one for when it is the standby
controller (swanctl_standby.conf). A symlink (swanctl.conf) is created
pointing to one of the two config files based on the role of the node.
On a controller swact, the symlink is updated by an SM service.
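
The symlink arrangement can be sketched in shell as below. The function
point_swanctl_conf is hypothetical and runs against a temporary
directory; in the real system the files are generated by puppet and
ipsec-client and the link is switched by an SM service.

```shell
# Hypothetical sketch, not the SM service implementation.
# Usage: point_swanctl_conf <conf_dir> <active|standby>
point_swanctl_conf() {
    conf_dir="$1"
    role="$2"
    case "$role" in
        active|standby) ;;
        *) echo "unknown role: $role" >&2; return 1 ;;
    esac
    # -f overwrites an existing link; -n replaces the link itself
    # instead of following it.
    ln -sfn "$conf_dir/swanctl_${role}.conf" "$conf_dir/swanctl.conf"
}

demo_dir=$(mktemp -d)
touch "$demo_dir/swanctl_active.conf" "$demo_dir/swanctl_standby.conf"

point_swanctl_conf "$demo_dir" active    # node comes up as active
point_swanctl_conf "$demo_dir" standby   # node becomes standby on swact

link_target=$(readlink "$demo_dir/swanctl.conf")
echo "swanctl.conf -> $link_target"
rm -rf "$demo_dir"
```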

Test Plan (IPv4 and IPv6 DX system):
PASS: controller-0 bootstrap, verify swanctl configuration files and
      symlink are created in /etc/swanctl directory:
      /etc/swanctl/swanctl_standby.conf
      /etc/swanctl/swanctl_active.conf
      /etc/swanctl/swanctl.conf -> /etc/swanctl/swanctl_active.conf
PASS: controller-1 installation, after installed, verify swanctl
      configuration files and symlink are created in /etc/swanctl
      directory:
      /etc/swanctl/swanctl_standby.conf
      /etc/swanctl/swanctl_active.conf
      /etc/swanctl/swanctl.conf -> /etc/swanctl/swanctl_standby.conf
PASS: controller-1 unlock, after controller-1 is unlocked, verify that
      during drbd synchronization there is no uncontrolled swact, and
      controller-1 comes up in "enabled" and "available" state after
      drbd is fully synced.

Story: 2010940
Task: 49930

Change-Id: Ief8e078a6e2cdd9a9aa713aa18b7cb6d177eafd5
Signed-off-by: Andy Ning <andy.ning@windriver.com>
2024-05-01 09:29:59 -04:00
Chris Friesen 0526b759c6 use symlinks instead of bind mounts for K8s versioning
Switch to using "stage1" and "stage2" symlinks under
/var/lib/kubernetes to select versions for kubeadm and kubelet/
kubectl.

We have been using bind mounts to select K8s versions, but they are not
well supported by Puppet and suffer from fragility since you cannot
remove a bind mount while an executable is still running from it.  They
also need to be re-created when creating an OSTree hotfix.

Symlinks suffer from no such issues; they just need to be created in
a filesystem that is not managed by OSTree.

Also, fix up a case where the existing code was using "include" when it
should have used "require", and remove some redundant dependencies that
were not needed.

Depends-On: https://review.opendev.org/c/starlingx/integ/+/916337

NOTE: This also requires the following change in ansible-playbooks,
all three commits must be merged together.

https://review.opendev.org/c/starlingx/ansible-playbooks/+/916336

Story: 2011047
Task: 49916

TEST PLAN:
See integ repo commit for test plan.

Change-Id: Iea7410241028e3ac9ced9e5653460a249892aed0
Signed-off-by: Chris Friesen <chris.friesen@windriver.com>
2024-04-29 17:08:37 -06:00
Zuul f389e1fc8f Merge "Remove CentOS/OpenSUSE build support" 2024-04-29 13:17:22 +00:00
Scott Little b3144d026c Remove CentOS/OpenSUSE build support
StarlingX stopped supporting CentOS builds after release 7.0.
This update strips CentOS from our code base.  It also removes
references to the failed OpenSUSE feature.

Story: 2011110
Task: 49961
Change-Id: Ibdaf1d43ab35382bd4d2b34ae9737a01b8ef9a5d
Signed-off-by: Scott Little <scott.little@windriver.com>
2024-04-26 14:16:56 -04:00
Zuul f64ceeee1d Merge "Split IP services in IPv4 and IPv6 for dual-stack support" 2024-04-25 18:38:42 +00:00
Zuul a0505b075d Merge "Added system IPs to services "NO_PROXY" list" 2024-04-24 18:54:49 +00:00
Joao Victor Portal 937132aafb Added system IPs to services "NO_PROXY" list
When configuring the Docker proxy (see feature doc at
https://docs.starlingx.io/configuration/docker_proxy_config.html), the
system IPs should be added automatically to the "NO_PROXY" environment
variable of services "docker" and "containerd". This configuration was
lost a long time ago during a code cleanup (review
https://review.opendev.org/c/starlingx/config/+/703516 , file
controllerconfig/controllerconfig/controllerconfig/configassistant.py ,
line 2286). This commit restores the addition of system IPs to the
"NO_PROXY" list.

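As a rough illustration of adding system IPs to a user-supplied no_proxy
value, the shell sketch below joins the entries with commas and skips
duplicates. The helper build_no_proxy and the 10.0.0.x addresses are
hypothetical placeholders, not the puppet implementation or real system
addresses; 5.6.7.8 is the sample value from the test plan.

```shell
# Hypothetical helper, not the actual manifest code.
# Usage: build_no_proxy <user_no_proxy> <address>...
# Appends each address to the comma-separated list unless already present.
build_no_proxy() {
    result="$1"
    shift
    for addr in "$@"; do
        case ",$result," in
            *",$addr,"*) ;;                        # already listed
            *) result="${result:+$result,}$addr" ;; # append with a comma
        esac
    done
    echo "$result"
}

# User-configured entry plus two placeholder system addresses.
build_no_proxy "5.6.7.8" 10.0.0.1 10.0.0.2
```
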
Test Plan:

PASS: Successfully deploy an IPv4 AIO-SX and an IPv6 AIO-DX with no
bootstrap overrides.
PASS: In the deployed IPv4 AIO-SX with no bootstrap overrides, apply the
configuration below and verify that the pod "ceph-pools-audit" (executed
every 5 minutes) continues working correctly:
source /etc/platform/openrc
system service-parameter-add docker proxy https_proxy=http://1.2.3.4:3128
system service-parameter-add docker proxy http_proxy=http://1.2.3.4:3128
system service-parameter-add docker proxy no_proxy="5.6.7.8"
system service-parameter-apply docker
PASS: Repeat the test above in the IPv6 AIO-DX with no bootstrap
overrides.
PASS: Successfully deploy an IPv4 AIO-SX and an IPv6 AIO-DX with Docker
proxy bootstrap overrides. Verify that the environment variables for
"docker" and "containerd" services (at
/etc/systemd/system/docker.service.d/http-proxy.conf and
/etc/systemd/system/containerd.service.d/http-proxy.conf) are correct.
Verify that the pod "ceph-pools-audit" (executed every 5 minutes)
continues working correctly.

Partial-Bug: 2062079

Depends-On: https://review.opendev.org/c/starlingx/config/+/916019
Change-Id: I7691fab7c4e2ba813bac1bf71c0ed7d4c4432380
Signed-off-by: Joao Victor Portal <Joao.VictorPortal@windriver.com>
2024-04-19 19:21:10 -03:00
Zuul 04d3283655 Merge "In SX mark floating IPs as deprecated in dual-stack" 2024-04-19 17:12:26 +00:00
Zuul d9e6174439 Merge "Remove firewall extra rule that blocks IPv6 traffic for IPv4 setups" 2024-04-19 16:13:06 +00:00
Zuul c2554a69bd Merge "Fix puppet class to wipe new PV" 2024-04-18 23:14:43 +00:00
Andre Kantek c9c3ad18cf In SX mark floating IPs as deprecated in dual-stack
With the dual-stack feature the system can now have 2 floating IPs per
network. In non-SX systems the floating IPs are managed by SM. In
AIO-SX they are managed via puppet instead, which requires marking the
floating addresses as deprecated.

This change can now process IPv4 and IPv6 addresses present in the
"platform::network::addresses::address_config" variable

Test Plan
[PASS] install AIO-SX and check if floating IPs have the correct
       flags
[PASS] in the installation configure dual-stack and check if floating
       IPs have the correct flags

Story: 2011027
Task: 49888
Depends-On: https://review.opendev.org/c/starlingx/config/+/916282
Change-Id: Ieb886eeb7844b58502bb3939a8b203595570c44c
2024-04-18 09:46:42 -03:00
Zuul 7fb6a7bcb4 Merge "Enabling QAT service" 2024-04-17 18:29:36 +00:00
Md Irshad Sheikh 5d21e08507 Enabling QAT service
This commit supports QAT devices with device ids 4940 & 4942.

The commit adds support for creating the QAT device
configuration files (Eg: 4xxx_dev0,4xxxvf_dev0.conf)
in the /etc directory.

The configuration files are read by qat_service to bring up
the QAT device endpoints and to persist the device status
across reboots.

Also, the vfio-pci will be loaded as part of this commit.

TEST CASES:

PASSED: The development iso should be successfully deployed on the QAT
        hardware. Also should have log "QAT device found.".

After the deployment is complete, validate below test cases.

PASSED: Check "systemctl status qat_service.service"
        Service should be up and running.
PASSED: Check the "systemctl is-enabled qat_service.service".
        Service should be enabled.
PASSED: Check the "/etc/init.d/qat_service status".
        The number of QAT VF endpoints should match the QAT
        supported sriov numvfs, i.e. 16.
PASSED: Check the number of PF and VF config files
        (Eg: 4xxx_dev0,4xxxvf_dev0.conf) in /etc directory. It
        should match the total QAT PFs and number of sriov numvfs.
PASSED: Check "lsmod | grep vfio-pci".
        The vfio-pci driver should be loaded.
PASSED: Reboot the system and check all above test cases. Also pf and
        vf configuration files should not be recreated.
PASSED: The development iso should be successfully deployed on the
        non-QAT hardware. Also should have log "QAT device not found.".

Story: 2010604
Task: 49700

Change-Id: Ia925bfaa890d853b853ad2274e2377221631a6a7
Signed-off-by: Md Irshad Sheikh <mdirshad.sheikh@windriver.com>
2024-04-17 13:36:27 +00:00
Hediberto C Silva 349f4e9799 Fix puppet class to wipe new PV
With the change made in [1], when processing more than one
nova-local, puppet fails with "Duplicate declaration" due
to the exec "vgchange -an nova-local".

To resolve this, the variable $name was added to the exec so
that it becomes dynamic.

Furthermore, the review mentioned above deactivates the VG to
perform the wipe but does not activate it again afterwards. To
resolve this, another exec was added so that the VG is activated
after wiping the new PV.

[1]: https://review.opendev.org/c/starlingx/stx-puppet/+/863871

Test Plan:
SX:  Delete instances fs and add 4 nova-local
     B&R with nova-local instead of instances

DX+: Add 4 nova-local in compute-0
     B&R with nova-local created

STD: Add 4 nova-local in compute-0
     B&R with nova-local created

Closes-Bug: 2061526

Change-Id: I7449c5cd7199541551dccee17e22a8bda48414e1
Signed-off-by: Hediberto C Silva <hediberto.cavalcantedasilva@windriver.com>
Signed-off-by: Erickson Silva de Oliveira <Erickson.SilvadeOliveira@windriver.com>
2024-04-16 14:15:58 +00:00
Andre Kantek 9095c5fe45 Split IP services in IPv4 and IPv6 for dual-stack support
This change splits the IP service for each platform network into ipv4
and ipv6 to support dual-stack. It still supports single-stack (when
there is only ipv4 or ipv6). Each service is instantiated if there is
a configuration for it.

Test Plan:
[PASS] install, lock, unlock and swact for the following setups:
       - AIO-SX (IPv4 and IPv6)
       - AIO-DX (IPv4 and IPv6)
       - Standard (IPv4 and IPv6)
       - DC (SysCtrl=AIO-DX, subcloud=AIO-SX)
[PASS] Add dual-stack configuration and validate services operation
       with lock, unlock and swact:
       - AIO-SX (IPv4 and IPv6)
       - AIO-DX (IPv4 and IPv6)
       - Standard (IPv4 and IPv6)
       - DC (SysCtrl=AIO-DX, subcloud=AIO-SX), using the admin network

Story: 2011027
Task: 49762

Depends-On: https://review.opendev.org/c/starlingx/ha/+/912418

Change-Id: I480c89a59309137c5517db7bd630df7eb2dfa552
Signed-off-by: Andre Kantek <andrefernandozanella.kantek@windriver.com>
2024-04-16 08:55:20 -03:00
Zuul 65ca94a953 Merge "Remove sysinv bootstrap" 2024-04-15 18:32:33 +00:00
Lucas Ratusznei Fonseca c49b369902 Remove firewall extra rule that blocks IPv6 traffic for IPv4 setups
This change removes the extra rule that is added directly to ip6tables
to block IPv6 traffic in IPv4 setups. Instead, the firewall for IPv6
will be permanently enabled in Calico.

Test plan
=========

The tests for https://review.opendev.org/c/starlingx/config/+/915508
also cover this change.

Story: 2011027
Task: 49816
Depends-On: https://review.opendev.org/c/starlingx/config/+/915508
Change-Id: Ia7a8a7e2a12c80e0ec0f99af0417efa9dcd8a7a6
Signed-off-by: Lucas Ratusznei Fonseca <lucas.ratuszneifonseca@windriver.com>
2024-04-11 20:24:20 -03:00
Gleb Aronsky 88f8f06d39 Update kubelet system overrides on unlock
Add logic to the `platform::kubernetes::configuration` method
to generate the kubelet's systemd override file. This
change ensures the file is generated every time a host is
unlocked. This facilitates delivery of systemd service changes
via patches to existing installs.

We only want to update on lock and unlock, so we need to
check that the flag is_initial_k8s_config is not
set before creating the resource in platform::kubernetes::configuration.
This ensures that the file is only regenerated on host unlock and
not during the initial installation, which is currently handled
in platform::kubernetes::master::init.

This change is needed by bug 2027810 to ensure that the
orphan volume cleanup script is executed as part of the systemd
ExecStartPre kubelet service override.

This change is an update of this reverted commit:
https://review.opendev.org/c/starlingx/stx-puppet/+/896154

Test Plan:
PASS - Verify successful installation from an ISO on AIO-SX with
       the controller unlocked.
PASS - Verify successful installation from an ISO on AIO-DX with
       the controllers unlocked.
PASS - Verify successful installation from an ISO on STANDARD with
       the controllers unlocked.
PASS - Verify that kube-stx-override.conf is updated on AIO-SX:
       - Update the kube-stx-override.conf.erb file.
       - Lock/Unlock the AIO-SX host.
       - Verify that kube-stx-override.conf has been updated.
PASS - Verify that kube-stx-override.conf is updated on STANDARD:
       - Update the kube-stx-override.conf.erb file on compute-0.
       - Lock/Unlock compute-0 and verify that kube-stx-override.conf
         is updated.

Partial-Bug: 2027810
Change-Id: Id473fd0e2c807d1e9d1e3fdd707bc3e9e36688b1
Signed-off-by: Gleb Aronsky <gleb.aronsky@windriver.com>
2024-04-11 14:37:41 -07:00
Raphael Lima 46db458e62 Remove sysinv bootstrap
This commit removes the sysinv bootstrap class from Puppet,
following the migration of sysinv bootstrap to Ansible:
https://review.opendev.org/c/starlingx/ansible-playbooks/+/913930.

Test plan:
All of the following items were tested with the addition of the
changes from the above specified commit.
1. PASS: Deploy a DC system with one system controller and two subclouds
and ensure the subclouds can be managed
2. PASS: Deploy an AIO-SX system and verify the host unlocks
3. PASS: Perform bootstrap replay and ensure the host unlocks after
re-execution
4. PASS: Verify the openstack user, role, service and endpoints
   configuration for sysinv after bootstrap for each deployment type
5. PASS: Verify the sysinv.conf and api-paste.ini file for each
deployment type

Depends-On: https://review.opendev.org/c/starlingx/ansible-playbooks/+/913930

Story: 2011035
Task: 49765

Change-Id: Ide37577c6ec580acfd468819428a4f80e21625f8
Signed-off-by: Raphael Lima <Raphael.Lima@windriver.com>
2024-04-09 12:19:30 -03:00
Hugo Brito 01729f91ae Remove Barbican bootstrap class
This commit removes the Barbican bootstrap class from puppet.

Test Plan:
1. PASS: Verify full DC system deployment (System Controller + 3
         Subclouds) install/bootstrap (virtual lab)
2. PASS: Verify Barbican database/user setup
3. PASS: Verify Openstack user, service and endpoint configuration

Depends-on: https://review.opendev.org/c/starlingx/config/+/913587
Depends-on: https://review.opendev.org/c/starlingx/ansible-playbooks/+/913589

Story: 2011035
Task: 49739

Change-Id: I31840be36f69a8dd8cb2592195a407907399779f
Signed-off-by: Hugo Brito <hugo.brito@windriver.com>
2024-03-27 15:08:28 +00:00
Zuul 74af50a98b Merge "Remove FM bootstrap" 2024-03-27 14:35:03 +00:00
Salman Rana 0c3df8e92f Remove FM bootstrap
This commit removes the FM bootstrap class from Puppet, following
the migration of FM bootstrap to Ansible/sysinv configuration:
 - https://review.opendev.org/c/starlingx/ansible-playbooks/+/913251
 - https://review.opendev.org/c/starlingx/config/+/913249

Test Plan:
1. PASS: Verify full DC system deployment (System Controller + 3
         Subclouds) install/bootstrap (virtual lab)
2. PASS: Verify FM database/user setup and fm.conf/api-paste.ini
         file contents
         * See ansible-playbook commit test plan
3. PASS: Verify Openstack user, service and endpoint configuration
         * See config commit test plan

Story: 2011035
Task: 49723

Depends-on: https://review.opendev.org/c/starlingx/ansible-playbooks/+/913251
Depends-on: https://review.opendev.org/c/starlingx/config/+/913249

Change-Id: I5eab7fdcae326af9fb9ac78bc20586218b50d0de
Signed-off-by: Salman Rana <salman.rana@windriver.com>
2024-03-27 09:14:10 -04:00
Zuul 19f82521fc Merge "Remove TopologyManager feature-gate" 2024-03-25 22:28:16 +00:00
Gustavo Pereira 1a9444b42b Remove mtce and dcmanager keystone bootstrap from puppet
Changes related to dcmanager and mtce bootstrap are moved to ansible
and config repos to improve execution time. This commit removes the
related dcmanager and mtce bootstrap classes and tasks from puppet.

Test Plan:
PASS: Deploy a subcloud without this change and
verify its bootstrap execution time. Add a different
subcloud with the proposed changes and bootstrap it.
Verify that the changes made the bootstrap execution
around 80 seconds faster.

PASS: Verify a successful AIO-SX deployment.

PASS: Verify a successful AIO-DX controller deployment.

PASS: Verify a successful DC environment deployment.

PASS: Verify a successful subcloud bootstrap replay.

Depends-on: https://review.opendev.org/c/starlingx/ansible-playbooks/+/912317
Depends-on: https://review.opendev.org/c/starlingx/config/+/912318

Story: 2011035
Task: 49696

Change-Id: Ia0bb798eb5d5a56b8ad578b83561ec2e4c1fcaa4
Signed-off-by: Gustavo Pereira <gustavo.lyrapereira@windriver.com>
2024-03-22 14:41:52 +00:00
Boovan Rajendran ce2ac915a0 Remove TopologyManager feature-gate
This change removes the deprecated feature gate "TopologyManager"
since it is no longer supported in K8s 1.29.

Note: The TopologyManager feature gate has been enabled by default
      since k8s 1.18.

Test Plan:
PASS: Tested by installing ISO as AIO-SX and verified that
      "--feature-gates TopologyManager=true" is not present in
      /usr/bin/kubelet by running the command "ps -ef | grep kubelet"
PASS: Tested by performing k8s upgrade from 1.24.4 to 1.28.4.

Story: 2011047
Task: 49760

Change-Id: I01add38a2ba9a909b55dd4adb826aecae0a1d971
Signed-off-by: Boovan Rajendran <boovan.rajendran@windriver.com>
2024-03-22 02:37:42 -04:00
Fabiano Correa Mercer c1aec0b159 Use FQDN for memcached
After a management network reconfig,
memcached was still using the old
mgmt IP after the unlock.
The new mgmt network is applied by puppet
during system startup; it is executed
by the controller_config script.
Until puppet reconfigures the system,
/etc/hosts is not updated and the hostname
still resolves to the old IP.
memcached is started by init.d and was
using the hostname, so it started before
the /etc/hosts update.
With the FQDN this problem doesn't occur:
memcached starts but only gets the
IP after dnsmasq starts.

Tests done:
IPv4 AIO-SX fresh install
IPv4 AIO-DX fresh install
IPv4 DC with AIO-SX subcloud fresh install
IPv4 AIO-SX mgmt network reconfig
IPv4 DC with AIO-SX subcloud mgmt network reconfig

Story: 2010722
Task: 49743

Change-Id: I810189638275a09127c3c228aeeb3416731d350d
Signed-off-by: Fabiano Correa Mercer <fabiano.correamercer@windriver.com>
2024-03-20 12:16:41 +00:00
Zuul df267e75c0 Merge "Set Kubernetes control-plane upgrade timeout to 210s" 2024-03-19 23:51:43 +00:00
Zuul 46ed4c33ae Merge "Update SM puppet to fix rook-ceph's storage-backend-add" 2024-03-19 22:06:58 +00:00
Caio Correa 59e9fe879b Update SM puppet to fix rook-ceph's storage-backend-add
Update SM's puppet manifest to fix an issue in DX that was causing
an uncontrolled swact. This fix sets the correct order of execution
of the classes called by the SM in a storage-backend-add ceph-rook
command.

Test Plan:
PASS - Deploy built ISO on a DX and run storage-backend-add
PASS - Lock, Unlock and Swact
PASS - Install a rook-ceph cluster

Story: 2011055
Task: 49623

Change-Id: I9b404b1f9e3b81c461fb9a9cd364b624de4c20bd
Signed-off-by: Caio Correa <caio.correa@windriver.com>
2024-03-18 17:19:35 -03:00
Saba Touheed Mujawar 6c15b7a41b Set Kubernetes control-plane upgrade timeout to 210s
This addresses a rare intermittent failure during the control-plane
upgrade step, where either puppet hits its timeout before the upgrade
is completed or kubeadm hits its own Upgrade Manifest timeout (at 5m).

This change sets puppet timeouts slightly larger than the
engineered kubeadm timeout settings. Typical puppet apply times
are less than 90 seconds, though we have seen infrequent outliers
hit the default 5m timeout.

We engineer the timeout for kubeadm-upgrade-apply and
kubeadm-upgrade-node to 210 seconds, based on setting a 3 minute
kubeadm UpgradeManifestTimeout and a 30 second buffer.
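
The arithmetic behind the 210 second value, as a small shell sketch. The
variable names are illustrative only and do not come from the manifests.

```shell
# Illustrative variable names; values are taken from the text above.
upgrade_manifest_timeout_s=$((3 * 60))   # 3 minute kubeadm timeout
buffer_s=30                              # 30 second buffer
puppet_exec_timeout_s=$((upgrade_manifest_timeout_s + buffer_s))

echo "puppet exec timeout: ${puppet_exec_timeout_s}s"   # prints 210s
```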

Note: 'kubeadm-upgrade-apply' and 'kubeadm-upgrade-node' take the
      same amount of time for the control-plane upgrade.

TEST PLAN:
PASS: Perform k8s upgrade and verify puppet does not timeout
      before kubeadm-upgrade-apply and kubeadm-upgrade-node .

Partial-Bug: 2056326

Change-Id: Iec60476c964140f7b717c6d4dcdb266b0229b556
Signed-off-by: Saba Touheed Mujawar <sabatouheed.mujawar@windriver.com>
2024-03-15 03:38:29 -04:00
Zuul d298294244 Merge "Fix LDAP issue for DC subcloud" 2024-03-13 20:18:26 +00:00
Zuul aae3c5ad5e Merge "Revert "Add use_usm parameter to dcorch.conf"" 2024-03-13 14:44:37 +00:00
Zuul 305d6f3832 Merge "Introduce Puppet variables for primary and secondary pool addresses." 2024-03-13 14:30:44 +00:00
Zuul 76c446081c Merge "Set default options for the application framework" 2024-03-13 13:51:13 +00:00
Zuul 1fa8ed8b95 Merge "Refining rule to remove weak ciphers from lighttpd" 2024-03-12 22:05:22 +00:00
Karla Felix 075a39e1a2 Refining rule to remove weak ciphers from lighttpd
This review refines the https ciphers rule for the
lighttpd service on port 8443 to avoid the use of
ciphers considered weak based on the NIST list.
The excluded ciphers are those that use CBC,
CAMELLIA, ARIA or 3DES, and any cipher that
uses SHA1.

The ciphers that will be used by https:
- TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (ecdh_x25519)
- TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (ecdh_x25519)
- TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256 (ecdh_x25519)
- TLS_AES_256_GCM_SHA384 (ecdh_x25519)
- TLS_CHACHA20_POLY1305_SHA256 (ecdh_x25519)
- TLS_AES_128_GCM_SHA256 (ecdh_x25519)
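
The exclusion criteria can be illustrated with a name-based check in
shell. This is a sketch under assumptions: is_excluded_cipher is a
hypothetical helper that matches IANA suite names (where a trailing
"_SHA" denotes SHA1), and it is not the lighttpd cipher rule applied by
this change.

```shell
# Name-based illustration only, not the lighttpd configuration.
# Returns 0 when the suite name falls in an excluded family.
is_excluded_cipher() {
    case "$1" in
        *CBC*|*CAMELLIA*|*ARIA*|*3DES*|*_SHA) return 0 ;;
        *) return 1 ;;
    esac
}

for suite in \
    TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 \
    TLS_CHACHA20_POLY1305_SHA256 \
    TLS_RSA_WITH_AES_128_CBC_SHA \
    TLS_RSA_WITH_3DES_EDE_CBC_SHA
do
    if is_excluded_cipher "$suite"; then
        echo "excluded: $suite"
    else
        echo "kept:     $suite"
    fi
done
```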

Test Plan:
PASS: Run build-pkgs -c -p puppet-manifests.
PASS: Enable https and run nmap to verify if only the
      listed ciphers are returned.
PASS: Run build-image.
PASS: Run bootstrap playbook.
PASS: Unlock controller-0.
PASS: Enable https and access horizon via browser
      using https.
PASS: Disable https and access horizon via browser
      using http.

Closes-Bug: 2054813

Change-Id: Ib21eb1155540f820a77ee7f7b9203663038ab69b
Signed-off-by: Karla Felix <karla.karolinenogueirafelix@windriver.com>
2024-03-12 10:39:33 -03:00
Zuul 63596262b5 Merge "Do not use FQDN during upgrade" 2024-03-08 19:56:54 +00:00
Steven Webster ff0782df39 Fix LDAP issue for DC subcloud
This commit fixes an LDAP authentication issue seen on worker nodes
of a subcloud after a rehoming procedure was performed.

Currently, the system uses an SNAT rule to allow worker/storage nodes
to authenticate with the system controller when the admin network is
in use.  This is because the admin network only exists between
controller nodes of a distributed cloud.  The SNAT rule is needed to
allow traffic from the (private) management network of the subcloud
over the admin network to the system controller and back again.
If the admin network is _not_ being used, worker/storage nodes of
the subcloud can authenticate with the system controller, but routes
must be installed on the worker/storage nodes to facilitate this.
It becomes tricky to manage in certain circumstances of rehoming.
This traffic really should be treated in the same way as that of the
admin network.

This commit addresses the above by generalizing the current admin
network nat implementation to handle the management network as well.

Test Plan:

IPv4, IPv6 distributed clouds

1. Rehome a subcloud to another system controller and back again
   (mgmt network)
2. Update the subcloud to use the admin network (mgmt -> admin)
3. Rehome the subcloud to another system controller and back again
   (admin network)
4. Update the subcloud to use the mgmt network (admin -> mgmt)

After each of the numbered steps, the following were performed:

a. Ensure the system controller could become managed, online, in-sync
b. Ensure the iptables SNAT rules were installed or updated
   appropriately on the subcloud controller nodes.
c. Log into a worker node of the subcloud and ensure sudo commands
   could be issued without LDAP timeout.

In general, tcpdump was also used to ensure the SNAT translation was
actually happening.

Closes-Bug: #2056560
Depends-On: https://review.opendev.org/c/starlingx/config/+/912261

Change-Id: If583b8eec7a385fb9b38e3ff80d58f5d842fe944
Signed-off-by: Steven Webster <steven.webster@windriver.com>
2024-03-08 09:27:18 -05:00
Andre Kantek 90a231f2bd Introduce Puppet variables for primary and secondary pool addresses.
This change creates classes to receive the network addresses per
family as they will be generated by sysinv based on the pools
associated with the network. The current classes will be filled with
the primary pool addresses, to be used by manifests that don't have
a preferred protocol address family.

Test Plan:

[PASS] AIO-SX, Standard installation (IPv4 and IPv6)
       - using the dependency change the secondary pool was introduced
       - system was locked/unlocked and no puppet manifest errors
          were detected
       - inspection of system.yaml and controller-0.yaml to verify
         variables content
       - no alarms or disabled services were found
[PASS] For standard systems during upgrade, simulate node unlock by:
       - Clearing the "network_addresspools" table after Ansible
         execution and before DM configuration.
       - Installing remaining nodes with the table empty. This mimics
         the post-upgrade scenario.

Story: 2011027
Task: 49680

Depends-On: https://review.opendev.org/c/starlingx/config/+/911114

Change-Id: I1520e620fc51d339ba80efd2c43e10bb715f78c5
Signed-off-by: Andre Kantek <andrefernandozanella.kantek@windriver.com>
2024-03-08 07:13:29 -03:00
Igor Soares 8e0d873674 Set default options for the application framework
Set the 'missing_auto_update' and 'fluxcd_hr_reconcile_check_delay'
options to 'True' and '60' (seconds), respectively.

The 'missing_auto_update' option sets the default behavior for
automatically updating apps if not specified in the application
metadata.

The 'fluxcd_hr_reconcile_check_delay' option sets the default delay that
the application framework should wait before checking the FluxCD
reconciliation result.

Test Plan:
PASS: AIO-SX fresh install
PASS: Build platform-integ-apps without the 'upgrades' metadata section.
      Confirm that the default 'missing_auto_update' value was correctly
      inserted into 'auto_update' field of the 'kube_app_bundle'
      database table.
      Confirm that the default reconciliation delay was respected.
PASS: Change the 'missing_auto_update' value to False and the
      'fluxcd_hr_reconcile_check_delay' value to 30 in sysinv.conf.
      Add a new version of platform-integ-apps without the 'upgrades'
      metadata section to /usr/local/share/applications/helm/.
      Restart sysinv-conductor.
      Confirm that the 'auto_update' column for the new version is
      False.
      Confirm that the application framework will not update to the new
      version.
      Apply platform-integ-apps and check if the new reconciliation
      delay was respected.

Depends-on: https://review.opendev.org/c/starlingx/config/+/911383

Story: 2010929
Task: 49671

Change-Id: Ia30c1ca7fa78cd3980bbb6e7352ed529eb6c58dd
Signed-off-by: Igor Soares <Igor.PiresSoares@windriver.com>
2024-03-07 15:25:45 -03:00
Zuul 33ae0f5c72 Merge "Distribute slow REQs to 2nd USM service backend" 2024-03-06 22:08:19 +00:00