The DC OCF scripts were not updated during the switch to Debian
in StarlingX 8.0. As a result, orphan processes could be left
behind after a service restart or controller swact. The orphan processes
consume resources and perform duplicate/obsolete tasks (e.g.
auditing the same subclouds as the corresponding worker processes)
until their work queues are empty.
This commit fixes the pgrep option to restore the functionality
of the OCF script's confirm_stop function. Processes that
fail to terminate will be killed.
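The stop-and-confirm behavior described above can be sketched as follows. OCF resource agents are normally shell scripts, so this Python version only illustrates the logic and is not the actual script. It assumes the service's processes are found by matching their full command line (`pgrep -f`); which pgrep option the fix actually uses is not stated here. `confirm_stop` and `pgrep_full` are hypothetical names, and the timeout values are arbitrary.

```python
import signal
import subprocess
import time
from typing import Callable, List


def pgrep_full(pattern: str) -> List[int]:
    """Return PIDs whose full command line matches `pattern` (pgrep -f)."""
    # pgrep exits with status 1 when nothing matches; stdout is then empty.
    result = subprocess.run(
        ["pgrep", "-f", pattern], capture_output=True, text=True, check=False
    )
    return [int(pid) for pid in result.stdout.split()]


def confirm_stop(
    find_pids: Callable[[], List[int]],
    send_signal: Callable[[int, int], None],
    timeout: float = 10.0,
    interval: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> List[int]:
    """Poll until no matching process remains; SIGKILL any leftovers.

    Returns the PIDs that had to be killed (empty if all exited in time).
    """
    attempts = max(1, int(timeout / interval))
    for _ in range(attempts):
        if not find_pids():
            return []  # every process exited on its own
        sleep(interval)
    # Timed out: force-kill whatever is still running so no orphan survives.
    leftovers = find_pids()
    for pid in leftovers:
        send_signal(pid, signal.SIGKILL)
    return leftovers
```

On a real host this might be called as `confirm_stop(lambda: pgrep_full("dcmanager-audit-worker"), os.kill)`; injecting `find_pids`, `send_signal` and `sleep` keeps the polling logic testable without touching live processes.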
Test Plan:
- Deploy a small DC system. Verify that all DC services can
be started, stopped and restarted by SM.
- Deploy a large DC system with many subclouds. Reduce the
thread_pool_size of dcmanager-audit-worker. Let the system
soak for a couple of hours. Restart the service in the
middle of the audit cycle. Verify that the dcmanager-audit-worker
service was successfully restarted and that there are no orphan
processes.
Closes-Bug: 2064368
Change-Id: Ie5cbc89cde374e32d4e0a3799a9f8833c071d206
Signed-off-by: Tee Ngo <tee.ngo@windriver.com>
The sysinv API call for certificate installation with type
openldap_ca will extract the CA data included in the certificate bundle
and include it in the 'system-local-ca' ca.crt field.
Modified dcmanager to perform the call using this structure, passing
a bundle with TLS cert + CA cert + TLS key from the 'system-local-ca'
in the SystemController.
This code is called during DX subcloud upgrade and is used to keep
the current 'system-local-ca' on the subcloud consistent with the
one in the SystemController.
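A minimal sketch of assembling that bundle, assuming the 'system-local-ca' secret follows the usual Kubernetes TLS layout with base64-encoded `tls.crt`, `ca.crt` and `tls.key` fields. The function names are hypothetical, and how dcmanager actually reads the secret and submits the bundle to sysinv is not shown.

```python
import base64
from typing import Dict


def decode_secret_field(secret_data: Dict[str, str], key: str) -> str:
    """Decode one base64-encoded field of a Kubernetes secret's data map."""
    return base64.b64decode(secret_data[key]).decode("utf-8")


def build_openldap_ca_bundle(secret_data: Dict[str, str]) -> str:
    """Concatenate TLS cert + CA cert + TLS key, in that order, as PEM text."""
    pieces = [decode_secret_field(secret_data, key)
              for key in ("tls.crt", "ca.crt", "tls.key")]
    # Make sure each PEM block ends with a newline so blocks don't run together.
    return "".join(p if p.endswith("\n") else p + "\n" for p in pieces)
```

The order matters because the text above describes the bundle as TLS cert, then CA cert, then TLS key.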
Test plan:
PASS: In a DC w/ DX subcloud in stx 9:
- Perform cert-manager migration.
- Upgrade the SystemController.
- Verify system-local-ca secret content in the
SystemController and the subcloud.
- Start an orchestrated upgrade for the DX subcloud.
- Verify dcmanager/state.log. After the step
"Stage: 2, State: transferring CA certificate"
verify the system-local-ca secret content in the subcloud.
The secret should have been replaced to match the one in
the SystemController.
- While c1 is upgrading, verify OpenLDAP by creating a user
in SystemController with 'ldapusersetup' and log into it
in the subcloud.
Story: 2009811
Task: 49044
Change-Id: I42e6308f066126f903738f4e3c319c6027c8cb0b
Signed-off-by: Marcelo Loebens <Marcelo.DeCastroLoebens@windriver.com>
StarlingX stopped supporting CentOS builds after release 7.0.
This update strips CentOS from our code base. It also removes
references to the failed OpenSUSE feature.
Story: 2011110
Task: 49949
Change-Id: If8c5d8d04e0a5ae766239912886f93332614fa4e
Signed-off-by: Scott Little <scott.little@windriver.com>
Improves unit test coverage for dcmanager's subclouds API
from 55% to 90%.
Test plan:
All of the tests were created taking into account the
output of 'tox -c tox.ini -e cover' command
Story: 2007082
Task: 49725
Change-Id: If31005a8f3420e94dd17f15bdac97af5253e8d5a
Signed-off-by: Raphael Lima <Raphael.Lima@windriver.com>
Refactor the subcloud_get_all_with_status function to
query only the endpoint_type and sync_status from the
subcloud_status table for improved efficiency.
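The idea of reading only the needed columns can be illustrated with the standard-library `sqlite3` module instead of the project's ORM. Apart from `subcloud_status`, `endpoint_type` and `sync_status`, the table and column names (including the join key `subcloud_id`) are assumptions made for this sketch.

```python
import sqlite3
from typing import List, Tuple


def subcloud_get_all_with_status(conn: sqlite3.Connection) -> List[Tuple]:
    """Return (subcloud id, name, endpoint_type, sync_status) rows.

    Only the two status columns callers need are read from
    subcloud_status, instead of every column of every status row.
    """
    query = (
        "SELECT s.id, s.name, st.endpoint_type, st.sync_status "
        "FROM subclouds AS s "
        "JOIN subcloud_status AS st ON st.subcloud_id = s.id "
        "ORDER BY s.id, st.endpoint_type"
    )
    return conn.execute(query).fetchall()
```

Narrowing the projection reduces the data fetched per row, which matters when a system has many subclouds, each with one status row per endpoint type.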
Test Plan:
PASS: Get all subclouds with the right sync_status
PASS: Create a dcmanager strategy
- Fail if sync_status = Unknown
Story: 2011106
Task: 49893
Change-Id: Ie99abc6cb820800a632f1fd90ee7d7e0869a8312
Signed-off-by: Hugo Brito <hugo.brito@windriver.com>
This commit includes new unit tests for subcloud_manager.py,
covering new test cases in deploy, add, delete, update, compose,
backup and restore, redeploy, prestage and migrate
operations.
Test plan:
1) PASS: Run tox py39, pylint and pep8 envs and
verify that they are all passing.
2) PASS: Check 'tox -e cover' command output.
Coverage increased from 70% to 79%
Depends-On: https://review.opendev.org/c/starlingx/distcloud/+/914074
Story: 2007082
Task: 49618
Change-Id: Ibfd30fc616c5c756ad73f3a33432411d7d189812
Signed-off-by: Swapna Gorre <swapna.gorre@windriver.com>
This commit updates the peer group association sync status to
'out-of-sync' after the user updates the
peer-controller-gateway-address attribute of the system-peer object.
This commit also modifies the subcloud update function to update the
subcloud route whenever the systemcontroller_gateway_address is
updated on the primary side and synced to the secondary.
It also adds an informative message to remind the caller to run the
sync command after updating the peer-controller-gateway-address.
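The two behaviors above can be sketched as follows. The data classes, function names, route callbacks and reminder text are all hypothetical; they only illustrate marking every association of the updated peer out-of-sync and replacing the subcloud route when the gateway address changes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

OUT_OF_SYNC = "out-of-sync"


@dataclass
class PeerGroupAssociation:
    id: int
    sync_status: str = "in-sync"


@dataclass
class SystemPeer:
    peer_controller_gateway_address: str
    associations: List[PeerGroupAssociation] = field(default_factory=list)


def update_system_peer(peer: SystemPeer, updates: Dict[str, str]) -> Optional[str]:
    """Apply updates; return a reminder message if the gateway changed."""
    new_gw = updates.get("peer_controller_gateway_address")
    changed = new_gw is not None and new_gw != peer.peer_controller_gateway_address
    for key, value in updates.items():
        setattr(peer, key, value)
    if not changed:
        return None
    # Secondary subclouds still use the old address, so every
    # association of this peer now needs a sync.
    for assoc in peer.associations:
        assoc.sync_status = OUT_OF_SYNC
    return ("peer-controller-gateway-address updated; run the peer group "
            "association sync to propagate it to the peer site.")


def update_subcloud_route(
    old_gw: str,
    new_gw: str,
    delete_route: Callable[[str], None],
    add_route: Callable[[str], None],
) -> bool:
    """Replace the route via old_gw with one via new_gw; True if changed."""
    if old_gw == new_gw:
        return False
    delete_route(old_gw)
    add_route(new_gw)
    return True
```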
Test Plan:
1. PASS: Do the following steps:
- Create a system peer with an incorrect systemcontroller
gateway address that's inside the management subnet, but
outside the reserved IP range, and then create an association.
Verify that the secondary subcloud and a route were created
using the incorrect IP.
- Update the system peer with the correct systemcontroller
gateway address on the primary site. Verify that the PGA
sync status is set to 'out-of-sync' on both sites.
- Sync the PGA and verify that the secondary subcloud
systemcontroller gateway address was updated and that the
old route was deleted and a new one using the new address
was created.
- Migrate the SPG to the non-primary site and verify that
it completes successfully and that the subcloud becomes
online and managed.
2. PASS: Repeat the first step of test case #1, but use an incorrect
address that's outside the management subnet. Then create
a PGA and verify that it fails due to the following
validation:
"systemcontroller_gateway_address invalid: Address must be in
subnet <management subnet>"
3. PASS: Repeat the first step of test case #1, but use an incorrect
address that's inside the reserved IP range. Then create
a PGA and verify that it fails due to the following
validation:
"systemcontroller_gateway_address invalid, is within
management pool <ip range>"
4. PASS: Create a system peer with a correct systemcontroller gateway
address for the first time and then create an association.
Verify that the secondary subcloud and a route were created
using the correct IP.
5. PASS: Update an attribute of the subcloud (e.g. the subcloud
description) on the primary site and verify that the sync
status changes to 'out-of-sync' on both sites, then run
the PGA sync operation and verify that the attribute was
synced to the secondary subcloud on the peer site.
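The two validations exercised by test cases 2 and 3 can be sketched with Python's standard `ipaddress` module. The function name and its parameters (the management subnet plus the start and end of the reserved pool) are assumptions; the error strings mirror the messages quoted above, with the pool rendered as `start-end` for illustration.

```python
import ipaddress


def validate_systemcontroller_gateway_address(
    address: str, mgmt_subnet: str, pool_start: str, pool_end: str
):
    """Accept an address inside the subnet but outside the reserved pool."""
    ip = ipaddress.ip_address(address)
    subnet = ipaddress.ip_network(mgmt_subnet, strict=False)
    # Membership is False for a different IP family, so that fails here too.
    if ip not in subnet:
        raise ValueError(
            "systemcontroller_gateway_address invalid: "
            f"Address must be in subnet {subnet}"
        )
    start = ipaddress.ip_address(pool_start)
    end = ipaddress.ip_address(pool_end)
    if start <= ip <= end:
        raise ValueError(
            "systemcontroller_gateway_address invalid, is within "
            f"management pool {start}-{end}"
        )
    return ip
```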
Closes-Bug: 2062372
Change-Id: Ibffe6c86656a56a85d10deca54c161bbed7f0d17
Signed-off-by: Gustavo Herzmann <gustavo.herzmann@windriver.com>
Improves unit test coverage for dcmanager's system_peers
API from 68% to 98%.
Test plan:
All of the tests were created taking into account the
output of 'tox -c tox.ini -e cover' command
Story: 2007082
Task: 49682
Change-Id: Iec150e1df79c48e1afddebe612dbd27c3e742685
Signed-off-by: rlima <Raphael.Lima@windriver.com>
This commit includes new unit tests for subcloud_manager.py,
covering new test cases in deploy, add, delete, update, compose,
backup and restore, redeploy, prestage and migrate
operations.
Test plan:
1) PASS: Run tox py39, pylint and pep8 envs and
verify that they are all passing.
2) PASS: Check 'tox -e cover' command output.
Coverage increased from 61% to 70%
Depends-On: https://review.opendev.org/c/starlingx/distcloud/+/913989
Story: 2007082
Task: 49618
Change-Id: I05604a4940eac62b311dd8476498965a0f021be0
Signed-off-by: Swapna Gorre <swapna.gorre@windriver.com>
Improves unit test coverage for dcmanager's sw_update_options
API from 19% to 100%.
Test plan:
All of the tests were created taking into account the
output of 'tox -c tox.ini -e cover' command
Story: 2007082
Task: 49667
Change-Id: I626bdc6436b47e08f4fc51526e6d8c21ebd19e09
Signed-off-by: rlima <Raphael.Lima@windriver.com>
Improves unit test coverage for dcmanager's sw_update_strategy
API from 76% to 98%.
Test plan:
All of the tests were created taking into account the
output of 'tox -c tox.ini -e cover' command
Story: 2007082
Task: 49652
Change-Id: I1d403abcaf503cccfed2b8242703f29fbb26844f
Signed-off-by: rlima <Raphael.Lima@windriver.com>
Improves unit test coverage for dcmanager's subcloud_group
API from 77% to 98%.
Test plan:
All of the tests were created taking into account the
output of 'tox -c tox.ini -e cover' command
Story: 2007082
Task: 49641
Change-Id: I4f8a87379c770409590ed84b0d2cd50c06110199
Signed-off-by: rlima <Raphael.Lima@windriver.com>
Improves unit test coverage for dcmanager's phased_subcloud_deploy
API from 81% to 99%.
Test plan:
All of the tests were created taking into account the
output of 'tox -c tox.ini -e cover' command
Story: 2007082
Task: 49319
Change-Id: Ie649e35dd17796e1cd4e7e7b452b32c7c2b5cd9f
Signed-off-by: rlima <Raphael.Lima@windriver.com>
Improves unit test coverage for dcmanager's subcloud_backup
API from 81% to 98%.
Test plan:
All of the tests were created taking into account the
output of 'tox -c tox.ini -e cover' command
Story: 2007082
Task: 49581
Change-Id: I7cb6b661dd75a9d92f31113ff7ba5a887e489abc
Signed-off-by: rlima <Raphael.Lima@windriver.com>
Improves unit test coverage for dcmanager's subcloud_deploy
API from 85% to 99%.
Test plan:
All of the tests were created taking into account the
output of 'tox -c tox.ini -e cover' command
Story: 2007082
Task: 49578
Change-Id: Ie8aa1feb9f5f1f7e9d73b76112d2f2a6db528e6a
Signed-off-by: rlima <Raphael.Lima@windriver.com>
Improves unit test coverage for dcmanager's subcloud_peer_group API
from 50% to 99%.
Test plan:
All of the tests were created taking into account the
output of 'tox -c tox.ini -e cover' command
Story: 2007082
Task: 49320
Change-Id: If327a7bbef984fa37d26e27e1d4994a09a97dce8
Signed-off-by: rlima <Raphael.Lima@windriver.com>
Improves unit test coverage for dcmanager's peer_group_association API
from 65% to 99%.
Test plan:
All of the tests were created taking into account the
output of 'tox -c tox.ini -e cover' command
Story: 2007082
Task: 49318
Change-Id: I547a3bebeb3b88dab11800c51ee382b210f38830
Signed-off-by: rlima <Raphael.Lima@windriver.com>
Classes are created for logical structuring and
are referenced through their respective objects.
Test plan:
1) PASS: Run tox py39, pylint and pep8 envs and
verify that they are all passing.
2) PASS: Check 'tox -e cover' command output.
Depends-On: https://review.opendev.org/c/starlingx/distcloud/+/913988
Story: 2007082
Task: 49618
Change-Id: I60fde3af0c34bff0334fc6578a7d7652d36ecb70
Signed-off-by: Swapna Gorre <swapna.gorre@windriver.com>
Leverage mock methods from base.py
Test plan:
1) PASS: Run tox py39, pylint and pep8 envs and
verify that they are all passing.
2) PASS: Check 'tox -e cover' command output.
Story: 2007082
Task: 49618
Change-Id: If86b586933930749c7a0a7776fa397b6a7587ce0
Signed-off-by: Swapna Gorre <swapna.gorre@windriver.com>
Improves unit test coverage for dcmanager's orchestrator/states/software
functionality from 77% to 100%.
Test plan:
All of the tests were created taking into account the
output of 'tox -c tox.ini -e cover' command
Story: 2010676
Task: 49313
Change-Id: Id3de824406df4d4c6c6504aa07e95dc457e7c2df
Signed-off-by: rlima <Raphael.Lima@windriver.com>
Improves unit test coverage for dcmanager's orchestrator/states/software/cache
functionality from 63% to 99%.
Test plan:
All of the tests were created taking into account the
output of 'tox -c tox.ini -e cover' command
Story: 2010676
Task: 49314
Change-Id: I7cb03f657489e8b1523472f4c63de173c936de95
Signed-off-by: rlima <Raphael.Lima@windriver.com>
Improves unit test coverage for dcmanager's orchestrator/states/prestage
functionality from 78% to 99%.
Test plan:
All of the tests were created taking into account the
output of 'tox -c tox.ini -e cover' command
Story: 2007082
Task: 49310
Change-Id: If041d9aede8f93b99b8cc1001f99088c1a1c77be
Signed-off-by: rlima <Raphael.Lima@windriver.com>
Improves unit test coverage for dcmanager's orchestrator/states/firmware
functionality from 70% to 100%.
Test plan:
All of the tests were created taking into account the
output of 'tox -c tox.ini -e cover' command
Story: 2007082
Task: 49312
Change-Id: I00895fb7616d0c3eeb54f623a8a1ccd223169bd9
Signed-off-by: rlima <Raphael.Lima@windriver.com>
Improves UT coverage for dcmanager's manager/service.py
from 72% to 92%
Test plan:
1) PASS: Run tox py39, pylint and pep8 envs and
verify that they are all passing.
2) PASS: 'tox -e cover' command output is 92%
Story: 2007082
Task: 49605
Change-Id: I72a361bf89d3df32f71e78e16e107b183974c53e
Signed-off-by: Swapna Gorre <swapna.gorre@windriver.com>
If the subcloud rehome_data contains an incorrect bootstrap-address in
site A and the user migrates the corresponding peer group to site B,
the migration fails. The subcloud then has the 'rehome-failed'
deploy-status in site B and the 'rehome-pending' deploy-status in site A.
The user is then unable to update the bootstrap-address in either
site due to the following restrictions:
a) Primary site (site A) is not the current leader of the peer group;
b) Update in non-primary site (site B) is not allowed.
To fix this issue, the following changes are made:
1. In the non-primary site, if the subcloud deploy-status is
rehome-failed and the primary site is unavailable, updating
the bootstrap-values and bootstrap-address will be allowed, and the PGA
will be marked as out-of-sync.
2. Modify the audit to automatically sync the rehome_data from the non-primary
site to the primary site if the subcloud in the non-primary site is managed and
online and the PGA is out-of-sync.
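The new rules in the non-primary site and in the audit can be sketched as two small predicates. The status strings come from the text above; the underscored field names, function names and exception type are hypothetical.

```python
from typing import Iterable

REHOME_FAILED = "rehome-failed"
OUT_OF_SYNC = "out-of-sync"
# Only these fields may change in the non-primary site (hypothetical keys).
ALLOWED_WHEN_PRIMARY_DOWN = {"bootstrap_values", "bootstrap_address"}


def check_non_primary_update(
    deploy_status: str, primary_available: bool, fields: Iterable[str]
) -> str:
    """Validate an update in the non-primary site.

    Returns the new PGA sync status, or raises PermissionError.
    """
    if deploy_status != REHOME_FAILED or primary_available:
        raise PermissionError("Update in non-primary site is not allowed")
    disallowed = set(fields) - ALLOWED_WHEN_PRIMARY_DOWN
    if disallowed:
        raise PermissionError(f"Fields not updatable here: {sorted(disallowed)}")
    # The primary site must later receive the corrected rehome_data.
    return OUT_OF_SYNC


def should_sync_rehome_data_to_primary(
    management_state: str, availability_status: str, pga_sync_status: str
) -> bool:
    """Audit condition for pushing rehome_data back to the primary site."""
    return (
        management_state == "managed"
        and availability_status == "online"
        and pga_sync_status == OUT_OF_SYNC
    )
```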
Additional fix for the system_leader_id issue: when migrating an SPG from
one site to another, if all of the subclouds fail to rehome, the leader id
of the SPG in the target site has already been updated to the target
site's UUID, but the leader id in the source site is not updated
to the target UUID. The fix ensures that once the migration completes,
the leader id in both sites is updated to the target UUID, regardless of
whether the migration succeeded.
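A sketch of the leader-id rule, assuming each site's SPG is represented as a dictionary with a `system_leader_id` key and that per-subcloud rehome results are reported as a name-to-success mapping; both representations and the function name are hypothetical.

```python
from typing import Dict, List


def finalize_spg_migration(
    target_uuid: str,
    source_spg: Dict[str, str],
    target_spg: Dict[str, str],
    rehome_results: Dict[str, bool],
) -> List[str]:
    """Run once the migration has completed, successful or not.

    Both sites record the target as leader; the failed subclouds are
    returned so that they can be reported separately.
    """
    source_spg["system_leader_id"] = target_uuid
    target_spg["system_leader_id"] = target_uuid
    return sorted(name for name, ok in rehome_results.items() if not ok)
```

The key point is that the leader update is not gated on any subcloud succeeding, so the two sites cannot disagree about the leader after a fully failed migration.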
Test plan:
Pre-Steps: 1. Create the system peer from Site A to Site B
2. Create the system peer from Site B to Site A
3. Create the subcloud peer group in the Site A
4. Add a subcloud with an incorrect bootstrap-address
to the peer group
5. Create peer group association to associate system peer
and subcloud peer group - Site A
6. Check current sync status in sites A and B. Verify
they are 'in-sync'.
7. Run migration for the subcloud peer group from Site B.
8. Verify 'rehome-failed' deploy-status in both sites.
PASS: Verify that the bootstrap-address can be updated in site B when
site A is down, and the PGA sync status is set to out-of-sync
in site B. Also, verify that the audit will sync the rehome_data
to site A and change the PGA back to in-sync once the reattempted
migration is successful and site A is up.
PASS: Verify that the bootstrap-values and bootstrap-address are
the only fields that can be updated in site B when site A is down.
PASS: Verify that the update of bootstrap-address was rejected in site B
when site A is up.
PASS: Verify that even if all of the subclouds in an SPG experience
rehome failures, the system_leader_id in both sites is updated to
the target's UUID.
PASS: Verify that when site A stays online or recovers during
the migration to site B, the subcloud deploy_status in both sites
is "rehome-failed" after the migration completes. In this
scenario, site A can migrate the subcloud back, but the migration
still fails. After the bootstrap-address is corrected in
site A, the reattempted migration from site A succeeds.
Closes-Bug: 2057981
Change-Id: I999dbf035e29950fd823e9cdb087160ce40fd4ca
Signed-off-by: lzhu1 <li.zhu@windriver.com>
This commit changes the bootstrap address parameter from
"bootstrap-address" to "bootstrap_address" during the subcloud update
call made during the PGA sync operation. This fixes the issue where
the bootstrap_address was not being updated on the peer site, as the
subcloud update API expects the "bootstrap_address" parameter, with
an underscore.
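The mismatch is easy to reproduce in isolation: the rehome payload uses the CLI-style hyphenated key while the update API reads the underscored one. The helper below is a hypothetical illustration of translating the key before the call, not the code from the change.

```python
from typing import Any, Dict


def to_update_payload(rehome_payload: Dict[str, Any]) -> Dict[str, Any]:
    """Copy the bootstrap address using the key the update API expects."""
    payload: Dict[str, Any] = {}
    if "bootstrap-address" in rehome_payload:
        # The API looks up "bootstrap_address" (underscore); a hyphenated
        # key would not be read, so the address would never be updated.
        payload["bootstrap_address"] = rehome_payload["bootstrap-address"]
    return payload
```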
Test Plan:
1. PASS - Create a peer group association and let it do the initial
sync. Modify the bootstrap-address of the subcloud using
the subcloud update command and then run the PGA sync
command. Verify that the rehome_data of the secondary
subcloud was updated with the new address.
Closes-Bug: 2057973
Change-Id: Ib5786a56c90f771b940e740bc095ebc8168d2830
Signed-off-by: Gustavo Herzmann <gustavo.herzmann@windriver.com>
The current message prints error_msg.keys() directly, which produces
a dict_keys object. This commit fixes the SPA sync_message by casting it
to a list.
This commit also fixes a log message where the subcloud name and peer
name order was inverted.
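The difference is easy to see in isolation. `error_msg` here is a made-up dictionary mapping subcloud names to errors, and the message wording is illustrative only.

```python
error_msg = {"subcloud1": "rehome-failed", "subcloud2": "rehome-failed"}

# Before: formatting the view object leaks its type name.
before = f"Failed subclouds: {error_msg.keys()}"
# After: casting to a list prints just the names.
after = f"Failed subclouds: {list(error_msg.keys())}"

print(before)  # Failed subclouds: dict_keys(['subcloud1', 'subcloud2'])
print(after)   # Failed subclouds: ['subcloud1', 'subcloud2']
```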
Test Plan:
1. PASS - Try to sync a PGA that has the rehome-failed state in the
non-primary site, causing the sync to fail. Verify that the
sync_message prints the subcloud list without including the
'dict_keys' string;
2. PASS - Introduce an error in the _delete_subcloud() function
and then try to delete the peer group association. Verify
that the sync_message prints the subcloud list without
including the 'dict_keys' string;
3. PASS - During SPG migration, verify that the modified log message
prints the subcloud name and peer site name in the correct
order.
Closes-Bug: 2057934
Change-Id: Idfdc2cc1731a51c6098a06863b2469c3085aa813
Signed-off-by: Gustavo Herzmann <gustavo.herzmann@windriver.com>
This commit adds a new rehome_data semantic check when attempting to
update which peer group a subcloud is part of. If rehome_data
is not already present, the request payload must contain both the
bootstrap-values and bootstrap-address; otherwise, the request will
be aborted.
Additionally, this commit updates the rehome_data during the subcloud
rename operation, guaranteeing that the name is up to date.
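A sketch of both changes, assuming rehome_data is stored as a JSON string whose subcloud name lives under a `saved_payload` key; that layout, the underscored payload keys and the function names are assumptions for illustration.

```python
import json
from typing import Any, Dict, Optional

REQUIRED_WITHOUT_REHOME_DATA = ("bootstrap_values", "bootstrap_address")


def check_peer_group_update(rehome_data: Optional[str], payload: Dict[str, Any]) -> None:
    """Abort a peer group change if rehome_data cannot be built."""
    if rehome_data:
        return  # existing rehome_data is reused as-is
    missing = [key for key in REQUIRED_WITHOUT_REHOME_DATA if not payload.get(key)]
    if missing:
        raise ValueError(
            "Subcloud has no rehome_data; the request must include both "
            f"bootstrap values and address (missing: {', '.join(missing)})"
        )


def rename_rehome_data(rehome_data: Optional[str], new_name: str) -> Optional[str]:
    """Return rehome_data with the stored name replaced by new_name."""
    if not rehome_data:
        return rehome_data  # nothing to update; the rename still proceeds
    data = json.loads(rehome_data)
    data.setdefault("saved_payload", {})["name"] = new_name
    return json.dumps(data)
```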
Test Plan:
1. PASS - Attempt to add a subcloud with no rehome_data to a peer
group under the following conditions and verify that it
fails:
- Without passing bootstrap-address and bootstrap-values
- Passing only the bootstrap-address
- Passing only the bootstrap-values
2. PASS - Add a subcloud with rehome_data to a peer group and verify
that the operation succeeds regardless of the presence of
bootstrap-address and bootstrap-values.
3. PASS - Rename a subcloud with rehome_data and verify that the
rehome_data name field is updated to the new name.
4. PASS - Rename a subcloud without rehome_data and verify that the
rename operation still works.
5. PASS - Migrate a renamed subcloud back and forth and verify that
the migration completes successfully.
Closes-Bug: 2055883
Closes-Bug: 2056796
Change-Id: I4403dc50062db07a0de24e04139e3af8087c546f
Signed-off-by: Gustavo Herzmann <gustavo.herzmann@windriver.com>
Log subcloud health output for quick diagnosis. With this change,
the user no longer needs to log into the subcloud to check the
health output. Sometimes, the health condition is resolved by
the time the user runs the system health-query command in the
subcloud. As a result, more time would be required to determine
what caused the failed health check in the first place.
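A minimal sketch of logging the health output before rejecting the request. The logger name, the function name and how the health result is obtained are assumptions; the test plan only states that the output should appear in dcmanager-api.log.

```python
import logging

LOG = logging.getLogger("dcmanager.api.sketch")


def health_allows_backup(subcloud_name: str, healthy: bool, health_output: str) -> bool:
    """Return True if backup may proceed; log the full output otherwise."""
    if healthy:
        return True
    # Capture the output now; the condition may clear before anyone
    # logs into the subcloud to run 'system health-query'.
    LOG.error("Subcloud %s failed health check:\n%s", subcloud_name, health_output)
    return False
```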
Test Plan:
- Verify successful subcloud backup with the change.
- Induce a health condition (e.g. a management-affecting
alarm). Verify that the subcloud backup request is rejected
and that the subcloud health output is captured in
dcmanager-api.log
Closes-Bug: 2056721
Change-Id: I32fea354f9cf594ea45d412359a9090e7b1bfb83
Signed-off-by: Tee Ngo <tee.ngo@windriver.com>