Integrate dcorch master clients with the optimized OpenStackDriver.
Update the methods of creating subcloud keystone, dcdbsync and sysinv
clients.
Add the subcloud management IP parameter to RPC calls between the
dcorch-engine master and worker services to construct client endpoints.
Test Plan:
PASS: Change the admin password on the system controller using
the command "openstack --os-region-name SystemController user
password set". Verify that the admin password is synchronized
to the subcloud and the dcorch receives the corresponding sync
request, followed by successful execution of sync resources for
the subcloud.
PASS: Unmanage and then manage a subcloud, and verify that the initial
sync is executed successfully for that subcloud.
PASS: Verify successful dcorch audits every 5 minutes.
Story: 2011106
Task: 50113
Change-Id: Idfa493068dc7d2bac21aac2871238b9f0de12c9d
Signed-off-by: lzhu1 <li.zhu@windriver.com>
This commit adds a new management_ip field to the dcorch subcloud
table. This field will be used to build the subcloud service endpoints
after the OptimizedOpenStackDriver [1] is integrated into dcorch.
The DB upgrade script and related upgrade tests will be done in a
separate commit.
Test Plan:
1. PASS - Run the dcorch database migration script to update it to
version 009 and verify that the management_ip column is added
to the subcloud table.
2. PASS - Add a new subcloud and verify in the dcorch DB that a new
subcloud item was added with the correct management_ip field.
3. PASS - Run a subcloud update with network reconfiguration, changing
the management_ip, and verify in the dcorch DB that the subcloud
item was updated correctly.
[1]: https://review.opendev.org/c/starlingx/distcloud/+/918311
Story: 2011106
Task: 50105
Change-Id: If1c299700fd769dc8f89172c5088fe7de66d0774
Signed-off-by: Gustavo Herzmann <gustavo.herzmann@windriver.com>
dcmanager-orchestrator calls the k8s python client to perform a
number of operations. The k8s python client creates temp files under
/tmp and continues to use these temp files for the life cycle of the
processes.
However, systemd-tmpfiles-clean.service runs every day to clean up
files in the /tmp dir that are older than 10 days. If the k8s client code
is not triggered for more than 10 days (thus its temp files are not
accessed for more than 10 days), these temp files will be removed as
part of the cleanup. Certain dcmanager-orchestrator operations then
start to fail with an error that the temp file is no longer there.
This is a known issue of the kubernetes python client:
https://github.com/kubernetes-client/python/issues/765
This commit fixes the issue by setting TMPDIR to
/var/run/dcmanager_orchestrator_tmp when sm starts dcmanager-orchestrator.
Similar commits were added for the sysinv and dcmanager services in the
past:
https://review.opendev.org/c/starlingx/config/+/736761
https://review.opendev.org/c/starlingx/distcloud/+/736247
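As an illustration of why redirecting TMPDIR helps, the sketch below shows that Python's standard tempfile module picks its directory from the TMPDIR environment variable. It assumes the client creates its files through tempfile; use_private_tmpdir is a hypothetical helper, and the actual change sets TMPDIR at service start rather than inside Python.

```python
import os
import tempfile


def use_private_tmpdir(path):
    """Point this process's temp files at `path` instead of the default."""
    os.makedirs(path, exist_ok=True)
    os.environ["TMPDIR"] = path
    # tempfile caches its chosen directory; clear it so TMPDIR is re-read.
    tempfile.tempdir = None
    return tempfile.gettempdir()
```

Any file created afterwards with tempfile.mkstemp() lands under the chosen directory, outside the reach of the /tmp cleanup.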
Closes-bug: 2066048
Change-Id: I3d39f5b034e3ef2e6ad9636e86f26f0e93f16d45
Signed-off-by: amantri <ayyappa.mantri@windriver.com>
This commit modifies the dcmanager audit service to use the new
OptimizedOpenStackDriver [1].
It also fixes a typo in the OptimizedOpenStackDriver where
get_cached_region_clients_for_thread was referencing the
OpenStackDriver class.
Test Plan:
1. PASS - Add a new subcloud and verify that it becomes online and that
its sync_status becomes in-sync;
2. PASS - Verify through the logs that the fetch_subcloud_mgmt_ips
function is being called to populate the endpoint cache;
3. PASS - Remove the subcloud endpoints from the keystone database,
restart the audit service and verify that the subcloud is
still audited correctly;
4. PASS - Leave the system running for 12h and check that new tokens
are obtained whenever they are close to expiring (~1h).
[1]: https://review.opendev.org/c/starlingx/distcloud/+/918311
Story: 2011106
Task: 50111
Change-Id: Ia24a72a77a60d36cee5a31482fe71a341d2e7d83
Signed-off-by: Gustavo Herzmann <gustavo.herzmann@windriver.com>
1. Refactor dcorch's generic_sync_manager.py and initial_sync_manager
into a main process manager and a worker manager. The main manager
will handle the allocation of eligible subclouds to each worker.
2. Rename the current EngineService to EngineWorkerService and introduce
a new EngineService for the main process, similar to
DCManagerAuditService and DCManagerAuditWorkerService.
3. Rename the current RPC EngineClient to EngineWorkerClient and
introduce a new EngineClient. Adapt the RPC methods to accommodate
the modifications in these main process managers and worker managers.
4. Move master resources data retrieval from each sync_thread to engine
workers.
5. Implement 2 new db APIs for subcloud batch sync and state updates.
6. Remove code related to sync_lock and its associated db table schema.
7. Add ocf script for managing the start and stop of the dcorch
engine-worker service, and make changes in packaging accordingly.
8. Fix issues related to the usage of
base64.urlsafe_b64encode and base64.urlsafe_b64decode in python3.
9. Update unit tests for the main process and worker managers.
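For context on item 8: in Python 3, base64.urlsafe_b64encode accepts only bytes-like input and returns bytes, so text has to be encoded before and decoded after the call. A minimal sketch of that pattern follows; encode_text and decode_text are hypothetical helper names and not the functions changed in this commit.

```python
import base64


def encode_text(text):
    """Return the URL-safe base64 form of `text` as a str."""
    # urlsafe_b64encode requires bytes in Python 3 and returns bytes.
    raw = base64.urlsafe_b64encode(text.encode("utf-8"))
    return raw.decode("ascii")


def decode_text(token):
    """Inverse of encode_text."""
    raw = base64.urlsafe_b64decode(token.encode("ascii"))
    return raw.decode("utf-8")
```

Passing a str directly to urlsafe_b64encode raises TypeError in Python 3, which is the kind of failure this pattern avoids.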
Test Plan:
PASS: Verify that the dcorch audit runs properly every 5 minutes.
PASS: Verify that the initial sync runs properly every 10 seconds.
PASS: Verify that the sync subclouds operation runs properly every 5
seconds.
PASS: Successfully start and stop the dcorch-engine and
dcorch-engine-worker services using the sm commands.
PASS: Change the admin password on the system controller using
the command "openstack --os-region-name SystemController user
password set". Verify that the admin password is synchronized
to the subcloud and the dcorch receives the corresponding sync
request, followed by successful execution of sync resources for
the subcloud.
PASS: Unmanage and then manage a subcloud, and verify that the initial
sync is executed successfully for that subcloud.
PASS: Verify the removal of the sync_lock table from the dcorch db.
Story: 2011106
Task: 50013
Change-Id: I329847bd1107ec43e67ec59bdd1e3111b7b37cd3
Signed-off-by: lzhu1 <li.zhu@windriver.com>
This commit introduces the enroll command API.
Test Plan:
PASS: Deploy a system controller and run subcloud add
enroll in CLI without bootstrap-values. Verify that
the API returns an error.
PASS: Deploy a system controller and run subcloud add
enroll passing all required parameters in CLI. Verify
in dcmanager log that the API returned a success code.
Story: 2011100
Task: 50005
Change-Id: I525d26166dbb7d7afcb26b96191b5045eee7b52d
Signed-off-by: Gustavo Pereira <gustavo.lyrapereira@windriver.com>
This commit implements an optimized OpenStackDriver that builds the
endpoints for subclouds directly using their management IPs instead of
retrieving them from the keystone database. Subcloud endpoints will be
removed from Keystone for performance reasons in a future commit.
- The driver now accepts a fetch_subcloud_ips function as an argument.
- This function retrieves a dictionary of subcloud region names to
their management IPs (without a region argument) or a specific
subcloud's management IP (with a region argument).
- Dcmanager services and dcorch should implement their own
fetch_subcloud_ips function to provide the driver with subcloud
IP information.
This approach improves performance and prepares for the removal of
subcloud endpoints from Keystone.
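The callback contract described above can be sketched as follows. This is a minimal illustration and not the driver's code: the in-memory mapping, the port numbers and build_endpoint are hypothetical, and a real service would read the management IPs from its own database.

```python
import ipaddress

# Hypothetical data; a real service would query its database instead.
_SUBCLOUD_IPS = {"subcloud1": "192.168.101.2", "subcloud2": "fd01::2"}

# Hypothetical service ports, for illustration only.
_SERVICE_PORTS = {"sysinv": 6386, "keystone": 5001}


def fetch_subcloud_ips(region_name=None):
    """Return every region -> IP mapping, or a single region's IP."""
    if region_name is None:
        return dict(_SUBCLOUD_IPS)
    return _SUBCLOUD_IPS[region_name]


def build_endpoint(management_ip, service):
    """Build a service URL from a management IP (IPv6 gets brackets)."""
    ip = ipaddress.ip_address(management_ip)
    host = f"[{ip}]" if ip.version == 6 else str(ip)
    return f"https://{host}:{_SERVICE_PORTS[service]}"
```

With the IPs available locally, an endpoint for a region can be derived without a Keystone lookup, for example build_endpoint(fetch_subcloud_ips("subcloud1"), "sysinv").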
NOTE: The original OpenStackDriver, KeystoneClient and EndpointCache
will be removed in a future commit, after the DC services are updated
to use the new optimized OpenStackDriver. The optimized one will be
integrated with the DC services in separate commits.
Test Plan:
Remove the subcloud endpoints from the keystone DB, modify the
dcmanager-audit service to use the new classes and then run the
following tests:
1. PASS - Verify that audit is able to get both the RegionOne and
subclouds endpoints without issues using the new driver.
2. PASS - Verify that the hourly token refresh only triggers the
refresh of central region token and endpoints.
3. PASS - Verify that when adding a new subcloud, the endpoint cache
is updated to include the endpoints for the new subcloud.
Story: 2011106
Task: 50035
Change-Id: I146592eb17f6a5433eae25f20e8de2f01c813055
Signed-off-by: Gustavo Herzmann <gustavo.herzmann@windriver.com>
This commit includes new unit tests for system_peer_manager.py,
covering new test cases for the sync subclouds, delete and update
operations.
Test plan:
1) PASS: Run tox py39, pylint and pep8 envs and
verify that they are all passing.
2) PASS: Check 'tox -e cover' command output.
Coverage increased from 69% to 92%
Depends-On: https://review.opendev.org/c/starlingx/distcloud/+/915055
Story: 2007082
Task: 49713
Change-Id: I0b5a7d024f7b3a5c5ef4adb4aa29dc9d0e7f9de4
Signed-off-by: Swapna Gorre <swapna.gorre@windriver.com>
This commit adds a new parameter (patch) to the patch orchestration,
allowing a specific patch file to be uploaded and applied to a subcloud.
This change is essential for enabling the new USM feature on subclouds
running an older version.
Test Plan:
PASS: Fail if patch orchestration is performed using the --patch
parameter with the subcloud and systemcontroller on the same version.
PASS: Perform patch orchestration using --patch parameter
- The patch should be uploaded, applied and installed to the subcloud
PASS: Perform patch orchestration using --patch and --upload-only
- The patch should be uploaded to the subcloud
Obs.:
1. Tests were performed without the patch being applied to the
systemcontroller
2. Tests were performed with subcloud in-sync and out-of-sync
Story: 2010676
Task: 50012
Change-Id: I7eb2940c708668b17ff93977b5622c3cff4cb3da
Signed-off-by: Hugo Brito <hugo.brito@windriver.com>
This commit updates default password occurrences in distcloud files
to comply with the new password rules:
- Minimum 12 characters
- At least 1 Uppercase letter
- At least 1 number
- At least 1 special character
- Cannot reuse past 5 passwords
- Default password expiry period should be set to 90 days.
The default passwords are updated as follows:
St8rlingX* -> St8rlingXCloud*
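A quick way to check a candidate default against the length and character-class rules above is sketched below. It is illustrative only: meets_password_rules is a hypothetical helper, "special character" is assumed to mean ASCII punctuation, and the reuse-history and expiry rules are not covered because they depend on system state.

```python
import string


def meets_password_rules(password):
    """Check the length and character-class rules from the list above."""
    has_upper = any(c.isupper() for c in password)
    has_digit = any(c.isdigit() for c in password)
    # Assumption: "special" means an ASCII punctuation character.
    has_special = any(c in string.punctuation for c in password)
    return len(password) >= 12 and has_upper and has_digit and has_special
```

Under these assumptions the old default fails on length alone, while the new default satisfies all four checks.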
Test Plan:
PASS: Run build-pkgs -c -p distributedcloud
Story: 2011084
Task: 49824
Change-Id: I8c954ae023493048fb98d64b2df8df97a00ae1b7
Signed-off-by: Karla Felix <karla.karolinenogueirafelix@windriver.com>
This commit includes new unit tests for subcloud_manager.py,
covering new test cases for the deploy, backup, migrate, install and
rename operations.
Test plan:
1) PASS: Run tox py39, pylint and pep8 envs and
verify that they are all passing.
2) PASS: Check 'tox -e cover' command output.
Coverage increased from 79% to 90%
Depends-On: https://review.opendev.org/c/starlingx/distcloud/+/914075
Story: 2007082
Task: 49618
Change-Id: I1219ce7c5f6cebc0d1cb564905eb5cc5b4045540
Signed-off-by: Swapna Gorre <swapna.gorre@windriver.com>
Update the proposed action displayed by the "dcmanager subcloud error"
command when a subcloud is in the bootstrap-failed state.
Instead of suggesting the deletion and reinstall of the subcloud,
it should indicate the usage of "dcmanager subcloud deploy resume"
after the cause of the failure has been resolved.
Test plan:
1. PASS: Deploy a subcloud with the wrong password in the
bootstrap-values file and verify that the error message
displayed in "dcmanager subcloud error <subcloud>"
shows the new proposed action.
Closes-Bug: 2065189
Change-Id: Ie41b38c5b527424bdd64ca5af1ed59c91bf03e70
Signed-off-by: Raphael Lima <Raphael.Lima@windriver.com>
Created system_peer_manager object.
Leveraged mock methods from base.py and
moved duplicate mock specifications to
TestSystemPeerManager
Test plan:
1) PASS: Run tox py39, pylint and pep8 envs and
verify that they are all passing.
2) PASS: Check 'tox -e cover' command output.
Story: 2007082
Task: 49713
Change-Id: I4a45dba61308e2e108f315423d314fd94c99aac1
Signed-off-by: Swapna Gorre <swapna.gorre@windriver.com>
Add usm to the list of users whose credentials need to be
replicated in the subcloud. For any software commands to work
inside the subcloud after it is put in the 'managed' state, the 'usm'
user credentials need to be replicated in the subcloud.
Test Plan:
PASS: Install DC subcloud, ensure it is in managed state,
and execute software commands (Eg. software list)
Closes-Bug: 2063460
Change-Id: I4af841dcc51dc7fea2a6a12a37728cb9e0f8b59c
Signed-off-by: Joseph Vazhappilly <joseph.vazhappillypaily@windriver.com>
The DC OCF scripts were not updated during the switch to Debian
in StarlingX 8.0. As a result, orphan processes could be left behind
on service restart or controller swact. The orphan processes
consume resources and perform duplicate/obsolete tasks (e.g.
auditing the same subclouds as the corresponding worker processes)
until their work queues are empty.
This commit fixes up the pgrep option to restore the functionality
of the confirm_stop function of the OCF script. Processes that
fail to be terminated will get killed.
Test Plan:
- Deploy a small DC system. Verify that all DC services can
be started, stopped and restarted by SM.
- Deploy a large DC system with many subclouds. Reduce the
thread_pool_size of dcmanager-audit-worker. Let the system
soak for a couple of hours. Restart the service in the
middle of the audit cycle. Verify that the dcmanager-audit-worker
service was successfully restarted and there are no orphan
processes.
Closes-Bug: 2064368
Change-Id: Ie5cbc89cde374e32d4e0a3799a9f8833c071d206
Signed-off-by: Tee Ngo <tee.ngo@windriver.com>
The sysinv API call for certificate installation with type
openldap_ca will extract ca data included in the certificate bundle
and include it in the 'system-local-ca' ca.crt field.
Modified dcmanager to perform the call using this structure, passing
a bundle with TLS cert + CA cert + TLS key from the 'system-local-ca'
in the SystemController.
This code is called during DX subcloud upgrade and is used to keep
the current 'system-local-ca' on the subcloud consistent with the
one in the SystemController.
Test plan:
PASS: In a DC w/ DX subcloud in stx 9:
- Perform cert-manager migration.
- Upgrade the SystemController.
- Verify system-local-ca secret content in the
SystemController and the subcloud.
- Start the orchestrated upgrade for the DX subcloud.
- Verify dcmanager/state.log. After the step
"Stage: 2, State: transferring CA certificate"
verify the system-local-ca secret content in the subcloud.
The secret should have been replaced to match the one in
the SystemController.
- While c1 is upgrading, verify OpenLDAP by creating user
in SystemController with 'ldapusersetup' and log into it
in the subcloud.
Story: 2009811
Task: 49044
Change-Id: I42e6308f066126f903738f4e3c319c6027c8cb0b
Signed-off-by: Marcelo Loebens <Marcelo.DeCastroLoebens@windriver.com>
StarlingX stopped supporting CentOS builds after release 7.0.
This update strips CentOS from our code base. It also removes
references to the failed OpenSUSE feature.
Story: 2011110
Task: 49949
Change-Id: If8c5d8d04e0a5ae766239912886f93332614fa4e
Signed-off-by: Scott Little <scott.little@windriver.com>
Improves unit test coverage for dcmanager's subclouds API
from 55% to 90%.
Test plan:
All of the tests were created taking into account the
output of 'tox -c tox.ini -e cover' command
Story: 2007082
Task: 49725
Change-Id: If31005a8f3420e94dd17f15bdac97af5253e8d5a
Signed-off-by: Raphael Lima <Raphael.Lima@windriver.com>
Refactor the subcloud_get_all_with_status function to
query only the endpoint_type and sync_status from the
subcloud_status table for improved efficiency.
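The idea of selecting only the needed status columns can be sketched with plain SQL over an in-memory SQLite database. The schema, the extra updated_at column and the function names here are hypothetical stand-ins; this is not the dcmanager implementation.

```python
import sqlite3


def make_demo_db():
    """Create a tiny in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE subclouds (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE subcloud_status (
            subcloud_id INTEGER, endpoint_type TEXT,
            sync_status TEXT, updated_at TEXT);
        INSERT INTO subclouds VALUES (1, 'subcloud1');
        INSERT INTO subcloud_status VALUES
            (1, 'platform', 'in-sync', '2024-05-01'),
            (1, 'patching', 'out-of-sync', '2024-05-01');
        """
    )
    return conn


def get_all_with_status(conn):
    """Join subclouds to their status, fetching only two status columns."""
    query = """
        SELECT s.id, s.name, st.endpoint_type, st.sync_status
        FROM subclouds AS s
        LEFT JOIN subcloud_status AS st ON st.subcloud_id = s.id
        ORDER BY s.id, st.endpoint_type
    """
    return conn.execute(query).fetchall()
```

Naming the two status columns explicitly, instead of loading whole status rows, reduces the data transferred for each subcloud.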
Test Plan:
PASS: Get all subclouds with the right sync_status
PASS: Create a dcmanager strategy
- Fail if sync_status = Unknown
Story: 2011106
Task: 49893
Change-Id: Ie99abc6cb820800a632f1fd90ee7d7e0869a8312
Signed-off-by: Hugo Brito <hugo.brito@windriver.com>
This commit includes new unit tests for subcloud_manager.py,
covering new test cases in deploy, add, delete, update, compose,
backup and restore, redeploy, backup, prestage and migrate
operations.
Test plan:
1) PASS: Run tox py39, pylint and pep8 envs and
verify that they are all passing.
2) PASS: Check 'tox -e cover' command output.
Coverage increased from 70% to 79%
Depends-On: https://review.opendev.org/c/starlingx/distcloud/+/914074
Story: 2007082
Task: 49618
Change-Id: Ibfd30fc616c5c756ad73f3a33432411d7d189812
Signed-off-by: Swapna Gorre <swapna.gorre@windriver.com>
This commit updates the peer group association sync status to
'out-of-sync' after the user updates the
peer-controller-gateway-address attribute of the system-peer object.
This commit also modifies the subcloud update function to update the
subcloud route whenever the systemcontroller_gateway_address is
updated on the primary side and synced to the secondary.
It also adds an informative message to remind the caller to run the
sync command after updating the peer-controller-gateway-address.
Test Plan:
1. PASS: Do the following steps:
- Create a system peer with an incorrect systemcontroller
gateway address that's inside the management subnet, but
outside the reserved IP range and then create an association.
Verify that the secondary subcloud and a route were created
using the incorrect IP.
- Update the system peer with the correct systemcontroller
gateway address on the primary site. Verify that the PGA
sync status is set to 'out-of-sync' on both sites.
- Sync the PGA and verify that the secondary subcloud
systemcontroller gateway address was updated and that the
old route was deleted and a new one using the new address
was created.
- Migrate the SPG to the non-primary site and verify that
it completes successfully and that the subcloud becomes
online and managed.
2. PASS: Repeat the first step of test case #1, but use an incorrect
address that's outside the management subnet. Then create
a PGA and verify that it fails due to the following
validation:
"systemcontroller_gateway_address invalid: Address must be in
subnet <management subnet>"
3. PASS: Repeat the first step of test case #1, but use an incorrect
address that's inside the reserved IP range. Then create
a PGA and verify that it fails due to the following
validation:
"systemcontroller_gateway_address invalid, is within
management pool <ip range>"
4. PASS: Create a system peer with a correct systemcontroller gateway
address for the first time and then create an association.
Verify that the secondary subcloud and a route were created
using the correct IP.
5. PASS: Update an attribute of the subcloud (e.g. the subcloud
description) on the primary site and verify that the sync
status changes to 'out-of-sync' on both sites, then run
the PGA sync operation and verify that the attribute was
synced to the secondary subcloud on the peer site.
Closes-Bug: 2062372
Change-Id: Ibffe6c86656a56a85d10deca54c161bbed7f0d17
Signed-off-by: Gustavo Herzmann <gustavo.herzmann@windriver.com>
Improves unit test coverage for dcmanager's system_peers
API from 68% to 98%.
Test plan:
All of the tests were created taking into account the
output of 'tox -c tox.ini -e cover' command
Story: 2007082
Task: 49682
Change-Id: Iec150e1df79c48e1afddebe612dbd27c3e742685
Signed-off-by: rlima <Raphael.Lima@windriver.com>
This commit includes new unit tests for subcloud_manager.py,
covering new test cases in deploy, add, delete, update, compose,
backup and restore, redeploy, backup, prestage and migrate
operations.
Test plan:
1) PASS: Run tox py39, pylint and pep8 envs and
verify that they are all passing.
2) PASS: Check 'tox -e cover' command output.
Coverage increased from 61% to 70%
Depends-On: https://review.opendev.org/c/starlingx/distcloud/+/913989
Story: 2007082
Task: 49618
Change-Id: I05604a4940eac62b311dd8476498965a0f021be0
Signed-off-by: Swapna Gorre <swapna.gorre@windriver.com>
Improves unit test coverage for dcmanager's sw_update_options
API from 19% to 100%.
Test plan:
All of the tests were created taking into account the
output of 'tox -c tox.ini -e cover' command
Story: 2007082
Task: 49667
Change-Id: I626bdc6436b47e08f4fc51526e6d8c21ebd19e09
Signed-off-by: rlima <Raphael.Lima@windriver.com>
Improves unit test coverage for dcmanager's sw_update_strategy
API from 76% to 98%.
Test plan:
All of the tests were created taking into account the
output of 'tox -c tox.ini -e cover' command
Story: 2007082
Task: 49652
Change-Id: I1d403abcaf503cccfed2b8242703f29fbb26844f
Signed-off-by: rlima <Raphael.Lima@windriver.com>
Improves unit test coverage for dcmanager's subcloud_group
API from 77% to 98%.
Test plan:
All of the tests were created taking into account the
output of 'tox -c tox.ini -e cover' command
Story: 2007082
Task: 49641
Change-Id: I4f8a87379c770409590ed84b0d2cd50c06110199
Signed-off-by: rlima <Raphael.Lima@windriver.com>
Improves unit test coverage for dcmanager's phased_subcloud_deploy
API from 81% to 99%.
Test plan:
All of the tests were created taking into account the
output of 'tox -c tox.ini -e cover' command
Story: 2007082
Task: 49319
Change-Id: Ie649e35dd17796e1cd4e7e7b452b32c7c2b5cd9f
Signed-off-by: rlima <Raphael.Lima@windriver.com>
Improves unit test coverage for dcmanager's subcloud_backup
API from 81% to 98%.
Test plan:
All of the tests were created taking into account the
output of 'tox -c tox.ini -e cover' command
Story: 2007082
Task: 49581
Change-Id: I7cb6b661dd75a9d92f31113ff7ba5a887e489abc
Signed-off-by: rlima <Raphael.Lima@windriver.com>
Improves unit test coverage for dcmanager's subcloud_deploy
API from 85% to 99%.
Test plan:
All of the tests were created taking into account the
output of 'tox -c tox.ini -e cover' command
Story: 2007082
Task: 49578
Change-Id: Ie8aa1feb9f5f1f7e9d73b76112d2f2a6db528e6a
Signed-off-by: rlima <Raphael.Lima@windriver.com>
Improves unit test coverage for dcmanager's subcloud_peer_group API
from 50% to 99%.
Test plan:
All of the tests were created taking into account the
output of 'tox -c tox.ini -e cover' command
Story: 2007082
Task: 49320
Change-Id: If327a7bbef984fa37d26e27e1d4994a09a97dce8
Signed-off-by: rlima <Raphael.Lima@windriver.com>
Improves unit test coverage for dcmanager's peer_group_association API
from 65% to 99%.
Test plan:
All of the tests were created taking into account the
output of 'tox -c tox.ini -e cover' command
Story: 2007082
Task: 49318
Change-Id: I547a3bebeb3b88dab11800c51ee382b210f38830
Signed-off-by: rlima <Raphael.Lima@windriver.com>