It's possible to also use --networks-file instead of
-n in the overcloud deploy command. This change adds
a secondary search string to the awk command to ensure
we capture the file regardless of which argument has
been used in the overcloud deploy script.
Resolves: rhbz#2064354
Change-Id: I13137b558b8a55f22462a2c6a26cdb8616fdc64e
If the overcloud is deployed with fencing enabled, we should
temporarily disable fencing for the duration of the update.
We introduce a pre-undercloud-update step to disable it and a
post-update step to re-enable it.
On Train/Queens, the command to verify that stonith is disabled is
"sudo pcs property show" rather than "sudo pcs property config".
Change-Id: Ie6324fc2c9cdbeac6126e5bdbbce41f23e143be5
(cherry picked from commit 224d964ba7)
(cherry picked from commit 62359da2ce)
(cherry picked from commit 0f060bcc01)
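For the fencing change above, a minimal sketch of the disable/verify/re-enable sequence might look like the tasks below. The task names and their placement in a playbook are assumptions for illustration, not the tasks added by the change; only the pcs commands themselves are standard.

```yaml
# Hypothetical tasks, meant to run on a node that is part of the
# pacemaker cluster.
- name: Disable fencing before the update
  become: true
  command: pcs property set stonith-enabled=false

- name: Check the stonith property (Train/Queens syntax)
  become: true
  command: pcs property show stonith-enabled
  register: stonith_state
  changed_when: false

- name: Re-enable fencing after the update
  become: true
  command: pcs property set stonith-enabled=true
```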
In https://review.opendev.org/c/openstack/tripleo-upgrade/+/856723 we
removed the linter check because it would no longer install correctly.
That doesn't work, as nothing gets triggered and zuul won't merge the
change, so we use the noop action instead.
Closes-Bug: #1989005
Change-Id: I5438116b1c16454c76062306dc81615cd6959187
We also remove the duplicate zuul definition and keep
zuul.d/layout.yaml as it's the default on the tripleo project.
Change-Id: I17affa26617f62444f1b768a9bc41e2ff4dafac9
Closes-Bug: #1989005
The microversions aren't correct for a Queens undercloud
and a Newton overcloud, causing the workload testing to
fail with:
Creating overcloud instance instance_1742604c63
FAILURE: Instance failed to boot within 120 seconds
Version 2.60 is not supported by the API. Minimum is 2.1
and maximum is 2.38. (HTTP 406) (Request-ID: <snip>)
Change-Id: I578c13b91d5cb65229d18a87f3fcff0c28c14427
When upgrade-workloadsriov is set, the workload is created with an
SRIOV PF port on the existing external network.
When upgrade-workloadcleanup is set, the previously created workload
is removed once the update/upgrade has completed.
Change-Id: I969d901be4eeb93e0abe4e8b06f6c54399b52ea2
(cherry picked from commit 85c12706d1)
When upgrading the Undercloud in the FFU procedure a sequential
upgrade is carried out. If using rhos-release, the currently
running version of RHEL is used as the default.
This may be fine, but if we want to change that there is currently
no way of doing so.
This patch allows the ffu_undercloud_repo_args to optionally include
the appropriate default parameters needed for rhos-release and sets
them for the role execution. These parameters would be passed to
tripleo-upgrade via an extra var.
This is queens-only due to this whole mechanism being different in
later branches.
We also now remove the ffu_undercloud_repo_args/rhos_release bits
from defaults because they can be provided in the extra var from
outside instead.
Change-Id: I8284daebd4ddd5cb48ad2e2f510d688203dd8357
When upgrading the Undercloud in the FFU procedure a sequential
upgrade is carried out. If using rhos-release, the 'latest'
build is used by default, which is not what we want.
This patch extends the ffu_undercloud_repo_args to include the
appropriate default parameters needed for rhos-release and sets
them for the role execution.
This is queens-only due to this whole mechanism being different in
later branches.
Change-Id: Ib763e2350724022d71750f53b39d2652c9853a3c
In Ansible 2.8+ the following error occurs when using 'become'
on a task inclusion:
ERROR <class 'ansible.errors.AnsibleParserError'>: 'become'
is not a valid attribute for a IncludeRole
The rhos-release role already uses become correctly on individual
tasks so this argument isn't required on this task.
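As an illustration only (the task name is an assumption; the role name comes from the text above), the fix amounts to dropping 'become' from the inclusion:

```yaml
# Rejected by Ansible 2.8+:
# "'become' is not a valid attribute for a IncludeRole"
# - name: Configure repositories
#   include_role:
#     name: rhos-release
#   become: true

# Accepted: no 'become' on the inclusion. The role's own tasks
# escalate privileges where they need to.
- name: Configure repositories
  include_role:
    name: rhos-release
```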
Change-Id: I072cfb33ff91cb765f8a241e1f865cd9f3ce5725
According to the Ansible documentation[1], the filter syntax shouldn't
be used for tests since 2.5.
One strange outcome I've experienced is that a ceph update failure
went undetected and the job ended up timing out.
Convert all the tests still using that idiom to the new syntax.
[1] https://docs.ansible.com/ansible/latest/user_guide/playbooks_tests.html#test-syntax
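A small sketch of the difference, using made-up task names and a command that always fails:

```yaml
- name: Run a command that fails (placeholder command)
  command: /bin/false
  register: result
  ignore_errors: true

# Old idiom, deprecated since Ansible 2.5: a test used as a filter
# - debug:
#     msg: "the command failed"
#   when: result | failed

# New idiom: test syntax
- name: React to the failure
  debug:
    msg: "the command failed"
  when: result is failed
```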
Change-Id: Ibee9655139c0b61ba04417ac39967d1f53793404
(cherry picked from commit 2a22df22ae)
This adds those two options. They work like their ir overcloud
counterparts.
This adds the possibility to add new parameters *during*
update/upgrade/ffwd if required.
Currently, only the update templates have been adjusted.
The other limitation is that this is currently compatible only with
the infrared context and is thus skipped for upstream CI.
Change-Id: I43148a0bc494cf3fc6ff109e7f3a4b94cf751d99
(cherry picked from commit 212c02fb4e)
(cherry picked from commit 29e8fc8470)
(cherry picked from commit f9cf26d8bf)
(cherry picked from commit 149225157b)
Passing tags to the tripleo-upgrade role doesn't work, as all the
initial set_fact tasks are skipped.
Make sure we always run those set_fact tasks regardless of the tags
used by the user.
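A minimal sketch of the idea, assuming a hypothetical fact name: the special 'always' tag makes a task run whatever --tags the user selects (unless it is explicitly skipped).

```yaml
- name: Set a fact that later, tagged tasks depend on
  set_fact:
    example_fact: "some value"   # hypothetical fact, for illustration
  tags:
    - always
```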
Change-Id: I62a2e21fd062e302a03b898730555e2ab7d5a542
Closes-Bug: #1843442
(cherry picked from commit 593afa2337)
(cherry picked from commit 5a41e20937)
(cherry picked from commit ef0194637f)
When backporting that feature we missed the ceph-upgrade-run.sh.j2
file, as that command does not use heat in later versions.
Queens only because, as said, later versions use an external task for
the ceph update.
Change-Id: I75f9748b5e0c75c3bed218dc2c001d0b38d8b61b
Closes-Bug: #1881119
We can encounter corner-case pacemaker issues with parallel role
updates. While we solve them, we need a way to disable parallel role
updates.
Using an idiom mentioned in the Ansible documentation[1], we start
role updates in batches. When the batch size is 1, the update is
serial, one role after another.
This is the default.
[1] https://docs.ansible.com/ansible/latest/user_guide/playbooks_async.html
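The batching idiom from that page can be sketched roughly as follows. All variable names, the file name update_batch.yml, the command and the timeouts are assumptions for illustration; they are not the names used by this change.

```yaml
# Main task file: split the role list into batches and process the
# batches one after another.
- name: Update roles batch by batch
  vars:
    role_names: [Controller, Compute, CephStorage]   # hypothetical
    batch_size: 1                                    # 1 means serial
  include_tasks: update_batch.yml
  loop: "{{ role_names | batch(batch_size) | list }}"
  loop_control:
    loop_var: role_batch

# Content of update_batch.yml: start every role of the batch in the
# background, then wait for all of them to finish.
- name: Start the update of each role in the batch
  command: "bash update_{{ role_name }}.sh"   # placeholder command
  async: 7200
  poll: 0
  loop: "{{ role_batch }}"
  loop_control:
    loop_var: role_name
  register: update_jobs

- name: Wait for the batch to finish
  async_status:
    jid: "{{ job.ansible_job_id }}"
  loop: "{{ update_jobs.results }}"
  loop_control:
    loop_var: job
  register: job_result
  until: job_result.finished
  retries: 720
  delay: 10
```

With batch_size set to 1 each batch holds a single role, which gives the serial behaviour described above.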
Change-Id: I03378557653d07113fa70782e5d22bf5e3e969b8
(cherry picked from commit 8d2027f1f1)
(cherry picked from commit a7704f6559)
(cherry picked from commit 52c2dda30a)
(cherry picked from commit b24ac90e24)
tripleo-ansible-inventory uses overcloud as the name of the stack by
default. On some CI systems that value may be different, which then
produces an undercloud-only inventory file.
Change-Id: Ic420c0717165e01df99ad2368a23fc9fc10e71c1
Closes-Bug: #1857120
(cherry picked from commit 6483436d7d)
(cherry picked from commit 1da968309d)
(cherry picked from commit 889259d861)
tripleo-upgrade has a workarounds mechanism which allows us to
apply patches or specific modifications before and after some of
the upgrade steps. However, if these workarounds needed to be applied
on the overcloud nodes, the only way to do it was via a bash script
which would iterate over the nodes and apply the patch or perform the
change on each of them.
This patch adds a new workarounds field: ansible_hosts. When this
option is present in the workaround and is not an empty string, the
workaround is applied via Ansible on the nodes specified in that
ansible_hosts field. The ansible_hosts option needs to be used in
combination with the command one, as the command is transformed into
an Ansible shell task which is executed on the nodes passed in the
ansible_hosts option.
Example:

    pre_overcloud_upgrade_prepare_workarounds:
      - set_root_password:
          patch: false
          basedir: ''
          id: ''
          ansible_hosts: 'overcloud'
          command: |
            echo redhat | passwd root --stdin

will turn into a set_root_password.yaml Ansible playbook under
~/ansible_workarounds:

    cat ~/ansible_workarounds/set_root_password.yaml
    - hosts: overcloud
      tasks:
        - name: set_root_password workaround
          shell: |
            echo redhat | passwd root --stdin

To execute such workarounds, a new bash function, ansible_patch, has
been added which takes care of running the generated Ansible playbook.
Also, an optional input parameter can be passed to workarounds.sh.
When passed, it is used as the value of the --limit option when
executing ansible-playbook. This way, we can run a workaround on one
specific server instead of running it on all of them.
Change-Id: I421ebecfc5504ac2fd225de0c4fb0cbf735bbdaf
(cherry picked from commit e582d2f304)
(cherry picked from commit b6bac90695)
(cherry picked from commit 81035c28da)
Between the Queens and Rocky releases we introduced an option to
run the minor update in parallel on selected role types.
If the environment went through FFWD or was deployed before this
option got backported, we need to update the roles_data.
By default update_serial is either not set at all (OSP13 and OSP14), or
it is set to 1 for Pacemaker-enabled nodes, CephOSD nodes and Networkers.
This is mostly a defensive precaution; we do allow running in parallel
for CephOSD and Networkers on production systems that were tested enough
on preprod or can tolerate a small outage. We should also parallelize it
in CI, where serial updates just waste time.
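For illustration, a roles_data excerpt using update_serial could look like the fragment below. The role list and the value 25 are assumptions chosen for the sketch, not values taken from this change; real role entries carry many more keys.

```yaml
# Hypothetical roles_data excerpt showing only the update_serial key.
- name: Controller
  update_serial: 1     # Pacemaker-enabled: one node at a time
- name: CephStorage
  update_serial: 25    # assumed value allowing several nodes at once
- name: Networker
  update_serial: 25    # assumed value allowing several nodes at once
```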
Change-Id: I4cff09dc6aa9ac944b20a52ae087a8923f55209f
(cherry picked from commit 2044957694)
(cherry picked from commit ad0e9b3954)
Previously we ran the update role by role, which is neither necessary
nor the intended way of running a production update. Given proper
testing on preprod, customers should run the update as fast as
possible, which means leveraging the ability to update all roles at
once, and even all nodes of selected roles at once. With this patch the
nodes of Pacemaker-enabled roles are still updated serially, but the
roles themselves are updated in parallel. With a 3 controller,
3 database, 3 messaging setup it will look roughly like this:
Update of controller[0] database[0] messaging[0]
Update of controller[0] database[1] messaging[1]
Update of controller[1] database[2] messaging[2]
Update of controller[2]
This is because there is no blocking between roles and each role
takes a different amount of time to apply the update. At no point is
the pacemaker quorum broken.
Change-Id: Ib119210139886382726bc0ccddfdb4f7f6803015
(cherry picked from commit a2b433133d)
This broke the FFWD tests. If we merge a next iteration, we will
need proof of a job running with this change.
This reverts commit 55a5830f09.
Change-Id: I273aaca107d33a4ce952b73a58b43ba093444ff3
Change the mail author to use a generic address. I think it's better.
Change-Id: I6d535a6d4a0d8483ddb958a9d1dcb7810bccd468
(cherry picked from commit a51bd6f178)
(cherry picked from commit 6b34cc0700)
The --reverse option causes patch to fail even if the diff is
just a few lines off. Using the reverse option to check whether a
patch was applied is not a good idea with YAML: it's fairly easy to
create a reverse patch for a patch that wasn't applied.
Change-Id: I4a1459344794f5d602dc1b781d15a591ea2ac135
(cherry picked from commit a4fa67c1f9)
Until now, the minor update workarounds were not used very much and
had been left behind compared with what was implemented for the
upgrade ones. This patch lets minor updates benefit from the same
workarounds mechanism, so that any improvement to the upgrades
mechanism is available for updates too.
Also, references to the {{ working_dir }} variable were removed from
those shell tasks that already change directory to that very same
{{ working_dir }}, mostly to simplify the tasks and remove
redundancy.
Change-Id: Ibc57c51ff19ebad093c887bee545ca6a7d51827f
(cherry picked from commit ef15456503)
To speed up CI and the development process it is better to disable
this script. CI will fail anyway even without this validation. Users
will be able to enable it if/when they decide to use tripleo-upgrade
for a production upgrade.
Closes-bug: #1844567
Change-Id: Ia5a767491c2f297b396c5cc937c1495e4267a4e3
(cherry picked from commit 72dd2c49e3)
(cherry picked from commit 87639f0bad)
Moved the file to README.rst for consistency with other OpenStack
project documentation.
Change-Id: I4754a085c6255f977142302d2bee135220056c4f
(cherry picked from commit e3e97beb53)
(cherry picked from commit 4f1d4f792f)
Remove the documentation build from tox, as the project doesn't use
sphinx. This fixes an error during the documentation build, caused by
the sphinx configuration file not being found.
Change-Id: I577a05c2f7916bfca637ecb4451abbee5bd7714e
(cherry picked from commit 4776547196)
(cherry picked from commit 59d715fb25)
Post-ceph-upgrade workarounds need to be applied after the ceph
upgrade is performed, not after the pre-converge workarounds are
applied.
Change-Id: I3981324bb092f408dd597d895b2b2017fed516ba
(cherry picked from commit 9fec3161c6)
(cherry picked from commit 617b693edc)
Currently, a lot of code is repeated in the template, which forces
us to modify it in multiple places whenever one of the workaround
blocks changes. By using a Jinja2 macro, which behaves like a
function, we can define the code in one single place and call it
multiple times.
Change-Id: I63db9e8b30c9f9e99c501300982b513b627bed07
(cherry picked from commit 8c37eaaa2b)
(cherry picked from commit d8d12b926d)
(cherry picked from commit c02dcb765b)
So far we have been missing a mechanism to apply workarounds (if
needed) before and after the upgrade of the ceph cluster.
Change-Id: I5b7333d9bfb8954b3c52c66edc1090a8bd7dac17
(cherry picked from commit bd559cb50a)
(cherry picked from commit 7a29bec1ac)
The workarounds can be either patches posted on review.openstack.org
or arbitrary shell commands.
Below is an example of workarounds:

    ---
    pre_undercloud_upgrade_workaround:
      - BZ#xxxxxyz:
          patch: false
          basedir: ''
          id: ''
          command: 'touch /home/stack/pre_workaround_applied'
    post_undercloud_deploy_workarounds:
      - BZ#xxxyyzz:
          patch: true
          basedir: '/usr/share/openstack-tripleo-heat-templates/'
          id: 'xxxyyzz'
          command: ''

Change-Id: Id33c6d9c043433b395d4a4905a36820a18604860
(cherry picked from commit ed4cdcc948)
(cherry picked from commit 19c195ed66)
With the release of Ansible 2.5, the recommended way to perform
loops is to use the new loop keyword instead of with_X style loops.
This review makes that change for the common tasks within the
tripleo-upgrade role.
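As a generic illustration (the task and paths are made up, not taken from the role), the conversion looks like this:

```yaml
# Before: with_X style loop
- name: Create example directories (old style)
  file:
    path: "{{ item }}"
    state: directory
  with_items:
    - /tmp/example_a
    - /tmp/example_b

# After: loop keyword, available since Ansible 2.5
- name: Create example directories (new style)
  file:
    path: "{{ item }}"
    state: directory
  loop:
    - /tmp/example_a
    - /tmp/example_b
```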
Change-Id: I70d387b381b6ce297507cbfe669ea7be902df605
(cherry picked from commit b25817a233)
(cherry picked from commit 876c3334a7)
Use the same mechanism for templates as is used within the upgrade tasks.
Change-Id: Idcec723addb392363241d8b625cdd53ece4f3c83
(cherry picked from commit 42beb3e008)
(cherry picked from commit f1768392df)
Include has some unintuitive behaviours depending on whether it runs
statically or dynamically, and in a play or a playbook context.
In an effort to clarify these behaviours, move to the new set of
modules: include_tasks, include_role, import_playbook, import_tasks.
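A short sketch of the replacement, with hypothetical file and variable names:

```yaml
# Before: static or dynamic depending on context
# - include: upgrade_steps.yml

# Dynamic inclusion, processed when the task is reached at runtime,
# so it can depend on facts or registered results.
- name: Include the upgrade steps when requested
  include_tasks: upgrade_steps.yml
  when: run_upgrade | default(false) | bool

# Static import, resolved when the playbook is parsed.
- name: Import the validation tasks
  import_tasks: validate.yml
```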
Change-Id: I32198527a084d35f8a2c91e3e7d3f32b6fbe9e1e
(cherry picked from commit d67e6e7166)
(cherry picked from commit fa9b7897d9)
Include has some unintuitive behaviours depending on whether it runs
statically or dynamically, and in a play or a playbook context.
In an effort to clarify these behaviours, move to the new set of
modules: include_tasks, include_role, import_playbook, import_tasks.
Change-Id: I33018bcc8f4798f33f73e1aad47419d8094269c8
(cherry picked from commit 6e9762f3ec)
(cherry picked from commit 99598fe4e2)
Include has some unintuitive behaviours depending on whether it runs
statically or dynamically, and in a play or a playbook context.
In an effort to clarify these behaviours, move to the new set of
modules: include_tasks, include_role, import_playbook, import_tasks.
Change-Id: I08e9abfc9a39a4ca50e5c747f65b2953d34ccbfa
(cherry picked from commit 571d6b0b5b)
(cherry picked from commit d98add8534)
To mimic a real-life environment during update/upgrade/ffwd
we expose an 'HTTP-live' test.
This test asserts that a web server running in the overcloud is
reachable via its FIP and that the content it serves is accessible.
To access web server(s) running in the overcloud, ports 80 and 443
have to be added to the security group(s).
Initial work done in https://review.openstack.org/#/c/547220/
Co-Authored-By: Marius Cornea <mcornea@redhat.com>
Change-Id: I818949e0e498d5cd8da4640bda2be99aafd1aa76
(cherry picked from commit 8e55883fd0)