This change moves docker services from puppet to the deployment directory.
Change-Id: I11a34708ee91f5b5928d7c647c83e95ca1b01cae
Related-Blueprint: services-yaml-flattening
The container-registry role is idempotent in the sense that the
docker service is restarted only if some configuration value has
changed.
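As a sketch of how that idempotency typically works in an Ansible role (task and file names here are illustrative, not quoted from ansible-role-container-registry): configuration tasks notify a handler, so the service is restarted only when a change is actually written.

```yaml
# Hypothetical sketch: the restart handler fires only when the template
# task reports a change, so unchanged configuration causes no restart.
- name: Write docker daemon configuration
  template:
    src: daemon.json.j2
    dest: /etc/docker/daemon.json
  notify: Restart docker

# handlers section of the role
- name: Restart docker
  service:
    name: docker
    state: restarted
```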
During the upgrade, host_prep_tasks are run, and if the new
templates bring some configuration change then the Docker service
gets restarted. The issue is the point at which it gets restarted:
after the upgrade_tasks have already run and prior to the
deploy_tasks. This causes issues with Pacemaker-handled resources.
For that reason, we include the very same task run in host_prep_tasks
in the upgrade_tasks of the docker and docker-registry services,
forcing the Docker service reconfiguration to happen during
upgrade_tasks instead of at a later point.
Closes-Bug: #1807418
Change-Id: I5e6ca987c01ff72a3c7e8900f9572024521164de
When SELinuxMode is set to enforcing in the deployment, configure
Docker to enable SELinux. This wires the container_registry_selinux
variable in ansible-role-container-registry, which enables
--selinux-enabled when set to True.
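For illustration, enabling this from an environment file might look like the following (a sketch; the value shown follows the usual TripleO convention and is not quoted from this change):

```yaml
parameter_defaults:
  # Enforcing SELinux on the host implies --selinux-enabled for dockerd
  SELinuxMode: enforcing
```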
Change-Id: Ic030ecbe8b6719ba45cb7c27c6cf44bab14fed88
When installing OpenShift by means of TripleO, after
the initial docker configuration, openshift-ansible
also adds several parameters there.
Then, if we want to remove a single node, a stack update is
performed, which returns the configuration to its original state. In
other words, it removes all parameters added by openshift-ansible,
which breaks OpenShift.
This commit adds the ability to disable reconfiguration of
docker at the time of stack update for all roles associated
with OpenShift.
Closes-Bug: #1804790
Depends-On: I0bcaeea9cd24ab35a81d8c3d6fc3a384c1e4c3c2
Change-Id: If202be5d27d81672e39cbe21867459d277220e23
Add a ContainerCli parameter, defaulting to docker. Possible values:
podman or docker (default).
Deprecate DockerAdditionalSockets so it does nothing for podman.
The nested podman CLI replaces docker sockets. Only bind-mount
/var/lib/openstack for the neutron/ovn agents for docker.
Support debug messages for the Neutron/OVN wrappers, controlled via
NeutronWrapperDebug and OVNWrapperDebug (both default to False), or
globally via Debug.
Make the wrapper containers managed by their parent processes and
not forcibly exited/removed when the parent container restarts.
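A hypothetical environment file exercising these parameters might look like this (values illustrative, parameter names as described above):

```yaml
parameter_defaults:
  ContainerCli: podman        # default: docker
  NeutronWrapperDebug: true   # defaults to false
  OVNWrapperDebug: true       # defaults to false; Debug flips both globally
```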
Background for podman CLI replacing the docker socket:
We'll use 'nsenter -m -n -p -t 1 podman' in wrappers
to execute podman in the same namespaces as on the host,
and to NOT bind-mount the world for that, like:
- /sys/fs/cgroup:/sys/fs/cgroup
- /run/libpod:/run/libpod
- /run/containers:/run/containers
- /run/runc:/run/runc
- /run/runc-ctrs:/run/runc-ctrs
- /var/lib/containers:/var/lib/containers
- /etc/containers:/etc/containers:ro
- /usr/bin/podman:/usr/bin/podman:ro
- /usr/bin/runc:/usr/bin/runc:ro
- /usr/libexec/podman/conmon:/usr/libexec/podman/conmon:ro
- /usr/lib64/libseccomp.so.2:/usr/lib64/libseccomp.so.2:ro
...
We cannot use chroot /host instead, as there are more bind-mounts to
use outside of the /host chroot. Maybe varlink is a good replacement
for all of that, but it's not there yet.
Change-Id: I055fb7a5fd20932c5bee665bb96678f3ae92bffe
Signed-off-by: Bogdan Dobrelya <bdobreli@redhat.com>
This is required to fix bug #1800958 so that DockerRegistryMirror is
available to make mirror requests during prepare.
Change-Id: If896c22bf449a3ac91ca363648f84dd5b9aef227
This change moves docker and docker-distribution installation from
step 1 to host_prep_tasks, then runs container prepare as
external_deploy_tasks.
Container image prepare needs to run before any steps which require
container images, and it always needs to run on the undercloud because
it only ever populates the undercloud registry. Docker needs to be
installed in host_prep_tasks, since that runs just before
external_deploy_tasks step 1.
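Schematically, the relevant parts of a service template might then look like the following (a rough sketch of the structure only; the prepare command and task names are illustrative, not the actual template):

```yaml
outputs:
  role_data:
    value:
      service_name: docker
      host_prep_tasks:
        # Runs on every host, before external_deploy_tasks step 1
        - name: Install docker packages
          package:
            name: docker
            state: present
      external_deploy_tasks:
        # Runs on the undercloud; populates the undercloud registry
        - name: Run container image prepare
          command: openstack overcloud container image prepare
          when: step|int == 1
```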
The aim is to move these prepare tasks into their own service file so
they can be run during undercloud install and overcloud deploy, as an
alternative to having a dedicated mistral action for the overcloud
prepare.
Change-Id: I2c82b6829e574424067130d1a369ff30030fb4bc
Blueprint: container-prepare-workflow
Problem: RHEL and CentOS 8 will deprecate the usage of Yum.
From the DNF release notes:
  "DNF is the next upcoming major version of yum, a package manager
  for RPM-based Linux distributions. It roughly maintains CLI
  compatibility with YUM and defines a strict API for extensions."
Solution: use the "package" Ansible module instead of "yum". The
"package" module is smarter when it comes to detecting which package
manager runs on the system. The goal of this patch is to support both
yum and dnf (dnf will be the default in RHEL/CentOS 8) from a single
Ansible module.
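The resulting task shape, using the generic "package" module (package names taken from this series; the task name is illustrative):

```yaml
# "package" delegates to yum or dnf depending on what the host runs
- name: Install docker packages
  package:
    name:
      - docker
      - docker-distribution
    state: present
```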
Change-Id: I8e67d6f053e8790fdd0eb52a42035dca3051999e
OVN metadata agent uses haproxy as part of its implementation.
Running it in a separate container prevents dataplane breakages
(i.e. restarting VMs or spawning new ones) on agent restart/stop.
This patch triggers the creation of such a sidecar container and
the mounting of the haproxy wrapper used to spawn it in a separate
container.
Change-Id: I59e08384080cda0b6c0f03c9ed8fb6f6a5661e6b
Related-Bug: #1749209
Signed-off-by: Daniel Alvarez <dalvarez@redhat.com>
Work around https://github.com/ansible/ansible/issues/42621
Include ansible-role-container-registry roles/tasks
with handlers having the right variables scope.
Closes-Bug: #1781198
Change-Id: I26cc07aa05912c3e84d59003686eae210e924a16
Signed-off-by: Bogdan Dobrelya <bdobreli@redhat.com>
The default docker0 bridge should normally be given a
value that does not conflict with any of the existing
networks' CIDR ranges.
If there is a conflict with the default value `172.31.0.1/24`,
allow users to alter the docker service startup ``--bip``
option via ``DockerNetworkOptions``.
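For example, to move the bridge off a conflicting range (the address is illustrative, and passing the option as a plain string is an assumption based on the parameter's description):

```yaml
parameter_defaults:
  # Overrides the default docker0 bridge IP/CIDR
  DockerNetworkOptions: --bip=10.255.255.1/24
```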
Change-Id: I9b3e729ba48811415106c9fa460cd5a677067fb7
Signed-off-by: Bogdan Dobrelya <bdobreli@redhat.com>
Deploy Docker with Ansible instead of Puppet so later we will be able
to prepare the registry before deploying any containerized service
and do tasks in the middle like updating containers.
Remove the Puppet run from update_tasks, we'll move these tasks later in
ansible-role-container-registry.
Change-Id: Iee0e08cd48f173a39a6f3a1ea54b29e370d4f334
The new master branch should now point to rocky,
so HOT templates should specify that they might contain features
for the rocky release [1].
Also, this submission updates the yaml validation to use only the
latest heat_version alias. There are cases in which we will need to
set the version for specific templates, i.e. mixed versions, so a
variable is added to assign specific templates to specific
heat_version aliases, avoiding the introduction of errors by
bulk-replacing the old version in new releases.
[1]: https://docs.openstack.org/heat/latest/template_guide/hot_spec.html#rocky
Change-Id: Ib17526d9cc453516d99d4659ee5fa51a5aa7fb4b
The neutron agents use things like dnsmasq and keepalived as part of
their implementation. Running these "subprocesses" in separate
containers prevents dataplane breakages/unnecessary failovers on
agent container restart. This patch triggers the creation and
mounting of wrappers for launching these processes in containers.
Related-Bug: #1749209
Depends-On: Icd4c24ac686d957391548a04722266cefc1bce27
Depends-On: I8d93f4eccde1dc6e55e10399184ee80671355769
Depends-On: Ib2d2ad4960ea34ec9e3fca1eeb322742341f7eb7
Change-Id: Iea53489c916765bcfd88d7d12e6a32e1b6276d81
Via Ic08468854ce92e81cd84bd6c86a6b672b5a9d49b we fixed the problem of
docker being restarted when puppet triggers a change while pacemaker is
up and running. That approach, while more correct than what existed
previously, is still suboptimal because we are stopping all docker
containers even though we don't have to.
Let's detect if applying the profile::base::docker manifest would
introduce any changes and also detect if the docker rpm is going to be
updated. If one of the two conditions is true, we need to stop the
containers.
This way, rerunning the update workflow on a node should be much less
disruptive. Tested this and observed that the first run correctly
stopped the docker containers, whereas subsequent runs did not stop
containers.
Change-Id: I9176da730b0156d06e2a1ef5f2fcc061e2a6abf6
Related-Bug: #1747851
Via Ib3ea6de7f235d2a2d53a6576e0876ab171128b34 we make sure that we
use the --live-restore docker option by default so we can avoid docker
service restarts bringing down the whole control-plane. While this fixes
new deployments, it leaves a problematic window open in case an operator
already deployed pike and wants to do a minor update. In such case, the
change mentioned above will fix things, but might be disruptive at the
very first call: that's because the first docker restart to push the
--live-restore option happens while the cluster is up, which brings
down all containers at the wrong time (i.e. without pacemaker
knowing about it).
In order to avoid this condition, let's stop all docker containers and
the docker service, let's update the docker package and then let's run
apply puppet for the docker profile (which will make sure the docker
service is started). This simple approach avoids any potential docker
service restarts due to puppet changes.
It should be safe to do this because at step2 we are guaranteed that the
cluster is down on that node and so the API services will not be
reachable anyway. This way before bringing the cluster back up we have
a docker service which is running with the live-restore option and will
be more resilient in the face of docker service restarts.
It can be argued that we should stop all containers and restart docker
only when there is either a docker package update or if puppet will trigger
a restart. Due to the puppet bug [1] it becomes rather hackish to detect
if puppet would restart the resource before actually running it.
That is one reason why in this pass we do it all the time, the other
reason being that paunch might trigger a restart of most services
anyway, so there is not that much point in protecting ourselves in this
part of the code. The control plane in any case is usually largely
unaffected by this, since pacemaker will move all the resources to the
other nodes. In the case of a compute node an operator should be aware
that services might be restarted and that critical workloads should be
migrated to other compute nodes before a minor update.
Updated 7 different controller nodes with this patch and they all
worked correctly.
NB: we did hit a race with older docker 1.12 versions see RHBZ#1545356.
1.13 versions were fine.
Co-Authored-By: Damien Ciabrini <dciabrin@redhat.com>
[1] https://tickets.puppetlabs.com/browse/PUP-686
Change-Id: Ic08468854ce92e81cd84bd6c86a6b672b5a9d49b
Related-Bug: #1747851
See context here: Ia5cc7b34ebee8cf2f49300ce23050370d5f1038a
This user will be useful for containerized undercloud, to maintain
parity with what was done in instack-undercloud.
Depends-On: Ia5cc7b34ebee8cf2f49300ce23050370d5f1038a
Depends-On: Ifd1bec1262dfbd213810bb2b4d561f47bf010e69
Change-Id: I48ab4a0ba0240e931391602943b471b5b6ec8e80
Via change Ib3ea6de7f235d2a2d53a6576e0876ab171128b34 we added
--live-restore to the default options in the
tripleo::profile::base::docker manifest. This is insufficient, because
we explicitly set the tripleo::profile::base::docker::docker_options
hiera key in the templates. So let's change the defaults there as well.
Change-Id: Ia2d7fe4f3a818c2986a44d8b0effe04d5228301e
Related-Bug: #1747851
This converts "tags: stepN" to "when: step|int == N" for direct
execution as an ansible playbook, with a loop variable 'step'.
The tasks all include the explicit |int cast.
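Illustrating the conversion on a made-up task (the service name is hypothetical):

```yaml
# Before: task selected via tags
- name: Stop the foo service
  service:
    name: foo
    state: stopped
  tags: step2

# After: task gated on the loop variable
- name: Stop the foo service
  service:
    name: foo
    state: stopped
  when: step|int == 2
```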
This also adds a set_fact task for handling the package removal
with the UpgradeRemovePackages parameter (no change to the interface).
The yaml-validate script now also checks for duplicate 'when:'
statements.
Q upgrade spec @ Ibde21e6efae3a7d311bee526d63c5692c4e27b28
Related Blueprint: major-upgrade-workflow
[0]: 394a92f761/tripleo_common/utils/config.py (L141)
Change-Id: I6adc5619a28099f4e241351b63377f1e96933810
This patch exposes puppet_tripleo's docker_options
in the tripleo-heat-templates.
Change-Id: I1b48b2a25dfa5afc3d2e4e4c8f0593e03ead3907
Closes-bug: #1715134
Implement a mechanism to enable docker service debug logging.
If DockerDebug is unset, it defaults to the normal Debug parameter
setting.
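In an environment file this could look like the following sketch (values illustrative):

```yaml
parameter_defaults:
  Debug: false
  DockerDebug: true   # overrides Debug for the docker service only
```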
Change-Id: I4f4627c7d8e90121c1262b2518b02989f5aaed18
We want to use this for the containerized undercloud to
be able to consume an in-rack registry mirror for
our CI jobs.
Change-Id: Ia0a2b4a2ddd99c9ee9b71875b144824aa7543da1
This was previously conflicting with the InternalApiNetCidr value in
environments/network-environment.yaml.
Change-Id: I3f1cb6f056fb19a1ba93d1076191abe7aca4fa21
Depends-On: Ie803b33c93b931f7fefb87b6833eb22fd59cd92d
Closes-Bug: #1726773
We allow using multiple registries (e.g. for OpenStack vs. Ceph
container images). We should allow it also in the insecure registry
configuration.
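After this change, the insecure registry configuration can be given as a list, assuming the DockerInsecureRegistryAddress parameter used elsewhere in these templates (addresses illustrative):

```yaml
parameter_defaults:
  DockerInsecureRegistryAddress:
    - 192.168.24.1:8787            # e.g. OpenStack container images
    - registry.example.local:8787  # e.g. Ceph container images
```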
Change-Id: Icf4a51baf2a230b3fa0d5ced0e9cd1983cd93fb0
Closes-Bug: #1709310
Depends-On: I5cddd20a123a85516577bde1b793a30d43171285
This patch removes more of the DockerNamespace references as part
of the cleanup/reorg of the container configuration patches.
This also adds a centos-rdo environment file for use with
the new interface. This file was generated with the command
"openstack overcloud container image prepare"
Depends-On: I729fa00175cb36b02b882d729aae5ff06d0e3fbc
Depends-On: I292162d66880278de09f7acbdbf02e2312c5bb2b
Co-Authored-By: Dan Prince <dprince@redhat.com>
Change-Id: Ice7b57c25248634240a6dd6e14e6d411e7806326
Makes it possible to resolve network subnets within a service
template; the data is transported into a new property, ServiceData,
wired into every service, which hopefully is generic enough to
be extended in the future to transport more data.
Data can be consumed in service templates to set config values
which need to know the subnet where a daemon operates (for
example the Ceph public vs. cluster network).
Change-Id: I28e21c46f1ef609517175f7e7ee19e28d1c0cba2
When a service is enabled on multiple roles, the parameters for the
service are global. This change adds an option to provide
role-specific parameters to services and other templates.
Two new parameters, RoleName and RoleParameters, are added to the
service template. RoleName provides the name of the role on which the
current instance of the service is being applied. RoleParameters
provides the list of parameters which are configured specifically for
that role in the environment file, like below:
parameter_defaults:
  # Default value applied to all roles
  NovaReservedHostMemory: 2048
  ComputeDpdkParameters:
    # Applied only to the ComputeDpdk role
    NovaReservedHostMemory: 4096
In the above sample, the cluster contains two roles, Compute and
ComputeDpdk. The values of ComputeDpdkParameters will be passed to
the templates as RoleParameters while creating the stack for the
ComputeDpdk role. A parameter which supports role-specific
configuration should be looked up first in the RoleParameters list;
if not found, the default (for all roles) should be used.
Implements: blueprint tripleo-derive-parameters
Change-Id: I72376a803ec6b2ed93903cc0c95a6ffce718b6dc
This uses a puppet-tripleo profile to configure and start docker
in step1 of the deployment, which is before we attempt to deploy
any containers (see docker/services/README.rst#docker-steps)
This enables existing environments on upgrade to configure things
correctly, without using docker/firstboot/setup_docker_host.sh
- the firstboot approach may still be needed for atomic, but for
environments where we can run puppet on the host, this integrates
more cleanly with our existing architecture, I think.
Depends-On: Id8add1e8a0ecaedb7d8a7dc9ba3747c1ac3b8eea
Change-Id: If4ffe21579bcb2770f4e5a96be7960b52927a27b