With the update of ansible-lint to version >=6.0.0, a lot of new
linters were added that are enabled by default. In order to comply
with the linter rules, we're applying changes to the role.
This is a follow-up change to [1].
[1] https://review.opendev.org/c/openstack/openstack-ansible-lxc_hosts/+/888180
Change-Id: I2564e3dcb2efad8f6a2ed21bec61668c1b6f6209
We also leverage systemd-networkd for managing lxc-net and replace
the custom service template for the lxc-dnsmasq service with our
systemd-service role. These changes are tightly coupled, so
it's quite hard to split them into different patchsets.
Depends-On: https://review.opendev.org/c/openstack/ansible-role-systemd_service/+/861350
Change-Id: I5ac99e2b6c6e6ccd9da18ae68e1f8801f95f4f4e
OpenStack-Ansible does not maintain support for deploying on Gentoo,
so we can simplify this Ansible role.
Change-Id: If2a63a2743714745e0f0b0eea2ee3d5b8d4c9a35
To run a larger number of ansible forks, we need to increase the
ssh MaxSessions parameter for lxc hosts, since
all connections to lxc containers go through the hosts.
Depends-On: https://review.opendev.org/758399
Change-Id: Ib3e850ba79658a42995cd782a11342aca6858342
There are a few manual workarounds that we put in place to
work around old versions of machinectl. However, we don't actually
leverage those, and they seem to be causing a dbus restart, which
causes extra problems.
This patch removes those workarounds in order to prevent restarting
dbus, which causes the system to start timing out on systemd-logind.
Change-Id: I86483225754a5b1c6030ef21e2c0cdf2cd908c3b
Closes-Bug: #1807405
In https://review.openstack.org/588962 the implementation
of the apt key store copy into the container was changed
for bionic, but left alone for xenial. This patch makes
the approach uniform across both distributions.
Change-Id: I79f49fd02be3bbee5f22cdde000b19578167e3ca
With more recent versions of ansible, tests should be written with
"is" instead of the "|" sign.
This patch updates the role to use the "is" syntax.
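
As a rough illustration of the syntax change described above, the sketch
below rewrites expressions such as "result | failed" into "result is failed".
The list of test names and the helper name filter_to_test are assumptions
made for this example only; it is not the tooling used for this patch, and
it leaves ordinary filters such as "default" untouched.

```python
import re

# Assumption: a small, illustrative set of test names, not an exhaustive list.
TEST_NAMES = ("success", "succeeded", "failed", "changed", "skipped")

# Matches "| <test name>" with optional surrounding whitespace.
_PATTERN = re.compile(r"\s*\|\s*(" + "|".join(TEST_NAMES) + r")\b")


def filter_to_test(expression: str) -> str:
    """Rewrite 'result | failed' style expressions to 'result is failed'."""
    return _PATTERN.sub(lambda m: " is " + m.group(1), expression)


if __name__ == "__main__":
    print(filter_to_test("apt_result | failed"))
    print(filter_to_test("item | default('x')"))
```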
Change-Id: I7ba6ca7d7c8a9bbaf85933370d0ced9931f9a34b
Now that bionic testing has been added to the tests repos, we can
start testing it in this repo.
cgmanager isn't in bionic, and therefore is removed.
The service module isn't in bionic, and therefore it's been renamed to
"systemd".
The apparmor setup we were doing was breaking the required apparmor
profiles. While this worked in xenial, it breaks bionic. To fix this
we're just disabling the apparmor profiles instead of trying to
augment them through block file changes.
Depends-On: https://review.openstack.org/#/c/566959/
Change-Id: Ie4bca80d0dba7b0da0b5829b91cd6d815894aeaa
Co-Authored-By: Kevin Carter <kevin.carter@rackspace.com>
The machinectl default options, while functional, could be tuned for
better overall performance. This change adds several options which will
ensure container workloads are using the least amount of storage with the
best possible performance.
For more information on the options being used see:
* https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs(5)#MOUNT_OPTIONS
All of the "machines" mount procedures have been moved into a unified
volume task file. This was done to ensure a consistent experience across
our supported distros. To ensure any new options are non-disruptive, the
mount handler has been changed to use "reload-or-restart" which will first
try to reload a mount instead of restarting it.
Change-Id: Ia962fd4c5bb2a73ddd884d3bb3837c47b43d6903
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
For a very long time we've been parsing and using the lxc images as
provided by upstream lxc. While these images are functional, they are by
no means optimal. In general they're quite a bit larger than they need
to be and contain a lot of little sharp edges that have cut us over
the years. This change removes all of the lxc image cache parsing and
meta-data linking and simply downloads the rootfs from a given url. To
maintain compatibility with the legacy images, a script has been created
to parse the image index and return the legacy image url.
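
A minimal sketch of what such an index lookup could look like is shown
below. It is not the script added by this change. It assumes the index is
plain text with one "distro;release;arch;variant;build;path" entry per line,
and that the rootfs is published as "rootfs.tar.xz" under that path; the
function name, the base url and the file name are all hypothetical.

```python
from typing import Optional


def legacy_image_url(index_text: str, base_url: str, distro: str,
                     release: str, arch: str,
                     variant: str = "default") -> Optional[str]:
    """Return the rootfs url for the last matching index entry, or None."""
    match = None
    for line in index_text.splitlines():
        # Assumed line format: distro;release;arch;variant;build;path
        fields = line.strip().split(";")
        if len(fields) != 6:
            continue  # skip blank or malformed lines
        if fields[:4] == [distro, release, arch, variant]:
            match = fields  # keep the last matching entry in the index
    if match is None:
        return None
    # Assumption: the tarball is named rootfs.tar.xz under the entry path.
    return base_url.rstrip("/") + "/" + match[5].strip("/") + "/rootfs.tar.xz"


if __name__ == "__main__":
    sample = (
        "ubuntu;xenial;amd64;default;20180101_07:42;"
        "/images/ubuntu/xenial/amd64/default/20180101_07:42/\n"
    )
    print(legacy_image_url(sample, "https://images.example.org",
                           "ubuntu", "xenial", "amd64"))
```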
The result of this change:
* Access to a smaller, more optimal base image which is well known by the
corresponding communities.
* Deployers now have the ability to set and forget the download url for an
internal image instead of having to create a cache infrastructure
compatible with the lxc download template.
* Any rootfs tarball will work as an image.
* Fewer tasks are executed and less memory is consumed resulting in faster
deployment times.
* The base cache has a uniform meta-data setup giving all container
types the same access to config, devices, and templating.
Change-Id: I1775e775bbb7fe86bdffdd8296c2cff5ebc5bac8
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
The current lxc meta-data process is one where we download an archive
from the upstream lxc images and store it locally on the host. While the
archive is small, this is a process that can break due to transient
networking issues and is an external dependency that we don't need.
The meta-data for the containers we build is all the same between
distros so it's easy to replicate and maintain as a local dependency.
This change creates a templates meta-data folder and stores our
required meta-data items within it. With this change we'll ensure
all containers are built with the same capabilities without requiring
access to an upstream repo and will improve the general speed of
deployment due to the task simplification and removal of an external
dependency.
Change-Id: I999d7068ce05645c477408fbd40556427c202a40
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
The machinectl image cache is currently set to 16G by default. If
multiple container images are imported into the cache this may be too
small. This change sets the cache to "64G" by default, giving
the cache more room to grow.
This change also disables the quota system once the limit has been set.
The option `lxc_host_machine_quota_disabled` has been added to disable or
enable the quota system as needed. This is done after the default limit has
been set so an adequately sized sparse file can be created should it not
already exist.
> More documentation can be seen here [0] with regard to the set-limit
option.
Because we support both modern and older systemd, the cache prep tasks
for old systemd have been updated so that deployers using earlier
versions of systemd can benefit from the ability to grow an existing
cache via playbook run.
[0] https://www.freedesktop.org/software/systemd/man/machinectl.html#set-limit%20%5BNAME%5D%20BYTES
Closes-Bug: #1745361
Change-Id: I85fefc6ce186bb6808ac37a9ea79a50e29671115
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
These changes further optimise the lxc_host role so that it's using more
of the built in modules and making better use of handlers.
Moving the dnsmasq process to a unit file gives operators the ability to
restart the dnsmasq process if there's an issue with the service. It
also ensures the service stays running as systemd will take better care
of the service by isolating it within a specific cgroup, ensuring good
reporting and memory management, and providing the ability to recover
from failures in an automated way.
Closes-Bug: #1518485
Change-Id: I42d0caa3b12e70a3601c30051eefc067e81a71bb
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
The LXC host role can be tuned up for better overall efficiency.
Highlights:
* Move async wait to a later position for role performance. The
async wait we're doing can be moved elsewhere in the role so
that we're able to do more in parallel. This change simply moves
the async wait to a position just before it's required.
* Move container creation tasks into their own sub-files which are
accessed using dynamic routing.
* Several syntactic items were cleaned up.
* All of the basic cache cleanup has been moved to handlers.
Closes-Bug: #1718979
Change-Id: I26eae11be8f7d5b691fbccd3d2fe1cfb21b8cf55
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
systemd-228 introduced DefaultTasksMax, which is used to control
the default TasksMax= setting for services and scopes running on the
system. (TasksMax= is the primary setting that exposes the "pids"
cgroup controller on systemd and was introduced in the previous
systemd release.) The setting now defaults to 512, which means
services that are not explicitly configured otherwise will only
be able to create 512 processes or threads at maximum, from this
version on. However, the 512 limit seems too strict and sometimes
leads to failures like the following one on busy containers:
==> opensuse422: fatal: [container3]: FAILED! => {"changed": false, "cmd": "/usr/sbin/rabbitmqctl -q -n '' list_user_permissions guest", "failed": true, "msg": "/usr/sbin/rabbitmqctl: fork: retry: No child processes\n/usr/lib64/rabbitmq/lib/rabbitmq_server-3.6.6//sbin/rabbitmq-env: fork: retry: Resource temporarily unavailable\n/usr/lib64/rabbitmq/lib/rabbitmq_server-3.6.6//sbin/rabbitmq-env: fork: retry: No child processes\n/usr/lib64/rabbitmq/lib/rabbitmq_server-3.6.6//sbin/rabbitmq-env: fork: retry: No child processes\nFailed to create thread: Resource temporarily unavailable (11)\r\nAborted (core dumped)", "rc": 134, "stderr": "/usr/sbin/rabbitmqctl: fork: retry: No child processes\n/usr/lib64/rabbitmq/lib/rabbitmq_server-3.6.6//sbin/rabbitmq-env: fork: retry: Resource temporarily unavailable\n/usr/lib64/rabbitmq/lib/rabbitmq_server-3.6.6//sbin/rabbitmq-env: fork: retry: No child processes\n/usr/lib64/rabbitmq/lib/rabbitmq_server-3.6.6//sbin/rabbitmq-env: fork: retry: No child processes\nFailed to create thread: Resource temporarily unavailable (11)\r\nAborted (core dumped)\n", "stderr_lines": ["/usr/sbin/rabbitmqctl: fork: retry: No child processes", "/usr/lib64/rabbitmq/lib/rabbitmq_server-3.6.6//sbin/rabbitmq-env: fork: retry: Resource temporarily unavailable", "/usr/lib64/rabbitmq/lib/rabbitmq_server-3.6.6//sbin/rabbitmq-env: fork: retry: No child processes", "/usr/lib64/rabbitmq/lib/rabbitmq_server-3.6.6//sbin/rabbitmq-env: fork: retry: No child processes", "Failed to create thread: Resource temporarily unavailable (11)", "Aborted (core dumped)"], "stdout": "", "stdout_lines": []}
and with messages in the kernel log such as:
[ 2925.999021] cgroup: fork rejected by pids controller in /init.scope/lxc/container1
[ 3083.704049] cgroup: fork rejected by pids controller in /init.scope/lxc/container2
As we see, even though /init.scope/lxc/container1 has pids.max set to
'max', /init.scope has pids.max set to 512, and in cgroups the lowest
boundary in the hierarchy always applies:
~> cat /sys/fs/cgroup/pids/init.scope/lxc/container1/pids.max
max
~> cat /sys/fs/cgroup/pids/init.scope/pids.max
512
As a result, the 512 limit is enforced.
As such, we add a new variable to make this limit configurable. The
default limit has now been increased to 8192.
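
To illustrate why a leaf cgroup set to 'max' is still capped, here is a
small sketch of the "lowest boundary wins" rule. The helper name
effective_pids_max is hypothetical and the values are passed in as strings
rather than read from /sys/fs/cgroup, so this only models the behaviour
described above.

```python
from typing import Iterable, Optional


def effective_pids_max(limits_along_path: Iterable[str]) -> Optional[int]:
    """Return the lowest numeric pids.max on the path from root to leaf.

    Each value is the content of a pids.max file, either a number or 'max'.
    None is returned when every level is 'max', meaning no limit applies.
    """
    numeric = [int(v) for v in limits_along_path if v.strip() != "max"]
    return min(numeric) if numeric else None


if __name__ == "__main__":
    # init.scope is limited to 512, the container itself is set to 'max'.
    print(effective_pids_max(["512", "max", "max"]))
    # After raising the init.scope limit to 8192.
    print(effective_pids_max(["8192", "max", "max"]))
```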
Change-Id: I8b4143aac84d4c795cab9c0d978c9a97ebea1793
The 'Reload apparmor' handler can fail if the apparmor service is not
already in a running state. Add an additional handler to ensure that
apparmor is started and enabled on boot.
Change-Id: If2752d69beb2c646a64f2ca02ce39a0d4161a5b5
Loading the lxc-openstack profile into apparmor is done by reloading the
service, so the redundant loading handler for lxc-openstack is removed.
The reloading handler is flushed right away in case execution is interrupted.
Change-Id: I7a0e9d886808e0949a0e8301c6a5ea2994c6cd49
closes-bug: 1620757
The LXC download template sets hostnames within containers by an
in-place string replacement of 'LXC_NAME' in /etc/hosts and
/etc/hostname with the given container name.
Create the base cache container image with the name 'LXC_NAME' so that
this in-place text replacement happens and containers are created
with the expected hostnames.
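
The replacement described above can be modelled with the sketch below. It
is an illustration only, not the download template's code; the function
name and the sample file contents are assumptions. It also shows why the
base image has to be named 'LXC_NAME': with any other name there is no
placeholder left to replace.

```python
from typing import Dict


def render_hostname_files(files: Dict[str, str], container_name: str,
                          placeholder: str = "LXC_NAME") -> Dict[str, str]:
    """Replace the placeholder with the container name in each file body."""
    return {path: text.replace(placeholder, container_name)
            for path, text in files.items()}


if __name__ == "__main__":
    # Base image created with the name 'LXC_NAME': replacement takes effect.
    cached = {"/etc/hostname": "LXC_NAME\n",
              "/etc/hosts": "127.0.1.1 LXC_NAME\n"}
    print(render_hostname_files(cached, "container1"))

    # Base image created with another name: nothing is replaced.
    stale = {"/etc/hostname": "base-cache\n"}
    print(render_hostname_files(stale, "container1"))
```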
Change-Id: I851f29d8feebc41e9bcbc1866bba1782c6727d6a
This change updates the lxc-host setup role to build the lxc cache using the
download template based on default images found here: [0]. These images are
upstream builds from the greater LXC/D community.
This update adds support for Ubuntu 14.04, 16.04 and RHEL/CentOS 7 container
types and the cache will be generated from the host Operating system.
[0] - https://images.linuxcontainers.org/
Change-Id: Ie13be2322d28178760481c59805101d6aeef4f36
Co-Authored-By: Jesse Pretorius <jesse.pretorius@rackspace.co.uk>
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>
This change moves the lxc_host role out of the main
repository and into its own standalone repository.
Items within this change:
* The role has been updated to ensure it runs standalone.
* Tests added to the role within tox.
* Functional tests added to the role that can either be run
via the run_tests.sh script or using tox.
* dev requirements have been updated for testing use cases.
* Docs added to both the README.rst file as well as the docs
folder.
Signed-off-by: Kevin Carter <kevin.carter@rackspace.com>