Commit Graph

185 Commits

Author SHA1 Message Date
Emilien Macchi d0e81c22ca Retire Paunch
Change-Id: I8b2e9e20e477f2f00ad922f03e82114ff13212fe
2020-06-15 10:07:12 -04:00
Cédric Jeanneret 324d5f67df Allow to pass a list for "security_opt" param
Since we can put multiple "--security-opt" in the command line, let's
reflect it in paunch!

Change-Id: I6a60fd12edb692d386e542f0635253fde92ec6d5
2020-06-11 08:41:51 +02:00
Bogdan Dobrelya a838a69c4f Fix return results on cmd failure and error msgs
Return an empty list if container_names fails since that result is used
in a loop for many places.

Add error messages for better observability as well.

Change-Id: Ia79cbf74faa4d8190d2280757403fd2e5b67fbe0
Co-authored-by: Alex Schultz <aschultz@redhat.com>
Signed-off-by: Bogdan Dobrelya <bdobreli@redhat.com>
2020-05-15 17:36:05 +02:00
Emilien Macchi 3349a62393 Officially deprecate Paunch
Paunch has been replaced by tripleo_container_manage role in
tripleo-ansible  during Ussuri cycle.

It is not tested anymore in this version and will be removed one day.

It it strongly encouraged to switched to the Ansible role to manage
containers; which should be the default if you deploy TripleO from
master at this time.

If you get the warning, it's possible that a parameter (EnablePaunch) is
set to True; while the default was switched.

Paunch will remain supported in Ussuri and backward, but not in Victoria
and forward.

Change-Id: I2be96c5929f0602296c8f2cebb65b755a2178195
2020-04-16 17:38:41 -04:00
Bogdan Dobrelya 21c32c8c60 Process labels before building container run args
The consistent way of building container run arguments is processing
labels (possibly multiple) first.

Fix 'debug' actions 'run' and 'print-cmd' to not falling behind of that
pattern already proved working well for containers 'apply'.

Change-Id: I771d086cc75695d7ce2db35c852bb35bb4c59708
Related-Bug: #1798362
Signed-off-by: Bogdan Dobrelya <bdobreli@redhat.com>
2020-04-09 14:26:26 +02:00
Emilien Macchi 9b62765121 Do not set cpuset-cpus if cconfig['cpuset_cpus'] == 'all'
In the case of nova_libvirt container, we want to use all CPUs that are
reported online.
Rather than computing the list with Python (which has proven to be
problematic on PPC), let the container engine figuring it out by itself
like it was the case before.

Change-Id: I5d1b88c90dbc4114d996008a407cd1dd9a6eb9da
Closes-Bug: #1868135
2020-03-19 13:00:41 -04:00
Emilien Macchi 7bb9471423 Don't set cpuset_cpus if empty
If cpuset_cpus is set to '', we don't want to configure it, just skip
it.

Closes-Bug: #1867357
Change-Id: I615c798bdbcc0115f0ca9642969193ceada9b9cd
2020-03-13 11:17:30 -04:00
Zuul 0fe6a8e159 Merge "Cleanup containers in the same loop as they are created" 2020-03-05 14:32:30 +00:00
Emilien Macchi eb3d3b75bb Cleanup containers in the same loop as they are created
Split delete_missing_and_updated() into 2 methods:

  * delete_missing(), that will remove all containers installed on the
    host but missing from the given config. This runs outside of the
    loop, once.
  * delete_updated(), that will remove a container installed on the host
    (if present), that is part of the config, if config_data changed or
    didn't exist. It runs within the create loop, so the downtime
    between a container removal and creation should be shorter than
    before.
  * make delete_missing(), delete_updated() and rename_containers()
    returning True, if any container has been touched by either. Use
    that flag in order to keep the container_names contents always
    actual.
  * in order to make that cached container_names working and saving off
    extra podman ps/inspect calls, rework it to return a list instead
    of an iterator. There is no huge lists of containers, iterators buy
    us nothing here, while podman CLI calls are the more expensive
    thing and we optimize the latter instead.

Co-Authored-By: Bogdan Dobrelya <bdobreli@redhat.com>
Change-Id: I3c6d0670e11d035287d12f4207489a13e0891943
Closes-Bug: #1862954
2020-02-21 10:20:46 +01:00
Bogdan Dobrelya 3006b547cb Fallback to a rm -f action for podman
There is a podman specific behavior that cannot be reproduced with
docker:

$ docker run -itd --privileged --name=foo busybox sleep 1000
$ docker exec -it foo sh -c "sleep 180"
$ kill -STOP <that process>
$ docker stop foo
$ docker rm foo

works w/o errors, while podman fails the latter step:
Error: cannot remove container <id> has active exec sessions: container
state improper

To make it consistent, add a fallback to re-try it with rm -f.

Related-bug: #1860004
Change-Id: Id387f624078ef874aa902656952582c9c54f3f2e
Signed-off-by: Bogdan Dobrelya <bdobreli@redhat.com>
2020-02-18 15:43:51 +01:00
Emilien Macchi 8489a0cfbe Allow to not cleanup containers that aren't in config
Similar to Ic06ac0f41767ca513c612b1ebb38d2ff92500ea5 :

Relax Paunch and allows an operator to apply a config on a specific
container without removing the other containers in the same config_id.
Thanks to the paunch apply (...) --cleanup=False, if a container is
installed on the host for a given config_id, but not in the given config
anymore; it won't be removed.

The cleanup still happens by default for backward compatibility.

Change-Id: I479eaa3b58c3df091e1b78a01c4fb0595d81b37c
2020-02-11 19:31:08 -05:00
Cédric Jeanneret 3012fe75aa Execute healthchecks as root
Some containers doesn't have the "default" user set to root (which is
good). This lead to healthcheck_port() function to return a message
because the non-root user isn't allowed to call "ss" command as itself.

Ensuring we're running the healthchecks as root will also allow to stop
duplicating some commands, making them faster and smaller for the
system.

This was discovered and discussed on Red Hat bugzilla first, then ported
to Launchpad.

Change-Id: I2e49d4dd5b385237f4f79929c70365424f6fa22d
Closes-Bug: 1860569
Related: https://bugzilla.redhat.com/show_bug.cgi?id=1778881
2020-01-22 16:11:21 +01:00
Bogdan Dobrelya 3813fc7f2b Do not force remove containers
Paunch does docker/podman rm -f, when removes containers. It seems it
returns too early, having some leftovers (like docker service
endpoints) behind or pending for it to be removed later, in case of
big fat deamons.

Long story short, don't do rm -f and allow it to do its job gracefully
and without a hurry.

Change-Id: I346c49cb204f273bd7077ca5153412cda9846534
Closes-Bug: #1860004
Co-authored-by: Sergii Golovatiuk <sgolovat@redhat.com>
Signed-off-by: Bogdan Dobrelya <bdobreli@redhat.com>
2020-01-17 10:47:32 +01:00
Emilien Macchi 0560631b2d load_config: allow the config to be overriden
This will enable an operator using the paunch Ansible module to override
any container parameter e.g. the image, directly from Ansible without
modifying the config that is on the host.

Change-Id: I6639ab859b120aa9349dc72b0d6e7d575be20c7a
2020-01-14 10:48:43 -05:00
Zuul ec16c53227 Merge "Return an error if tripleo-ansible deployed containers" 2020-01-07 18:01:42 +00:00
Cédric Jeanneret 505dc92a63 Add SyslogIdenfier to healthcheck systemd unit
Adding this new field will allow to filter all healthcheck logs using
the Idenfier value.

For instance, using journalctl, you would be able to run this:
`journalctl -t healthcheck_collectd'

It will also allow to get a dedicated file out of (r)syslog if needed.

This is the reflection of Icdc5caf4cedc46291a807c39c0a31c74955a4a74

Change-Id: I6861baa287f2a8288b87be26aacecbcc061cd96f
Closes-Bug: #1856573
2019-12-17 16:09:48 +01:00
Zuul 854de3e5e1 Merge "builder: include environment when running an exec" 2019-12-12 23:41:30 +00:00
Emilien Macchi b5be45067a builder: include environment when running an exec
Some container execs have an environment, let's make sure they are
included when running exec.

Closes-Bug: #1855932
Change-Id: Ic2e2c2d50f5883f7db28768ba215e74bcbf9fd8b
2019-12-11 01:23:12 -05:00
James Slagle 3ab0936c03 Exit 1 if a container fails and return the error
When a container's volumes failed validation, paunch still exited 0.
This caused the deployment to continue running even though not all
containers had been started.

This patch changes the rc to 1 when a container's volumes fail
validation and the container can't be started. The error message is also
returned in stderr so that it's available to the paunch ansible module
and will be seen in the deployment output.

Depends-On: I1f062b8b9f936e6fbf2febf64244e91b59b8ba1b
Change-Id: I67860a79572c0ff4dcaca9ec9597c41f56792fca
Closes-Bug: #1855444
2019-12-09 22:23:11 +00:00
Michele Baldessari 14fa098fcb Do not always needlessly pull the images via podman
Via https://review.opendev.org/#/c/665731/ there was a tentative
reduction of podman inspect calls. The problem was that a podman
pull was performed even when the image already existed due to
a wrong if check, since image_exist() returns true/false and not
the return code of the podman image shell command.

Closes-Bug: #1855128
Co-Authored-By: Damien Ciabrini <dciabrin@redhat.com>

Change-Id: Ia0c71de6a0a1fe243a39dbd7b1458610b81bd6be
2019-12-04 15:57:19 +01:00
Emilien Macchi 3e19521fd4 Return an error if tripleo-ansible deployed containers
For now, we assume that tripleo-ansible & paunch aren't playing well
together. So let's make sure that when containers were deployed with
tripleo-ansible, paunch CLI can't be used anymore.

Depends-On: I722cb8faa3b7eee81b418da83451bf802351dd79
Change-Id: I3f79a42da8798e24fad1a3ccc18efcc26e118c4d
2019-12-03 10:47:38 -05:00
Zuul 43876b01b9 Merge "Add missing ExecReload in container service unit file" 2019-11-29 23:12:19 +00:00
Bogdan Dobrelya 12ccf36b0e Fix action Apply ignoring managed-by arg
From the very beginning (06036fd6db), the
action apply was ignoring the passed --managed-by values and was always
taking defaults ('paunch').

Fix this and provide no upgrade/update impact, which is for whatever
--managed-by value given to paunch, perform all checks and searches
also for that historically "wrong" value 'paunch':

* if a container needs to be (re)started by a new 'managed-by', make sure
  it can be found by the default 'paunch' value as well, then reset its
  managed-by to the desired value.

Closes-bug: #1853812

Change-Id: If129bbc1ff32941d06ff480f26870b10840591e0
Signed-off-by: Bogdan Dobrelya <bdobreli@redhat.com>
2019-11-29 13:14:38 +01:00
Cédric Jeanneret de7e74acd0 Add missing ExecReload in container service unit file
It may happen that we want to just reload the container. Before this
patch, it was a "stop and start", while podman has the "podman kill"
available, accepting the HUP signal.

Doing so allows other automated tools to actually just "reload" the
container as we would do for a standard service.

Change-Id: I5946a36e56dd1f3b49104deaef4ab69bada8e264
2019-11-29 09:22:12 +01:00
Emilien Macchi 68ecaf9061 Fix backward compatibility for old config startup files
In I1cf8923a698d0f6e0b1e00a7985f363a83e914c4, we changed the format for
container-startup-config and now have one JSON file per container, per
step. It'll make it easier to operate containers one by one, instead of
in one big JSON per step.

However this change wasn't 100% backward compatible, and this patch aims
to fix it.

This patch does:

1) Support a directory of configs without container name

The --file argument can now be a directory where the container
configuration file is located.

Example:
paunch debug (...) --file /var/lib/tripleo-config/container-startup-config/step_1

All configs will be returned.

2) Support a directory of config and a container name

Example:
paunch debug (...) --container haproxy --file /var/lib/tripleo-config/container-startup-config/step_1

Only the container config will be returned.

3) Support the old format file without container name

If the user specifies:
--file /var/lib/tripleo-config/hashed-container-startup-config-step_1.json

It'll return all container configs for the JSON files in:
/var/lib/tripleo-config/container-startup-config/step_1/ (directory)

4) Support the old format file with container name

If the user specifies:
--container haproxy --file /var/lib/tripleo-config/hashed-container-startup-config-step_1.json

It'll return the hashed container config file:
/var/lib/tripleo-config/container-startup-config/step_1/hashed-haproxy.json

5) Add support for running paunch with a file + new format

The new format would be:

--container haproxy --file /var/lib/tripleo-config/container-startup-config/step_1/hashed-haproxy.json

The container config would be returned.

Note: if no name is specified, it'll try to guess the name based on the
file name. It'll remove "hashed-' from it in case it was an hashed file.

This patch should resolve all backward compatibility issues so Paunch
can be used with both the new and old format.

Closes-Bug: #1850050
Change-Id: I917679da22fa09614e73053654df6ce181cf98fe
2019-10-31 11:00:03 +01:00
Emilien Macchi b9b3e32b4e list_or_dict_arg: fix the else condition
The "else" was wrong, if the key was empty with no value, it would try
to iterate on something that might not be a list. Let's be sure the
instance is a list.

Change-Id: I98c75e03d78885173d829fa850f35c52c625e6bb
2019-10-15 10:21:31 -04:00
Emilien Macchi cfa2fc51ab builder: allow to pass a dict to 'environment'
To allow to pass container environment as a dict instead of a list.
It's backward compatible, a list can still be passed.

We introduce a new function "list_or_dict_arg" so we can re-use it for
other parameters later if needed.

Change-Id: I85999889d3328dc9d2116b8539ac959b39cb833a
2019-10-07 23:02:36 -04:00
Javier Pena 2af0996fa1 Properly mock cpu_affinity on all unit tests
Two unit tests (test_cont_run_args_validation_true and
test_cont_run_args_validation_false) relied on the machine running
the tests having 8 CPUs. This is not always the case, so let's mock
it, just like we do on tests expecting 4 CPUs.

Change-Id: I98d930d9c7a1a2d54863749d08b97fc9e8867a0e
2019-10-02 10:04:02 +02:00
Zuul 54fc78cf31 Merge "Add unique names support for cont_exec_args method" 2019-10-01 23:54:46 +00:00
Emilien Macchi 5e88de63f5 Add --cpuset-cpus support for both Docker and Podman
Limit the specific CPUs or cores a container can use.

If cpuset-cpus is configured in the container layout, then the value
will be used when running the container cli with --cpuset-cpus.
If 'all' is used as a value, we'll then take all available cpus,
computed by: "0-" + str(psutil.cpu_count()-1)

If unset (default), the cpuset-cpus value is computed by using psutil with a
new function which returns a comma-separated list range of CPUs that a
container can use.

This parameter is particulary useful for NFV:
https://bugzilla.redhat.com/show_bug.cgi?id=1750781
Indeed, for NFV workloads, in order to achieve 0 packet loss, linux processes,
ovs-dpdk (if applicable) and VMs are isolated thanks to kernel args (isolcpus)
and tuned profiles (cpu-partitioning).

Change-Id: I9443ad60affe9c7823b17daa259efee542c6fe22
2019-09-24 03:48:54 +00:00
Zuul 9ed99000d3 Merge "Fix discovering container names" 2019-09-24 01:00:01 +00:00
Zuul e7a12e883c Merge "Improve volume validation" 2019-09-20 06:11:52 +00:00
Emilien Macchi 31b050120b Cleanup useless warnings for podman
Both "cleanup" and "delete" commands are supported when podman is the
container cli. The warnings were probably added when podman didn't
support them but now we have the code in Paunch which can rename
containers, needed by these functions.

Change-Id: I7025cb7fee5362adcba7d8539916a705a0ed2f87
2019-09-17 09:45:21 -04:00
Alex Schultz 3ce8c1ad98 Improve volume validation
We need to support both a source filesystem location and container
volume for the volume mounts. This change adds the ability to validate
that the provided source container is a container volume. Additionally
aligns the validation between the docker/podman methods so they operate
in the same fashion.

Change-Id: I9a55698b04dc2c5f01d776c6bdc2f26d47baf803
Closes-Bug: #1843734
2019-09-13 15:32:34 -06:00
Emilien Macchi 2609ef6870 Add unique names support for cont_exec_args method
In I5617e11f5d315f408d818e1ce47aa68f4a0d777a we switched
container_run_args to run the container cli run into a unique container
name, but we forgot to do it for the container cli execs.

So this patch will run the exec using the delegated container_name if we
can otherwise fall back on the fixed container name.

Co-Authored-By: Bogdan Dobrelya <bdobreli@redhat.com>
Change-Id: I2654148d566f62b3e3620baf84f504113cb7312d
Closes-Bug: #1840992
2019-08-29 18:58:29 +02:00
Bogdan Dobrelya 3dcbe5e68c Fix discovering container names
Make discover_container_name returning None, if the ps command failed
or returned nothing useful.

* For 'run', if no container name has been discovered, use its
  predictable (fixed) container service name.

* For 'exec', also raise an error, if no name has been discovered for
  the fixed/service container. Do not use additional checks as the
  None returned by discover_container_name() already tells us all we
  need to know about the subject container.

Related-Bug: #1839929

Co-Authored-By: Cédric Jeanneret <cjeanner@redhat.com>
Change-Id: I8a495d2c98617bb5edbe13ccf737d6c630eea7ad
Signed-off-by: Bogdan Dobrelya <bdobreli@redhat.com>
2019-08-29 18:55:58 +02:00
Zuul 13b017d9f0 Merge "Fix idempotency on RHEL8" 2019-08-28 21:37:59 +00:00
Emilien Macchi 2ed3e83fde Fix idempotency on RHEL8
On RHEL8, podman is shipped by default with podman-docker which installs
an alias so the operators can still use "docker" CLI, but the command
will be converted to run "podman" instead:

  Emulate Docker CLI using podman. Create /etc/containers/nodocker to quiet msg

We don't want to remove the rpm or create /etc/containers/nodocker so
the operators have the warning and now we are moving to Podman by
default.

So we need to change Paunch to detect if Docker was actually running
before removing docker-managed containers, instead of using 'which
docker', we'll check for the actual docker socket if it exists.

Co-Authored-By: Cédric Jeanneret <cjeanner@redhat.com>
Change-Id: I628c2fe7e962412f0d7e1c3e044d02c5a2fabd77
2019-08-28 10:23:28 -04:00
Emilien Macchi bd400ccd81 Revert "Handle defined containers that are stopped."
It is causing some containers (memcached) to be re-created at
every run, with the unique name. We need to rework the code
as well as it's breaking pacemaker-managed containers.
This reverts commit 85fb2ed4d3.

Change-Id: I22d3eb35dced8178fe6beb0c76c855746efdf94d
2019-08-27 23:23:10 +00:00
Emilien Macchi 983ab98f61 Check if container is running before doing an exec
container_running is a new method which will allow to return True if a
container is detected as running or False if not running.
There is a retry mechanism which if "podman ps" is used, will add "--sync" to
the command so we synchronize the state of OCI runtime. Before doing a
"podman ps", we try to check if the service is running in systemd.
There is a very short sleep between the retries to give a chance to
podman to find the container if it takes a bit of time to start or seen
as started.

It will be used by the builder when a container is configured to
run "podman exec"; we'll first verify that the container exist otherwise
return an error and stop the deployment.

This patch is mainly a workaround against a race condition where in
heavy-loaded environments, an exec can be run too early in a step where
the container is still starting.

It also consolidate the discover_container_name method in order to get
more chance to actually get a name.

Co-Authored-By: Cédric Jeanneret <cjeanner@redhat.com>
Closes-Bug: #1839559
Change-Id: If4d8c268218bf83abed877a699fc583fb55ccbed
2019-08-27 23:19:05 +00:00
Zuul 72061b4111 Merge "Fix mismatching fixed vs unique container names" 2019-08-15 21:36:21 +00:00
Bogdan Dobrelya 2eaebe2cd9 Fix systemd service start rate limiting
The default limit is to allow 5 restarts in a 10sec period. If a
service goes over that threshold due to the Restart= config option in
the service definition, it will not attempt to restart any further.

We should not set StartLimitIntervalSec to 0 to disable any kind of
rate limiting as that may end up impacting the node load.

Instead, use tenacity to retry with an exponential backoff, when the
service unit enablement fails. Before to retry it, reset the unit's
failure counters with the systemctl wrapper. This is a crash-loop
approach that provides an efficient feature parity to the classic
rate limiting, shall we want to implement that for the systemctl
command wrapper instead.

Closes-bug: #1839841

Change-Id: I537fbf9933f2cbe6e1c2f627ba77da645bd55f25
Signed-off-by: Bogdan Dobrelya <bdobreli@redhat.com>
2019-08-13 14:29:08 +02:00
Bogdan Dobrelya 5d174c1bea Fix mismatching fixed vs unique container names
Sometimes, like when redeploying in-place, containers for the "fixed"
service name might already existed, thus get the prefixed names. That
might create mismatches. For example the pidfile names may diverge by
the "fixzed" container service name vs its predictable prefixed unique
name.

Fix that by using the predictable unique names instead of the service
container names for the builder and paunch actions run,
debug/print-cmd that rely on it. This is achieved via a new parameter
for the real container name (a delegate) used for the "fixed" service
container name.

For podman builder, we use that delegate for conmon pidfile and logging
setup. So that now it always matches the PIDFile specified in the
systemd unit generated for that container.

For docker builder, we have no special uses for delegates, but we
support that parameter to simplify the code around (so that there will
be no need to wrap things with 'if cli == podman else...').

Closes-bug: #1839929

Change-Id: I5617e11f5d315f408d818e1ce47aa68f4a0d777a
Signed-off-by: Bogdan Dobrelya <bdobreli@redhat.com>
2019-08-13 12:56:41 +02:00
Zuul aa0e32b004 Merge "Handle defined containers that are stopped." 2019-08-08 12:18:43 +00:00
Zuul 1c19455489 Merge "Optimize container CLI for getting unique names" 2019-08-02 19:49:04 +00:00
Zuul 714e6bb4db Merge "Generate addition drop-in dependencies for podman containers" 2019-08-01 11:48:49 +00:00
Luke Short 85fb2ed4d3 Handle defined containers that are stopped.
Delete the stopped container and recreate it to ensure
there are no local modifications to the container.

Change-Id: Ia3fe06dd221d0894a5de2c18beaa48ba29440003
Resolves: rhbz#1674517
Signed-off-by: Luke Short <ekultails@gmail.com>
2019-07-30 12:40:51 -04:00
Bogdan Dobrelya a6df749249 Optimize container CLI for getting unique names
Do not call it twice in the while{} loop. It is pointless to
double-check for a container existence after we have determined weither
its name is in use or not.

This decreases the pressure on the container system.

Change-Id: Ife65f082f389ad4a730b3f9cf07d32ac4be32b45
Signed-off-by: Bogdan Dobrelya <bdobreli@redhat.com>
2019-07-30 16:34:20 +02:00
Grzegorz Grasza 6ca0170eeb Check if volume paths exist before executing Docker.
If the file to be mounted doesn't exist, Docker version 1.13.1
creates a directory in it's place. This behavior is different from
Podman's, which throws an error. We want to replicate the Podman
behavior. Creating directories in place of files causes errors
which are very difficult to diagnose.

Related-Bug #1830734

Change-Id: Ibc458d3badb807a0deef5a799db6682d7128dbc1
2019-07-23 13:53:28 +02:00
Damien Ciabrini b33aeea972 Generate addition drop-in dependencies for podman containers
If a container managed by Paunch and has the drop-in file enabled, we
want to start the containers with the script provided by the Paunch rpm:
paunch-start-podman-container

The script will take care of proper ordering when creating systemd scope
files.

Change-Id: Idaf5d4871ad1231f2592238a7925857af8f40548
2019-07-22 12:29:34 -04:00