Add firewall flush rules to zuul pre-update gates.
Wrap gate scripts with the run-gates.sh script in order to preserve the scripts' execution contexts.
Also migrated the chart building process to Helm v3.x.
Fixed the 020-test-divingbell.sh script.
Change-Id: I6295d55338a6a75ac43b54c092704670d61854d9
The default behavior of divingbell-perm is to fail when trying to assign
permissions to non-existent files.
This change adds an option to values.yaml to skip any missing files and
proceed with the rest of the assignments.
conf:
  perm:
    ignore_missing: true # default is false
This may be useful in cases where files will never exist on a node, or
cases where the file does not exist yet, but will exist later. Note that
with this option enabled, a run in which files are skipped is considered
successful, so the rerun_policy and rerun_interval will determine if and
when another attempt will be made.
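The following is a minimal Python sketch of the skip-or-fail behavior. It assumes each assignment is a (path, mode) pair. `apply_permissions` is a hypothetical helper, not divingbell's perm.sh, and it handles only file modes, not ownership:

```python
import os
import stat
import tempfile


def apply_permissions(assignments, ignore_missing=False):
    """Apply (path, mode) pairs, optionally skipping paths that do not exist.

    A missing path raises FileNotFoundError unless ignore_missing is True,
    in which case it is recorded and the remaining assignments still run.
    Returns the list of skipped paths (empty when nothing was skipped).
    """
    skipped = []
    for path, mode in assignments:
        if not os.path.exists(path):
            if not ignore_missing:
                raise FileNotFoundError(path)
            skipped.append(path)
            continue
        os.chmod(path, mode)
    return skipped


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        present = os.path.join(tmp, "present")
        open(present, "w").close()
        missing = os.path.join(tmp, "missing")
        # The missing file is skipped; the existing one still gets 0640.
        print(apply_permissions([(missing, 0o600), (present, 0o640)],
                                ignore_missing=True))
        print(oct(stat.S_IMODE(os.stat(present).st_mode)))
```

In this sketch, returning the skipped paths rather than raising is what makes the run count as successful.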
Change-Id: I15505d6292dda66942c66eea5a4d0666bd6bdfa7
The hash used by divingbell-perms to decide whether or not to rerun the
permissions script was being generated incorrectly, using a fixed value
instead of actually looking at the values passed to the chart.
This change computes the hash from conf.divingbell.perms, so the script
is rerun whenever the hash changes.
Also fixes the logic to revert permissions.
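As an illustration only, the rerun decision could be modelled as follows. The sketch assumes the previous hash is stored somewhere between runs and uses SHA-256 over sorted-key JSON. The helper names `config_hash` and `should_rerun` and the example keys are hypothetical, and the chart itself may hash the values differently:

```python
import hashlib
import json


def config_hash(perm_conf):
    """Return a SHA-256 hex digest of the perm configuration.

    sort_keys=True makes the digest independent of dict key order, so the
    hash changes only when the configured values change.
    """
    encoded = json.dumps(perm_conf, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def should_rerun(perm_conf, previous_hash):
    """Rerun when no hash is recorded yet or the configuration changed."""
    return previous_hash is None or config_hash(perm_conf) != previous_hash
```

The bug described above corresponds to hashing a constant instead of `perm_conf`. In that case `should_rerun` would never see a change.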
Change-Id: I74f056f69a1b7f0eb9223915b1671e1e18091483
When divingbell-apt is managing the apt sources list, remove the
contents of /var/lib/apt/lists before running apt-get update.
Change-Id: I379af0b1a887bc81bc76f57289f35bae64e146c6
The divingbell pods use a hostPath volume for the root filesystem.
Because this mount includes /var/lib/kubelet, the pod holds a reference
to every volume mounted by every pod on the same host.
The most visible case where this causes a problem is the termination of
a pod that uses Ceph-backed PVCs. When kubelet tries to unmap the rbd
device, it is unable to do so, manifesting in the kubelet logs as:
rbd: unmap failed: (16) Device or resource busy
This change sets the mountPropagation to HostToContainer for the rootfs
volume, so that the divingbell pods will not prevent kubelet from
releasing these devices.
https://kubernetes.io/docs/concepts/storage/volumes/#mount-propagation
Change-Id: I6e91fb9b9d7cbe852c5e6dc8b7224d6085175590
This change adds the ability to configure node selectors per module. The
default node selector is 'kubernetes.io/os=linux'. For example:
labels:
  apt:
    node_selector_key: divingbell-apt
    node_selector_value: enabled
This will result in a node selector of 'divingbell-apt=enabled'.
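The following is a rough Python model of the per-module lookup. It assumes that a module lacking either `node_selector_key` or `node_selector_value` falls back to the default selector. `node_selector` and `selector_string` are hypothetical names:

```python
DEFAULT_NODE_SELECTOR = {"kubernetes.io/os": "linux"}


def node_selector(labels, module):
    """Return the nodeSelector mapping for one divingbell module.

    Uses labels[module] when it defines both node_selector_key and
    node_selector_value; otherwise returns a copy of the default selector.
    """
    module_labels = labels.get(module) or {}
    key = module_labels.get("node_selector_key")
    value = module_labels.get("node_selector_value")
    if key and value:
        return {key: value}
    return dict(DEFAULT_NODE_SELECTOR)


def selector_string(selector):
    """Render a selector mapping as comma-separated key=value pairs."""
    return ",".join(f"{k}={v}" for k, v in sorted(selector.items()))
```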
Change-Id: I7150c5f998afa30dce22f505be4d0d164254214f
Since we introduced the chart version check in the gates, the requirements
are no longer satisfied by a strict check against version 0.1.0.
Change-Id: I9a9cfd54cd14c9624c20b6e4399137bd32b85c33
The current `dpkg --configure -a` command does not always work if the
package that needs to be configured has a modified conffile which can
require user input to resolve. This change adds flags to make these
lines work as intended in that scenario.
Change-Id: I8f459b0c1c2fc7ecbe1ff478bdb77fd9af31dc90
While working on another change, I discovered conditions
in many test cases that echoed fail messages but did not
actually exit, so the gate could succeed even though some
tests failed. This patchset aims to fix those problems, and
then fix the problems masked by those problems:
1) fix bug in revert function of file permissions module
preventing permissions from being reverted.
2) fix various syntax and logic problems in test script
3) add wait_for_tiller_ready function to avoid a race condition
where the test script uses helm too early
4) add install for ethtool in test script
5) ignore ethtool pod failures (see note #1 in [0])
6) make logging of test results more uniform
7) Fix error message logic in perm.sh
8) Fix case in _shcommon.tpl where error message was not
logged, causing test script to unnecessarily wait for
container timeout
[0]: https://review.opendev.org/676010
Change-Id: I22182d35250c37c96e73d9f5f49abfb2246f2a35
This adds a default AppArmor profile to divingbell.
Also, update the gate script to install ethtool if it is not present.
Change-Id: I7abb13a533b596f4db5fe65fdae5eb7fc57ec00a
This change adds the --no-install-recommends flag to the apt-get
install command portion of _apt.sh.tpl. This will modify Divingbell
to only install direct dependencies of packages instead of following
the default apt behavior, which is to also install recommended packages.
Change-Id: I118a72e1e591101b0e2878e088e9fbaa96067d2c
This change adds a whitelist of packages that will be ignored when using
strict mode.
Change-Id: I9138f35a72618100e6094575271f6160336332f4
Signed-off-by: Drew Walters <andrew.walters@att.com>
This patchset makes two changes for strict mode only:
1) Removes the --autoremove flag from the apt-get purge
command line
2) Causes the install stage to call apt-get install on
all packages regardless of whether they're already
installed. This will have the effect of marking all
requested packages as manually installed if they
were previously auto-installed.
Change-Id: Ic1a39205c941973af9d82685180d28457ea2011f
Currently, divingbell-apt will only remove packages that aren't
on the current requested package list when they were previously
installed by divingbell-apt. This patchset adds a "strict" mode
which causes it to remove packages not on the requested package
list regardless of whether divingbell installed them (i.e., it
can remove unwanted packages that were part of the host's base
image).
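The difference between the default and strict modes, including the strict-mode whitelist described earlier, can be sketched as set arithmetic. The following is a hypothetical Python model that assumes package names are plain strings. `packages_to_remove` is not part of divingbell:

```python
def packages_to_remove(installed, requested, installed_by_divingbell,
                       strict=False, whitelist=()):
    """Return the sorted names of packages that would be purged.

    Default mode: only packages divingbell-apt installed earlier that are
    no longer requested. Strict mode: every installed package that is not
    requested, except packages on the whitelist.
    """
    installed = set(installed)
    requested = set(requested)
    if strict:
        candidates = installed - requested - set(whitelist)
    else:
        candidates = (installed & set(installed_by_divingbell)) - requested
    return sorted(candidates)
```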
Change-Id: Ie2ba5d47646bfaaf030cb54673e644ab0e917fd4
This change allows conf.apt.packages to be defined as a map of lists,
allowing for logical grouping and easier substitution when values.yaml
is being assembled from multiple sources.
The existing format (conf.apt.packages as a list) is still supported.
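The following sketch shows how both formats could be normalized into one flat list. It assumes groups are concatenated in sorted key order, matching the fact that Go templates range over maps in sorted key order. `flatten_packages` is a hypothetical helper:

```python
def flatten_packages(packages):
    """Normalize conf.apt.packages to a single flat list.

    Accepts the original list format or a map of lists keyed by arbitrary
    group names. Groups are visited in sorted key order so the result is
    deterministic.
    """
    if isinstance(packages, dict):
        flat = []
        for group in sorted(packages):
            flat.extend(packages[group])
        return flat
    return list(packages)
```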
Change-Id: I4d4c09723b2e9ac1f0ecf847e786d991cc6e669a
The patch introduces network policy configuration similar
to that of openstack-helm services. It allows users to configure
policies depending on the environment.
* Network policies are disabled by default.
* When enabled, the default policies allow all ingress and
egress traffic (i.e., the policy is set to {}); this may be
changed in future patch sets.
Change-Id: I2adb5e652c1da0a1982ab18c498f033910a47cd8
Currently, the APT daemonset allows the installation of new packages or
upgrade of existing packages to a newer version. Sometimes, it may be
desirable to trigger an update for all packages. This change introduces
the ability to trigger a full-system upgrade using the .conf.apt.upgrade
chart value. The new option is disabled by default.
Change-Id: I611422c2093b9dbbae4e2d7cc05ebd726e895c88
Signed-off-by: Drew Walters <andrew.walters@att.com>
1. There is an occasional timing issue when container logs are
unavailable at certain points in the crash loop at the same
time the gate script tries to request them. The gate will now retry
this operation instead of terminating right away with failure.
2. Re-enable uamlite security context so that useradd operations would
succeed.
3. Change apt pinning tests to use a version of the package that is
available in the apt repo. Upstream repos change, so we should not
pin to an explicit version that will be removed in the future and
break the gate.
4. Update helm version to 2.14.1 to sync with openstack-helm-infra
5. Fix divingbell build script: git --depth=1 is incompatible with an explicit
non-master commit checkout
6. Enhance overrides test case #7 to test for the issue identified in
[0].
7. Change hostname scheduling to match minikube hostname now configured
by OSH gate, instead of using the node's actual hostname
8. Re-enable gate voting
[0] https://storyboard.openstack.org/#!/story/2005936
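Item 1 describes a retry-instead-of-fail pattern. The gate itself is a shell script, so the following Python sketch only illustrates the pattern. The attempt count and delay are assumed values, not the gate's settings:

```python
import time


def retry(operation, attempts=5, delay=2.0, exceptions=(Exception,)):
    """Call operation() until it succeeds, up to `attempts` times.

    Sleeps `delay` seconds between failed attempts and re-raises the last
    exception once every attempt has failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except exceptions:
            if attempt == attempts:
                raise
            time.sleep(delay)
```

In the gate, the operation would be the request for container logs.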
Depends-On: https://review.opendev.org/671875/
Change-Id: Iad983ce363711e16ccd54e663c23d30a4a6a1177
This makes the main container within the apt daemonset run as
privileged, which is required to perform kernel upgrades through it.
It was confirmed that even with all capabilities enabled, an
unprivileged apt is unable to perform the necessary updates to
the boot partition during a kernel upgrade.
Change-Id: I4e996794f24fcfc9d8ced7a58cecd2ceec36f6c5
Previously _uamlite.sh.tpl would fail to render if any user data
had an empty user_sshkeys array. This is because the template would
check to see if the key existed, but not actually make sure that the
array contained within that key had any elements. "first" would be
called against the empty array, which would return nil, and then
the outer eq function call would fail (as it can't be used to
compare nil values).
This patch set adds a default statement after the "first" function,
so that if the array is empty and first returns nil, a default of
"Unmanaged" will be returned, which will end up making the eq
statement evaluate to false, and the code inside the if statement to
not be run.
Change-Id: I52713795284cd1d0961bd430858061f9df9c5f78
Use the common logger for consistent log output for some echo statements
that were not making use of it.
Change-Id: I7fae2a950318f5cd3245a4571dc464009726d4ae
This PS avoids variable assignments, which are not supported
in older versions of Helm (Go < 1.11).
Change-Id: Ic0dad4d1b60071c4366c63834f1ad7e3a76fdcd8
Divingbell runs all its containers as privileged. Some Divingbell
containers can perform their jobs with the default set of Linux
capabilities that Docker gives to unprivileged containers while others
need additional capabilities. The default list of capabilities includes
the following:
- SETPCAP
- MKNOD
- AUDIT_WRITE
- CHOWN
- NET_RAW
- DAC_OVERRIDE
- FOWNER
- FSETID
- KILL
- SETGID
- SETUID
- NET_BIND_SERVICE
- SYS_CHROOT
- SETFCAP
The capabilities listed in the daemonset templates function as a
whitelist: the corresponding containers have access not only to the Linux
capabilities listed in their SecurityContext but also to the
aforementioned capabilities that Docker includes by default.
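Put differently, the effective capability set of an unprivileged container is the union of Docker's defaults and the capabilities added in its SecurityContext. The following is a small Python model of that union. `effective_capabilities` is a hypothetical name:

```python
# Docker's default capability set for unprivileged containers, as listed above.
DOCKER_DEFAULT_CAPS = frozenset({
    "SETPCAP", "MKNOD", "AUDIT_WRITE", "CHOWN", "NET_RAW", "DAC_OVERRIDE",
    "FOWNER", "FSETID", "KILL", "SETGID", "SETUID", "NET_BIND_SERVICE",
    "SYS_CHROOT", "SETFCAP",
})


def effective_capabilities(added=()):
    """Return the Docker defaults plus any capabilities listed in SecurityContext."""
    return DOCKER_DEFAULT_CAPS | frozenset(added)
```

For example, `effective_capabilities(["NET_ADMIN"])` models a container with 15 capabilities: the 14 defaults plus NET_ADMIN.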
Summary of testing for each daemonset:
The bcc-capable tool [0] was used to discover which Linux capabilities
the Divingbell containers invoke. The tool was run against all the
processes running in the container. The Divingbell logs for each
container were also carefully analyzed for failed permission checks.
daemonset-exec:
A recent change to use nsenter to enter all host namespaces when running
exec prevents divingbell-exec from being able to run unprivileged as
there is no Linux capability that allows write access to '/proc'.
When trying to run as unprivileged, the following prevents the pod from
coming up:
"nsenter: cannot open /proc/1/ns/ipc: Permission denied"
daemonset-sysctl:
Ran the divingbell-sys containers as unprivileged and the kernel config
on the host updated as defined in the manifest. Kernel configs were
checked before and after running divingbell-sys container as
unprivileged. Beyond the default Linux capabilities given by
Docker, the 'SYS_PTRACE', 'SYS_ADMIN', and 'SYS_RAWIO' Linux
capabilities are needed. The following is a snippet of the logs showing
the circumstances under which these privileges are needed:
"INFO * Applying /etc/sysctl.d/10-kernel-hardening.conf ...
INFO sysctl: setting key "kernel.kptr_restrict": Operation not permitted
INFO * Applying /etc/sysctl.d/10-ptrace.conf ...
INFO sysctl: setting key "kernel.yama.ptrace_scope": Operation not permitted
INFO * Applying /etc/sysctl.d/10-zeropage.conf ...
INFO sysctl: setting key "vm.mmap_min_addr": Operation not permitted"
daemonset-perm:
Ran the divingbell-perm containers as unprivileged and the file
ownership and permissions on the host updated as defined in the
manifest. As a test, the daemon was configured to run every minute
and the targeted files' ownership and permissions were manually
changed. It was then verified that divingbell restored the ownership
and permissions of the file to what it should be. This applies to
the divingbell-perm-default and the divingbell-perm-calico containers.
daemonset-limits:
Ran the divingbell-limits containers as unprivileged and checked the
ulimits on the host before and after running divingbell and the ulimit
updated to the value defined in the manifest. The capable tool also
showed that no additional Linux capabilities are needed.
daemonset-apparmor:
Ran the divingbell-apparmor containers as unprivileged and logs show no
evidence of failed permission checks. Additionally, the apparmor config
was updated in the manifest and the apparmor profile successfully
loaded. Beyond the default Linux capabilities given by Docker, the
'MAC_ADMIN' Linux capability is needed to load an apparmor profile.
daemonset-apt:
Ran the divingbell-apt containers as unprivileged and was able to
successfully install packages without issues. As a test, the
manifest was updated to install 'htop' and after running Divingbell,
it was confirmed that 'htop' installed successfully. Here is
a snippet from the logs:
DEBUG + INSTALLED_THIS_TIME=' htop'
DEBUG + REQUESTED_PACKAGES=' htop'
daemonset-ethtool:
Ran the divingbell-ethtool containers as unprivileged and was able to
manage NIC tunables. As a check, the NIC tunables for ens3 were checked
before and after running Divingbell (using 'ethtool -k ens3'). Divingbell
configured the NIC as defined in the manifest. Beyond the default Linux
capabilities given by Docker, the 'NET_ADMIN' Linux capability is needed.
The following is a log snippet showing what happens when the 'NET_ADMIN'
capability is not added:
"DEBUG + /sbin/ethtool -K cali86cb821b7db tx-nocache-copy off
INFO Cannot set device feature settings: Operation not permitted"
daemonset-uamlite:
Ran the divingbell-uamlite containers as unprivileged and was able to
successfully add user accounts as defined in the manifest. No additional
Linux capabilities are needed.
daemonset-mounts:
Ran the divingbell-mounts containers as unprivileged and was able to
successfully add host level mounts as defined in the manifest. No
additional Linux capabilities are needed.
[0] https://github.com/iovisor/bcc/blob/master/tools/capable.py
Change-Id: I26a1b5e06ad27c854d95e6675de05b884ce3bdc1
This PS switches from chroot to entering the host's namespaces
so that scripts run fully in the context of the host.
Change-Id: I6b4dab92b6f8a7f9fa5b895d546117fdae43d731
Signed-off-by: Pete Birley <pete@port.direct>
- When reverting permissions on a file, there is no check for the file's
existence, so a deleted file causes the perm module to crash-loop (CL)
Change-Id: Ifae0ac196acf8ac2ccef84102967b6b4305a7691
- Adds the ability to rerun divingbell-perm at a specified interval.
- Adds the ability to specify a rerun policy of
'always', 'never', 'once_successfully'. Default value is 'always'.
Demo: https://asciinema.org/a/220289
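The commit does not spell out the semantics of each policy, so the following Python sketch assumes one plausible reading:

- 'never' runs only the first time.
- 'once_successfully' retries every interval until a run succeeds.
- 'always' reruns every interval.

`perm_should_run` and its parameters are hypothetical:

```python
VALID_POLICIES = ("always", "never", "once_successfully")


def perm_should_run(last_run, last_succeeded, now, interval,
                    policy="always"):
    """Decide whether divingbell-perm should run (assumed semantics).

    last_run is the timestamp of the previous run in seconds, or None if it
    has never run; interval is the rerun interval in seconds.
    """
    if policy not in VALID_POLICIES:
        raise ValueError(f"unknown rerun policy: {policy!r}")
    if last_run is None:
        return True  # the first run happens under every policy
    if policy == "never":
        return False
    if policy == "once_successfully" and last_succeeded:
        return False
    return now - last_run >= interval
```

Under this reading, a run that skipped missing files with ignore_missing enabled would be passed in as `last_succeeded=True`.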
Change-Id: I3909b4d92f8e2bdb0d826ca1cfbd62f937c2532d