For stability, bump size of rabbitmq PV to 1Gi

The rabbitmq chart requests a 256Mi PV for operational storage. With
CentOS 7.5 and 7.6 kernels, a jbd2 kernel thread hang is observed after
a long soak period. Once this occurs, a host reboot is required to
recover access to the PV.

We can reliably reproduce this with fsstress on the stock upstream
CentOS 7.6 kernel and the latest Ceph Jewel LTS release (10.2.11). The
evidence currently points to a race condition in the filesystem code.

With a reliable test available, we ran further scenarios to characterize
the hang, including different volume sizes and different ext4 filesystem
formatting options.

We have been unable to trigger the hang with a 1Gi PV over an extended
soak period, so this change updates the stx-openstack manifest to request
a 1Gi PV until the root cause has been identified and fixed in the kernel.

Change-Id: Ia0e5b7ffb049c6e3cedfb4a6d3afda597eedb18a
Related-Bug: #1814595
Signed-off-by: Robert Church <robert.church@windriver.com>
Robert Church 2019-03-08 15:54:19 -05:00
parent f56056cffc
commit 77cbb985f2
2 changed files with 7 additions and 1 deletions

@@ -1,3 +1,3 @@
SRC_DIR="stx-openstack-helm"
COPY_LIST_TO_TAR="$PKG_BASE/../../../helm-charts/rbd-provisioner $PKG_BASE/../../../helm-charts/garbd $PKG_BASE/../../../helm-charts/ceph-pools-audit"
-TIS_PATCH_VER=7
+TIS_PATCH_VER=8

@@ -329,6 +329,12 @@ data:
anti:
type:
default: requiredDuringSchedulingIgnoredDuringExecution
+# TODO: Revert to upstream defaults once the following LP is resolved:
+# https://bugs.launchpad.net/starlingx/+bug/1814595. By changing this PV
+# size to 1Gi from the default 256Mi, this avoids the kernel hang from the
+# filesystem race as seen in the LP.
+volume:
+size: 1Gi
source:
type: tar
location: http://172.17.0.1/helm_charts/rabbitmq-0.1.0.tgz
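
The YAML indentation of the manifest hunk is not preserved in this view. As a rough sketch only, the override probably sits as follows. The nesting is an assumption based on the usual Armada chart layout, in which `values` carries the Helm overrides and `source` is its sibling under `data`; the parent keys `values`, `pod`, and `affinity` are not visible in the hunk and are inferred.

```yaml
# Sketch of the assumed nesting; not copied from the manifest itself.
data:
  values:              # assumed parent key (not shown in the hunk)
    pod:               # assumed parent key (not shown in the hunk)
      affinity:        # assumed parent key (not shown in the hunk)
        anti:
          type:
            default: requiredDuringSchedulingIgnoredDuringExecution
    # Override added by this commit: request a 1Gi PV instead of the
    # chart default of 256Mi.
    volume:
      size: 1Gi
  source:
    type: tar
    location: http://172.17.0.1/helm_charts/rabbitmq-0.1.0.tgz
```

If this nesting is right, `volume.size` is passed to the rabbitmq chart as an ordinary value override, so reverting to the upstream default would only require deleting the `volume` block.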