Bump rabbitmq PV size to 1Gi for stability

The rabbitmq chart requests a 256Mi PV for operational storage. With CentOS 7.5 and 7.6 kernels, a jbd2 kernel thread hang is observed after a long soak period. Once this occurs, a host reboot is required to recover access to the PV.

We have been able to reliably reproduce the hang with fsstress, using the stock upstream CentOS 7.6 kernel and the latest Ceph Jewel LTS version (10.2.11). The evidence currently points to a race condition in the filesystem code.

With a reliable test available, we have tried other scenarios to characterize the problem, including different volume sizes and different ext4 filesystem formatting options.

We have been unable to trigger the hang with a 1Gi PV over an extended soak period, so we'll update the stx-openstack manifest to request a 1Gi PV until the root cause has been identified and fixed in the kernel.

Change-Id: Ia0e5b7ffb049c6e3cedfb4a6d3afda597eedb18a
Related-Bug: #1814595
Signed-off-by: Robert Church <robert.church@windriver.com>
2019-03-08 15:54:19 -05:00 · 2019-03-08 15:54:19 -05:00 · 77cbb985f2
parent f56056cffc
commit 77cbb985f2
2 changed files with 7 additions and 1 deletion
--- a/kubernetes/applications/stx-openstack/stx-openstack-helm/centos/build_srpm.data
+++ b/kubernetes/applications/stx-openstack/stx-openstack-helm/centos/build_srpm.data
@@ -1,3 +1,3 @@
 SRC_DIR="stx-openstack-helm"
 COPY_LIST_TO_TAR="$PKG_BASE/../../../helm-charts/rbd-provisioner $PKG_BASE/../../../helm-charts/garbd $PKG_BASE/../../../helm-charts/ceph-pools-audit"
-TIS_PATCH_VER=7
+TIS_PATCH_VER=8
--- a/kubernetes/applications/stx-openstack/stx-openstack-helm/stx-openstack-helm/manifests/manifest.yaml
+++ b/kubernetes/applications/stx-openstack/stx-openstack-helm/stx-openstack-helm/manifests/manifest.yaml
@@ -329,6 +329,12 @@ data:
        anti:
          type:
            default: requiredDuringSchedulingIgnoredDuringExecution
+    # TODO: Revert to upstream defaults once the following LP is resolved:
+    # https://bugs.launchpad.net/starlingx/+bug/1814595. Changing this PV size
+    # to 1Gi from the default 256Mi avoids the kernel hang caused by the
+    # filesystem race described in the LP.
+    volume:
+      size: 1Gi
  source:
    type: tar
    location: http://172.17.0.1/helm_charts/rabbitmq-0.1.0.tgz
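
For a sense of scale, the override is four times the chart default quoted in the commit message. The arithmetic can be sketched as below, assuming the sizes use Kubernetes-style binary suffixes (Ki, Mi, Gi, Ti). `quantity_to_bytes` is a hypothetical helper written for illustration only; it is not part of the rabbitmq chart or of StarlingX.

```python
# Illustrative sketch only: compare the chart default with the override.
# Assumption: sizes use Kubernetes-style binary suffixes.
BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}


def quantity_to_bytes(quantity: str) -> int:
    """Convert a size such as '256Mi' or '1Gi' to bytes (hypothetical helper)."""
    for suffix, multiplier in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * multiplier
    return int(quantity)  # no suffix: treat as a plain byte count


default_size = quantity_to_bytes("256Mi")   # chart default per the commit message
override_size = quantity_to_bytes("1Gi")    # value set in manifest.yaml

print(override_size // default_size)
```

This only illustrates the size relationship; it says nothing about why the larger volume avoids the hang, which the commit message leaves open pending the kernel root cause.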