Docs: Update ceph documentation

- Add a section for Ceph troubleshooting
- Rearrange the Testing section to include Ceph

Co-Authored-By: portdirect <pete@port.direct>

Change-Id: Ib04e9b59fea2557cf6cad177dfcc76390c161e06
Signed-off-by: Pete Birley <pete@port.direct>
Renis 2018-05-04 11:36:39 -07:00 committed by Hee Won Lee
parent 79128a94fc
commit 5a9c8d41b7
11 changed files with 610 additions and 7 deletions


@ -17,7 +17,7 @@ Contents:
install/index
readme
specs/index
testing
testing/index
troubleshooting/index
Indices and Tables


@ -0,0 +1,25 @@
========================================
Resiliency Tests for OpenStack-Helm/Ceph
========================================
Mission
=======
The goal of our resiliency tests for `OpenStack-Helm/Ceph
<https://github.com/openstack/openstack-helm/tree/master/ceph>`_ is to
demonstrate the symptoms of software/hardware failures and provide solutions.
Caveats:
- Our focus lies on resiliency for various failure scenarios but
not on performance or stress testing.
Software Failure
================
* `Monitor failure <./monitor-failure.html>`_
* `OSD failure <./osd-failure.html>`_
Hardware Failure
================
* `Disk failure <./disk-failure.html>`_
* `Host failure <./host-failure.html>`_


@ -0,0 +1,171 @@
============
Disk Failure
============
Test Environment
================
- Cluster size: 4 host machines
- Number of disks: 24 (= 6 disks per host * 4 hosts)
- Kubernetes version: 1.10.5
- Ceph version: 12.2.3
- OpenStack-Helm commit: 25e50a34c66d5db7604746f4d2e12acbdd6c1459
Case: A disk fails
==================
Symptom:
--------
This is to test a scenario in which a disk fails.
We monitor the Ceph status and notice that one OSD (osd.2) on voyager4,
which has ``/dev/sdh`` as its backend, is down.
.. code-block:: console
(mon-pod):/# ceph -s
cluster:
id: 9d4d8c61-cf87-4129-9cef-8fbf301210ad
health: HEALTH_WARN
too few PGs per OSD (23 < min 30)
mon voyager1 is low on available space
services:
mon: 3 daemons, quorum voyager1,voyager2,voyager3
mgr: voyager1(active), standbys: voyager3
mds: cephfs-1/1/1 up {0=mds-ceph-mds-65bb45dffc-cslr6=up:active}, 1 up:standby
osd: 24 osds: 23 up, 23 in
rgw: 2 daemons active
data:
pools: 18 pools, 182 pgs
objects: 240 objects, 3359 bytes
usage: 2548 MB used, 42814 GB / 42816 GB avail
pgs: 182 active+clean
.. code-block:: console
(mon-pod):/# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 43.67981 root default
-9 10.91995 host voyager1
5 hdd 1.81999 osd.5 up 1.00000 1.00000
6 hdd 1.81999 osd.6 up 1.00000 1.00000
10 hdd 1.81999 osd.10 up 1.00000 1.00000
17 hdd 1.81999 osd.17 up 1.00000 1.00000
19 hdd 1.81999 osd.19 up 1.00000 1.00000
21 hdd 1.81999 osd.21 up 1.00000 1.00000
-3 10.91995 host voyager2
1 hdd 1.81999 osd.1 up 1.00000 1.00000
4 hdd 1.81999 osd.4 up 1.00000 1.00000
11 hdd 1.81999 osd.11 up 1.00000 1.00000
13 hdd 1.81999 osd.13 up 1.00000 1.00000
16 hdd 1.81999 osd.16 up 1.00000 1.00000
18 hdd 1.81999 osd.18 up 1.00000 1.00000
-2 10.91995 host voyager3
0 hdd 1.81999 osd.0 up 1.00000 1.00000
3 hdd 1.81999 osd.3 up 1.00000 1.00000
12 hdd 1.81999 osd.12 up 1.00000 1.00000
20 hdd 1.81999 osd.20 up 1.00000 1.00000
22 hdd 1.81999 osd.22 up 1.00000 1.00000
23 hdd 1.81999 osd.23 up 1.00000 1.00000
-4 10.91995 host voyager4
2 hdd 1.81999 osd.2 down 0 1.00000
7 hdd 1.81999 osd.7 up 1.00000 1.00000
8 hdd 1.81999 osd.8 up 1.00000 1.00000
9 hdd 1.81999 osd.9 up 1.00000 1.00000
14 hdd 1.81999 osd.14 up 1.00000 1.00000
15 hdd 1.81999 osd.15 up 1.00000 1.00000
Solution:
---------
To replace the failed OSD, execute the following procedure:
1. From the Kubernetes cluster, remove the failed OSD pod, which is running on ``voyager4``:
.. code-block:: console
$ kubectl label nodes --all ceph_maintenance_window=inactive
$ kubectl label nodes voyager4 --overwrite ceph_maintenance_window=active
$ kubectl patch -n ceph ds ceph-osd-default-64779b8c -p='{"spec":{"template":{"spec":{"nodeSelector":{"ceph-osd":"enabled","ceph_maintenance_window":"inactive"}}}}}'
Note: To find the DaemonSet associated with a failed OSD, check the following:
.. code-block:: console
(voyager4)$ ps -ef|grep /usr/bin/ceph-osd
(voyager1)$ kubectl get ds -n ceph
(voyager1)$ kubectl get ds <daemonset-name> -n ceph -o yaml
3. Remove the failed OSD (OSD ID = 2 in this example) from the Ceph cluster:
.. code-block:: console
(mon-pod):/# ceph osd lost 2
(mon-pod):/# ceph osd crush remove osd.2
(mon-pod):/# ceph auth del osd.2
(mon-pod):/# ceph osd rm 2
4. Verify that Ceph is healthy with the lost OSD removed (i.e., a total of 23 OSDs):
.. code-block:: console
(mon-pod):/# ceph -s
cluster:
id: 9d4d8c61-cf87-4129-9cef-8fbf301210ad
health: HEALTH_WARN
too few PGs per OSD (23 < min 30)
mon voyager1 is low on available space
services:
mon: 3 daemons, quorum voyager1,voyager2,voyager3
mgr: voyager1(active), standbys: voyager3
mds: cephfs-1/1/1 up {0=mds-ceph-mds-65bb45dffc-cslr6=up:active}, 1 up:standby
osd: 23 osds: 23 up, 23 in
rgw: 2 daemons active
data:
pools: 18 pools, 182 pgs
objects: 240 objects, 3359 bytes
usage: 2551 MB used, 42814 GB / 42816 GB avail
pgs: 182 active+clean
5. Replace the failed disk with a new one. If you repair (not replace) the failed disk,
you may need to run the following:
.. code-block:: console
(voyager4)$ parted /dev/sdh mklabel msdos
6. Start a new OSD pod on ``voyager4``:
.. code-block:: console
$ kubectl label nodes voyager4 --overwrite ceph_maintenance_window=inactive
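As a quick sanity check before validating the cluster, you can confirm that an OSD pod is scheduled back on ``voyager4`` (a minimal sketch; pod names will differ per deployment):
.. code-block:: console
$ kubectl get pods -n ceph -o wide | grep ceph-osd | grep voyager4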
7. Validate the Ceph status (i.e., one OSD is added, so the total number of OSDs becomes 24):
.. code-block:: console
(mon-pod):/# ceph -s
cluster:
id: 9d4d8c61-cf87-4129-9cef-8fbf301210ad
health: HEALTH_WARN
too few PGs per OSD (22 < min 30)
mon voyager1 is low on available space
services:
mon: 3 daemons, quorum voyager1,voyager2,voyager3
mgr: voyager1(active), standbys: voyager3
mds: cephfs-1/1/1 up {0=mds-ceph-mds-65bb45dffc-cslr6=up:active}, 1 up:standby
osd: 24 osds: 24 up, 24 in
rgw: 2 daemons active
data:
pools: 18 pools, 182 pgs
objects: 240 objects, 3359 bytes
usage: 2665 MB used, 44675 GB / 44678 GB avail
pgs: 182 active+clean


@ -0,0 +1,98 @@
============
Host Failure
============
Test Environment
================
- Cluster size: 4 host machines
- Number of disks: 24 (= 6 disks per host * 4 hosts)
- Kubernetes version: 1.10.5
- Ceph version: 12.2.3
- OpenStack-Helm commit: 25e50a34c66d5db7604746f4d2e12acbdd6c1459
Case: One host machine where ceph-mon is running is rebooted
============================================================
Symptom:
--------
After the reboot of node ``voyager3``, the node status changes to ``NotReady``.
.. code-block:: console
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
voyager1 Ready master 6d v1.10.5
voyager2 Ready <none> 6d v1.10.5
voyager3 NotReady <none> 6d v1.10.5
voyager4 Ready <none> 6d v1.10.5
The Ceph status shows that the ceph-mon running on ``voyager3`` is out of quorum.
Also, the six OSDs running on ``voyager3`` are down; i.e., only 18 out of 24 OSDs are up.
.. code-block:: console
(mon-pod):/# ceph -s
cluster:
id: 9d4d8c61-cf87-4129-9cef-8fbf301210ad
health: HEALTH_WARN
6 osds down
1 host (6 osds) down
Degraded data redundancy: 195/624 objects degraded (31.250%), 8 pgs degraded
too few PGs per OSD (17 < min 30)
mon voyager1 is low on available space
1/3 mons down, quorum voyager1,voyager2
services:
mon: 3 daemons, quorum voyager1,voyager2, out of quorum: voyager3
mgr: voyager1(active), standbys: voyager3
mds: cephfs-1/1/1 up {0=mds-ceph-mds-65bb45dffc-cslr6=up:active}, 1 up:standby
osd: 24 osds: 18 up, 24 in
rgw: 2 daemons active
data:
pools: 18 pools, 182 pgs
objects: 208 objects, 3359 bytes
usage: 2630 MB used, 44675 GB / 44678 GB avail
pgs: 195/624 objects degraded (31.250%)
126 active+undersized
48 active+clean
8 active+undersized+degraded
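To see which Ceph pods on the rebooted node are affected, the pods can be listed per node (a sketch; pod names will vary):
.. code-block:: console
$ kubectl get pods -n ceph -o wide | grep voyager3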
Recovery:
---------
The node status of ``voyager3`` changes to ``Ready`` after the node is up again.
Also, the Ceph pods are restarted automatically.
The Ceph status shows that the monitor running on ``voyager3`` is back in quorum.
.. code-block:: console
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
voyager1 Ready master 6d v1.10.5
voyager2 Ready <none> 6d v1.10.5
voyager3 Ready <none> 6d v1.10.5
voyager4 Ready <none> 6d v1.10.5
.. code-block:: console
(mon-pod):/# ceph -s
cluster:
id: 9d4d8c61-cf87-4129-9cef-8fbf301210ad
health: HEALTH_WARN
too few PGs per OSD (22 < min 30)
mon voyager1 is low on available space
services:
mon: 3 daemons, quorum voyager1,voyager2,voyager3
mgr: voyager1(active), standbys: voyager3
mds: cephfs-1/1/1 up {0=mds-ceph-mds-65bb45dffc-cslr6=up:active}, 1 up:standby
osd: 24 osds: 24 up, 24 in
rgw: 2 daemons active
data:
pools: 18 pools, 182 pgs
objects: 208 objects, 3359 bytes
usage: 2635 MB used, 44675 GB / 44678 GB avail
pgs: 182 active+clean


@ -0,0 +1,12 @@
===============
Ceph Resiliency
===============
.. toctree::
:maxdepth: 2
README
monitor-failure
osd-failure
disk-failure
host-failure


@ -0,0 +1,125 @@
===============
Monitor Failure
===============
Test Environment
================
- Cluster size: 4 host machines
- Number of disks: 24 (= 6 disks per host * 4 hosts)
- Kubernetes version: 1.9.3
- Ceph version: 12.2.3
- OpenStack-Helm commit: 28734352741bae228a4ea4f40bcacc33764221eb
We have 3 Monitors in this Ceph cluster, one on each of the 3 Monitor
hosts.
Case: 1 out of 3 Monitor Processes is Down
==========================================
This is to test a scenario in which 1 out of 3 Monitor processes is down.
To bring down 1 Monitor process (out of 3), we identify a Monitor
process and kill it from the Monitor host (not from inside the pod).
.. code-block:: console
$ ps -ef | grep ceph-mon
ceph 16112 16095 1 14:58 ? 00:00:03 /usr/bin/ceph-mon --cluster ceph --setuser ceph --setgroup ceph -d -i voyager2 --mon-data /var/lib/ceph/mon/ceph-voyager2 --public-addr 135.207.240.42:6789
$ sudo kill -9 16112
In the meantime, we monitored the status of Ceph and noted that it
takes about 24 seconds for the killed Monitor process to recover from
``down`` to ``up``. The reason is that Kubernetes automatically
restarts pods whenever they are killed.
.. code-block:: console
(mon-pod):/# ceph -s
cluster:
id: fd366aef-b356-4fe7-9ca5-1c313fe2e324
health: HEALTH_WARN
mon voyager1 is low on available space
1/3 mons down, quorum voyager1,voyager3
services:
mon: 3 daemons, quorum voyager1,voyager3, out of quorum: voyager2
mgr: voyager4(active)
osd: 24 osds: 24 up, 24 in
.. code-block:: console
(mon-pod):/# ceph -s
cluster:
id: fd366aef-b356-4fe7-9ca5-1c313fe2e324
health: HEALTH_WARN
mon voyager1 is low on available space
1/3 mons down, quorum voyager1,voyager2
services:
mon: 3 daemons, quorum voyager1,voyager2,voyager3
mgr: voyager4(active)
osd: 24 osds: 24 up, 24 in
We also monitored the status of the Monitor pod through ``kubectl get
pods -n ceph``; the status of the pod (where the Monitor process was
killed) changed as follows: ``Running`` -> ``Error`` -> ``Running``,
and this recovery process takes about 24 seconds.
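One way to observe this recovery in real time is to watch the Monitor pods (a minimal sketch):
.. code-block:: console
$ kubectl get pods -n ceph -w | grep ceph-mon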
Case: 2 out of 3 Monitor Processes are Down
===========================================
This is to test a scenario in which 2 out of 3 Monitor processes are down.
To bring down 2 Monitor processes (out of 3), we identify two Monitor
processes and kill them from the 2 Monitor hosts (not from inside the pods).
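For example, the process on each of the two chosen Monitor hosts can be located and killed in the same way as in the single-Monitor case (a sketch; the host names and PIDs are illustrative):
.. code-block:: console
(voyager2)$ ps -ef | grep ceph-mon
(voyager2)$ sudo kill -9 <mon-pid>
(voyager3)$ ps -ef | grep ceph-mon
(voyager3)$ sudo kill -9 <mon-pid>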
We monitored the status of Ceph while the Monitor processes were killed
and noted that the symptoms are similar to those seen when 1 Monitor
process is killed:
- It takes longer (about 1 minute) for the killed Monitor processes to
recover from ``down`` to ``up``.
- The status of the pods (where the two Monitor processes are killed)
changed as follows: ``Running`` -> ``Error`` -> ``CrashLoopBackOff``
-> ``Running`` and this recovery process takes about 1 minute.
Case: 3 out of 3 Monitor Processes are Down
===========================================
This is to test a scenario in which 3 out of 3 Monitor processes are down.
To bring down 3 Monitor processes (out of 3), we identify all 3
Monitor processes and kill them from the 3 Monitor hosts (not from inside the pods).
We monitored the status of Ceph Monitor pods and noted that the
symptoms are similar to when 1 or 2 Monitor processes are killed:
.. code-block:: console
$ kubectl get pods -n ceph -o wide | grep ceph-mon
NAME READY STATUS RESTARTS AGE
ceph-mon-8tml7 0/1 Error 4 10d
ceph-mon-kstf8 0/1 Error 4 10d
ceph-mon-z4sl9 0/1 Error 7 10d
.. code-block:: console
$ kubectl get pods -n ceph -o wide | grep ceph-mon
NAME READY STATUS RESTARTS AGE
ceph-mon-8tml7 0/1 CrashLoopBackOff 4 10d
ceph-mon-kstf8 0/1 Error 4 10d
ceph-mon-z4sl9 0/1 CrashLoopBackOff 7 10d
.. code-block:: console
$ kubectl get pods -n ceph -o wide | grep ceph-mon
NAME READY STATUS RESTARTS AGE
ceph-mon-8tml7 1/1 Running 5 10d
ceph-mon-kstf8 1/1 Running 5 10d
ceph-mon-z4sl9 1/1 Running 8 10d
The status of the pods (where the three Monitor processes are killed)
changed as follows: ``Running`` -> ``Error`` -> ``CrashLoopBackOff``
-> ``Running`` and this recovery process takes about 1 minute.


@ -0,0 +1,107 @@
===========
OSD Failure
===========
Test Environment
================
- Cluster size: 4 host machines
- Number of disks: 24 (= 6 disks per host * 4 hosts)
- Kubernetes version: 1.9.3
- Ceph version: 12.2.3
- OpenStack-Helm commit: 28734352741bae228a4ea4f40bcacc33764221eb
Case: OSD processes are killed
==============================
This is to test a scenario in which some of the OSDs are down.
To bring down 6 OSDs (out of 24), we identify the OSD processes and
kill them from a storage host (not from inside the pods).
.. code-block:: console
$ ps -ef|grep /usr/bin/ceph-osd
ceph 44587 43680 1 18:12 ? 00:00:01 /usr/bin/ceph-osd --cluster ceph --osd-journal /dev/sdb5 -f -i 4 --setuser ceph --setgroup disk
ceph 44627 43744 1 18:12 ? 00:00:01 /usr/bin/ceph-osd --cluster ceph --osd-journal /dev/sdb2 -f -i 6 --setuser ceph --setgroup disk
ceph 44720 43927 2 18:12 ? 00:00:01 /usr/bin/ceph-osd --cluster ceph --osd-journal /dev/sdb6 -f -i 3 --setuser ceph --setgroup disk
ceph 44735 43868 1 18:12 ? 00:00:01 /usr/bin/ceph-osd --cluster ceph --osd-journal /dev/sdb1 -f -i 9 --setuser ceph --setgroup disk
ceph 44806 43855 1 18:12 ? 00:00:01 /usr/bin/ceph-osd --cluster ceph --osd-journal /dev/sdb4 -f -i 0 --setuser ceph --setgroup disk
ceph 44896 44011 2 18:12 ? 00:00:01 /usr/bin/ceph-osd --cluster ceph --osd-journal /dev/sdb3 -f -i 1 --setuser ceph --setgroup disk
root 46144 45998 0 18:13 pts/10 00:00:00 grep --color=auto /usr/bin/ceph-osd
$ sudo kill -9 44587 44627 44720 44735 44806 44896
.. code-block:: console
(mon-pod):/# ceph -s
cluster:
id: fd366aef-b356-4fe7-9ca5-1c313fe2e324
health: HEALTH_WARN
6 osds down
1 host (6 osds) down
Reduced data availability: 8 pgs inactive, 58 pgs peering
Degraded data redundancy: 141/1002 objects degraded (14.072%), 133 pgs degraded
mon voyager1 is low on available space
services:
mon: 3 daemons, quorum voyager1,voyager2,voyager3
mgr: voyager4(active)
osd: 24 osds: 18 up, 24 in
In the meantime, we monitored the status of Ceph and noted that it takes about 30 seconds for the 6 OSDs to recover from ``down`` to ``up``.
The reason is that Kubernetes automatically restarts OSD pods whenever they are killed.
.. code-block:: console
(mon-pod):/# ceph -s
cluster:
id: fd366aef-b356-4fe7-9ca5-1c313fe2e324
health: HEALTH_WARN
mon voyager1 is low on available space
services:
mon: 3 daemons, quorum voyager1,voyager2,voyager3
mgr: voyager4(active)
osd: 24 osds: 24 up, 24 in
Case: An OSD pod is deleted
===========================
This is to test a scenario in which an OSD pod is deleted by ``kubectl delete $OSD_POD_NAME``.
Meanwhile, we monitored the status of Ceph and noted that it takes about 90 seconds for the OSD running in the deleted pod to recover from ``down`` to ``up``.
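For reference, a minimal sketch of selecting and deleting one OSD pod (the label selector is an assumption; verify the labels in your deployment with ``kubectl get pods -n ceph --show-labels``):
.. code-block:: console
$ OSD_POD_NAME=$(kubectl get pods -n ceph \
--selector="application=ceph,component=osd" \
--no-headers | awk '{ print $1; exit }')
$ kubectl delete pod -n ceph ${OSD_POD_NAME}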
.. code-block:: console
root@voyager3:/# ceph -s
cluster:
id: fd366aef-b356-4fe7-9ca5-1c313fe2e324
health: HEALTH_WARN
1 osds down
Degraded data redundancy: 43/945 objects degraded (4.550%), 35 pgs degraded, 109 pgs undersized
mon voyager1 is low on available space
services:
mon: 3 daemons, quorum voyager1,voyager2,voyager3
mgr: voyager4(active)
osd: 24 osds: 23 up, 24 in
.. code-block:: console
(mon-pod):/# ceph -s
cluster:
id: fd366aef-b356-4fe7-9ca5-1c313fe2e324
health: HEALTH_WARN
mon voyager1 is low on available space
services:
mon: 3 daemons, quorum voyager1,voyager2,voyager3
mgr: voyager4(active)
osd: 24 osds: 24 up, 24 in
We also monitored the pod status through ``kubectl get pods -n ceph``
during this process. The deleted OSD pod status changed as follows:
``Terminating`` -> ``Init:1/3`` -> ``Init:2/3`` -> ``Init:3/3`` ->
``Running``, and this process takes about 90 seconds. The reason is
that Kubernetes automatically restarts OSD pods whenever they are
deleted.
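While the pod is restarting, the OSD it hosts can also be identified from a Monitor pod (a sketch):
.. code-block:: console
(mon-pod):/# ceph osd tree | grep -w down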


@ -1,9 +1,6 @@
=======
Testing
=======
==========
Helm Tests
----------
==========
Every OpenStack-Helm chart should include any required Helm tests necessary to
provide a sanity check for the OpenStack service. Information on using the Helm
@ -27,7 +24,6 @@ chart. If Rally tests are not appropriate or adequate for a service chart, any
additional tests should be documented appropriately and adhere to the same
expectations.
Running Tests
-------------


@ -0,0 +1,9 @@
=======
Testing
=======
.. toctree::
:maxdepth: 2
helm-tests
ceph-resiliency/index


@ -0,0 +1,59 @@
Backing up a PVC
^^^^^^^^^^^^^^^^
Backing up a PVC stored in Ceph is fairly straightforward. This example uses
the PVC ``mysql-data-mariadb-server-0``, but the same approach applies to any
other service using PVCs, e.g. RabbitMQ or Postgres.
.. code-block:: shell
# get all required details
NS_NAME="openstack"
PVC_NAME="mysql-data-mariadb-server-0"
# you can check this by running kubectl get pvc -n ${NS_NAME}
PV_NAME="$(kubectl get -n ${NS_NAME} pvc "${PVC_NAME}" --no-headers | awk '{ print $3 }')"
RBD_NAME="$(kubectl get pv "${PV_NAME}" -o json | jq -r '.spec.rbd.image')"
MON_POD=$(kubectl get pods \
--namespace=ceph \
--selector="application=ceph" \
--selector="component=mon" \
--no-headers | awk '{ print $1; exit }')
# copy admin keyring from ceph mon to host node
kubectl exec -it ${MON_POD} -n ceph -- cat /etc/ceph/ceph.client.admin.keyring > /etc/ceph/ceph.client.admin.keyring
sudo kubectl get cm -n ceph ceph-etc -o json | jq -j '.data[]' > /etc/ceph/ceph.conf
export CEPH_MON_NAME="ceph-mon-discovery.ceph.svc.cluster.local"
# create snapshot and export to a file
rbd snap create rbd/${RBD_NAME}@snap1 -m ${CEPH_MON_NAME}
rbd snap list rbd/${RBD_NAME} -m ${CEPH_MON_NAME}
# Export the snapshot and compress it; make sure the host has enough space to accommodate the large image files being written.
# a. if there is enough space on the host
rbd export rbd/${RBD_NAME}@snap1 /backup/${RBD_NAME}.img -m ${CEPH_MON_NAME}
cd /backup
time xz -0vk --threads=0 /backup/${RBD_NAME}.img
# b. if space on the host is limited, export and compress directly in a single command
rbd export rbd/${RBD_NAME}@snap1 -m ${CEPH_MON_NAME} - | xz -0v --threads=0 > /backup/${RBD_NAME}.img.xz
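Once the export has been verified, the snapshot created above can optionally be removed (a sketch using the same variables as in the backup steps):
.. code-block:: shell
rbd snap rm rbd/${RBD_NAME}@snap1 -m ${CEPH_MON_NAME}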
Restoring is just as straightforward. Once the workload consuming the device
has been stopped and the raw RBD device removed, the following will import the
backup and create a new device:
.. code-block:: shell
cd /backup
unxz -k ${RBD_NAME}.img.xz
rbd import /backup/${RBD_NAME}.img rbd/${RBD_NAME} -m ${CEPH_MON_NAME}
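Before restarting the workload, it can help to confirm that the image exists again (a sketch using the same variables as above):
.. code-block:: shell
rbd info rbd/${RBD_NAME} -m ${CEPH_MON_NAME}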
Once this has been done, the workload can be restarted.


@ -11,6 +11,7 @@ Sometimes things go wrong. These guides will help you solve many common issues w
persistent-storage
proxy
ubuntu-hwe-kernel
ceph
Getting help
============