Add pause/resume cluster health actions

Add actions to pause and resume cluster health monitoring within ceph for all osd devices.

This will ensure that no rebalancing is done whilst maintenance actions are happening within the cluster.
commit f16e3fac52
Author: James Page
Date: 2016-02-18 11:02:17 +00:00
6 changed files with 49 additions and 4 deletions

README.md

@@ -9,15 +9,15 @@ juju
# Usage
The ceph charm has two pieces of mandatory configuration for which no defaults
are provided. You _must_ set these configuration options before deployment or the charm will not work:
    fsid:
        uuid specific to a ceph cluster used to ensure that different
        clusters don't get mixed up - use `uuid` to generate one.

    monitor-secret:
        a ceph generated key used by the daemons that manage the cluster
        to control security. You can use the ceph-authtool command to
        generate one:

            ceph-authtool /dev/stdout --name=mon. --gen-key
@@ -45,7 +45,7 @@ At a minimum you must provide a juju config file during initial deployment
with the fsid and monitor-secret options (contents of ceph.yaml below):

    ceph:
      fsid: ecbb8960-0e21-11e2-b495-83a88f44db01
      monitor-secret: AQD1P2xQiKglDhAA4NGUF5j38Mhq56qwz+45wg==
      osd-devices: /dev/vdb /dev/vdc /dev/vdd /dev/vde
@@ -59,6 +59,12 @@ By default the ceph cluster will not bootstrap until 3 service units have been
deployed and started; this is to ensure that a quorum is achieved prior to adding
storage devices.
## Actions
This charm supports pausing and resuming ceph's health functions across the cluster, for example while doing maintenance on a machine. To pause or resume, call one of the following (a fuller maintenance workflow is sketched after this file's diff):

`juju action do --unit ceph/0 pause-health` or `juju action do --unit ceph/0 resume-health`
## Scale Out Usage
You can use the Ceph OSD and Ceph Radosgw charms:
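
As a worked illustration of the Actions section above, a typical maintenance pass might look like the sketch below. This is an example only, not part of the charm: it assumes the juju 1.x action syntax used in the README, and that `ceph/0` is a unit where ceph commands can be run with sudo.

    # Pause health management so stopping OSDs does not trigger rebalancing.
    juju action do --unit ceph/0 pause-health

    # Optionally confirm the cluster flags took effect.
    juju ssh ceph/0 "sudo ceph -s | grep -E 'noout|nodown'"

    # ... perform maintenance on the machine (reboot, replace disks, etc.) ...

    # Resume normal health management once maintenance is complete.
    juju action do --unit ceph/0 resume-health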

actions.yaml (new file)

@@ -0,0 +1,4 @@
pause-health:
  description: Pause ceph health operations across the entire ceph cluster
resume-health:
  description: Resume ceph health operations across the entire ceph cluster
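
For context, juju looks up each action declared in actions.yaml by name and runs the executable of the same name from the charm's actions/ directory, so the two scripts added below provide those executables. A rough sketch of the resulting layout (paths are illustrative):

    actions.yaml              # action metadata (this file)
    actions/pause-health      # executable run for the pause-health action
    actions/resume-health     # executable run for the resume-health action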

actions/pause-health (new executable file)

@@ -0,0 +1,6 @@
#!/bin/bash
set -eux
# Stop OSDs from being marked down or out while maintenance is in progress.
ceph osd set nodown
ceph osd set noout
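
These two flags are what actually hold the cluster steady: `noout` keeps stopped OSDs from being marked out (which would start rebalancing data onto other OSDs), and `nodown` keeps them from being marked down. A quick manual check on a monitor node, for example (commands shown for illustration, not part of the charm):

    ceph osd dump | grep flags   # lists noout,nodown while health is paused
    ceph health                  # reports HEALTH_WARN while the flags are set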

actions/resume-health (new executable file)

@@ -0,0 +1,6 @@
#!/bin/bash
set -eux
# Clear the maintenance flags so normal health management resumes.
ceph osd unset nodown
ceph osd unset noout

tests/basic_deployment.py

@@ -2,6 +2,7 @@
import amulet
import time
from charmhelpers.contrib.openstack.amulet.deployment import (
    OpenStackAmuletDeployment
)
@@ -440,6 +441,27 @@ class CephBasicDeployment(OpenStackAmuletDeployment):
        u.log.debug('Pool list on all ceph units produced the '
                    'same results (OK).')

    def test_402_pause_resume_actions(self):
        """Verify that pause/resume health actions work."""
        u.log.debug("Testing pause")
        cmd = "ceph -s"
        sentry_unit = self.ceph0_sentry
        action_id = u.run_action(sentry_unit, 'pause-health')
        assert u.wait_on_action(action_id), "Pause health action failed."
        output, code = sentry_unit.run(cmd)
        if 'nodown' not in output or 'noout' not in output:
            amulet.raise_status(amulet.FAIL, msg="Missing noout,nodown")

        u.log.debug("Testing resume")
        action_id = u.run_action(sentry_unit, 'resume-health')
        assert u.wait_on_action(action_id), "Resume health action failed."
        output, code = sentry_unit.run(cmd)
        if 'nodown' in output or 'noout' in output:
            amulet.raise_status(amulet.FAIL, msg="Still has noout,nodown")

    def test_410_ceph_cinder_vol_create(self):
        """Create and confirm a ceph-backed cinder volume, and inspect
        ceph cinder pool object count as the volume is created

tests/tests.yaml

@@ -19,3 +19,4 @@ packages:
- python-novaclient
- python-pika
- python-swiftclient
- python-nose