Add pause/resume cluster health actions

Add actions to pause and resume cluster health monitoring within ceph for all osd devices.

This will ensure that no rebalancing is done whilst maintenance actions are happening within the cluster.
commit f16e3fac52
Author: James Page
Date: 2016-02-18 11:02:17 +00:00
6 changed files with 49 additions and 4 deletions

README.md

@@ -9,15 +9,15 @@ juju
# Usage
The ceph charm has two pieces of mandatory configuration for which no defaults
are provided. You _must_ set these configuration options before deployment or the charm will not work:
    fsid:
        uuid specific to a ceph cluster used to ensure that different
        clusters don't get mixed up - use `uuid` to generate one.

    monitor-secret:
        a ceph generated key used by the daemons that manage the cluster
        to control security. You can use the ceph-authtool command to
        generate one:

            ceph-authtool /dev/stdout --name=mon. --gen-key
@@ -45,7 +45,7 @@ At a minimum you must provide a juju config file during initial deployment
with the fsid and monitor-secret options (contents of ceph.yaml below):

    ceph:
      fsid: ecbb8960-0e21-11e2-b495-83a88f44db01
      monitor-secret: AQD1P2xQiKglDhAA4NGUF5j38Mhq56qwz+45wg==
      osd-devices: /dev/vdb /dev/vdc /dev/vdd /dev/vde
@@ -59,6 +59,12 @@ By default the ceph cluster will not bootstrap until 3 service units have been
deployed and started; this is to ensure that a quorum is achieved prior to adding
storage devices.
## Actions
This charm supports pausing and resuming ceph's health functions across the cluster, for example while doing maintenance on a machine. To pause or resume, call one of the following (a fuller maintenance workflow is sketched after this file's diff):

`juju action do --unit ceph/0 pause-health` or `juju action do --unit ceph/0 resume-health`
## Scale Out Usage
You can use the Ceph OSD and Ceph Radosgw charms:
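
As a worked illustration of the Actions section above, a typical maintenance pass might look like the sketch below. This is an example only, not part of the charm: it assumes the juju 1.x action syntax used in the README, and that `ceph/0` is a unit where ceph commands can be run with sudo.

    # Pause health management so stopping OSDs does not trigger rebalancing.
    juju action do --unit ceph/0 pause-health

    # Optionally confirm the cluster flags took effect.
    juju ssh ceph/0 "sudo ceph -s | grep -E 'noout|nodown'"

    # ... perform maintenance on the machine (reboot, replace disks, etc.) ...

    # Resume normal health management once maintenance is complete.
    juju action do --unit ceph/0 resume-health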

actions.yaml (new file)

@@ -0,0 +1,4 @@
pause-health:
  description: Pause ceph health operations across the entire ceph cluster
resume-health:
  description: Resume ceph health operations across the entire ceph cluster
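
For context, juju looks up each action declared in actions.yaml by name and runs the executable of the same name from the charm's actions/ directory, so the two scripts added below provide those executables. A rough sketch of the resulting layout (paths are illustrative):

    actions.yaml              # action metadata (this file)
    actions/pause-health      # executable run for the pause-health action
    actions/resume-health     # executable run for the resume-health action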

actions/pause-health (new executable file)

@@ -0,0 +1,6 @@
#!/bin/bash
set -eux
# Stop OSDs from being marked down or out while maintenance is in progress.
ceph osd set nodown
ceph osd set noout
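
These two flags are what actually hold the cluster steady: `noout` keeps stopped OSDs from being marked out (which would start rebalancing data onto other OSDs), and `nodown` keeps them from being marked down. A quick manual check on a monitor node, for example (commands shown for illustration, not part of the charm):

    ceph osd dump | grep flags   # lists noout,nodown while health is paused
    ceph health                  # reports HEALTH_WARN while the flags are set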

actions/resume-health (new executable file)

@@ -0,0 +1,6 @@
#!/bin/bash
set -eux
# Clear the maintenance flags so normal health management resumes.
ceph osd unset nodown
ceph osd unset noout

tests/basic_deployment.py

@@ -2,6 +2,7 @@
import amulet
import time
from charmhelpers.contrib.openstack.amulet.deployment import (
    OpenStackAmuletDeployment
)
@@ -440,6 +441,27 @@ class CephBasicDeployment(OpenStackAmuletDeployment):
        u.log.debug('Pool list on all ceph units produced the '
                    'same results (OK).')

    def test_402_pause_resume_actions(self):
        """Verify that pause/resume health actions work."""
        u.log.debug("Testing pause")
        cmd = "ceph -s"
        sentry_unit = self.ceph0_sentry
        action_id = u.run_action(sentry_unit, 'pause-health')
        assert u.wait_on_action(action_id), "Pause health action failed."
        output, code = sentry_unit.run(cmd)
        if 'nodown' not in output or 'noout' not in output:
            amulet.raise_status(amulet.FAIL, msg="Missing noout,nodown")

        u.log.debug("Testing resume")
        action_id = u.run_action(sentry_unit, 'resume-health')
        assert u.wait_on_action(action_id), "Resume health action failed."
        output, code = sentry_unit.run(cmd)
        if 'nodown' in output or 'noout' in output:
            amulet.raise_status(amulet.FAIL, msg="Still has noout,nodown")

    def test_410_ceph_cinder_vol_create(self):
        """Create and confirm a ceph-backed cinder volume, and inspect
        ceph cinder pool object count as the volume is created

tests/tests.yaml

@@ -19,3 +19,4 @@ packages:
- python-novaclient
- python-pika
- python-swiftclient
- python-nose