Add Support For Erasure-Coded Pools

Add support for erasure coded pools.

1) The pool name of replicated and EC pools can now be set via the
   rbd-pool-name config option.
2) The weight of replicated and EC pools can now be set via the
   ceph-pool-weight config option.
3) The charm no longer uses initialize_mds from the ceph-mds
   interface. This brings the charm in line with ceph-client
   charms, where each charm explicitly creates the pools it
   needs.
4) The metadata pool name format is preserved with an underscore
   rather than a hyphen (see the naming sketch below).

Change-Id: I97641c6daeeb2a1a65b081201772c89f6a7f539c
Authored by Liam Young on 2020-08-18 08:51:02 +00:00; committed by James Page
parent 25317a00cd
commit f347a37d69
2 changed files with 213 additions and 3 deletions
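
The default pool naming described in point 4 can be sketched as follows. This is an illustrative snippet only; the application name "ceph-fs" is an assumption, not taken from the commit:

# Minimal sketch of the default pool names the handler below derives,
# assuming a hypothetical application name of "ceph-fs".
service = "ceph-fs"
data_pool = "{}_data".format(service)          # -> "ceph-fs_data"
metadata_pool = "{}_metadata".format(service)  # -> "ceph-fs_metadata"
print(data_pool, metadata_pool)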


@@ -47,3 +47,131 @@ options:
       order for this charm to function correctly, the privacy extension must be
       disabled and a non-temporary address must be configured/available on
       your network interface.
+  ceph-osd-replication-count:
+    type: int
+    default: 3
+    description: |
+      This value dictates the number of replicas ceph must make of any
+      object it stores within the images rbd pool. Of course, this only
+      applies if using Ceph as a backend store. Note that once the images
+      rbd pool has been created, changing this value will not have any
+      effect (although it can be changed in ceph by manually configuring
+      your ceph cluster).
+  ceph-pool-weight:
+    type: int
+    default: 5
+    description: |
+      Defines a relative weighting of the pool as a percentage of the total
+      amount of data in the Ceph cluster. This effectively weights the number
+      of placement groups for the pool created to be appropriately portioned
+      to the amount of data expected. For example, if the compute images
+      for the OpenStack compute instances are expected to take up 20% of the
+      overall configuration then this value would be specified as 20. Note -
+      it is important to choose an appropriate value for the pool weight as
+      this directly affects the number of placement groups which will be
+      created for the pool. The number of placement groups for a pool can
+      only be increased, never decreased - so it is important to identify the
+      percent of data that will likely reside in the pool.
+  rbd-pool-name:
+    default:
+    type: string
+    description: |
+      Optionally specify an existing rbd pool that the charm should use.
+  pool-type:
+    type: string
+    default: replicated
+    description: |
+      Ceph pool type to use for storage - valid values include replicated
+      and erasure-coded.
+  ec-profile-name:
+    type: string
+    default:
+    description: |
+      Name for the EC profile to be created for the EC pools. If not defined
+      a profile name will be generated based on the name of the pool used by
+      the application.
+  ec-rbd-metadata-pool:
+    type: string
+    default:
+    description: |
+      Name of the metadata pool to be created (for RBD use-cases). If not
+      defined a metadata pool name will be generated based on the name of
+      the data pool used by the application. The metadata pool is always
+      replicated, not erasure coded.
+  ec-profile-k:
+    type: int
+    default: 1
+    description: |
+      Number of data chunks that will be used for the EC data pool. K+M
+      should never be greater than the number of zones (or hosts) available
+      for balancing.
+  ec-profile-m:
+    type: int
+    default: 2
+    description: |
+      Number of coding chunks that will be used for the EC data pool. K+M
+      should never be greater than the number of zones (or hosts) available
+      for balancing.
+  ec-profile-locality:
+    type: int
+    default:
+    description: |
+      (lrc plugin - l) Group the coding and data chunks into sets of size l.
+      For instance, for k=4 and m=2, when l=3 two groups of three are created.
+      Each set can be recovered without reading chunks from another set. Note
+      that using the lrc plugin does incur more raw storage usage than isa or
+      jerasure in order to reduce the cost of recovery operations.
+  ec-profile-crush-locality:
+    type: string
+    default:
+    description: |
+      (lrc plugin) The type of the crush bucket in which each set of chunks
+      defined by l will be stored. For instance, if it is set to rack, each
+      group of l chunks will be placed in a different rack. It is used to
+      create a CRUSH rule step such as step choose rack. If it is not set,
+      no such grouping is done.
+  ec-profile-durability-estimator:
+    type: int
+    default:
+    description: |
+      (shec plugin - c) The number of parity chunks each of which includes
+      each data chunk in its calculation range. The number is used as a
+      durability estimator. For instance, if c=2, 2 OSDs can be down
+      without losing data.
+  ec-profile-helper-chunks:
+    type: int
+    default:
+    description: |
+      (clay plugin - d) Number of OSDs requested to send data during
+      recovery of a single chunk. d needs to be chosen such that
+      k+1 <= d <= k+m-1. The larger the d, the better the savings.
+  ec-profile-scalar-mds:
+    type: string
+    default:
+    description: |
+      (clay plugin) Specifies the plugin that is used as a building
+      block in the layered construction. It can be one of jerasure,
+      isa or shec (defaults to jerasure).
+  ec-profile-plugin:
+    type: string
+    default: jerasure
+    description: |
+      EC plugin to use for this application's pool. The following plugins
+      are acceptable - jerasure, lrc, isa, shec, clay.
+  ec-profile-technique:
+    type: string
+    default:
+    description: |
+      EC profile technique used for this application's pool - will be
+      validated based on the plugin configured via ec-profile-plugin.
+      Supported techniques are reed_sol_van, reed_sol_r6_op, cauchy_orig,
+      cauchy_good and liber8tion for jerasure; reed_sol_van and cauchy
+      for isa; and single and multiple for shec.
+  ec-profile-device-class:
+    type: string
+    default:
+    description: |
+      Device class from CRUSH map to use for placement groups for
+      erasure profile - valid values: ssd, hdd or nvme (or leave
+      unset to not use a device class).
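
The EC profile options above are interdependent: k+m should not exceed the number of available failure domains, and the clay helper-chunk count d must satisfy k+1 <= d <= k+m-1. A minimal sketch of these relationships, using hypothetical values rather than anything taken from the charm:

# Hypothetical EC profile values, for illustration only.
k = 4          # ec-profile-k: data chunks
m = 2          # ec-profile-m: coding chunks
d = 5          # ec-profile-helper-chunks (clay plugin)
hosts = 6      # available failure domains (zones or hosts)

assert k + m <= hosts, "k+m must not exceed the number of failure domains"
assert k + 1 <= d <= k + m - 1, "clay requires k+1 <= d <= k+m-1"

# Raw space used per unit of logical data is (k+m)/k, e.g. 1.5x here,
# versus 3x for a replicated pool with the default replication count of 3.
overhead = (k + m) / k
print("raw overhead: {:.2f}x".format(overhead))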


@@ -13,7 +13,9 @@
 # limitations under the License.
 from charms import reactive
-from charmhelpers.core import hookenv
+from charmhelpers.core.hookenv import (
+    service_name,
+    config)
 import charms_openstack.bus
 import charms_openstack.charm as charm
@@ -51,6 +53,86 @@ def config_changed():
 @reactive.when_not('ceph.create_pool.req.sent')
 @reactive.when('ceph-mds.connected')
 def storage_ceph_connected(ceph):
-    ceph.announce_mds_name()
-    ceph.initialize_mds(hookenv.service_name())
+    ceph_mds = reactive.endpoint_from_flag('ceph-mds.connected')
+    ceph_mds.announce_mds_name()
+    service = service_name()
+    weight = config('ceph-pool-weight')
+    replicas = config('ceph-osd-replication-count')
+    if config('rbd-pool-name'):
+        pool_name = config('rbd-pool-name')
+    else:
+        pool_name = "{}_data".format(service)
+    # The '_' rather than '-' in the default pool name
+    # maintains consistency with previous versions of the
+    # charm but is inconsistent with ceph-client charms.
+    metadata_pool_name = (
+        config('metadata-pool') or
+        "{}_metadata".format(service)
+    )
+    # Metadata sizing is approximately 20% of overall data weight
+    # https://ceph.io/planet/cephfs-ideal-pg-ratio-between-metadata-and-data-pools/
+    metadata_weight = weight * 0.20
+    # Resize data pool weight to accommodate metadata weight
+    weight = weight - metadata_weight
+    if config('pool-type') == 'erasure-coded':
+        # General EC plugin config
+        plugin = config('ec-profile-plugin')
+        technique = config('ec-profile-technique')
+        device_class = config('ec-profile-device-class')
+        bdm_k = config('ec-profile-k')
+        bdm_m = config('ec-profile-m')
+        # LRC plugin config
+        bdm_l = config('ec-profile-locality')
+        crush_locality = config('ec-profile-crush-locality')
+        # SHEC plugin config
+        bdm_c = config('ec-profile-durability-estimator')
+        # CLAY plugin config
+        bdm_d = config('ec-profile-helper-chunks')
+        scalar_mds = config('ec-profile-scalar-mds')
+        # Profile name
+        profile_name = (
+            config('ec-profile-name') or "{}-profile".format(service)
+        )
+        # Create erasure profile
+        ceph_mds.create_erasure_profile(
+            name=profile_name,
+            k=bdm_k, m=bdm_m,
+            lrc_locality=bdm_l,
+            lrc_crush_locality=crush_locality,
+            shec_durability_estimator=bdm_c,
+            clay_helper_chunks=bdm_d,
+            clay_scalar_mds=scalar_mds,
+            device_class=device_class,
+            erasure_type=plugin,
+            erasure_technique=technique
+        )
+        # Create EC data pool
+        ceph_mds.create_erasure_pool(
+            name=pool_name,
+            erasure_profile=profile_name,
+            weight=weight,
+            app_name=ceph_mds.ceph_pool_app_name,
+            allow_ec_overwrites=True
+        )
+        ceph_mds.create_replicated_pool(
+            name=metadata_pool_name,
+            weight=metadata_weight,
+            app_name=ceph_mds.ceph_pool_app_name
+        )
+    else:
+        ceph_mds.create_replicated_pool(
+            name=pool_name,
+            replicas=replicas,
+            weight=weight,
+            app_name=ceph_mds.ceph_pool_app_name)
+        ceph_mds.create_replicated_pool(
+            name=metadata_pool_name,
+            replicas=replicas,
+            weight=metadata_weight,
+            app_name=ceph_mds.ceph_pool_app_name)
+    ceph_mds.request_cephfs(service)
     reactive.set_state('ceph.create_pool.req.sent')
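
The weight handling in the handler above splits the configured ceph-pool-weight between the data and metadata pools, following the roughly 20% metadata guidance linked in the code. A standalone sketch of that arithmetic, assuming the default ceph-pool-weight of 5:

# Sketch of the data/metadata weight split performed by the handler,
# assuming the default ceph-pool-weight of 5 (percent of cluster data).
weight = 5
metadata_weight = weight * 0.20         # ~20% of the weight goes to metadata
data_weight = weight - metadata_weight  # remaining ~80% stays with the data pool
print(data_weight, metadata_weight)     # 4.0 1.0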